Due date: Tuesday, October 12, 2004.
Ex. 1 Download the following files.
porter.cc
porter.h
main.cc
Makefile
Compile the program with the command make. The name of the
executable will be porter. This program implements the Porter transform
that removes suffixes and prefixes from words to keep only the
linguistic root for indexing purposes. It can be used by a web search
engine among other applications.
a. Implement a parallel version of the program following the pipeline model, in which each of the functions strip_prefixes and step_1 through step_5 is called by a different process.
The master process reads the input the same way the sequential program does. Each slave process receives a string from the previous process and sends the transformed string to the next process. The last process outputs the results.
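One way to organize the pipeline is a table with one function per process, so that each process applies only the stage matching its index. The sketch below is an illustration under assumptions, not the required solution: demo_drop_un and demo_drop_s are hypothetical stand-ins for strip_prefixes and step_1 to step_5 (whose real signatures are in porter.h), the value of KEYWORDSIZE is assumed, and rank stands for whatever process index your message-passing library gives you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const int KEYWORDSIZE = 32;  // assumed value; use the constant from the provided sources

// Hypothetical stand-in for one real stage: drops a leading "un".
void demo_drop_un(char *w) {
    if (std::strncmp(w, "un", 2) == 0)
        std::memmove(w, w + 2, std::strlen(w + 2) + 1);  // +1 also moves the '\0'
}

// Hypothetical stand-in for another real stage: drops a trailing 's'.
void demo_drop_s(char *w) {
    std::size_t n = std::strlen(w);
    if (n > 0 && w[n - 1] == 's') w[n - 1] = '\0';
}

// One table entry per pipeline process. In the real program the table
// would list strip_prefixes and step_1 to step_5 instead.
typedef void (*Stage)(char *);
const Stage STAGES[] = { demo_drop_un, demo_drop_s };
const int NUM_STAGES = sizeof(STAGES) / sizeof(STAGES[0]);

// Process number `rank` transforms the word with its own stage only.
void run_stage(int rank, char *word) {
    assert(rank >= 0 && rank < NUM_STAGES);
    STAGES[rank](word);
}
```

In a real solution each process would loop: receive a word from the previous process, call run_stage with its own index, and send the result to the next process.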
Notes.
1. No process should call the functions strip_affixes or
strip_suffixes.
2. When processes exchange a string, remember that the string is an
array, and the name of an array is already a pointer, so you don't need
the '&' in front of its name. Also, the size of the exchange must be
KEYWORDSIZE and not the actual length of the word, because the receiving
process has no way to know that length in advance (unless you first send
a separate message with the size, which is overkill).
3. When you send me this homework, you have to specify how
many processes the program is supposed to run with.
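Note 2 can be illustrated with a small sketch. It uses a POSIX pipe (write and read on file descriptors) purely as a stand-in for the send and receive calls of the message-passing library used in class, which this handout does not name. The helper names send_word and recv_word and the value of KEYWORDSIZE are assumptions made for the example.

```cpp
#include <cassert>
#include <cstring>
#include <unistd.h>

const int KEYWORDSIZE = 32;  // assumed value; use the constant from the provided sources

// Send one word as a fixed-size block of KEYWORDSIZE bytes.
bool send_word(int fd, const char *word) {
    assert(std::strlen(word) < static_cast<std::size_t>(KEYWORDSIZE));
    char buffer[KEYWORDSIZE] = {0};               // zero padding after the word
    std::strncpy(buffer, word, KEYWORDSIZE - 1);  // last byte stays '\0'
    // Pass `buffer`, not `&buffer`: the array name already converts to char*.
    return write(fd, buffer, KEYWORDSIZE) == KEYWORDSIZE;
}

// Receive one word. The receiver always asks for KEYWORDSIZE bytes,
// so it never needs to know the length of the word in advance.
bool recv_word(int fd, char *word) {
    int got = 0;
    while (got < KEYWORDSIZE) {
        ssize_t n = read(fd, word + got, KEYWORDSIZE - got);
        if (n <= 0) return false;  // error or end of stream
        got += static_cast<int>(n);
    }
    return true;
}
```

The same idea carries over to a message-passing library: the count argument of both the send and the matching receive is KEYWORDSIZE, and the buffer argument is the array name itself.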
b. (5 extra credit points) Implement an extra task, run by an extra
process, which eliminates the words from the input that are too
common. This process receives all of the input words but sends
forward only those words that are not in the list of common
words. This filtering must take place before any transformation
is applied to the word.
For this you can use the following file containing about 500 such
words generally accepted as too common:
stop_words
If you divide the task of filtering the common words among several processes, you get 2 more points.
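The filtering step could look like the sketch below. It assumes that stop_words holds whitespace-separated words and that matching is exact (same case); the function names are made up for the example. The last function shows just one possible way to divide the work: word number index goes to filter process index % num_filters. Other divisions are equally acceptable.

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // only needed if you load the list from a string, as in a quick check
#include <string>
#include <unordered_set>

// Read the common words from any input stream
// (assumed format: words separated by whitespace).
std::unordered_set<std::string> load_stop_words(std::istream &in) {
    std::unordered_set<std::string> words;
    std::string w;
    while (in >> w) words.insert(w);
    return words;
}

// True if the word is NOT a common word and must be sent on
// to the first transformation stage.
bool should_forward(const std::unordered_set<std::string> &stop,
                    const std::string &word) {
    return stop.count(word) == 0;
}

// One possible split among several filter processes (round-robin):
// the word with sequence number `index` is checked by filter
// number index % num_filters.
int filter_for(long index, int num_filters) {
    assert(index >= 0 && num_filters > 0);
    return static_cast<int>(index % num_filters);
}
```

To use it with the provided file, open stop_words with a std::ifstream, pass the stream to load_stop_words once at startup, and call should_forward on every word before it enters the pipeline.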