Dana Vrajitoru
B424 Parallel and Distributed Programming
Embarrassingly Parallel Algorithms
 Sometimes called naturally parallel algorithms.
 Simplest type of parallel algorithms requiring almost no
communication between the processes.
 Each process can perform its own computations without any need
for communication with the others.
 It may require some initial partition of the data or collecting
the results in the end, but not always.
Ideal Case
 All the subproblems or tasks are defined before the
computations begin.
 All the subsolutions are stored in independent memory locations
(variables, array elements).
 Thus, the computation of the subsolutions is completely
independent.
 If the computations require some initial or final communication,
then we call it nearly embarrassingly parallel.
Variations / Hybrid Systems
 The solutions must be accumulated in a shared data structure;
this requires some coordination at the end.
 A non-obvious termination condition, such as stopping when a
solution is found. This requires intermediate coordination.
 Not all subproblems are known initially. Additional overhead is
necessary to keep track of the tasks (a pool of tasks).
Classic Examples
 Protein-folding software that can run on many ordinary computers,
with each machine doing a small piece of the work.
 SETI project (Search for ExtraTerrestrial Intelligence)
 Generation of all subsets.
 Generation of pseudorandom numbers.
 Monte Carlo simulations.
 Mandelbrot sets (a.k.a. fractals).
Monte Carlo Simulations
 Algorithms that approximate a function using random selections in
calculations.
 Example: approximating pi by comparing the area of a circle to
that of the square enclosing it.
 If r is the radius of the circle, then the area of the enclosing
square is 4 r^{2}, while that of the circle is pi r^{2}.
The ratio between them is pi/4.
 A MC algorithm can generate n random points inside the rectangle,
count the ones inside the circle, divide by the total, and multiply
by 4.

Total points: 65
Points inside the circle: 56
pi ~= 4*56/65 = 3.44615

Sequential Algorithm
Count_pt(int N) {
  double count_in = 0;
  for (i = 0; i < N; i++) {
    x = randr(-1, 1);     // random point in the square [-1,1] x [-1,1]
    y = randr(-1, 1);
    if (x*x + y*y <= 1)   // inside the unit circle
      count_in++;
  }
  return count_in;
}
Pi(N) {
  return 4*Count_pt(N)/N;
}
Parallel Algorithm
n = N/no_proc; // assuming N is a multiple of no_proc
Master_count(n) {
  Broadcast(n, 0);        // send n from the master to all processes
  my_count = Count_pt(n);
  for (i = 1; i < no_proc; i++) {
    Recv(count, i);       // collect each slave's count
    my_count += count;
  }
  return my_count;
}
Slave_count() {
  Broadcast(n, 0);        // receive n from the master
  my_count = Count_pt(n);
  Send(my_count, 0);
}
Here the call to the function Master_count replaces the call to
Count_pt in the function Pi.
Pseudo Random Numbers
 Technique 1. One process can generate the pseudorandom
numbers and send to other processes that need the random
numbers. This is sequential.
 Technique 2. Use a separate pseudorandom number generator
on each process. Each process must use a different seed. This is
embarrassingly parallel.
 The choice of seed used by each process is important. Simply
using the process id or the time of day can yield bad
distributions. One can use the /dev/random device driver in Linux to
get good seeds.
 Technique 3. Convert the linear generator such that each
process only produces its share of random numbers. It is easier to
control but requires more communication.
An Example
A common way to generate pseudorandom numbers is a linear generator
with two large constants a and c (often chosen as primes):
y_{0} = s; // the seed
y_{i+1} = a y_{i} + c
We can convert it to a dependency on a number several positions back:
y_{i+k} = a^{k} y_{i} + (a^{k-1} + ... + a^{2} + a + 1) c
        = a^{k} y_{i} + c (a^{k} - 1)/(a - 1)
k Steps
If we denote by A = a^{k} and C = c (a^{k} - 1)/(a - 1), then
y_{i+k} = A y_{i} + C (*)
computes the number k positions ahead in the sequence.
The master computes the first k numbers p_{0}, p_{1}, ..., p_{k-1}
using the simple formula, then sends one to each process.
Each process generates its own random numbers from the seed received
from the master using (*).