Dana Vrajitoru
B424 Parallel and Distributed Programming

### Embarrassingly Parallel Algorithms

• Sometimes called naturally parallel algorithms.
• Simplest type of parallel algorithms requiring almost no communication between the processes.
• Each process can perform their own computations without any need for communication with the others.
• It may require some initial partition of the data or collecting the results in the end, but not always.

Ideal Case

• All the sub-problems or tasks are defined before the computations begin.
• All the sub-solutions are stored in independent memory locations (variables, array elements).
• Thus, the computation of the sub-solutions is completely independent.
• If the computations require some initial or final communication, then we call it nearly embarrassingly parallel. Variations / Hybrid Systems

• The solutions must be accumulated in a shared data structure - this requires some coordination at the end.
• Some non-obvious termination condition, such as something being found. This requires intermediate coordination.

Classic Examples

• Protein folding software that can run on any computer with each machine doing a small piece of the work.
• SETI project (Search for Extra-Terrestrial Intelligence)
SETI
• Generation of All Subsets.
• Generation of Pseudo-random numbers.
• Monte Carlo Simulations.
• Mandelbrot Sets (a.k.a. Fractals)

Monte Carlo Simulations

• Algorithms that approximate a function using random selections in calculations.
• Example: approximating pi by comparing the area of a circle to that of a rectangle enclosing it.
• If r is the radius of the circle, then the area of the rectangle is 4 r2. That of the circle is pi r2. The ratio between then is pi/4.
• A MC algorithm can generate n random points inside the rectangle, count the ones inside the circle, divide by the total, and multiply by 4. Total points: 65 Points inside the circle: 56 pi ~= 4*56/65 = 3.44615

Sequential Algorithm

```Count_pt(int N) {
double count_in = 0;
for (i=0; i < N; i++) {
x = randr(-1, 1);
y = randr(-1, 1);
if (x*x+y*y <= 1)
count_in++;
}
return count_in;
}
Pi(N) {
return 4*Count_pt(N)/N;
}
```

Parallel Algorithm

```n = N/no_proc; // assuming N is a multiple of no_proc
Master_count(n) {
my_count = Count_pt(n);
for (i=1; i < no_proc; i++) {
Recv(count, i);
my_count += count;
}
return my_count;
}
Slave_count() {
my_count = Count_pt(n);
Send(my_count, 0);
}
```
Here the call to the function Master_count replaces the function call Count_pt in the function PI.

Pseudo Random Numbers

• Technique 1. One process can generate the pseudo-random numbers and send to other processes that need the random numbers. This is sequential.
• Technique 2. Use a separate pseudo-random number generator on each process. Each process must use a different seed. This is embarrassingly parallel.
• The choice of seeds used at each process is important. Simply using the process id or time of the day can yield to bad distributions. One can use the /dev/random device driver in Linux to get good seeds.
• Technique 3. Convert the linear generator such that each process only produces its share of random numbers. It is easier to control but requires more communication.

An Example
A common way to generate pseudo-random numbers is using large primes a and c:
y0 = s; // the seed
y0 = s;
yi+1 = a yi + c
We can convert it to a dependency on a number several positions back.
yi+k = ak yi + (ak-1 + ... + a2 + a + 1) c = ak + c (ak-1)/(a-1)

k Steps If we denote by A = ak and C = c(ak-1)/(a-1), then
yi+k = A yi + C (*)
computes the kth next number in the sequence.

The master computes p0, p1, ..., pk-1 using the simple formula, then sends them to each process.

Each process generates its own random numbers from the seed received from the master using (*). 