B424 Parallel and Distributed Programming

**General Ideas**

- Identifying the portions of code that can be executed in parallel.
- Mapping the code onto multiple processes.
- Distributing the input, output, and intermediate data.
- Managing access to shared resources.
- Synchronizing the processes at various stages of the program.

**Code Decomposition**

- *Decomposition:* the operation of dividing the computation into smaller parts, some of which may be executed in parallel.
- *Task:* programmer-defined units of code resulting from decomposition.
- *Granularity:* the number / size of the tasks.
- *Fine-grained* decomposition: a large number of small tasks.
- *Coarse-grained* decomposition: a small number of large tasks.
- *Degree of concurrency:* the maximum number of tasks that can be executed at the same time.

**Decomposition Techniques**

- *Recursive* decomposition: used for traditional divide-and-conquer algorithms that are not easy to solve iteratively.
- *Data* decomposition: the data is partitioned, and this induces a partitioning of the code into tasks.
- *Functional* decomposition: the functions to be performed on the data are split into multiple tasks.
- *Exploratory* decomposition: used for problems equivalent to a search of a space for solutions.
- *Speculative* decomposition: used when a program may take one of many possible branches depending on results from computations preceding the choice.

**Characteristics of Tasks**

*Task generation*:

- static - the tasks are known in advance (data decomposition)
- dynamic - decided at runtime (recursive decomposition)

*Task size*:

- uniform (they require approximately the same amount of time) or
- non-uniform
- known/not known.

*Task Interaction*

- *Static*: the interaction happens at predetermined times and the set of tasks to interact with is known in advance.
- *Dynamic*: the timing of the interaction or the set of tasks to interact with is unpredictable. Harder to implement.
- *Regular/irregular*: the interaction is regular if it follows a pattern that can be exploited for efficiency.

**Recursive Decomposition**

Examples: QuickSort or MergeSort.

- In both cases the operation of sorting an array is divided into two subproblems that can be solved recursively.
- Both algorithms are hard to implement iteratively.
- For Quicksort, task generation is dynamic and the task size is non-uniform (the split depends on the pivot).
- For MergeSort, task generation is static and the task size is uniform (the array is always halved).

**The MergeSort**

- Divides the array in 2, sorts the 2 parts recursively, then merges the arrays.
- The computations are organized in a binary tree.
- Each process receives an array to sort from the parent (except for the master).
- The process divides the array in 2 and sends the halves to the children.
- After the children are done computing, they send the sorted arrays back to the parent.
- The parent performs the merge and sends the array back up the tree.

**Sequential MergeSort**

```c
void merge_sort(int a[], int first, int last, int aux[]) {
    if (last <= first) return;
    int mid = (first + last) / 2;
    merge_sort(a, first, mid, aux);
    merge_sort(a, mid + 1, last, aux);
    merge_arrays(a, first, mid, a, mid + 1, last, aux, first, last);
    for (int i = first; i <= last; i++)
        a[i] = aux[i];
}
```

```c
void merge_arrays(int a[], int afirst, int alast,
                  int b[], int bfirst, int blast,
                  int c[], int cfirst, int clast) {
    // skip verification of size of c
    int i = afirst, j = bfirst, k = cfirst;
    while (i <= alast && j <= blast) {
        if (a[i] < b[j]) c[k++] = a[i++];
        else             c[k++] = b[j++];
    }
    while (i <= alast) c[k++] = a[i++];
    while (j <= blast) c[k++] = b[j++];
}
```

**Parallel MergeSort**

```c
// Pseudocode: Send/Recv stand in for message-passing primitives.
void parallel_merge_sort() {
    if (proc_id > 0) {              // every process except the master
        Recv(size, parent);         // receives its array from its parent
        Recv(a, size, parent);
    }
    mid = size / 2;
    if (both children exist) {
        Send(mid, child1);                  // send one half to each child
        Send(size - mid, child2);
        Send(a, mid, child1);
        Send(a + mid, size - mid, child2);
        Recv(a, mid, child1);               // receive the sorted halves
        Recv(a + mid, size - mid, child2);
        // the halves are a[0..mid-1] and a[mid..size-1]; aux is local
        merge_arrays(a, 0, mid - 1, a, mid, size - 1, aux, 0, size - 1);
        for (int i = 0; i < size; i++)
            a[i] = aux[i];
    } else {                                // leaf: sort locally
        merge_sort(a, 0, size - 1, aux);
    }
    if (proc_id > 0)
        Send(a, size, parent);              // return the sorted array
}
```

**Data Decomposition**

- Input, output, or intermediate data decomposition.
- Input data decomposition: used if each output can be computed directly as a function of the input. Some combination of the individual results may be necessary.
- Output data decomposition: if it applies, it can result in less communication.
- Intermediate data decomposition is rarer.
- Owner-computes rule: the process that owns a part of the data performs all the computations related to it.