Dana Vrajitoru
B424 Parallel and Distributed Programming

Merge Sort

General idea

Here is the sequential algorithm that we start with:

void merge_sort(int a[], int first, int last, int aux[])
{
  if (last <= first) return;
  int mid = (first+last)/2;
  merge_sort(a, first, mid, aux);
  merge_sort(a, mid+1, last, aux);
  merge_arrays(a, first, mid, a,
    mid+1, last, aux, first, last);
  for (int i=first; i<=last; i++)
    a[i] = aux[i];
}

void merge_arrays(int a[], int afirst, int alast, 
                  int b[], int bfirst, int blast, 
                  int c[], int cfirst, int clast)
{
  // skip verification of size of c.
  int i=afirst, j=bfirst, k=cfirst;
  while (i<=alast && j<=blast) {
    if (a[i] <= b[j])
      c[k++] = a[i++];
    else
      c[k++] = b[j++];
  }
  while (i<=alast)
    c[k++] = a[i++];
  while (j<=blast) 
    c[k++] = b[j++];
}

Global data structures: the two arrays.

int a[], aux[]; // We assume that they are taken care of properly.

We will store the information that the threads need in a structure. We'll have to communicate the first and last indexes inside which the array needs to be sorted, and the recursion level. We assume that we'll start with the maximum number of recursion levels for which we want to divide the function into threads, and we'll decrease it by one each time for the threads created for the recursive calls. The structure is declared this way:

struct Merge_data {
  int first, last;
  int rec_level;
};

In the main function we only need to create one thread that starts the initial call. We have to declare a variable of type struct, store the first (0), the last (size-1) and the maximum number of recursive levels that we'll create the threads for in it. Then we pass its address in the thread creation, type cast as a void pointer.

int main () { 
  ...
  Merge_data md = {0, size-1, max_level};
  pthread_t t;
  pthread_create(t, NULL, pmerge_sort, (void *) (&md));
  pthread_join(t, NULL); 
  ...
}

The parallel merge sort function used to create the threads. It starts by extracting the data from the argument after casting it as a Merge_data pointer. It checks the base case that the array has less than 2 elements first. Second, if the level has reached 0, then it calls the sequential merge sort. Otherwise it call the function launch_thread that creates a thread in place of each of the recursive calls. The storage structures md1 and md2 are declared in this function so that they are not deallocated because they've gone out of scope until the threads have properly extracted the information from them.

void *pmerge_sort(void *arg)
{
  Merge_data *md = (Merge_data *)arg;
  int first = md->first, last = md->last;
  int level = md->level; 
  if (last <= first) return;
  if (level == 0)
    merge_sort (a, first, last, aux); 
  else {
    pthread_t t1, t2;
    Merge_data md1, md2;
    int mid = (first+last)/2;
    launch_thread(t1, md1, first, mid, level-1);
    launch_thread(t2, md2, mid+1, last, level-1);
    join_merge(t1, t2, first, mid, last);
  }
}

The function launching new threads: it stores the information received through parameters into the Merge_data structure, then creates the thread.

void launch_thread (pthread_t &t, 
                    Merge_data &md, 
                int first, int last, int level)
{
  md.first = first;
  md.last = last;
  md.level = level;
  void * ptr_md = (void *)&md;
  pthread_create(&t, NULL, pmerge_sort, ptr_md);
}

The last function join_merge, called after the threads for the recursive calls were created. It waits for the two child threads to finish their execution first. At that point, two halves of the array are sorted, but they still need to be merged. It does this by calling the sequential function.

void join_merge(pthread_t &t1, pthread_t &t2,
                int first, int mid, int last)
{
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);  
  merge_arrays(a, first, mid, a, mid+1, last, aux, first, last);
  for (int i=first; i<=last; i++)
    a[i] = aux[i];
}