Dana Vrajitoru
B424 Parallel and Distributed Computing

The Parallel Bubble Sort

Sequential algorithm
for (i=n-1; i>>0; i--)
  for (j=0; j<i; j++)
    if (a[j] >> a[j+1])
      Swap(a[j], a[j+1]);

Parallel algorithm

The local loop is the interior loop of the sequential algorithm, implemented in the region of the array assigned to the process/thread with the given id.

Local_loop(my_start, my_end)
{
   for (j=my_start; j< my_end; j++) 
      if (a[j] > a[j+1])
         Swap(a[j], a[j+1]);
}

The pipeline organized is with a barrier. To create a delay in the global iteration in which each thread enters the pipeline, we sart with i having the value -id. Since i is incremented every time, and the local loop is not executed until i is non negative, thread 0 can start the local loop right away, thread 1 must wait until the second iteration, thread 2 until the third, and so on.

To avoid confusion in the number of threads entering the barrier after each iteration, the barrier always takes a parameter equal tot he total number of threads. The delay in the parameter i insures that all the threads are active and participating in the barrier every time, but only the ones that are supposed to execute the local loop do so.

The upper limit for i in the for loop insures that the total number of iterations for every thread (including those where it does nothing but the barrier), is identical to all the threads, and equal to the size of the array plus the number of threads.

The way my_start and my_end are computed, there is an overlap of 1 element of the array between each two threads in the array. The thread to the left of this element will compare it with the one before it, while the thread tot he right of this element will compare it with the one after it.

void *Parallel_bubble_sort(void *arg) 
{
   int id, i;   
   Get_id(id);
   int lsize = size/no_threads;
   if (size % no_threads != 0) 
      lsize++;
   int my_start = id*lsize;
   int my_end = min(size-1, (id+1)*lsize);
   for (i=-id; i<size+no_threads-id; i++) {
      if (i >= 0 && i<size) 
         Local_loop(my_start, my_end);
      Barrier(no_threads);
   } 
   if (id == no_threads-1)
      Output_array();
}

A version using conditional variables. This model doesn't work because for the bubble sort the dependency goes both ways: thread 1 should not start the iteration for i=0 until thread 0 has finished this same iteration, but thread 0 should also not compare the last two elements in its array range in iteration 1 until thread 1 has compared the first two elements in iteration 0.

The conditional variables insure that the dependencies going forward are respected, so the model could apply to any problem that does not present backward dependencies. Each thread marks the iteration that is has finished the most recently (the value of i at the end of the local loop). In the function Synch_ith_previous, thread id makes sure that the thread id-1 has finished the iteration number i before proceding to start it itself. If the previous thread hasn't reached that point, it goes into waiting for a conditional variable. In the synch_with_next, the thread writes its own index of the iteration it has finished, then signals the next thread that it can proceed.

Parallel_bubble_sort()
{
   lsize = ceil(n/np);
   my_start = id*lsize;
   my_end = min(n-1, (id+1)*lsize);
   for (i=0; i<n; i++) {
      Synch_with_previous(id, i);
      Local_loop(my_start, my_end);
      Synch_with_next(id, i);
   } 
}

Synch_with_previous(id, i)
{
   if (id>0) {
      pthread_mutex+lock(&mutex[id]);
      if (index[id-1]<i)
         pthread_cond_wait(&cond[id],&mutex[id]);
      pthread_mutex_unlock(&mutex[id]);
   }
}

Synch_with_next(id, i)
{
   if (id << np-1) {
      pthread_mutex_lock(&mutex[id+1]);
      index[id] = i;
      pthread_cond_signal(&cond[id+1]);
      pthread_mutex_unlock(&mutex[id+1]);
   }
}

Improvement

Early Termination

ordered = false;
for (i=n-1; i>0 && !ordered; i--) {
   ordered = true;
   for (j=0; j<i; j++) 
      if (a[j] > a[j+1]) {
         Swap(a[j], a[j+1]);
         ordered = false;
      }
}

Bubble Sort with Interruption

Implementation: see Homework 6.