Dana Vrajitoru
B424 Parallel and Distributed Programming

Parallel Libraries

Major Libraries / APIs

OpenMP

Pragma

Directive-Based OMP

OpenMP Directives

  • Spawning a parallel region
  • Dividing blocks of code among threads
  • Distributing loop iterations between threads
  • Serializing sections of code
  • Synchronization of work among threads
  • Declaring data as shared among threads or private to each thread
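
A minimal sketch of a few of these directives, assuming a C++ compiler with OpenMP enabled (for example `-fopenmp` with GCC); without that flag the pragmas are ignored and the code runs serially with the same result. The function names `parallel_sum` and `count_even` are illustrative and not part of OpenMP.

```cpp
#include <vector>

// Distributes loop iterations among threads. The reduction clause gives
// each thread a private partial sum and combines them at the end.
long parallel_sum(const std::vector<int>& v) {
    const long n = static_cast<long>(v.size());
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}

// Uses a shared counter; the critical directive serializes the update
// so that only one thread at a time modifies it.
int count_even(const std::vector<int>& v) {
    const long n = static_cast<long>(v.size());
    int count = 0;
    #pragma omp parallel for shared(count)
    for (long i = 0; i < n; ++i) {
        if (v[i] % 2 == 0) {
            #pragma omp critical
            {
                ++count;
            }
        }
    }
    return count;
}
```

In practice a reduction is usually preferred over a critical section for a simple counter; the critical version is shown only to illustrate serializing a section of code.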

    OpenMP Routines / Functions

  • Setting and querying the number of threads
  • Querying a thread's unique identifier (thread ID), its ancestor's identifier, and the thread team size
  • Setting and querying the dynamic threads feature
  • Querying if in a parallel region, and at what level
  • Setting and querying nested parallelism
  • Setting, initializing and terminating locks and nested locks
  • Querying wall clock time and resolution
  • OpenMP environment variables (for example OMP_NUM_THREADS) set run-time defaults for several of these settings
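
A minimal sketch of some of these routines, assuming OpenMP is enabled at compile time. The `TeamInfo` struct and the `query_team` function are hypothetical names introduced for illustration; the serial stand-ins in the `#else` branch are an assumption that only lets the sketch compile without OpenMP.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
#include <chrono>
// Assumption: serial stand-ins used only when OpenMP is not enabled.
static void omp_set_num_threads(int) {}
static int omp_get_thread_num() { return 0; }
static int omp_get_num_threads() { return 1; }
static double omp_get_wtime() {
    using clock = std::chrono::steady_clock;
    return std::chrono::duration<double>(
        clock::now().time_since_epoch()).count();
}
#endif

struct TeamInfo {
    int team_size;    // number of threads in the parallel region
    int highest_tid;  // largest thread ID observed in the region
    double seconds;   // wall clock time spent in the region
};

TeamInfo query_team(int requested_threads) {
    TeamInfo info{0, -1, 0.0};
    omp_set_num_threads(requested_threads);  // request a team size
    const double start = omp_get_wtime();    // wall clock time
    #pragma omp parallel
    {
        const int tid = omp_get_thread_num();  // this thread's ID
        if (tid == 0) {
            // Only thread 0 records the team size.
            info.team_size = omp_get_num_threads();
        }
        #pragma omp critical
        {
            if (tid > info.highest_tid) info.highest_tid = tid;
        }
    }
    info.seconds = omp_get_wtime() - start;
    return info;
}
```

The runtime may provide fewer threads than requested, so the caller should rely on the reported `team_size` rather than on the requested value. Thread IDs always run from 0 to `team_size - 1`.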

    Example

    #include <omp.h>
    #include <cstdio>
    #include <cstdlib>
    int main (int argc, char *argv[]) {
       int nthreads, tid;
       /* Fork a team of threads giving them their own
          copies of variables */
    #pragma omp parallel private(nthreads, tid)
       {
          /* Obtain thread number */
          tid = omp_get_thread_num();
          printf("Hello World from thread = %d\n", tid);
          /* Only master thread does this */
          if (tid == 0) {
             nthreads = omp_get_num_threads();
             printf("Number of threads = %d\n", nthreads);
          }
       }  /* All threads join master thread and disband */
       return 0;
    }
    

    Critical Section

    Loop Parallelization

    Synchronization

    OpenCL

    Main Idea

    Specific Examples

    More Examples

    CUDA

    GPU Structure

    Ideas

    Hello World

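    #include <cstdio>
    // Empty kernel: __global__ marks a function that runs on the GPU
    __global__ void kernel( void ) {
    }
    int main( void ) {
      // Launch the kernel with 1 block of 1 thread
      kernel<<<1,1>>>();
      printf( "Hello, World!\n" );
      return 0;
    }
    // The angle brackets give the launch configuration:
    // <<<#blocks, #threads per block>>>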
    

    Memory Management

    Vector Operations

    __global__ void add( int *a, int *b, int *c ) {
      c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
    }
    add<<< N, 1 >>>( dev_a, dev_b, dev_c );
    
    where blockIdx.x identifies the current block (N blocks of one thread each). Alternatively:
    __global__ void add( int *a, int *b, int *c ) {
      c[threadIdx.x] = a[threadIdx.x] + b[threadIdx.x];
    }
    add<<< 1, N >>>( dev_a, dev_b, dev_c );
    
    where threadIdx.x identifies the current thread within its block (one block of N threads).
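
Both launches cover the indices 0 to N-1. When several blocks of several threads are used, CUDA kernels commonly compute a global index as `blockIdx.x * blockDim.x + threadIdx.x`. The following is a host-side emulation in plain C++ (not CUDA code) meant only to show how that index maps onto the array; `emulate_add` is a hypothetical helper, and the two loops stand in for blocks and threads that would run in parallel on a GPU.

```cpp
#include <vector>

// Emulates add<<<blocks, threads>>>(a, b, c) serially on the host.
// Assumes a and b have the same length.
std::vector<int> emulate_add(int blocks, int threads,
                             const std::vector<int>& a,
                             const std::vector<int>& b) {
    const int n = static_cast<int>(a.size());
    std::vector<int> c(n, 0);
    for (int blockIdx = 0; blockIdx < blocks; ++blockIdx) {
        for (int threadIdx = 0; threadIdx < threads; ++threadIdx) {
            // `threads` plays the role of blockDim.x in a CUDA kernel.
            const int i = blockIdx * threads + threadIdx;
            // Guard against configurations with more threads than elements.
            if (i < n) {
                c[i] = a[i] + b[i];
            }
        }
    }
    return c;
}
```

With `threads = 1` the index reduces to the block index, and with `blocks = 1` it reduces to the thread index, which matches the two kernels above.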

    Threads / Blocks Properties

    Links