Parallel Computer Graphics

Dana Vrajitoru
B424 Parallel and Distributed Programming

Parallel Computer Graphics

Convex Hull

Given a number of points, find a subset of points forming a convex polygon enclosing all the points.
The points of the hull are called extreme points.

Graham's Scan

Start with the lowest, left-most point.
Order all the other points by angle with the first.
Traverse the points from the bottom in order of the angle. For each of them, if it's a left turn (1), accept it in the hull.
If it's a right turn (2), discard the last point and connect the new point with the one before the last (3). Then check backwards to make sure the hull is convex.
O(n log(n))

Jarvis's March

Start similarly to Graham, with a left-most lowest point. Order the others by angle with the first.
Wrap around the points in counter-clockwise direction, starting from the vertical down direction. The first encountered point is on the hull.
Start wrapping again from the new point with the first direction being the line connecting it to the previous point.
O(n*hull size), worst O(n²).

Parallel Version

Divide the points in equal sets of points by the x coordinate.
Compute the convex hull for each of them using some other algorithm.
Then combine the hull by connecting the lowest and highest points from each of them, then remove the intermediate points.

Geometrical Transformations

Translating - The coordinates of a two-dimensional object shifted by dx in the x-dimension and dy in the y-dimension are given by
x' = x + dx
y' = y + dy
Scaling by factors Fx and Fy:
x' = x Fx
y' = y Fy
Rotating by angle a around the origin:
x' = x cos(a) + y sin(a)
y' =-x sin(a) + y cos(a)

Partitioning into Processes

The transformation can start from the target region and go backward towards the source or the other way around.
The division can be done in a grid or by rows.

Master() {
   for (i = 0, row = 0; i < 48; i++, row = row + 10)
      send(row, i); // send row no to each process
   for (i = 0; i < 480; i++) /* initialize temp */
      for (j = 0; j < 640; j++)
         temp_map[i][j] = 0;
   for (i = 0; i < (640 * 480); i++) { // for each pixel 
      recv(oldrow,oldcol,newrow,newcol, any_source); 
      if (!((newrow < 0)||(newrow >= 480)||
            (newcol < 0)||(newcol >= 640)))
         temp_map[newrow][newcol]=map[oldrow][oldcol];
   }
   for (i = 0; i < 480; i++) /* update bitmap */
      for (j = 0; j < 640; j++)
         map[i][j] = temp_map[i][j];
}
Slave() {
   recv(row, master); // receive row no. 
   for (oldrow = row; oldrow < (row + 10); 
        oldrow++)
      for (oldcol = 0; oldcol < 640; oldcol++)
      { // transform coords 
          newrow = oldrow + delta_x; // shift
          newcol = oldcol + delta_y; 
          send(oldrow,oldcol,newrow,newcol, 
               master); // coords to master
      }
}
// In reverse order
Slave() {
   recv(row, master); // receive row no. 
   for (newrow = row; newrow < (row + 10); 
        newrow++)
      for (newcol = 0; newcol < 640; newcol++)
      { // transform coords 
         oldrow = newrow / Fx; // reverse scale
         oldcol = newcol / Fy; 
         send(oldrow,oldcol,newrow,newcol, 
              master); // send coords to master
    }
}

Reversing the Rotation

Basically, it's a rotate of -angle, knowing that sin(-a) = -sin(a) cos(-a) = cos(a)
oldrow = newrow*cos(a) - newcol*sin(a);
oldcol = newrow*sin(a) + newcol*cos(a);

Mandelbrot Set

Set of points in a complex plane that are quasi-stable (will increase and decrease, but not exceed some limit) when computed by iterating the function
z_k+1 = z_k² + c
where z_k+1 is the (k+1)th iteration of the complex number
z = a + bi
and c is a complex number giving the position of the point in the complex plane. The initial value for z is zero.
The iterations are continued until magnitude of z is greater than 2 or the number of iterations reaches some arbitrary limit.
For each point in the image, we start from c = x + i y, the coordinates, and z=0+0i, (can be scaled by a factor). The amplitude is sqrt(x²+y²).
z² = a² + 2abi + bi² = (a² - b²) + (2ab)i
zreal = zreal²-zimag²+creal
zimag = 2 zreal zimag + cimag
The number of iterations necessary for zreal²+zimag² >= 4 (or a given max) gives us the color/shade of the pixel: 0 = white, max = black.

Range of the image: [-2, 2]x[-2, 2].


// Calculates the number of iterations
// for a pixel with creal = x, cimag = y.

int cal_pixel(float creal, float cimag) {
   int count=0, max = 256;
   float temp, lengthsq, zreal=0, zimag=0;
   do {
      temp = zreal*zreal - zimag*zimag + creal;
      zimag = 2*zreal*zimag + cimag;
      zreal = temp;
      lengthsq = zreal*zreal + zimag*zimag;
      count++;
   } while ((lengthsq < 4.0) && (count < max));
   return count;
}

Generate_fractal(int width, int height, 
        float real_min, float real_max, 
        float imag_min, float img_max) {
  // scale the fractal to the image size 
   float scalex = (real_max - real_min)/width;
   float scaley = (imag_max - imag_min)/height;
   // generate the fractal:
   for (x = 0; x < width; x++)
      for (y = 0; y < height; y++) {
         creal = real_min + (float) x * scalex;
         cimag = imag_min + (float) y * scaley;
         color = cal_pixel(creal, cimag);
         display(x, y, color);
     }
}

Parallel Mandelbrot

It is almost embarrassingly parallel.
It depends on the access of each process to the display or the stored image.
Problem division: the calculation of the number of iterations for each pixel can vary quite a lot.
In general it is more suited for the pool of tasks division.

Master(min_real, max_real, min_imag, max_imag) {
   float scalex = (real_max - real_min)/width;
   float scaley = (imag_max - imag_min)/height;

   bcast([width,height,scalex,scaley,min_real, 
         max_real,min_imag,max_imag]);

   for (i=0, row=0; i<48; i++, row += 10)
      send(row, i); // send row no. to process i
   for (i=0; i<(480 * 640); i++) {
      recv([x, y, color], any_source);
      display(x, y, color); 
  }
}
Slave () {
   bcast([width,height,scalex,scaley,min_real, 
         max_real,min_imag,max_imag]);
   recv(row, master);
   for (x = 0; x < width; x++) 
      for (y = row; y < (row + 10); y++) {
         creal = real_min + (float) x * scalex;
         cimag = imag_min + (float) y * scaley;
         color = cal_pixel(creal, cimag);
         send([x, y, color], 0); //send to master
      }
}

Flood Fill

Starting from a pixel in the image, fill the entire area that is connected to it with a new color.
We define the connectivity for (x, y) based on 4 neighbors: (x-1, y), (x+1, y) (x, y-1), (x, y+1).
The method is highly recursive. It scans the row containing the pixel and as long as the color is contiguous, it calls itself recursively on the pixel above and below.
Hard to determine from the start how many recursive calls will happen.
Decomposition: exploratory and recursive.
Probably easier in shared memory mode.

FloodFill (x0, y0, fill_color, original_color) {
   int color, x, y;
   x = x0; y = y0;
   getPixel(x, y, color);
   if (color == original_color) {
      while (color == original_color) {
         setPixel(x, y, fill_color);
         FloodFill(x, y+1, fill_color, original_color);
         FloodFill(x, y-1, fill_color, original_color);
         x = x - 1; // go left
         getPixel(x, y, color);
      }
      x = x0 + 1; y = y0;
      getPixel(x, y, color);
      while (color == original_color) {
         setPixel(x, y, fill_color);
         FloodFill(x, y+1, fill_color, original_color);
         FloodFill(x, y-1, fill_color, original_color);
         x = x + 1; // go right
         getPixel(x, y, color);
      }
   }
}

Ray Tracing - the Method

Send a ray from each pixel to the reverse direction of the light coming in.
Compute the closest intersection of this ray with any object.
Consider the ray in the direction of each light source from the intersection point (shadow rays).
For each of them, check if the ray intersects another object or not. Compute the local contribution from the rays without intersection.
Add to this the light obtained by reflection and refraction in this point computed recursively.

Speed-up Techniques

Recursive depth limit.
Bounding boxes or spheres reduce the intersection calculation.
Vista buffer: projecting the objects onto the screen in pre-processing so we know what regions are empty.
Light projection on a cube around each object in pre-processing.

Parallel Ray Tracing

The image can be divided equally into pieces (rectangles, rows, pixels) assigned to each process.
The computation of each pixel is quite intensive and can vary based on the number of objects intersecting the ray.
Also suitable for pool of tasks.
2008: Intel on 4 quad-core Xeon CPUs, running Quake Wars at 15-20fps. GPU-based implementations: Nvidia: programmable Ray Tracing Engine OPTIX (2009).
Currently: Intel using cloud computing and the Intel Many Integrated Core (MIC) architecture. Reported 38fps in HD.

Existing Techniques

Screen subdivision: dividing the pixels to be computed among the processes. Load balancing issues.
Object subdivision: geometrical distribution of objects to processes. The processes need to exchange rays.
Functional decomposition: assigning different task, such as "reflective rays computation" or "shadow ray" computations to each process. Can be implemented in a client-server model.