Using the GPU to speed up calculations

GPGPU stands for General-Purpose computing on Graphics Processing Units. It is a technique that uses the GPU chip on the video card as a coprocessor to accelerate operations that are normally executed on the CPU.

Since GPUs are inherently parallel, certain types of calculations can be executed much faster on them than on conventional CPUs. In fact, GPU programming is part of a wider trend towards multi-core and parallel programming. A modern CPU can have 2 or 4 cores, while a mid-range GPU usually has at least 32 processing cores. However, the cores in a GPU are very different from the ones found in a CPU. They are optimized for parallel vector operations, and are not very efficient for algorithms that have complex flow structures or random memory access. In any case, the latest advances are making GPUs much more flexible and closer to CPUs. At the same time, CPU manufacturers are working on hybrid designs that combine CPU and GPU on a single chip.

Again, the purpose of this website is not to provide a full tutorial on GPGPU. Pages such as the following provide plenty of information on this topic:

Float textures


The GLGraphics library makes GPGPU computations possible through floating-point textures. The basic idea in GPGPU is to use textures as the input/output buffers of the calculations, while the shaders act as the computational kernels that operate on those textures.


CPU array to store coordinates = floating-point texture
(taken from Mark Harris' presentation "Mapping computational concepts to the GPU" at the SIGGRAPH 2005 GPGPU course)



A loop to update positions on the CPU maps to a GPU shader or kernel
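
As a rough CPU-side illustration of this mapping, the sketch below writes the same position update twice in plain Java: once as an ordinary loop over an array, and once as a per-element "kernel" that a driver function invokes for every element, which is roughly how a shader is invoked once per texel. All names here (KernelMappingSketch, updateLoop, applyKernel, Kernel) are made up for this illustration and are not part of GLGraphics.

```java
import java.util.Arrays;

// CPU-only illustration; none of these names belong to GLGraphics.
class KernelMappingSketch {

  // A "kernel": computes one output element from the input buffer and an index.
  interface Kernel {
    float compute(float[] input, int index);
  }

  // Conventional CPU version: one loop that updates every position in turn.
  static float[] updateLoop(float[] positions, float step) {
    float[] result = new float[positions.length];
    for (int i = 0; i < positions.length; i++) {
      result[i] = positions[i] + step;
    }
    return result;
  }

  // GPU-style version: the loop body lives in the kernel, and this driver
  // invokes it once per element. On a GPU these invocations run in parallel;
  // here they simply run one after another.
  static float[] applyKernel(float[] input, Kernel kernel) {
    float[] output = new float[input.length];
    for (int i = 0; i < input.length; i++) {
      output[i] = kernel.compute(input, i);
    }
    return output;
  }

  public static void main(String[] args) {
    float[] positions = { 0.0f, 1.5f, 3.0f };
    float step = 0.5f;
    float[] fromLoop = updateLoop(positions, step);
    float[] fromKernel = applyKernel(positions, (in, i) -> in[i] + step);
    System.out.println(Arrays.equals(fromLoop, fromKernel));
  }
}
```

Note that the kernel only reads from the input array and only writes to a separate output array, which mirrors the read/write separation of textures.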

A floating-point texture is created in GLGraphics as follows:
GLTextureParameters floatTexParams = new GLTextureParameters();
floatTexParams.format = GLTexture.FLOAT4;
GLTexture fpTex = new GLTexture(this, 400, 400, floatTexParams);
The created texture can store a floating-point value in each one of its 4 (RGBA = XYZW) components. Another important concept in the context of GPGPU is that of ping-pong textures. Since a texture can be either read from or written to during an operation, but not both simultaneously, an operation that needs to update a texture based on its previous values requires two textures that are swapped continuously. That is, after the step where one of the textures has been used as input (read) and the other as output (write), the roles of the textures are exchanged, so the one that received the latest data is now used as input. This technique is called ping-pong, and there is a class in GLGraphics called GLTexturePingPong that facilitates this operation:
GLTextureParameters floatTexParams = new GLTextureParameters();
floatTexParams.format = GLTexture.FLOAT4;

GLTexture fpTex1 = new GLTexture(this, 400, 400, floatTexParams);
GLTexture fpTex2 = new GLTexture(this, 400, 400, floatTexParams);

pingpongTex = new GLTexturePingPong(fpTex1, fpTex2);
// At this point, fpTex1 is considered read (input) texture,
// and fpTex2 is write (output).

...

// Swapping textures: now fpTex1 is write and fpTex2 is read.
pingpongTex.swap();
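
To make the read/write swap concrete without a GPU, here is a minimal CPU-only sketch that uses two float arrays in place of textures. PingPongSketch is a hypothetical class written for this page, not the GLTexturePingPong implementation. The update averages each element with its neighbour, so it needs the untouched previous values, which is exactly the case where a single in-place buffer would give wrong results.

```java
import java.util.Arrays;

// CPU-only illustration of the ping-pong idea; PingPongSketch is a
// hypothetical class, not GLGraphics' GLTexturePingPong.
class PingPongSketch {
  private float[] read;   // holds the latest results (input of the next pass)
  private float[] write;  // receives the results of the next pass (output)

  PingPongSketch(float[] first, float[] second) {
    if (first == second) {
      throw new IllegalArgumentException("Two distinct buffers are required");
    }
    if (first.length != second.length) {
      throw new IllegalArgumentException("Buffers must have the same length");
    }
    read = first;
    write = second;
  }

  float[] getRead() { return read; }
  float[] getWrite() { return write; }

  // Exchanges the roles of the two buffers.
  void swap() {
    float[] tmp = read;
    read = write;
    write = tmp;
  }

  // One pass: each output is the average of an element and its right-hand
  // neighbour (wrapping around), read only from the previous values.
  void step() {
    int n = read.length;
    for (int i = 0; i < n; i++) {
      write[i] = 0.5f * (read[i] + read[(i + 1) % n]);
    }
    swap();
  }

  public static void main(String[] args) {
    PingPongSketch pp = new PingPongSketch(new float[] { 0, 4, 8, 12 }, new float[4]);
    pp.step();
    System.out.println(Arrays.toString(pp.getRead())); // prints [2.0, 6.0, 10.0, 6.0]
  }
}
```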

Particle system example

Simulations of large particle systems are well suited to GPGPU acceleration. The latest version of the library comes with a few particle systems as examples. The "SimpleGPUParticleSystem" example implements a system where the velocities of the particles are controlled by the position and motion of the mouse. This example contains two filters used as computational/rendering kernels: one to simulate the motion of the particles, and another to draw the particles on the screen.


SimpleGPUParticleSystem when running

The positions and velocities of the particles are stored in four floating-point textures, which are grouped inside two GLTexturePingPong objects (one for the two position textures and the other for the two velocity textures). The system has N = W * H particles, where W and H are the width and height of these textures. This means that texel (i, j) contains the position (or velocity) of particle number i * W + j:
GLTextureParameters floatTexParams = new GLTextureParameters();
floatTexParams.format = GLTexture.FLOAT4;
partPosTex = new GLTexturePingPong(new GLTexture(this, SYSTEM_SIZE, floatTexParams),
                                   new GLTexture(this, SYSTEM_SIZE, floatTexParams));
partVelTex = new GLTexturePingPong(new GLTexture(this, SYSTEM_SIZE, floatTexParams),
                                   new GLTexture(this, SYSTEM_SIZE, floatTexParams));
The variable SYSTEM_SIZE is the number of particles requested by the user. However, the final number of particles N is the power-of-two integer closest to SYSTEM_SIZE.
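
The arithmetic behind this layout can be sketched in plain Java. The code below is only an illustration under stated assumptions: ties between two powers of two are resolved upwards, N is split into W and H that are themselves powers of two and as close to square as possible, and in texel (i, j) the first index i is taken to be the row. The library may choose differently on all three points, and the names ParticleLayoutSketch, nearestPowerOfTwo, textureWidth, textureHeight, particleIndex and texelOf are hypothetical.

```java
// CPU-only illustration of the particle/texel bookkeeping.
// All names and the W x H split are assumptions, not GLGraphics code.
class ParticleLayoutSketch {

  // Power of two closest to the requested count (ties go to the larger one).
  static int nearestPowerOfTwo(int requested) {
    if (requested < 1) {
      throw new IllegalArgumentException("requested must be positive");
    }
    int lower = Integer.highestOneBit(requested);
    if (lower == requested || lower == (1 << 30)) {
      return lower; // already a power of two, or no larger int power exists
    }
    int upper = lower << 1;
    return (requested - lower < upper - requested) ? lower : upper;
  }

  // Assumed split of N = 2^k into W = 2^ceil(k/2) and H = 2^floor(k/2).
  static int textureWidth(int n) {
    int k = Integer.numberOfTrailingZeros(n);
    return 1 << ((k + 1) / 2);
  }

  static int textureHeight(int n) {
    int k = Integer.numberOfTrailingZeros(n);
    return 1 << (k / 2);
  }

  // Texel (i, j) -> particle index i * W + j, with i assumed to be the row.
  static int particleIndex(int i, int j, int w) {
    return i * w + j;
  }

  // Inverse mapping: particle index -> { i, j }.
  static int[] texelOf(int index, int w) {
    return new int[] { index / w, index % w };
  }

  public static void main(String[] args) {
    int n = nearestPowerOfTwo(1000);
    System.out.println(n); // prints 1024
    System.out.println(textureWidth(n) + " x " + textureHeight(n)); // prints 32 x 32
  }
}
```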

The initial position of the particles is set to random values inside the (0, width)x(0, height) box, while the velocities are set to zero:
partPosTex.getReadTex().setRandom(0, width, 0, height, 0, 0, 0, 0);
partPosTex.getWriteTex().setRandom(0, width, 0, height, 0, 0, 0, 0);

partVelTex.getReadTex().setZero();
partVelTex.getWriteTex().setZero();
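
For readers who want to see what this initialization amounts to, the following CPU-only sketch fills a plain float array laid out like a FLOAT4 texture (four components per texel). It assumes that the eight arguments of setRandom are minimum/maximum pairs for the four components, which is inferred from the description above rather than taken from the library documentation. ParticleInitSketch and its methods are hypothetical names.

```java
import java.util.Random;

// CPU-only illustration; ParticleInitSketch is not part of GLGraphics.
class ParticleInitSketch {
  static final int COMPONENTS = 4; // x, y, z, w per texel, as in a FLOAT4 texture

  // x is drawn between 0 and width, y between 0 and height;
  // z and w keep the default value 0.
  static float[] randomPositions(int count, float width, float height, Random rng) {
    float[] texels = new float[count * COMPONENTS];
    for (int p = 0; p < count; p++) {
      texels[p * COMPONENTS] = rng.nextFloat() * width;
      texels[p * COMPONENTS + 1] = rng.nextFloat() * height;
    }
    return texels;
  }

  // Java arrays start out zero-filled, so this already means "all velocities zero".
  static float[] zeroVelocities(int count) {
    return new float[count * COMPONENTS];
  }

  public static void main(String[] args) {
    float[] pos = randomPositions(4, 400, 400, new Random(1));
    float[] vel = zeroVelocities(4);
    System.out.println(pos.length + " position values, " + vel.length + " velocity values");
  }
}
```
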
The movement of the particles is calculated by the movePartFilter filter, which takes as parameters the width and height of the screen, the current mouse position, and the displacement vector of the mouse with respect to its position in the previous frame:
movePartFilterParams.setVec21(width, height);
movePartFilterParams.setVec22(mouseX, mouseY);
movePartFilterParams.setVec23(mouseX - pmouseX, mouseY - pmouseY);

GLTexture[] inputTex = { partPosTex.getReadTex(), partVelTex.getReadTex() };
GLTexture[] outputTex = { partPosTex.getWriteTex(), partVelTex.getWriteTex() };
movePartFilter.apply(inputTex, outputTex, movePartFilterParams);
partPosTex.swap();
partVelTex.swap();
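
The structure of this step (two inputs read, two outputs written, then both pairs swapped) can be mimicked on the CPU as in the sketch below. The motion rule is invented for illustration, since the actual shader code is not shown here: particles within an assumed radius of the mouse pick up the mouse displacement, velocities are damped by an assumed factor, and positions are clamped to the screen. MoveStepSketch and all of its members are hypothetical.

```java
// CPU-only illustration of a two-input, two-output update followed by a swap.
// The motion rule, radius and damping are assumptions, not the real shader.
class MoveStepSketch {
  static final float RADIUS = 50.0f;  // assumed mouse influence radius
  static final float DAMPING = 0.9f;  // assumed velocity damping per frame

  private float[] posRead, posWrite;  // interleaved x, y per particle
  private float[] velRead, velWrite;
  private final float width, height;

  MoveStepSketch(float[] positions, float width, float height) {
    posRead = positions.clone();
    posWrite = new float[positions.length];
    velRead = new float[positions.length];
    velWrite = new float[positions.length];
    this.width = width;
    this.height = height;
  }

  float[] positions() { return posRead; }
  float[] velocities() { return velRead; }

  static float clamp(float v, float max) {
    return Math.max(0.0f, Math.min(max, v));
  }

  // dx, dy play the role of (mouseX - pmouseX, mouseY - pmouseY).
  void step(float mouseX, float mouseY, float dx, float dy) {
    int n = posRead.length / 2;
    for (int p = 0; p < n; p++) {
      float x = posRead[2 * p], y = posRead[2 * p + 1];
      float vx = velRead[2 * p] * DAMPING, vy = velRead[2 * p + 1] * DAMPING;
      float distX = x - mouseX, distY = y - mouseY;
      if (distX * distX + distY * distY < RADIUS * RADIUS) {
        vx += dx;
        vy += dy;
      }
      // Both outputs are written from the previous (read) values only.
      velWrite[2 * p] = vx;
      velWrite[2 * p + 1] = vy;
      posWrite[2 * p] = clamp(x + vx, width);
      posWrite[2 * p + 1] = clamp(y + vy, height);
    }
    // Equivalent of partPosTex.swap() and partVelTex.swap().
    float[] tmp = posRead; posRead = posWrite; posWrite = tmp;
    tmp = velRead; velRead = velWrite; velWrite = tmp;
  }

  public static void main(String[] args) {
    MoveStepSketch sim = new MoveStepSketch(new float[] { 10, 10, 300, 300 }, 400, 400);
    sim.step(12, 12, 3, -2);
    System.out.println(java.util.Arrays.toString(sim.positions()));
  }
}
```
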
The values stored in the position texture are then used to render the particles at their correct positions with the renderPartFilter. This filter takes two textures as input: the position texture, and a texture to paint the particles:
inputTex[0] = partPosTex.getReadTex();
inputTex[1] = bubbleTex;
renderPartFilter.apply(inputTex, canvasTex);
The method used to draw each particle at the correct position involves a technique called displacement mapping, which consists of altering the vertex coordinates in the vertex shader using values read from a texture. The filter pushes 4 * N vertices (a quad for each particle) through the GPU pipeline, each quad being centered at (0, 0). This is configured in the XML of the render filter by specifying a texture grid as follows:
<grid mode="compiled">
<resolution nx="w0" ny="h0" mode="quads"></resolution>
<point>
<coord x="-0.5" y="+0.5"></coord>
<texcoord s="s" t="t"></texcoord>
<texcoord s="0.0" t="1.0"></texcoord>
</point>
<point>
<coord x="+0.5" y="+0.5"></coord>
<texcoord s="s" t="t"></texcoord>
<texcoord s="1.0" t="1.0"></texcoord>
</point>
<point>
<coord x="+0.5" y="-0.5"></coord>
<texcoord s="s" t="t"></texcoord>
<texcoord s="1.0" t="0.0"></texcoord>
</point>
<point>
<coord x="-0.5" y="-0.5"></coord>
<texcoord s="s" t="t"></texcoord>
<texcoord s="0.0" t="0.0"></texcoord>
</point>
</grid>
The resolution of the grid is set to (w0, h0), which represents the resolution of the first input texture, in this case the position texture. Each vertex in this grid has texture coordinates (s, t) in the first texture unit, and these coordinates are used to read the position texture in the vertex stage of the shader and displace the vertex positions to the correct particle location. The second texture coordinate is used to draw the particle texture.
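
The displacement step can be pictured with a small CPU-only sketch: every particle contributes one quad whose four corners start out centered at (0, 0), using the same corner offsets as the grid definition above, and each corner is then moved by the position stored for that particle. The size parameter that scales the quad is an assumption added for illustration, and DisplacementSketch and displaceQuads are hypothetical names. In the actual filter this work happens in the vertex shader.

```java
// CPU-only illustration of displacing per-particle quads by stored positions.
// DisplacementSketch and the "size" parameter are assumptions for this sketch.
class DisplacementSketch {
  // Corner offsets of a quad centered at (0, 0), in the order used by the grid.
  static final float[][] CORNERS = {
    { -0.5f, +0.5f }, { +0.5f, +0.5f }, { +0.5f, -0.5f }, { -0.5f, -0.5f }
  };

  // positions: interleaved x, y per particle (the "position texture").
  // Returns interleaved x, y for 4 vertices per particle, i.e. 4 * N vertices.
  static float[] displaceQuads(float[] positions, float size) {
    int n = positions.length / 2;
    float[] vertices = new float[n * CORNERS.length * 2];
    int out = 0;
    for (int p = 0; p < n; p++) {
      float px = positions[2 * p];      // value "read from the texture"
      float py = positions[2 * p + 1];
      for (float[] corner : CORNERS) {
        vertices[out++] = px + corner[0] * size;
        vertices[out++] = py + corner[1] * size;
      }
    }
    return vertices;
  }

  public static void main(String[] args) {
    float[] quad = displaceQuads(new float[] { 100, 50 }, 8);
    System.out.println(java.util.Arrays.toString(quad));
    // prints [96.0, 54.0, 104.0, 54.0, 104.0, 46.0, 96.0, 46.0]
  }
}
```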