What is Stream Computing?


In order to understand the advantages of using modern GPUs to perform general computing tasks, it is necessary to describe both the traditional sequential model of computing and the stream model, as well as some differences and implications of using each of them.

 

Efficient Computation Considerations

As processor capability increases, so does the complexity of effectively utilising the available resources. According to [Owe05], ``The keys to making the best use of transistors for computation are to maximise the hardware devoted to computation, to allow multiple computation units to operate at the same time through parallelism, and to ensure that each computation unit operates at maximum efficiency''. There are several ways to achieve these goals, among them:

Task Parallelism:
refers to the ability to perform different complex tasks (such as memory fetches, triangle rasterisation or vector multiplication) on different data at the same time.
Data Parallelism:
achieved by applying the same task to several sets of data at the same time.
Instruction Parallelism:
achieved by performing several simple operations on the same data element simultaneously.
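As an illustrative sketch (the task and data sets below are hypothetical, not from the text), data parallelism amounts to running the same task over several independent data sets at once; a pool of workers can stand in for the parallel computation units:

```python
from concurrent.futures import ThreadPoolExecutor

def task(chunk):
    # the same task (here, a sum of squares) applied to one data set
    return sum(x * x for x in chunk)

# several independent sets of data
chunks = [[1, 2], [3, 4], [5, 6]]

# data parallelism: the same task is applied to all data sets concurrently
# (whether the work truly runs in parallel depends on the runtime and platform)
with ThreadPoolExecutor() as pool:
    results = list(pool.map(task, chunks))
# results == [5, 25, 61]
```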

Stream Computing Model

The Stream Computing Model allows high efficiency in computation by treating all data as streams and exploiting all three types of parallelism, at the cost of some restrictions. The following are the key concepts behind stream computing:
Streams:
are ordered sets of data elements of the same type. Elements can be anything from basic numbers to more complex types such as triangles, vectors, colours or even objects. Streams can be of any length and allow certain operations such as copying, concatenation, indexing and processing through the use of kernels.
Kernels:
operate on entire streams. They take one or more input streams and produce one or more output streams. Kernels are used to perform different kinds of operations on streams, such as mappings (evaluating a function on each element of the stream), expansions (producing more than one output element from an input one), reductions (combining more than one input element into a single output one) and filters (selecting and outputting a subset of the input elements).
One of the main characteristics of this model is that kernel operations on one element of a stream are independent of any other element. This allows us to execute the same kernel over different elements of a stream at the same time, thus resulting in almost ``free'' data parallelism.
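As a minimal sketch (element values chosen purely for illustration), the four kinds of kernel operations can be expressed over a Python list standing in for a stream; note that each mapping step touches one element independently of all the others:

```python
from functools import reduce

stream = [1.0, 2.0, 3.0, 4.0]

# mapping: evaluate a function on each element independently
mapped = [x * x for x in stream]                 # [1.0, 4.0, 9.0, 16.0]

# expansion: each input element yields more than one output element
expanded = [y for x in stream for y in (x, -x)]

# reduction: combine several input elements into a single output
total = reduce(lambda a, b: a + b, stream)       # 10.0

# filter: select and output a subset of the input elements
filtered = [x for x in stream if x > 2.0]        # [3.0, 4.0]
```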

Complex operations or entire applications can be modelled by chaining several kernels together and, provided that we have hardware support, thereby gaining task, data and instruction parallelism for maximum computing efficiency.
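A chain of two hypothetical kernels might be sketched as follows; the output stream of the first kernel feeds the second, and with hardware support each stage could process different elements in parallel:

```python
def scale_kernel(stream, factor):
    # each element is processed independently of the others -> data parallel
    return [x * factor for x in stream]

def offset_kernel(stream, offset):
    return [x + offset for x in stream]

# chaining kernels: the output of one kernel becomes the input of the next
out = offset_kernel(scale_kernel([1.0, 2.0, 3.0], 2.0), 0.5)
# out == [2.5, 4.5, 6.5]
```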

Of course, not every algorithm can be mapped onto the stream computing model, and even the most trivial ones may require additional operations. One of the most useful extensions to the ``pure'' stream programming model is the inclusion of two types of inter-element communication: gather and scatter.

Gather:
is performed when a kernel needs information on stream elements other than the one it is processing at a given time. Common tasks like collision detection, anti-aliasing and the determination of surface properties require information on neighbouring elements of a stream. In terms of memory, this operation requires a random-access reading mechanism.
Scatter:
occurs when a kernel needs to distribute information amongst other stream elements. This operation requires a random-access writing mechanism.
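As a sketch of the two operations (the index and data values below are made up for illustration), gather reads from arbitrary stream positions while scatter writes to them:

```python
data  = [10, 20, 30, 40]
index = [3, 0, 0, 2]

# gather: random-access reads -- gathered[i] = data[index[i]]
gathered = [data[i] for i in index]   # [40, 10, 10, 30]

# scatter: random-access writes -- scattered[index[i]] = data[i]
scattered = [0] * len(data)
for i, value in zip(index, data):
    scattered[i] = value              # later writes win on index collisions
# scattered == [30, 0, 40, 10]
```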

Although the graphics-optimised architecture of modern GPUs does not permit communication between stream elements per se, there are ways to overcome this limitation by using other hardware resources. In the next sections, we will explore the general GPU architecture and explain how it can be mapped to a stream computing model.