What is Stream Computing?
To understand the advantages of using modern GPUs for general-purpose
computation, it is necessary to describe the traditional sequential model of
computing and the stream model, as well as some differences and implications
of using each.
As processor capability increases, so does
the complexity of effectively utilising the available resources. According to
[Owe05], ``The keys to making the best use of
transistors for computation are to maximise the hardware devoted to computation,
to allow multiple computation units to operate at the same time through parallelism,
and to ensure that each computation unit operates at maximum efficiency''. There
are several ways to achieve these goals. Some of them are:
- Task Parallelism:
-
the ability to perform complex tasks (such as memory fetches, triangle
rasterisation, vector multiplication, etc.) on different data at the same time.
- Data Parallelism:
-
achieved by applying the same task to several sets of data at the same time.
- Instruction Parallelism:
-
achieved by performing several simple operations on the same data element.
Stream Computing Model
The Stream Computing Model allows high efficiency in computation
by treating all data as a stream and exploiting all three types of parallelism
at the cost of some restrictions. The following, are the key concepts behind
stream computing:
- Streams:
-
are ordered sets of data of the same type. It can be
anything from basic numbers to more complex types such as triangles, vectors,
colours or even objects. They can be of any length and allow certain operations
such as copying, concatenation, indexing and operations through the use
of kernels
- Kernels:
-
operate on entire streams. They can take one or more input
streams and produce one or more output streams. Kernels are used
to perform different kinds of operations on streams, such as mappings
(evaluating a function on each element of the stream), expansions (when
more than one output element is produced from an input one), reductions (when
more than one input element is combined into a single output one) and filters
(selecting and outputting a subset of the input elements).
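The four kinds of kernel operation can be sketched in Python by modelling a stream as a list; the function names here are illustrative only and not part of any stream API:

```python
# Minimal sketches of the four kernel operation kinds on streams,
# modelling a stream as a Python list (illustrative names).

def map_kernel(f, stream):
    # Mapping: evaluate f independently on each element.
    return [f(x) for x in stream]

def expand_kernel(f, stream):
    # Expansion: each input element may produce several output elements.
    return [y for x in stream for y in f(x)]

def reduce_kernel(f, stream, init):
    # Reduction: combine many input elements into a single output.
    acc = init
    for x in stream:
        acc = f(acc, x)
    return acc

def filter_kernel(pred, stream):
    # Filter: select and output a subset of the input elements.
    return [x for x in stream if pred(x)]
```

Note that `map_kernel` and `filter_kernel` touch each element independently, which is exactly the property that makes them trivially parallelisable on stream hardware.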
One of the main characteristics of this model is that kernel
operations on one element of a stream are independent of every other element.
This allows us to execute the same kernel over different elements of a stream
at the same time, thus resulting in almost ``free'' data parallelism.
Complex operations or entire applications can be modelled
by chaining several kernels together and, provided that we have hardware support,
gain task, data and instruction parallelism for maximum computing efficiency.
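Such kernel chaining can be sketched as a sequence of stream transformations; the pipeline below is a hypothetical example, not one drawn from the text:

```python
# A hypothetical kernel chain: filter -> map -> reduce.
# Each stage consumes the previous stage's output stream.
stream = list(range(8))                      # input stream: 0..7

evens = [x for x in stream if x % 2 == 0]    # filter kernel
squares = [x * x for x in evens]             # map kernel
total = sum(squares)                         # reduction kernel

print(total)  # 0 + 4 + 16 + 36 = 56
```

On stream hardware, each stage of such a chain could run as its own kernel, with the intermediate streams flowing between them.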
Of course, not every algorithm can be mapped onto the stream
computing model, and even trivial ones may require additional operations.
One of the most useful extensions to the ``pure'' stream programming model
is the inclusion of two types of inter-element communication: gather and
scatter.
- Gather:
-
is performed when a kernel needs information on stream
elements other than the one it is processing at a given time. Common
tasks such as collision detection, anti-aliasing and the determination of
surface properties require information on neighbouring elements of a stream.
In terms of memory, this operation requires a random-access reading mechanism.
- Scatter:
-
occurs when a kernel needs to distribute information
amongst other stream elements. This operation requires a random-access writing
mechanism.
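Assuming again that streams are modelled as lists and that communication is expressed through index arrays, gather and scatter can be sketched as:

```python
# Sketches of gather and scatter using index arrays (illustrative only).

def gather(src, indices):
    # Gather: out[i] = src[indices[i]] -- a random-access read per element.
    return [src[i] for i in indices]

def scatter(values, indices, out_len, default=0):
    # Scatter: out[indices[i]] = values[i] -- a random-access write per element.
    out = [default] * out_len
    for v, i in zip(values, indices):
        out[i] = v
    return out
```

Gather only reads from arbitrary positions, which is why it maps naturally onto texture fetches on graphics hardware; scatter requires arbitrary writes, which is the harder operation to support.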
Although the graphics-optimised architecture of modern GPUs
does not permit communication between stream elements per se, there are ways
to overcome this limitation by using other hardware resources. In the next sections,
we will explore the general GPU architecture and explain how it can be mapped
to a stream computing model.