Using the DMAC in Games Programming
This article will describe how the various Direct Memory Access Control Tags (DMATags) can be used to help manage the transfer of model and texture data through the graphics pipeline of the PlayStation2 in a typical Computer Game application.
The internal structure and main data paths within the PS2 are shown in figure 1. The DMAC is responsible for transferring data between main memory and each of the independent processors and between main memory and scratchpad RAM.
During the execution of typical game code, the DMAC is responsible for transferring vertex data and transformation/lighting matrices to Vector Unit 1 (VU1), and image data for primitive texturing to the Graphics Synthesiser (GS). In order to maintain an effective frame rate it is important that as much of this data as possible is pre-compiled and efficiently organised prior to run time. Such organisation frees up the main processor from this mundane task and allows it to perform other important game related functions such as AI and game logic during game execution.
Image data used for texturing is normally sent to the GS via Path 2 or Path 3. Path 3 is a direct path to the GIF whilst Path 2 is through VIF1 to the GIF. There are a few additional overheads associated with sending data via Path 2, but Path 2 has the advantage of providing inherent synchronisation between texture and vertex data.
Typical image data may be many kilobytes in size, and is generally larger than the 4 kByte memory block allocation size provided under SPS2. It is therefore necessary to split the image data into 4 kByte blocks and stitch these blocks together with appropriate DMATags. As discussed above, such organisation of the texture data should be undertaken prior to run time. Achieving this with memory stitching is outlined below.
The process of pre-compiling image data will be demonstrated using two different methods of memory stitching. The first method uses cnt and next tags and the second uses ref tags.
Organising data with cnt and next tags is illustrated in Figure 2. A cnt tag with its qword count field (QWC) set to 254 is inserted at the start of each full 4k block (a 4 kByte block holds 256 qwords: the cnt tag, 254 qwords of data and the tag that follows the data). The value in the address field (ADDR) is not used with cnt tags and can be cleared to zero. The cnt tag instructs the DMAC to transfer QWC qwords of data following the tag, and to read the quad word after that data as the next DMATag, which in this case is a next tag. The purpose of the next tag is to direct the DMAC to the start of the next 4k block to be transferred. This is achieved by setting the ADDR field of the next tag to point to address A1 (which is the start of the next 4k block) and the QWC field of the tag to zero to indicate that no data is to be transferred with this tag. The DMAC therefore reads the cnt tag at address A1 as the next instruction and this process repeats until the last block is reached. The QWC of the cnt tag in the last block is set to the amount of data remaining to be transferred, and the transfer process is ended by inserting an appropriately configured end tag after the final data section.
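The cnt and next layout described above can be sketched in C as follows. This is only an illustration under stated assumptions, not SPS2 code: the names qword_t, make_tag and stitch_cnt_next are hypothetical, the block addresses are assumed to be supplied by whatever allocator is in use, and the upper 64 bits of each tag are simply left at zero. The bit positions used for QWC, ID and ADDR and the tag ID values follow the usual source chain tag layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { QW_PER_BLOCK = 256, DATA_QW_PER_BLOCK = 254 };  /* 4096 bytes / 16 */
enum { ID_CNT = 1, ID_NEXT = 2, ID_END = 7 };          /* source chain tag IDs */

typedef struct { uint64_t lo, hi; } qword_t;           /* one 128-bit quad word */

/* Pack a DMATag: QWC in bits 0-15, ID in bits 28-30, ADDR in bits 32-62. */
static qword_t make_tag(unsigned id, unsigned qwc, uint32_t addr)
{
    qword_t t;
    t.lo = (uint64_t)(qwc & 0xFFFFu)
         | ((uint64_t)(id & 7u) << 28)
         | ((uint64_t)(addr & 0x7FFFFFFFu) << 32);
    t.hi = 0;                                          /* unused in this sketch */
    return t;
}

/*
 * Stitch n_qw qwords from src into 4 kByte blocks.
 * blocks[i] is the i-th block and block_addr[i] is the address the DMAC
 * would use to reach it (hypothetical, supplied by the caller).
 * Returns the number of blocks written.
 */
static size_t stitch_cnt_next(qword_t (*blocks)[QW_PER_BLOCK],
                              const uint32_t *block_addr, size_t n_blocks,
                              const qword_t *src, size_t n_qw)
{
    size_t needed = (n_qw + DATA_QW_PER_BLOCK - 1) / DATA_QW_PER_BLOCK;
    assert(n_qw > 0 && needed <= n_blocks);

    for (size_t i = 0; i < needed; i++) {
        size_t left = n_qw - i * DATA_QW_PER_BLOCK;
        size_t qwc  = left < DATA_QW_PER_BLOCK ? left : DATA_QW_PER_BLOCK;
        qword_t *b  = blocks[i];

        b[0] = make_tag(ID_CNT, (unsigned)qwc, 0);     /* ADDR unused by cnt */
        memcpy(&b[1], &src[i * DATA_QW_PER_BLOCK], qwc * sizeof(qword_t));

        /* The qword after the data is a next tag, or end in the last block. */
        b[1 + qwc] = (i + 1 < needed)
                   ? make_tag(ID_NEXT, 0, block_addr[i + 1])
                   : make_tag(ID_END, 0, 0);
    }
    return needed;
}
```

Stitching 300 qwords this way would fill one block completely (cnt with QWC 254, data, next) and place a cnt with QWC 46, the remaining data and the end tag in the second block.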
It is interesting to note that the final end tag could be replaced with a ret tag if the data packet is part of a call chain, but this will be described in more detail later in this article.
Organisation of data with ref tags is illustrated in figure 3. In this case, the 4k blocks contain only the data to be transferred and there are no embedded DMATags within the data. A separate area of memory is required to build the DMAC command chain, which is constructed using ref tags and ended with a refe tag. The tag at address A3 is the first to be read and this instructs the DMAC to transfer the 4k block starting at address A0 then read the tag after the one at A3 as the next tag. This process continues until the final refe tag is reached, which transfers the final section of data and then ends the transfer. In this case, if the DMA chain is part of a call chain the final refe tag can be replaced by an appropriately configured ret tag.
There are relative advantages to both of these methods of memory stitching. The use of cnt and next tags requires only one area of memory to be configured, whilst the use of ref tags requires two areas of memory but only about half the number of tags.
Each of the DMAC channels to VIF0, VIF1 and the GIF contains tag address save registers which can be used to facilitate the creation of data subroutines. Data subroutines are similar to normal program subroutines in that, once called, the subroutine performs its function then returns control to the main line of execution.
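A ref chain of the kind just described might be built along the following lines. Again this is a sketch rather than a definitive implementation: qword_t, make_tag and build_ref_chain are hypothetical names, the data blocks are assumed to be already filled and every block except possibly the last is assumed to be full (256 qwords).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { QW_PER_BLOCK = 256 };                 /* 4096 bytes / 16 */
enum { ID_REFE = 0, ID_REF = 3 };            /* source chain tag IDs */

typedef struct { uint64_t lo, hi; } qword_t; /* one 128-bit quad word */

/* Pack a DMATag: QWC in bits 0-15, ID in bits 28-30, ADDR in bits 32-62. */
static qword_t make_tag(unsigned id, unsigned qwc, uint32_t addr)
{
    qword_t t;
    t.lo = (uint64_t)(qwc & 0xFFFFu)
         | ((uint64_t)(id & 7u) << 28)
         | ((uint64_t)(addr & 0x7FFFFFFFu) << 32);
    t.hi = 0;
    return t;
}

/*
 * Write one tag per data block into the separate chain area.
 * block_addr[i] is the address of data block i (hypothetical, caller supplied)
 * and n_qw is the total amount of data in qwords.
 * Every tag is a ref except the last, which is a refe and ends the transfer.
 * Returns the number of tags written.
 */
static size_t build_ref_chain(qword_t *chain, size_t max_tags,
                              const uint32_t *block_addr, size_t n_qw)
{
    size_t needed = (n_qw + QW_PER_BLOCK - 1) / QW_PER_BLOCK;
    assert(n_qw > 0 && needed <= max_tags);

    for (size_t i = 0; i < needed; i++) {
        size_t left = n_qw - i * QW_PER_BLOCK;
        size_t qwc  = left < QW_PER_BLOCK ? left : QW_PER_BLOCK;
        unsigned id = (i + 1 < needed) ? ID_REF : ID_REFE;
        chain[i] = make_tag(id, (unsigned)qwc, block_addr[i]);
    }
    return needed;
}
```

For 600 qwords of data this produces three tags: two ref tags with QWC 256 and a final refe tag with QWC 88.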
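The trade-off can be checked with a little arithmetic. The helper names below are hypothetical; the counts assume the layouts described above, namely two tags and 254 data qwords per block for cnt and next, and one tag per 256-qword data block for ref.

```c
#include <assert.h>
#include <stddef.h>

/* cnt/next: each 256-qword block carries 254 data qwords plus two tags. */
static size_t cnt_next_blocks(size_t n_qw)
{
    assert(n_qw > 0);
    return (n_qw + 253) / 254;
}

/* One cnt per block, one next per block except the last, and one end. */
static size_t cnt_next_tags(size_t n_qw)
{
    return 2 * cnt_next_blocks(n_qw);
}

/* ref: blocks hold data only, so each one carries a full 256 qwords. */
static size_t ref_blocks(size_t n_qw)
{
    assert(n_qw > 0);
    return (n_qw + 255) / 256;
}

/* One ref per block, with the last one replaced by a refe. */
static size_t ref_tags(size_t n_qw)
{
    return ref_blocks(n_qw);
}
```

Taking a hypothetical 256 x 256 texture at 32 bits per pixel (16384 qwords) as an example, cnt and next tags need 65 blocks and 130 tags, whereas ref tags need 64 data blocks and 64 tags, the latter occupying a further 64 qwords in the separate chain area.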
An example of a call chain is illustrated in figure 4. The data section at the right of the figure is stitched together into as large a packet as required and is ended with a return (ret) tag. The organisation of the data into this format would be undertaken prior to run time. The transfer is initiated when the DMAC reads the first call tag from the start of the call chain shown on the left hand side of figure 4.
On reading the first call tag from the call chain, the DMAC pushes the address of the following qword (which in this case is the next call tag) onto the call stack and reads the qword pointed to by the ADDR field in the call tag as the next tag. Since the qword count (QWC) field of the call tag is set to zero, no data is transferred with the tag and the qword immediately following it is the one whose address is saved. DMAC control then passes to the first cnt tag in the data section, which is the first qword of the stitched data to be transferred. When the DMAC reads the ret tag at the end of the data, it transfers the number of qwords following this tag (which in this case is zero), then pops the saved address from the call stack and reads the qword at that address as the next tag. The next tag will thus be the second call tag in the call chain. This process repeats until the final end tag is reached in the call chain and the transfer is ended.
Now that the process of creating and transferring pre-compiled data chains has been described, the use of such techniques in the writing of games programs will be discussed.
Consider the situation of a game consisting of several animated 3D models which must be sent down the graphics pipeline for rendering. It is advisable in such situations to cull as many objects as possible as early in the pipeline as possible, thus saving valuable processing time. A simple, first approximation method might be to generate bounding spheres around each model and test each sphere against the view frustum. Models inside or partly inside the frustum will require further processing whilst models fully outside the frustum can be culled. Consider therefore the pseudo-code shown in figure 5:
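The order in which the DMAC visits a call chain can be made concrete with a small host-side model of the tag walk. This is purely an illustrative simulation, not hardware or SPS2 code: walk_chain and make_tag are hypothetical names, addresses are simplified to qword indices into an array, the two-entry stack stands in for the tag address save registers, and the refs tag is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ID_REFE = 0, ID_CNT = 1, ID_NEXT = 2, ID_REF = 3,
       ID_CALL = 5, ID_RET = 6, ID_END = 7 };

typedef struct { uint64_t lo, hi; } qword_t;

/* Pack a DMATag: QWC in bits 0-15, ID in bits 28-30, ADDR in bits 32-62. */
static qword_t make_tag(unsigned id, unsigned qwc, uint32_t addr)
{
    qword_t t;
    t.lo = (uint64_t)(qwc & 0xFFFFu)
         | ((uint64_t)(id & 7u) << 28)
         | ((uint64_t)(addr & 0x7FFFFFFFu) << 32);
    t.hi = 0;
    return t;
}

/*
 * Walk a chain starting at mem[start]. Records the index of every data
 * qword that would be transferred into out[] and returns how many there were.
 */
static size_t walk_chain(const qword_t *mem, uint32_t start,
                         uint32_t *out, size_t max_out)
{
    uint32_t stack[2];                       /* saved tag addresses */
    int sp = 0;
    uint32_t tadr = start;
    size_t n = 0;

    for (;;) {
        uint64_t tag  = mem[tadr].lo;
        unsigned qwc  = (unsigned)(tag & 0xFFFFu);
        unsigned id   = (unsigned)((tag >> 28) & 7u);
        uint32_t addr = (uint32_t)(tag >> 32) & 0x7FFFFFFFu;

        /* ref/refe take data from ADDR; the others from just after the tag. */
        uint32_t data = (id == ID_REF || id == ID_REFE) ? addr : tadr + 1;
        assert(n + qwc <= max_out);
        for (unsigned i = 0; i < qwc; i++)
            out[n++] = data + i;

        switch (id) {
        case ID_CNT:  tadr = tadr + 1 + qwc; break;
        case ID_NEXT: tadr = addr;           break;
        case ID_REF:  tadr = tadr + 1;       break;
        case ID_CALL:                        /* save return address, jump */
            assert(sp < 2);
            stack[sp++] = tadr + 1 + qwc;
            tadr = addr;
            break;
        case ID_RET:                         /* return, or stop if nothing saved */
            if (sp == 0) return n;
            tadr = stack[--sp];
            break;
        default:                             /* refe and end finish the walk */
            return n;
        }
    }
}
```

With two call tags followed by an end tag at indices 0 to 2, and a data section at index 10 made up of a cnt tag with QWC 2, two data qwords and a ret tag, the walk transfers qwords 11 and 12 twice, once for each call.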
Main chain:
    Test visibility of model1;
    if visible (CALL pointing to Subchain1);
    Test visibility of model2;
    if visible (CALL pointing to Subchain2);
Subchain1:
    REF pointing to model1 texture;
    REF pointing to model1 matrix data;
    REF pointing to model1 vertex data;
Subchain2:
    REF pointing to model2 texture;
    REF pointing to model2 matrix data;
    REF pointing to model2 vertex data;
In the main chain, the visibility of each model is checked and the appropriate sub chain is called only if the model is visible and thus requires further processing.
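A possible run-time realisation of this idea is sketched below. It is an assumption-laden illustration rather than the author's code: plane_t, model_t, sphere_visible and build_main_chain are hypothetical names, the frustum is assumed to be given as planes with unit-length normals pointing inwards, and each sub chain is assumed to have been pre-compiled and to finish with a ret tag so that control returns to the main chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ID_CALL = 5, ID_END = 7 };

typedef struct { uint64_t lo, hi; } qword_t;
typedef struct { float nx, ny, nz, d; } plane_t;   /* n.p + d >= 0 is inside */
typedef struct {
    float x, y, z, r;                              /* bounding sphere */
    uint32_t subchain_addr;                        /* pre-compiled sub chain */
} model_t;

/* Pack a DMATag: QWC in bits 0-15, ID in bits 28-30, ADDR in bits 32-62. */
static qword_t make_tag(unsigned id, unsigned qwc, uint32_t addr)
{
    qword_t t;
    t.lo = (uint64_t)(qwc & 0xFFFFu)
         | ((uint64_t)(id & 7u) << 28)
         | ((uint64_t)(addr & 0x7FFFFFFFu) << 32);
    t.hi = 0;
    return t;
}

/* A sphere is culled only if it lies wholly behind one of the planes. */
static int sphere_visible(const plane_t *planes, size_t n_planes,
                          const model_t *m)
{
    for (size_t i = 0; i < n_planes; i++) {
        float dist = planes[i].nx * m->x + planes[i].ny * m->y
                   + planes[i].nz * m->z + planes[i].d;
        if (dist < -m->r)
            return 0;
    }
    return 1;
}

/* Emit one call tag per visible model, then an end tag. Returns tag count. */
static size_t build_main_chain(qword_t *chain, size_t max_tags,
                               const plane_t *planes, size_t n_planes,
                               const model_t *models, size_t n_models)
{
    size_t n = 0;
    for (size_t i = 0; i < n_models; i++) {
        if (sphere_visible(planes, n_planes, &models[i])) {
            assert(n + 1 < max_tags);              /* leave room for end */
            chain[n++] = make_tag(ID_CALL, 0, models[i].subchain_addr);
        }
    }
    assert(n < max_tags);
    chain[n++] = make_tag(ID_END, 0, 0);
    return n;
}
```

Only the short main chain is rebuilt each frame; the sub chains themselves remain untouched in memory, which is the point of pre-compiling them.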
Another use of call chains in games programming is in the rendering of animated models in either 2 or 3 dimensions. Consider that the data for an animated model is precompiled and organised in the manner shown in figure 6.
All of the data necessary to render any animation frame for the model is contained within the model data section. Various call chains are configured within the call chain section to call the appropriate model data needed for a specific animation frame. For example, the call chain for frame 0 may call the model data sections 0, 1, 2, 7, 9 and 12; the call chain for frame 6 may call the model data sections 0, 1, 5, 7, 10 and 11. Given that the data is pre-compiled into the correct format, it is thus possible to quickly render a specific animation frame for a model at run time with minimal processing overheads.
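The per-frame call chains could be generated from a simple table of section indices, as in the following sketch. The names build_frame_chain and section_addr are hypothetical, each model data section is assumed to end with a ret tag, and the section addresses are assumed to be known once the model data has been loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ID_CALL = 5, ID_END = 7 };

typedef struct { uint64_t lo, hi; } qword_t;

/* Pack a DMATag: QWC in bits 0-15, ID in bits 28-30, ADDR in bits 32-62. */
static qword_t make_tag(unsigned id, unsigned qwc, uint32_t addr)
{
    qword_t t;
    t.lo = (uint64_t)(qwc & 0xFFFFu)
         | ((uint64_t)(id & 7u) << 28)
         | ((uint64_t)(addr & 0x7FFFFFFFu) << 32);
    t.hi = 0;
    return t;
}

/*
 * Build the call chain for one animation frame.
 * sections[] lists the model data sections the frame needs and
 * section_addr[k] is the address of section k (hypothetical, caller supplied).
 * Returns the number of tags written, including the final end tag.
 */
static size_t build_frame_chain(qword_t *chain, size_t max_tags,
                                const uint32_t *section_addr,
                                const unsigned *sections, size_t n_sections)
{
    assert(n_sections + 1 <= max_tags);
    for (size_t i = 0; i < n_sections; i++)
        chain[i] = make_tag(ID_CALL, 0, section_addr[sections[i]]);
    chain[n_sections] = make_tag(ID_END, 0, 0);
    return n_sections + 1;
}
```

For frame 0 in the example above, the section list would be 0, 1, 2, 7, 9 and 12, giving six call tags followed by an end tag. Since these chains depend only on the section addresses, they too can be built once and reused for every occurrence of the frame.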
This article has illustrated the use of DMATags for the organisation of precompiled data within a computer game application. Pre-compiling and efficiently organising data prior to run time is essential in order to achieve effective application performance.
Much of the information presented here has been gleaned from various posts on the Playstation2-linux.com developer forum. The author is grateful to the many contributors to this forum.
Lecturer in Computer Games Technology
University of Abertay Dundee