Writing a good replacement for LibGS

Second part of the article on how to write fast and reliable code for the PlayStation. More after the jump.

Let’s start by setting up the general environment used to initialize the console; a structure will be useful for this purpose:

[code language=”CPP”]typedef struct tagGsEnv
{
    // rendering related structures
    DRAWENV Draw_env[2];
    DISPENV Disp_env[2];
    DRAWENV *pDraw;         // current environments, set by ResetGsEnv()
    DISPENV *pDisp;
    u32 OTag[2][OT_SIZE];   // sort tables
    u32 *pOt;               // current OTag pointer
    u16 OTag_id;            // current OTag index, flip every frame
    u8 VSync_rate;          // 0 = 60 fps, 2 = 30 fps
    u8 Clear_mode;          // 0 = clear with rgb, 1 = no clear
    s16 Screen_x, Screen_y; // option menu adjustments
    u16 Screen_w, Screen_h; // screen size, internal usage
    u32 *Gfx_alloc[2];      // packet allocators
    u32 *pGfx;              // current packet seek
    CVECTOR Clear;          // clear color
} GS_ENV;

// the actual object, put this somewhere in a .C file and declare it as external in a header
volatile GS_ENV G;[/code]

Now the actual code to populate some of this structure:

[code language=”CPP”]// bits for the mode parameter of SetDisplay()
// NOTE: the original flag names did not survive in this listing, these two are placeholders
#define GSMODE_INTERLACE 0x01
#define GSMODE_SIDEWAYS  0x02

// This function sets up the draw/display environment
// ---------------------------
// Parameters
// x/y: frame buffers starting position in VRAM
// w/h: size of display/draw
// mode: bitflag to determine if we’re using interlacement or sideways frame buffers
void SetDisplay(int x, int y, int w, int h, u32 mode)
{
    int x0, x1, y0, y1;

    // copy resolution for later needs
    G.Screen_w = w;
    G.Screen_h = h;

    // interlaced mode
    if (mode & GSMODE_INTERLACE)
    {
        SetDefDrawEnv(&G.Draw_env[0], x, y, w, h);
        SetDefDispEnv(&G.Disp_env[0], x, y, w, h);
        SetDefDrawEnv(&G.Draw_env[1], x, y, w, h);
        SetDefDispEnv(&G.Disp_env[1], x, y, w, h);
        G.Disp_env[0].isinter = TRUE;
        G.Disp_env[1].isinter = TRUE;
    }
    else
    {
        // frame buffers are stored sideways
        if (mode & GSMODE_SIDEWAYS)
        {
            x0 = x;
            x1 = x + w;
            y0 = y;
            y1 = y;
        }
        // otherwise they are placed vertically
        else
        {
            x0 = x;
            x1 = x;
            y0 = y;
            y1 = y + h;
        }

        // libgpu calls to set up the environment
        // each index draws to one buffer while it displays the other one
        // (the listing paired x0 with y1 for display, which only works in vertical mode)
        SetDefDrawEnv(&G.Draw_env[0], x0, y0, w, h);
        SetDefDispEnv(&G.Disp_env[0], x1, y1, w, h);
        SetDefDrawEnv(&G.Draw_env[1], x1, y1, w, h);
        SetDefDispEnv(&G.Disp_env[1], x0, y0, w, h);
        // disable interlacement, we don’t need it
        G.Disp_env[0].isinter = FALSE;
        G.Disp_env[1].isinter = FALSE;
    }

    // enable draw on display area
    G.Draw_env[0].dfe = G.Draw_env[1].dfe = TRUE;
}[/code]

This function does basically what the GS functions do to initialize the frame buffers, with 2-3 calls merged into just one. Also notice how I’m not using a million parameters to set up the frame buffer mode: all the options are packed into a single bitflag. That is because we don’t wanna use more than 4 parameters most of the time; remember that only the first 4 arguments travel in registers (a0-a3 on MIPS), and older versions of the compiler tend to push anything past parameter 4 onto the stack, which we don’t want since it kills performance and produces messy binaries. Limit yourself as much as possible when you create a function prototype or it’s going to look ugly and perform worse.
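To make the bitflag idea a bit more concrete, here is a minimal sketch in plain C (no Psy-Q headers, so it builds anywhere) of how one `mode` argument can carry several options while the prototype stays at four parameters. Everything in it is made up for illustration: the `MODE_*` names, the `Rect` stand-in for libgpu's `RECT`, and the `GetBufferOrigin` helper are assumptions, not part of LibGPU or of the code above.

```c
#include <assert.h>

/* hypothetical option bits, one bit per option */
#define MODE_INTERLACED 0x01u
#define MODE_SIDEWAYS   0x02u

/* stand-in for libgpu's RECT, same field layout */
typedef struct { short x, y, w, h; } Rect;

/* Work out where frame buffer `index` (0 or 1) sits in VRAM.
   Four parameters only: area, option bits, buffer index, result. */
static void GetBufferOrigin(const Rect *area, unsigned mode, int index, Rect *out)
{
    assert(index == 0 || index == 1);

    *out = *area;                    /* buffer 0 always starts at the base position */

    if (mode & MODE_INTERLACED)      /* interlaced: both buffers share the same area */
        return;

    if (index == 1)
    {
        if (mode & MODE_SIDEWAYS)
            out->x = (short)(area->x + area->w);  /* second buffer to the right */
        else
            out->y = (short)(area->y + area->h);  /* second buffer underneath */
    }
}
```

New options can be added later as extra bits without touching the prototype, which is the whole point of passing a bitflag instead of one parameter per option.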

Let’s move on to packet allocators, which correspond to Gfx_alloc in the big structure above. Size them according to how many primitives you need, but always remember to make them as big as the heaviest task requires (example: sprites for menu interfaces). Go for malloc3 or even a global variable in your program, it doesn’t matter in the end as you probably won’t ever need to resize them at any point of the program’s life. Some example code of how you would populate the rest of the structure:

[code language=”CPP”]// dynamic allocation
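// (the listing only kept the prototypes, the two bodies below are a reconstruction)
void InitGfxAlloc(int size)
{
    // one buffer per frame
    G.Gfx_alloc[0] = (u32*)malloc3(size);
    G.Gfx_alloc[1] = (u32*)malloc3(size);
}

// static allocation (pick this one OR the dynamic version, not both)
#define GFX_ALLOC_SIZE (15*1024) // 15 KB buffer

static u32 _gfxAlloc[2][GFX_ALLOC_SIZE / sizeof(u32)]; // u32 keeps packets word aligned
void InitGfxAlloc(int size)
{
    // size is ignored here, the buffers are fixed
    G.Gfx_alloc[0] = _gfxAlloc[0];
    G.Gfx_alloc[1] = _gfxAlloc[1];
}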
// set handy pointers for frame buffer swap
void ResetGsEnv()
{
    int Id = G.OTag_id;
    // set references for quick access
    G.pGfx = G.Gfx_alloc[Id];   // graphics
    G.pOt = G.OTag[Id];         // otag
    G.pDraw = &G.Draw_env[Id];  // environments
    G.pDisp = &G.Disp_env[Id];
}

// deal with frame buffer swap and clear background if it’s necessary
// put this at the beginning of a screen loop
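void BeginDraw()
{
    DRAWENV *pD;

    // assumption: the listing lost this call, the pointers must be refreshed before use
    ResetGsEnv();

    ClearOTagR(G.pOt, OT_SIZE);

    pD = G.pDraw;
    if (G.Clear_mode == 0)
    {
        pD->r0 = G.Clear.r;
        pD->g0 = G.Clear.g;
        pD->b0 = G.Clear.b;
        pD->isbg = TRUE;
    }
    else pD->isbg = FALSE;
}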

// draw all linked primitives and perform the actual swap for next frame buffer
// put this at the end of a screen loop
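void EndDraw()
{
    // NOTE: the calls under the next two comments were missing from the listing;
    // this is the usual libgpu sequence, treat it as a reconstruction
    // display previous frame
    DrawSync(0);            // wait for the GPU to finish the previous list
    VSync(G.VSync_rate);    // 0 = next vblank, 2 = every other one
    // set current buffer for display
    PutDispEnv(G.pDisp);
    PutDrawEnv(G.pDraw);

    DrawOTag(&G.pOt[OT_SIZE - 1]);

    G.OTag_id ^= 1;
}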

// —————————-
// this code goes into a header
// —————————-

// retrieve current primitive pointer
static __inline void *gfxGetPtr() { return G.pGfx; }

// update packet allocator
static __inline void gfxSetPtr(void* p) { G.pGfx = (u32*)p; }

static __inline u32* GetOTag() { return G.pOt; }
static __inline int GetBufferIndex() { return G.OTag_id; }

static __inline RECT* SysGetTexWin() { return &G.pDraw->tw; }
static __inline u32 SysGetTPage() { return G.pDraw->tpage; }
static __inline RECT* SysGetDisplay() { return &G.pDisp->disp; }
static __inline RECT* SysGetScreen() { return &G.pDisp->screen; }[/code]

If you’re asking why I have two allocators instead of just one, the reason is pretty simple: double buffering. The PlayStation expects you to provide two memory locations to store packet data because the GPU takes a while to get them all on screen. It’s not an operation that takes place immediately, so you need a back buffer to store new primitives while the old ones are still going through the DMA.
So, the first set of functions is what makes packet allocators work and provides an environment for frame buffer swaps, while the second slice is how you would retrieve pointers in order to actually draw and seek forward. Those static inline functions aren’t actual calls but code that gets copied as-is into the caller, adding no overhead from real calls while keeping your code slim.
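The get/seek/flip cycle described above can be sketched in portable C, with no console libraries involved. This is only an illustration of the idea under my own assumptions: `PACKET_WORDS`, `FrameBegin`, `PacketAlloc` and `FrameEnd` are hypothetical names, and the overflow check is an addition that the article's code doesn't have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PACKET_WORDS 256                 /* per-buffer size in 32-bit words, arbitrary */

static uint32_t packet_buf[2][PACKET_WORDS]; /* one buffer being filled, one being drawn */
static uint32_t *packet_ptr;                 /* current seek position */
static int buffer_id;                        /* which buffer we fill this frame */

/* Start of a frame: rewind the seek pointer to the buffer we own. */
static void FrameBegin(void)
{
    packet_ptr = packet_buf[buffer_id];
}

/* Hand out `words` 32-bit words and seek forward; nothing is ever freed. */
static void *PacketAlloc(size_t words)
{
    uint32_t *p = packet_ptr;
    size_t used = (size_t)(p - packet_buf[buffer_id]);

    assert(used + words <= PACKET_WORDS);    /* catch overflow early in debug builds */
    packet_ptr = p + words;
    return p;
}

/* End of a frame: the other buffer becomes ours, this one is left alone
   for as long as the hardware would still be reading it. */
static void FrameEnd(void)
{
    buffer_id ^= 1;
}
```

Allocation is a pointer bump and "freeing" is the rewind at the start of the next frame, so the cost per primitive stays close to zero, which is why a fixed buffer per frame is enough.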

To see the code above used in a real dev case, here’s how it all gets pieced together in another sample:

[code language=”CPP”]#define SCREEN_W 320
#define SCREEN_H 240

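// set frame buffers to be placed sideways
// NOTE: the calls in this sample were missing from the listing and are filled in as a best guess;
// GSMODE_SIDEWAYS is a placeholder name for the "sideways" bit of mode, 0,0 is an example VRAM position
SetDisplay(0, 0, SCREEN_W, SCREEN_H, GSMODE_SIDEWAYS);
G.Clear_mode = 0; // force LibGPU to clear frame buffers at each swap
*(u32*)&G.Clear.r = 0x808000; // set clear color to blueish green
G.VSync_rate = 0; // 60 fps mode
// main loop at the core of the program
while (1)
{
    // set allocators for this frame
    BeginDraw();
    // this is where all your logic goes

    // the actual DMA draw and swap
    EndDraw();
}[/code]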

That’s literally all the code you need to replace all LibGS calls that usually take care of setting the environment.