Shine
Joined: 03 Dec 2004 Posts: 728 Location: Germany
|
Posted: Thu Dec 14, 2006 11:24 am Post subject: Framebuffer hello world and performance measurement |
|
|
I've written a very unoptimized program that draws a fullscreen background image with a moving bar on top of it. In 720x480 mode (480i, set with "ps3videomode -v 1") the usable area is 648x432, and with a bar height of 20 pixels nearly 60 fps is possible. I think that with the SPEs doing the blitting and more optimized code, good 2D games such as jump-and-run games with multi-layer parallax scrolling should be no problem.
Code: |
// performance test with VSync IRQ, inspired by the VSync example on the cell add-on CD
//
// compile:
// gcc -I /usr/src/linux-2.6.16-cell-r1/include vsync.c -lm -o vsync
//
// tested on Gentoo, installed with this guide: http://wiki.ps2dev.org/ps3:linux:installing_gentoo
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <stdint.h>
#include <math.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kd.h>
#include <sys/time.h>
#include <linux/fb.h>
#include <asm/ps3fb.h>

int width, height, memoryWidth, memoryHeight;
uint32_t* background;

void draw(uint32_t* fb)
{
    int x, y, yp;
    static int t = 0;
    int barHeight = 20;
    float amplitude = ((float) (height - barHeight)) / 2.0;
    float frequency = 40.0;

    // blit background (column by column, deliberately unoptimized)
    for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
            fb[y * memoryWidth + x] = background[y * width + x];
        }
    }

    // draw a bar
    yp = sin(((float)t) / frequency) * amplitude + amplitude;
    for (y = yp; y < yp + barHeight; y++) {
        if (y < height && y >= 0) {
            for (x = 0; x < width; x++) {
                fb[y * memoryWidth + x] = 0xffffff;
            }
        }
    }
    t++;
    if (t == height) t = 0;
}

void enableCursor(int enable)
{
    int fd = open("/dev/console", O_NONBLOCK);
    if (fd >= 0) {
        ioctl(fd, KDSETMODE, enable ? KD_TEXT : KD_GRAPHICS);
        close(fd);
    }
}

int main(int argc, char *argv[])
{
    int fd;
    void *addr;
    int length;
    struct ps3fb_ioctl_res res;
    int x, y;
    uint32_t frame = 0;
    struct timeval tv;
    uint64_t time;
    int count;

    // switch to graphics mode (disable cursor)
    enableCursor(0);

    // access framebuffer
    fd = open("/dev/fb0", O_RDWR);
    if (fd < 0) {
        perror("open /dev/fb0");
        enableCursor(1);
        return 1;
    }
    ioctl(fd, PS3FB_IOCTL_SCREENINFO, (unsigned long)&res);
    printf("xres: %d, yres: %d, xoff: %d, yoff: %d, num_frames: %d\n",
           res.xres, res.yres, res.xoff, res.yoff, res.num_frames);
    length = res.xres * res.yres * 4 * res.num_frames;
    addr = mmap(NULL, length, PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        close(fd);
        enableCursor(1);
        return 1;
    }

    // stop flipping in kernel thread with vsync
    ioctl(fd, PS3FB_IOCTL_ON, 0);

    // create test background image
    memoryWidth = res.xres;
    memoryHeight = res.yres;
    width = res.xres - 2 * res.xoff;
    height = res.yres - 2 * res.yoff;
    background = malloc(width * height * 4);
    for (x = 0; x < width; x++) {
        for (y = 0; y < height; y++) {
            background[y * width + x] = x*y << 3;
        }
    }

    // start timing (64 bit microseconds, so the value cannot wrap)
    gettimeofday(&tv, NULL);
    time = (uint64_t)tv.tv_sec * 1000000 + tv.tv_usec;

    // draw test
    count = 300;
    for (x = 0; x < count; x++) {
        // wait for vsync interrupt
        uint32_t crt = 0;
        ioctl(fd, FBIO_WAITFORVSYNC, (unsigned long)&crt);

        // draw frame (byte offset of the selected frame inside the mapping)
        draw((uint32_t*)((uint8_t*)addr + frame * memoryWidth * 4 * memoryHeight));

        // blit and flip with vsync request
        ioctl(fd, PS3FB_IOCTL_FSEL, (unsigned long)&frame);

        // switch frame
        frame = 1 - frame;
    }

    // end timing
    gettimeofday(&tv, NULL);
    time = (uint64_t)tv.tv_sec * 1000000 + tv.tv_usec - time;
    printf("fps: %llu\n", (unsigned long long)(count * 1000000ULL / time));
    free(background);

    // start flipping in kernel thread with vsync
    ioctl(fd, PS3FB_IOCTL_OFF, 0);

    // unmap the framebuffer (the mapped address, not NULL)
    munmap(addr, length);

    // close device
    close(fd);

    // back to text mode
    enableCursor(1);
    return 0;
}
|
|
|
|
|
Shine
Joined: 03 Dec 2004 Posts: 728 Location: Germany
|
Posted: Sat Dec 16, 2006 8:24 am Post subject: Re: Framebuffer hello world and performance measurement |
|
|
No need to worry: When compiling with -O2, copying the memory with 64 bit access and reordering the access (line-by-line instead of column-by-column) you can do 10 full screen blits with an additional 50 pixel bar overlay in 13 ms :-) |
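The optimized code was not posted in this thread, so the following is only a minimal sketch of what the reordering could look like. It assumes 32-bit pixels and a framebuffer whose rows are `fb_stride` pixels apart; `blit_rows` and its parameters are hypothetical names, and `memcpy` stands in for hand-written 64-bit copies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a width x height image of 32-bit pixels into a framebuffer whose
 * rows are fb_stride pixels apart. Rows are copied one at a time, so both
 * source and destination are walked sequentially in memory. */
static void blit_rows(uint32_t *fb, int fb_stride,
                      const uint32_t *src, int width, int height)
{
    assert(width >= 0 && height >= 0 && fb_stride >= width);
    for (int y = 0; y < height; y++) {
        /* memcpy leaves the choice of wide (64-bit or larger) moves
         * to the C library and compiler. */
        memcpy(fb + (size_t)y * fb_stride,
               src + (size_t)y * width,
               (size_t)width * sizeof(uint32_t));
    }
}
```

In the program from the first post this would replace the nested background loop, called as `blit_rows(fb, memoryWidth, background, width, height)`.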
|
|
|
mtb
Joined: 19 Oct 2006 Posts: 19 Location: UK/Tokyo
|
Posted: Tue Dec 19, 2006 2:52 am Post subject: Re: Framebuffer hello world and performance measurement |
|
|
Shine wrote: | No need to worry: When compiling with -O2, copying the memory with 64 bit access and reordering the access (line-by-line instead of column-by-column) you can do 10 full screen blits with an additional 50 pixel bar overlay in 13 ms :-) |
Shine, do you have that updated code? It would be cool if we could get it tested at a number of resolutions :) |
|
|
|
tweakoz
Joined: 17 Feb 2004 Posts: 21 Location: Santa Cruz, CA
|
Posted: Mon Jan 08, 2007 8:57 pm Post subject: next baby step (spe offloaded computation of screen) |
|
|
got a next baby step working!!!
I have spe offloaded computation of the screen (configurable from 1 to 6 spe's)
i have basic Blit / DMA working,
unfortunately it requires the PPE to initiate the DMA per screen line per tile
( i just get a hang when attempting to have the SPE initiate the DMA)
with the PPE initiating DMA requests in 1080, that would be 1080 * (1920/128) = 16200 individual dma requests, would be nice if possible to get this down to (1080/128) * (1920/128) = ~ 127 requests.
also note i am procedurally generating the screen (sort of like a simple pixel shader)
at any rate with this baby step i hit these framerates
23 FPS in 1920x1080i (vsync off)
52 FPS in 1280x720p (vsync off)
150 FPS in 720x480i (vsync off)
next baby step im gonna get working is spe initiated DMA
then try smaller tiles...
eventually i would like to turn this into a TBDR (Tile Based Deferred Renderer)
code at
http://www.tweakoz.com/portfolio/spurast1.tar
enjoy.....
michael t. mayers |
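The request counts in the post above can be reproduced with a few lines of C. This is only a sketch of the arithmetic: the function names are hypothetical, and the per-tile variant rounds partial tiles up to whole requests, which gives 135 for 1080 lines rather than the post's figure of about 127 (that figure comes from the unrounded quotient 1080/128 ≈ 8.44).

```c
#include <assert.h>

/* Requests needed when each DMA request moves one tile-wide segment of
 * one screen line: every line needs one request per tile column. */
static int dma_requests_per_line(int width, int height, int tile)
{
    assert(tile > 0);
    return height * (width / tile);
}

/* Requests needed if a single request could move a whole tile.
 * Partial tiles at the screen edges count as full requests. */
static int dma_requests_per_tile(int width, int height, int tile)
{
    assert(tile > 0);
    int cols = (width + tile - 1) / tile;   /* round up */
    int rows = (height + tile - 1) / tile;  /* round up */
    return cols * rows;
}
```

For 1920x1080 with 128-pixel tiles, `dma_requests_per_line` returns 16200 and `dma_requests_per_tile` returns 135.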
|
|
|
tweakoz
Joined: 17 Feb 2004 Posts: 21 Location: Santa Cruz, CA
|
Posted: Mon Jan 08, 2007 9:02 pm Post subject: almost forgot |
|
|
those FPS numbers mentioned last post were with 4 SPU's enabled
michael t. mayers |
|
|
|
tweakoz
Joined: 17 Feb 2004 Posts: 21 Location: Santa Cruz, CA
|
Posted: Mon Jan 08, 2007 9:09 pm Post subject: 1 more thing |
|
|
upgrade to the latest libspe2 (i think its 2.01 or 2.02); the code wont work with 2.0
mtm |
|
|
|
tweakoz
Joined: 17 Feb 2004 Posts: 21 Location: Santa Cruz, CA
|
Posted: Tue Jan 09, 2007 6:54 pm Post subject: spu initiated DMA working |
|
|
problem was 64bit addresses vs 32bit addresses .
i was calling a 64bit address mfc function,
im now calling the 32 bit version.
peak blit transfer rate reached so far is > 800MB /sec
at tilesize=128 and nspus = 6 (~ same result for 4 spus)
max fps of just blitting so far at 720x480i is > 1000fps
im pretty sure im ppe limited now,
adding SPU's doesnt help unless i add more work to each spu
i am still single buffering the spe - calc / dma cycle.
next babystep is double buffering....
updated code posted at the same url
mtm. |
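The post does not name the functions involved, so the following is only a sketch of the underlying address issue, not of the actual fix. It shows how a 64-bit effective address splits into two 32-bit halves and when the 32-bit form loses nothing. All helper names are hypothetical; none of them is a libspe2 or MFC call.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of a 64-bit effective address. */
static uint32_t ea_high(uint64_t ea) { return (uint32_t)(ea >> 32); }

/* Lower 32 bits of a 64-bit effective address. */
static uint32_t ea_low(uint64_t ea) { return (uint32_t)(ea & 0xffffffffu); }

/* True when the address fits in 32 bits, as it does for pointers
 * handed over by a 32-bit process. */
static int ea_fits_32(uint64_t ea) { return ea_high(ea) == 0; }

/* Narrow an address for a 32-bit interface; refuses addresses that
 * would be truncated. */
static uint32_t ea_to_32(uint64_t ea)
{
    assert(ea_fits_32(ea));
    return ea_low(ea);
}
```

Passing a value built for one form to an interface expecting the other would put the halves in the wrong place, which is one plausible way to end up with the kind of hang described earlier.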
|
|
|
tweakoz
Joined: 17 Feb 2004 Posts: 21 Location: Santa Cruz, CA
|
Posted: Tue Jan 09, 2007 7:01 pm Post subject: sorry, 1 error in numbers |
|
|
sorry - need to correct a misquote:
im not actually blitting 720x480, im blitting 512x384 (integer multiples of the tile size (128))
dma blit transfer rate still > 800MB/sec
mtm |
|
|
|
Shine
Joined: 03 Dec 2004 Posts: 728 Location: Germany
|
Posted: Tue Jan 09, 2007 7:07 pm Post subject: Re: spu initiated DMA working |
|
|
tweakoz wrote: | peak blit transfer rate reached so far is > 800MB /sec
at tilesize=128 and nspus = 6 (~ same result for 4 spus)
max fps of just blitting so far at 720x480i is > 1000fps
|
This sounds better. The 150 fps was really too slow, because that is already possible with pure C loops on the main CPU, without DMA.
A library which provides fast blitting with alpha blending would be nice. An idea: the PPE sends jobs to the SPUs, which transfer the background to local memory (in stripes, because of the limited memory), then the image to blit; then the SPU performs the alpha blending and finally transfers the result back to the framebuffer or another memory region (for double buffering or for blitting to images). Alpha blending should be really fast with SIMD operations. |
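The stripe idea can be sketched in plain C. This is a scalar version that runs on any CPU, not SPU SIMD code: the comments mark where an SPE would transfer a stripe in and out of local store. It assumes ARGB pixels with the alpha value in the top byte of the source, which the thread does not specify, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Blend one 8-bit channel: (s*a + d*(255-a)) / 255, rounded. */
static uint8_t blend_channel(uint8_t s, uint8_t d, uint8_t a)
{
    return (uint8_t)((s * a + d * (255 - a) + 127) / 255);
}

/* Blend one source pixel over one destination pixel. The source alpha
 * is taken from the top byte; the result's top byte is left at zero. */
static uint32_t blend_pixel(uint32_t src, uint32_t dst)
{
    uint8_t a = (uint8_t)(src >> 24);
    uint32_t out = 0;
    for (int shift = 0; shift < 24; shift += 8) {
        uint8_t s = (uint8_t)(src >> shift);
        uint8_t d = (uint8_t)(dst >> shift);
        out |= (uint32_t)blend_channel(s, d, a) << shift;
    }
    return out;
}

/* Blend src over dst, processing stripe_rows rows at a time. Both
 * images are width x height pixels with no row padding. */
static void blend_in_stripes(uint32_t *dst, const uint32_t *src,
                             int width, int height, int stripe_rows)
{
    assert(width >= 0 && height >= 0 && stripe_rows > 0);
    for (int y0 = 0; y0 < height; y0 += stripe_rows) {
        int y1 = (y0 + stripe_rows < height) ? y0 + stripe_rows : height;
        /* On an SPE, the two stripes would be fetched into local store here. */
        for (int i = y0 * width; i < y1 * width; i++)
            dst[i] = blend_pixel(src[i], dst[i]);
        /* ...and the blended stripe would be written back here. */
    }
}
```

The stripe height would be chosen so that a background stripe and an image stripe both fit into local store at the same time.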
|
|
|
tweakoz
Joined: 17 Feb 2004 Posts: 21 Location: Santa Cruz, CA
|
|
|
|
tweakoz
Joined: 17 Feb 2004 Posts: 21 Location: Santa Cruz, CA
|
Posted: Wed Jan 10, 2007 10:35 pm Post subject: new spu blittest tar file up |
|
|
now im getting the following performance:
1080i : >700fps (4.5GB/sec)
720p: >1350 fps (3.7GB/sec)
480i: > 2700 fps (2.3 GB/sec)
this is with 4 spu's and pure blitting (spu localmem -> framebuffer)
(no per pixel computation)
again the tar is at
http://www.tweakoz.com/portfolio/spurast1.tar |
|
|
|
tweakoz
Joined: 17 Feb 2004 Posts: 21 Location: Santa Cruz, CA
|
Posted: Wed Jan 10, 2007 10:42 pm Post subject: |
|
|
i noticed it goes up to these numbers when i use -f on ps3videomode
(fullscreen)
1080i: fps: 801 [6336 MB/Sec]
720p: fps: 1577 [5544 MB/Sec]
480i: fps: 2759 [3637 MB/Sec]
not only is the bandwidth going up, but the fps is going up too...
the non-fullscreen mode is reducing performance for some reason...
mtm |
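The bracketed bandwidth figures above are consistent with fps times frame size, assuming full 1920x1080, 1280x720 and 720x480 frames of 32-bit pixels and "MB" meaning 2^20 bytes. That reading is an inference from the numbers, not something stated in the post. A small sketch of the arithmetic, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes written per second when blitting fps full frames of 32-bit
 * pixels, returned in units of 2^20 bytes and truncated. */
static uint64_t blit_mib_per_sec(int width, int height, int fps)
{
    assert(width >= 0 && height >= 0 && fps >= 0);
    uint64_t bytes_per_frame = (uint64_t)width * height * 4;
    return bytes_per_frame * fps / (1024 * 1024);
}
```

With the fullscreen frame rates from the post, this gives 6336 for 1920x1080 at 801 fps, 5544 for 1280x720 at 1577 fps and 3637 for 720x480 at 2759 fps.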
|
|
|
Arwin
Joined: 12 Jul 2005 Posts: 426
|
Posted: Fri Jan 12, 2007 1:43 am Post subject: |
|
|
I wonder how this works. Is the framebuffer in RSX (GDDR) memory and then directly manipulated by Cell (which in the schematics we've seen was rated at about 4GB/s, wasn't it?)? If so, then you already seem to have gotten better performance than in the original specs.
Or is the RSX using a framebuffer in XDR memory? |
|
|
|
J.F.
Joined: 22 Feb 2004 Posts: 2906
|
Posted: Fri Jan 12, 2007 5:35 am Post subject: |
|
|
tweakoz wrote: | the non-fullscreen mode is reducing performance for some reason... |
Windowed mode has to go through an extra layer of system functions to provide clipping to the window's layer... just in case the window isn't fully visible. Fullscreen mode doesn't. Most of the time, this amounts to only a minor performance decrease that is absorbed in the rest of the app (game), but you're pushing the edge to try to get absolute speed figures, so naturally you'll see this overhead. |
|
|
|
tweakoz
Joined: 17 Feb 2004 Posts: 21 Location: Santa Cruz, CA
|
Posted: Mon Jan 22, 2007 7:21 pm Post subject: |
|
|
Quote: |
Windowed mode has to go through an extra layer of system functions to provide clipping to the window's layer...
|
hmm - the SPU's DMA controller has no notion of this, it is just DMA'ing from the local store to the framebuffer with no clipping,
although maybe you are speaking of the DMA "from" the framebuffer to the RSX
mtm |
|
|
|
tweakoz
Joined: 17 Feb 2004 Posts: 21 Location: Santa Cruz, CA
|
Posted: Mon Jan 22, 2007 7:30 pm Post subject: new spu blit test build up |
|
|
(getting closer to real world workloads)
new build has following features:
1. basic destination additive blending (read modify write cycle)
2. basic texture mapping with 2D rotated UV space
performance numbers (4 spu's, single 64x64 texture)
1080i: 208fps (1645MB/sec read, 1645MB/sec write)
720p: 435fps (1529MB/sec read, 1529MB/sec write)
480i: 1004fps (1323MB/sec read, 1323MB/sec write)
as usual:
http://www.tweakoz.com/portfolio/spurast1.tar
mtm |
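The two features listed above can be illustrated with a scalar C sketch. It is not taken from the tar file: the function names are hypothetical, the texture lookup is nearest-neighbour with wrap-around, and the caller passes the cosine and sine of the rotation angle so that no math library is needed.

```c
#include <assert.h>
#include <stdint.h>

#define TEX_SIZE 64  /* 64x64 texture, as in the test described above */

/* Destination additive blending: per-channel add of two 32-bit pixels,
 * clamped to 255 in each 8-bit channel. */
static uint32_t add_saturate(uint32_t dst, uint32_t src)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sum = ((dst >> shift) & 0xff) + ((src >> shift) & 0xff);
        if (sum > 0xff)
            sum = 0xff;
        out |= sum << shift;
    }
    return out;
}

/* Look up the texel for screen position (x, y) after rotating the UV
 * space. Coordinates are truncated to integers and wrapped; the mask
 * works for negative values because TEX_SIZE is a power of two. */
static uint32_t sample_rotated(const uint32_t *tex, int x, int y,
                               float cos_a, float sin_a)
{
    int u = (int)(x * cos_a - y * sin_a);
    int v = (int)(x * sin_a + y * cos_a);
    return tex[(v & (TEX_SIZE - 1)) * TEX_SIZE + (u & (TEX_SIZE - 1))];
}

/* Read-modify-write pass over a width x height buffer. */
static void blend_textured(uint32_t *fb, int width, int height,
                           const uint32_t *tex, float cos_a, float sin_a)
{
    assert(width >= 0 && height >= 0);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            fb[y * width + x] = add_saturate(
                fb[y * width + x], sample_rotated(tex, x, y, cos_a, sin_a));
}
```

Every output pixel costs one read and one write of the destination, which matches the equal read and write bandwidth figures in the post.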
|
|
|
DaveRoyal
Joined: 25 Jun 2005 Posts: 16 Location: Northern California, USA
|
Posted: Tue Jan 23, 2007 4:34 am Post subject: Hello |
|
|
TO,
I just started following your work. I haven't tested your code yet, but hope to this evening.
Just this past weekend, I got the frame buffer working after much trial and error. I wish I had found your work earlier.
I see you're using a main loop counter. My main interest is a gameloop, and I'm currently using the typical while(1)...
And then inside the loop, I got the joystick in nonblock mode, so I test to see if a particular button has been pressed, and exit that way.
I'm hoping to get old-school demos working, from the early days of DOS coding, where they used to use mode 13h.
I've been looking over the articles at flipcode, and hope to come up to speed over the next few weeks where you are, so I can understand the relationship between the processors.
Thanks for your great work, I look forward to more!
Dr. Dave 'Wheels' Royal |
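The game loop described above could look roughly like the following sketch. It assumes the Linux joystick interface from `<linux/joystick.h>` and a device node such as `/dev/input/js0`; the device path and all function names are assumptions, since the post gives no code.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>
#include <linux/joystick.h>

/* Open a joystick device in non-blocking mode; returns -1 on failure. */
static int open_joystick(const char *path)
{
    return open(path, O_RDONLY | O_NONBLOCK);
}

/* Drain all queued events from a non-blocking joystick descriptor.
 * Returns 1 if the given button was reported as pressed, else 0. */
static int button_pressed(int fd, int button)
{
    struct js_event ev;
    int pressed = 0;
    assert(fd >= 0);
    /* read() stops returning full events once the queue is empty. */
    while (read(fd, &ev, sizeof ev) == (ssize_t)sizeof ev) {
        if ((ev.type & ~JS_EVENT_INIT) == JS_EVENT_BUTTON &&
            ev.number == button && ev.value == 1)
            pressed = 1;
    }
    return pressed;
}

/* Call update_and_draw() once per iteration until the exit button is
 * pressed; returns the number of frames drawn. */
static int game_loop(int fd, int exit_button, void (*update_and_draw)(void))
{
    int frames = 0;
    while (1) {
        if (button_pressed(fd, exit_button))
            break;
        update_and_draw();
        frames++;
    }
    return frames;
}
```

A caller would open the device once with `open_joystick("/dev/input/js0")`, check the result, and pass the descriptor to `game_loop` together with a function that draws one frame.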
|
|
|
J.F.
Joined: 22 Feb 2004 Posts: 2906
|
Posted: Tue Jan 23, 2007 4:44 am Post subject: |
|
|
tweakoz wrote: | Quote: |
Windowed mode has to go through an extra layer of system functions to provide clipping to the window's layer...
|
hmm - the SPU's DMA controller has no notion of this, it is just DMA'ing from the local store to the framebuffer with no clipping,
although maybe you are speaking of the DMA "from" the framebuffer to the RSX
mtm |
If you're simply setting the DMA to blit the data directly to the framebuffer without any regard to the windows, then yes, there won't be any clipping and what-not. If you're using system functions associated with the window, it will adjust things automatically to take into account the window borders and overlap and such. I guess I should probably look at the code to see exactly what you're doing rather than just guessing. :) |
|
|
|
tweakoz
Joined: 17 Feb 2004 Posts: 21 Location: Santa Cruz, CA
|
Posted: Tue Jan 23, 2007 9:22 pm Post subject: its getting harder and harder |
|
|
its getting harder and harder to squeeze more performance out
of the 64x64 texture additive blending test.
spu-gcc compiler seems to be choking with too much inlining.
(after a point with unrolling/inlining the demo still compiles
but stops working)
current performance:
1080i: fps: 337 [2665 MB/Sec]
720p: fps: 688 [2418 MB/Sec]
480i: fps: 1380 [1819 MB/Sec]
so thats >5GB/sec ( in and out combined ) bandwidth
since there seems to be a lot of overhead in just loops (which i am unrolling to a point) there may be more effective use of bandwidth to be had via multiple texture lookups (higher math/bandwidth ratio) .... |
|
|
|
Warren
Joined: 24 Jan 2004 Posts: 173 Location: San Diego, CA
|
Posted: Wed Mar 14, 2007 9:44 am Post subject: |
|
|
I just downloaded and tried your program tweakoz but it seems to lock up right after it starts the 4th (#3) SPU unit. I'm running YDL with libspe2 installed. |
|
|
|
Minase
Joined: 03 Apr 2005 Posts: 6
|
Posted: Thu Mar 22, 2007 4:56 am Post subject: |
|
|
Any idea where I can get <asm/ps3fb.h> from? (Can't compile your code because it's missing)
It doesn't seem to be in the toolchain tarball from bsc.es.
If it just comes with the FC or Gentoo installs, well, I'm running Debian... :) |
|
|
|
Minase
Joined: 03 Apr 2005 Posts: 6
|
Posted: Thu Mar 22, 2007 5:12 am Post subject: |
|
|
Oh pfft, forgot the obvious, nevermind :)
(For anyone else: apt-get install linux-headers-2.6.16-1-ps3pf, and add -I/usr/src/linux-headers-2.6.16-1-ps3pf/include) |
|
|
|
JuSho
Joined: 06 Dec 2007 Posts: 4 Location: Scottsdale, AZ
|
Posted: Thu Dec 06, 2007 2:43 am Post subject: SDL frontend for native fb access ? |
|
|
I see this is a rather old topic, but I haven't seen any update or post on SDL recently. So my question: has there been any work done on using a native fb interface for the SDL video drivers? (I saw the media lib seems to be on track to allow some X acceleration, but is there anything low level for fb?)
Sorry to post it here, but this seems a pretty good thread for some fb performance programming. |
|
|
|
IronPeter
Joined: 06 Aug 2007 Posts: 207
|
Posted: Thu Dec 06, 2007 5:49 pm Post subject: |
|
|
Switch to fullscreen mode, map video ram, use SPU DMA scattering.
What do you want beyond this functionality? |
|
|
|
|