forums.ps2dev.org
Homebrew PS2, PSP & PS3 Development Discussions
 

Framebuffer hello world and performance measurement

 
forums.ps2dev.org Forum Index -> PS3 Linux Development
Shine



Joined: 03 Dec 2004
Posts: 728
Location: Germany

Posted: Thu Dec 14, 2006 11:24 am    Post subject: Framebuffer hello world and performance measurement

I've written a very unoptimized program which draws a full-screen background image and, on top of it, a moving bar. In 720x480 resolution mode (mode 480i, set with "ps3videomode -v 1") the usable area is 648x432, and with a bar height of 20 pixels nearly 60 fps are possible. I think that with the SPEs doing the blitting and more optimized code, good 2D games, like jump-and-run games with multi-layer parallax scrolling, should be no problem.

Code:

// performance test with VSync IRQ, inspired by the VSync example on the cell add-on CD
//
// compile:
// gcc -I /usr/src/linux-2.6.16-cell-r1/include vsync.c -o vsync -lm
//
// tested on Gentoo, installed with this guide: http://wiki.ps2dev.org/ps3:linux:installing_gentoo

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <stdint.h>
#include <math.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kd.h>
#include <sys/time.h>
#include <unistd.h>   // close()
#include <linux/fb.h>
#include <asm/ps3fb.h>

int width, height, memoryWidth, memoryHeight;
uint32_t* background;

void draw(uint32_t* fb)
{
   int x, y, yp;
   static int t = 0;
   int barHeight = 20;
   float amplitude = ((float) (height - barHeight)) / 2.0;
   float frequency = 40.0;

   // blit background
   for (x = 0; x < width; x++) {
      for (y = 0; y < height; y++) {
         fb[y * memoryWidth + x] = background[y * width + x];
      }
   }
   
   // draw a bar
   yp = sin(((float)t) / frequency) * amplitude + amplitude;
   for (y = yp; y < yp + barHeight; y++) {
      if (y < height && y >= 0) {
         for (x = 0; x < width; x++) {
            fb[y * memoryWidth + x] = 0xffffff;
         }
      }
   }
   t++;
   if (t == height) t = 0;
      
}

void enableCursor(int enable)
{
   int fd = open("/dev/console", O_NONBLOCK);
   if (fd >= 0) {
      ioctl(fd, KDSETMODE, enable ? KD_TEXT : KD_GRAPHICS);
      close(fd);
   }
}

int main(int argc, char *argv[])
{
   int fd;
   void *addr;
   int length;
   struct ps3fb_ioctl_res res;
   int x, y;
   uint32_t frame = 0;
   struct timeval tv;
   uint32_t time;
   int count;

   // switch to graphics mode (disable cursor)
   enableCursor(0);
   
   // access framebuffer
   fd = open("/dev/fb0", O_RDWR);
   if (fd < 0) {
      perror("open /dev/fb0");
      enableCursor(1);
      return 1;
   }
   ioctl(fd, PS3FB_IOCTL_SCREENINFO, (unsigned long)&res);
   printf("xres: %d, yres: %d, xoff: %d, yoff: %d, num_frames: %d\n",
      res.xres, res.yres, res.xoff, res.yoff, res.num_frames);
   length = res.xres * res.yres * 4 * res.num_frames;
   addr = mmap(NULL, length, PROT_WRITE, MAP_SHARED, fd, 0);
   if (addr == MAP_FAILED) {
      perror("mmap");
      close(fd);
      enableCursor(1);
      return 1;
   }

   // stop flipping in kernel thread with vsync
   ioctl(fd, PS3FB_IOCTL_ON, 0);

   // create test background image
   memoryWidth = res.xres;
   memoryHeight = res.yres;
   width = res.xres - 2 * res.xoff;
   height = res.yres - 2 * res.yoff;
   background = malloc(width * height * 4);
   for (x = 0; x < width; x++) {
      for (y = 0; y < height; y++) {
         background[y * width + x] = x * y << 3;   // simple test pattern
      }
   }

   // start timing (microseconds, wrapping modulo 2^32; only differences are used)
   gettimeofday(&tv, NULL);
   time = (uint32_t)tv.tv_sec * 1000000u + (uint32_t)tv.tv_usec;

   // draw test
   count = 300;
   for (x = 0; x < count; x++) {
      // wait for vsync interrupt
      uint32_t crt = 0;
      ioctl(fd, FBIO_WAITFORVSYNC, (unsigned long)&crt);
      
      // draw frame
      draw((uint32_t *)addr + frame * memoryWidth * memoryHeight);

      // blit and flip with vsync request
      ioctl(fd, PS3FB_IOCTL_FSEL, (unsigned long)&frame);
      
      // switch frame
      frame = 1 - frame;
   }

   // end timing
   gettimeofday(&tv, NULL);
   time = (uint32_t)tv.tv_sec * 1000000u + (uint32_t)tv.tv_usec - time;
   printf("fps: %u\n", count * 1000000u / time);

   free(background);

   // start flipping in kernel thread with vsync
   ioctl(fd, PS3FB_IOCTL_OFF, 0);
   munmap(addr, length);

   // close device
   close(fd);
   
   // back to text mode
   enableCursor(1);

   return 0;
}
Shine



Joined: 03 Dec 2004
Posts: 728
Location: Germany

Posted: Sat Dec 16, 2006 8:24 am    Post subject: Re: Framebuffer hello world and performance measurement

No need to worry: when compiling with -O2, copying the memory with 64-bit accesses and reordering the accesses (line by line instead of column by column), you can do 10 full-screen blits with an additional 50-pixel bar overlay in 13 ms :-)
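A minimal sketch of the two loop changes described above, assuming 32-bit XRGB pixels, an even blit width, and a destination whose rows are `dst_stride` pixels apart. The function name `blit_rows64` is hypothetical, not Shine's code. The outer loop walks rows so source and destination are both read sequentially, and each inner step moves two pixels as one 64-bit word; copying through a `uint64_t` with an 8-byte `memcpy` avoids strict-aliasing problems, and with -O2 GCC typically turns it into a single load and store.

```c
#include <stdint.h>
#include <string.h>

/* Copy a width x height image of 32-bit pixels into a framebuffer whose rows
 * are dst_stride pixels apart. Rows are processed top to bottom (line by line),
 * and pixels are moved in pairs as 64-bit words. Assumes width is even. */
void blit_rows64(uint32_t *dst, int dst_stride,
                 const uint32_t *src, int width, int height)
{
    for (int y = 0; y < height; y++) {
        const uint32_t *s = src + (size_t)y * width;
        uint32_t *d = dst + (size_t)y * dst_stride;
        for (int x = 0; x < width; x += 2) {
            uint64_t two_pixels;
            memcpy(&two_pixels, s + x, sizeof two_pixels);  /* 64-bit load  */
            memcpy(d + x, &two_pixels, sizeof two_pixels);  /* 64-bit store */
        }
    }
}
```

The stride parameter matters here because, as in the program above, the memory width of the framebuffer (`memoryWidth`) can differ from the width of the image being blitted.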
mtb



Joined: 19 Oct 2006
Posts: 19
Location: UK/Tokyo

Posted: Tue Dec 19, 2006 2:52 am    Post subject: Re: Framebuffer hello world and performance measurement

Shine wrote:
No need to worry: When compiling with -O2, copying the memory with 64 bit access and reordering the access (line-by-line instead of column-by-column) you can do 10 full screen blits with an additional 50 pixel bar overlay in 13 ms :-)


Shine, do you have that updated code? It would be cool if we could get it tested at a number of resolutions :)
tweakoz



Joined: 17 Feb 2004
Posts: 21
Location: Santa Cruz, CA

Posted: Mon Jan 08, 2007 8:57 pm    Post subject: next baby step (spe offloaded computation of screen)

Got the next baby step working!!!

I have SPE-offloaded computation of the screen (configurable from 1 to 6 SPEs).
I have basic blit/DMA working;
unfortunately it requires the PPE to initiate the DMA per screen line per tile
(I just get a hang when attempting to have the SPE initiate the DMA).
With the PPE initiating DMA requests at 1080 lines, that would be 1080 * (1920/128) = 16200 individual DMA requests per frame; it would be nice, if possible, to get this down to (1080/128) * (1920/128) = ~127 requests.
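The request counts above can be checked with a few lines of integer arithmetic. This is a sketch assuming 128x128-pixel tiles, one DMA request per tile row in the per-line scheme and one per tile in the target scheme; the helper names are hypothetical. Because 1080 is not a multiple of 128, the bottom row of tiles is only partly filled, so counting whole tiles gives 9 * 15 = 135 requests rather than the fractional ~127. Note also that a single MFC DMA command moves at most 16 KB of contiguous memory, so one request per tile in practice would rely on something like an MFC DMA list that gathers the tile's rows.

```c
/* Number of tiles needed to cover `pixels`, rounding up for a partial tile. */
static int tiles_needed(int pixels, int tile)
{
    return (pixels + tile - 1) / tile;
}

/* One DMA request per screen line per tile column. */
int dma_requests_per_line(int width, int height, int tile)
{
    return height * tiles_needed(width, tile);
}

/* One DMA request per (possibly partial) tile. */
int dma_requests_per_tile(int width, int height, int tile)
{
    return tiles_needed(height, tile) * tiles_needed(width, tile);
}
```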

Also note that I am procedurally generating the screen (sort of like a simple pixel shader).
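To illustrate what generating the screen "like a simple pixel shader" means, here is a hypothetical per-tile fill in plain C. Each SPU would run something like `shade_tile` on a buffer in its local store and then DMA the finished tile to the framebuffer. The XOR/offset pattern and all names are placeholders, not tweakoz's shader; the 128-pixel tile edge matches the tile size used in this thread.

```c
#include <stdint.h>

#define TILE 128  /* tile edge in pixels */

/* Hypothetical "pixel shader": colour depends only on screen position and time t. */
static uint32_t shade(int x, int y, int t)
{
    uint32_t r = (uint32_t)(x + t) & 0xff;
    uint32_t g = (uint32_t)(y + t) & 0xff;
    uint32_t b = (uint32_t)(x ^ y) & 0xff;
    return (r << 16) | (g << 8) | b;   /* XRGB, top byte left at 0 */
}

/* Fill one TILE x TILE buffer (e.g. in local store) for tile column tx, tile row ty. */
void shade_tile(uint32_t tile[TILE * TILE], int tx, int ty, int t)
{
    for (int y = 0; y < TILE; y++)
        for (int x = 0; x < TILE; x++)
            tile[y * TILE + x] = shade(tx * TILE + x, ty * TILE + y, t);
}
```

Because every pixel depends only on its own coordinates, tiles can be handed to any number of SPUs in any order.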

At any rate, with this baby step I hit these framerates:
23 FPS in 1920x1080i (vsync off)
52 FPS in 1280x720p (vsync off)
150 FPS in 720x480i (vsync off)

The next baby step I'm going to get working is SPE-initiated DMA,
then try smaller tiles...
Eventually I would like to turn this into a TBDR (tile-based deferred renderer).

code at
http://www.tweakoz.com/portfolio/spurast1.tar

enjoy.....

michael t. mayers
tweakoz



Joined: 17 Feb 2004
Posts: 21
Location: Santa Cruz, CA

Posted: Mon Jan 08, 2007 9:02 pm    Post subject: almost forgot

Those FPS numbers mentioned in my last post were with 4 SPUs enabled.

michael t. mayers
tweakoz



Joined: 17 Feb 2004
Posts: 21
Location: Santa Cruz, CA

Posted: Mon Jan 08, 2007 9:09 pm    Post subject: 1 more thing

Upgrade to the latest libspe2 (I think it's 2.01 or 2.02); the code won't work with 2.0.

mtm
tweakoz



Joined: 17 Feb 2004
Posts: 21
Location: Santa Cruz, CA

Posted: Tue Jan 09, 2007 6:54 pm    Post subject: spu initiated DMA working

The problem was 64-bit addresses vs. 32-bit addresses.

I was calling a 64-bit-address MFC function;
I'm now calling the 32-bit version.

Peak blit transfer rate reached so far is > 800 MB/sec
at tilesize=128 and nspus = 6 (about the same result for 4 SPUs).
Max fps of just blitting so far at 720x480i is > 1000 fps.

I'm pretty sure I'm PPE-limited now;
adding SPUs doesn't help unless I add more work to each SPU.

I am still single-buffering the SPE calc/DMA cycle.
Next baby step is double buffering....

Updated code is posted at the same URL.

mtm.
tweakoz



Joined: 17 Feb 2004
Posts: 21
Location: Santa Cruz, CA

Posted: Tue Jan 09, 2007 7:01 pm    Post subject: sorry, 1 error in numbers

Sorry, I need to correct a misquote:

I'm not actually blitting 720x480, I'm blitting 512x384 (integer multiples of the tile size, 128).

The DMA blit transfer rate is still > 800 MB/sec.

mtm
Shine



Joined: 03 Dec 2004
Posts: 728
Location: Germany

Posted: Tue Jan 09, 2007 7:07 pm    Post subject: Re: spu initiated DMA working

tweakoz wrote:
peak blit transfer rate reached so far is > 800MB /sec
at tilesize=128 and nspus = 6 (~ same result for 4 spus)
max fps of just blitting so far at 720x480i is > 1000fps


This sounds better; the 150 fps was really too slow, because that is already possible with pure C loops on the main CPU and without DMA.

A library which provides fast blitting with alpha blending would be nice. An idea: the PPE sends jobs to the SPUs, which transfer the background to local store (in stripes, because of the limited memory), then the image to blit; the SPU then performs the alpha blending and finally transfers the result back to the framebuffer or to another memory region (for double buffering or for blitting into images). Alpha blending should be really fast with SIMD operations.
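A scalar sketch of the per-pixel arithmetic such a library would run on each stripe, assuming 32-bit ARGB source pixels with straight (non-premultiplied) alpha in the top byte, blended over an XRGB framebuffer. `blend_over` and `blend_span` are hypothetical names. On the SPU the same multiply-adds would be done with SIMD intrinsics on several pixels at once; this plain-C version only shows the math.

```c
#include <stddef.h>
#include <stdint.h>

/* Blend one 8-bit channel: (s*a + d*(255-a)) / 255, rounded to nearest. */
static uint32_t blend_channel(uint32_t s, uint32_t d, uint32_t a)
{
    return (s * a + d * (255 - a) + 127) / 255;
}

/* Draw src (ARGB, alpha in the top byte) over dst (XRGB).
 * The destination's top byte is preserved. */
uint32_t blend_over(uint32_t src, uint32_t dst)
{
    uint32_t a = src >> 24;
    uint32_t r = blend_channel((src >> 16) & 0xff, (dst >> 16) & 0xff, a);
    uint32_t g = blend_channel((src >> 8) & 0xff, (dst >> 8) & 0xff, a);
    uint32_t b = blend_channel(src & 0xff, dst & 0xff, a);
    return (dst & 0xff000000u) | (r << 16) | (g << 8) | b;
}

/* Blend a stripe of n pixels in place, e.g. a background stripe in local store. */
void blend_span(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = blend_over(src[i], dst[i]);
}
```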
tweakoz



Joined: 17 Feb 2004
Posts: 21
Location: Santa Cruz, CA

Posted: Tue Jan 09, 2007 7:45 pm    Post subject: for fun

Just for fun, I put up a parallel Julia set -> framebuffer blitter...

http://www.tweakoz.com/portfolio/spurast_julia.tar
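For reference, a minimal escape-time Julia set kernel of the kind such a demo evaluates per pixel. The constant c, the mapping of pixels onto the complex plane, and the grey-scale palette are arbitrary assumptions, not values taken from spurast_julia.tar. Each pixel is independent of its neighbours, which is what makes the work easy to split across SPUs by tile.

```c
#include <stdint.h>

/* Iterate z = z^2 + c from z0 = (zr, zi); return the iteration at which
 * |z| > 2, or max_iter if the point does not escape. */
int julia_iterations(float zr, float zi, float cr, float ci, int max_iter)
{
    for (int i = 0; i < max_iter; i++) {
        if (zr * zr + zi * zi > 4.0f)
            return i;
        float t = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = t;
    }
    return max_iter;
}

/* Map a width x height pixel region onto [-1.5,1.5] x [-1,1] and write
 * XRGB grey levels proportional to the escape count. */
void julia_fill(uint32_t *out, int width, int height, int max_iter)
{
    const float cr = -0.8f, ci = 0.156f;   /* arbitrary Julia constant */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            float zr = -1.5f + 3.0f * x / width;
            float zi = -1.0f + 2.0f * y / height;
            int n = julia_iterations(zr, zi, cr, ci, max_iter);
            uint32_t g = (uint32_t)(255 * n / max_iter);
            out[y * width + x] = (g << 16) | (g << 8) | g;
        }
    }
}
```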

mtm
tweakoz



Joined: 17 Feb 2004
Posts: 21
Location: Santa Cruz, CA

Posted: Wed Jan 10, 2007 10:35 pm    Post subject: new spu blittest tar file up

Now I'm getting the following performance:
1080i: > 700 fps (4.5 GB/sec)
720p: > 1350 fps (3.7 GB/sec)
480i: > 2700 fps (2.3 GB/sec)

This is with 4 SPUs and pure blitting (SPU local store -> framebuffer),
with no per-pixel computation.

Again, the tar is at
http://www.tweakoz.com/portfolio/spurast1.tar
tweakoz



Joined: 17 Feb 2004
Posts: 21
Location: Santa Cruz, CA

Posted: Wed Jan 10, 2007 10:42 pm

I noticed it goes up to these numbers when I use -f on ps3videomode
(fullscreen):

1080i: fps: 801 [6336 MB/Sec]
720p: fps: 1577 [5544 MB/Sec]
480i: fps: 2759 [3637 MB/Sec]

Not only is the bandwidth going up, but the fps is going up too...
The non-fullscreen mode is reducing performance for some reason...
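The MB/Sec figures in this post are consistent with full-resolution frames at 4 bytes per pixel, counted in binary megabytes (2^20 bytes): for 1080i, 1920 * 1080 * 4 bytes * 801 fps / 2^20 ≈ 6336. A small helper to reproduce them, assuming 32-bit pixels and one full-frame write per displayed frame:

```c
/* Bandwidth in binary megabytes (MiB) per second for full-frame blits
 * of width x height pixels at 4 bytes per pixel. */
double blit_mib_per_sec(int width, int height, double fps)
{
    const double bytes_per_frame = (double)width * height * 4.0;
    return bytes_per_frame * fps / (1024.0 * 1024.0);
}
```

For a read-modify-write pass, such as an additive blend into the framebuffer, the same figure applies once for the reads and once for the writes.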

mtm
Arwin



Joined: 12 Jul 2005
Posts: 426

Posted: Fri Jan 12, 2007 1:43 am

I wonder how this works. Is the framebuffer in RSX (GDDR) memory and then directly manipulated by the Cell (which in the schematics we've seen was rated at about 4 GB/s, wasn't it?)? If so, you already seem to have gotten better performance than in the original specs.

Or is the RSX using a framebuffer in XDR memory?
J.F.



Joined: 22 Feb 2004
Posts: 2906

Posted: Fri Jan 12, 2007 5:35 am

tweakoz wrote:
the non-fullscreen mode is reducing performance for some reason...


Windowed mode has to go through an extra layer of system functions to provide clipping to the window's layer... just in case the window isn't fully visible. Fullscreen mode doesn't. Most of the time, this amounts to only a minor performance decrease that is absorbed in the rest of the app (game), but you're pushing the edge to try to get absolute speed figures, so naturally you'll see this overhead.
tweakoz



Joined: 17 Feb 2004
Posts: 21
Location: Santa Cruz, CA

Posted: Mon Jan 22, 2007 7:21 pm

Quote:

Windowed mode has to go through an extra layer of system functions to provide clipping to the window's layer...


Hmm, the SPU's DMA controller has no notion of this; it is just DMA'ing from the local store to the framebuffer with no clipping,
although maybe you are speaking of the DMA "from" the framebuffer to the RSX.

mtm
tweakoz



Joined: 17 Feb 2004
Posts: 21
Location: Santa Cruz, CA

Posted: Mon Jan 22, 2007 7:30 pm    Post subject: new spu blit test build up

(getting closer to real-world workloads)

The new build has the following features:

1. basic destination additive blending (read-modify-write cycle)
2. basic texture mapping with a 2D rotated UV space
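A scalar sketch of the two features listed above, assuming XRGB pixels and a 64x64 texture that wraps in both directions. `add_saturate`, `sample_rotated` and their parameters are hypothetical names, and the cosine and sine of the rotation angle are passed in precomputed (they are constant for a whole frame). The additive blend clamps each channel at 255, so bright areas saturate instead of wrapping around.

```c
#include <stdint.h>

#define TEX_SIZE 64  /* texture is TEX_SIZE x TEX_SIZE; a power of two allows wrapping with & */

/* Per-channel saturating add of two XRGB pixels (read-modify-write blend).
 * The result's top byte is 0. */
uint32_t add_saturate(uint32_t dst, uint32_t src)
{
    uint32_t out = 0;
    for (int shift = 0; shift <= 16; shift += 8) {
        uint32_t sum = ((dst >> shift) & 0xff) + ((src >> shift) & 0xff);
        out |= (sum > 0xff ? 0xff : sum) << shift;
    }
    return out;
}

/* Texel for screen pixel (x, y) after rotating UV space around (cx, cy) by the
 * angle whose cosine/sine are cs/sn. (int) truncates toward zero, which is
 * good enough for a sketch; a real renderer would floor or use fixed point. */
uint32_t sample_rotated(const uint32_t tex[TEX_SIZE * TEX_SIZE],
                        int x, int y, float cx, float cy, float cs, float sn)
{
    float dx = (float)x - cx, dy = (float)y - cy;
    int u = (int)(cs * dx - sn * dy);
    int v = (int)(sn * dx + cs * dy);
    /* & (TEX_SIZE - 1) wraps into 0..63, also for negative values (two's complement) */
    return tex[(v & (TEX_SIZE - 1)) * TEX_SIZE + (u & (TEX_SIZE - 1))];
}
```

Per pixel, the demo's inner loop would then be roughly `fb[i] = add_saturate(fb[i], sample_rotated(...))`, which is why both the read and the write bandwidth show up in the numbers below.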

Performance numbers (4 SPUs, single 64x64 texture):
1080i: 208 fps (1645 MB/sec read, 1645 MB/sec write)
720p: 435 fps (1529 MB/sec read, 1529 MB/sec write)
480i: 1004 fps (1323 MB/sec read, 1323 MB/sec write)

As usual:
http://www.tweakoz.com/portfolio/spurast1.tar

mtm
DaveRoyal



Joined: 25 Jun 2005
Posts: 16
Location: Northern California, USA

Posted: Tue Jan 23, 2007 4:34 am    Post subject: Hello

TO,

I just started following your work. I haven't tested your code yet, but hope to this evening.

Just this past weekend I got the framebuffer working, after a lot of trial and error; I wish I had found your work earlier.

I see you're using a main loop counter. My main interest is a game loop, and I'm currently using the typical while(1)...

Inside the loop, I have the joystick in nonblocking mode, so I test whether a particular button has been pressed and exit that way.
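The nonblocking joystick check described above might look like this with the Linux joystick API (`struct js_event` and the `JS_EVENT_*` flags from `<linux/joystick.h>`). The device path `/dev/input/js0` and the idea of a single quit button are assumptions for the sketch. The function drains every pending event each frame, so the game loop never blocks on input.

```c
#include <fcntl.h>
#include <linux/joystick.h>
#include <unistd.h>

/* Open the joystick so that read() returns immediately when no event is queued. */
int open_joystick_nonblocking(const char *path)   /* e.g. "/dev/input/js0" */
{
    return open(path, O_RDONLY | O_NONBLOCK);
}

/* Drain all pending events; return 1 if quit_button was pressed. */
int quit_requested(int fd, int quit_button)
{
    struct js_event ev;
    int quit = 0;
    /* read() fails with EAGAIN once the queue is empty; that ends the loop */
    while (read(fd, &ev, sizeof ev) == (ssize_t)sizeof ev) {
        /* JS_EVENT_INIT marks the synthetic events sent after open; mask it off */
        if ((ev.type & ~JS_EVENT_INIT) == JS_EVENT_BUTTON &&
            ev.number == quit_button && ev.value == 1)
            quit = 1;
    }
    return quit;
}
```

In the game loop this becomes `while (!quit_requested(js, QUIT_BUTTON)) { update(); draw(); }`, where `js`, `QUIT_BUTTON`, `update` and `draw` are placeholders for the program's own descriptor, button number and frame functions.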

I'm hoping to get old-school demos working, from the early days of DOS coding, when they used mode 13h.

I've been looking over the articles at flipcode, and hope to come up to speed with where you are over the next few weeks, so I can understand the relationship between the processors.

Thanks for your great work, I look forward to more!


Dr. Dave 'Wheels' Royal
J.F.



Joined: 22 Feb 2004
Posts: 2906

Posted: Tue Jan 23, 2007 4:44 am

tweakoz wrote:
Quote:

Windowed mode has to go through an extra layer of system functions to provide clipping to the window's layer...


hmm - the SPU's DMA controller has no notion of this, it is just DMA'ing from the local store to the framebuffer with no clipping,
although maybe you are speaking of the DMA "from" the framebuffer to the RSX

mtm


If you're simply setting the DMA to blit the data directly to the framebuffer without any regard to windows, then yes, there won't be any clipping and whatnot. If you're using system functions associated with the window, they will adjust things automatically to take the window borders, overlap and such into account. I guess I should probably look at the code to see exactly what you're doing rather than just guessing. :)
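Roughly what per-blit clipping amounts to, as a hypothetical helper: before copying, intersect the destination rectangle with the visible area and shrink or skip the copy. The `struct rect` type and the `clip_rect` name are made up for illustration; a raw SPU DMA into the framebuffer performs no such step.

```c
/* Axis-aligned rectangle: top-left corner plus size, in pixels. */
struct rect { int x, y, w, h; };

/* Intersect r with the visible area v. Returns 0 (leaving *out untouched)
 * if nothing is visible, otherwise 1 with the clipped rectangle in *out. */
int clip_rect(struct rect r, struct rect v, struct rect *out)
{
    int x0 = r.x > v.x ? r.x : v.x;
    int y0 = r.y > v.y ? r.y : v.y;
    int x1 = (r.x + r.w < v.x + v.w) ? r.x + r.w : v.x + v.w;
    int y1 = (r.y + r.h < v.y + v.h) ? r.y + r.h : v.y + v.h;
    if (x1 <= x0 || y1 <= y0)
        return 0;
    out->x = x0;
    out->y = y0;
    out->w = x1 - x0;
    out->h = y1 - y0;
    return 1;
}
```

For example, a 128x128 tile at (600, 400) clipped against a 648x432 visible area becomes a 48x32 copy.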
tweakoz



Joined: 17 Feb 2004
Posts: 21
Location: Santa Cruz, CA

Posted: Tue Jan 23, 2007 9:22 pm    Post subject: its getting harder and harder

It's getting harder and harder to squeeze more performance out
of the 64x64 texture additive blending test.

The spu-gcc compiler seems to be choking on too much inlining
(past a certain amount of unrolling/inlining, the demo still compiles
but stops working).

Current performance:
1080i: fps: 337 [2665 MB/Sec]
720p: fps: 688 [2418 MB/Sec]
480i: fps: 1380 [1819 MB/Sec]

So that's > 5 GB/sec (in and out combined) of bandwidth.

Since there seems to be a lot of overhead in just the loops (which I am unrolling to a point), there may be more effective use of bandwidth to be had via multiple texture lookups (a higher math/bandwidth ratio)....
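A sketch of the kind of manual unrolling mentioned above, applied to a saturating additive blend over one row. This is plain C with hypothetical names; on the SPU the loop body would use vector intrinsics instead. Processing four independent pixels per iteration amortises the loop-counter and branch overhead, and a short tail loop handles row lengths that are not a multiple of four.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturating per-channel add of two XRGB pixels (top byte of result is 0). */
static inline uint32_t add_sat(uint32_t a, uint32_t b)
{
    uint32_t r = ((a >> 16) & 0xff) + ((b >> 16) & 0xff);
    uint32_t g = ((a >> 8) & 0xff) + ((b >> 8) & 0xff);
    uint32_t bl = (a & 0xff) + (b & 0xff);
    if (r > 0xff) r = 0xff;
    if (g > 0xff) g = 0xff;
    if (bl > 0xff) bl = 0xff;
    return (r << 16) | (g << 8) | bl;
}

/* dst[i] = add_sat(dst[i], src[i]) for n pixels, unrolled by four. */
void add_row_unrolled(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = add_sat(dst[i],     src[i]);
        dst[i + 1] = add_sat(dst[i + 1], src[i + 1]);
        dst[i + 2] = add_sat(dst[i + 2], src[i + 2]);
        dst[i + 3] = add_sat(dst[i + 3], src[i + 3]);
    }
    for (; i < n; i++)          /* tail: 0 to 3 leftover pixels */
        dst[i] = add_sat(dst[i], src[i]);
}
```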
Warren



Joined: 24 Jan 2004
Posts: 173
Location: San Diego, CA

Posted: Wed Mar 14, 2007 9:44 am

I just downloaded and tried your program, tweakoz, but it seems to lock up right after it starts the 4th (#3) SPU. I'm running YDL with libspe2 installed.
Minase



Joined: 03 Apr 2005
Posts: 6

Posted: Thu Mar 22, 2007 4:56 am

Any idea where I can get <asm/ps3fb.h> from? (I can't compile your code because it's missing.)

It doesn't seem to be in the toolchain tarball from bsc.es.
If it only comes with the FC or Gentoo installs, well, I'm running Debian... :)
Minase



Joined: 03 Apr 2005
Posts: 6

Posted: Thu Mar 22, 2007 5:12 am

Oh pfft, I forgot the obvious, never mind :)

(For anyone else: apt-get install linux-headers-2.6.16-1-ps3pf, and add -I/usr/src/linux-headers-2.6.16-1-ps3pf/include)
JuSho



Joined: 06 Dec 2007
Posts: 4
Location: Scottsdale, AZ

Posted: Thu Dec 06, 2007 2:43 am    Post subject: SDL frontend for native fb access ?

I see this is a rather old topic, but I haven't seen any update/post on SDL recently. So my question: has there been any work done to use a native fb interface for the SDL video drivers? (I saw the media lib seems to be on track to allow some X acceleration, but is there anything low-level for fb?)

Sorry to post it here, but this seems a pretty good thread for fb performance programming.
IronPeter



Joined: 06 Aug 2007
Posts: 207

Posted: Thu Dec 06, 2007 5:49 pm

Switch to fullscreen mode, map video ram, use SPU DMA scattering.

What do you want beyond this functionality?
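"SPU DMA scattering" most likely refers to MFC DMA lists, where a single list command moves many separate chunks (each up to 16 KB), so a tile that is contiguous in local store can be scattered across the framebuffer's rows in one go. A host-side sketch of building such a list for one tile follows. The `dma_elem` struct is a simplified stand-in, not the SDK's own list-element type, and the framebuffer is assumed to sit at a 32-bit effective address with 4-byte pixels.

```c
#include <stdint.h>

/* Simplified DMA list element: transfer size in bytes plus the low 32 bits of
 * the destination effective address (hypothetical layout for illustration). */
struct dma_elem {
    uint32_t size;
    uint32_t eal;
};

/* Build one element per row of tile (tx, ty): each row of tile_px 32-bit
 * pixels is contiguous in local store but lands stride_bytes apart in the
 * framebuffer. Returns the number of elements written (tile_px, or fewer if
 * the tile crosses the bottom edge at fb_height). */
int build_tile_list(struct dma_elem *list, uint32_t fb_base, uint32_t stride_bytes,
                    int fb_height, int tx, int ty, int tile_px)
{
    int n = 0;
    for (int row = 0; row < tile_px; row++) {
        int y = ty * tile_px + row;
        if (y >= fb_height)
            break;                       /* partial tile at the bottom edge */
        list[n].size = (uint32_t)tile_px * 4;
        list[n].eal = fb_base + (uint32_t)y * stride_bytes
                      + (uint32_t)tx * tile_px * 4;
        n++;
    }
    return n;
}
```

With 128-pixel tiles on a 1920x1080 framebuffer, each element moves 512 bytes, and tiles in the bottom row produce only 56 elements.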
All times are GMT + 10 Hours