forums.ps2dev.org Forum Index forums.ps2dev.org
Homebrew PS2, PSP & PS3 Development Discussions
 

SPE Media Lib
unsolo



Joined: 16 Apr 2007
Posts: 155
Location: OSLO Norway

PostPosted: Mon Apr 16, 2007 3:08 am    Post subject: SPE Media Lib Reply with quote

Hello,

I'm a Norwegian guy who hangs out in #PS3Dev and #gentoo-ppc64 on irc.freenode.net.

I am nearly finished (it works but is not released) with a colorspace converter, YV420p -> ARGB (more or less the same as YV12 -> ARGB).

It runs on one SPE at more than 60 FPS for 1920x1080.
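
For readers who have not seen this kind of conversion before, here is a minimal scalar sketch of what a planar YUV 4:2:0 to packed ARGB converter computes per pixel. It is not the SPU code from this project (that code is not shown in the thread). The function names are hypothetical, and the BT.601 video-range integer coefficients are an assumption, since the post does not say which matrix the converter uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scale a fixed-point value (8 fractional bits) down and clamp it to 0..255. */
static uint8_t clamp_fixed8(int v)
{
    v /= 256;
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Convert one planar YUV 4:2:0 frame to packed ARGB (0xAARRGGBB).
 * Assumption: BT.601 video range (Y 16..235, U/V centred on 128). */
void yuv420p_to_argb(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                     int width, int height, uint32_t *argb)
{
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);

    for (int row = 0; row < height; row++) {
        for (int col = 0; col < width; col++) {
            /* Each chroma sample is shared by a 2x2 block of luma samples. */
            size_t ci = (size_t)(row / 2) * (size_t)(width / 2) + (size_t)(col / 2);
            size_t pi = (size_t)row * (size_t)width + (size_t)col;

            int c = y[pi] - 16;
            int d = u[ci] - 128;
            int e = v[ci] - 128;

            uint32_t r = clamp_fixed8(298 * c + 409 * e + 128);
            uint32_t g = clamp_fixed8(298 * c - 100 * d - 208 * e + 128);
            uint32_t b = clamp_fixed8(298 * c + 516 * d + 128);

            argb[pi] = 0xFF000000u | (r << 16) | (g << 8) | b;
        }
    }
}
```

YV12 stores the same three planes with V before U, so for YV12 input the same routine can be called with the two chroma pointers swapped.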

The next logical steps are up/down scaling, then maybe some extra filtering and decoding.

So here's my plan.

We create an SPU Media Lib project here on PS2Dev.org where we define the inputs and outputs (and make a reference project on SourceForge), standards for the locations of binaries and so on, and how to handshake and communicate between all the running SPEs. Then we add subprojects for the necessary SPU programs as we see fit as the lib grows.

All SPU code needs to be compatible with both 64-bit and 32-bit userland.

So please help me create the project and help me write the code.

Thanks
Kristian
_________________
Don't do it alone.
Oobles
Site Admin


Joined: 17 Jan 2004
Posts: 362
Location: Melbourne, Australia

PostPosted: Mon Apr 16, 2007 8:18 am    Post subject: Reply with quote

Sounds like a good project. If you would like to host the code at the subversion repository here (svn.ps2dev.org), then please send me a private message with the userid/password you would like and I will create an account for you.

The same goes for anyone else with project ideas for the ps3. The very few rules I have for subversion access are listed at:

http://ps2dev.org/Site_Information/Subversion

David. aka Oobles.
unsolo



Joined: 16 Apr 2007
Posts: 155
Location: OSLO Norway

PostPosted: Wed Apr 18, 2007 8:02 pm    Post subject: Reply with quote

project is up at
http://wiki.ps2dev.org/ps3:spu-medialib

svn
http://svn.pspdev.org/listing.php?repname=ps3ware&path=%2Ftrunk%2Fspu-medialib%2F&rev=0&sc=0
_________________
Don't do it alone.
mbf



Joined: 18 Aug 2006
Posts: 55

PostPosted: Sat May 05, 2007 12:04 am    Post subject: Reply with quote

very very nice one unsolo :)

Now, in order to benefit from this in the greatest number of applications without having to tweak each one of them for the PS3, I think the best thing would be to implement this in something like SDL or DirectFB or any other similar media layer (GGI? Xv via a custom ps3fb-based X server?). The idea is to provide transparent SPE-based hardware acceleration and vsync support for any app using those backends.

What do you guys think?
jimparis



Joined: 10 Jun 2005
Posts: 1179
Location: Boston

PostPosted: Sat May 05, 2007 9:36 am    Post subject: Reply with quote

I think making an SPU-accelerated Xv driver with XvMC could be a good place to do it.
popper



Joined: 15 Jan 2007
Posts: 9

PostPosted: Sat May 05, 2007 12:12 pm    Post subject: Reply with quote

Is it all going to be SPU code, or will there be AltiVec additions too, seeing as there's an AltiVec unit sitting there currently (mostly) unused ;) until Lu_zero chips in and does his magic, LOL.
ldesnogu



Joined: 17 Apr 2004
Posts: 95

PostPosted: Sat May 05, 2007 8:09 pm    Post subject: Reply with quote

jimparis wrote:
I think making an SPU-accelerated Xv driver with XvMC could be a good place to do it.

I also think it is the best option:
- SDL has support for Xv
- MPlayer's libvo can use Xv.

However, I don't know how easy (or difficult) it is to add Xv to the X server (can it be done without touching the server, or does it have to be built into it?).
mbf



Joined: 18 Aug 2006
Posts: 55

PostPosted: Sat May 05, 2007 11:46 pm    Post subject: Reply with quote

And MPlayer supports -vo sdl ;)

After posting this yesterday, I did some digging on Xv, but I hit a problem with vblank sync. It's not easily done under X, it seems. Anyone got an idea on how to do this properly? Do MPlayer or VLC use vsync in conjunction with Xv, and how?
jimparis



Joined: 10 Jun 2005
Posts: 1179
Location: Boston

PostPosted: Sun May 06, 2007 6:16 am    Post subject: Reply with quote

mbf wrote:
After posting this yesterday, I did some digging on Xv, but I hit a problem with vblank sync. It's not easily done under X, it seems. Anyone got an idea on how to do this properly? Do MPlayer or VLC use vsync in conjunction with Xv, and how?

Some applications (mythtv) have an option to use OpenGL to do the vsync. But from what I can gather, Xv drivers are already supposed to handle vsync internally. For example, if you look at the i810_video.c source in the intel X video driver, it does double buffering inside I810DisplaySurface() and waits for vsync before flipping:
Code:


    /* wait for the last rendered buffer to be flipped in */
    while (((INREG(DOV0STA)&0x00100000)>>20) != pI810Priv->currentBuf) {
      if(loops == 200000) {
        xf86DrvMsg(pScrn->scrnIndex, X_INFO, "Overlay Lockup\n");
        break;
      }
      loops++;
    }

    /* buffer swap */
    if (pI810Priv->currentBuf == 0)
      pI810Priv->currentBuf = 1;
    else
      pI810Priv->currentBuf = 0;

    I810ResetVideo(pScrn);

    I810DisplayVideo(pScrn, surface->id, surface->width, surface->height,
                     surface->pitches[0], x1, y1, x2, y2, &dstBox,
                     src_w, src_h, drw_w, drw_h);

In other words, the application just feeds video data to the Xv driver and it's up to the driver to decide how to best display the video without tearing. Makes sense.

ldesnogu wrote:
However I don't know how easy (or difficult) it is to add xv into the X server (can it be done without touching the server or does it have to be put into it?).
Well, you'll need to write a new Xv-capable display driver, but with the new modular X.org, that no longer involves rebuilding the whole X tree.
jockyw2001



Joined: 29 Sep 2005
Posts: 339

PostPosted: Tue May 08, 2007 12:34 am    Post subject: Reply with quote

@unsolo and others:
See:
http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/2007-May/028757.html

I can imagine us doing something similar with a bunch of interested ps2dev devs.
ralferoo



Joined: 03 Mar 2007
Posts: 122

PostPosted: Tue May 08, 2007 12:48 am    Post subject: ac3/a52 Reply with quote

Also, I noticed on the wiki that a52 development was being looked at. In actual fact, the PS3 implements a regular ALSA driver which supports A52 passthrough, so surround sound from DVDs should work as-is. You don't need to do any decoding, just pass the bitstream straight through.

As part of my python library, I'm looking at writing an SPE sound system, with several goals. Primarily standard sound effects and MP3 decoding for 2-channels but also porting some of liba52 to the SPU (or re-implementing completely) so that I can have spatially located sound effects for those with DTS amps. I'll do my best to keep that part of the library usable from C too!
digihoe



Joined: 14 May 2005
Posts: 108

PostPosted: Tue May 08, 2007 1:20 am    Post subject: Encoding? Reply with quote

While there have been some demonstrations where the SPE decodes HD AVC, will encoding work equally well (full-speed HD AVC encoding)?

Has anyone seen such demonstrations?

Best regards!
laichung



Joined: 06 May 2005
Posts: 123

PostPosted: Tue May 08, 2007 6:05 pm    Post subject: Reply with quote

The latest CELL add-on now includes a document called "Cell Programming Primer", which has a section containing a sample rgb2y program that uses the SPE (Chapter 3, Basics of SPE Programming).

For those who want to learn more about how to use SPE, check that out and you will find it is really informative.

Cell Programming Primer
mbf



Joined: 18 Aug 2006
Posts: 55

PostPosted: Thu May 10, 2007 4:52 am    Post subject: Reply with quote

nice find laichung :)

@jockyw2001: that was the point of my initial question. Better optimize the lower layers of the OS in order to improve performance for a broader range of applications in one single go. IMHO. However, optimizing MPlayer directly would certainly be more straightforward and fit the needs of most. I'm game for it anyway :)

@ralferoo: do you mean that there is no need to decode AC3/DTS, whatever audio system your PS3 outputs to? So far, with all the distros and kernels I tried, the ALSA driver sucked big time for standard stereo output, only cracks and hisses.

@digihoe: that's doable, but it won't work without optimizing MEncoder or x264 specifically for the CellBE.
ralferoo



Joined: 03 Mar 2007
Posts: 122

PostPosted: Thu May 10, 2007 6:58 am    Post subject: Reply with quote

mbf wrote:
@ralferoo: do you mean that there is no need to decode AC3/DTS, whatever audio system your PS3 outputs to? So far, with all the distros and kernels I tried, the ALSA driver sucked big time for standard stereo output, only cracks and hisses.
I used to have FC5 installed which worked OK playing WAV files with aplay. I've manually installed a base version of Ubuntu 7.04 which also seems to work fine with both aplay and mpg123.

There is talk on the forums about cracks and hisses, but I haven't heard any evidence of it myself. So far, all my tests have been up to about 3 minutes as that's how long the MP3s I've tried are.

I might be wrong about the DTS passthrough as "aplay -l" doesn't list an iec958 device, although I was pretty sure I read someone had got it working. This also suggests it doesn't work:
Code:
root@ps3:~# aplay -Dspdif ~ralf/test.dts
ALSA lib pcm.c:2145:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.iec958
aplay: main:550: audio open error: No such file or directory

There's still some hope though, because it's possible for most amps to recognise a DTS bitstream even without the "non-PCM data" option set in the stream. I'll let you know how I get on...
ralferoo



Joined: 03 Mar 2007
Posts: 122

PostPosted: Sun May 13, 2007 3:11 am    Post subject: Reply with quote

ralferoo wrote:
I might be wrong about the DTS passthrough as "aplay -l" doesn't list an iec958 device ... There's still some hope though, because it's possible for most amps to recognise a DTS bitstream even without the "non-PCM data" option set in the stream. I'll let you know how I get on...


Well, I've done some digging and DTS pass-through is definitely not supported by the current kernel. However...

In sound/ppc/snd_ps3_reg.h we see lots of internal hardware definitions including
Code:
S/PDIF Audio Output Channel Channel Status Setting Registers.
Configures channel status bit settings for each block (192 bits).
Output is performed from the MSB(AO_SPDCS0 register bit 31).
The same value is added for subframes within the same frame.

Now, this fills me with a lot of hope, as it's precisely this 192-bit block on SPDIF subcode information that's used to signal to the amp that it's not PCM data but AC3/DTS data.

So, whilst the current kernel driver doesn't support this, it's feasible that we could implement this in the future and without requiring a hypervisor fix from Sony.

See http://www.hardwarebook.info/S/PDIF for more about SPDIF if you're interested.
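
To make the signalling concrete: in the consumer S/PDIF (IEC 60958) channel status block, bit 1 of the first byte distinguishes linear PCM (0) from other data such as AC3/DTS (1). Below is a minimal sketch of building such a 192-bit block in memory. The names are hypothetical, and how these bits would actually be packed into the AO_SPDCS registers is an assumption left open here, since the header comment only says that output starts from the MSB.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { SPDIF_CS_BYTES = 24 };           /* 192 channel status bits per block */
enum { SPDIF_CS0_NON_AUDIO = 1 << 1 };  /* byte 0, bit 1: 0 = linear PCM, 1 = other data */

/* Fill a channel status block with zeros and optionally flag it as non-PCM. */
void spdif_cs_init(uint8_t cs[SPDIF_CS_BYTES], int non_pcm)
{
    assert(cs != NULL);
    memset(cs, 0, SPDIF_CS_BYTES);
    if (non_pcm)
        cs[0] |= SPDIF_CS0_NON_AUDIO;
}

/* Return nonzero if the block is flagged as carrying non-PCM data. */
int spdif_cs_is_non_pcm(const uint8_t cs[SPDIF_CS_BYTES])
{
    assert(cs != NULL);
    return (cs[0] & SPDIF_CS0_NON_AUDIO) != 0;
}
```

A real driver would also have to set the remaining channel status fields (sample rate and so on) and write the block to the hardware, which this sketch does not attempt.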
mbf



Joined: 18 Aug 2006
Posts: 55

PostPosted: Mon Jun 04, 2007 10:31 pm    Post subject: Alignment? Reply with quote

Unsolo, I've started working on a proof-of-concept clone of ffplay that uses the SPU media lib. While browsing through your code, I noticed that you align the memory allocations to 128-byte boundaries. For both memalign() and __attribute__ ((aligned(xy))), the alignment value is in bytes, not bits. DMA transfers require alignment to 128-bit (16-byte) boundaries, so the memalign calls should be changed to memalign(16, xyz).

Edit: looks like "minimum requirement" doesn't mean "best performance". The CBE Architecture reference document states that:
Quote:
For optimal performance of transfers of 128 bytes or more, the source and destination transfer addresses
should be 128-byte aligned (bits 25 through 31 set to 0).

Fair enough!

So basically, the choice depends on which one is faster, the DMA transfer or the actual data processing, AND on how significant the loss of available memory due to fragmentation is (with large alignments).
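
To illustrate the bytes-versus-bits point, here is a minimal host-side sketch that allocates a buffer on a 128-byte boundary and checks the result. It uses C11 aligned_alloc() rather than the memalign() call mentioned above, and the helper names and constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { DMA_MIN_ALIGN = 16, DMA_FAST_ALIGN = 128 };  /* both values are in BYTES */

/* Round n up to the next multiple of a power-of-two alignment. */
size_t round_up(size_t n, size_t alignment)
{
    return (n + alignment - 1) & ~(alignment - 1);
}

/* Allocate at least `size` bytes on an `alignment`-byte boundary.
 * The size is padded because aligned_alloc() expects a multiple of the
 * alignment. Returns NULL on failure; release with free(). */
void *dma_alloc(size_t alignment, size_t size)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return aligned_alloc(alignment, round_up(size, alignment));
}

/* Return nonzero if p sits on an `alignment`-byte boundary. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

A buffer that passes is_aligned(p, 128) automatically satisfies the 16-byte minimum as well, so the larger alignment only costs memory, never correctness.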
ldesnogu



Joined: 17 Apr 2004
Posts: 95

PostPosted: Tue Jun 05, 2007 12:12 am    Post subject: Re: Alignment? Reply with quote

mbf wrote:
Edit: looks like "minimum requirement" doesn't mean "best performance". The CBE Architecture reference document states that:
Quote:
For optimal performance of transfers of 128 bytes or more, the source and destination transfer addresses
should be 128-byte aligned (bits 25 through 31 set to 0).

Fair enough!

Yes, that 128 bytes comes from L2 cache line sizes.

Quote:
So basically, the choice depends on which one is faster, the DMA transfer or the actual data processing, AND on how significant the loss of available memory due to fragmentation is (with large alignments).

I *guess* it would be enough to align small dynamically allocated memory chunks to the hardware requirements (which depend on the DMA packet size) and big chunks to the fastest requirement (128 bytes).

The rationale is that for small DMA transfers a significant proportion of the time is lost in setting up the transfer anyway, so a few lost cycles probably matter less than memory fragmentation.

One should also take care of allocating in the right order to minimize holes of unallocatable memory :)
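
That rule of thumb could be sketched roughly as follows. The 128-byte threshold is an assumption borrowed from the quoted CBE text ("transfers of 128 bytes or more"), 16 bytes is used as a blanket minimum for everything smaller, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum {
    DMA_MIN_ALIGN = 16,        /* bytes: assumed minimum for small chunks */
    DMA_FAST_ALIGN = 128,      /* bytes: fastest alignment for large transfers */
    DMA_FAST_THRESHOLD = 128   /* bytes: sizes from here on get the fast alignment */
};

/* Pick an alignment for a chunk of `size` bytes: small chunks get the
 * minimum, large chunks get the alignment that transfers fastest. */
size_t dma_alignment_for(size_t size)
{
    return size >= DMA_FAST_THRESHOLD ? DMA_FAST_ALIGN : DMA_MIN_ALIGN;
}

/* Worst-case number of bytes an allocator may have to skip in front of a
 * block to reach the requested alignment. */
size_t worst_case_padding(size_t alignment)
{
    assert(alignment > 0);
    return alignment - 1;
}
```

For a 64-byte chunk this wastes at most 15 bytes instead of 127, which is where the fragmentation saving comes from.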
mbf



Joined: 18 Aug 2006
Posts: 55

PostPosted: Tue Jun 05, 2007 1:15 am    Post subject: Re: Alignment? Reply with quote

Yes, forgot to mention the cache line size.
ldesnogu wrote:

One should also take care of allocating in the right order to minimize holes of unallocatable memory :)

I might think about this, well sometime :P

Question: YUV->RGB conversion then RGB-to-RGB scaling, or YUV-to-YUV scaling first then YUV-to-RGB conversion... or YUV to RGB and scaling at the same time? Which would be the fastest? Scaling in YUV first would seem to be the fastest, since there is less data to process and scaling is the most CPU-intensive step, but that's only a guesstimate and I haven't benchmarked it yet.
Pizza67



Joined: 04 Jun 2007
Posts: 3

PostPosted: Tue Jun 05, 2007 2:41 am    Post subject: Re: SPE Media Lib Reply with quote

unsolo wrote:

I am nearly finished (it works but is not released) with a colorspace converter, YV420p -> ARGB (more or less the same as YV12 -> ARGB).

It runs on one SPE at more than 60 FPS for 1920x1080.


Hi, I'm doing some tests on the PS3 with your converter, but I'm a bit confused about how it is meant to be used.

What I did:

- I downloaded this file
ftp://ftp.ldv.e-technik.tu-muenchen.de/pub/test_sequences/601/576i25_stockholm_ter.yuv
which is an uncompressed 576i YUV video of 252 frames @25fps.

- I replicated the file 20 times to obtain a 5040-frame video

- I modified the number of frames to run through in yuv2rgb.cpp
Code:
int ftot = 5040;


- I ran
Code:
# ./yuv2rgb 576i25_stockholm_ter_x20.yuv 720 576

and got about 40 FPS, which is worse than your 60FPS@1920x1080.

The video also stutters a lot during playback.

Maybe I'm missing something, how do you explain these results?
mbf



Joined: 18 Aug 2006
Posts: 55

PostPosted: Tue Jun 05, 2007 10:17 am    Post subject: Reply with quote

Pizza67 wrote:
and got about 40 FPS, which is worse than your 60FPS@1920x1080.

It shouldn't be that bad considering that this conversion takes about 20% of the CPU (PPU) time when playing this kind of stuff with MPlayer. Have you tried with MPlayer?
ldesnogu



Joined: 17 Apr 2004
Posts: 95

PostPosted: Tue Jun 05, 2007 5:46 pm    Post subject: Re: SPE Media Lib Reply with quote

Pizza67 wrote:
- I downloaded this file
ftp://ftp.ldv.e-technik.tu-muenchen.de/pub/test_sequences/601/576i25_stockholm_ter.yuv
which is an uncompressed 576i YUV video of 252 frames @25fps.

- I replicated the file 20 times to obtain a 5040-frame video

- I modified the number of frames to run through in yuv2rgb.cpp
Code:
int ftot = 5040;


- I ran
Code:
# ./yuv2rgb 576i25_stockholm_ter_x20.yuv 720 576

and got about 40 FPS, which is worse than your 60FPS@1920x1080.

The video also stutters a lot during playback.

Maybe I'm missing something, how do you explain these results?

Well, the original file is 153,090 KB, and 153,090 KB x 20 = 3,061,800 KB.
3,061,800 KB / 5040 frames x 40 FPS = 24,300 KB/s.
You are limited by hard drive speed, I guess :)
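
The same arithmetic can be checked from the frame geometry alone, assuming the file is plain 8-bit YUV 4:2:0 (1.5 bytes per pixel), which is consistent with the file size quoted above. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second needed to stream uncompressed 8-bit YUV 4:2:0 video.
 * One frame holds width*height luma bytes plus two quarter-size chroma
 * planes, i.e. width*height*3/2 bytes in total. */
uint64_t yuv420_stream_bytes_per_sec(uint32_t width, uint32_t height, uint32_t fps)
{
    assert(width % 2 == 0 && height % 2 == 0);
    uint64_t frame_bytes = (uint64_t)width * height * 3 / 2;
    return frame_bytes * fps;
}
```

For 720x576 this gives 622,080 bytes (607.5 KB) per frame, so 40 FPS needs 24,883,200 bytes/s, which is the 24,300 KB/s figure with 1 KB = 1024 bytes.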
Pizza67



Joined: 04 Jun 2007
Posts: 3

PostPosted: Tue Jun 05, 2007 6:03 pm    Post subject: Reply with quote

mbf wrote:
Pizza67 wrote:
and got about 40 FPS, which is worse than your 60FPS@1920x1080.

It shouldn't be that bad considering that this conversion takes about 20% of the CPU (PPU) time when playing this kind of stuff with MPlayer. Have you tried with MPlayer?


MPlayer works fine with high-definition MPEG-2 streams: I tried 1080i@50FPS.

It plays OK, so a throughput of 40 FPS with a 576i video seems really bad in comparison with MPlayer, which uses just the PPU.

My concern is that it might be a problem with presentation on the ps3fb. I mean, the conversion on the SPU should be very fast, but maybe the frame swap slows down execution, perhaps because of waiting for VSync from the hypervisor, or something else.

Could that be an explanation?
ldesnogu



Joined: 17 Apr 2004
Posts: 95

PostPosted: Tue Jun 05, 2007 6:20 pm    Post subject: Reply with quote

Pizza67 wrote:
MPlayer works fine with high-definition MPEG-2 streams: I tried 1080i@50FPS.

It plays OK, so a throughput of 40 FPS with a 576i video seems really bad in comparison with MPlayer, which uses just the PPU.

My concern is that it might be a problem with presentation on the ps3fb. I mean, the conversion on the SPU should be very fast, but maybe the frame swap slows down execution, perhaps because of waiting for VSync from the hypervisor, or something else.

Could that be an explanation?

Read my post just above yours.
Then see how the file is read in yuv2rgb, and compare this to the file reading in MPlayer. See the difference? :)

The file reading in yuv2rgb is primitive and inefficient; it's only there to demonstrate the use of the library.

I'm not saying this is the only explanation; yours might be part of the problem too. But there surely is a bottleneck in file reading.
Pizza67



Joined: 04 Jun 2007
Posts: 3

PostPosted: Tue Jun 05, 2007 6:55 pm    Post subject: Reply with quote

ldesnogu wrote:
But there surely is a bottleneck in file reading.


I read your post after I posted mine, sorry :)

You're totally right, I forgot to compute the disk throughput. That's definitely the problem.

MPlayer reads a compressed stream, so it doesn't hit the disk throughput limit.

The best way to test the yuv2rgb converter is probably to always use the same frame cached in RAM. I think this is done by launching the program without params. ;)

Thanks.
ldesnogu



Joined: 17 Apr 2004
Posts: 95

PostPosted: Tue Jun 05, 2007 6:59 pm    Post subject: Reply with quote

Pizza67 wrote:
Mplayer reads a compressed stream so it doesn't reach the disk throughput.

Mplayer also uses mmap which might be more efficient than using stdc++ iostream. Also reading the file in a different thread may help...
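
As a rough illustration of the mmap idea, here is a minimal POSIX sketch that maps a raw video file read-only and hands out pointers to individual frames without copying them. The function names are hypothetical, and this is not how yuv2rgb or MPlayer actually read their input.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns NULL on failure; on success the
 * file length is stored in *size_out. Release with munmap(ptr, size). */
const unsigned char *map_file_readonly(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *size_out = (size_t)st.st_size;
    return p;
}

/* Pointer to frame `index` inside the mapping, or NULL if it lies past the end. */
const unsigned char *frame_at(const unsigned char *base, size_t size,
                              size_t frame_bytes, size_t index)
{
    if (frame_bytes == 0 || index >= size / frame_bytes)
        return NULL;
    return base + index * frame_bytes;
}
```

Note that a 3 GB test file cannot be mapped in one piece by a 32-bit process, so a real player would have to map a sliding window instead. Mapping also does not make the disk any faster; it only avoids extra copies and per-read overhead.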
unsolo



Joined: 16 Apr 2007
Posts: 155
Location: OSLO Norway

PostPosted: Wed Jun 06, 2007 4:02 am    Post subject: regarding speed Reply with quote

You can easily achieve 60/50 FPS; however, keep in mind that you need double-buffered input and output.
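
For anyone new to the idea, the structure of double buffering looks roughly like the sketch below. It is plain C with memcpy() standing in for the DMA transfers, so nothing actually overlaps here; on an SPE the fetch of the next chunk would be an asynchronous DMA that runs while the current chunk is processed. All names and the tiny chunk size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CHUNK = 4 };  /* bytes per chunk, tiny on purpose */

/* Stand-in for the real work done on a chunk in local memory: add 1 to each byte. */
static void process_chunk(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)(buf[i] + 1);
}

/* Process nchunks chunks from src into dst using two alternating buffers. */
void process_double_buffered(const unsigned char *src, unsigned char *dst, size_t nchunks)
{
    assert(src != NULL && dst != NULL);
    unsigned char buf[2][CHUNK];
    int cur = 0;

    if (nchunks == 0)
        return;

    memcpy(buf[cur], src, CHUNK);  /* prime: fetch chunk 0 */

    for (size_t i = 0; i < nchunks; i++) {
        int next = cur ^ 1;

        /* Start fetching chunk i+1 into the idle buffer. With real DMA the
         * earlier write-back from that buffer must have completed first. */
        if (i + 1 < nchunks)
            memcpy(buf[next], src + (i + 1) * CHUNK, CHUNK);

        process_chunk(buf[cur], CHUNK);            /* work on chunk i */
        memcpy(dst + i * CHUNK, buf[cur], CHUNK);  /* write back chunk i */

        cur = next;  /* swap roles for the next iteration */
    }
}
```

The point of the pattern is that the processing unit never waits for a transfer that could have been started one step earlier.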

It runs at 300 FPS at 1920x1080: if you load two images into RAM and test with only those, you will see the results.

The yuvscaler will achieve 150 to 299 FPS, depending on your scale factor.

PS: it's very important to compile with spu-elf-gcc -O2 -fno-exceptions -g to achieve good performance, and I suggest spu-elf-gcc-4.1.1 with the Barcelona patches, or spu-elf-gcc-4.3.

Hope this helps.

unsolo
_________________
Don't do it alone.
unsolo



Joined: 16 Apr 2007
Posts: 155
Location: OSLO Norway

PostPosted: Fri Oct 12, 2007 6:12 am    Post subject: Reply with quote

OK, time to recruit.

Who wants to help? Please go to the spu-medialib section in the forums.

I need more people, and I don't mind helping to train them in how to think SPU.

Basic concept:
Offloading anything to the SPEs gives better overall performance, so why not do it?

Currently I'm looking into whether it's possible to make Xv work,
and there's a working mplayer-vo using spu-medialib.
_________________
Don't do it alone.
IronPeter



Joined: 06 Aug 2007
Posts: 207

PostPosted: Fri Oct 12, 2007 4:37 pm    Post subject: Note about DMA transfers. Reply with quote

128 byte alignment for the DMA is optimal in terms of speed.

It's not obvious, but you also need that alignment in the local store. DMAs with addresses that are aligned in main memory but not aligned in the local store are slow. Probably each memory line is accessed twice in that case.
unsolo



Joined: 16 Apr 2007
Posts: 155
Location: OSLO Norway

PostPosted: Thu Oct 25, 2007 1:29 pm    Post subject: Reply with quote

I threw an experimental SPU Xv driver onto SVN.
It does 1080p in X using one SPU, so it looks good, but there are lots of TODOs left in it.
Expect an install guide within days.
_________________
Don't do it alone.
Page 1 of 2. All times are GMT + 10 Hours.