forums.ps2dev.org
Homebrew PS2, PSP & PS3 Development Discussions

VFPU instruction

forums.ps2dev.org Forum Index -> PSP Development
Criptych



Joined: 12 Sep 2009
Posts: 79

Posted: Sun Jun 13, 2010 7:18 am    Post subject: VFPU instruction

While Googling to find out how to add the 'vuc2i' instruction to gas (since it's apparently not in there yet), I found a reference to a 'vcmmul' instruction. Does anyone have more info on it? From the usage, it looks like just a backward 'vmmul', but there must be something different about it to warrant a separate instruction, right?
_________________
PSP-2000 // CFW: 5.50 GEN-D2 ...and not upgrading until OFW supports homebrew!
(But I did downgrade to 1.50 with TimeMachine...)
"I want you to tell me how the machine makes you feel."


Last edited by Criptych on Sun Jun 20, 2010 9:35 am; edited 1 time in total
hlide



Joined: 10 Sep 2006
Posts: 750

Posted: Sun Jun 13, 2010 7:50 pm

This is just vmmul. vcmmul and vrmmul are aliases where c stands for column and r for row; the assembler swaps/transposes the register operands and emits a plain vmmul.
Criptych



Joined: 12 Sep 2009
Posts: 79

Posted: Mon Jun 14, 2010 1:03 am

hlide wrote:
This is just vmmul. vcmmul and vrmmul are aliases where c stands for column and r for row; the assembler swaps/transposes the register operands and emits a plain vmmul.

So it's an assembler shortcut, like 'u[ls]v'? Does that mean 'vrmmul' is identical to a regular 'vmmul'?
hlide



Joined: 10 Sep 2006
Posts: 750

Posted: Mon Jun 14, 2010 6:01 am

vmmul.q M000, M100, M200
<==>
vrmmul.q M000, M100, M200
<==>
vcmmul.q E000, E200, E100
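The equivalence above rests on the matrix identity (A·B)^T = B^T·A^T: transposing both operands and swapping their order gives the transpose of the same product, which is what writing to E000 instead of M000 accounts for. A minimal C sketch that checks the identity on plain 4x4 arrays, assuming (as the register names suggest) that an E operand is the transposed view of the same M matrix; `mat4`, `mat4_mul` and `mat4_transpose` are hypothetical helpers, and the sketch does not model which operand the VFPU hardware itself reads transposed.

```c
#include <assert.h>

/* Hypothetical helper type: a plain row-major 4x4 matrix, not a VFPU register. */
typedef struct { float m[4][4]; } mat4;

/* Ordinary matrix product r = a * b. */
static mat4 mat4_mul(mat4 a, mat4 b) {
    mat4 r;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            float s = 0.0f;
            for (int k = 0; k < 4; k++)
                s += a.m[i][k] * b.m[k][j];
            r.m[i][j] = s;
        }
    return r;
}

/* r = transpose(a); assumed to correspond to reading an M matrix through its E name. */
static mat4 mat4_transpose(mat4 a) {
    mat4 r;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            r.m[i][j] = a.m[j][i];
    return r;
}

/* Returns 1 if transpose(a * b) == transpose(b) * transpose(a) element for element. */
static int transpose_identity_holds(mat4 a, mat4 b) {
    mat4 lhs = mat4_transpose(mat4_mul(a, b));
    mat4 rhs = mat4_mul(mat4_transpose(b), mat4_transpose(a));
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            if (lhs.m[i][j] != rhs.m[i][j])
                return 0;
    return 1;
}
```

Both sides sum the same products in the same order, so with small integer-valued inputs the comparison is exact.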
Criptych



Joined: 12 Sep 2009
Posts: 79

Posted: Mon Jun 14, 2010 11:28 am

Got it, thanks. Learn something new every day... :)
Criptych



Joined: 12 Sep 2009
Posts: 79

Posted: Sun Jun 20, 2010 9:39 am

While on the topic of the VFPU, can you explain the condition codes for the "f1" format of vcmp (in the list here)? I figure that EZ is "equal zero" and NZ is "not zero," but the others have me confused.

EDIT: Well, I've found that EN/NN test for NaNs and EI/NI for infinities, but ES/NS don't check the Sign as I first thought - in fact everything (non-NaN/infinity) I've tried it with so far gives me CC=0. Does that mean it tests for "Special" values?
hlide



Joined: 10 Sep 2006
Posts: 750

Posted: Tue Jun 22, 2010 12:35 am

Ex = Equal to
Nx = Not equal to

xI = Infinity
xN = NaN
xS = Special, that is, Infinity or NaN

So: ES == EI|EN and NS == NI|NN
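The table can be restated as a per-component predicate, which also explains why ordinary finite values always gave CC=0 under ES. A minimal C sketch, assuming each "f1" condition tests a single float and that EZ/NZ behave like an ordinary C comparison with 0 (how the hardware treats NaN under EZ/NZ is not stated in the thread); `f1_cond` and `f1_test` are hypothetical names, and vcmp.p would apply the test once per component, setting one CC bit for each.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical names for the vcmp "f1" conditions listed above. */
typedef enum { CC_EZ, CC_NZ, CC_EN, CC_NN, CC_EI, CC_NI, CC_ES, CC_NS } f1_cond;

/* Returns 1 if condition c holds for one component v, else 0. */
static int f1_test(f1_cond c, float v) {
    int is_special = isnan(v) || isinf(v);   /* "Special" = NaN or infinity */
    switch (c) {
    case CC_EZ: return v == 0.0f;            /* assumed: NaN compares unequal to 0 */
    case CC_NZ: return v != 0.0f;
    case CC_EN: return isnan(v) != 0;
    case CC_NN: return !isnan(v);
    case CC_EI: return isinf(v) != 0;
    case CC_NI: return !isinf(v);
    case CC_ES: return is_special;           /* ES == EI | EN */
    case CC_NS: return !is_special;          /* NS == NI | NN, per the table above */
    }
    return 0;
}
```

Under this reading ES/NS never look at the sign bit, matching the observation that every finite value tested gave CC=0 for ES.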
Criptych



Joined: 12 Sep 2009
Posts: 79

Posted: Tue Jun 22, 2010 12:58 am

Thanks for confirming that.

The C equivalent of what I'm trying to write is this:
Code:
if(x < 0) r = (y < 0) ? (-PI - r) : (PI - r);

What I have in assembly (r is in s000, x & y in c010):
Code:
   vzero.p   c012
   vcst.s   s001, VFPU_PI
   vcmp.p   LT, c010, c012
   vcmovt.s   s001, s001[-x], 0
   vsub.s   s001, s001, s000
   vcmovt.s   s000, s001, 1


Can you suggest anything to improve it?
hlide



Joined: 10 Sep 2006
Posts: 750

Posted: Tue Jun 22, 2010 2:10 am

I'm about to head out on the road, so you'll probably need to wait until tomorrow. Could you post the "else" part of your "if (x < 0)"?

Branching costs a lot, and if I know what happens when x >= 0, I may find a better way overall.
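One way to avoid the branch is to compute both candidates every time and select one, which is the pattern the vcmp.p / vcmovt.s sequence in the assembly follows. A sketch of that fixup in C; `quadrant_fixup` and `PI_F` are hypothetical names, and whether a compiler turns the `?:` operators into conditional moves rather than branches is not guaranteed.

```c
#include <assert.h>
#include <math.h>

#define PI_F 3.14159265358979f  /* hypothetical name for the PI used in the thread */

/* Branch-free form of:  if (x < 0) r = (y < 0) ? (-PI - r) : (PI - r);
 * Both candidates are always computed and one is selected, instead of
 * jumping over the update. */
static float quadrant_fixup(float r, float x, float y) {
    float pi_signed = (y < 0.0f) ? -PI_F : PI_F;  /* select +PI or -PI on y < 0 */
    float flipped   = pi_signed - r;              /* the subtraction step */
    return (x < 0.0f) ? flipped : r;              /* keep r unless x < 0 */
}
```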
Criptych



Joined: 12 Sep 2009
Posts: 79

Posted: Tue Jun 22, 2010 6:46 am

hlide wrote:
I'm about to head out on the road, so you'll probably need to wait until tomorrow. Could you post the "else" part of your "if (x < 0)"?

Branching costs a lot, and if I know what happens when x >= 0, I may find a better way overall.

There is no "else," actually. The whole function (an implementation of atan2) would look something like this in C:
Code:
#include <math.h>   /* asinf, hypotf */

#define PI 3.14159265358979f

float fast_atan2(float y, float x)
{
   float r = asinf(y/hypotf(x, y));   /* NaN when x == y == 0 */
   if(x < 0) r = (y < 0) ? (-PI - r) : (PI - r);
   return r;
}

My original version had the first line in assembly and the rest in C, because I was just starting with using the VFPU, but I want to rewrite it so everything is done on one (co)processor instead of going back and forth between VFPU and FPU.

The whole assembly version is this:
Code:

fast_atan2:
    mtv     $a1, s010
    mtv     $a0, s011
    vcst.s  s003, VFPU_PI_2
    vdot.p  s000, c010, c010
    vrsq.s  s000, s000
    vmul.s  s000, s011, s000
    vasin.s s000, s000
    vmul.s  s000, s000, s003
    vzero.p     c012
    vcst.s      s001, VFPU_PI
    vcmp.p      LT, c010, c012
    vcmovt.s    s001, s001[-x], 0
    vsub.s      s001, s001, s000
    vcmovt.s    s000, s001, 1
    j       $ra
    mfv     $v0, s000


I know: "make it work, then make it work fast"; but I'm doing this for practice as much as to write something useful.
hlide



Joined: 10 Sep 2006
Posts: 750

Posted: Tue Jun 22, 2010 11:20 am

Yours:
Code:

fast_atan2:
    mtv      $a1, s010           //  1:1(3)
    mtv      $a0, s011           //  2:1(3)
    vcst.s   s003, VFPU_PI_2     //  3:1(3)
    vdot.p   s000, c010, c010    //  5:1(7) *STALLING because mtv needs 3 cycles to be completed*
    vrsq.s   s000, s000          // 12:1(7) *STALLING because vdot.p needs 7 cycles to be completed*
    vmul.s   s000, s011, s000    // 19:1(5) *STALLING because vrsq.s needs 7 cycles to be completed*
    vasin.s  s000, s000          // 24:1(7) *STALLING because vmul.s needs 5 cycles to be completed*
    vmul.s   s000, s000, s003    // 31:1(7) *STALLING because vasin.s needs 7 cycles to be completed*
    vzero.p  c012                // 32:1(3)
    vcst.s   s001, VFPU_PI       // 33:1(3)
    vcmp.p   LT, c010, c012      // 35:1(3) *STALLING because vzero.p needs 3 cycles to be completed*
    vcmovt.s s001, s001[-x], 0   // 42:1+1(5) *STALLING because you need 7 instructions before using vcmp.p result*
    vsub.s   s001, s001, s000    // 48:1(3) *STALLING because vcmovt.s needs 5 cycles to be completed*
    vcmovt.s s000, s001, 1       // 51:1(5) *STALLING because vsub.s needs 3 cycles to be completed*
    j        $ra                 // 52:1
    mfv      $v0, s000           // 56:7    *STALLING because vcmovt.s needs 5 cycles to be completed*


mfv is 7 cycles no matter what the following instruction is, so your function takes around 63 cycles (56 + 7).
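The annotations above follow a simple in-order rule: an instruction issues one cycle after the previous one unless it has to wait for an input, and a result becomes usable `latency` cycles after issue. A toy C model of that rule, using the latencies written in the listing; `insn` and `schedule` are hypothetical, and the model ignores the extra issue cycle marked "1+1" for prefixed or vcmp-dependent instructions as well as the fixed cost of mfv.

```c
#include <assert.h>

/* Hypothetical instruction record for the toy model:
 * latency = cycles until the result can be consumed,
 * dep[]   = indices of earlier instructions whose results it reads (-1 = none). */
typedef struct { int latency; int dep[2]; } insn;

/* Fills issue[i] with the cycle at which instruction i issues, assuming one
 * instruction per cycle, in order, stalling until all inputs are ready.
 * Returns the cycle at which the last result becomes available. */
static int schedule(const insn *code, int n, int *issue) {
    int done = 0;
    for (int i = 0; i < n; i++) {
        int t = (i == 0) ? 1 : issue[i - 1] + 1;   /* next free issue slot */
        for (int k = 0; k < 2; k++) {
            int d = code[i].dep[k];
            if (d >= 0 && issue[d] + code[d].latency > t)
                t = issue[d] + code[d].latency;    /* stall on a dependency */
        }
        issue[i] = t;
        if (t + code[i].latency > done)
            done = t + code[i].latency;
    }
    return done;
}
```

Fed the first eight instructions of the listing (two mtv, vcst.s, vdot.p, vrsq.s, vmul.s, vasin.s, vmul.s) with their annotated latencies, this rule reproduces the issue cycles 5, 12, 19, 24 and 31 shown in the comments.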

(untested):

Reordering some instructions in yours:
Code:

fast_atan2:
    mtv      $a1, s010           //  1:1(3)
    mtv      $a0, s011           //  2:1(3)
    vdot.p   s000, c010, c010    //  5:1(7) *STALLING because mtv needs 3 cycles to be completed*
    vcmp.p   LT, c010, c010[0,0] //  6:1+1(3)
    vcst.s   s003, VFPU_PI_2     //  7:1(3)
    vcst.s   s001, VFPU_PI       //  8:1(3)
    vrsq.s   s000, s000          // 12:1(7) *STALLING because vdot.p needs 7 cycles to be completed*
    vmul.s   s000, s011, s000    // 19:1(5) *STALLING because vrsq.s needs 7 cycles to be completed*
    vcmovt.s s001, s001[-x], 0   // 20:1+1(5)
    vasin.s  s000, s000          // 24:1(7) *STALLING because vmul.s needs 5 cycles to be completed*
    vmul.s   s000, s000, s003    // 31:1(7) *STALLING because vasin.s needs 7 cycles to be completed*
    vsub.s   s001, s001, s000    // 38:1(3) *STALLING because vmul.s needs 7 cycles to be completed*
    vcmovt.s s000, s001, 1       // 41:1(5) *STALLING because vsub.s needs 3 cycles to be completed*
    j        $ra                 // 42:1
    mfv      $v0, s000           // 46:7    *STALLING because vcmovt.s needs 5 cycles to be completed*


should be around 53 cycles

or
Code:

fast_atan2:
    mtv     $a1, s010              //  1:1(3)
    mtv     $a0, s011              //  2:1(3)
    vcst.s  s001, VFPU_PI          //  3:1(3)
    vcst.s  s002, VFPU_PI_2        //  4:1(3)
    vslt.p  c020, c010, c020[0,0]  //  5:1+1(3) // (x < 0 ? 1.0 : 0.0, y < 0 ? 1.0 : 0.0)
    vdot.p  s000, c010, c010       //  7:1(7)
    vsge.p  c022, c010, c022[0,0]  //  9:1+1(3) // (x >= 0 ? 1.0 : 0.0, y >= 0 ? 1.0 : 0.0)
    vmul.s  s003, s020, s001       // 10:1(5)   // PI * (x < 0 ? 1.0 : 0.0)
    vsub.p  c012, c022, c020       // 11:1(3)   // (x < 0 ? -1.0 : 1.0, y < 0 ? -1.0 : 1.0)
    vrsq.s  s000, s000             // 14:1(7) *STALLING*
    vmul.p  c002, c012, c002       // 15:1(5)   // (x < 0 ? -PI/2 : +PI/2, PI * (x < 0 ? 1.0 : 0.0) * (y < 0 ? -1.0 : 1.0))
    vmul.s  s000, s011, s000       // 21:1(5) *STALLING* // y / hypot(x, y)
    vasin.s s000, s000             // 26:1(7) *STALLING*
    //vmul.s  s000, s000, s002       //
    //vadd.s  s000, s000, s003
    vdot.p  s000, c002, c000[x, 1] // 33:1+1(7) *STALLING*
    j       $ra                    // 35:1
    mfv     $v0, s000              // 41:7 *STALLING*


should be around 48 cycles
Criptych



Joined: 12 Sep 2009
Posts: 79

Posted: Wed Jun 23, 2010 7:32 am

*facepalm* Okay, I'm still getting used to accounting for the pipeline, but I didn't even consider vslt or some of your other alternatives; I can tell I've got a lot to learn about this. Thanks for your help. :-)
All times are GMT + 10 Hours