# Branchless math ops (like fsel)

Hey all, I'm working on some branchless math ops. When using SSE, selecting a vector based on a mask is easy.
_mm_or_ps( _mm_andnot_ps( maskvect, zerovect ), _mm_and_ps( maskvect, nonzerovect) );
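For reference, the mask select above can be wrapped up as a small helper. This is only a sketch: the names `selectPs` and `fselPs` are made up for illustration, and the mask is assumed to come from an SSE compare such as `_mm_cmpge_ps`, so each lane is either all ones or all zeros.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics

// Per-lane select: where a mask lane is all ones take ifTrue, otherwise ifFalse.
// (selectPs is a hypothetical helper name, not something from a library.)
inline __m128 selectPs(__m128 mask, __m128 ifTrue, __m128 ifFalse)
{
    // _mm_andnot_ps(m, x) computes (~m) & x.
    return _mm_or_ps(_mm_andnot_ps(mask, ifFalse), _mm_and_ps(mask, ifTrue));
}

// Vector analogue of fsel: lane-wise (x >= 0 ? a : b).
inline __m128 fselPs(__m128 x, __m128 a, __m128 b)
{
    // The compare yields all ones per lane where x >= 0 (this includes -0.0f).
    const __m128 mask = _mm_cmpge_ps(x, _mm_setzero_ps());
    return selectPs(mask, a, b);
}
```

Lanes holding NaN compare false, so they pick `b`.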


Now, I would like to do the same with scalar floats. It is possible on PowerPC with fsel, but I am trying to do the same on x86. I have seen the CMOV/FCMOV instructions, but I do not see any intrinsics exposing them (and I'd like to avoid x86 asm for now). (Using VS2005.) I am trying to use C++ to generate the best branchless ASM possible. It seems that SSE has a scalar instruction set, but those instructions still operate on 128-bit vector registers.

My questions are:

(1) Is it an expensive operation to transfer between SSE vector registers and the x87 floating point stack in the first place? This assumption is the reason for my goal. It seems like vector <-> scalar transfers might not be as expensive on x86 as they are on certain PowerPC+Altivec architectures. (I remember, in fact, that MMX, the predecessor of SSE, actually shared its registers with the x87 stack.)

(2) (I've wondered this for a while): Why are there scalar SSE instructions that operate on only one component of a 128-bit vector? Does it save processing time?

(3) What are your suggestions?

The naive implementation of fsel simply using C produces the following in an optimized build:
__declspec(noinline) float MyFSel( float mask, float a, float b )
{
return (mask >= 0.0f ? a : b);
004010E0  fldz
004010E2  fcomp       dword ptr [esp+4]
004010E6  fnstsw      ax
004010E8  test        ah,41h
004010EB  jp          MyFSel+1Ah (4010FAh)
004010ED  fld         dword ptr [esp+8]
004010F1  fstp        dword ptr [esp+4]
004010F5  fld         dword ptr [esp+4]
//}


I realize the sign bit is the leftmost bit in IEEE754, and I'm thinking of ways to use that, but I'm not sure what's the best way on x86. Thanks. (I have browsed GDnet threads one and two before posting). [Edited by - discman1028 on October 10, 2007 6:28:16 PM]
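One way to stay branchless without touching asm is to run the scalar compare and select in an XMM register using the scalar (`*_ss`) intrinsics. The following is a sketch only: `fselSse` is a made-up name, and whether it beats the branch depends on how the floats get in and out of the XMM registers in a given build (in an x87 build they go through memory).

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics

// Scalar fsel emulation: returns (mask >= 0.0f ? a : b) without a branch.
// fselSse is a hypothetical helper name used for illustration.
inline float fselSse(float mask, float a, float b)
{
    // Lane 0 becomes all ones when mask >= 0 (true for -0.0f, false for NaN).
    const __m128 m  = _mm_cmpge_ss(_mm_set_ss(mask), _mm_setzero_ps());
    const __m128 va = _mm_set_ss(a);
    const __m128 vb = _mm_set_ss(b);
    // Bitwise select: (m & a) | (~m & b).
    const __m128 r = _mm_or_ps(_mm_and_ps(m, va), _mm_andnot_ps(m, vb));
    return _mm_cvtss_f32(r);  // extract lane 0
}
```

Because the compare is a real floating-point compare, `-0.0f` selects `a` and NaN selects `b`, matching the ternary version.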

##### Share on other sites
Something like this seems to be branchless, but isn't correct: if mask == -0.0f (0x80000000), it should select a, not b.

But I am more interested in knowing if fetches/stores to "dword ptr [esp+14h]" are expensive. If I see disasm with "dword ptr", is my calculation not happening completely in registers? (I know most of what is shown is just non-inlined-function pre- and post- work, but I'm wondering in general.)

__declspec(noinline) float MyFSel( float mask, float a, float b )
{
	float fArray[2] = { a, b };
	return fArray[ *((u32*)(&mask)) >> 31 ];
}

// Disasm below...
__declspec(noinline) float MyFSel( float mask, float a, float b )
{
004010E0  sub         esp,8
	float fArray[2] = { a, b };
004010E3  fld         dword ptr [esp+10h]
	return fArray[ *((u32*)(&mask)) >> 31 ];
004010E7  mov         eax,dword ptr [esp+0Ch]
004010EB  fstp        dword ptr [esp]
004010EE  shr         eax,1Fh
004010F1  fld         dword ptr [esp+14h]
004010F5  fstp        dword ptr [esp+4]
004010F9  fld         dword ptr [esp+eax*4]
}
004010FC  add         esp,8
004010FF  ret
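The -0.0f problem with the sign-bit index can be worked around with integer arithmetic alone. The sketch below assumes 32-bit IEEE 754 floats; `fselBits` is a made-up name, and `memcpy` stands in for the pointer cast to avoid aliasing problems.

```cpp
#include <cstdint>
#include <cstring>

// Integer-only fsel emulation: selects b only for strictly negative values.
// fselBits is a hypothetical helper name used for illustration.
inline float fselBits(float mask, float a, float b)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit floats");

    std::uint32_t bits;
    std::memcpy(&bits, &mask, sizeof bits);  // well-defined type pun

    // The value is negative and non-zero exactly when bits > 0x80000000.
    // Computing 0x80000000 - bits in 64 bits wraps around (top bit set) only
    // in that case, so bit 63 of the difference is the table index.
    const std::uint64_t diff = std::uint64_t{0x80000000u} - std::uint64_t{bits};
    const std::uint32_t index = static_cast<std::uint32_t>(diff >> 63);

    const float table[2] = { a, b };
    return table[index];
}
```

Note one difference from `mask >= 0.0f ? a : b`: a NaN with a clear sign bit selects `a` here, while the comparison would select `b`. The table lookup still goes through memory, so this only removes the branch, not the loads and stores.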

EDIT: As a final note, I don't really need to emulate fsel. If I can get a boolean comparison to return an int without a branch, that would be nice too:

__declspec(noinline) int MyGTE(float a, float b)
{
	return (int)(a >= b);
004010F0  fld         dword ptr [esp+4]
004010F4  fld         dword ptr [esp+8]
004010F8  fcompp
004010FA  fnstsw      ax
004010FC  test        ah,41h
004010FF  jp          MyGTE+17h (401107h) // argh
00401101  mov         eax,1
}
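For the boolean case, one option is the scalar SSE compare followed by a sign-bit extraction. This is a sketch with a made-up function name (`greaterEqual`); the intrinsics themselves are standard SSE1.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics

// Returns 1 when a >= b, otherwise 0, using a compare mask instead of a jump.
// greaterEqual is a hypothetical helper name used for illustration.
inline int greaterEqual(float a, float b)
{
    // Lane 0 is all ones when a >= b, all zeros otherwise (including NaN).
    const __m128 cmp = _mm_cmpge_ss(_mm_set_ss(a), _mm_set_ss(b));
    // _mm_movemask_ps gathers the four lane sign bits; bit 0 is lane 0.
    return _mm_movemask_ps(cmp) & 1;
}
```

In an x87 build the arguments still have to reach the XMM register through memory, so it is worth profiling against the branchy version.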

[Edited by - discman1028 on October 10, 2007 7:25:34 PM]

##### Share on other sites
Today must be assembly day or something on Gamedev. I think someone else started a thread on getting the compiler to generate cmov and wasn't able to either.

1) I haven't done anything with SSE in ages, but IIRC you have to go through memory, just like PowerPC. It won't kill you, but it may be preferable to a hard to predict branch if you really can't get the compiler to generate fcmov.

2) I have no idea. It doesn't save any time on a sensible vector unit.

3) If you need cmov or fcmov and can't get the compiler to generate it for you, do it yourself with some inline assembly and hope the compiler doesn't do anything stupid trying to get your operands into the right registers.

##### Share on other sites
1. Yes it can be, loading to the fpu from SSE requires a store to memory and a load to fpu stack. If you are lucky this will only hit the cache.

2. Because sometimes you only need to operate on one of the elements in the vector. Imagine a dot product in SSE without haddps. Also SSE float ops work differently than fpu ops. See my journal for more details on those differences (.net optimization part 2 iirc).

3. Branch prediction misses aren't as expensive as you think (except in tight loop conditions, but if you have a hard-to-predict condition there then you have issues). Profile and see, and try to stick to compiler intrinsics if you can (journal entry on this too).
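To make the dot product remark in point 2 concrete, here is a sketch of a 4-component dot product using only SSE1 (no `haddps`). The name `dot4` is made up for illustration. The final step only needs lane 0, which is where a scalar `_ss` add fits naturally.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics

// 4-component dot product without haddps; the result comes from lane 0.
// dot4 is a hypothetical helper name used for illustration.
inline float dot4(__m128 a, __m128 b)
{
    const __m128 prod  = _mm_mul_ps(a, b);            // (x, y, z, w) products
    const __m128 hi    = _mm_movehl_ps(prod, prod);   // (z, w, z, w)
    const __m128 sum2  = _mm_add_ps(prod, hi);        // (x+z, y+w, ., .)
    const __m128 lane1 = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    const __m128 total = _mm_add_ss(sum2, lane1);     // lane 0 = x+z+y+w
    return _mm_cvtss_f32(total);
}
```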

##### Share on other sites
Quote:
Original post by Washu
2. Because sometimes you only need to operate on one of the elements in the vector. Imagine a dot product in SSE without haddps.

I understand the first sentence (although I actually don't see the utility of operating on the 1st element ONLY, except that it saves you a shuffle instruction or two in case you really wanted to pass thru Y, Z, and W to the result), but what does the 2nd sentence have to do with anything?

Quote:
Original post by Washu
3. Branch prediction misses aren't as expensive as you think...

Is this just generally because the general x86 architecture kicks ass at pipelining & prefetching & prediction, compared to other architectures? Because "expensive" is all relative, and I am sure that branch misprediction is a pipeline flusher that ain't up to no good.

Thanks guys -- I guess I'll keep working at generating an fcmov.. and maybe try inline asm if I gotta.

##### Share on other sites
Quote:
Original post by discman1028
Quote:
Original post by Washu
2. Because sometimes you only need to operate on one of the elements in the vector. Imagine a dot product in SSE without haddps.

I understand the first sentence (although I actually don't see the utility of operating on the 1st element ONLY, except that it saves you a shuffle instruction or two in case you really wanted to pass thru Y, Z, and W to the result), but what does the 2nd sentence have to do with anything?

heh, good point, wasn't thinking clearly here. However by providing a general purpose set of single floating point operations, SSE saves you from having to switch back and forth between the FPU and SSE. You might, for instance, perform a series of calculations to end up at a floating point number, then fill an SSE vector register with that number. By not having to switch between the FPU and SSE you avoid having to do a write out to memory and a read in from memory (which may hit the cache if you're lucky... if not, then it's expensive).
Quote:
Quote:
Original post by Washu
3. Branch prediction misses aren't as expensive as you think...

Is this just generally because the general x86 architecture kicks ass at pipelining & prefetching & prediction, compared to other architectures? Because "expensive" is all relative, and I am sure that branch misprediction is a pipeline flusher that ain't up to no good.

Nope, it's not that branch prediction is so awesome on the x86/x64 platform, it's that memory hits are extremely expensive, so much so that the minor amount of time a branch miss costs is insignificant compared to the amount of time spent waiting for something from main memory. Herb Sutter had a recent talk on this, and it's well worth viewing the video of it (and the slides side by side). This is especially important in high performance computing.

##### Share on other sites
Quote:
Original post by Washu
SSE saves you from having to switch back and forth between the FPU and SSE. You might, for instance, perform a series of calculations to end up at a floating point number, then fill an SSE vector register with that number. By not having to switch between the FPU and SSE you avoid having to do a write out to memory and a read in from memory.

Of course.. but anything you could do with *ss() function (and then splat (or "shuffle") into the whole SSE vector reg), you could do with *ps() functions.

e.g.

D = A*B+C

If the float values you care about were in the X components, you could do:

D = _mm_mul_ss(A,B);
D = _mm_add_ss(D,C);
D = _mm_shuffle_ps(D,D,0);

Or you could simply do:

D = _mm_mul_ps(A,B);
D = _mm_add_ps(D,C);
D = _mm_shuffle_ps(D,D,0);

Or, if A and B and C already had the same uniform value in all four components, the shuffle isn't even needed in case 2.

So I am still adamant about the unusefulness of the *ss() operations.
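A compilable version of the two variants, as a sketch (the function names `mulAddSplatSs` and `mulAddSplatPs` are made up for illustration). Both compute A*B+C from the X components and splat it to all four lanes; they only differ in what happens to the unused lanes before the shuffle.

```cpp
#include <xmmintrin.h>  // SSE1 intrinsics

// Scalar path: multiply and add touch lane 0 only, then splat lane 0.
// (Hypothetical helper name.)
inline __m128 mulAddSplatSs(__m128 A, __m128 B, __m128 C)
{
    __m128 D = _mm_mul_ss(A, B);
    D = _mm_add_ss(D, C);
    return _mm_shuffle_ps(D, D, _MM_SHUFFLE(0, 0, 0, 0));
}

// Packed path: all four lanes are computed, then lane 0 is splatted.
// (Hypothetical helper name.)
inline __m128 mulAddSplatPs(__m128 A, __m128 B, __m128 C)
{
    __m128 D = _mm_mul_ps(A, B);
    D = _mm_add_ps(D, C);
    return _mm_shuffle_ps(D, D, _MM_SHUFFLE(0, 0, 0, 0));
}
```

The shuffle immediate `_MM_SHUFFLE(0, 0, 0, 0)` is the same value as the literal `0` used above.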

Quote:
Original post by Washu
Nope ... the minor amount of time a branch miss will attract is insignificant compared to the amount of time waiting for something from main memory.

Of course... but as I said, everything's relative! Of COURSE we don't want a mem fetch (even worse is a cache miss, but even if a hit). But on a more picky level, we also would like to avoid a mispredicted branch (or even a correctly predicted branch on an architecture that doesn't prefetch instructions well)!

##### Share on other sites
Quote:
Original post by discman1028
Of course.. but anything you could do with *ss() function (and then splat (or "shuffle") into the whole SSE vector reg), you could do with *ps() functions.

e.g.

D = A*B+C

If the float values you care about were in the X components, you could do:

D = _mm_mul_ss(A,B);
D = _mm_add_ss(D,C);
D = _mm_shuffle_ps(D,D,0);

Or you could simply do:

D = _mm_mul_ps(A,B);
D = _mm_add_ps(D,C);
D = _mm_shuffle_ps(D,D,0);

Or, if A and B already had the same value in all four components, the shuffle isn't even needed in case 2.

So I am still adamant about the unusefulness of the *ss() operations.

SS instructions are minorly faster than the PS variety. Exactly why this is I'm not sure, since they should be operating entirely in parallel. Also, the SS instructions allow you to operate on single floats throughout, thus avoiding the FPU almost entirely (not all functionality is replicated). Due to the number of SSE registers, this typically allows for significantly more parallel calculation than the FPU can perform. Especially since the destination can be separated from the source, something that the FPU stack isn't so great at :)
Quote:

Quote:
Original post by Washu
Nope ... the minor amount of time a branch miss will attract is insignificant compared to the amount of time waiting for something from main memory.

Of course... but as I said, everything's relative! Of COURSE we don't want a mem fetch (even worse is a cache miss, but even if a hit). But on a more picky level, we also would like to avoid a mispredicted branch (or even a correctly predicted branch on an architecture that doesn't prefetch instructions well)!

Again, watch his lecture, it's very much worth it.

##### Share on other sites
Quote:
Original post by Washu
SS instructions are minorly faster than the PS variety.

Really? Where did you find that, I couldn't find any documentation on that anywhere. :D

Quote:
Original post by Washu
Again, watch his lecture, it's very much worth it.

Will do. I also read your archived journal entries.. very good! Die-hard singleton fans/critic rants are always fun to read too.

##### Share on other sites
Quote:
Original post by discman1028
Quote:
Original post by Washu
SS instructions are minorly faster than the PS variety. Exactly why this is I'm not sure, since they should be operating entirely in parallel.

Really? Where did you find that, I couldn't find any documentation on that anywhere. :D

One of the Intel reference manuals relating to optimizing code for the Core2Duo platform.
Quote:

Quote:
Original post by Washu
Again, watch his lecture, it's very much worth it.

Will do. I also read your archived journal entries.. very good! Die-hard singleton fans/critic rants are always fun to read too.

Eh, let's leave singletons out of this.