lukabratzi

Members
  • Content count: 21
  • Community Reputation: 198 Neutral

About lukabratzi

  • Rank: Member
  1. Hey everyone,

     For the past few weeks I've been working on setting up tessellation with displacement mapping in my deferred renderer. Everything works, the displacement is correct, etc., but the one thing that's been giving me terrible difficulty is calculating the tessellation factors for my triangles based on distance from the camera.

     In particular, I'm seeing a lot of gaps between triangle edges whose tessellation factors are different. So I know that, at its most basic level, the problem is that adjacent triangles that "share" an edge (share as in the triangles don't share vertices but have edges at the same position) aren't being assigned the same tessellation factors. The part that has me stumped is that, by my understanding, I should be handling this case correctly.

     The two adjacent triangles are actually two separate sets of 3 vertices. I know it's not efficient and would be better done with just 4 vertices and 6 indices, but the 4 vertices that make up a pair of coincident edges should be identical in position, so I don't think that's the source of the problem. In the interest of completeness, here's the code that creates the mesh:

[code]
int cellCount = 25;
m_totalSize = 320.f;
float cellSize = m_totalSize / cellCount;
std::vector<Vertex> finalVertices(6 * cellCount * cellCount);

float deltaX = cellSize, deltaZ = cellSize;
float deltaU = 1.f / cellCount, deltaV = 1.f / cellCount;
float minX = -m_totalSize * 0.5f, maxX = minX + deltaX;
float minU = 0.f, maxU = deltaU;
float minZ = -m_totalSize * 0.5f, maxZ = minZ + deltaZ;
float minV = 0.f, maxV = deltaV;

int startingIndex = 0;
for(int row = 0; row < cellCount; ++row)
{
    for(int col = 0; col < cellCount; ++col)
    {
        // two triangles per cell, 6 independent vertices
        finalVertices[startingIndex++].SetParameters(maxX, 0.f, maxZ, maxU, maxV);
        finalVertices[startingIndex++].SetParameters(minX, 0.f, minZ, minU, minV);
        finalVertices[startingIndex++].SetParameters(minX, 0.f, maxZ, minU, maxV);
        finalVertices[startingIndex++].SetParameters(minX, 0.f, minZ, minU, minV);
        finalVertices[startingIndex++].SetParameters(maxX, 0.f, maxZ, maxU, maxV);
        finalVertices[startingIndex++].SetParameters(maxX, 0.f, minZ, maxU, minV);

        minX += deltaX; maxX += deltaX;
        minU += deltaU; maxU += deltaU;
    }
    minZ += deltaZ; maxZ += deltaZ;
    minV += deltaV; maxV += deltaV;
    minX = -m_totalSize * 0.5f; maxX = minX + deltaX;
    minU = 0.f; maxU = deltaU;
}
[/code]

     My hull constant function takes in a patch of 3 points for each triangle and calculates the midpoint of each edge from those three points. It then calculates the distance from the camera position to each midpoint and uses that distance to lerp between a minimum and a maximum tessellation factor over a fixed distance range. Since this is where the actual tessellation factors are calculated, I'm guessing there's a good chance it's the culprit.
     Here is the code from that portion of my shader:

[code]
float maxDistance = 150.0f;
float minDistance = 0.f;

HS_CONSTANT_FUNC_OUT HSConstFunc(InputPatch<VS_OUTPUT, 3> patch, uint PatchID : SV_PrimitiveID)
{
    HS_CONSTANT_FUNC_OUT output = (HS_CONSTANT_FUNC_OUT)0;
    float distanceRange = maxDistance - minDistance;
    float minLOD = 1.0f;
    float maxLOD = 32.0f;

    float3 midpoint01 = patch[0].Position + 0.5 * (patch[1].Position - patch[0].Position);
    float3 midpoint12 = patch[1].Position + 0.5 * (patch[2].Position - patch[1].Position);
    float3 midpoint20 = patch[2].Position + 0.5 * (patch[0].Position - patch[2].Position);
    float3 centerpoint = (patch[0].Position + patch[1].Position + patch[2].Position) / 3;

    // calculate the distance from the camera position to each edge midpoint
    // and to the center of the triangle
    float e0Distance = distance(cameraPosition, midpoint01) - minDistance;
    float e1Distance = distance(cameraPosition, midpoint12) - minDistance;
    float e2Distance = distance(cameraPosition, midpoint20) - minDistance;
    float eIDistance = distance(cameraPosition, centerpoint) - minDistance;

    float tf0 = lerp(minLOD, maxLOD, 1.0f - saturate(e0Distance / distanceRange));
    float tf1 = lerp(minLOD, maxLOD, 1.0f - saturate(e1Distance / distanceRange));
    float tf2 = lerp(minLOD, maxLOD, 1.0f - saturate(e2Distance / distanceRange));
    float tfInterior = lerp(minLOD, maxLOD, 1.0f - saturate(eIDistance / distanceRange));

    output.edgeTesselation[0] = tf0;
    output.edgeTesselation[1] = tf1;
    output.edgeTesselation[2] = tf2;
    output.insideTesselation = tfInterior;
    return output;
}
[/code]

     Assuming the math is correct, that makes me think the other problem area could be my hull shader's partitioning method. Currently I'm using the integer method, as it seems the simplest and easiest to debug, though I know it will eventually lead to visual popping.

[code]
[domain("tri")]
[partitioning("integer")]
[outputtopology("triangle_cw")]
[outputcontrolpoints(3)]
[patchconstantfunc("HSConstFunc")]
HS_OUTPUT HSMain(InputPatch<VS_OUTPUT, 3> patch, uint i : SV_OutputControlPointID)
{
    HS_OUTPUT output = (HS_OUTPUT)0;
    output.Position = patch[i].Position;
    output.UVCoords = patch[i].UVCoords;
    output.tc1 = patch[i].tc1;
    output.tc2 = patch[i].tc2;
    output.normal = patch[i].normal;
    return output;
}
[/code]

     Could the integer partitioning method be the cause of the gaps? I know that if you specify a tessellation factor of 3.1 with integer partitioning, it gets rounded up to 4.*** But even in that case, it confuses me that any two edges between two sets of identical points would receive differing tessellation factors.

     Thanks to anyone who takes a look. Let me know if I can provide any other code or explain anything.

     *** I have a very basic and disgusting understanding of the various tessellation partitioning types. If some kind, knowledgeable stranger happens to understand these and wants to throw out a bit of an explanation of the advantages/disadvantages of pow2 and fractional_even/odd, that would be amazing.
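     For reference: since each edge factor above is a function of nothing but the edge's two endpoint positions and the camera, two coincident edges must receive identical factors, and this is easy to verify CPU-side. A minimal plain-C++ sketch of the same math (hypothetical names; constants copied from the shader above):

[code]
#include <algorithm>
#include <cmath>

struct Float3 { float x, y, z; };

static float Distance(const Float3& a, const Float3& b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Mirrors the hull constant function: lerp from maxLOD (near) to minLOD (far)
// based on the camera's distance to the edge midpoint.
static float EdgeTessFactor(const Float3& v0, const Float3& v1, const Float3& camera)
{
    const float minLOD = 1.0f, maxLOD = 32.0f;
    const float minDistance = 0.0f, maxDistance = 150.0f;

    Float3 mid{ 0.5f * (v0.x + v1.x), 0.5f * (v0.y + v1.y), 0.5f * (v0.z + v1.z) };
    float t = (Distance(camera, mid) - minDistance) / (maxDistance - minDistance);
    t = std::min(std::max(t, 0.0f), 1.0f);           // saturate
    return minLOD + (1.0f - t) * (maxLOD - minLOD);  // lerp
}

// EdgeTessFactor(a, b, cam) == EdgeTessFactor(b, a, cam), so duplicated
// vertices at identical positions imply matching factors on both sides.
[/code]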
  2. Scratch that, I figured it out. I was mapping my normals incorrectly and have it fixed now!
  3. Hey everyone,

     Still working on my deferred renderer, and I have everything working for my point lights except that only half of each point light is being properly lit. Hopefully the attached image shows the problem I'm facing.

     The first thing I can think of is that my normals are getting corrupted between when I draw my scene and when I do the lighting calculations. The reason I suspect this is that I have to convert the normals from the [-1, 1] range to the [0, 1] range before rendering them to my normal render target (in the "scene" shader), and then back into the [-1, 1] range when I perform the lighting calculations in the light shader.

     I did some searching and it looks like I have the same issue as http://www.gamedev.net/topic/627587-solved-pointlight-renders-as-halfsphere/ but that thread was never updated with the actual solution/problem.

     I've looked at my shader code and the render target format for my normals yesterday and this evening and haven't been able to figure it out, and was hoping someone could lend a second set of eyes to some of my code.

     Normal render target creation parameters:

[code]
ID3D11Texture2D* text;
D3D11_TEXTURE2D_DESC desc;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.SampleDesc.Count = 1;
desc.SampleDesc.Quality = 0;
desc.Usage = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_RENDER_TARGET;
desc.CPUAccessFlags = 0;
desc.MiscFlags = 0;
desc.Format = DXGI_FORMAT_R8G8B8A8_SNORM;
desc.Width = width;
desc.Height = height;
[/code]

     "Scene" shader:

[code]
// Vertex shader code that outputs the normal in world space
output.Normal = normalMap.SampleLevel(samplerStateLinear, output.UVCoords, 0.0f).xyz;
output.Normal = mul(output.Normal, World);

// Pixel shader code that transforms the normals to the [0, 1] domain
// and then stores them in the render target
output.Normal = input.Normal;
output.Normal = 0.5f * (normalize(output.Normal) + 1.0f);
[/code]

     Light pixel shader:

[code]
float4 PSMain(VS_OUTPUT input) : SV_Target
{
    input.ScreenPosition.xy /= input.ScreenPosition.w;
    float2 texCoord = 0.5f * (float2(input.ScreenPosition.x, -input.ScreenPosition.y) + 1);

    // read normal data from the normal render target
    float4 normalData = normaltex.Sample(samplerStateLinear, texCoord);

    // map back to the [-1, 1] range and normalize
    float3 normal = 2.0f * normalData.xyz - 1.0f;
    normal = normalize(normal);

    float depthVal = depthtex.Sample(samplerStateLinear, texCoord).r;

    float4 position;
    position.xy = input.ScreenPosition.xy;
    position.z = depthVal;
    position.w = 1.0f;
    position = mul(position, InvViewProjection);
    position /= position.w;

    float3 lightVector = lightPosition.xyz - position.xyz;
    float attenuation = saturate(1.0f - length(lightVector) / lightRadius.x);

    // calculate the dot product of the surface normal and the light vector
    lightVector = normalize(lightVector);
    float dp = dot(lightVector, normal);

    float3 diffuseLighting = lightColor.xyz * dp;
    return float4(diffuseLighting, 1.0f) * attenuation * lightIntensity;
}
[/code]

     Thank you for any help/thoughts you might have.
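     As a sanity check, the range conversion described above is easy to verify in isolation. A minimal scalar sketch in plain C++ (illustrative only, not engine code):

[code]
#include <cassert>

// [-1, 1] -> [0, 1], as in the scene shader
float EncodeComponent(float n) { return 0.5f * (n + 1.0f); }

// [0, 1] -> [-1, 1], as in the light shader
float DecodeComponent(float s) { return 2.0f * s - 1.0f; }

int main()
{
    // the value chosen here is exactly representable in binary floating
    // point, so the round trip is exact and the assert holds
    assert(DecodeComponent(EncodeComponent(-0.25f)) == -0.25f);
    return 0;
}
[/code]

     Worth noting: DXGI_FORMAT_R8G8B8A8_SNORM already stores signed values in [-1, 1], so with that format the shift isn't strictly required (storing the normal unshifted keeps the full precision of the format); the shift-and-unshift pair is the usual idiom for UNORM targets.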
  4. All of these replies have been very helpful and solved my problem. Thanks everyone for the help!
  5. Hey everyone,

     I'm working on changing my DX11 forward renderer into a deferred renderer and have everything implemented, but I'm having difficulty figuring out how to disable depth testing when it comes time to render my lights (which I'm rendering as billboarded quads that always face the camera).

     Here's how I create my depth stencil view:

[code]
virtual DepthStencilView* CreateDepthStencilView(int width, int height)
{
    DepthStencilViewD3D* dsv = new DepthStencilViewD3D();
    D3D11_TEXTURE2D_DESC desc = { width, height, 1, 1, DXGI_FORMAT_D32_FLOAT,
                                  { 1, 0 }, D3D11_USAGE_DEFAULT,
                                  D3D11_BIND_DEPTH_STENCIL, 0, 0 };

    ID3D11Texture2D* tex = NULL;
    HV(device->CreateTexture2D(&desc, NULL, &tex), "Creating texture for depth/stencil buffers");
    HV(device->CreateDepthStencilView(tex, NULL, &dsv->dsv), "Creating depth stencil view.");

    CreateDepthStencilState(true);
    return dsv;
}

virtual void CreateDepthStencilState(bool depthTestEnabled)
{
    D3D11_DEPTH_STENCIL_DESC dsDesc;
    dsDesc.DepthEnable = depthTestEnabled;
    dsDesc.DepthWriteMask = depthTestEnabled ? D3D11_DEPTH_WRITE_MASK_ALL : D3D11_DEPTH_WRITE_MASK_ZERO;
    dsDesc.DepthFunc = depthTestEnabled ? D3D11_COMPARISON_LESS : D3D11_COMPARISON_ALWAYS;

    // Stencil test parameters
    dsDesc.StencilEnable = false;
    dsDesc.StencilReadMask = 0xFF;
    dsDesc.StencilWriteMask = 0xFF;

    // Stencil operations if pixel is front-facing
    dsDesc.FrontFace.StencilFailOp = D3D11_STENCIL_OP_KEEP;
    dsDesc.FrontFace.StencilDepthFailOp = D3D11_STENCIL_OP_INCR;
    dsDesc.FrontFace.StencilPassOp = D3D11_STENCIL_OP_KEEP;
    dsDesc.FrontFace.StencilFunc = D3D11_COMPARISON_ALWAYS;

    // Stencil operations if pixel is back-facing
    dsDesc.BackFace.StencilFailOp = D3D11_STENCIL_OP_KEEP;
    dsDesc.BackFace.StencilDepthFailOp = D3D11_STENCIL_OP_DECR;
    dsDesc.BackFace.StencilPassOp = D3D11_STENCIL_OP_KEEP;
    dsDesc.BackFace.StencilFunc = D3D11_COMPARISON_ALWAYS;

    // Create depth stencil state
    ID3D11DepthStencilState* pDSState;
    device->CreateDepthStencilState(&dsDesc, &pDSState);
}
[/code]

     And before I render the lights I call:

[code]
renderer.CreateDepthStencilState(false);
[/code]

     The expected result is that my light quads are drawn without any depth testing. Unfortunately, that's not what happens; as far as I can tell, my changes have no effect and depth testing is still being done. Does anyone with experience here see what I'm doing wrong?

     Thanks
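     For context, a depth-stencil state in D3D11 only affects rendering once it is bound to the output-merger stage; creating the state object on its own changes nothing. A minimal create-and-bind sketch, assuming the usual ID3D11Device / ID3D11DeviceContext pair:

[code]
#include <d3d11.h>

void BindNoDepthTest(ID3D11Device* device, ID3D11DeviceContext* context)
{
    D3D11_DEPTH_STENCIL_DESC desc = {};
    desc.DepthEnable = FALSE;                       // skip the depth test entirely
    desc.DepthWriteMask = D3D11_DEPTH_WRITE_MASK_ZERO;
    desc.DepthFunc = D3D11_COMPARISON_ALWAYS;
    desc.StencilEnable = FALSE;
    desc.FrontFace = { D3D11_STENCIL_OP_KEEP, D3D11_STENCIL_OP_KEEP,
                       D3D11_STENCIL_OP_KEEP, D3D11_COMPARISON_ALWAYS };
    desc.BackFace = desc.FrontFace;

    ID3D11DepthStencilState* state = NULL;
    if (SUCCEEDED(device->CreateDepthStencilState(&desc, &state)))
    {
        context->OMSetDepthStencilState(state, 1);  // bind it; 1 is the stencil ref
        state->Release();                           // the context holds its own reference
    }
}
[/code]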
  6. I was a dingus and posted this in the gameplay section, as I thought it would be more relevant to that forum, but then saw all the threads about visibility in here and wanted to cross-post it. Sorry if this is a foul; if it's a huge issue, feel free to delete this and close the thread in the gameplay forum: http://www.gamedev.net/topic/646914-help-with-visibility-test-for-ai-in-2d/ (The post text is identical to the next entry below.)
  7. Hey everyone,

     I'm working with some friends on a top-down, stealth-based, 2D tiled game. One of the issues we're facing is how to have the AI detect that they can see the player.

     The reason this is difficult is that our levels have the concept of layers (i.e., climb a ladder in the scene to get from the lower layer (1) to the upper layer (2)). This factors into gameplay because we want the AI to be able to see players beneath them, unless the players are against the wall or close enough to it that the AI shouldn't be able to see them. See the attached image for an example: red is the AI, green is the player, and yellow is the detectable range the AI can see. Note how the ground cuts off the AI's frustum.

     The issue is that half the team wants to push for using a 3D collision system in our 2D game, meaning everything in the game would have a 3D collision shape whose height is nonvariable except when you're on different layers.

     This seems wasteful to me, and I'm trying to think of a way this can be implemented simply using 2D AABBs and basic trigonometry; I wanted to see if anyone else had some thoughts. It's safe to assume that we know the height difference between the AI and the player, the distance of the AI from the edge, and the distance of the player from the bottom of the cliff. The map is tiled, and it would be fairly trivial to map the player/AI to the nearest tile (the sacrifice of accuracy isn't an issue), so it's possible that there's some use in mapping the AI and player to tiles and figuring out visible tiles from that information.

     This may be the one problem we can't solve in 2D that forces us to go with a 3D system for checking, but I'd really like to avoid that if at all possible.

     If I can clear anything up or give more detail, please let me know.
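     For what it's worth, the 2D side-view version of this check does reduce to one similar-triangles comparison. A hypothetical sketch (all names illustrative), assuming the quantities listed above are known: the AI's eye sits eyeHeight above the upper layer and distBehindEdge back from the cliff edge, the cliff drops cliffHeight to the lower layer, and the player's head reaches playerHeight above the lower ground:

[code]
#include <algorithm>

// The sight line grazing the cliff edge leaves a hidden "shadow" strip at the
// base of the cliff; by similar triangles, its length measured at the player's
// head height is (cliffHeight - playerHeight) * distBehindEdge / eyeHeight.
bool CanSeePlayerBelow(float eyeHeight, float distBehindEdge,
                       float cliffHeight, float playerHeight,
                       float playerDistFromBase)
{
    if (eyeHeight <= 0.0f)
        return false; // degenerate: the eye is at or below the upper ground

    float blockedHeight = std::max(cliffHeight - playerHeight, 0.0f);
    float shadowLength = blockedHeight * distBehindEdge / eyeHeight;
    return playerDistFromBase > shadowLength; // outside the strip => visible
}
[/code]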
  8. Is there a more programmatic way to do it? Having separate lookup tables for each possible layout in each possible language seems like a nightmare I'm hoping to avoid.

     That's the method I'm currently using to convert the scan code into the virtual key code. Now I just need a way to display that virtual key as a string.
  9. Hello everyone,

     Given a scan code for a key on a keyboard, is there a way to "stringify" and display the key that the scan code maps to? I know I can convert the scan code to the corresponding virtual key code, but I'm not sure how best to convert that virtual key into a string.

     The reason for this is that my input is based on the keyboard's scan codes, but different keyboard layouts (Dvorak, etc.) map different keys to the same scan codes. Additionally, this needs to work for languages other than English.
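     For reference, on Windows this can be done without per-layout tables: GetKeyNameText derives a layout-aware display name directly from the scan code, packed into an lParam-style value. A minimal Win32 sketch (function name is illustrative):

[code]
#include <windows.h>
#include <string>

std::string KeyNameFromScanCode(UINT scanCode, bool extendedKey)
{
    // GetKeyNameText expects the scan code in bits 16-23 of an lParam-style
    // value, with bit 24 set for extended keys (right ctrl, arrow keys, ...)
    LONG lParam = static_cast<LONG>(scanCode) << 16;
    if (extendedKey)
        lParam |= 1L << 24;

    char name[128] = {};
    if (GetKeyNameTextA(lParam, name, static_cast<int>(sizeof(name))) > 0)
        return name;
    return std::string();
}
[/code]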
  10. Good god, I feel ridiculously foolish for missing that. I'll test it later today and see if it fixes things, but it would explain a lot. Sometimes it's the simple things, I guess.

      I had tried running through the process multiple times, and that lessened the problem. Is that an actual approach that some physics engines take? It seems so inefficient.
  11. [quote]Would this work?

[code]
aPhysicsComponent->m_velocityChangesToApply += (hitNormalA) * impulse / spriteAMass;
bPhysicsComponent->m_velocityChangesToApply += (hitNormalB) * impulse / spriteBMass;
[/code][/quote]

      Well, hitNormalA/B and impulse are both vectors. Ideally impulse should be some force and direction applied to both objects on collision for a bounce effect, but I'm not even using it right now; currently I'm just trying to move the objects apart by the penetration amount (hitNormalA = hitNormal * penetrationDepth) to separate them.

      What's happening is that my method breaks when any one object is colliding with more than one other object. I think the middle object is being acted upon from both sides and can't resolve either collision, as it's always blocked by one of the other two objects.

      So that makes me think it may not necessarily be a problem with my collision resolution code, but instead a more fundamental, conceptual problem with how I'm conducting my collision tests?
  12. Greetings everyone,

      I'm working on implementing my own *very* basic 2D collision engine for AABB objects to use in the 2D platformer I'm trying to make.

      I've managed to get the following working:
      • Collision detection
      • Collision between one movable sprite and one nonmovable sprite (mass == 0)
      • Collision between exactly two movable sprites

      What I'm having trouble with is resolving multiple collisions at a time. For instance, suppose the player is pushing another object into a third object along the x axis. In this case, the second object should push up against the third object, with the player pushing against the middle object. Instead, all of my objects penetrate each other and the resolution never happens. In the attached image, the player (rectangle 1) is pushing another object (rectangle 2) into a third object (rectangle 3).

      Here are the relevant portions of my code. Again, I've never implemented any kind of collision detection before, so this is probably hideous and I'm sure I'm doing an unfathomable number of things wrong. ANY critique or suggestions would be appreciated, but I'd specifically like to address the above problem. If anyone needs additional information or code, just let me know and I'd be happy to upload it.

[code]
void CollisionManager::HandleSpriteCollisions(const std::vector<Sprite*>& sprites)
{
    for(size_t k1 = 0; k1 < sprites.size(); ++k1)
    {
        PhysicsComponent* physics = sprites[k1]->GetFirstComponentOfType<PhysicsComponent>();
        if(physics)
            physics->m_velocityChangesToApply = Vector2D::ZeroVector();

        // test against every later sprite, so each pair is handled once
        for(size_t k2 = k1 + 1; k2 < sprites.size(); ++k2)
        {
            bool spritesOverlap = DoSpritesOverlap(sprites[k1], sprites[k2]);
            if(spritesOverlap)
                HandleSpriteCollision(sprites[k1], sprites[k2]);
        }
    }

    // apply all accumulated position changes in a second pass
    for(size_t k1 = 0; k1 < sprites.size(); ++k1)
    {
        PhysicsComponent* physics = sprites[k1]->GetFirstComponentOfType<PhysicsComponent>();
        if(physics)
            physics->m_position += physics->m_velocityChangesToApply;
    }
}

void CollisionManager::HandleSpriteCollision(Sprite* spriteA, Sprite* spriteB)
{
    Vector2D overlapAxis;
    float overlapAmount = CalculateOverlapAmount(spriteA, spriteB, overlapAxis);
    if(overlapAmount == 0)
        return;

    PhysicsComponent* aPhysicsComponent = spriteA->GetFirstComponentOfType<PhysicsComponent>();
    PhysicsComponent* bPhysicsComponent = spriteB->GetFirstComponentOfType<PhysicsComponent>();
    if(!aPhysicsComponent || !bPhysicsComponent)
        return;

    float spriteAMass = aPhysicsComponent->m_mass;
    float spriteBMass = bPhysicsComponent->m_mass;
    float totalMass = spriteAMass + spriteBMass;
    float proportionalMassA = spriteAMass / totalMass;
    float proportionalMassB = spriteBMass / totalMass;
    float penetrationDepth = overlapAmount + 1;

    Vector2D spriteDirectionVector =
        (bPhysicsComponent->m_collisionShape->position - aPhysicsComponent->m_collisionShape->position).Normalize();
    Vector2D hitNormalA = overlapAxis * penetrationDepth;
    Vector2D hitNormalB = hitNormalA * -1.f;

    if(aPhysicsComponent->m_mass == 0 && bPhysicsComponent->m_mass == 0)
        return;

    if(aPhysicsComponent->m_mass > 0 && bPhysicsComponent->m_mass > 0)
    {
        Vector2D impulse = hitNormalA;
        float restitutionCoefficient = -0.5f;
        impulse *= (-(1.0f + restitutionCoefficient));
        impulse /= (1/spriteAMass + 1/spriteBMass);

        aPhysicsComponent->m_velocityChangesToApply += hitNormalA;
        bPhysicsComponent->m_velocityChangesToApply += hitNormalB;
    }
    else if(aPhysicsComponent->m_mass == 0)
    {
        bPhysicsComponent->m_velocityChangesToApply += hitNormalB;
    }
    else if(bPhysicsComponent->m_mass == 0)
    {
        aPhysicsComponent->m_velocityChangesToApply += hitNormalA;
    }
}
[/code]
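      One technique worth sketching here (plain C++, hypothetical names): split the push-out between the two bodies by inverse mass, so an immovable body (mass == 0) absorbs none of the correction and a heavy body absorbs little. Many engines also run the whole detect-and-resolve pass several times per frame, so chains of contacts (player -> box -> box) settle over a few iterations instead of fighting each other.

[code]
struct Vec2 { float x, y; };

static Vec2 Scale(Vec2 v, float s) { return { v.x * s, v.y * s }; }

// Computes the position corrections to accumulate for bodies A and B.
// overlapAxis points in the direction A should be pushed.
static void SplitCorrection(float massA, float massB,
                            Vec2 overlapAxis, float penetrationDepth,
                            Vec2& outCorrectionA, Vec2& outCorrectionB)
{
    float invMassA = massA > 0.0f ? 1.0f / massA : 0.0f; // 0 => immovable
    float invMassB = massB > 0.0f ? 1.0f / massB : 0.0f;
    float invMassSum = invMassA + invMassB;
    if (invMassSum <= 0.0f)
    {
        outCorrectionA = outCorrectionB = Vec2{ 0.0f, 0.0f };
        return; // both immovable: nothing to resolve
    }

    Vec2 correction = Scale(overlapAxis, penetrationDepth / invMassSum);
    outCorrectionA = Scale(correction,  invMassA); // A moves along +axis
    outCorrectionB = Scale(correction, -invMassB); // B moves the opposite way
}
[/code]

      With this weighting, if B is immovable, A receives the full penetration depth; with equal masses, each body receives half.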
  13. [quote name='Nypyren' timestamp='1333838874' post='4929178']That line is a comment. It's just explaining what the branchless code is up to.[/quote]

      [quote name='edd²' timestamp='1333839394' post='4929180']A branch means moving the program counter forwards (beyond the next instruction, typically) or backwards. There's no branching going on. _mm_cmplt_ps is an instruction that can be viewed as taking two inputs and generating a single output. Once that instruction has finished, the program counter advances to the next instruction. No branch. The same holds for the other instructions/intrinsics used in the example. Here's the same algorithm for scalar variables in regular C:

[code]
const unsigned choice_masks[2] = { 0, (unsigned)-1 };
unsigned mask = choice_masks[lhs < rhs];
// assuming option1 and option2 are also of type unsigned
unsigned result = (mask & option1) | ((~mask) & option2);
[/code]

      No conditionals, no branching.[/quote]

      Fantastic, that really clears things up for me and will let me eliminate a lot of the gross bit manipulation I had to do. Thanks for clearing this up, guys!
  14. [quote name='edd²' timestamp='1333833206' post='4929152']There are a number of SSE instructions that generate bit masks (all ones or all zeros) depending on the result of a comparison. These masks are then used to choose between two options. An example using intrinsics:

[code]
// result = (lhs < rhs) ? option1 : option2;
__m128 mask = _mm_cmplt_ps(lhs, rhs);
__m128 result = _mm_or_ps(_mm_and_ps(mask, option1), _mm_andnot_ps(mask, option2));
[/code][/quote]

      Right, I saw that. But here's what concerns me:

[code]
// result = (lhs < rhs) ? option1 : option2;
[/code]

      How is that not a branch? Does SSE do some special bit algorithm? Otherwise I can't see how that's not the same as:

[code]
if (lhs < rhs)
    result = option1;
else
    result = option2;
[/code]
  15. Hey everyone,

      I'm working on a relatively simple SSE function. Essentially, I give it three inputs and compute three outputs based on them. The issue is that one of the outputs has three potential values. For instance:

      Output = < a1, b, c > if component 1 is the largest
      Output = < a2, b, c > if component 2 is the largest
      Output = < a3, b, c > if component 3 is the largest

      My main goal is to implement this completely without branching. To do this, I use SSE to compute all three possible outputs < a1, a2, a3 > in parallel. This part works perfectly and I've verified it's correct. My problem is that I then need to choose an output value for a without branching.

      I initially tried to do this without using any SSE cmp instructions, because I figured that intrinsics like max, min, cmplt, etc. would all use branching in some fashion behind the scenes. But I've been reading a few sources (Intel, lecture slides, etc.) and I'm no longer sure these instructions actually branch, which kind of blows my mind. I was hoping someone here would have a definitive answer.

      If the SSE comparison instructions DO use branches, does anyone have an idea for how I could select my output without branching? I've been working on this for the past 24 hours and am almost at my wits' end. If anyone needs additional info, just let me know and I can answer your questions.
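      To make the mask-select idiom from the replies concrete, here is a small self-contained sketch (hypothetical variable names) that picks between three candidate values based on which of three components is largest, with no branches in the selection itself:

[code]
#include <xmmintrin.h>

// Branchless per-lane select: (a < b) ? ifTrue : ifFalse.
// _mm_cmplt_ps yields all-ones lanes where the comparison holds, so the
// and/andnot/or trio merges the two options without a jump.
static inline __m128 SelectLess(__m128 a, __m128 b, __m128 ifTrue, __m128 ifFalse)
{
    __m128 mask = _mm_cmplt_ps(a, b);
    return _mm_or_ps(_mm_and_ps(mask, ifTrue), _mm_andnot_ps(mask, ifFalse));
}

float SelectByLargest(float c1, float c2, float c3, float a1, float a2, float a3)
{
    __m128 v1 = _mm_set1_ps(c1), v2 = _mm_set1_ps(c2), v3 = _mm_set1_ps(c3);

    // winner of c1 vs c2, carrying its candidate value along
    __m128 bestC = SelectLess(v1, v2, v2, v1);
    __m128 bestA = SelectLess(v1, v2, _mm_set1_ps(a2), _mm_set1_ps(a1));

    // then compare that winner against c3
    bestA = SelectLess(bestC, v3, _mm_set1_ps(a3), bestA);

    float out;
    _mm_store_ss(&out, bestA);
    return out;
}
[/code]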