D.V.D

Member
  • Content count

    124
  • Joined

  • Last visited

Community Reputation

1032 Excellent

About D.V.D

  • Rank
    Member

Personal Information

  • Role
    Programmer
  • Interests
    Programming

1. Hey all, so I did some experimenting to see what rendered outputs various methods give me. I'm focusing right now on making sure my rendered image renders hard edges properly, and I want it to do so in the fewest octree traversals possible. My octree render function looks a little like this:

void RenderOctree(v3 Center, octree* Node, u32 Level)
{
    b32 IsLeaf = CheckIfLeaf(Node);

    i32 MinX, MaxX, MinY, MaxY;
    f32 FMinX, FMaxX, FMinY, FMaxY;
    f32 MinZ, MaxZ;
    ApproximateNodeSize(Center, Level, &FMinX, &FMaxX, &FMinY, &FMaxY, &MinZ, &MaxZ);

    if (MaxZ <= 0.0f)
    {
        // NOTE: Node is behind camera, cull it
        return;
    }

    MinX = SomeRoundingMethod(FMinX);
    MaxX = SomeRoundingMethod(FMaxX);
    MinY = SomeRoundingMethod(FMinY);
    MaxY = SomeRoundingMethod(FMaxY);

    if (NodeIsPixelSize())
    {
        // NOTE: Render the node as a dot
        v2 ProjectedCenter = ProjectPoint(Center);
        u32 PixelId = SomeRoundingMethod(ProjectedCenter.y)*ScreenX + SomeRoundingMethod(ProjectedCenter.x);
        f32 Depth = DepthMap[PixelId];
        if (Depth > Center.z)
        {
            DepthMap[PixelId] = Center.z;
        }

        return;
    }

    if (MinX >= ScreenX || MinY >= ScreenX || MaxX <= 0 || MaxY <= 0)
    {
        // NOTE: Node is outside of the screen, don't render it or its children
        return;
    }

    // NOTE: Clip the node to the screen
    MinX = Max(0, MinX);
    MinY = Max(0, MinY);
    MaxX = Min(ScreenX, MaxX);
    MaxY = Min(ScreenX, MaxY);

    b32 IsOccluded = true;
    {
        f32* DepthRow = DepthMap + MinY*ScreenX + MinX;
        for (i32 Y = MinY; Y < MaxY; ++Y)
        {
            f32* Depth = DepthRow;
            for (i32 X = MinX; X < MaxX; ++X)
            {
                if (*Depth > MinZ)
                {
                    // NOTE: We aren't occluded
                    IsOccluded = false;
                    goto EndLoop;
                }

                ++Depth;
            }

            DepthRow += ScreenX;
        }
    }
EndLoop:

    if (IsOccluded)
    {
        return;
    }

    for (u32 CurrNodeId = 0; CurrNodeId < 8; ++CurrNodeId)
    {
        u32 SortedNodeId = Indicies[GetFrontBackSortId(CurrNodeId)];
        if (Node->Children[SortedNodeId])
        {
            RenderOctree(GetChildCenter(Node, SortedNodeId), Node->Children[SortedNodeId], Level + 1);
        }
    }
}

So in the above code, we first approximate the node's size on screen and get its float min/max XY values; we then use some rounding method to convert those values to ints. We check if the node's size is a pixel (and render it as a dot if it is); otherwise we clip the node, check it for occlusion, and traverse its children. For the various experiments I tried, I used the above code and only modified the rounding method for min/max, the rounding method for rendering the one-pixel node, and whether the occlusion check was inclusive or exclusive of the max x,y values. The first methods render single-pixel nodes by projecting and flooring the center of the node to convert it to an integer pair, which represents the pixel that node takes up. The reason we floor the center of the node is that flooring corresponds to rendering the node to the pixel it covers the most.

Method 1:

Render Info: Num Traversals: 5056321, NumRejected: 562733, NumRendered: 3656973

This method takes the floor of the FMinX/Y values and the ceiling of the FMaxX/Y values to get the pixels a node intersects (for occlusion checks). The idea here is to be conservative about the pixels the node is expected to cover. This method produces renders with sharp edges but traverses a ton of extra nodes to do so. We also check whether a node is ready to render using the FMin/Max values: the node must satisfy (FMaxX - FMinX) <= 1.0f and (FMaxY - FMinY) <= 1.0f to be rendered.

Method 2:

Render Info: Num Traversals: 3406177, NumRejected: 646950, NumRendered: 2193007

Here we round the FMin/Max values to the nearest integer to calculate the integer Min/Max values.
If you look at some of the edges in the render, they have pixel crawling (random dots sticking out). Method 1 didn't have this artifact because it would fill in the extra line where the dots stick out (the edges were thicker). In this method, we get a ton of nodes that get rounded down to be inside the edge instead of outside, and thus are rejected and prevented from being rendered correctly, which gives us the pixel crawling. We again use the FMin/Max values to check if a node is small enough to be rendered.

Method 3:

Render Info: Num Traversals: 3705105, NumRejected: 299739, NumRendered: 2784177

Here we use the integer Min/Max values to decide if a node is small enough to be considered a single pixel in size: we check if MaxX - MinX <= 1 and MaxY - MinY <= 1. We take the floor of our FMin/FMax values to generate the Min/Max values, and we make our occlusion check inclusive of the pixels at the MaxX and MaxY coordinates. The result has lots of holes and pixel crawling, which seems to be caused by improper coverage of the nodes due to the rounding method. I figured that being inclusive with the bounds check would make the node size correspond to the actual pixels the node takes up, but I guess this isn't the case.

Method 4:

Render Info: Num Traversals: 2544985, NumRejected: 340016, NumRendered: 1777500

Here we use the integer Min/Max values to decide when to render a node, and we calculate them by rounding FMin/Max to the nearest integer. The idea is that the previous method wasn't capturing the node size properly, so maybe rounding gives the actual size the nodes take up on screen. The result looks almost perfect, but it still has a little pixel crawling on hard edges.

Method 5:

Render Info: Num Traversals: 2686033, NumRejected: 1045595, NumRendered: 1184283

This method (like Method 4) uses the integer Min/Max to decide when a node should be rendered, and it calculates the Min/Max values by rounding to the nearest integer. The only difference is that if a node's MinX == MaxX or MinY == MaxY, the node is discarded completely and not rendered. This seems to clean up Method 4's image and make all the edges sharp (I've sketched this rule in code at the end of this post).

Method 5 seems to be the most correct so far. Best of all, it has a similar number of traversals to Method 4, so we aren't being overly conservative when approximating the node's size in integer coordinates. One thing that bothers me with this method is that nodes which are 0 pixels wide or tall exist at all. Of course those nodes should be discarded, since they cover 0 pixels, but it feels odd that they occur in the first place and have to be removed. I guess that's just a sacrifice that has to be made because of the rounding errors that can occur.

The other thing that bothers me with this method (and the other methods which use the integer Min/Max values to decide if a node should be rendered) is the resulting level map. When I render each node, I also write a color for that node's level (recursion depth) into a separate buffer: one color for level 1, another color for level 2, and so on. If we use the float FMin/FMax values to decide if a node should be rendered, our level map looks like this:

If we use the integer Min/Max, our level map looks like this:

Now, there are cases where a node which is 1x1 pixels in size can actually color 4 pixels, if it happens to have its center where 4 pixel corners intersect.
So from that logic, it makes sense to render a node based off of the integer Min/Max values. But from the above images, we would expect layers of levels moving farther away from the camera, like what we get when we use the float FMin/FMax values to decide if a node is small enough to render. So my question is: what's the right answer here? Should my rendering method generate 0-sized nodes that need to be discarded? Should it also have the weird level map shown above? Or are there other rounding methods that make more sense theoretically/experimentally?
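For reference, Method 5's rule boils down to something like this (RoundToInt stands in for whatever round-to-nearest helper is used; this is a sketch of the rule, not a full routine):

// Method 5: round to nearest, then reject degenerate nodes.
i32 MinX = RoundToInt(FMinX);
i32 MaxX = RoundToInt(FMaxX);
i32 MinY = RoundToInt(FMinY);
i32 MaxY = RoundToInt(FMaxY);

// NOTE: A node that rounds to zero width or height covers no pixel centers.
if (MinX == MaxX || MinY == MaxY)
{
    return;
}

// NOTE: Render as a dot once the rounded bounds span a single pixel.
b32 NodeIsPixelSize = (MaxX - MinX <= 1) && (MaxY - MinY <= 1);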
2. Right, so I guess I should have stated what I'm doing differently. I'm rendering voxels stored in octrees. Part of that process is checking for occlusion: I walk through the octree front to back, and if a node is occluded, I skip it and all of its children; otherwise, I traverse it as well. So when I say it would render 10 times more nodes, that's what it did when I tested it. I'm assuming the reason is that you get lots of nodes that are rounded up to have a diameter of 1 (so MaxX - MinX == 1) instead of 0, so the program traverses way further than it should. I know that it is traversing way more nodes than it should because, on average, the children of a parent node in an octree are around 60% of the size of their parent on screen. Doing the math for a 512x512 screen (worked out at the end of this post), I should traverse 12-13 levels to get pixel-sized octree nodes, which my current method gets me. When I require the nodes to have MinX == MaxX and MinY == MaxY, that becomes a couple of levels deeper, which shouldn't be happening.

Right, so getting 100% accurate results would be the correct path, but once the nodes become 1-2 pixels in size, the difference between doing either becomes very close to 0. My issue is with the small nodes; the large nodes seem to be working just fine, it's the nodes that approach that small size that give me inconsistent results. So in the examples I drew in my first post, I have a 2x2 node that would be considered visible on screen, but when I subdivide it, none of its full children overlap a pixel, so I get nothing drawn to the screen. This makes the traversal provide no results, and I don't think it's visually the correct result to be getting.
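As a sanity check on those numbers: if each child is about 60% of its parent's on-screen size, a root spanning the full 512-pixel screen reaches single-pixel nodes after $n$ levels, where $0.6^n \cdot 512 \le 1$, i.e. $n \ge \log(512)/\log(1/0.6) \approx 12.2$, which matches the 12-13 levels I see.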
3. So I guess I should clarify. The model geometry is the actual octree. When I check for occlusion, I approximate the cube as the screen bounding box of the cube's 8 vertices. When I render the cube, I render it as the screen bounding box as well. I just hope that the model is detailed enough that the cubes I render are 1 pixel in size, so the screen bounding box converges to the same result as rendering the cube outright. You can see this below, where I modify the max level of detail:

So if I were to test conservatively, I'd also have to render conservatively, and then a node is one pixel in size when MaxX - MinX == 0 and MaxY - MinY == 0, which is what I did before, and the program would render 10 times more nodes than it currently does.
4. Hey guys, I've been writing a software occlusion culling algorithm (mostly for fun and learning), and I came across an issue. I store my geometry as an octree that I traverse front to back, and I check each node for coverage to see if it and its children are occluded. The node itself is a cube, but I approximate it as the minimum on-screen bounding box that covers the cube. Now my issue is, when checking this on-screen bounding box for occlusion, I don't know exactly when a pixel should be considered touched by the geometry. Currently, I generate min/max points for every node on screen, round them to the nearest integer, and loop as shown in the code below:

f32* DepthRow = RenderState->DepthMap + MinY*ScreenX + MinX;
for (i32 Y = MinY; Y < MaxY; ++Y)
{
    f32* Depth = DepthRow;
    for (i32 X = MinX; X < MaxX; ++X)
    {
        if (*Depth > NodeZ)
        {
            // NOTE: We aren't occluded
        }

        ++Depth;
    }

    DepthRow += ScreenX;
}

This method is much the same as what software rasterizers do, in that a pixel is only considered part of the box if the center of the pixel is inside the box. Sometimes, though, I get nodes that are 2x1 pixels in size, and when I subdivide them, none of the children are rendered because of where they happen to fall relative to the pixel centers. This is illustrated in the 2 images below:

So in the above examples, one node happens to overlap 2 pixels, but once it's subdivided, some of its children are empty, and the ones which aren't overlap 0 pixels (because their min and max values are equal), so nothing gets rendered. My issue is how to better decide when a node should overlap a pixel, to avoid such cases. I thought I could make my loop inclusive by checking Y <= MaxY and X <= MaxX, but then I have to subdivide nodes until MinX == MaxX and MinY == MaxY, which makes my program render 10 times more nodes than it usually does. My other option is to check if the pre-rounded Min/Max values are within a distance of 1 of each other, and render that, regardless of whether the block takes up more than one pixel because of its orientation (both options are sketched at the end of this post). The only issue I have with this method is whether I can really consider my renders pixel-accurate then: in the worst case, a node can have a diameter of 1 on the screen but cover 4 pixels, so I can get blockier results than I should. Is there a good fix for this issue? There aren't many resources on software rasterization, especially with very small geometry, so I don't know how explored this issue really is.
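For reference, here are the two candidate fixes sketched out (names mirror the code above; FMinX/FMaxX are the bounds before rounding):

// Option 1: inclusive coverage test. Pixels at MaxX/MaxY count as touched,
// but then nodes must be subdivided until MinX == MaxX && MinY == MaxY,
// which renders roughly 10x more nodes in my tests.
for (i32 Y = MinY; Y <= MaxY; ++Y)
{
    for (i32 X = MinX; X <= MaxX; ++X)
    {
        // ... depth test pixel (X, Y) against NodeZ ...
    }
}

// Option 2: decide "pixel sized" from the pre-rounded bounds, and splat the
// node even if its rounded box spans up to 2x2 pixels.
b32 RenderAsDot = (FMaxX - FMinX <= 1.0f) && (FMaxY - FMinY <= 1.0f);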
5. Yeah, it for sure is. I'll try to be more careful, but I'm happy I implemented the changes anyway; now I have an idea of how to do it :P. I think in my case I'm completely compute bound: almost all the time is taken projecting octree nodes onto the screen, and I wrote a SIMD version of that (it's what got my times from 600ms to 100ms), but I've got a couple more ideas for optimizations to make it faster! I'll try to be more careful from now on XD.
6. I actually had a routine I wrote for doing just that. It would only do masked reads for the edges/corners of the box. The only issue was that it failed for rendering 1-pixel-sized boxes; it needed more if-statement checks to handle that correctly. I just left it alone, since I figured I should at least get the easy case right.

So I did profiling before and as I was writing code, but I think I made the most rookie mistake. When I was adding SIMD to my code, I didn't just apply it to the pixel checks, I also applied it to other parts of the program. So when I benchmarked that, I was way faster than scalar code. I originally used AVX to check a line of 8 pixels at a time, and later I did the 2x2 pixel approach. Using the Very Sleepy profiler, I saw that those routines took around 8-12% of my rendering time, so I figured they needed to be optimized. What I should have checked was whether the pixel checks alone are faster in SIMD than in scalar code. These are my profiling results:

Average Scalar: 611ms, Average 8x1 AVX: 652ms, Average 2x2 SSE: 644ms

With other parts of the rendering optimized for SIMD, the results are:

Average Scalar: 103ms, Average 8x1 AVX: 112ms, Average 2x2 SSE: 129ms

These are the average times it takes to render my frame, with 8x1 AVX checking a line of 8 pixels at a time, and 2x2 SSE checking a block of 2x2 pixels at a time. So yeah, this is really wild :P. With Very Sleepy, the routine for pixel checking in scalar code takes around 2% of the total runtime, so it's already incredibly fast. I guess I should have checked this earlier :P. It may be that in the future, once my scenes get more complex, I'll have a lot more nodes being occluded which require checking a block of pixels (currently, it's somewhere between 10-20% of traversed nodes). So yeah, I lost a lot of time because of my ignorance XD.

I ended up looking online and found that Ryg's blog has a post on traversing images which store data in 2x2 blocks. He provides the code below, which shows how to address a particular pixel if the texture is stored in tiles of pixels, with 4x4 pixels inside:

// per-texture constants
uint tileW = 4;
uint tileH = 4;
uint widthInTiles = (width + tileW-1) / tileW;

// actual addressing
uint tileX = x / tileW;
uint tileY = y / tileH;
uint inTileX = x % tileW;
uint inTileY = y % tileH;
pixel = image[(tileY * widthInTiles + tileX) * (tileW * tileH) + inTileY * tileW + inTileX];

The post can be found here: https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/. So the optimization would be to use the % operator instead of a multiplication and subtraction. I'll probably look more into this, mainly because it's interesting to reorder 2D buffers into these kinds of formats, but it looks like for my app right now, it's a decent amount slower.

Lastly, yeah, I should have commented my code more. I guess I got too into what I was doing and didn't notice whether it was readable or not. Thanks a lot for still sticking by, though!
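One related note: since the tile dimensions are powers of two, the divides and modulos in that snippet reduce to shifts and masks (most compilers already do this for unsigned values, but it shows why the addressing is cheap). The same addressing with 4x4 tiles hardcoded:

uint tileX   = x >> 2;  // x / 4
uint tileY   = y >> 2;  // y / 4
uint inTileX = x & 3;   // x % 4
uint inTileY = y & 3;   // y % 4
pixel = image[((tileY * widthInTiles + tileX) << 4) + (inTileY << 2) + inTileX];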
7. That link is what I was looking for; it looks like Visual Studio 2013 just doesn't support that intrinsic. Out of interest, what do you mean they can be disastrous? If a CPU doesn't support some x86 instruction, the program just crashes, but all CPUs that support AVX should work fine, right?
8. Yeah, writing software renderers is actually kind of fun :). So when you say tile rendering, you mean binning? Like dividing the screen into 16x16 blocks, figuring out which objects lie in which block, and then rendering everything a block at a time? I was going to do that (if that's what you are talking about) using multithreading, but I guess I should say that my renderer is not a triangle rasterizer. I'm writing an octree voxel splatter (again). I represent my scene geometry as an octree whose leaf nodes are either marked full or empty, so the geometry is just a ton of cubes assembled with an octree. When I render, I traverse the octree in front-to-back order and render nodes that aren't occluded. When I render a node, rather than rasterizing a cube, I find the cube's bounding box on screen and fill those pixels as if that entire block was the cube. The idea is that once I have around a single node per pixel, rasterizing a cube vs. rendering its min/max box becomes the same. Below I attached some photos to show what I mean; I'm changing the max recursion depth so you can see what I'm rendering.

So in my circumstance, it's a little hard to do binning, mostly because the main speedup for my algorithm is the occlusion culling, which I don't think can be made parallel. That's why I'm instead trying to optimize things like how fast I can render, or check a box on the screen against some depth value.

I've been thinking about the 4x4 blocks of pixels that I wanted to implement, and I figured that instead of having 4x4 blocks with 2x2 blocks inside, I can have a plain 4x4 block, and one AVX vector will be able to index 2 of the 2x2 blocks at a time anyway, because it's 8 wide. I attached a picture below showing what I mean. The problem now becomes the masks. The setup costs are still the same (so if someone has a way to reduce that, it would be very helpful), but for this case, I also have to figure out how to reduce the setup cost for more masks. So let's say I'm writing to pixels 5, 6, 7, 9, 10, 11, 13, 14, 15; that's a 3x3 block inside of the 4x4 block. For masks, I have two options. One is to load all 16 pixels in the 4x4 block and pick a mask for the configuration I'm writing. Unfortunately, this would mean a huge number of masks, because I'd need masks for every kind of rect that I might draw (somewhere around 32 masks, I think). If I instead point at pixel 5, then I would need to repeat a 4-wide mask of (0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0) across the rows of this tile, as sketched at the end of this post. So now I get a larger setup cost, because I have to construct the masks for each block instead of just reading them off a table, but I reduce the number of masks I have to store by a lot. I'm just thinking out loud right now; I have no idea if this will be any good. I was hoping there was some resource that explained how to do this better, but from looking around online, I can't seem to find anything.
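To make that second option concrete, here's a rough sketch of building the repeated row mask; BuildRowMask/BuildTwoRowMask are made-up helpers, and the scalar loop is only for clarity (a real version would read a small table):

#include <immintrin.h>

typedef int i32;
typedef unsigned int u32;

// Builds a 4-wide row mask covering columns [MinX, MaxX) of a 4-pixel tile row.
static __m128i BuildRowMask(u32 MinX, u32 MaxX)
{
    i32 Lane[4];
    for (u32 I = 0; I < 4; ++I)
    {
        Lane[I] = (I >= MinX && I < MaxX) ? -1 : 0; // -1 = all bits set
    }
    return _mm_setr_epi32(Lane[0], Lane[1], Lane[2], Lane[3]);
}

// Repeats the row mask across both 128-bit halves of an AVX register, so one
// 8-wide vector masks two rows of the 4x4 tile at once.
static __m256i BuildTwoRowMask(u32 MinX, u32 MaxX)
{
    __m128i Row = BuildRowMask(MinX, MaxX);
    return _mm256_insertf128_si256(_mm256_castsi128_si256(Row), Row, 1);
}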
9. Hey guys, I'm writing a software renderer for fun, and I decided to try to optimize it by storing the frame buffer and depth buffer as rows of 2x2 blocks of pixels. I figured it would be easier to SIMDify and be more cache local. The main thing my software renderer does is traverse boxes on the screen, either to check for occlusion or to color in pixels. So if my pixel format is plain rows of pixels, the main thing I perform is this:

void Draw(u32* Pixels, i32 MinX, i32 MinY, i32 MaxX, i32 MaxY)
{
    u32* PixelRow = Pixels + NumPixelsX*MinY + MinX;
    for (i32 Y = MinY; Y < MaxY; ++Y)
    {
        u32* CurrPixel = PixelRow;
        for (i32 X = MinX; X < MaxX; ++X)
        {
            *CurrPixel++ = Color; // NOTE: some color/depth value
        }

        PixelRow += NumPixelsX;
    }
}

I converted this routine to SIMD to process 2x2 pixel blocks, but I found that by doing so, I added a big setup cost before the actual loop:

#define BLOCK_SIZE 2
#define NUM_BLOCK_X (ScreenX / BLOCK_SIZE)

global __m128i GlobalMinXMask[] =
{
    _mm_setr_epi32(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF),
    _mm_setr_epi32(0, 0xFFFFFFFF, 0, 0xFFFFFFFF),
};

global __m128i GlobalMaxXMask[] =
{
    _mm_setr_epi32(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF),
    _mm_setr_epi32(0xFFFFFFFF, 0, 0xFFFFFFFF, 0),
};

global __m128i GlobalMinYMask[] =
{
    _mm_setr_epi32(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF),
    _mm_setr_epi32(0, 0, 0xFFFFFFFF, 0xFFFFFFFF),
};

global __m128i GlobalMaxYMask[] =
{
    _mm_setr_epi32(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF),
    _mm_setr_epi32(0xFFFFFFFF, 0xFFFFFFFF, 0, 0),
};

inline void DrawNode(u32* Pixels, u32 MinX, u32 MaxX, u32 MinY, u32 MaxY)
{
    // NOTE: All of this is the setup cost right here
    u32 BlockMinX = MinX / BLOCK_SIZE;
    u32 BlockMaxX = MaxX / BLOCK_SIZE;
    u32 BlockMinY = MinY / BLOCK_SIZE;
    u32 BlockMaxY = MaxY / BLOCK_SIZE;

    u32 DiffMinX = MinX - BLOCK_SIZE*BlockMinX;
    u32 DiffMaxX = MaxX - BLOCK_SIZE*BlockMaxX;
    u32 DiffMinY = MinY - BLOCK_SIZE*BlockMinY;
    u32 DiffMaxY = MaxY - BLOCK_SIZE*BlockMaxY;

    if (DiffMaxX)
    {
        BlockMaxX += 1;
    }
    if (DiffMaxY)
    {
        BlockMaxY += 1;
    }

    f32* RowDepth = RenderState->DepthMap + Square(BLOCK_SIZE)*(BlockMinY*NUM_BLOCK_X + BlockMinX);
    __m128 NodeDepthVec = _mm_set1_ps(Z);
    for (u32 Y = BlockMinY; Y < BlockMaxY; ++Y)
    {
        f32* DepthBlock = RowDepth;
        for (u32 X = BlockMinX; X < BlockMaxX; ++X)
        {
            __m128i Mask = _mm_set1_epi32(0xFFFFFFFF);
            if (X == BlockMinX)
            {
                Mask = _mm_and_si128(Mask, GlobalMinXMask[DiffMinX]);
            }
            if (X == BlockMaxX - 1)
            {
                Mask = _mm_and_si128(Mask, GlobalMaxXMask[DiffMaxX]);
            }
            if (Y == BlockMinY)
            {
                Mask = _mm_and_si128(Mask, GlobalMinYMask[DiffMinY]);
            }
            if (Y == BlockMaxY - 1)
            {
                Mask = _mm_and_si128(Mask, GlobalMaxYMask[DiffMaxY]);
            }

            __m128 CurrDepthVec = _mm_maskload_ps(DepthBlock, Mask);
            __m128 Compared = _mm_cmp_ps(CurrDepthVec, NodeDepthVec, _CMP_GT_OQ);
            Mask = _mm_and_si128(Mask, _mm_castps_si128(Compared));
            _mm_maskstore_ps(DepthBlock, Mask, NodeDepthVec);
            DepthBlock += 4;
        }

        RowDepth += Square(BLOCK_SIZE)*NUM_BLOCK_X;
    }
}

(The min/max masks handle the case where the min/max values fall inside a 2x2 block of pixels: we want to mask out reading/writing the pixels that aren't actually part of our range.) Calculating the block min/max as well as the diff min/max all seems a little much, and I'm not sure there is a more efficient way to do it. I also wanted to take advantage of 8-wide SIMD using AVX, so I figured I would have rows of 4x4 pixels, where each block of 4x4 pixels itself stores 4 blocks of 2x2 pixels.
I'm worried that doing that will add an even larger setup cost to the loop, which for my application would negate most of the benefits. My bottom line is: I want to optimize the process of filling a box of pixels on the screen with a color as much as I can, because my software renderer does it a lot every frame (4 million times currently), and I figured storing pixels in 2x2 blocks would make it faster, but I'm not sure if I'm missing some trick to more quickly calculate which pixels I have to iterate over.
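One small thing I noticed while writing this up: since BLOCK_SIZE is a power of two, the block/diff setup math reduces to shifts and masks (though for unsigned operands the compiler very likely emits this already, so the win may be nothing):

// Setup math from DrawNode, specialized for BLOCK_SIZE == 2.
u32 BlockMinX = MinX >> 1;              // MinX / 2
u32 BlockMinY = MinY >> 1;
u32 DiffMinX  = MinX & 1;               // MinX - 2*(MinX/2)
u32 DiffMinY  = MinY & 1;
u32 DiffMaxX  = MaxX & 1;
u32 DiffMaxY  = MaxY & 1;
u32 BlockMaxX = (MaxX >> 1) + DiffMaxX; // round up when the rect ends mid-block
u32 BlockMaxY = (MaxY >> 1) + DiffMaxY;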
10. Hey guys, I've been coding a project using AVX, and I got to a point where I needed to use the _mm256_cvtss_f32() intrinsic. I have AVX2 enabled in my compile commands, and other AVX instructions compile and run correctly, but when I write _mm256_cvtss_f32(), the compiler tells me the function cannot be found. I made sure I included immintrin.h, and when I go to immintrin.h, the function isn't defined there, but online it says Visual Studio 2013 supports AVX2. Below are the build commands I use for my project; does anyone know how to make Visual Studio 2013 expose all of the AVX intrinsics? Thanks in advance.

@echo off

set CodeDir=..\code
set OutputDir=..\build_win32

set CommonCompilerFlags=-Od -arch:AVX2 -MTd -nologo -fp:fast -fp:except- -Gm- -GR- -EHa- -Zo -Oi -WX -W4 -wd4127 -wd4201 -wd4100 -wd4189 -wd4505 -wd4324 -Z7 -FC
set CommonCompilerFlags=-DOCTREE_DEBUG=1 -DOCTREE_WIN32=1 %CommonCompilerFlags%
set CommonLinkerFlags=-incremental:no -opt:ref user32.lib gdi32.lib Winmm.lib opengl32.lib

IF NOT EXIST %OutputDir% mkdir %OutputDir%
pushd %OutputDir%

del *.pdb > NUL 2> NUL

REM Asset File Builder
cl %CommonCompilerFlags% -D_CRT_SECURE_NO_WARNINGS %CodeDir%\octree_asset_builder.cpp /link %CommonLinkerFlags%

REM For VS 2017, for some reason it needs a path to gl even tho it knows where it is...

REM 64-bit build
echo WAITING FOR PDB > lock.tmp
cl %CommonCompilerFlags% %CodeDir%\octree.cpp -Fmoctree.map -LD /link %CommonLinkerFlags% -incremental:no -opt:ref -PDB:octree_%random%.pdb -EXPORT:GameInit -EXPORT:GameUpdateAndRender -EXPORT:GameProcessDebugData -EXPORT:GameSyncDebugStatePtrs
del lock.tmp
cl %CommonCompilerFlags% %CodeDir%\win32_octree.cpp -Fmwin32_octree.map /link %CommonLinkerFlags%

popd
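(For anyone hitting the same wall: a workaround that only uses intrinsics VS2013 does ship is to cast down to the low 128-bit lane and extract from there. _mm256_castps256_ps128 is a free cast that generates no instruction, and _mm_cvtss_f32 is plain SSE.)

#include <immintrin.h>

// Stand-in for the missing _mm256_cvtss_f32: returns the lowest float lane.
static inline float Cvtss256LowF32(__m256 V)
{
    return _mm_cvtss_f32(_mm256_castps256_ps128(V));
}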
  11. D.V.D

    Camera Matrix and Axis Vectors

That's correct. The matrix multiplication operation does not care what is in the matrices. They might not contain basis vectors at all, but perhaps weightings of how much you like different ice-cream flavours. Regardless of what kind of mathematical convention you're using (whether you're writing your basis vectors horizontally or vertically), your matrix multiplication function will be implemented the same way. However, your 1D-array-storage convention does matter. e.g. if you have float data[16]; and you write data[2], then is that row-0/column-2, or is it row-2/column-0?

That depends on what you mean. Row-major and column-major ordering generally refer to the computer science topic of how you map a 2D array to a 1D array. This is just an internal detail of your math library: how it decides to store the matrix elements in memory. Row-vectors and column-vectors generally refer to the math topic of whether you're writing vectors horizontally or vertically... This choice actually does affect your math (e.g. do you write projection * view, or view * projection). However, the terms "row major" and "column major" also sometimes get used to describe the mathematical conventions... which makes everything pretty confusing. If someone is writing their basis vectors in the rows of a matrix, they might sometimes say that it's a "row major matrix" -- here they're talking about their math, not about computer science arrays :(

Okay, thanks for clarifying!!
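(To pin down the data[2] example from the quote in code, a quick sketch with made-up accessor names for a 4x4 matrix:)

// The same 16 floats, read under the two storage conventions.
float RowMajorAt(const float* Data, int Row, int Col) { return Data[Row*4 + Col]; }
float ColMajorAt(const float* Data, int Row, int Col) { return Data[Col*4 + Row]; }
// Under row-major storage, data[2] is row 0, column 2;
// under column-major storage, it is row 2, column 0.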
  12. D.V.D

    Camera Matrix and Axis Vectors

Yeah, the easiest way that I find to create a view matrix is to construct a "local-to-world" (aka world) matrix as if the camera was an object in the world, and then invert this matrix to get a "world-to-camera" (aka view) matrix. If a 3x3 matrix only contains the three axes, then transposing it is the same as inverting it (and cheaper).

No they don't. You can use column-major maths, which looks on paper like:

$$\begin{bmatrix} Right.x & Up.x & Forward.x & Pos.x \\ Right.y & Up.y & Forward.y & Pos.y \\ Right.z & Up.z & Forward.z & Pos.z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

or row-major maths, which looks on paper like:

$$\begin{bmatrix} Right.x & Right.y & Right.z & 0 \\ Up.x & Up.y & Up.z & 0 \\ Forward.x & Forward.y & Forward.z & 0 \\ Pos.x & Pos.y & Pos.z & 1 \end{bmatrix}$$

And you can use column-major arrays, or row-major arrays. If you use column-major maths with column-major arrays, or if you use row-major maths with row-major arrays, then your array of 16 floats will look like:

Right.x, Right.y, Right.z, 0, Up.x, Up.y, Up.z, 0, Forward.x, Forward.y, Forward.z, 0, Pos.x, Pos.y, Pos.z, 1

If you use column-major maths with row-major arrays, or if you use row-major maths with column-major arrays, then your array of 16 floats will look like:

Right.x, Up.x, Forward.x, Pos.x, Right.y, Up.y, Forward.y, Pos.y, Right.z, Up.z, Forward.z, Pos.z, 0, 0, 0, 1

All four of those choices of conventions are supported by D3D and OpenGL. The mathematical convention alters how you write your math, e.g. whether you write vOut = vIn * projection * view * world, or vOut = world * view * projection * vIn. The array convention alters how you write your matrix library, and whether you write column_major float4x4 myMatrix; or row_major float4x4 myMatrix; in your shader code. If you're using an existing matrix library, then both of these choices may have already been made for you.

Okay, this makes sense. Just to clarify though, are the basis vectors in a row-order matrix the rows, or are they always the columns? I found a blog post on the ryg blog that talks about matrix ordering, and he says that whatever algorithm you use for matrix multiplication, it doesn't depend on the ordering of the matrices. Currently, I think of matrix ordering as: you write matrices a certain way, and the columns are always the basis vectors, but you can choose to store things such that either rows or columns are sequential in memory.

This makes some sense; I'll go over it a bit to better understand it. In the videos, though, matrix multiplication is not explained as dot products (as it usually is in other resources), since the lectures explain matrices more as a change of basis vectors and as linear transformations. I know it's equivalent, but the matrix multiplication formula in the videos is easier to understand, though it requires knowing what your basis vectors are. If B is some matrix that A can multiply with, and B has n basis vectors, then matrix multiplication is defined as:

$$A B = \begin{bmatrix} A\,Basis_0 \mid A\,Basis_1 \mid \cdots \mid A\,Basis_n \end{bmatrix}$$

where each result of A times the i-th basis vector becomes the i-th column of the resulting matrix. Then you can decompose matrix-vector multiplication into each component of the vector times the corresponding basis vector of the matrix, adding all of those results together.
Basically, it makes the code become something super simple, like this:

inline v4 operator*(m4 A, v4 B)
{
    v4 Result = {};
    // NOTE: Sum of each component of B times the corresponding column (basis) of A
    Result = B.x*A.v[0] + B.y*A.v[1] + B.z*A.v[2] + B.w*A.v[3];
    return Result;
}

inline m4 operator*(m4 A, m4 B)
{
    m4 Result = {};
    Result.v[0] = A*B.v[0];
    Result.v[1] = A*B.v[1];
    Result.v[2] = A*B.v[2];
    Result.v[3] = A*B.v[3];
    return Result;
}

(v[0-3] are the columns, or basis vectors, stored in the matrix.) This probably isn't the most efficient code for matrix multiplication, but it's conceptually easy to understand, and it's not as complicated as other code which has a ton of inner loops and whatnot.
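(A quick usage sketch of that operator, with V4 and M4Columns as hypothetical constructors: a 90-degree rotation about z, built from its basis columns, should send the x axis to the y axis.)

// Each column is the image of one basis vector under the transform.
m4 RotZ90 = M4Columns(V4( 0, 1, 0, 0),   // x axis maps to +y
                      V4(-1, 0, 0, 0),   // y axis maps to -x
                      V4( 0, 0, 1, 0),   // z axis unchanged
                      V4( 0, 0, 0, 1));  // w column: no translation
v4 Result = RotZ90 * V4(1, 0, 0, 0);     // = (0, 1, 0, 0)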
  13. D.V.D

    Camera Matrix and Axis Vectors

Oh okay, so the reason the view matrix isn't what I expected is that we are trying to perform the opposite of the rotations applied to the camera, and that happens to be the transpose of the 3x3 rotation matrix? So if the camera's view is rotated to the left by 90 degrees, the view matrix will contain a rotation by -90 degrees instead, right?
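(Writing out the single-axis case convinces me that the transpose of a rotation is the opposite rotation:)

$$R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad R(\theta)^T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = R(-\theta)$$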
14. Hey guys, I've been watching 3blue1brown's video series on linear algebra, and I decided to try to implement matrices and vectors myself instead of blindly following tutorial code without really understanding it. I've run into a problem with my camera matrix, specifically the rotation part of it. I'm working with column-ordered matrices and a left-handed coordinate system.

In the videos, he explains that matrices are just a set of new basis vectors which define the new x,y,z,... axes for any vector multiplied by that matrix. As I understand it, the camera matrix (if the camera is at the origin) should transform a vector such that its x axis is the camera's horizontal vector, its y axis is the camera's up vector, and its z axis is the camera's target vector. But everywhere I look, they say that for column-ordered matrices, the first column should be [Horizontal.x, Up.x, Target.x, 0], the second column should be [Horizontal.y, Up.y, Target.y, 0], and so on. The videos say that the columns of a matrix are the new axis vectors, so that would mean the camera matrix transforms a vector such that its new x axis is the x components of the horizontal, up, and target vectors, its new y axis is the y components, and so on.

My question is: how does that make sense? Shouldn't the new axis vectors be Horizontal, Up, and Target instead?
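(The resolution from the replies earlier in this list, in sketch form; m3, Columns, and Transpose are hypothetical helpers. The matrix whose columns are Horizontal/Up/Target is the camera-to-world matrix, and the view matrix is its inverse, which for a pure rotation is the transpose:)

m3 CameraToWorld = Columns(Horizontal, Up, Target); // basis vectors as columns
m3 View = Transpose(CameraToWorld);                 // inverse of a pure rotation
// Transposing is what puts [Horizontal.x, Up.x, Target.x] into View's
// first column, which is the layout quoted everywhere.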
15. Hey guys, I've been trying to set up a batch file that builds a native activity into an apk which I can then run and debug in Visual Studio 2015. I managed to get the apk built and signed properly, but whenever I try to debug it with Visual Studio, I get the following error:

"Unable to start debugging. Android command run-as failed. Package com.example.native_activity is not debuggable."

The app gets installed just fine on the emulator, and it runs properly on one of the two emulators that I tried. However, in both cases, I can't actually debug the apk that I built, and I tried setting everything to debug that I could, but it still doesn't work. The code I'm using for my app is the example code for a native activity from Google: http://brian.io/android-ndk-r10c-docs/Programmers_Guide/html/md_2__samples_sample--nativeactivity.html

My AndroidManifest.xml does have debuggable set to true:

<?xml version="1.0" encoding="utf-8"?>
<!-- BEGIN_INCLUDE(manifest) -->
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.native_activity"
          android:versionCode="1"
          android:versionName="1.0"
          android:debuggable="true">

    <!-- This is the platform API where NativeActivity was introduced. -->
    <uses-sdk android:minSdkVersion="9" />

    <!-- This .apk has no Java code itself, so set hasCode to false. -->
    <application android:label="@string/app_name" android:hasCode="false">

        <!-- Our activity is the built-in NativeActivity framework class.
             This will take care of integrating with our NDK code. -->
        <activity android:name="android.app.NativeActivity"
                  android:label="@string/app_name"
                  android:configChanges="orientation|keyboardHidden">
            <!-- Tell NativeActivity the name of our .so -->
            <meta-data android:name="android.app.lib_name"
                       android:value="native-activity" />
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>
</manifest>
<!-- END_INCLUDE(manifest) -->

I'm using the android_native_app_glue static lib, which I'm not sure is set to debuggable. I tried to build it myself (which worked), but I don't know how to link the version that I built with my native app code (I think it still links with the lib that's given by the NDK).
This is what my build.bat looks like:

@echo off

set CodeDir=W:\untitled\code
set OutputDir=W:\untitled\build_android
set AndroidDir=%ProgramFiles(x86)%\Android\android-sdk
set AndroidCmdDir=%AndroidDir%\build-tools\21.1.2
set GlueDir=W:/untitled/code/android/glue

call ndk-build -B NDK_DEBUG=1 APP_BUILD_SCRIPT=%GlueDir%\Android.mk NDK_APPLICATION_MK=%GlueDir%\Application.mk -C %GlueDir% NDK_PROJECT_PATH=%GlueDir% NDK_LIBS_OUT=%OutputDir%\lib NDK_OUT=%OutputDir%\obj
call ndk-build -B NDK_DEBUG=1 APP_BUILD_SCRIPT=%CodeDir%\android\Android.mk NDK_APPLICATION_MK=%CodeDir%\android\Application.mk -C %CodeDir%\android NDK_PROJECT_PATH=%CodeDir%\android NDK_LIBS_OUT=%OutputDir%\lib NDK_OUT=%OutputDir%\obj

REM Create Keystore for signing our apk
REM call keytool -genkey -v -keystore %OutputDir%\debug.keystore -storepass android -alias androiddebugkey -dname "filled in with relevant info" -keyalg RSA -keysize 2048 -validity 20000

pushd %OutputDir%
del *.apk >NUL 2> NUL
popd

REM Create APK file
call "%AndroidCmdDir%\aapt" package -v -f -M %CodeDir%\android\AndroidManifest.xml -S %CodeDir%\android\res -I "%AndroidDir%/platforms/android-19/android.jar" -F %OutputDir%\AndroidTest.unsigned.apk %OutputDir%
call "%AndroidCmdDir%\aapt" add W:\untitled\build_android\AndroidTest.unsigned.apk W:\untitled\build_android\lib\x86\libnative-activity.so

REM Sign the apk with our keystore
call jarsigner -sigalg SHA1withRSA -digestalg SHA1 -storepass android -keypass android -keystore %OutputDir%\debug.keystore -signedjar %OutputDir%\AndroidTest.signed.apk %OutputDir%\AndroidTest.unsigned.apk androiddebugkey

"%AndroidCmdDir%\zipalign" -v 4 %OutputDir%\AndroidTest.signed.apk %OutputDir%\AndroidTest.aligned.apk

The debug key already exists; I just don't recreate it on every build, which is why that line is commented out. The first ndk-build builds the native_app_glue, while the second one builds the native-activity. The Android.mk for the glue is the same as the one provided in the NDK, with no changes, and its Application.mk is the same as the one I use for the native-activity. This is what my Android.mk and Application.mk look like for the native activity:

LOCAL_PATH := $(call my-dir)

include $(CLEAR_VARS)
LOCAL_MODULE    := native-activity
LOCAL_SRC_FILES := main.c
LOCAL_LDLIBS    := -llog -landroid -lEGL -lGLESv1_CM
LOCAL_STATIC_LIBRARIES := android_native_app_glue
include $(BUILD_SHARED_LIBRARY)

$(call import-module,android/native_app_glue)

APP_ABI := x86
APP_PLATFORM := android-9

I looked online, and one suggested way to check that your apk is debuggable is to unzip it and see if the lib folder has the gdbserver files. I did that for mine, and the gdbserver files were there, so I'm not sure why my apk is not debuggable. Is it because I'm not properly linking with my own version of the native glue, and if so, how do I make my makefile link with my version of the native glue instead of the default provided by the NDK?