
Ohforf sake

Member Since 04 Mar 2008
Offline Last Active Today, 04:05 AM

#5147332 Great laptop for game development? (Budget of +1200,- euros)

Posted by Ohforf sake on Today, 03:53 AM

First off I'd like to encourage Tjakka to rethink the choice of going to a specialized school for game development instead of getting a computer science or computer engineering degree from a university. I'm not saying that a specialized school is necessarily a bad idea, but I think a degree from a university keeps more doors open.

I'm probably going to get a bunch of downvotes for sharing a perfectly honest and valid opinion, but, I just really cannot take seriously someone that develops on nothing but a laptop. I don't even know how you physically manage to deal with lugging such a thing around all day or dealing with the ridiculous keyboard.

I have been developing my private stuff on nothing but a laptop for the last couple of years. I actually switched from desktop PCs to notebooks and I'm very happy with them.

You should put a regular keyboard, mouse and screen on your desk, which eliminates the keyboard and mouse troubles and gives you a regular two-monitor setup when at home. On the road, you want at least a small notebook mouse and a notebook with a reasonable keyboard. The notebook keyboard should have a numpad, because otherwise the Home, End, PageUp, PageDown and cursor keys are probably inaccessible, and they are absolutely necessary for writing code. With that setup you can be just as efficient as with a desktop PC.

The weight can be a problem for 17 inch notebooks, but even at 15.6 inches I would bet that the printed scripts for the lectures weigh more than the notebook. I used to carry a 17 and later a 15.6 inch notebook to university every day and it wasn't a big deal.

Also, when on the road, don't forget to pack the two most important tools of every software engineer: A piece of paper and a pen!

As to the question if there is any real benefit to a notebook as opposed to a desktop PC:
If your work requires you to be hooked up to the company network, a bunch of console devkits or an industrial robot etc. then the mobility of a notebook won't do you any good, that is correct. But for a student the situation is different.
As a student you go to lectures, and either they are important, or it's a bunch of stuff you already know, or the guy giving the lecture doesn't know shit. All three cases can and will happen, and in the latter two you want your programming gear around so you can use the time to educate yourself. The same goes for empty timeslots between lectures, which are also very common at universities.
When you have attended all your lectures for the day and go home, you probably have some exercises to do, or hobby projects to work on. Maybe you don't want to do them at your own desk at home, but rather in a park, or while visiting your parents. Or maybe the exercise requires you to join forces with a fellow student, so you meet at his place or in the library. In every such case the mobility of a notebook is a very big plus.

#5146507 512x384 15 Layers 30Million Double Precision Calculation Software Blitting wi...

Posted by Ohforf sake on 12 April 2014 - 07:04 AM

I used to own a GeForce 7700M and I'm very sure it was faster than that (512x384, 15 layers @ 40 FPS).

Also let me add another voice to what other professional game developers have already told you: there is no secret NDA license that magically turns shitty code into something that runs several orders of magnitude faster. I have worked on a game that shipped for the PC platform, and while we had "secret" NDAs and tools for the consoles, there is no such thing for the PC.


It's not the GPU/driver thats slow, it's your code (or your profiling)!


It's not just that you seemingly try to write the worst GPU code possible, for example by forcing all the pixel operations to be performed in the vertex shader instead of the pixel shader, by packing every scalar into its own vec4 to enforce 1/4 speed on the GF7 hardware, or by abusing the driver/API in a way that maximises stalls.

You also write very slow CPU code. Are you blaming that on NVidia as well? For example: you emphasize that you have a processor with SSE2. Are you using any of the SSE2 functionality? No. But I'm glad that you get 30 million double precision ... somethings ... on the CPU. Pro tip: double precision means that you employ the datatype "double", not the datatypes "float" or "int" or "char".

#5144748 Globals

Posted by Ohforf sake on 06 April 2014 - 10:06 AM

Multithreading with globals is also a very dangerous thing.


If all your data resides in isolated blobs of object instances, you can be very sure that you can churn through each blob in a separate thread without any risk of race conditions. But if you use globals, you have that single data point that is accessed by all blobs, which prevents you from easily parallelizing stuff.

#5142333 GPU NOR/NAND Gate using Fragment Shader's Dot Product

Posted by Ohforf sake on 26 March 2014 - 10:08 AM

This is interesting, but don't modern graphics cards (= the ones you would use for number crunching) already have unified FP/integer ALUs where bitwise operations can be done natively?

Actually, NVidia cards are rather slow at integer arithmetic. According to the CUDA documentation, a Kepler SMX can do 192 floating point operations (like add, mul or mad) per cycle, but only 160 integer add/sub and bitwise and/or/xor etc. Integer mul and shift are as slow as 32 operations per cycle.

This is why ATI/AMD cards are better suited for cryptographic stuff like bitcoin mining or brute-forcing.

I didn't really read the links the OP posted, so the following might be totally off topic, but I think there is a misconception here about DP4. The GeForce 7000 series was the last NVidia GPU that did SIMD, and AMD/ATI followed shortly after. Today, a DP4 is 1 FMUL followed by 3 dependent FMADDs. So it's not 1 cycle: it has a throughput of 1/4 per cycle and ALU if properly pipelined, and a latency of 32 cycles (assuming 8 cycles per operation). So 192 float ALUs with 1 DP4 every 4 cycles yield 48 logical operations per cycle and SMX. If the 160 int ALUs were used instead, you would get 32 logical operations per ALU and cycle, yielding 5120 logical operations per cycle and SMX, outperforming the DP4 approach by more than a factor of 100.

Edit: Just read the first part of the link and I think there is another, even bigger misconception. The assumption that the GPU will execute the entire fragment program for all pixels of the image in ONE CYCLE, no matter the dimensions of said image or the length of the fragment program, is ... how do I put this ... incorrect. If it were the case, then yes, any GPU could emulate hardware gates in software at arbitrary speeds as described in the posted link, thereby outperforming even its own hardware (paradox alert). But it isn't.

#5140269 Array of samplers vs. texture array

Posted by Ohforf sake on 19 March 2014 - 03:30 AM

I'm not sure but I think Kepler can genuinely "address" textures.


Here is a talk which (amongst other things) is about how arrays and bindless can be used together. Apparently there is a small overhead for bindless textures due to a "Texture header cache".



#5139197 Depth pre-pass: does OpenGL still execute fragment shader if depth not writte...

Posted by Ohforf sake on 15 March 2014 - 04:59 AM

Should I then have a version of my shader program that has only a vertex shader and no fragment shader? [...] The downside of doing a separate shader program for the depth pre-pass is that I'd have to do it for every shader program that has a different vertex shader.

I think you should be able to reuse the shaders for the shadow map generation, so you don't need an additional version (or rather, you need it anyways for the shadows).

As to whether or not you should use separate versions in the first place: I don't have actual numbers, but a vertex program reads a lot of attributes (texture coordinates, normals, tangents, ...) that you don't need for shadow maps/z pre-pass. The driver might be able to detect that the fragment program can be disabled, and that it doesn't need the interpolants for those attributes, but my guess is that it won't recompile the vertex program to strip out all the unnecessary attribute reading and (possibly) transforming.
Also, most shaders only vary in the fragment part, and using a separate shader for the shadow map/z pre-pass should allow the renderer to issue fewer shader program changes or even merge entire draw calls.
So I would expect to see a performance speedup from separate shader versions, but again: I don't have any real numbers to prove it.

#5134389 disasembly of some function

Posted by Ohforf sake on 25 February 2014 - 05:16 AM

The big numbers starting with 004 are the memory addresses within the code segment. Note that code can be relocated upon loading.

The smaller numbers like 34DD and 33B6 are offsets. E8 is a call with a relative offset, so those numbers get added to the address of the instruction following the call instruction to get the address of the target of the call.


What does the dot after the opcode mean? No clue :-/


The @16? That is a byproduct of symbol decoration. For __stdcall functions the compiler appends an @ followed by the number of bytes of parameters, so @16 means the function takes 16 bytes of arguments. On top of that, C++ does name mangling: you can have several functions with the same name which only differ in their parameters (or, in the case of methods, in the classes they belong to), and in order to distinguish those, the compiler adds some magic numbers and characters to the function names, usually based on the number and types of the parameters. The specifics however are different for each compiler.

#5132590 inquisitive mind vs cache

Posted by Ohforf sake on 19 February 2014 - 04:44 AM

Since you are interested in latencies (I presume), you should build your benchmark around pointer chasing, since that prevents any form of out-of-order execution (OoO) from hiding the memory latency.


Essentially you specifically craft an array of integers and then you do the following:

unsigned index = 0;
for (unsigned iteration = 0; iteration < numIterations; iteration++) {
    index = array[index];
}
// do something with index so the compiler doesn't remove the loop

If you craft the array in a way such that array[i] = i+n, then you get sequential access with stride n. But you can also create all kinds of other semi-random patterns. Note that modern prefetchers can detect the stride of sequential accesses, so you might need something more random to throw them off.


Since the compiler doesn't know anything about the array, it can't optimize anything away. And the only thing the CPU can run in parallel is the increment and comparison from the for loop (which is totally fine). But it has to perform all those loads in sequence in order to arrive at the final index.


#5128490 hand-linking to stdlib.h in mingw

Posted by Ohforf sake on 03 February 2014 - 12:19 PM

Since you are compiling it as C++, try declaring the C functions as actual C functions, like this:

extern "C" {
int rand();
// whatever else
}

Otherwise the C++ name mangling will prevent the linker from finding the correct function.



I guess that you are doing this to reduce compilation times? If so, mingw supports precompiled headers, which are probably a better way to deal with this than rewriting the standard headers.

#5128131 Polygon based terrain LOD

Posted by Ohforf sake on 02 February 2014 - 05:03 AM

Since you start out with voxel data, have you considered this modified marching cubes algorithm that allows for LODing without cracks?



#5125621 Position from Depth

Posted by Ohforf sake on 22 January 2014 - 05:39 AM

In the last shader (depth reconstruction), you are missing the division by w again. You can only skip that if you know that w is 1.0. This is usually the case for rotation, scale and translation but not for projection. It should look like this:

VS_OUTPUT vertexShader(VS_INPUT input)
{
    VS_OUTPUT output;

    float4 clipSpacePos = float4(input.position.x, input.position.y, 1, 1); // set zw to 1 for assurance
    output.position = clipSpacePos - pixel_size;

    float4 projectiveViewPos = mul(clipSpacePos, InvProjection);
    output.viewpos = projectiveViewPos.xyz / projectiveViewPos.w;

    return output;
}


Apart from that I can spot no other problems with this. Maybe you can give this a shot.

#5125046 Position from Depth

Posted by Ohforf sake on 20 January 2014 - 07:46 AM

So apart from the minor differences, it's also off by a factor of 100?

Can you please post the entire shader code with all declarations for both the z pass and the position reconstruction pass?

#5124347 Position from Depth

Posted by Ohforf sake on 17 January 2014 - 04:34 AM

Just scaling the eye space position by a constant factor should suffice. You should end up with an image like this:


As you can see, everything close to the camera (the origin of the eye space coordinate system) is black, everything to the right is red, everything to the top is green and everything in the back is blue, assuming you are using the DirectX convention of a left handed coordinate system with the camera looking at +z. If not, then you should get the same image without any blue.

If scaling by a factor doesn't work for your position map, then the coordinates in the position map are probably not eye space. They could be world space, or eye space with the camera translation still included.

Maybe you can post your images for the position map and the position reconstruction so we can take a look at them.

#5123819 Position from Depth

Posted by Ohforf sake on 15 January 2014 - 03:16 AM

What I don't understand is why there are two different maths, one for point lights and spotlights and another for directional lights; it sounds silly to ask if there is a different math when using it for ssao; isn't computing the position independent of what you need it for ?

The primary difference is that pointlights and spotlights are local effects, confined to a specific region which is usually defined by a mesh (sphere or cone), and when you render that mesh, you need to extract the view ray from its surface. Directional lights however affect the entire screen and thus work with a full screen quad. Hence the slightly different approach in the view ray computation.
In that sense, SSAO falls into the "directional light category", because it affects the entire screen.

input.position is the vertices that make up the full screen quad (-1.0 to 1.0)

As I stated above, please post the exact values. As far as I understand your code, it should be 4 vectors with 4 components each:
(-1.0 -1.0 1.0 1.0)
( 1.0 -1.0 1.0 1.0)
( 1.0 1.0 1.0 1.0)
(-1.0 1.0 1.0 1.0)

Are you familiar with the difference between euclidean coordinates and projective coordinates?

For testing this stuff it is extremely helpful to disable the SSAO effect and output the reconstructed position as color values.

#5123616 Position from Depth

Posted by Ohforf sake on 14 January 2014 - 11:57 AM

What exactly (which values) is input.position? It should be (x, y, 1.0, 1.0) with x, y either -1.0 or 1.0.


Are you sure that the shader does:

input.vpos.xyz == output.vpos.xyz / output.vpos.w

I might be wrong but I think what happens is actually:

input.vpos.xyzw == output.vpos.xyzw

so you are probably missing the division by w.


Also, are you sure that the values in the DepthTexture are linear?