
Erik Rufelt

Member Since 17 Apr 2002
Offline Last Active Jun 15 2016 08:30 AM

#5212416 what would make vc text unreadable.?

Posted on 23 February 2015 - 03:41 AM

Also check the zoom level in Visual Studio. Holding the Ctrl key and using the mouse wheel in a document changes it, so I often change it by mistake.

#5212275 Eliminating texture seams at terrain chunk

Posted on 22 February 2015 - 09:34 AM

So are these border pixels part of the pixel data, or are they separate properties that can be set in OpenGL? For example, this texture chunk is 1024x1024. If adding a border increases the dimensions, suppose it's a 4-pixel border, wouldn't that make it 1028x1028 and violate the power-of-two requirement? It's interesting though; I'll look it up more later.


It would, so you have to let the usable part of the texture be 1016x1016 to add 4-pixel borders at each edge. This usually isn't a problem: with bilinear filtering and scaling it won't look any different anyway. Just set the texture coordinates to [4/1024, 1020/1024] instead of [0, 1].

#5212267 Eliminating texture seams at terrain chunk

Posted on 22 February 2015 - 08:47 AM

If you want perfect tiling between the right side of one texture and the left side of another, separate texture, then you need to add borders of 2^N pixels, where N is the number of mip levels not counting the full-sized texture, and fill this border with the edge pixels of the texture it's supposed to tile into, to make sure all N mip levels blend properly.

You probably don't need more than 4- or 8-pixel borders, as the seams won't be very noticeable in the smaller mips.


If you want to improve performance or texture usage, I would recommend packing the chunk textures into groups that form larger textures, so the chunks sharing a group effectively form a smaller square terrain covered by a single texture, and only border those group textures.

#5211998 Writing my own Depth Buffer Functionality.

Posted on 20 February 2015 - 03:47 PM

I would recommend against it if you already have an OpenGL skeleton set up for it.


You need a buffer that you can both write to and read from manually in the shader: an unordered access view of a 2D surface. Calculate the depth manually in the pixel shader, compare it to what is currently in the surface, and either discard the pixel or output its color. https://msdn.microsoft.com/en-us/library/windows/desktop/ff476335(v=vs.85).aspx has some info.

You probably want https://msdn.microsoft.com/en-us/library/windows/desktop/ff471409%28v=vs.85%29.aspx to compare and set the depth atomically.

#5209542 Sky box rendering - Any downside to depthfunc LESS_EQUAL vs LESS?

Posted on 09 February 2015 - 01:40 AM

2. Possible z-fighting caused by EQUAL rather than LESS? I.e., 2 objects at similar depth rendered at a distance - too infrequent to consider, or is it?


This one should be statistically irrelevant. If the depth can miss one depth unit in one direction and cause fighting at all, it can just as well miss one in the other direction, and if the misses are small enough for the depth state to make any difference at all, then LESS_EQUAL would simply favor pixels of the last drawn triangle while LESS would favor the first drawn. I would guess it's difficult to even measure a difference, unless you expect to draw past the far plane without clipping (DepthClipEnable set to false in the rasterizer state).

#5208885 Next-Gen OpenGL To Be Shown Off Next Month

Posted on 05 February 2015 - 08:51 AM

So, if GLNext comes out soon and is technically equal to Mantle, then maybe it will end up with the first-strike bonus and kill off Mantle before it's even born.


Didn't AMD offer Mantle as a base for GLNext?

One possibility is that AMD simply settled with NV and Intel and merged with GLNext instead of trying to compete with them.

Or Mantle will be to GL/D3D what CUDA is to CS/CL for Nvidia.

#5208404 Is it possible to create some sort of "custom" stencil buffer?

Posted on 03 February 2015 - 08:50 AM


Check the bottom of that page about unordered access buffers; it seems like it would do what you want.

#5208129 Comparison of STL list performance

Posted on 01 February 2015 - 08:50 PM

I think there would be more allocations/deallocations in the C++ version due to the copy constructor (if the compiler doesn't optimize that out).


In your custom implementation, you first malloc() the 'Widget' and then malloc() a 'ListLink' to store it in: 2 allocations for each node.

std::list will create an internal node type of size 'sizeof(Widget) + sizeof(ListLink)' and do one single allocation per node.

That's why you saw the same performance when you changed the std::list to store pointers allocated with new: that version also had 2 allocations per node, one for 'new Widget' and one for the list node of 'sizeof(pointer) + sizeof(ListLink)'.


Try timing a loop with 10 million allocations alone, without any list, and I think you will find it's not far from the time of the allocation step in your list test.


If you want to count the number of allocations for a C++ implementation, you can override operator new and increment a counter before calling malloc from it.

int g_countNew = 0;
int g_countDelete = 0;

void* operator new(size_t size) {
  ++g_countNew;
  return malloc(size);
}

void operator delete(void *mem) {
  ++g_countDelete;
  free(mem);
}
#5207979 HLSL d3dcompiler_47 problems

Posted on 31 January 2015 - 07:30 PM

How do you know it's not optimizing it?

Did you actually benchmark the different methods, and if so what GPU and driver were you using?

Any output from the HLSL compiler is a form of platform-independent ASM, and I'm pretty sure the driver will rearrange it in all kinds of ways depending on the underlying architecture.

#5207807 Comparison of STL list performance

Posted on 31 January 2015 - 12:25 AM

I certainly don't mean to contradict any of the good points raised for real applications, but the particular synthetic "benchmark" posted is much, much simpler, and can be pretty nicely explained by comparing the running times of the following for-loops.

for(i=0; i < 1000*1000*10; i++) {
  delete (new int);
}

for(i=0; i < 1000*1000*10; i++) {
  delete (new int);
  delete (new int); // two allocations per iteration, as in the custom list
}
#5207618 Comparison of STL list performance

Posted on 30 January 2015 - 12:15 AM

Investigate the ASM output to find out.

I suspect you're actually measuring the time to do 2 million mallocs in your C code compared to ~1 million for std::list.

#5207417 Efficient way to erase an element from std::vector

Posted on 29 January 2015 - 06:44 AM


Wouldn't it be enough to test for std::is_trivially_copyable? Requiring the whole, more restrictive POD property seems unnecessary.

Yep. I forgot that C++'s definition of POD is now stupidly restrictive. I usually use "POD" to mean "memcpy-safe", which apparently C++ calls "trivially copyable" now.



I would even say is_trivially_destructible. Though I'm not entirely sure about that one, and for performance reasons I don't think it can even be determined from such traits, as it would depend on the type's internals. But for the general complexity it's interesting.


As I see it, swap is always better than copy/assign (trivially or not) when there are parts of the object that need to be destructed: otherwise any non-trivial original contents of [index] would be destructed before being replaced with their new values from back(), and then back() would be destructed at some point (though that could of course at times be optimized into a copy in the assignment; this is the worst case).

When swapping out the object's internals, no potential extra destruction can take place.


However, there is another case where swap is worse: when there is no destructor, so the object removed at back() never needs to be destructed. Then it is completely unnecessary to swap the contents instead of just copying them (if copying would be faster, which may or may not be the case, of course). For a trivially copyable object it's obvious: one memcpy instead of a swap.


What I was originally wondering was whether there is any chance the compiler can assume that the pop_back() leaves the memory at back() undefined and therefore optimize away the swap, replacing it with a memcpy when appropriate. If destruction of an object means the contents of the memory it occupied are undefined afterwards, it could theoretically, and perhaps even realistically in simple inlined cases like this. I'm not sure that is actually a rule, though?

#5205890 How to properly switch to fullscreen with DXGI?

Posted on 21 January 2015 - 06:42 PM

I am talking about windowed-mode switches. Let's say you want 720p and have a 768p monitor. No problem. But now if you switch to 768p, because of your window border and probably non-zero window position, the bottom-right corner of the window is off-screen. This wouldn't be a problem except for the Windows API: once a sizable window goes out of bounds, the Win API will start to clip it. Asking for a 900p window, for example, will clip it randomly to a value that is neither 768p nor 900p. Even asking for 768p will clip it to a bit below. This only happens if the border is sizable, and I found no way around it. For frameless windows it behaves as expected.


You are really doing things you shouldn't be doing.

If you want a 720p front buffer on a 768p monitor, then you're not in fullscreen in the first place. In that case, yes, covering the area you want with a window is a reasonable option, but then most certainly disable DXGI fullscreen. And again, the swap chain is not and should not be in fullscreen then.

One simple option in that case is to simply set the viewport to a 720p area on a normal 768p fullscreen back buffer, to avoid the problem.


You seem to be doing some pretty weird things that make matters much more complicated than they need to be.

#5205659 How to properly switch to fullscreen with DXGI?

Posted on 20 January 2015 - 06:37 PM

If you want DXGI to handle your fullscreen switches for you, don't touch the window. If you want to change window parameters, disable DXGI handling using MakeWindowAssociation and handle it yourself. When DXGI monitors your window it will automatically handle changes to the window as well as change the window when needed. Two separate programs (yours and DXGI) trying to simultaneously handle the same window is asking for trouble and sync issues.


I'm not sure what you mean by switching to larger modes; if you let DXGI handle your window and change to an actual enumerated mode of an output, it should automatically adjust the window so it covers that output. If you make the window cover more area than the monitor and want the off-screen parts drawn to (or areas on another monitor), then you don't want fullscreen mode (and can't really have it, as it is defined as exactly the area of one monitor).


As far as I can tell, the window itself doesn't really matter in true fullscreen, and you can get pretty interesting issues by resizing the window to a smaller size than the screen while still drawing in fullscreen. What happens is that the screen will still be filled without clipping to the window, and any background windows will look funny at best afterwards.

#5205187 How to properly switch to fullscreen with DXGI?

Posted on 18 January 2015 - 07:55 PM

The application starts correctly, but first Alt-Enter leaves me with a nonfunctional window. Second crashes.

It's better to start in window mode and then switch to fullscreen. It's discussed under Remarks here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb174537%28v=vs.85%29.aspx


I updated the sample to allow starting in fullscreen, and changed the back-buffer format to "unknown", which is probably better if there are different formats (like 30-bit support in some modes). Check the link below for the new code.


@Erik Rufelt: Can you throw that code into a GitHub / Codeplex repository? It would be great to have a few samples to point at when questions like these come up (which honestly seems to be more frequent than expected...).