Jump to content

  • Log In with Google      Sign In   
  • Create Account


Member Since 13 Jan 2007
Offline Last Active Today, 07:35 PM

#5275796 MSVC generating much slower code compared to GCC

Posted by Ameise on 15 February 2016 - 02:03 PM

 __restrict cannot be used on references though, for whatever reason. And even though I read an explanation on those forums that references are already thought of to not be able to be rebound, testing it with assembler showed significant worse generated code than with pointers and __restrict (or whatever equivalent).


__restrict can be used on references as of Visual C++ 2015. It was one of the things that I was glad that they added. You also can functionally specify member functions as restrict in 2015 (where this is restrict).

#5275439 MSVC generating much slower code compared to GCC

Posted by Ameise on 12 February 2016 - 12:52 PM

The first version calls std::vector<>::size() every iteration. The second does so only once and stores the value in a local variable.


I've seen this come up in profiles before (and have corrected it). Visual C++, at least, has a lot of difficulty with this situation. It treats m_sum practically as a global variable for purposes of optimization in this case, so every single operation done on it in that loop incurs a load-hit-store. In the latter, the compiler knows very well that sum is local to just the function and thus generally just performs the operations on a register and stores it at the end.

This can be hugely faster in some cases, depending on what you're doing. While this is bad on x86, on the PPC consoles (like the 360 or the PS3) where LHS were death, this caused major slowdowns.

#5219390 Why didn't somebody tell me?

Posted by Ameise on 26 March 2015 - 12:30 PM

I like to talk (and text message) in fake olde english from time to time, for my own amusement.


Apparently, nobody bothered to correct my mispronouncing of the word ' ye '. It's the exact same word, and pronunciation, of the word "the"!


I have always pronounced it like " yee ", but it's pronounced closer to " duh ", because it's not actually the letter 'y' but a special letter we dropped from the english alphabet, which was a single-letter for 'th' combined (like in 'there').


Unless you're referring to the pronoun ye, which was also the Early Modern English second person plural... as in, "I am talking to ye five gentlemen!".


Also, yes, the y was used in those cases because typesets lacked þorn, so they used y.  There was also another character, eð, which was similar.


*Ye*, the article, is pronounced *the*. *Ye*, the pronoun, is pronounced *ye*.

#5219139 Why didn't somebody tell me?

Posted by Ameise on 25 March 2015 - 01:47 PM

I called duct tape duck tape for years without really wondering why it was called that.


Because it is duck tape. It originally was a tape made using duck cloth. It should not be used on ducts. Duck tape was also the original name.

#5217815 How 3D engines were made in the 80's and 90's without using any graph...

Posted by Ameise on 19 March 2015 - 11:36 PM

Something that is worth noting is that computers at that time were 16-bit (or MS DOS at least).

This means that only 64kb could be addressed via virtual memory at once.

Today we can allocate a buffer of 640x480x32bpp (or 1920x1080 if you want), store the address in a pointer and write to all those pixels in a for loop. Even 1920x1080x32bpp is a mere 7.91MB. Easy piesy.

With a 16-bit machine, you had to access video memory in 64kb chunks; paging in and out what you needed. And be sure you had access to the video memory region you needed and left enough space to address the data you needed for yourself. Oh! and while you do all that, make sure to switch those 64kb chunks as less often as possible.


​Not always true; you could and often did use DOS extenders (like DOS4/G) which allowed your program to execute in protected mode, giving you 32-bit addressing. Even on the 286, you had four segment registers, so you could access 256KiB of memory at a time.

​Watcom, for instance, included an extender.

#5216162 Forward+ vs Deferred rendering

Posted by Ameise on 12 March 2015 - 05:03 PM

There was also this variant of clustered rendering written about on this forum.


I would actually love to see a benchmark between all variants.

#5214043 null a vector of pointers in a parameter?

Posted by Ameise on 02 March 2015 - 04:17 PM

If an empty container would work, you can now do this:

foo( {} );


Which will send an empty array down.

#5213428 Best way of ensuring semantic type consistency?

Posted by Ameise on 27 February 2015 - 06:49 PM

I use the user-defined literal approach (I refer to them as Concrete Types), but with more changes. Mainly, I have dimensional units - that is, 'time' would have the units be 0 for distance, 0 for mass, 1 for time.


You'd see something like this as the declaration:


template <typename T, uint dimLength, uint dimMass, uint dimTime, ...etc...> class concrete_type...

I also have more significant work involved within the concrete_type class to enforce type safety and make it more difficult to accidentally assign values that don't make sense.


The nice thing about it is that:


time val1 = 1_s;
length val2 = 1_m;

auto val3 = val2 / val1; // decltype(val3) == speed.

float val4 = val2 / val1; // error
time val5 = val2 / vall; // error
float val6 = val3; // error
float val6 = val3.raw(); // OK

#5196172 Default font texts from windows

Posted by Ameise on 03 December 2014 - 08:12 PM

I'm a fan of the Liberation Fonts.

#5168965 Funniest line of code ever ?

Posted by Ameise on 24 July 2014 - 02:10 PM


When I was trying to teach somebody to code they wanted to use 'or' to randomly pick between two numbers
value = 1 or 2
Its understandable how a non programmer would expect this to work, so I didn't think he was dumb for doing it, but I still found it funny.

I remember trying to do the same thing myself when I was learning c++.



You should be able to do something like that with variadic templates, though... make a proper templated class, and a function helper to simplify the syntax, and you should be able to do akin to:

if (multi(foo, bar, 4) == some_var) ...

#5129144 fractal result by accident

Posted by Ameise on 05 February 2014 - 03:24 PM



Clearly an aliasing effect and nothing fractal about it.


If you say so ;\ (yawn)



Why do you bother asking if you're just going to reject the answers you get if they don't match your predetermined presumptions? I concur that it just looks like a Moiré pattern.

#5128463 Why would devs be opposed to the PS4`s x86 architecture?

Posted by Ameise on 03 February 2014 - 11:12 AM

X360's x86 architecture could perform 77 GFLOPS, the PS3 could perform 230 GFLOPS.

Just to nitpick, but the XBox 360 didn't have an x86 processor - it was a PowerPC just like the PS3 but with a different internal architecture. Perhaps you meant that the design of the Xenon CPU is more similar to a PowerPC version of common x86 chips than it is to the distributed design of the Core CPU?

#5080275 Better FPS with high-poly terrain

Posted by Ameise on 24 July 2013 - 05:51 PM

In other words, you implemented a O(log n) solution a la spatial subdivision?

#5079031 Generating Terrain on the GPU vs on the CPU

Posted by Ameise on 19 July 2013 - 04:14 PM

Graphical applications usually are bottlenecked by pixel shader, while vertex shader and cpu are being idle most of the time. People usually move computations from cpu to vertex shader to save cpu/memory resources for other cpu specific tasks like AI, physics.

In your case, you have moved some calculation from cpu which was idle most of the time to vertex shader which was also idle most of the time. The performance of your application was determined by pixel shader which was not changed and overall performance of your application was not changed, too. But you should be able to see difference in loads on cpu and vertex shader using some performance profiling tool like Nvidia PerfHud.


Most modern GPUs have unified shader models, so there are no dedicated vertex or fragment units. They are shared.

#5079021 Uniform Location = -1 even though Uniform is Active

Posted by Ameise on 19 July 2013 - 03:21 PM

I'd point out again that it is likely this line:


float shadowFactor      = (curSunSpaceDepth < sunSpaceDepth) ? 1.0f : 1.0f;

Since both sides of the ternary are 1.0f, the compiler is likely compiling away the conditional, and since those are the only places that those variables are referenced, it is recursively compiling out those two uniforms. Most likely, if he changes one of the 1.0f's to 0.0f, he will stop getting -1.

Not to say he shouldn't correct his code otherwise, but that's probably not why this is happening.