Erik Rufelt

Member Since 17 Apr 2002
Online Last Active Today, 04:17 AM

#5255794 What's the best system on which to learn ASM?

Posted by Erik Rufelt on Yesterday, 05:41 AM

I'm not sure what the goal is, but if it's learning ASM then both unrestricted hardware access and instruction set complexity are irrelevant. The easiest way to start learning x86 ASM is inline ASM in Visual Studio (32-bit builds only, as the compiler does not support inline ASM for x64), then concepts can be added one at a time.


Unless you actually want to get to know a piece of hardware well, which can be great in itself, but only quite a small subset of it is required to learn ASM.

#5255783 What's the best system on which to learn ASM?

Posted by Erik Rufelt on Yesterday, 03:17 AM

If the goal is just to use ASM then I think straight x86 is the easiest. It has a lot of tools and you can easily use it on your desktop.


I'm sure it's a lot of fun to learn on an older machine, but it will be more difficult.

#5255781 Antialiasing when zooming?

Posted by Erik Rufelt on Yesterday, 03:06 AM

Use GL_NEAREST instead of GL_LINEAR for your texture-filtering.



You may also want to set GL_TEXTURE_WRAP_S and GL_TEXTURE_WRAP_T to GL_CLAMP_TO_EDGE.

#5255646 Framerate-indepentant friction?

Posted by Erik Rufelt on 05 October 2015 - 08:23 AM

You can fix the timestep just for that friction calculation.

(Note that over time this practice tends to add up, so after a while you have 1000 local versions of what should be done in exactly one place, in the outer game-loop. If you just have a few things in an already working game, it works. I speak from personal experience, as I have used this method too many times. That said, for time-intervals like this there are often several places other than physics where this method is simple and effective, for example spawning particles or weapon-delay, but for physics it really should not be local to individual objects.)


Assuming you have something like this right now:

struct Car {
  float speed;
};

void breakCar(Car &car, float dt) {
  // Applied once per call, so the car slows down faster at higher frame-rates.
  car.speed = car.speed * 0.98f;
}

Change it to:

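struct Car {
  float speed;
  float breakTime;  // time accumulated since the last friction step, starts at 0
};

void breakCar(Car &car, float dt) {
  const float breakInterval = (1.0f / 60.0f);

  car.breakTime += dt;
  // Apply friction once per elapsed 1/60 s, regardless of the frame-rate.
  while(car.breakTime >= breakInterval) {
    car.speed = car.speed * 0.98f;
    car.breakTime -= breakInterval;
  }
}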

#5255641 Framerate-indepentant friction?

Posted by Erik Rufelt on 05 October 2015 - 07:22 AM

This is often solved by always using the same frame-rate in the calculation. Google for "fix your timestep" or "fixed timestep gameloop" and you should find many links on this site alone, as well as others.

So basically decide to always run your physics update at 60 updates per second or 100 updates per second or whatever makes sense and gives you good results and good performance.
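A minimal sketch of such a loop, assuming 60 updates per second. The names World, updatePhysics and FixedStepper are made up for illustration and do not come from any particular engine; the friction factor 0.98 is simply borrowed from the friction example.

```cpp
#include <cassert>

// Hypothetical game state; only a speed value is modelled here.
struct World {
  float speed = 100.0f;
  int updates = 0;  // number of fixed steps performed so far
};

// One physics step. It always represents exactly 1/60 s, so the
// per-step friction factor does not need to be scaled by dt.
void updatePhysics(World &world) {
  world.speed *= 0.98f;
  ++world.updates;
}

// Accumulates frame time and runs as many fixed steps as fit into it.
struct FixedStepper {
  double stepSeconds = 1.0 / 60.0;
  double accumulator = 0.0;

  // frameSeconds is the measured duration of the last rendered frame.
  // Returns the number of physics steps that were run for this frame.
  int advance(World &world, double frameSeconds) {
    assert(frameSeconds >= 0.0);
    accumulator += frameSeconds;
    int steps = 0;
    while (accumulator >= stepSeconds) {
      updatePhysics(world);
      accumulator -= stepSeconds;
      ++steps;
    }
    return steps;
  }
};
```

The outer game-loop would call advance() once per frame with the measured frame time and then render. Whether the time arrives as three 40 ms frames or one 120 ms frame, the same number of physics steps is run.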

#5253994 GPU load/temp monitoring for Intel?

Posted by Erik Rufelt on 25 September 2015 - 08:05 AM

Try this: https://msdn.microsoft.com/en-us/library/windows/desktop/aa371886%28v=vs.85%29.aspx

There are many counters under Processor, perhaps on integrated GPUs there's a counter for it specifically.

#5253938 GPU load/temp monitoring for Intel?

Posted by Erik Rufelt on 24 September 2015 - 09:22 PM

With integrated GPUs, shouldn't that be part of the CPU temperature?

Google for 'windows c++ get cpu temperature' and there are a lot of results.

#5253513 Is it possible to optimize this? (keeping a small array in registers)

Posted by Erik Rufelt on 22 September 2015 - 02:14 PM

It's a very interesting optimization problem.


I somehow doubt it can get very much faster though. Are those 'table' and 'dist' arrays very large and only used once, or are they used multiple times?

If the same elements aren't re-used without a very large number of separate elements being iterated over first, then they will disappear from cache entirely. As your first loop reads 8 bytes per iteration and the second reads the same 8 (should be in cache) and writes 4, if those memory locations are "new" then they will be loaded from RAM. At 8 cycles total for 8 bytes read + 4 bytes written, you can maybe estimate an absolute upper bound from (bandwidth of RAM) divided by (the number of GHz your CPU runs at times the number of threads), which is the number of bytes each thread can move to or from RAM per cycle.

(This is of course not true if the function is run multiple times on the same arrays while they are still in cache).


Are you sure you get any vectorization at all from GCC?

I pasted your code into VC++ 2015 and it didn't vectorize anything.


(Also I was mistaken about scatter, only gather is currently available, scatter is only in Xeon Phi, though you don't need it for the second loop, but in order to vectorize the first efficiently you would need scatter).


EDIT: Removed likely mistake about interleaving.

#5253322 can anyone explain const char*, char, strings, char array

Posted by Erik Rufelt on 21 September 2015 - 12:57 PM

char is a single character, like 'a', 'B' or '1'. Internally it can be any of 256 values (on most platforms, where it is 8 bits; some exotic ones use another bit-count).


char[10] is an array of 10 characters stored sequentially in memory.


char* is a pointer which holds the address in memory of a character, for example it can point at a single 'char' variable, or it can point at the address where a sequential array begins.

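const char* is such a pointer to constant characters. The pointer itself can still be changed to point somewhere else, but the characters cannot be modified through it, which means that any function accepting it promises not to change the values stored in memory at that address. Read-only access to a string is usually the meaning.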


std::string is a class that internally holds a memory-buffer of many characters, and has methods to manipulate them.



For string-functions that take const char* it is usually appropriate to pass them string.c_str(), as the c_str() method returns a pointer to an address in memory where the string characters are stored sequentially, and it guarantees to end that sequence with a null-character, which means that a 'char' with value 0 will be stored at the end of the sequence. Such a null-terminator is used by many string functions to determine where a string ends.
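A small sketch that puts these types side by side. The function names countChars and demoTypes are made up for illustration; std::strlen and std::string::c_str() are the standard library calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Counts characters up to the null-terminator, which is how functions
// such as std::strlen find the end of a C string. It takes const char*
// because it only reads the characters.
std::size_t countChars(const char *text) {
  assert(text != nullptr);
  std::size_t n = 0;
  while (text[n] != '\0') ++n;
  return n;
}

std::size_t demoTypes() {
  char single = 'a';              // one character
  char buffer[10] = "hello";      // 10 chars in a row: 'h','e','l','l','o', then zeros
  char *writable = buffer;        // points at the first element of the array
  writable[0] = 'j';              // allowed: buffer now holds "jello"
  const char *readOnly = buffer;  // same address, but no writes through this pointer
  // readOnly[0] = 'x';           // would not compile

  std::string text = "hello world";
  text += single;                 // std::string manages its own buffer: "hello worlda"

  // c_str() returns a null-terminated const char*, suitable for C-style functions.
  return countChars(readOnly) + std::strlen(text.c_str());
}
```

Here demoTypes() adds the 5 characters of "jello" to the 12 characters of "hello worlda".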

#5253212 Is it possible to optimize this? (keeping a small array in registers)

Posted by Erik Rufelt on 20 September 2015 - 03:23 PM

If Hodgman's suggestion doesn't make much of a difference then probably manually making it vectorized is the only way to get it faster. You can fit your 52 floats into registers but I would guess it will be slower, as you always have to handle the case where you have to sample from 7 registers, so that's 6 mask-combines (if you keep 8 floats per vector).

If you settle for 4 floats per vector then maybe, but then you need 13 registers and only have 3 left for other stuff..

If you get a newer CPU with AVX2 I think you can make the second loop significantly shorter when vectorized to do 8 floats at a time as there are scatter/gather load/store ops.. so by tweaking the table to be laid out as a proper mask that could be quite quick.

Also I'm just guessing as I can't really say without actually trying it.


(I'm assuming all combinations of 52 cards are possible for the 2 picked, if you could do some high-level changes that divided the problem in parts where each inner-loop only handled part of that range at a time, then that would probably help a lot).

#5252796 collision detection in maze issue

Posted by Erik Rufelt on 17 September 2015 - 07:55 PM

I would recommend that when trying to move in one direction and noticing that it doesn't work, check if movement is possible in the perpendicular directions, and if it is test moving the square in those directions 1, 2, 3 and 4 pixels and then in the original direction again. If it works, move the square 1 pixel in the direction where it became possible to move. So when trying to move down at a corner like that it will first glide past the corner and then move down.

if can't move down:
  for(i = 1 -> 4)
    test whether moving down works i pixels to the right/left
      if so, move 1 pixel right/left, toward the side where the downward movement worked
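A sketch of that idea for the downward direction only, under some assumptions that are not from the original question: the maze is a grid of 16-pixel tiles stored as strings ('#' for a wall), the square is 16 pixels wide, and the names Maze, Square and moveDown are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

const int kTile = 16;    // assumed tile size in pixels
const int kSquare = 16;  // assumed side of the moving square in pixels

struct Maze {
  std::vector<std::string> rows;  // '#' is a wall, anything else is free

  bool isWallAtPixel(int px, int py) const {
    if (px < 0 || py < 0) return true;  // outside the maze counts as wall
    int tx = px / kTile, ty = py / kTile;
    if (ty >= (int)rows.size() || tx >= (int)rows[ty].size()) return true;
    return rows[ty][tx] == '#';
  }

  // True if a square with its top-left corner at (x, y) overlaps no wall.
  // Checking the four corners is enough because kSquare <= kTile.
  bool canStandAt(int x, int y) const {
    int r = x + kSquare - 1, b = y + kSquare - 1;
    return !isWallAtPixel(x, y) && !isWallAtPixel(r, y) &&
           !isWallAtPixel(x, b) && !isWallAtPixel(r, b);
  }
};

struct Square { int x, y; };

// Tries to move one pixel down. If that is blocked, looks 1 to 4 pixels
// to the right and left for a spot where moving down would work, and if
// one is found, moves 1 pixel sideways towards it.
void moveDown(const Maze &maze, Square &sq) {
  assert(maze.canStandAt(sq.x, sq.y));  // the square must start in free space
  if (maze.canStandAt(sq.x, sq.y + 1)) { sq.y += 1; return; }
  const int dirs[2] = {+1, -1};
  for (int i = 1; i <= 4; ++i) {
    for (int dir : dirs) {
      bool pathClear = true;  // every sideways pixel up to i must be free
      for (int k = 1; k <= i; ++k)
        if (!maze.canStandAt(sq.x + dir * k, sq.y)) pathClear = false;
      if (pathClear && maze.canStandAt(sq.x + dir * i, sq.y + 1)) {
        sq.x += dir;
        return;
      }
    }
  }
}
```

Calling moveDown() once per frame while the down key is held makes a square that is up to 4 pixels off glide sideways until it lines up with the opening, then continue down. The other three directions would follow the same pattern.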

#5252734 Changing the rendering device (windows)

Posted by Erik Rufelt on 17 September 2015 - 01:55 PM

That is incorrect; you cannot normally specify what device you want to use from DirectX on Optimus laptops. Depending on your settings in the Nvidia control panel an app will use the Intel or the Nvidia one, and it can be overridden either way.

I hear DirectX 12 and the upcoming Vulkan change this, but in earlier versions the driver handles that. If your battery is low, for example, the driver can make an app use the integrated instead of the discrete GPU.


What you are referring to can be done in DirectX when having two separate discrete graphics cards. Unfortunately that is not in standard WGL, and Nvidia only exposes it in Quadro drivers, through https://www.opengl.org/registry/specs/NV/gpu_affinity.txt

#5252560 Changing the rendering device (windows)

Posted by Erik Rufelt on 16 September 2015 - 02:30 PM

From this thread: http://www.gamedev.net/topic/670562-cant-run-gl-nv-path-rendering-extension-demos/


Edit: This was caused by Nvidia's Optimus software for laptops. The solution is super simple, just export a flag on Windows. Just add this to your main file with windows.h and it will work great.


extern "C" {
  // Exported flag that asks the Nvidia Optimus driver to run this app on the discrete GPU.
  __declspec(dllexport) DWORD NvOptimusEnablement = 0x00000001;
}

#5252025 2D Character Variations without over 9000 Sprite sheets?

Posted by Erik Rufelt on 13 September 2015 - 06:51 AM

// Assumes 5 weapon variations and 5 pants variations per hair style.
int index = hair * 5 * 5 + weapon * 5 + pants;

sprite sprite = allSprites[index];


Then order the sprites correctly in the image.
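To make the layout concrete, here is a sketch of the mapping in both directions. The counts of 5 weapons and 5 pants match the factors in the formula; the hair count of 4 and the names spriteIndex and variationFromIndex are made up for illustration.

```cpp
#include <cassert>

const int kHairCount = 4;    // made-up value for this example
const int kWeaponCount = 5;  // matches the factor 5 in the formula
const int kPantsCount = 5;   // matches the factor 5 in the formula

// Maps a (hair, weapon, pants) combination to its position in the sheet.
int spriteIndex(int hair, int weapon, int pants) {
  assert(hair >= 0 && hair < kHairCount);
  assert(weapon >= 0 && weapon < kWeaponCount);
  assert(pants >= 0 && pants < kPantsCount);
  return hair * kWeaponCount * kPantsCount + weapon * kPantsCount + pants;
}

// Inverse mapping, handy for checking that the image is ordered correctly.
struct Variation { int hair, weapon, pants; };

Variation variationFromIndex(int index) {
  assert(index >= 0 && index < kHairCount * kWeaponCount * kPantsCount);
  Variation v;
  v.pants = index % kPantsCount;
  v.weapon = (index / kPantsCount) % kWeaponCount;
  v.hair = index / (kPantsCount * kWeaponCount);
  return v;
}
```

With this ordering, pants change fastest in the image, then weapon, then hair, so all sprites for one hair style sit in one contiguous run of 25.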


That said, with variations like that, I would suggest drawing the sprite as a composite: drawBaseSprite(), then drawSprite(hairSprites[hairIndex], hairPosition), followed by weapon etc., to draw the items on top of the plain sprite without equipment. This can of course be difficult if you want advanced interactions between sprites, like the hair looking different if the sprite is carrying a large weapon that reaches his hair, but usually rather simple rules can be applied, like drawing sprites in a particular order. The order can then be changed depending on the current action. When idle, the weapon is probably drawn before some other equipment to make it appear behind, but when fighting with the weapon drawn it might be drawn last, or just before the gloves, depending on how detailed you want to be and what direction the sprite is facing.

The exact nature of the ordering is different for every game, depending on the rules and available equipment and actions of the game.
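As a sketch of one such rule set, assuming only four layers and two actions. The layer names, the Action enum and drawOrder are made up for illustration; a real game would loop over the result and call its own draw function for each layer.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Action { Idle, Fighting };

// Returns the layer names in the order they would be drawn
// (first = furthest back, last = on top).
std::vector<std::string> drawOrder(Action action) {
  std::vector<std::string> layers;
  layers.push_back("base");
  if (action == Action::Idle) {
    layers.push_back("weapon");  // carried weapon appears behind the equipment
    layers.push_back("pants");
    layers.push_back("hair");
  } else {
    layers.push_back("pants");
    layers.push_back("hair");
    layers.push_back("weapon");  // drawn weapon appears on top
  }
  assert(layers.size() == 4);  // every layer is drawn exactly once
  return layers;
}
```

Facing direction could be handled the same way, by adding it as a second parameter and choosing a different order per direction.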

#5250027 D3d12 in C

Posted by Erik Rufelt on 31 August 2015 - 07:12 PM

You're not checking return values for success. Run it with the debugger, or check all return codes and print errors.