keinmann

Unity x86 vs x64 - The truth?


Now, I've always understood the fundamental difference between the two platforms: address space.

********************************************************************************

On a 32-bit platform we can address from:

Unsigned:
0b - 11111111111111111111111111111111b -or- 0x00 - 0xFFFFFFFF

4 GB of addressable memory.

Signed:
0b - 1111111111111111111111111111111b -or- 0x00 - 0x7FFFFFFF

2 GB of addressable memory (in the positive range).

Note, I'm just speaking here in very simplistic terms, making this observation from the number of bits found in the native pointer type. I do realize that on a real 32-bit system a process typically gets to use only 2-3 GB of that 4 GB (4 × 1024³ bytes) address space. I also realize that on many x64 machines we have a "cap" of sorts which limits us to 48 or 52 bits, and we don't even come close to using all of it. The day we have machines which can use the entire 64-bit address space and we need 128-bit CPUs will be a great day indeed! I'm just trying to keep this simple, though, so don't hound me because this isn't exactly 100% accurate everywhere. :)

********************************************************************************

On a 64-bit platform we can address from:

Unsigned:
0b - 1111111111111111111111111111111111111111111111111111111111111111b -or- 0x00 - 0xFFFFFFFFFFFFFFFF

17,179,869,184 GB of addressable memory.

Signed:
0b - 111111111111111111111111111111111111111111111111111111111111111b -or- 0x00 - 0x7FFFFFFFFFFFFFFF

8,589,934,592 GB of addressable memory (in the positive range).
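
(If you want to double-check that arithmetic, a quick C sketch like the one below will print the same figures; it simply derives them from the pointer width, using nothing beyond the standard library.)

    #include <stdio.h>

    /* Sanity check of the figures above: an N-bit pointer can address 2^N bytes.
       2^64 bytes doesn't fit in a 64-bit integer, so the 64-bit figure is
       printed directly in GB (2^64 bytes = 2^34 GB). */
    int main(void)
    {
        unsigned long long bytes32 = 1ULL << 32;                      /* 4,294,967,296 bytes */
        printf("32-bit: %llu bytes = %llu GB\n", bytes32, bytes32 >> 30);
        printf("64-bit: %llu GB\n", 1ULL << 34);                      /* 17,179,869,184 GB   */
        return 0;
    }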

********************************************************************************

So it's obvious that with 64-bit architecture we have (at least theoretically) a vastly larger potential memory playground than we had with x86 architecture (such as i386 processors).

But I often hear people say that the only real advantage of compiling an x64 binary for an x64 machine is that you can "use more memory". In other words, the general belief is that you don't even need to make an x64 build unless you need more than x GB of memory (people often say 2 GB, sometimes 1 or 3). But two things make me wonder if this is really true. It seems to me that an x64 binary can theoretically run faster than its x86 counterpart. I will explain why, and hopefully those more knowledgeable than me can shed some more light on this.

The first reason is the registers themselves. For simplicity, I'll only discuss the 4 "general purpose" registers.

On x86 machines, we have these 4 GP registers, each 32 bits wide (excuse the crappy ASCII art, heh):

    EAX [============ 32 bits ============]
    EBX [============ 32 bits ============]
    ECX [============ 32 bits ============]
    EDX [============ 32 bits ============]

And on x64 machines, we have the same 4 GP registers, extended to 64 bits:

    RAX [========================= 64 bits =========================]
    RBX [========================= 64 bits =========================]
    RCX [========================= 64 bits =========================]
    RDX [========================= 64 bits =========================]


Registers are the fastest place to keep data. The "fastcall" calling convention uses registers, as opposed to the stack, to pass parameters for exactly this reason. When programming in assembly language, making wise use of the registers can be the difference between an efficient program and a slower one. So it seems that these bigger, badder registers give us a real advantage. Each one can natively hold 64 bits (8 bytes). We could even use one to hold two 32-bit values at once: shift the first value into the high-order d-word of a q-word, then write the second value into the (zeroed) low-order d-word. So it seems to me that with x64, we have the opportunity to work with more data more quickly.
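
(To make that packing idea concrete, here's a rough C sketch; the pack/unpack helpers are just illustrative names, not from any library, and the compiler is free to keep the packed value in a single 64-bit GP register such as RAX.)

    #include <stdint.h>

    /* Pack two 32-bit values into one 64-bit quantity: the first goes into
       the high-order d-word, the second into the low-order d-word. */
    static inline uint64_t pack_pair(uint32_t hi, uint32_t lo)
    {
        return ((uint64_t)hi << 32) | (uint64_t)lo;
    }

    static inline uint32_t unpack_hi(uint64_t q) { return (uint32_t)(q >> 32); }
    static inline uint32_t unpack_lo(uint64_t q) { return (uint32_t)(q & 0xFFFFFFFFu); }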

The second thing which seems like another advantage to me, in terms of speed/efficiency, is the data bus and its lines. On the x86 architecture (like i386), we have 32 lines. On x64 architecture, there are theoretically 64 (though, as I said earlier, we're often actually limited to 48 or 52 bits, and the number of lines reflects this)! So technically, it seems we can again move more data per transfer, or the same amount in fewer transfers.

Again, I realize what I've presented here is a naive view of how it "theoretically" looks, but I've done this intentionally so as not to make this too lengthy and exhaustive (I also didn't feel like digging out any tech specs or doing too much math, haha). But the ideas I presented above make it seem to me that an x64 binary running on an x64 CPU can actually be faster and more efficient than its 32-bit counterpart. The question is, am I mistaken in my observation? If so, why? What is preventing us from using what seems a clear-cut advantage? And if I'm right, why is it commonly thought that x64 architecture just gives you a larger quantity of available memory?

Thanks for your time, and hopefully, I can either confirm my observations or learn something new! :)

There was a good explanation somewhere on the web, which I can't remember now, about the "truth" of x64. I'll try to sum it up:

Advantages:
* More registers: When dealing with extreme "register pressure", they can lead to significant performance increases. Emulators get a big benefit here.
* More RAM: When your application is memory-limited, being able to use more than 2 GB of addressable memory per application (or the 3 GB you can get via special OS functions that access memory without a normal address) increases performance and flexibility.
* Native 64-bit arithmetic: If for some reason you need to work with 64-bit integers, x64 will perform much faster than x86 (see the sketch after this list).
* Overall faster calling conventions.
* Technologies up to SSE2 are guaranteed to be present.
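
(As a small illustration of the native 64-bit arithmetic point, a 64-bit multiply like the one below compiles to a single instruction on x64, while a 32-bit build has to synthesize it from several 32-bit operations. The function is just a made-up example.)

    #include <stdint.h>

    /* On x64 this is one 64-bit IMUL; an x86 build needs a short sequence of
       32-bit multiplies and adds to produce the same result. */
    uint64_t mul64(uint64_t a, uint64_t b)
    {
        return a * b;
    }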

Disadvantages:
* Higher memory bandwidth usage: pointers and pointer-heavy data are larger, which can sometimes slow things down significantly.
* Instructions that use the extra registers take longer to decode and are larger, which reduces instruction-cache efficiency.
* The calling convention uses more stack memory per call (this is specific to the OS's ABI rather than to x64 itself).
* Compiler optimizations aren't as mature as they are when generating x86 code (this is becoming less and less of an issue).
* Some applications and drivers require a strong porting effort.
* If the OS runs half of its programs in 32-bit mode (e.g. Windows), it will use a lot more RAM, since many DLLs have to be loaded twice (their 32- and 64-bit counterparts). This is not a problem on Linux and Mac, where nearly all programs run as native 64-bit.
* Inline assembly is not possible in MSVC when targeting x64 (something you shouldn't be doing anyway; use intrinsics or an external assembler if you want to write assembly, as in the sketch after this list).
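
(For the intrinsics suggestion, a rough sketch: _mm_add_ps is a standard SSE intrinsic that compiles for both x86 and x64 targets, so no inline assembly is needed.)

    #include <xmmintrin.h>  /* SSE intrinsics, available to both x86 and x64 compilers */

    /* Add four packed floats; the intrinsic maps to the ADDPS instruction,
       replacing what might otherwise be written as inline assembly. */
    __m128 add4(__m128 a, __m128 b)
    {
        return _mm_add_ps(a, b);
    }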

It boils down to whether, for your particular case, the benefits outweigh the disadvantages. But porting effort is usually reason number one.

Cheers
Dark Sylinc

The funniest thing is that the majority of production software today runs on the equivalent of a 286/386, due to how VMs are implemented.


Wikipedia discusses the differences. There is no immediate, clear win just from switching, at least when weighed against the complications of legacy libraries and OS APIs, and the gains are negligible compared to algorithmic improvements.

Depending on the type of data being managed, x64 can be either better or worse. Pointers and other data use more memory and code can be larger, but at the same time some operations can be done in a single instruction.


In my limited experience, when working with finely tuned number-crunching algorithms, the 64-bit version runs some 10% faster. But this is at the very upper bound (think matrix work, streaming processing, very data-parallel code), and obviously mostly SSE.

Everywhere else the benefits are negligible, depending mostly on random factors. Differences in compiler quality and library implementation (allocators, OS tweaks) will matter much more. Algorithms are still the first and only real optimization.

Quote:
why is it commonly thought that x64 architecture just gives you a larger quantity of available memory?
Because an application that runs out of memory and crashes is utterly useless, while nobody even notices if something runs slowly.

You can always address 4 GB of memory on a 32-bit architecture. The reason you can't actually *use* the whole 4 GB from your application is that a typical OS (including Windows) reserves some portion of your address space for its own use.

As for performance, yes, compiling for 64-bit can provide some performance gains. Primarily, x64 has considerably more registers than x86, which the compiler can make use of. Also, x64 always has a modern SSE implementation, which may allow the compiler to optimise floating point operations.

However, most applications are not performance-bound by registers or floating-point calculations - there is a certain class of applications which will be, but certainly not the majority. If your program is already limited by caches, bus speed or I/O performance, then you aren't likely to see much (if any) improvement.

User-mode addressing in 32-bit Windows XP is limited to 2 GB per process (3 GB with the /3GB boot.ini switch, and only for executables marked large-address-aware). Starting in Windows Vista, the split between user-mode and kernel-mode address space can be adjusted, allowing more than 2 GB to be allocated for applications. On 64-bit operating systems, there is no such limit.

Architecture-wise, the number of general-purpose registers has been increased from 8 to 16 (the new ones being R8 - R15), along with 8 additional XMM/SSE registers (XMM8 - XMM15).

With all the additional registers, a fastcall-style convention is used by default, so the first few parameters are always passed in registers, which is faster than going through stack memory.
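
(A small illustration, assuming the Microsoft x64 calling convention: the four integer parameters below arrive in RCX, RDX, R8 and R9 rather than on the stack; floating-point parameters would go in XMM0-XMM3. The function itself is just a made-up example.)

    #include <stdint.h>

    /* With the Microsoft x64 calling convention, a, b, c and d are passed in
       RCX, RDX, R8 and R9, so no stack accesses are needed for the arguments. */
    int64_t sum4(int64_t a, int64_t b, int64_t c, int64_t d)
    {
        return a + b + c + d;
    }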

Quote:
Original post by swiftcoder
You can always address 4 GB of memory on a 32-bit architecture. The reason you can't actually *use* the whole 4 GB from your application is that a typical OS (including Windows) reserves some portion of your address space for its own use.

On top of that, some 32-bit operating systems that don't support PAE will incur a further loss of usable address space due to memory-mapped I/O and the reserved address range for video RAM. For instance, if your video card has 512 MB of VRAM, the addressable range of physical RAM will be reduced to roughly 3.5 GB.

Thanks for the insight; very nice and helpful answers! :)

It's nice to know that I wasn't totally wrong, and that there can be more advantages to x64 programs under certain circumstances beyond just "bigger memory". It's also nice to hear about the different technicalities behind it and what can have a significant impact on performance.

So why do people commonly think that (x64 == more RAM), and nothing else? If you know just a tiny bit about what's under the hood, it at least seems apparent that there's more to it; and that in some cases differences should be noticeable, if not huge. Is this just a misconception? Ignorance? Or do many developers just consider the differences so negligible that they don't even address their existence?

In any case, I think the way for me to learn more is to write code which performs different common tasks and profile it, to get a better idea of the impact it has on my own work.

EDIT (for addition):

I'd also like to mention that the perceived advantages I named above may seem insignificant on an instruction-by-instruction basis of x86 vs. x64. But my theory is that in large, resource-intensive applications the difference could become profound, because all of the tiny performance increases will add up across the weight of the entire application. And that's because of not just the bigger (and more numerous) registers, but also more lines on the data bus (and maybe the whole system bus?), native arithmetic for 64-bit integers, a larger memory space (so less chance of paging to disk, which sometimes eliminates the overhead of hard-disk seeks), and more. But, like I said, there's only one way to find out. Gotta do the tests and see what actually happens in practice. :)

Oh, another question which came to mind: how about these new Intel i-Series processors (like the i3, i5, i7)? IIUC, they're using essentially the same x64 architecture, but they have that "performance boost" capability, which I assume must be something like hardware-accelerated TSS or similar. Does anyone know much about them? One of my hobbies is doing "academic" software projects for learning and fun. Quite a while back, I wrote a 32-bit micro-kernel operating system. It's very small and is missing the features of a viable commercial system; just something I'm doing for fun and learning, and it does work. :) But what would be the challenges of reconstructing it for these new Intel processors? I've never tried porting to x64, because my virtual machines only support 32-bit systems, and I'm too chicken to test an unstable system on my good work PC. But would it be significantly different from any other x64 system - just kicking it into "long mode" and expecting the same underlying architecture?

Quote:
Original post by keinmann
So why do people commonly think that (x64 == more RAM), and nothing else?

Because that is the only real net win. More registers help sometimes; more often than not they don't. As for guaranteed SSE, well, it's just about guaranteed on x86 as well, since the last non-SSE processor is pretty old by now, and anything performance-sensitive enough to care already has hardware requirements far higher than a processor without SSE could meet. Sure, the data bus is wider, but all your data just got larger too, so not only did you not pick up much of a gain, but RAM bandwidth - which matters more - just got a heavier load. So the only place you can really call a win without reservation is memory space; everything else is a trade-off that may or may not help.


Intel's Turbo Boost shuts down unused cores so it can overclock the core(s) you are actually using; there is no hardware-accelerated anything added.

Quote:
Oh, another question which came to mind: how about these new Intel i-Series processors (like the i3, i5, i7)? IIUC, they're using essentially the same x64 architecture, but they have that "performance boost" capability, which I assume must be something like hardware-accelerated TSS or similar.


I'm not sure what TSS is, but Turbo Boost is nothing more than the processor bumping its multiplier up above normal when the processor is under heavy use. How many bumps you get depends on the number of cores in use and is also limited by power usage. It's effectively the exact opposite of SpeedStep.

I have a Core i5, but I keep Turbo Boost disabled, as I run my processor overclocked 100% of the time at a much higher frequency than Turbo Boost would normally produce, so there is no advantage for me in having it enabled.

Another thing to consider - in a 64-bit app, many of your native data types are bigger (specifically pointers, and possibly also integers depending on your compiler), which makes most of your classes bigger as well, which puts more pressure on the cache, which might cause more cache misses.
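
(A rough sketch of that effect, using a made-up linked-list node: the same struct roughly doubles in size once its pointers go from 4 to 8 bytes and alignment padding kicks in.)

    #include <stdio.h>

    /* On a typical 32-bit build this node is 12 bytes (4 + 4 + 4); on a 64-bit
       build it grows to 24 bytes (8 + 8 + 4, padded to a multiple of 8), so the
       same list occupies roughly twice the cache space. */
    struct Node {
        struct Node *next;
        struct Node *prev;
        int          value;
    };

    int main(void)
    {
        printf("sizeof(struct Node) = %zu bytes\n", sizeof(struct Node));
        return 0;
    }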

It's all rather complex, and the best way to find out for a particular application is to compile for both instruction sets and test if x64 does indeed run faster.

Sorry for the jargon; I meant task-state switching. I was curious whether they had built something into the hardware which could help it "decide" the best way to manage different tasks or something. That's how it came across to me from their advertising.

I understand though. Thanks again for even more insight. :)

Quote:
Originally posted by Hodgman
Another thing to consider - in a 64-bit app, many of your native data types are bigger (specifically pointers, and possibly also integers depending on your compiler), which makes most of your classes bigger as well, which puts more pressure on the cache, which might cause more cache misses.


Not for me. :) Anytime I write anything in C or C++, I always include my company's "standard" header. It includes 5 or more other files, and one of them standardizes all of the primitive data types and adds "aliases" for them, as in C#. So it's always assured that:

char = 1-byte
wchar = 2-byte
short = 2-byte
int = 4-byte
...and so on...

Then I have the types "byte", "ushort", "Int16" and "UInt16", "Int32" and so on; these are for when it's CRUCIAL that they are an exact, known width. It's something I suggest anyone do, especially if you plan on moving to different compilers, porting, or just being precise. Customize that bad boy so you never have to worry about type widths. Things like "long" can get tricky: is it really "long" (64 bits / 8 bytes), or just "long" compared to the days of 16-bit real-mode programming? :)
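
(For anyone curious, a minimal sketch of what such a header might look like if built on C99's <stdint.h>; the alias names mirror the ones above, but the actual contents of that company header are, of course, its own.)

    /* types.h - a sketch of a "standard types" header, assuming a C99 compiler. */
    #ifndef TYPES_H
    #define TYPES_H

    #include <stdint.h>

    typedef uint8_t   byte;    /* always 1 byte  */
    typedef uint16_t  ushort;  /* always 2 bytes */
    typedef int16_t   Int16;   /* C#-style exact-width aliases */
    typedef uint16_t  UInt16;
    typedef int32_t   Int32;
    typedef uint32_t  UInt32;
    typedef int64_t   Int64;
    typedef uint64_t  UInt64;

    #endif /* TYPES_H */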

Of course, with pointer types you're stuck with the native size. I don't consider that overhead worth worrying about, but it's true: classes can get bigger on 64-bit. Unless there are a TON of data fields in every type, I don't think it'll amount to much. It would really be the same "load" on the processor as it was on 32-bit, right? Just consuming slightly more overall RAM? Because even if those fields increase in size, we handle them natively, just like x86 handles a DWORD natively.

Quote:
Originally posted by Hodgman
It's all rather complex, and the best way to find out for a particular application is to compile for both instruction sets and test if x64 does indeed run faster.


Yes, indeed, but you've gotta love it! :) That's what I said above, in case you missed it. When I find time, I'm going to run some tests to see if my assumptions amount to a hill of beans. Maybe they don't, maybe they sometimes do. I at least HOPE they do! heheh

[Edited by - keinmann on November 15, 2010 2:41:06 AM]

Quote:
Original post by keinmann
So why do people commonly think that (x64 == more RAM), and nothing else?


Because possibly having a few percent more performance in some cases is no "wow", but suddenly having, say, 4 GB, 8 GB or 100 GB of RAM available for your app instead of 2 GB is a massive gain.

Quote:
Original post by Tachikoma
On top of that, some 32-bit operating systems that don't support PAE will incur a further loss of usable address space due to memory-mapped I/O and the reserved address range for video RAM. For instance, if your video card has 512 MB of VRAM, the addressable range of physical RAM will be reduced to roughly 3.5 GB.
I was including those in 'for their own use', but regardless, while that reduces the amount of *usable* memory, neither of those actually has to reduce the range of *addressable* memory.

Even though some OSes choose to do so, there is no reason why the reserved video memory has to come out of your address space. This is what the virtual memory architecture is for, after all, and only the buffers your program is actively reading/writing need to be mapped into your address space.

I think maybe he was talking about physical address space. And you're right: you could, technically, set up a virtual memory system which is only limited by the combined size of physical RAM and free HD capacity. IIRC, 32-bit XP was stuck at 2 GB per process for "userland", though.
