
Hodgman

Member Since 14 Feb 2007

#5194066 Audio API

Posted by Hodgman on Yesterday, 07:44 PM

Yeah AFAIK, OpenAL Soft is the successor.

FMOD is an extremely popular choice, or Wwise if you have money ;)




#5193957 C# .Net Open Source

Posted by Hodgman on Yesterday, 07:14 AM

If you follow a few simple practices (and sometimes a few complex practices) this ["no way to get away from the 'stop the world' garbage collector"] is actually an amazing feature of the languages.

The GC may be amazing, but why is barring you from having any control an amazing feature? Wouldn't it be nice if you could choose to opt in to specifying the sizes of the different heaps, hinting at good times to run the different phases, specifying runtime limits, providing your own background threads instead of automatically getting them, etc? Would it harm anything to allow devs to opt in to that stuff? Do the amazing features require the GC to disallow these kinds of hints?
 

With modern versions of both Java and C# ... On rare occasions  [when GC runs at the wrong time, it consumes] on the order of 1/10,000 of your frame time.

16.667ms / 10000 = 1.7 microseconds
Having seen GCs eat up anywhere from 1-8ms per frame in the past (when running on a background low-priority thread), claims of 1μs worst-case GC times sound pretty unbelievable -- the dozen cache misses involved in a fairly minimal GC cycle would alone cost that much time!
 
I know C# has come a long way, but claims of magic on that scale are justifiably going to be met with some skepticism.
Combine that skepticism with the huge cost involved in converting an engine over to use a GC as its core memory-management system, and you've still got a lot of resistance to accepting them.
Also, it's often impossible to do an apples-to-apples comparison, because the semantics of the initial allocation strategies and the final GC strategy end up being completely different, which makes a valid real-world head-to-head hard to do too...
 

while your program has some spare time on any processor (which is quite often)

Whether it's quite often or not entirely depends on the game. If you're CPU bound, then the processor might never be idle. In that case, instead of releasing your per-frame allocations every frame, they'll build up until some magical threshold out of your control is triggered, causing a frame-time hitch as the GC finally runs in that odd frame.

Also when a thread goes idle, the system knows that it's now safe to run the GC... but the system cannot possibly know how long it will be idle for. The programmer does know that information though! The programmer may know that the thread will idle for 1 microsecond at frame-schedule point A, but then for 1 millisecond at point B.
The system sees both of those checkpoints as equal "idle" events and so starts doing a GC pass at point A. The programmer sees them as having completely different impacts on the frame's critical path (and thus frame-time) and can explicitly choose which one is best, potentially decreasing their critical path.
 

In C++ ... collection (calling delete or free) takes place immediately ... this universally means that GC runs at the worst possible time, it runs when the system is under load.

I assume here we're just dealing with the cost of updating the allocator's management structures -- e.g. merging the allocation back into the global heap / the cost of the C free function / etc?

In most engines I've used recently, when a thread is about to idle, it first checks in with the job/task system to see if there's any useful work for it to do instead of idling. It would be fairly simple to have free push the pointer into a thread-local pending list, which kicks a job to actually free that list of pointers once some threshold is reached.
I might give it a go :D Something like the quick sketch below, I guess.
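Purely illustrative -- KickJob stands in for whatever the engine's job system provides, and the pointers are assumed to come from plain malloc:

#include <cstdlib>
#include <vector>

static const size_t kFlushThreshold = 256; // arbitrary batch size

thread_local std::vector<void*> g_pendingFrees;

// Assumed to exist elsewhere in the engine: schedules a function on the job system.
void KickJob(void (*fn)(void*), void* arg);

void FlushJob(void* arg)
{
    std::vector<void*>* batch = static_cast<std::vector<void*>*>(arg);
    for (void* p : *batch)
        std::free(p);   // the real work happens off the critical path
    delete batch;
}

void DeferredFree(void* ptr)
{
    g_pendingFrees.push_back(ptr);
    if (g_pendingFrees.size() >= kFlushThreshold)
    {
        // Hand the whole batch to the job system and start a fresh list.
        std::vector<void*>* batch = new std::vector<void*>(std::move(g_pendingFrees));
        g_pendingFrees.clear();
        KickJob(&FlushJob, batch);
    }
}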
 
However, the cost of freeing an allocation in a C++ engine is completely different to the (amortized) cost of freeing an allocation with a GC.
There's no standard practice for handling memory allocation in C++ -- the 'standard' might be something like shared_ptr, etc... but I've rarely seen that typical approach make its way into game engines.
The whole time I've been working on console games (PS2->PS4), we've used stack allocators and pools as the front-line allocation solutions.

Instead of having one stack (the call stack) with a lifetime of the current program scope, you make a whole bunch of them with different lifetimes. Instead of having the one scope, defined by the program counter, you make a whole bunch of custom scopes for each stack to manage the sub-lifetimes within them. You can then use RAII to tie those sub-lifetimes into the lifetimes of other objects (which might eventually lead back to a regular call-stack lifetime).
Allocating an object from a stack is equivalent to incrementing a pointer -- basically free! Allocating N objects is the exact same cost.
Allocating an object from a pool is just about as free -- popping an item from the front of a linked list. Allocating N objects is (N * almost_free).
Freeing any number of objects from a stack is free, it's just overwriting the cursor pointer with an earlier value.
Freeing an object from a pool is just pushing it to the front of the linked list.
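To make that concrete, here's a bare-bones sketch of those two front-line allocators (no alignment handling, no debug tracking -- purely illustrative):

#include <cstddef>
#include <cstdint>

struct StackAllocator
{
    uint8_t* cursor; // points into a big pre-allocated block
    uint8_t* end;

    void* Alloc(size_t size)                 // allocation = bump a pointer
    {
        uint8_t* p = cursor;
        if (p + size > end) return nullptr;  // out of space
        cursor = p + size;
        return p;
    }

    typedef uint8_t* Marker;
    Marker GetMarker() const    { return cursor; }
    void FreeToMarker(Marker m) { cursor = m; }  // frees everything allocated after m, in O(1)
};

struct PoolAllocator
{
    struct FreeNode { FreeNode* next; };
    FreeNode* freeList; // pre-linked list of fixed-size slots

    void* Alloc()                            // pop the head of the free list
    {
        FreeNode* node = freeList;
        if (node) freeList = node->next;
        return node;
    }
    void Free(void* p)                       // push it back onto the free list
    {
        FreeNode* node = static_cast<FreeNode*>(p);
        node->next = freeList;
        freeList = node;
    }
};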

 

 

Also, while we're talking about these kinds of job systems -- the thread-pool threads are very often 'going idle' but then popping work from the job queue instead of sleeping. It's pretty ridiculous to claim that these jobs are free because they're running on an otherwise 'idle' thread. Some games I've seen recently have a huge percentage of their processing workload inside these kinds of jobs. It's still vitally important to know how many ms each of these 'free' jobs is taking.
 

In the roughly 11 major engines I have worked with zero of them displaced the heap processing to a low priority process.

The low-priority thread is there to automatically decide a good 'idle' time for the task to run. The engines I've worked with recently usually have a fixed pool of normal-priority threads, which can pop jobs of different priorities from a central scheduler. The other option is that the programmer can explicitly schedule the ideal point in the frame for this work to occur.

I find it hard to believe that most professional engines aren't doing this at least in some form...?
e.g.
When managing allocations of GPU-RAM, you can't free them as soon as the CPU orphans them, because the GPU might still be reading that data due to it being a frame or more behind -- the standard solution I've seen is to push these pointers into a queue to be processed N frames later, when it's guaranteed that the GPU is finished with them.
At the start of each CPU-frame, it bulk-releases a list of GPU-RAM allocations from N frames earlier.
Bulk-releasing GPU-RAM allocations is especially nice, because GPU-RAM heaps usually have a very compact structure (instead of keeping their book-keeping data in scattered headers before each actual allocation, like many CPU-RAM heaps do) which can potentially fit entirely into L1.
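A sketch of that N-frames-later bulk release (assuming the GPU is allowed to run at most two frames behind the CPU; GpuHeap is a stand-in for the real GPU-RAM allocator):

#include <vector>

static const int kFrameLatency = 2;               // assumed max CPU/GPU latency in frames

struct GpuHeap { void Free(void* allocation); };  // stand-in for the real GPU-RAM heap

class DeferredGpuRelease
{
    std::vector<void*> m_pending[kFrameLatency + 1];
    int m_frame = 0;
public:
    // Called instead of freeing immediately when the CPU orphans an allocation.
    void Release(void* allocation) { m_pending[m_frame].push_back(allocation); }

    // Called once at the start of each CPU frame.
    void BeginFrame(GpuHeap& heap)
    {
        m_frame = (m_frame + 1) % (kFrameLatency + 1);
        // Everything in this bucket was queued kFrameLatency+1 frames ago,
        // so the GPU is guaranteed to be done with it -- bulk free it now.
        for (void* p : m_pending[m_frame])
            heap.Free(p);
        m_pending[m_frame].clear();
    }
};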
 
Also, when using smaller, local memory allocators instead of global malloc/free everywhere, you've got thread safety to deal with. Instead of the slow/general-purpose solution of making your allocators all thread-safe (lock-free / surrounded by a mutex / etc), you'll often use a similar strategy to the above, where you batch up 'dead' resources (potentially using wait-free queues across many threads) and then free them in bulk on the thread that owns the allocator.
e.g. a Job that's running on an SPU might output a list of Entity handles that can be released. That output buffer forms an input to another job that actually performs the updates on the allocator's internal structures to release those Entities.
 
One engine I used recently implemented something similar to the Actor model, allowing typical bullshit style C++ OOP code to run concurrently (and 100% deterministically) across any number of threads. This used typical reference counting (strong and weak pointers) but in a wait-free fashion for performance (instead of atomic counters, an array of counters equal in size to the thread pool size). Whenever a ref-counter was decremented, the object was pushed into a "potentially garbage" list. Later in the frame schedule where it was provable that the Actors weren't being touched, a series of jobs would run that would aggregate the ref counters and find Actors who had actually been decremented to zero references, and then push them into another queue for actual deletion.
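A very rough sketch of that wait-free counting scheme, assuming a fixed-size worker pool where each worker knows its own index (every name here is illustrative):

static const int kNumWorkers = 8;            // assumed fixed thread-pool size
extern thread_local int t_workerIndex;       // assumed: assigned when the pool starts

struct Actor
{
    int refCountPerThread[kNumWorkers] = {}; // one plain counter per worker, no atomics
};

struct GarbageList { void Push(Actor* a); }; // assumed "potentially garbage" queue

void AddRef(Actor* a)  { a->refCountPerThread[t_workerIndex]++; }

void Release(Actor* a, GarbageList& maybeGarbage)
{
    a->refCountPerThread[t_workerIndex]--;
    maybeGarbage.Push(a);   // might still be referenced -- a later job decides
}

// Runs at a point in the frame schedule where no jobs are touching Actors:
bool IsActuallyDead(const Actor* a)
{
    int total = 0;
    for (int i = 0; i < kNumWorkers; ++i)
        total += a->refCountPerThread[i];
    return total == 0;
}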
 
Lastly, even if you just drop in something like tcmalloc to replace the default malloc/free, it does similar work internally, where pointers are cached in small thread-local queues before eventually being merged back into the global heap in batches.
 

When enough objects are ready to move to a different generation of the GC (in Mono the generations are 'Nursery', 'Major Heap', in Java they are "Young Collection" and "Old Space Collection") the threads referencing the memory are paused, a small chunk of memory is migrated from one location to another transparently to the application, and the threads are resumed.

Isn't it nicer to just put the data in the right place to begin with?
It's fairly normal in my experience to pre-create a bunch of specialized allocators for different purposes and lifetimes. Objects that persist throughout a whole level are allocated from one source, objects in one zone of the level from another, objects existing for the life of a function from another (the call-stack), objects for the life of a frame from another, etc...
Often, we would allocate large blocks of memory that correspond to geographical regions within the game world itself, and then create a stack allocator that uses that large block for storing objects with the same lifespan as that region. If short-lived objects exist within the region, you can create a long-lived pool of those short-lived objects within the stack (within the one large block).
When the region is no longer required, that entire huge multi-MB block is returned to a pool in one single free operation, which takes a few CPU cycles (pushing a single pointer into a linked list). Even if this work occurs immediately (which you say is a weakness of most C++ schemes), that's still basically free, versus the cost of tracking the thousands of objects within that region with a GC...
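For illustration, a self-contained sketch of that region scheme (all names made up): a pool of big fixed-size blocks, a stack allocator living inside one block, and the whole region released with a single pointer push:

#include <cstddef>
#include <cstdint>

struct BigBlockPool                 // pool of multi-MB blocks, all the same size
{
    struct Node { Node* next; };
    Node* freeList = nullptr;
    void* Acquire()            { Node* n = freeList; if (n) freeList = n->next; return n; }
    void  Release(void* block) { Node* n = static_cast<Node*>(block); n->next = freeList; freeList = n; }
};

struct RegionAllocator              // stack allocator living inside one big block
{
    uint8_t* cursor; uint8_t* end; void* block;
    void Init(void* bigBlock, size_t size)
    {
        block  = bigBlock;
        cursor = static_cast<uint8_t*>(bigBlock);
        end    = cursor + size;
    }
    void* Alloc(size_t size)
    {
        uint8_t* p = cursor;
        if (p + size > end) return nullptr;
        cursor = p + size;
        return p;
    }
};

// Unloading the region is one pointer push, no matter how many thousands of
// objects were allocated inside it:
void UnloadRegion(RegionAllocator& region, BigBlockPool& pool)
{
    pool.Release(region.block);
}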
 

On extremely rare occasions (typically caused by bad/prohibited/buggy practices) it will unexpectedly run when the system is under load, exactly like C++ except not under your control.

So no - the above C++ allocation schemes don't sound exactly like a GC at all :P




#5193884 OpenGL Vs Monogame

Posted by Hodgman on 20 November 2014 - 05:39 PM

Every game is powered by a "game engine".
Drag'n'drop game-maker GUIs with visual scripting are not the only form of game engine.

If you choose to make a game without one, then you'll have built one by the time you're done.
The parts of your code that power the game, but aren't specific to the gameplay are "the engine".
Even if you make something simple like "pong" from scratch, you'll have built a "pong engine", which you can utilise to make other "pong-style" games, such as breakout.

I thought the game engine was the way for me to go, but after really looking into it I found them to be too much point-and-click and not enough actual coding. I really want to have full access to program user input, save states, collisions, etc. on my own. While I know C# syntax I'm still trying to get used to all the useful classes and combining syntax to do certain things. I want as much hands-on as possible when I make my games to give myself as much practice as possible, then in the future I can always move over to an engine of choice.

The first part isn't true - you'll still have to do a tonne of programming when using an existing engine.
If you don't use something like MonoGame, you'll just have to create your own version of it first, AND then build the game on top of your own "NotMonoGame" in exactly the same way that you would have done anyway.

As for the second part, if you're the kind of person who learns by doing, you'll probably be better off building your first games within existing, well designed, proven frameworks. Not only will you actually see results faster, but in the process of using these existing frameworks you'll be reading/using code written by expert game programmers, and gain a good understanding of how these base systems are often structured. Then later, when you try to build a game from scratch (AKA build your own engine) you'll already be a somewhat experienced game programmer, so you'll know what your engine should look like.

IMHO, trying to build an engine before you've built games is like trying to build a race-car before you've got a driver's license... Actually: before you've ever even driven in a car at all!
Sure it can be done, but an engineer/craftsman will do better to understand the users of their craft.

Even if your goal was to become a game-engine programmer, rather than a game programmer, I'd still advocate learning to make games on many different existing game engines first, so you understand the needs of your users (i.e. Game programmers) before trying to build your own engine.


Also don't underestimate the amount of work involved in either option.
When I worked in the games industry on low-budget games:
* when we used an existing engine, we had 2 engine programmers dedicated to modifying/maintaining that engine, plus a dozen gameplay programmers.
* when we used our own engine, we had two dozen engine/tools programmers and a dozen gameplay programmers.
For a simple 8-month game project, that's somewhere around 10 to 30 man-years of work, just on programming! Also, all of those staff had 5+ years of tertiary education/experience to begin with...

Completing any game as a 1-man band is a huge achievement to look forward to.


#5193860 Suggestions for simulating ambient light

Posted by Hodgman on 20 November 2014 - 02:58 PM

The standard solution today is pre-filtered IBL, AKA light probes, AKA cube maps for specular, and either the same, or SH light probes for diffuse.

On prev-gen, we often used a very cheap lighting model based around only two directional light sources.
You'd merge together your main lights into one directional "key light" per object (there's a God of War and a Gears of War presentation that explains this trick), which gives strong diffuse/spec on half the model (leaving the other half unlit).
Then we would use a magic formula to pick another direction per-object to use as a "fill light". This direction would be opposite the key light to fill the other side of the model with light, but also somewhat opposite the camera to ensure it creates some nice specular reflections. We'd sample a simple light probe (SH, etc) using the surface normal and then use that value as the colour of the directional "fill light".
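As a sketch, that fill-light pick might look something like this (the 0.5 blend weight is made up for illustration, and all directions are assumed to point away from the surface):

#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b)  { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
Vec3 operator-(Vec3 a)          { return {-a.x, -a.y, -a.z}; }

Vec3 Normalize(Vec3 v)
{
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return v * (1.0f / len);
}

// keyDir:  merged "key light" direction for this object
// viewDir: direction from the object towards the camera
Vec3 PickFillLightDir(Vec3 keyDir, Vec3 viewDir)
{
    // Mostly opposite the key light (to light the dark half of the model),
    // biased away from the camera so it still produces visible specular.
    return Normalize(-keyDir + (-viewDir) * 0.5f);
}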


#5193732 OpenGL Vs Monogame

Posted by Hodgman on 20 November 2014 - 12:42 AM

OpenGL is an API for sending commands to the GPU.

MonoGame is basically a simplistic game engine -- a collection of APIs that deal with graphics, audio, asset loading, user input, etc...
Inside MonoGame's graphics components, it will use OpenGL to send commands to the GPU in order to draw things.
(n.b. Different platforms have different APIs for controlling the GPU, so actually, it will use OpenGL, OpenGL|ES, D3D11 via SharpDX or GNM depending on which platform you're using it on!)

OpenGL deals with low-level commands, like sending texture data to GPU memory, binding shaders, drawing triangles and projecting vertices.
MonoGame deals with high-level commands, like load this 3D model from disc, draw this 3D model to the screen.

If you want to learn how to control a GPU, then use OpenGL.
If you want to learn how to make games, use MonoGame.

DirectX is not cross-platform

Just to be pedantic :P
D3D runs on: 3 game consoles (5 on DC, 9 on Xb360, 11 on XbOne), Windows Phones, and PCs (via Windows/Linux+Wine).
OpenGL runs on: only PCs (via Windows/Mac/Linux), and its sister GL|ES runs on Android/iOS.
There's also emulation layers to run GL|ES on PC via D3D, and to run D3D on Mac/Linux/PS4 via GL/GNM.
So it depends on how you define cross-platform - depending on your target platforms, either could be more cross-platform than the other :D
Mobile/PC devs will get the most reuse by writing a shared GL & GL|ES engine (with ifdef's around the differences) and porting to consoles as a secondary concern.
Console/PC devs will get the most reuse by writing a shared D3D engine (and porting to Sony) and porting to Mac/Linux/mobile as a secondary concern.


#5193712 Using RealWorld People in Games and law?

Posted by Hodgman on 19 November 2014 - 09:06 PM

3) Even if I do not show up and I am found guilty, is there a way to enforce the law (since I'm from an eastern European country)? Who would enforce it? Interpol :-D?

They can go after the company that's been selling your game, force them to stop selling it, and seize any profits you've made so far in order to pay for their damages.

Even if you're not using a middle-man like Steam, you'd be hosting the game on a web-server somewhere. The company that owns that web-server will probably cooperate rather than become an outlaw like you, so they'll shut you down. If you're hosting your own servers out of your bedroom, then they'll go to your ISP and get you cut off the internets.

Needless to say, it's not a good idea to try and run an illegal business...


#5193686 HLSL mul() and row/column major matrices in DirectX

Posted by Hodgman on 19 November 2014 - 04:34 PM

Mathematical row/column major defines how you logically arrange your data - do you put the basis vectors in logical columns or rows.

Mathematical "row major" matrix:
x1,x2,x3,0
y1,y2,y3,0
z1,z2,z3,0
p1,p2,p3,1

Mathematical "column major" matrix:
x1,y1,z1,p1
x2,y2,z2,p2
x3,y3,z3,p3
0, 0, 0, 1

When you choose between these, it determines whether you'll be writing:
vecD = vecC * matrixA * matrixB
Or
vecD = matrixB * matrixA * vecC

Computer Science row/column major defines how you store arrays in memory.
Given the data:
ABCD
EFGH
IJKL
MNOP

A comp-sci "row major" storage order is:
ABCDEFGHIJKLMNOP

A comp-sci "column major" storage order is:
AEIMBFJNCGKODHLP

The storage order has no impact on the maths at all. If you're changing the maths because of the comp-sci majorness, then something is wrong.
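For example (in C++, with made-up function names): fetching element (row, col) of the same logical 4x4 matrix from a flat array of 16 floats only changes the index you compute, never the maths you do with the result.

float RowMajorFetch   (const float m[16], int row, int col) { return m[row * 4 + col]; }
float ColumnMajorFetch(const float m[16], int row, int col) { return m[col * 4 + row]; }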


HLSL uses comp-sci column-major array storage by default (but you can override this with the row_major keyword).
HLSL does not pick a mathematical convention for you - that's just determined by the way you do your maths.

On the C++ side, the maths library that you're using will determine both the array storage convention and the mathematical convention.
What math library are you using?


#5193567 Real-time Physically Based Rendering and Transparent Materials

Posted by Hodgman on 19 November 2014 - 12:47 AM

Ignoring all the bending of light stuff:

I don't think the simple solution to just alpha-blend the objects with transparent materials is physically realistic.

It almost is - the basis of alpha-blending almost lines up with some physical concepts, but pre-multiplied alpha is closer:
traditional alpha is: final = a*src + (1-a)*dst;
premultiplied alpha is: final = src + (1-a)*dst;
(where src is the value output from your shader, dst is previous framebuffer contents, and final is the new framebuffer contents)

In that second model, you can think of a as the percentage of light that's being absorbed by the material and converted to heat/light (or being diffused). It should change based on the type of material (a traditional alpha map / alpha material value), the thickness of the material (which maybe you just fake and store in the texture), and the angle that you're looking at the material from.

On that last point - If light has entered a translucent object from the back and is about to exit via a front surface towards you, the amount of light that makes it through that boundary is defined by Fresnel's law. Some percentage won't make it through, and will reflect back into the object instead of making it out the other side. At glancing angles, this tends towards 100%.
However, some percentage of that inwardly-reflected light will continue to bounce around inside and eventually make it back out anyway... You can calculate all that if you want to go fully physically based!
 
With this model, any light being emitted or reflected off the material isn't affected by the alpha value, which is correct. A glass pane will have the same reflections regardless of whether you put black cardboard on the other side to make it become opaque.
A glass material then has a very low alpha value (perhaps even zero for crystal-clear glass, but increasing towards 1 for thicker and thicker surfaces), a very dark diffuse colour (if it's clean), and a specular-mask value calculated from its index of refraction (somewhere around 0.04 I'd guess). It would also have a very low roughness / high spec-power if it's smooth glass.
If you're trying to render glass of varying thicknesses, it might also be helpful to multiply your diffuse lighting results by the alpha value (so that opaque glass actually gains a colour instead of becoming black), but not the specular lighting results (as these photons don't actually enter the glass material, so aren't affected by its properties).
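As a sketch of that model (plain C++ standing in for shader code, all names made up), the shader output feeding the pre-multiplied blend "final = src + (1-a)*dst" could look like:

struct Colour { float r, g, b; };

Colour operator+(Colour a, Colour b) { return {a.r + b.r, a.g + b.g, a.b + b.b}; }
Colour operator*(Colour a, float s)  { return {a.r * s, a.g * s, a.b * s}; }

// diffuse:  lit diffuse result
// specular: lit specular result
// alpha:    fraction of the background light absorbed/diffused by the surface
void ShadeTranslucent(Colour diffuse, Colour specular, float alpha,
                      Colour& outSrc, float& outAlpha)
{
    outSrc   = diffuse * alpha + specular; // specular is NOT scaled by alpha
    outAlpha = alpha;                      // blend unit then does: final = src + (1 - alpha) * dst
}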


#5193546 So Many Texture Units!

Posted by Hodgman on 18 November 2014 - 08:56 PM

Modern GPUs don't have fixed numbers of texture slots any more.
 
In the GL2 era, you had GPUs that looked something like this -- a whole bunch of constant resources in fixed slots, being read by a bunch of shader cores:
[Constant Register Bank][Tex 0][Tex 1]...[Tex N]
            \/             \/     \/        \/
------------------------------------------------
    \/         \/         \/             \/
[Shader 0] [Shader 1] [Shader 2] ... [Shader N] 
And your command stream looked something like this -- commands to modify the registers, and commands to launch programs on the shader cores:
(Set Register T#0 to blah)
(Set Register C#4 to blah)
(Launch program at 0x1234 on all shader units)
(Launch program at 0x5678 on all shader units) -- may overlap with previous program invocation
(Set Register T#0 to blah)                     -- will stall, as previous draw must complete before modifying that register
(Launch program at 0x1234 on all shader units)
In the GL4 era, GPUs look more like this:
[Memory]
 \/  /\
--------
 \/  /\
[Shader N]
All those fixed registers are gone! Instead, each program takes a void* as an argument, specifying where in memory its input parameters (the old 'register' data) will be read from. The new command stream then looks like:
(Launch program at 0x1234 using data at 0xABCD on all shader units)
(Launch program at 0x5678 using data at 0xABCD on all shader units)
(Launch program at 0x1235 using data at 0xEF01 on all shader units)
So actually, now the driver is creating two streams -- on one hand it's creating the GPU command stream as it's always done, but that command stream no longer contains any commands to bind textures/uniforms.
So now, as well as doing that, it's taking all the resources that you've bound to GL's "slots" and copying them into a contiguous struct for each draw call. It then has to make sure that these structs are all prepared and ready for use before the GPU executes the commands that reference them.
//inside your glDraw*(blah) call:
void* registerMemory = CopyGlRegistersToGpuMemory();
WriteCommand((Program*)g_currentShader, registerMemory); // shader=0x1234, registers=0xABCD
//you call a glFunction to change shaders here
//inside your next glDraw*(blah) call:
void* registerMemory = CopyGlRegistersToGpuMemory();
WriteCommand((Program*)g_currentShader, registerMemory); // shader=0x5678, registers=0xABCD (no registers changed, so CopyGlRegistersToGpuMemory returned the last set of constants)
//you change shaders back, but also change tex0
//inside your next glDraw*(blah) call:
void* registerMemory = CopyGlRegistersToGpuMemory();
WriteCommand((Program*)g_currentShader, registerMemory); // shader=0x1234, registers=0xEF01 (a whole new block of constants allocated, because you changed a single 'slot')
It's this mismatch -- most APIs are still based around slot-bound resources, while modern GPUs are based around "bindless resources" -- that has ignited interest in these new down-to-the-metal APIs like DX12 / GLNext / Mantle / etc...
i.e. CopyGlRegistersToGpuMemory is expensive -- OpenGL is basically emulating itself on modern GPUs :(


In the future, I wouldn't be surprised to see draw calls in our programs look more like
//From GLNext.h, etc...
struct GLN_Texture { u16 width, height, format, etc; void* gpuMemory; };
struct GLN_Buffer { u16 size, format, etc; void* gpuMemory; };

//this matches the inputs defined in myShader.nglsl
struct MyShaderParams { GLN_Texture diffuse, spec; GLN_Buffer ubo0; };

//Create and fill in the parameters required for a draw call using my shader
MyShaderParams* params = GPU_Malloc(sizeof(MyShaderParams));
params->... = ...;

//issue the draw call using these explicitly specified resources, instead of binding resources to API slots
commandBuf->glnDraw( myShader, params );
glnSubmit(commandBuf);//send finished command buffer thru to GPU



#5193541 Using RealWorld People in Games and law?

Posted by Hodgman on 18 November 2014 - 07:24 PM

I worked on a game recently where the player-base demands lots of likenesses, but they were impossible to license. The solution this developer came up with was to ship the game with a character creator so players could create the likenesses themselves, and a sharing server so they could give them to other players. n.b. The sharing server has to have a process in place to deal with DMCA take-down notices.

 

When you first start the game, the first thing it asks you is if you'd like to replace the default (non-real) characters with the most popular replacements from the sharing server, which always happen to be the real people they're supposed to be :D




#5193201 Java - Maximum Recommended Threads At One Time ?

Posted by Hodgman on 17 November 2014 - 12:35 AM

^What they said.

 

In my humble opinion, the only purpose of threads in ("real time interactive") video games is to enable you to utilize extra CPU cores.

 

If you're using them for other purposes (e.g. to allow yielding/co-routines), then you're doing it wrong. Implement a coroutine/event system instead, which runs on a small number (changeable, depending on CPU core count) of shared threads.
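The thread is about Java, but here's a rough C++ sketch of the idea for consistency with the other snippets here: a fixed pool of shared worker threads sized from the core count, with tasks submitted to it instead of spawning a thread per task.

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class WorkerPool
{
    std::vector<std::thread> m_threads;
    std::queue<std::function<void()>> m_tasks;
    std::mutex m_mutex;
    std::condition_variable m_wake;
    bool m_quit = false;
public:
    explicit WorkerPool(unsigned count = std::thread::hardware_concurrency())
    {
        for (unsigned i = 0; i < count; ++i)
            m_threads.emplace_back([this] {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock<std::mutex> lock(m_mutex);
                        m_wake.wait(lock, [this] { return m_quit || !m_tasks.empty(); });
                        if (m_quit && m_tasks.empty()) return;
                        task = std::move(m_tasks.front());
                        m_tasks.pop();
                    }
                    task(); // run outside the lock
                }
            });
    }
    void Submit(std::function<void()> task)
    {
        { std::lock_guard<std::mutex> lock(m_mutex); m_tasks.push(std::move(task)); }
        m_wake.notify_one();
    }
    ~WorkerPool()
    {
        { std::lock_guard<std::mutex> lock(m_mutex); m_quit = true; }
        m_wake.notify_all();
        for (std::thread& t : m_threads) t.join();
    }
};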




#5193045 Are there any patent trolls in the games industry?

Posted by Hodgman on 15 November 2014 - 07:20 PM

There's Tim Langdell, the trademark troll who sues anyone who uses the word 'edge'.

There have been some regular horrible patents (not troll patents), such as Nintendo's D-pad, or Sega's 3D checkpoint arrow, or Creative's stencil shadow algorithm (AKA "Carmack's reverse").
The last one was close to a patent troll situation, where Carmack independently invented the algorithm in order to create a game, but was forced to comply with Creative's demands because they'd invented it simultaneously but had also filed a patent.

For the most part, I'm incredibly proud that there's so much R&D done in games yet so little of it is patented. In such an environment it's generally bad PR to patent anything, as it'll be seen as an attack on games (and gamers).


#5192943 is there a better way to refer to assets in a game?

Posted by Hodgman on 14 November 2014 - 09:15 PM

I just hash the strings into a 32bit integer. This lets you get rid of all your strings ahead of time, and just use ints everywhere. If for some reason you're stuck with a string at runtime, you can just quickly hash it at runtime too.
n.b. it's possible to implement string hashing at compile time with some macro magic, so you can use strings in your source, which get compiled and used at runtime as ints.
 
Yeah, you can get hash collisions, where two different strings map to the same int. In practice it hasn't happened to me yet (the data build tools check for this case offline), and when it does I'll just increment the seed used by the hash until it doesn't happen.
 
ATM I'm using the 32-bit FNV-1a hash, which is super cheap if required at runtime, easy to turn into a compile-time version, and seems to give good distribution:

u32 hash(const char* str, u32 seed)
{
    u32 h = seed;
    const u8* s = (u8*)str;
    while (*s) {
        h ^= *s++;
        h *= 0x01000193;//very specific magic prime number for 32bit hashing
    }
    return h;
}
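And a possible sketch of the compile-time version mentioned above, using C++11 constexpr instead of macros (the seed in the usage comment is just the standard 32-bit FNV offset basis):

typedef unsigned int u32; // assuming the same 32-bit typedef as above

constexpr u32 hash_ct(const char* str, u32 seed)
{
    // Same FNV-1a step as the runtime loop: xor in the byte, multiply by the prime.
    return *str ? hash_ct(str + 1, (seed ^ (u32)(unsigned char)*str) * 0x01000193u)
                : seed;
}

// usage: constexpr u32 someSoundId = hash_ct("some_sound.wav", 0x811C9DC5);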

When loading files -
For retail builds: all the assets get packed into an archive with 32-bit filenames; textual filenames aren't used at all any more. The archive has a lookup from a 32-bit file-name-hash to an offset/size within the archive.
For development builds: Assets are kept as loose files on disk. The data build tool produces a dictionary for converting from a 32-bit file-name-hash to a windows file name.

[edit]
In any case, you shouldn't have code like playwav("some_sound.wav"); though -- more like:
Sound* some_sound = loadwav("some_sound.wav"); //filename processing paid once, pointer obtained
...
playwav(some_sound); // no details of filesystem involved per frame




#5192809 Corrupting of the view if far from world center

Posted by Hodgman on 14 November 2014 - 01:26 AM

output.position = mul(input.position, worldMatrix);
output.position = mul(output.position, viewMatrix);
output.position = mul(output.position, projectionMatrix);
Here, you're transforming positions from model space to world space to view space to projection space.
As others have pointed out, world space coordinates are too large for you, so you're losing precision.

Instead, multiply worldMatrix and viewMatrix together on the CPU side ahead of time to produce worldViewMatrix.
Then in your shader, you'll be able to go from model space straight to view space (without making a stop via world space).
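A rough sketch of that change, assuming a row-vector convention matching the mul order above; Matrix4x4, Multiply and SetShaderConstant are stand-ins for whatever your maths library / engine provides:

struct Matrix4x4 { float m[16]; };

Matrix4x4 Multiply(const Matrix4x4& a, const Matrix4x4& b); // concatenates in the same order as the HLSL mul above
void SetShaderConstant(const char* name, const Matrix4x4& value);

void PrepareDraw(const Matrix4x4& worldMatrix, const Matrix4x4& viewMatrix,
                 const Matrix4x4& projectionMatrix)
{
    // Combine once per object, on the CPU...
    Matrix4x4 worldViewMatrix = Multiply(worldMatrix, viewMatrix);
    SetShaderConstant("worldViewMatrix", worldViewMatrix);
    SetShaderConstant("projectionMatrix", projectionMatrix);
    // ...so the vertex shader only needs:
    //   output.position = mul(input.position, worldViewMatrix);
    //   output.position = mul(output.position, projectionMatrix);
}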


#5192359 Why are new game updates unable to view old RTS game replays?

Posted by Hodgman on 11 November 2014 - 09:19 PM

Check out this article for how RTS networking code typically works:
http://www.gamasutra.com/view/feature/131503/1500_archers_on_a_288_network_.php

TL;DR - instead of sending events that happen and the state of the world, they only send the player's inputs (clicks, keypresses) and then rely on a deterministic simulation to produce the same results on every computer, without actually synchronizing the game at all!

Then: replay systems and networking have a lot in common. If you properly write a networking system like this, you've basically already written a replay system!

Seeing as this system relies on a deterministic simulation (same inputs == exact same outputs, every time), then any small change to the game will result in a butterfly effect, where the replay gets more and more out of sync as time goes on. If you patch the game so a marine does 10dmg instead of 8dmg, then after a particular fight, different units will be dead/alive, and the replay will be out of sync from the original!

Starcraft solved this problem by keeping around all of the old versions of the game! When you view an old replay, the game temporarily downgrades itself to the old version required by that replay.

Yes, another solution would be to change the replay system to actually store events/state, rather than inputs... but this would result in replay files that are Megabytes in size, rather than Kilobytes.
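As a toy sketch of the input-log idea (every name here is made up for illustration): a replay is just the recorded per-tick commands plus the exact game version, fed back through the same deterministic simulation function.

#include <cstdint>
#include <vector>

struct PlayerCommand { uint8_t player; uint8_t action; uint16_t x, y; }; // illustrative fields

struct ReplayFile
{
    uint32_t gameVersion;  // must match the build exactly, as discussed above
    std::vector<std::vector<PlayerCommand>> commandsPerTick;
};

struct GameState { /* the entire simulation state */ };

// Must be 100% deterministic: same inputs in, same outputs out, every time.
void Simulate(GameState& state, const std::vector<PlayerCommand>& commands);

void PlayReplay(GameState initialState, const ReplayFile& replay)
{
    GameState state = initialState;
    for (const std::vector<PlayerCommand>& tickCommands : replay.commandsPerTick)
        Simulate(state, tickCommands);
}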



