Not dead...

Modern Warfare 2 : No Russian.

Posted 16 November 2009 · 219 views

Having completed Modern Warfare 2 this weekend, in what was for me record time after obtaining a game, I've decided to also weigh in on the 'No Russian' level, as many people have been doing.

If you have not finished the game there WILL be spoilers below. Turn away NOW.

So, as you probably know by now, Modern Warfare 2, while being a massive hit, has its fair share of detractors, with complaints about the size of the install vs length of game play, the lack of 'modern game play' and, of course, the now infamous 'No Russian' level.

The first two can be handled quickest so I'll cover them first.

The size vs length (and vs cost as well) argument is an interesting one, and one which, in my opinion, really isn't a major issue. The fact is a large game which takes 5h or so to play (my approximate play time, spread over 3 sittings) is, if nothing else, a testament to the amount of assets present in the game. The first focus is, as always, on the graphics, which aren't by any means poor (although I might be a tad biased given my gaming rig); however, it doesn't end there. The between-mission cinematics are, as in the first game, very impressive, giving you a wealth of secondary information and story detail while the voice overs explain what is going on.

After that comes the soundscape, something which is often overlooked in a number of games. The characters you are fighting with and against feel more real. They have accurate, situational things to say and their general interaction with the player is engaging enough that you find yourself becoming somewhat attached to them. We aren't totally out of 'canned single line response' territory yet, as those are still present, but there is a definite improvement in this regard.

All of which feeds back into cost and, let's not ignore it, profit. At the end of the day Activision are a company; they are there to make money, and while I'm sure they have made a profit on this game, the fact is it would not have been a cheap game to make by any stretch of the imagination.

Finally there is length and the fact that people are complaining that it is only 5h long. I'm not going to compare it to the cinema, or indeed to other games, however while it might have "only" been 5h long I feel it was pretty much long enough.

You see, the problem with many games is that if they go for 'epic game play' without a story and tasks to match, you are just getting padding. It becomes 'oh, another <place> filled with bad guys I have to kill. yay' and then it really does become 'just another shooter'. Two games which are hailed, for what are in my view mind-boggling reasons, as the 'best FPS games ever' suffer from this for me. I'm talking of course of Half-Life and Half-Life 2; I've never made it past a few levels of the former, and the latter I only played because I had nothing better to do with my time, and the later levels just dragged on.

All of which brings me to 'modern game play' elements and, frankly, I'm not sure what this really is. Is it the lack of "physics puzzles" people are talking about, because if so GOOD! Those things often turn out to be quite pointless and contrived just to show that 'wooo! we have a physics engine'. Once you get beyond that there isn't much else that I can see; I admit I'm not a game designer by a long shot but I never found myself playing MW2 and thinking "yeah, this game needs X".

What MW2 represents to me is an FPS game doing what an FPS game should and sticking to being an FPS game. You still have to be stealthy because that's in the context of the game, but beyond that you have a gun, you have your orders, and you have to carry them out in what is, at times, the cluster fuck of modern combat.

At the end of the day MW2 lives up to its original as a solid FPS game which by no means glorifies war, and it is well worth the price of admission.

All of which brings us to 'No Russian'.

As you may or may not know, 'No Russian' is an early mission in the game. Having played the opening level to 'learn the game' and select your difficulty level, and having had some combat experience against some enemies, you are recruited to go undercover as someone close to the main villain of the piece (one Vladimir Makarov, a former protégé of Imran Zakhaev from MW1) as part of a multi-national task force (aka some SAS members and some cannon fodder). The rest of the task force, containing 'Roach', one of the other characters you play, is off in Russia obtaining some hardware.

Once you complete that mission you leap back to the undercover guy for the mission in question. The mission plays out with you and 3 other guys, including Makarov, walking through a Moscow airport, cutting down civilians and guards alike with heavy machine gun fire.

This level alone generated a lot of outrage, with people calling it "sick"; the thing is, I disagree, and I'm going to give them a pass on this.

Now, the one complaint I have heard about this level was the lack of setup, and I do agree with that to a degree; a mission beforehand involving yourself and Makarov, to introduce things and get you on this 'team', might have been a better introduction to this section of the story, but the level itself I have no issue with.

I'll come straight out and say it; I shot the civvies. A fair few in fact, because I felt, given the position I was in and the guy I was with, that if I hadn't done that I might have been punished for it in some way (ironic considering you are shot and killed at the end of the level by Makarov). There have been complaints that there was no reason to shoot them, but that mindset, plus the fact you went into this mission being told that 'you would lose a piece of yourself' (or words to that effect) and that it was for 'the greater good', would probably have convinced someone in that situation to follow through as well.

The fact of the matter is that the game is set in modern times, and terrorism is as much a part of modern warfare as massive battleships, armor and remotely targeted missiles; this brings home what some might do in the name of their cause.

Now, that this ignites an invasion of the US by the Russians has also been criticised, but let's consider this from a few angles;

Firstly, those in charge of Russia at that point in the game were anti-American and would have been looking for an excuse to do something like this. The ACS system, which I can only assume was a key piece of technology for the mainland defense, gave them the technical ability, and the attack gave them the 'moral' ability to get the people behind it.

You might say that an investigation into the shooting would have found out it was a setup, but then if you want an excuse, who does a real investigation? If you want "proof", however, you can look to the real world and the events post-9/11. Now, I'm not going to say the American Government is like the Ultranationalists; however, even then, given a single terrorist event on home soil, the USA (and the UK) went on to invade two countries, and this was with a moderate government; imagine if the government had been actively looking for an excuse to go in.

At the end of the day, could 'No Russian' have been handled better? Possibly; however, I feel this level IS an important landmark in gaming history because it is the first time I can think of where a game has taken on an issue such as terrorism, and the role people might play in it, head on and in what I feel was a mature way. There was no shying away from the death and suffering there (a theme which continues in the rest of the game with the realities of war played out). The response will certainly have an effect on how this subject matter is handled in the future, but I feel that with this Infinity Ward have opened a door, and it's one I think we should use if we are going to tell stories set in the modern world; a bit less jingoism and a bit more of a look at how things are done is never a bad thing, and gaming is a powerful way to get that message across.

Interestingly, this whole level probably overshadowed another aspect of the game completely, one I'm surprised didn't get more reaction from the Americans out there.

In Modern Warfare 1 the SAS were, pretty much, the guys who got stuff done. The USMC did their part; however, their part was pretty much screwing things up and getting nuked. While the SAS weren't the nicest bunch, they got in there, got shit done, and got out again. It isn't until the last mission that things go boobs skywards, and even then they are bailed out by the Russians.

Modern Warfare 2 kinda continues this theme to a degree; for the US there's the fubar mission in Russia, getting their arses kicked on home soil and the shelling of the gulag while Task Force 141 (aka SAS and some cannon fodder) are still inside, nearly killing them in the process. Meanwhile TF-141 carry out their missions pretty much flawlessly, save Washington (at the cost of the ISS, I admit) and generally continue to run about fixing things.

Frankly, as a non-American it's refreshing to see games which don't show the US in an all-positive light; however, the one thing I was surprised didn't get more coverage was the end of the game where, having completed the objective, you (playing Roach) and Ghost are met by General Shepherd, who shoots you both and sets fire to you (all seen from the eyes of Roach, while Cpt. Price, who you previously rescued, is yelling over the mic that Shepherd isn't to be trusted). The game then jumps you to playing Soap again, with Price, as they escape the ambush, then hunt down and kill Shepherd (and as many people as get in their way).

To me, the most shocking 'what the hell?' moment of the game was that betrayal of the SAS members by the Americans, and I find the lack of complaint about 'painting the US' in a negative light interesting.

I'm going to end this with some words from Adam Biessener of Game Informer, as summarised by Wikipedia, which I happen to agree with;

In his review for Game Informer, Adam Biessener writes that while the level "makes the player a part of truly heinous acts", he also notes that the "mission draws the morality of war and espionage into sharp focus in a way that simply shooting the bad guys cannot". Biessener concludes that it is one of the more emotionally affecting moments in the game, and is "proud that our medium can address such weighty issues without resorting to adolescent black-and-white absolutes".

Letters From The Readers

Posted 18 October 2009 · 165 views

It's time for one of those entries where I quickly answer a post or two from the comments [smile]

Original post by Jason Z
Do we get to see some screen shots of the system in action?

Right now it exists in a stripped down purely mathematical test environment.

Drawing is on my 'todo' list however before that I want to finish up and refactor the test cases a bit to centralise a few things, check over the maths and add a few 'effectors' for the systems to get an idea of the 'true' performance. I also need to test emitter spawning speed a bit.

At that point I'll probably throw together a simple test render for screen shots/video reasons; maybe multithreaded, maybe not. It's tempting to do the former simply because D3D11 will make it a bit easier.

I'm currently looking into the Threading Building Blocks, and some details on existing game system designs which use them, to try and come up with a truly scalable game system based on them. While 'parallel_for' et al are fine for this test, the real game is going to have to manage the task system itself.

Yeah, a lot of thought for something which will be a 2D game rendered primarily with pixels [grin]

Original post by Black Knight
About tessellation, how do you think it will affect collision detection? Can the detailed tessellated mesh be used for collision detection? Or will they use some kind of simpler mesh?

This is an interesting one; I suspect you could abuse 'stream out' to get the post-tessellation data back from the card in real time, however I suspect in many cases tessellation will be used to add detail but collision will still occur with a lower resolution set of mesh data.

Particles 2

Posted 13 October 2009 · 244 views

One unplanned day off due to illness later and, when I felt up to it later in the afternoon, I reattacked my SSE based code.

I decided that, instead of trying to do silliness with ensuring that the blocks are always multiples of 8, I would just work out how many multiples of 8 I can process and then deal with the remaining 4 particles after that.
Initially I only did this to one pass, just to check it would work, before moving on to a two pass method; however, upon changing the 2nd pass to work the same way I noticed a (very small, I admit) speed drop. Some pondering and a quick Bing later I realised that in 32bit mode we only have 8 SSE registers, and the 2nd pass, doing 2 groups of 4 at a time, would require 9 registers active at one point (5x src, 4x dest), so I'm currently attributing that slow down to register pressure problems.
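A minimal sketch of the "multiples of 8, then the remaining 4" idea, applied to the momentum pass only. The function name and array layout are assumptions for illustration, not the actual test code; it assumes 16-byte aligned float arrays and a particle count that is a multiple of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Hypothetical helper: momentum[i] += velocity[i] for every particle.
// Both arrays must be 16-byte aligned and count must be a multiple of 4.
void add_velocity_adaptive(float* momentum, const float* velocity,
                           std::size_t count) {
    assert(count % 4 == 0);
    const std::size_t blocks_of_8 = count / 8;  // whole groups of eight
    std::size_t i = 0;

    // Main loop: two groups of four floats per iteration.
    for (std::size_t b = 0; b < blocks_of_8; ++b, i += 8) {
        const __m128 m0 = _mm_load_ps(momentum + i);
        const __m128 m1 = _mm_load_ps(momentum + i + 4);
        const __m128 v0 = _mm_load_ps(velocity + i);
        const __m128 v1 = _mm_load_ps(velocity + i + 4);
        _mm_store_ps(momentum + i, _mm_add_ps(m0, v0));
        _mm_store_ps(momentum + i + 4, _mm_add_ps(m1, v1));
    }

    // Tail: at most one group of four is left over.
    if (i < count) {
        const __m128 m = _mm_load_ps(momentum + i);
        const __m128 v = _mm_load_ps(velocity + i);
        _mm_store_ps(momentum + i, _mm_add_ps(m, v));
    }
}
```

Each extra group processed per iteration needs its own registers for sources and results, which is where the register pressure mentioned above comes from.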

I also decided that a single pass SSE method was needed, just to try and hit all the bases.

Finally, I adjusted the app to take total frame readings and pull out min, max and average times.
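Pulling min, max and average out of a set of frame readings can be sketched roughly as below. The `FrameStats` struct and `summarise_frames` function are hypothetical names, not taken from the test app, and the sketch assumes the per-frame times have already been collected in milliseconds.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical summary of a run: all values in milliseconds.
struct FrameStats {
    double min_ms;
    double max_ms;
    double average_ms;
};

// Reduce a list of per-frame timings to min, max and average.
FrameStats summarise_frames(const std::vector<double>& frame_ms) {
    assert(!frame_ms.empty());
    const auto extremes =
        std::minmax_element(frame_ms.begin(), frame_ms.end());
    const double total =
        std::accumulate(frame_ms.begin(), frame_ms.end(), 0.0);
    return {*extremes.first, *extremes.second,
            total / static_cast<double>(frame_ms.size())};
}
```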

The times, for the biggest test for the SSE routines came out as follows;

Method                                        min       max       average
Two pass SSE parallel (100/10000)             3.08299   4.6787    3.47186
Single pass SSE parallel (100/10000)          3.12475   4.60054   3.47186
Two pass Adaptive SSE parallel (100/10000)    3.06958   4.64958   3.46114
Two pass Full Adaptive SSE parallel (100/10000) 3.09947 4.9921    3.46305

(times in ms, numbers in name are (emitters/particles per emitter))

On average the Two pass adaptive is a bit faster, however at this point the difference is marginal at best.

I also have some numbers from an Intel Atom 330 @ 1.6GHz with 4 threads; as there are a few numbers I'll just give the average final test results in each case;

Simple : 113.297ms
Emitter & Particle full parallel : 67.7486ms
Emitter parallel : 66.8808ms
Two pass parallel : 67.9637ms
Two pass SSE : 28.2201ms
Single pass SSE : 28.246ms
Two pass adaptive : 28.2491ms
Two pass full adaptive : 28.2966ms

So, top tip from that is that if you are going to do a fair amount of maths on an Atom, make sure you use SSE and threads.
That said, neither scaled perfectly: going to 4 threads only gained around a 1.7x speed up (113ms down to 68ms), and going to 4 particles at a time gained around 2.4x on top of that (68ms down to 28ms).

Other tests to add before I release unto the masses to get more data;
- thread grain size
- limiting the number of threads active on the test system

Even with all this, it'll only act as a 'best case' guide for a real game system, as that will have other things going on (rendering, audio, maybe network processing), all of which will cut into the threading time as well (in my plans at least). This is really more a feature feasibility study to try and get some idea of processing power.

- i7 920 has lots
- Atom 330 has little



Posted 11 October 2009 · 275 views

Around the middle of the week I decided to get my arse in gear and get working on some code related to a game idea I had.

I also decided to grab a trial copy of Intel's Parallel Studio to have a play with it, see what its auto vectorisation is like and also try things like the Threading Building Blocks.

One of the things I need in this game are particles.
Lots of them.

Now, the obvious way to deal with it is to give it to my shiny HD5870 and let that sim things; however, I've never been one to take the obvious route and I've got a Core i7 sitting here which hardly gets a work out; time for particles on the CPU.

Having installed the Intel stuff I had a look over the TBB code and decided that the task based system would make a useful platform to build a multi-threaded game on; it's basically a C++ way of playing with a thread pool, the thread pool being hidden behind the scenes and shared out as required to tasks. You don't even have to be explicit with tasks; they provide parallel_for loops which can subdivide work and share it out between threads, including task stealing. All in all pretty sweet.

After putting it off for a number of days I decided to finally knuckle down and get on with some performance testing code. I've decided to start simply for now, get a vague framework up and running, however it'll require some work to make it useful with the final goal being to throw it at some people and see what performance numbers I get back for emitters and particle count.

I've so far tested 5 different methods of updating the particle system;

The first is a simple serial processing on a single thread; so each emitter is processed in turn and each particle is processed one step at a time. Best I can hope for is some single floating point value SSE processing to help out. The particles themselves are structs stored in an std::vector (floats and arrays of floats in the struct).

The second is the first parallel system I went for; emitters and particles in parallel. This uses the parallel_for from the TBB to split the emitters over threads and then to break each emitter's particles up between the threads. Same data storage as the first test.

The third was me wondering if splitting the particles over threads was a good idea so I removed the parallel_for from the particle code and left it in the emitter processing. Same storage setup as before.

Having had a look at my basic processing of the particles and a bit of a read of the Intel TBB docs I realised that I had a bit of a dependency going on;

momentum += velocity;
position += momentum * time;

So, I decided to split it into two passes; the first works out the momentum and the second the position. Storage stayed the same as before.
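As a rough sketch, the two pass split over the array-of-structs storage might look like the following. The `Particle` layout and the function names are assumptions for illustration (the post only says the struct holds floats and arrays of floats), and this is the serial form with no threading applied.

```cpp
#include <cassert>
#include <vector>

// Hypothetical particle layout: 2D position, momentum and velocity.
struct Particle {
    float position[2];
    float momentum[2];
    float velocity[2];
};

// Pass 1: momentum += velocity for every particle.
void update_momentum(std::vector<Particle>& particles) {
    for (Particle& p : particles) {
        p.momentum[0] += p.velocity[0];
        p.momentum[1] += p.velocity[1];
    }
}

// Pass 2: position += momentum * time, using the momentum from pass 1.
void update_position(std::vector<Particle>& particles, float time) {
    for (Particle& p : particles) {
        p.position[0] += p.momentum[0] * time;
        p.position[1] += p.momentum[1] * time;
    }
}

// Run both passes back to back for one time step.
void update_two_pass(std::vector<Particle>& particles, float time) {
    assert(time >= 0.0f);  // a negative step makes no sense here
    update_momentum(particles);
    update_position(particles, time);
}
```

Within each pass every particle is independent of the others, so either loop can be handed to a parallel loop construct without the momentum/position ordering getting in the way.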

At this point I had some timing data;
With 100 emitters, each with 10,000 particles and the particle system assuming an update rate of 16ms a frame and a particle lifetime of 5 seconds:

- Serial processing took 5.06 seconds to complete, with the final frame taking 23ms
- Emitter and particle parallel took 2.44 seconds to complete, with the final frame taking 8.6ms
- Emitter only parallel took 2.4 seconds to complete, with the final frame taking 8.53ms
- Two pass parallel processing took 2.44 seconds to complete, with the final frame taking 8ms

Clearly, single threaded processing wasn't going to win any speed prizes; however, splitting the work over multiple threads had the desired effect, with the processing now taking ~8ms, or half a frame.

The final step; SSE.

I decided to build on the two pass parallel processing as it seemed a natural way to go.

The major change to SSE was dropping the std::vector of structures. There were two reasons for this;
- memory alignment is a big one; SSE likes things to be 16byte aligned
- SIMD. Structures of data don't lend themselves well to this, as at any given point you are only looking at one chunk of data.

SIMD was a major one, as the various formulas being applied to the components could be applied to x and y separately, so if we closely pack the data this means we can hold, say, 4 X positions and work on them at once.

So, the new storage was a bunch of chunks of memory aligned to 16 bytes. The particle counts were forced to be a multiple of 4 so that we can sanely work with 4 components at a time.
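A hedged sketch of what the two passes could look like over one packed component array (say, all the X values). The function name and parameter layout are made up for illustration; it assumes 16-byte aligned arrays and a count that is a multiple of 4, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Hypothetical two pass update for one component of every particle.
// All arrays must be 16-byte aligned; count must be a multiple of 4.
void update_axis_sse(float* position, float* momentum, const float* velocity,
                     std::size_t count, float time) {
    assert(count % 4 == 0);

    // Pass 1: momentum += velocity, four particles per step.
    for (std::size_t i = 0; i < count; i += 4) {
        const __m128 m = _mm_load_ps(momentum + i);
        const __m128 v = _mm_load_ps(velocity + i);
        _mm_store_ps(momentum + i, _mm_add_ps(m, v));
    }

    // Pass 2: position += momentum * time, four particles per step.
    const __m128 t = _mm_set1_ps(time);  // broadcast time to all four lanes
    for (std::size_t i = 0; i < count; i += 4) {
        const __m128 p = _mm_load_ps(position + i);
        const __m128 m = _mm_load_ps(momentum + i);
        _mm_store_ps(position + i, _mm_add_ps(p, _mm_mul_ps(m, t)));
    }
}
```

The same function would be called once for the X arrays and once for the Y arrays. `_mm_load_ps` and `_mm_store_ps` require the 16-byte alignment; unaligned data would need `_mm_loadu_ps` and `_mm_storeu_ps` instead.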

The dispatch is the same as the previous versions; we split threads off into emitters and then into groups of particles and these particles are processed.
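To give a feel for how an emitter's particles end up as groups, here is a serial illustration of recursively halving an index range until each piece is no bigger than a minimum chunk size. This is not TBB's code and involves no threads; `split_range` and `Range` are hypothetical names, and the halving rule is only an assumption about the general shape of the subdivision.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open index range [first, second).
using Range = std::pair<std::size_t, std::size_t>;

// Recursively halve [begin, end) until each piece holds at most `grain`
// items, appending the pieces to `out` in order.
void split_range(std::size_t begin, std::size_t end, std::size_t grain,
                 std::vector<Range>& out) {
    assert(grain > 0 && begin <= end);
    if (end - begin <= grain) {
        if (begin != end) {
            out.emplace_back(begin, end);
        }
        return;
    }
    const std::size_t mid = begin + (end - begin) / 2;
    split_range(begin, mid, grain, out);
    split_range(mid, end, grain, out);
}
```

In a task based system each resulting piece would become a unit of work that an idle thread can pick up; a smaller chunk size gives more pieces to balance across threads at the cost of more scheduling overhead.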

Final result;
- Two pass SSE parallel processing took 1.09 seconds, with the final frame taking 4.9ms.

That's a nicer number indeed, as it still leaves me 11ms to play with to hit 60fps.

There is still some tuning to do (min. particles per thread for example, currently 128) and it might even be better to do my own task dispatch rather than leave it to the parallel_for loop to set up. I've also not done any profiling with regards to stalling etc, or how particle and emitter counts affect the various methods of processing. Oh, and my own timing system needs a bit of work as well; taking all frame times and processing them for min/max/average might be a good idea.

However, for now at least I'm happy with today's work.

ToDo: Improve test bed.

A Note on Fermi

Posted 01 October 2009 · 310 views

Fermi, for those who haven't noticed, is the code name being given to NV's new GPU, and recently some information has come out about it from NV themselves. This is good because, up until this point, there has been a definite lack of information in that regard; something I'd noted myself previously.

So, what is Fermi?

Well, Fermi is a 3 billion transistor GPU which is mainly aimed at the Tesla/GPGPU market. This doesn't preclude it running games of course, but it does hint at the focus the company have taken, which is to try and grow the GPGPU area of the market.

Processing power wise it has 512 cores, which is twice the GT200, but bandwidth wise it's a bit unclear; the bus size has shrunk from 512bit to 384bit, but it is on a GDDR5 interface, which AMD have had good results from bandwidth wise (the HD5870 is on a 256bit bus after all).

The GPU itself is a 40nm GPU, like the RV870 of the HD5xxx series; however, it's 40% bigger, which could have an impact on yields and thus directly on the cost of the final cards.

The card will of course support DX11 and NV of course believe that it will be faster than the HD5870, however there isn't much else with regards to games being talked about right now.

As I mentioned up top, the main focus seems to be GPGPU; this is evident in a few of the choices made in the design and is part of the reason the chip is 'late'.

The maths support and 'power' point very firmly to this: the chip does floating point ops to the IEEE-754 spec and has full support for 32bit integers (apparently 32bit integer multiplies were previously emulated, as the hardware could only do 24bit), and it also suffers a MUCH lower cost for double (64bit FP) maths; whereas the GT200 processed 64bit at 1/8th the speed and the RV770 processes it at 1/5th the speed of normal 32bit maths, the Fermi GPU only takes a 50% hit to its speed.

The question in my mind is how they do that. I have seen some people commenting that they might well leave silicon idle during 32bit maths/game maths, but this would strike me as a waste; I suspect instead some clever fusing of cores is occurring in the pipeline somehow to get this performance. (Given that each of these

The Fermi core also introduces a proper cache setup, something which wasn't in the previous core; that had a section of user controlled memory per 'shader block' (a 'shader block' being a section of 'cores', control circuits etc, made up of 32 cores) and an L1 texture cache which was mostly useless when you are in 'compute mode'. In Fermi there is a 64KB block of memory for each shader block; this can be split 16k/48k or 48k/16k, with one section being shared memory and the other being L1 cache (the min shared memory size is to ensure that older CUDA apps which required the max 16k still work). There is also a 768KB block of shared L2 cache, which helps with atomic memory operations, making them between 5 and 20 times faster than before.

The other area which has been tweaked, and does have an impact on gaming performance as well, is the number of threads in flight; the GT200 could have 30720 threads active at any given time while Fermi has reduced this to 24576. This is because it was noted that performance was mostly tied to the amount of shared memory, not the number of threads; so shared memory went up while thread count went down. Part of me can't help but wonder if they might have been better off keeping the thread count up, although I suspect they have run the numbers and the extra memory space might well have offset the extra transistors required to control the extra threads on the die.

There have been various improvements in the shader instruction dispatch as well, mostly to remove dead time on the shader blocks, which was a particular "problem" with the previous generation when it came to certain operations. This was mostly down to the 'special function unit' which, when used, would stall the rest of the threads in the thread group until it had completed. This has been decoupled now, which solves the problem (the SFU is separate from the main 'core' sections of the shader blocks).

Per clock, each of these shader blocks can dispatch;
32 floating point operations
16 64bit floating point operations
32 int operations
4 'special function unit' (SFU) operations (sin, cos, sqrt)
16 load/store operations

It is however important to note that if you are performing 64bit operations then the whole SM is reduced to 16 ops per clock and that you can't dual issue 64bit FP and SFU ops at the same time. This gives some weight to my idea of fusing cores together to perform the operations, thus not wasting any silicon, and probably involving the SFU in some way thus the lack of dual issue.
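A quick back-of-the-envelope check of what those per-block figures imply for the whole chip, using only the numbers quoted above (512 cores, 32 cores per shader block). The constant and function names are made up for illustration, and the totals are simple multiplication rather than anything published in this form.

```cpp
#include <cassert>

// Figures quoted in the text above.
constexpr int kTotalCores = 512;
constexpr int kCoresPerBlock = 32;
constexpr int kFp32PerBlockPerClock = 32;
constexpr int kFp64PerBlockPerClock = 16;

// Number of shader blocks implied by the two core counts.
int shader_block_count(int total_cores, int cores_per_block) {
    assert(cores_per_block > 0 && total_cores % cores_per_block == 0);
    return total_cores / cores_per_block;
}

// Peak operations per clock across the chip for a given per-block rate.
int chip_ops_per_clock(int blocks, int ops_per_block) {
    return blocks * ops_per_block;
}
```

With these inputs that works out to 16 shader blocks, 512 32bit floating point operations per clock and 256 64bit ones, which lines up with the 50% hit for double precision mentioned earlier.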

Given the way the core blocks seem to be tied, I would suspect this is a case of being able to send an FP and an int instruction at the same time, as well as an SFU and a Load/Store op at the same time, the latter two being separate from the 'cores' and part of the shader block.

Another big GPGPU improvement is that of 'parallel kernel support'.

In the GT200/G80 GPUs the whole chip had to work on the same kernel at once; for graphics this is no problem, as often there is a lot of screen to render, however with GPGPU you might not have that kind of data set, so parts of the core are idle, which is a waste of power.

Fermi allows multiple kernels to execute at once, rather than starting one, waiting for it to complete, then starting another and so on. This was an important move as, with twice as many cores, the chance of idle cores under the original setup would have increased with Fermi; now the GPU can schedule as it needs to in order to keep the cores fed and doing the most work. It'll be interesting to see how this feeds back into graphics performance, if it does at all.

Moving between GPU and CUDA mode has also been improved; NV are claiming a 10x speed up, meaning you can switch a number of times a frame, which of course comes with a note about hardware physics support. There is also parallel GPU<->CPU data transfer, meaning you can send and receive data at the same time instead of one then the other.

The HD5870 can detect memory errors on the main bus; however, all it can do is resend until the transfer succeeds, it can't correct them. NV have gone one step further and added ECC support to the register file, L1 and L2 cache and the DRAM (it's noted that this is Tesla specific so I don't expect to see it in a commercial GPU); the main reason for this is that many people won't even talk about Tesla without these features as they are, perceived at least, to be a requirement for them.

One area where NV have typically stood head and shoulders above ATI (and now AMD) has been tools. PerfHUD is often stated as a 'must have' app for debugging things in the PC space. With Fermi they have yet another thing in store; Nexus.

Put simply, Nexus is a plugin for Visual Studio which allows you to debug CUDA code as you would C++ or C# code. You can look at the GPU state and step into functions, giving what to me seems like a huge boost in work flow and debugging ability. As much as I like the idea of OpenCL, without something like this I can see a lot of GPGPU work heading NV's way (provided they have a market, see later).

They have also changed the ISA for the chip, part of which involved unified memory addressing, which apparently is required to enable C++ on the GPU. That's right, with Fermi you have virtual functions, new/delete and try/catch on the GPU. Couple this with the improved debugger and there are interesting times ahead; although it will be interesting to see how C++ makes the transition to the GPU, even in the kind of sub-set setup they had with Cg and C for CUDA beforehand.

I mentioned the GPGPU market earlier and this is an interesting part of the story.

It can't be denied that with this GPU NV are firmly taking aim at this market, yet last quarter the Tesla business only made $10M, which to a company which grossed $1B at its peak isn't a whole lot. However, when asked about this the "blame" is placed on a lack of focused marketing rather than a lack of business as such; maybe with the improvements above (ECC being a key one) and better marketing NV can grow this sector. Given what they have done with this GPU and the added focus on Tegra, its mobile ARM based System on Chip, these do appear to be the areas NV are betting on for growth.

However, if Fermi doesn't improve things (and NV are looking at an exponential growth in sales) then even with Tegra they are going to be in trouble.

So, those are the facts, and a few guesses by myself about how things are done, but what is my opinion?

Well, the chip itself is interesting and from a GPGPU point of view a great step forward; however, I have this feeling NV might have shot themselves in the foot a bit while taking this step forward.

You see, a lot of these GPGPU features which have been added have mostly delayed the chip; it is apparently only recently that they have got working chips back, which means we are still maybe 2 months away from a possible launch, and even then there might be a lack of volume until Q1 of next year (with murmurs of Q2).

This means NV might well miss the Thanksgiving, Xmas and New Year sales, periods I suspect are key for people buying new machines, and with Windows 7, arguably the most popular Windows release EVER, around the corner, I suspect a number of people are looking at the market right now and thinking 'what should I upgrade to?'. In answer to that they are going to find HD5870 and, more importantly, HD5850 cards waiting to be purchased.

It seems that NV have decided to take the pain and, to a lesser or greater degree, are betting on the Tesla GPGPU space to make the pain of being late to market against AMD worthwhile. It remains to be seen if this is the case; however, I'm sure AMD will be happy with a free pass for the next few months.

Finally, by the time they do get to market we could be only 2 months away from an AMD refresh, so if you hold out for an NV card (and we have no real numbers here) then you start thinking 'maybe I should hold out for the AMD refresh...' and that way madness lies.

From a consumer point of view, however, this delay isn't that great as it allows AMD to sell their cards at a higher price point which, while good for the company, isn't so great for everyone's pocket.

I think that if NV want to dent AMD's success here then they need to start releasing numbers and possible prices soon, because right now the only game in town is AMD; if NV can get some numbers out and make their GPU start to look worthwhile then people might hold off.

Right now however, with no numbers, no price and no release date, AMD are a no brainer of a buy.
