


#5181476 HD remakes, what does it involve?

Posted by MJP on 19 September 2014 - 01:43 AM

I worked on the PS3 port of the God of War PSP games. For that port we didn't use a new engine or add any additional graphical features; we just straight-ported the PSP engine to PS3. The artists improved some of the models, up-rezed a bunch of textures (a lot of them automatically through Photoshop scripts), and switched out the Kratos gameplay model for the high-quality model used in cinematics. The bulk of the work was programmers porting the engine bit by bit until it ran on PS3, and then coming up with ways to get all of the graphics functionality working on the PS3 GPU. A *lot* of looking at the PSP GPU docs to figure out what the hell it did when some obscure feature was enabled (texture doubling, argh!). The rest was just bug fixing (a lot of data bugs from endianness issues) and optimization. There was also a large amount of work put into re-rendering our old cinematics in HD, so that we weren't just playing upscaled 480x272 videos.

It really depends on the project, though...some remakes take things a lot further and add lots of new graphical features. But I would say most are similar to what we did, where it's mostly a straight port.

#5181130 OIT (Order Independent Transparency)

Posted by MJP on 17 September 2014 - 03:50 PM

There's no real "best" here. Just like any other engineering problem there's different solutions, with different trade-offs. The Intel technique has good results, but requires a recent Intel GPU which usually makes it a deal-breaker. Blended OIT can work on almost any hardware, but can produce sub-par results for certain scenes and generally relies on per-scene tweaking of the blending weights. What's better for you will depend on what platform and hardware you're targeting, as well as the kind of content you'll have in your game or app.
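To make the "per-scene tweaking of the blending weights" concrete, here's a sketch of the accumulation pass for weighted blended OIT, using one of the depth-based weight functions suggested by McGuire and Bavoil. The constants in the weight function are exactly the knobs that typically need per-scene tuning:

```hlsl
// One possible depth-based weight for weighted blended OIT
// (after McGuire & Bavoil). The 0.03 / 200.0 constants are tuning
// knobs that usually need adjustment per scene.
float OITWeight(float linearZ, float alpha)
{
    return alpha * clamp(0.03 / (1e-5 + pow(linearZ / 200.0, 4.0)), 1e-2, 3e3);
}

struct OITOutput
{
    float4 Accum  : SV_Target0; // weighted premultiplied color + weighted alpha
    float  Reveal : SV_Target1; // blended with (Zero, InvSrcAlpha) to build the
                                // product of (1 - alpha) terms
};

OITOutput AccumulatePass(float4 premultColor, float alpha, float linearZ)
{
    float w = OITWeight(linearZ, alpha);
    OITOutput o;
    o.Accum  = float4(premultColor.rgb * w, alpha * w);
    o.Reveal = alpha;
    return o;
}
```

A full-screen composite pass then combines the two targets to produce the final transparent color.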

#5180866 GPU Perfstudio error

Posted by MJP on 16 September 2014 - 06:13 PM

You should check out Renderdoc, which is an open-source replacement for PIX. It's very good!

#5180835 LOD in modern games

Posted by MJP on 16 September 2014 - 03:35 PM

If possible could you tell me which titles? I'd like to see some videos.
Progressive meshes from Hugues Hoppe? I've seen it cited in a few places as well, though it never seemed clear to me whether anyone actually used it.
Are Simplygon and other similar solutions really that hit-and-miss, requiring artist intervention?
Out of curiosity, I seem to remember the 360 having some hardware tessellation; was it ever used in any titles?
Thanks for your help.


MLB: The Show used it on PS3. I believe they released some material about it, although this presentation is all I can find at the moment. I think Lair was using something similar as well. Insomniac was using it for a while with Ratchet and Clank on PS3, but dropped it before release. They have some slides here.


The problem with LOD is that you want something that has lower polygon counts but "still looks good", and that last part is very subjective. This is why you usually want an artist involved: so that they can make sure that the LODs being generated are still visually pleasing. It's not always necessary, but ideally you'll want human feedback in the process as much as possible.


Xbox 360 did indeed have tessellation hardware. I don't believe it ever had widespread use outside of a few special cases. 

#5180538 LOD in modern games

Posted by MJP on 15 September 2014 - 03:13 PM

In the general case, hand-done LODs of models at three or four distances, maybe plus an impostor billboard if needed, with a simple alpha fade between them. Nobody is bothering with continuous LOD or other automated systems, as they have several flaws:
* Expensive on CPU
* Require dynamic GPU buffers
* Pretty much ruin any chance of instancing/batching - this is a huge problem
* The vast majority of methods are not able to cope with tex coords, normals, tangent spaces, and all the other real world stuff verts actually use. This becomes a huge n-dim optimization problem.
* In conjunction with the above, LOD changes create severe popping artifacts due to discontinuities in the various spaces
* In nearly every system, vertex processing isn't the limiting factor anyway
The primary use of LOD right now is to prevent small triangles from being rasterized. Rasterization of tiny (less than 8x8 pixels in particular) triangles is catastrophically slow on modern hardware and negatively impacts overall fragment shading efficiency. There are a few special cases where more sophisticated LOD techniques are useful, notably (and almost exclusively) terrain.


Well it's not really true that nobody used continuous LOD systems (I know of a few games that did), but it definitely wasn't popular. In fact I've seen Progressive Meshes mentioned as one of the most frequently-cited but least-used techniques in graphics, which is pretty funny.

In general, discrete LOD levels are definitely still the norm. Simplygon is becoming pretty popular as a tool for automatically generating LODs. It's pretty good, although you still generally want artist intervention.

Tessellation used to be frequently mentioned as the solution to all LOD problems, but in reality it's found little use in big-budget games. The Call of Duty series seems to be the one notable exception.

#5180499 StructuredBuffer and matrix layout

Posted by MJP on 15 September 2014 - 11:46 AM

Yes, I've seen that behavior as well when using compute shaders. My workaround was to use a StructuredBuffer<float4> instead and do 4 loads. In terms of the compiled assembly this isn't really any less efficient, since a float4x4 will get split into 4 loads when compiled to assembly anyway (you can only load 4 DWORDs at a time from structured buffers).
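A sketch of that workaround, assuming matrices are stored row-major with matrix i occupying elements [i*4, i*4+3] of the buffer (the names and layout are mine, purely illustrative):

```hlsl
// Store each float4x4 as four consecutive float4 rows and
// reassemble manually. Compiles to the same 4 loads that a
// StructuredBuffer<float4x4> would have generated anyway.
StructuredBuffer<float4> MatrixRows : register(t0);

float4x4 LoadMatrix(uint matrixIndex)
{
    uint base = matrixIndex * 4;
    return float4x4(MatrixRows[base + 0],
                    MatrixRows[base + 1],
                    MatrixRows[base + 2],
                    MatrixRows[base + 3]);
}
```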

#5180197 Using Texture2DMS in shader

Posted by MJP on 14 September 2014 - 12:59 AM

Like I said previously, Texture2DMS.Load takes integer pixel coordinates. You're passing the same [0,1] texture coordinates that you use for Texture2D.Sample, which isn't going to work. You can obtain the current pixel coordinate in a pixel shader by using the SV_Position semantic.


You're also only sampling one subsample from your Texture2DMS, and the index you're passing is too high. If you have 8 subsamples for your render target, then the valid indices that you can pass are 0-7. Try using a for loop that goes from 0-7, and averaging the result from all subsamples.
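Putting both fixes together, a minimal pixel shader sketch for an 8x target might look like this (names are illustrative; the subsample count is hard-coded to match the render target):

```hlsl
// Resolve an 8x MSAA texture in the pixel shader by averaging all
// subsamples. SV_Position gives the pixel center in screen space, so
// truncating xy yields the integer pixel coordinate that Load expects.
Texture2DMS<float4> MSAATexture : register(t0);

float4 ResolvePS(float4 screenPos : SV_Position) : SV_Target
{
    int2 pixelCoord = int2(screenPos.xy);
    float4 sum = 0.0;
    [unroll]
    for (int i = 0; i < 8; ++i)
        sum += MSAATexture.Load(pixelCoord, i);
    return sum / 8.0;
}
```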

#5179778 Using Texture2DMS in shader

Posted by MJP on 11 September 2014 - 11:59 PM

You can't use any of the "Sample" functions on a Texture2DMS, since MSAA textures don't support filtering or any of the "normal" texture operations. The only thing you can do with them is load a single subsample at a time, by specifying integer pixel coordinates as well as a subsample index. See the documentation for more info.


For your case, if you wanted to resolve the texture on-the-fly in the pixel shader you can just load all of your subsamples for a given pixel and then average the result. Also, note that for ps_5_0 you're not required to declare the Texture2DMS with the number of subsamples as a template parameter. You only had to do this for ps_4_0.

#5179741 Using Texture2DMS in shader

Posted by MJP on 11 September 2014 - 08:15 PM

You'll want to resolve your MSAA texture before you can sample from it using SpriteBatch. Just create another render target texture that has the same dimensions/format as your MSAA texture but with 0 multisamples, and then resolve to that before using it with SpriteBatch.

#5179419 Blur shader

Posted by MJP on 10 September 2014 - 03:12 PM

1. Samplers are generally only useful if you need filtering, or want special clamp/wrap/border addressing modes. For your case, not using a sampler is easier since you always want to load at exact texel locations.

2. Invalid addresses will return 0's when loading from textures or buffers. If that's not desirable, you should clamp your address manually to [0, textureSize - 1].
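A sketch of the manual clamp for point 2, assuming the texture dimensions come in through a constant buffer (all names here are illustrative):

```hlsl
// Load at an exact texel location, clamping the coordinate to the
// valid range [0, TextureSize - 1] so edge taps repeat the border
// texel instead of returning 0.
Texture2D<float4> InputTexture : register(t0);

cbuffer BlurConstants : register(b0)
{
    int2 TextureSize;
};

float4 LoadClamped(int2 coord)
{
    coord = clamp(coord, int2(0, 0), TextureSize - 1);
    return InputTexture.Load(int3(coord, 0)); // z = mip level 0
}
```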

#5178941 HLSL compiler weird performance behavior

Posted by MJP on 08 September 2014 - 02:58 PM

I'm not sure why it takes so long to compile; you'd really need to get someone from the DirectX team to help you out. Historically the loop simulator used for unrolling has always been rather slow in the HLSL compiler, but it doesn't make sense that it would be so slow for your particular case. It must be doing some sort of bounds-checking that is slowing it down.

I tried changing your shader to use a StructuredBuffer instead of a constant buffer for storing the array of bone matrices, and it compiles almost instantly. So you can do that as a workaround. A StructuredBuffer shouldn't be any slower (in fact it probably takes the same path on most recent hardware), and will give you the same functionality. 

As for unrolling, it's almost always something you want to do for loops with a fixed number of iterations. It generally results in better performance, because it allows the compiler to better optimize the resulting code and prevents the hardware from having to execute looping/branching instructions every iteration. So you'll probably want to be explicit and put an [unroll] attribute on your loop.
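Both suggestions together might look like this for a standard 4-bone skinning loop (a sketch with made-up names, assuming row-vector mul convention):

```hlsl
// Bone matrices in a StructuredBuffer instead of a constant buffer
// array, with an explicit [unroll] on the fixed-count skinning loop.
StructuredBuffer<float4x4> BoneMatrices : register(t0);

float3 SkinPosition(float3 position, uint4 boneIndices, float4 boneWeights)
{
    float3 result = 0.0;
    [unroll]
    for (int i = 0; i < 4; ++i)
        result += boneWeights[i] *
                  mul(float4(position, 1.0), BoneMatrices[boneIndices[i]]).xyz;
    return result;
}
```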

#5178236 Intrinsics to improve performance of interpolation / mix functions

Posted by MJP on 04 September 2014 - 11:40 PM

I don't think there's anything available that the compiler won't already be using, and even if there were, there's no guarantee that it would actually map to a single instruction once it's JIT compiled for your GPU.

#5178150 "DirectX Texture Tool : An error occurred trying to open that file" w...

Posted by MJP on 04 September 2014 - 02:56 PM

PIX comes with the old DirectX SDK, which you seem to already have installed. Just be aware that PIX will not work with up-to-date versions of Windows 7 or any version of Windows 8 without patching the EXE and one of its DLLs.


You may want to consider trying RenderDoc instead, which is a very awesome third-party tool that aims to be a worthy successor to PIX.

#5178149 DX12 - Documentation / Tutorials?

Posted by MJP on 04 September 2014 - 02:50 PM



what kind of documentation are you searching for?

Sorry, should've been more specific. I'm referring to documentation on the binary format to allow you to produce/consume compiled shaders like you can with SM1-3 without having to pass through Microsoft DLLs or HLSL. Consider projects like MojoShader that could make use of this functionality to decompile SM4/5 code to GLSL when porting software or a possible Linux D3D11 driver that would need to be able to compile compiled SM4/5 code into Gallium IR and eventually GPU machine code.

There's also no way with SM4/5 to write assembly and compile it, which is a pain for various tools that don't want to work through HLSL or the HLSL compiler.



I'm not sure what the actual problem you have here is.  It's an ID3DBlob.


If you want to load a precompiled shader, it's as simple as (and I'll even do it in C, just to prove the point) fopen, fread and a bunch of ftell calls to get the file size.  Similarly to save one it's fopen and fwrite.


Unless you're looking for something else that Microsoft actually have no obligation whatsoever to give you, that is.....



He's specifically talking about documentation of the final bytecode format contained in that blob, so that people could write their own compilers or assemblers without having to go through d3dcompiler_*.dll, as well as being able to disassemble a bytecode stream without that same DLL. That information (along with the full, complete D3D specification) is only available to driver developers.

#5177977 DX12 - Documentation / Tutorials?

Posted by MJP on 03 September 2014 - 11:39 PM



D3D12 will be the same, except it will perform much better (D3D11 deferred contexts do not actually provide good performance increases in practice... or this is the excuse of AMD and Intel, which do not support driver command lists).

Fixed.
AMD supports them in Mantle and multiple game console APIs. It's a back-end D3D (Microsoft code) issue, forcing a single thread in the kernel-mode driver to be responsible for kickoff. The D3D12 presentations have pointed out this flaw themselves.



Indeed. There's also potential issues resulting from the implicit synchronization and abstracted memory management model used by D3D11 resources. D3D12 gives you much more manual control over memory and synchronization, which saves the driver from having to jump through crazy hoops when generating command buffers on multiple threads.