Evil Steve

2 FPS - Vertex shader bound?


Hi all, I'm trying to get a game (Guild Wars) to work on my laptop. It installed fine and it runs - but only at 2 FPS on the lowest graphics settings, measured with PIX. Looking through PIX, the game isn't doing anything that odd. It creates and destroys 3 vertex buffers per frame, but I've written a proxy d3d9.dll which recycles those three buffers, and that doesn't affect the frame rate at all. The game uses VS 1.1 and PS 1.1, and my laptop's graphics chipset (a crappy Intel Express 965M) claims it can do VS 3.0 and PS 3.0 (although no hardware vertex processing is available). It makes around 30 VB locks, around 40 SetPixelShader calls and around 160 SetTexture calls per frame, which seems perfectly reasonable to me.

I've used my proxy DLL to try a few things, none of which make any difference to the frame rate:
  • Replacing all SetTexture() calls with a 2x2 texture (so it's not texture-bound)
  • Forcing everything to fail the Z-test (ZFUNC=NEVER)
  • Using a 1x1 scissor rect

The only thing I've tried so far that seems to make a difference is not making any DrawPrimitive/DrawIndexedPrimitive calls - that takes the frame rate up to around 80 FPS. I've checked, and the MinIndex and NumVertices parameters passed to DP/DIP are sane, so it's not as if D3D is transforming unnecessary vertices.

My laptop is usually pretty capable - it runs Red Alert 3 quite happily - and my girlfriend's laptop, which is very similar to mine, gets around 30 FPS in Guild Wars. I've run Process Monitor to see if the game is doing anything odd, and it isn't; there's very little hard drive access, so I don't think it's swapping textures in and out.

So, does all this imply that I'm vertex shader bound? Does anyone have any ideas for things I could try to narrow down the cause of the slowdown and find a possible solution? I know this is technically a tech support question, but it's sort of programming related [smile]
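The proxy-DLL experiments above can be sketched in plain C++. This is a simplified stand-in, not the real wrapper: the `Device` struct is a stub for the slice of IDirect3DDevice9 being intercepted (the real interface is a COM vtable), and the member names are made up. It just shows the wrap-and-override pattern used to substitute a dummy texture and to swallow draw calls:

```cpp
#include <cassert>

// Stub standing in for the slice of IDirect3DDevice9 the proxy overrides.
// (Hypothetical simplification; the real interface is a COM vtable.)
struct Device {
    virtual int SetTexture(unsigned stage, void* tex) { return 0; }
    virtual int DrawIndexedPrimitive(unsigned primCount) { return 0; }
    virtual ~Device() {}
};

// Proxy in the spirit of the d3d9.dll wrapper: counts calls, substitutes a
// tiny dummy texture, and can swallow draw calls entirely.
struct ProxyDevice : Device {
    Device* inner;
    void* dummyTexture;          // would be a 2x2 texture in the real test
    bool swallowDraws = false;   // the experiment that raised FPS to ~80
    int setTextureCalls = 0;
    int drawCalls = 0;

    ProxyDevice(Device* d, void* dummy) : inner(d), dummyTexture(dummy) {}

    int SetTexture(unsigned stage, void* tex) override {
        ++setTextureCalls;
        return inner->SetTexture(stage, dummyTexture); // ignore the real texture
    }
    int DrawIndexedPrimitive(unsigned primCount) override {
        ++drawCalls;
        if (swallowDraws) return 0;  // skip the draw to test GPU-boundness
        return inner->DrawIndexedPrimitive(primCount);
    }
};
```

With `swallowDraws` set, the frame's SetTexture/Draw traffic still flows through the proxy, so the CPU-side cost stays roughly the same while the GPU does no vertex work - which is what makes the 2 FPS → 80 FPS jump point at the draw calls themselves.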

    Quote:
    Original post by Adam_42
    http://www.reghardware.co.uk/2007/08/20/intel_updates_gma_driver/ suggests that it should be able to do hardware vertex processing with an appropriate driver.

    You might want to try and find one :)
    Interesting, I'll have a look tonight. I did look for the latest official driver, which I already have (although it's from early 2008, from Toshiba's site).

    Cheers,
    Steve

    I had a look, but I can't seem to find a driver that'll work. When I install the driver from Intel's site, it says "The driver being installed is not validated for this computer. Please obtain the appropriate driver from the computer's manufacturer. Setup will exit."

    It seems that hardware vertex processing works for fixed function, but not for shaders - I presume that's what this driver fixes (the page wasn't really all that clear)?

    EDIT: It's getting stranger now. I can create a device in my own code with D3DCREATE_HARDWARE_VERTEXPROCESSING, but when I force the device for Guild Wars to be created with that flag, the debug runtimes say that the device cannot perform hardware processing. Does anyone know why that would be? As far as I can see, the present parameters are the same...

    EDIT #2: The device caps are totally different when queried from the Guild Wars process. I looked into it, and it seems that Guild Wars is using DX8, and Vista is inserting some sort of shim between DX8 and DX9 that changes how the device behaves.
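For reference, the caps difference in question comes down to one bit: hardware vertex processing requires D3DDEVCAPS_HWTRANSFORMANDLIGHT in the DevCaps field returned by IDirect3D9::GetDeviceCaps. A minimal sketch of the check, with the documented flag value inlined so it compiles without the DirectX headers:

```cpp
#include <cassert>

// Documented value from d3d9caps.h, inlined so the sketch is
// self-contained without the DirectX SDK headers.
const unsigned D3DDEVCAPS_HWTRANSFORMANDLIGHT = 0x00010000;

// True if the caps advertise hardware T&L - the prerequisite for creating
// the device with D3DCREATE_HARDWARE_VERTEXPROCESSING.
bool SupportsHardwareVP(unsigned devCaps) {
    return (devCaps & D3DDEVCAPS_HWTRANSFORMANDLIGHT) != 0;
}
```

In a real app the `devCaps` value comes from `D3DCAPS9::DevCaps`; logging this bit from both the stand-alone app and the shimmed Guild Wars process would confirm exactly what the runtime is masking.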

    [Edited by - Evil Steve on June 4, 2009 7:33:01 AM]

    needle.in.haystack

    But what I will say is: in the 9 or so years I've programmed shaders I've never seen a PC be so seriously vertex bound except in cases of extreme wrongness (hey, we all know shipping games can be bugged to hell too) or deliberate test behaviour.

    Cap for Hardware Vertex Processing only means "card can do fixed function T&L in hardware" not "can do shaders".

    Could it be CPU bound...? It doesn't necessarily have to mean the game code itself is running slow, it could be (for example) kmixer.sys software mixing too many concurrent sound channels. Run a CPU profiler over the whole system (profiling the GW process alone doesn't tell you if the driver is spinlocked most of the time). If the Intel driver is doing VP on the CPU as well (even if the driver pretends to be doing it in H/W, some Intels do that..) expect a ton more CPU load.

    Remember there's a vertex buffer creation flag for hardware VP vs software VP.

    If device is MIXED VP, be wary of more than 2 switches between the two per frame.

    Force it to fixed function with the same number of draw calls and only the POSITION in the FVF (you can find out where it lives from the vertex declaration) and the stride set to skip the rest.
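The stride trick in that last suggestion works because D3D walks the vertex buffer by whatever stride you pass to SetStreamSource, so declaring only POSITION while keeping the original stride reads just the position of each full vertex and skips the rest. A plain C++ sketch of that walk (the `Vertex` layout here is a made-up example, with POSITION assumed to be at offset 0):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical full vertex layout a game might be using.
struct Vertex {
    float pos[3];     // POSITION - the only field the test cares about
    float normal[3];
    float uv[2];
};

// Read only the positions out of a buffer of full vertices, stepping by the
// full stride - what SetFVF(D3DFVF_XYZ) plus the original stride makes the
// fixed-function pipeline do.
void ExtractPositions(const void* buffer, size_t stride, size_t count,
                      float out[][3]) {
    const unsigned char* p = static_cast<const unsigned char*>(buffer);
    for (size_t i = 0; i < count; ++i, p += stride)
        std::memcpy(out[i], p, sizeof(float) * 3);  // POSITION at offset 0 here
}
```

If POSITION isn't at offset 0 in the real declaration, the read offset moves accordingly; the point is that the same draw calls go through with minimal per-vertex work, isolating shader cost from draw-call cost.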

    Debug runtimes? ;-)



    Whoops, I thought I'd replied to this.

    The DX8 option worked, I can play at ~25FPS on medium/high settings (It only makes a few FPS difference between low and high).

    I'm still curious why it fails so badly in DX9 mode on my laptop though. My main question is why the caps are so different when IDirect3D9::GetDeviceCaps is called from within the GW process versus from my own stand-alone app. The process is definitely loading d3d9.dll (although that may be through some Vista DX8 shim, I suppose). It's also possible that the driver detects Guild Wars and changes the caps it exposes (although I couldn't find any option to change or disable that in the rubbish Intel control panel).

    Quote:
    Original post by S1CA
    needle.in.haystack

    But what I will say is: in the 9 or so years I've programmed shaders I've never seen a PC be so seriously vertex bound except in cases of extreme wrongness (hey, we all know shipping games can be bugged to hell too) or deliberate test behaviour.
    Oh so true [smile]

    Quote:
    Original post by S1CA
    Cap for Hardware Vertex Processing only means "card can do fixed function T&L in hardware" not "can do shaders".
    Yeah, my point was more that this caps bit was different between the GW process and my own process.

    Quote:
    Original post by S1CA
    Could it be CPU bound...? It doesn't necessarily have to mean the game code itself is running slow, it could be (for example) kmixer.sys software mixing too many concurrent sound channels. Run a CPU profiler over the whole system (profiling the GW process alone doesn't tell you if the driver is spinlocked most of the time). If the Intel driver is doing VP on the CPU as well (even if the driver pretends to be doing it in H/W, some Intels do that..) expect a ton more CPU load.
    Switching to DX8 mode makes it work fine, so I don't think it's CPU bound (going on the big assumption that DX8 mode only really affects the D3D8 / D3D9 path).

    Quote:
    Original post by S1CA
    Remember there's a vertex buffer creation flag for hardware VP vs software VP.

    If device is MIXED VP, be wary of more than 2 switches between the two per frame.
    Yup - all vertex buffers are created in software vertex processing mode, and mixed mode is never used.

    Quote:
    Original post by S1CA
    Force it to fixed function with the same number of draw calls and only the POSITION in the FVF (you can find out where it lives from the vertex declaration) and the stride set to skip the rest.
    That's a good idea - I'll give that a try tomorrow.

    Quote:
    Original post by S1CA
    Debug runtimes? ;-)
    Actually - I spent a few hours looking into this before I realised I had the debug runtimes turned on. However, turning them off makes no noticeable difference.

    It's also entirely possible (given the horrific performance when alt+tabbing, which I'm guessing comes from using too much system RAM) that in D3D9 mode it uses more textures, exceeds VRAM and swaps like crazy. That's something else I'll need to check: total up the size of all resources used in a given frame and compare that to the amount of VRAM reserved (which I don't know offhand).
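Totalling the frame's texture footprint is straightforward arithmetic; a sketch assuming simple uncompressed formats (real D3DFMT surface formats and DXT compression would change the bytes-per-pixel figure):

```cpp
#include <cassert>
#include <cstddef>

// Bytes used by a texture and (optionally) its full mip chain, assuming an
// uncompressed format with fixed bytes-per-pixel (e.g. 4 for D3DFMT_A8R8G8B8).
size_t TextureBytes(size_t width, size_t height, size_t bytesPerPixel,
                    bool mipmapped) {
    size_t total = 0;
    for (;;) {
        total += width * height * bytesPerPixel;
        if (!mipmapped || (width == 1 && height == 1))
            break;
        width  = width  > 1 ? width  / 2 : 1;  // each mip halves each axis,
        height = height > 1 ? height / 2 : 1;  // clamped at 1
    }
    return total;
}
```

Summing this over every texture set during one PIX frame (the full mip chain adds roughly a third again on top of the base level) and comparing the total against what the chipset reserves would confirm or rule out the swapping theory.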

