Installing a second "development" video card?


15 replies to this topic

#1 TysonJ   Members   -  Reputation: 145


Posted 22 June 2012 - 11:02 AM

I had the idea of installing a second PCI video card (a GeForce FX 5200) in my PC... my baseline, least-common-denominator platform is an iPhone 3GS, and I want to test my game's performance on roughly comparable hardware as I develop it. Ideally I would like to hook up my main video card with DVI and the second with VGA, and be able to toggle easily between the two.

Is this feasible? Do other people do this? Or would I be entering a world of headaches? My OS is Windows 7.


#2 mark ds   Members   -  Reputation: 1268


Posted 22 June 2012 - 01:23 PM

I don't wish to hijack TysonJ's thread, but I have a similar, related question - it seemed appropriate to put it here.

Is it possible to have an AMD and an NVIDIA card in the same system, and direct the BIOS to choose one over the other (init first display device - PCIE1/PCIE2)? Obviously the two wouldn't be enabled at the same time.

#3 NightCreature83   Crossbones+   -  Reputation: 2827


Posted 22 June 2012 - 01:37 PM

I have had a dev PC with an ATI card (a 9600 or something similar) and an NVIDIA card (a TNT2 M64) in one machine on Windows XP. At the time I was porting an OpenGL renderer to D3D (DX9.0c). When you swap the drawing window from one screen to the other, the application is actually still rendering on the first GPU, and the slowdown you see is the PCI/AGP bus (this was years ago) transferring the rendered image from one GPU to the other.

At least on Windows XP an NVIDIA and an AMD card together worked fine, though I think I had only installed the Catalyst driver, as the ATI was the main GPU, and relied on the OS drivers for the TNT card.
Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, Mad Max

#4 mark ds   Members   -  Reputation: 1268


Posted 22 June 2012 - 06:32 PM

Just to add to this thread: I have just checked many posts relating to this on various web sites... and the answers were No, or Yes!

It would appear that (under Windows 7) no one seems to know whether
a: you can run two NVIDIA cards of different generations,
or b: you can run AMD and NVIDIA cards at the same time.

There are several totally conflicting reports.

Personally, I was hoping to avoid buying a complete AMD system just to test OpenGL functions! Although it may actually be worthwhile for several reasons (not least of which is an 8-core processor for half the price of the Intel equivalent).

Which reminds me... what is the AMD equivalent of a GTX 560?

Cheers

#5 L. Spiro   Crossbones+   -  Reputation: 13600


Posted 22 June 2012 - 08:34 PM

I want to test my game's performance on roughly comparable hardware, as I develop it.

Then get an iPhone 3GS. There is no such thing as “comparable hardware”. Not even the iOS Simulator is comparable (its OpenGL ES 2 implementation is emulated in software, and provides no hardware support). There are numerous hardware differences that matter, including deferred tile-based rendering which eliminates overdraw (your GeForce card will not perform this) and a unified memory model which eliminates bus transfers to the graphics card.
iOS devices have a virtual memory system but no paging system.
Even threading is not the same.

Both Mac OS X and iOS adopt a more asynchronous approach to the execution of concurrent tasks than is traditionally found in thread-based systems and applications.



The point is that the only hardware that is like an iPhone 3GS is an iPhone 3GS. If you want to develop for one, get one.


L. Spiro

Edited by L. Spiro, 22 June 2012 - 08:43 PM.

It is amazing how often people try to be unique, and yet they are always trying to make others be like them. - L. Spiro 2011
I spent most of my life learning the courage it takes to go out and get what I want. Now that I have it, I am not sure exactly what it is that I want. - L. Spiro 2013
I went to my local Subway once to find some guy yelling at the staff. When someone finally came to take my order and asked, “May I help you?”, I replied, “Yeah, I’ll have one asshole to go.”
L. Spiro Engine: http://lspiroengine.com
L. Spiro Engine Forums: http://lspiroengine.com/forums

#6 TysonJ   Members   -  Reputation: 145


Posted 25 June 2012 - 04:52 PM

Personally, I was hoping to avoid buying a complete AMD system just to test OpenGL functions! Although it may actually be worthwhile for several reasons (not least of which is an 8-core processor for half the price of the Intel equivalent).

Are you buying an AMD system for their OpenGL ES support? If so, I would advise against it. The last driver update broke it, at least for me, so after hours of pulling my hair out trying to figure out why my game stopped working, I switched to ANGLE, which has worked very well so far.

#7 TysonJ   Members   -  Reputation: 145


Posted 25 June 2012 - 04:55 PM


I want to test my game's performance on roughly comparable hardware, as I develop it.

Then get an iPhone 3GS. There is no such thing as “comparable hardware”. Not even the iOS Simulator is comparable (its OpenGL ES 2 implementation is emulated in software, and provides no hardware support). There are numerous hardware differences that matter, including deferred tile-based rendering which eliminates overdraw (your GeForce card will not perform this) and a unified memory model which eliminates bus transfers to the graphics card.
iOS devices have a virtual memory system but no paging system.
Even threading is not the same.

Both Mac OS X and iOS adopt a more asynchronous approach to the execution of concurrent tasks than is traditionally found in thread-based systems and applications.



The point is that the only hardware that is like an iPhone 3GS is an iPhone 3GS. If you want to develop for one, get one.


L. Spiro


I'm aware that they are quite different, but they are not so different that optimizing for one will harm the other. What I meant by "comparable" is that if I can get one to run at 60 Hz, the other probably will as well, or at least be most of the way there. Does that sound about right? Really, I want to find every excuse possible to stay in my cozy PC development environment :)

#8 Erik Rufelt   Crossbones+   -  Reputation: 3480


Posted 25 June 2012 - 05:01 PM

If you have the available port and get an additional monitor, it works perfectly fine to use two different GPUs simultaneously on Windows 7. Just connect them both, install both drivers, and they will work side by side as a normal multi-monitor environment. I currently have one NVidia and one AMD running.

If you get a very old and a newer card from the same vendor there might be driver collisions... I haven't tried that.

If you want only one monitor with two graphics cards and to switch between them, that should also be possible: just put the DVI cable into one card and the VGA cable into the other, and let Windows think there are two different monitors. Then switch between cards/inputs with the monitors control panel and the monitor's input settings.

EDIT: The monitor/input that you use for OpenGL must be set to the 'primary monitor' in the Windows monitors control panel. So to switch between the cards for OpenGL, switch which one is the primary monitor.

Edited by Erik Rufelt, 25 June 2012 - 05:04 PM.


#9 Madhed   Crossbones+   -  Reputation: 2983


Posted 25 June 2012 - 05:52 PM

What I meant by "comparable" is that if I can get one to run at 60hz, the other probably will as well, or at least be most of the way there. Does that sound about right?


No.

The PowerVR chipset in the Apple devices has a completely different architecture from most PC graphics adapters. You will have to optimise different aspects of your renderer to get the best performance out of it.

#10 mark ds   Members   -  Reputation: 1268


Posted 26 June 2012 - 05:08 AM

Thanks Erik, that's what I wanted to know.

I basically want to be able to test opengl on both video cards without physically swapping them in and out of the system, not run a dual monitor setup.

#11 Krypt0n   Crossbones+   -  Reputation: 2571


Posted 29 June 2012 - 12:30 PM

Yes, it is possible to have AMD and NVIDIA cards installed in the same Windows 7 PC. Though I don't have any video out connected on the second card; I just use it to run OpenCL without stalls on the main window.

Regarding performance, there's less difference than you'd expect, even though the hardware works in a completely different way. That's because:
1. Modern GPUs also reject occluded surfaces. This doesn't work perfectly, as it depends on draw order, but
a) it still rejects a lot of faces,
b) you can do a z-prepass on PC to simulate the perfect culling of iOS devices,
c) even if the PowerVR is more efficient at finding out what needs to be done, doing it is still slow; you'll be memory bound most of the time (at least in my experience).
2. Most optimizations that help on PC also help on iOS, like
a) optimizing draw order, shaders, and state changes,
b) reducing data by using compression, e.g. 16-bit floats for vertex buffers and texture compression to reduce bandwidth,
c) reducing clears and FBO switching, and using optimal formats (e.g. 16-bit for shadows).
3. iOS devices also have different performance levels - 4 generations of phones, 3 of pads, and a dozen iPods - so you will want to keep things scalable, especially as an indie dev who cannot afford testing on (and optimizing for) all of those devices. So if you can get smooth performance on PC by, e.g., dynamically adjusting LOD distances, limiting particle overdraw, and balancing the amount of texture/VBO updates per frame, you will benefit across the line on all other platforms as well.

There are just a few very specific optimizations that are down to the hardware; most are generic.
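Point 2b above (16-bit floats for vertex buffers) can be sketched in a few lines. The snippet below is illustrative Python using the standard library's half-float pack format; it is not tied to any GL API, and the function names are made up:

```python
import struct

def compress_vertices(vertices):
    """Pack float positions into IEEE 754 half floats ('e' format),
    halving vertex-buffer size at the cost of precision and range."""
    return b"".join(struct.pack("<e", c) for v in vertices for c in v)

def decompress_vertices(blob, components=3):
    """Unpack the half-float blob back into tuples, for verification."""
    floats = [struct.unpack_from("<e", blob, i)[0]
              for i in range(0, len(blob), 2)]
    return [tuple(floats[i:i + components])
            for i in range(0, len(floats), components)]

verts = [(1.0, -0.5, 0.25), (100.5, 0.0, -2.0)]
blob = compress_vertices(verts)
assert len(blob) == 2 * 3 * len(verts)  # 2 bytes per component, not 4
```

In a real engine the packed bytes would be uploaded to the VBO directly; the point is only the bandwidth halving.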

#12 L. Spiro   Crossbones+   -  Reputation: 13600


Posted 29 June 2012 - 08:01 PM

There are just a few very specific optimizations that are down to the hardware; most are generic.

While many of these points are true for both platforms, not all are. One thing worth pointing out is that OpenGL drivers for PC are often quite shoddy since vendors are focusing more on DirectX, so it is quite hit-and-miss when saying how much of an impact certain things make. For example, sorting by shader first and textures second could be faster on one PC and slower on another.
Luckily iOS devices are more consistent, and it is always better to sort by textures first, then shaders.
This is true for my PC as well, but on PC I add a third condition when both textures and shaders are the same: depth. This extra condition actually slows down iOS, so it is worth mentioning.
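As a sketch of the sort described above - textures first, shaders second, with depth as an optional third key on PC - a render-queue key might look like this (illustrative Python; all names are hypothetical):

```python
from collections import namedtuple

# Minimal draw-call record (fields hypothetical)
Draw = namedtuple("Draw", "texture shader depth")

def ios_key(d):
    # iOS ordering suggested above: textures first, shaders second
    return (d.texture, d.shader)

def pc_key(d):
    # PC variant with the extra tie-breaker: depth when both match
    return (d.texture, d.shader, d.depth)

def count_changes(draws, attr):
    """Count how many state changes a given submission order causes."""
    changes, last = 0, object()
    for d in draws:
        v = getattr(d, attr)
        if v != last:
            changes, last = changes + 1, v
    return changes

queue = [Draw(2, 1, 5.0), Draw(1, 1, 3.0), Draw(2, 1, 1.0), Draw(1, 2, 2.0)]
sorted_q = sorted(queue, key=ios_key)
# sorting cuts texture binds from 4 to 2 in this tiny queue
assert count_changes(sorted_q, "texture") < count_changes(queue, "texture")
```

Whichever key order wins, the mechanism is the same: a stable composite sort key, so benchmarking just means swapping the key function.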

The biggest difference that pops out to me is in regards to clearing.
Clearing buffers on iOS devices just sets a flag. It doesn’t actually copy memory over the whole buffer etc.
It is instantaneous as long as it does not cause a resolve. You can avoid resolves by calling glDiscardFramebufferEXT() before glClear().

This means that what makes for a long operation on PC is virtually free on iOS.
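A toy model of the claim above - the clear recording a flag, with any per-pixel cost deferred to resolve time - might look like this (illustrative Python only, not a real GL API; the dispute over the actual hardware behavior continues later in the thread):

```python
class TileFramebuffer:
    """Toy model of a tile-based framebuffer where a clear is a flag,
    not a memory write (class and method names are illustrative)."""
    def __init__(self, size):
        self.pixels = [0] * size  # backing memory, touched only at resolve
        self.cleared_to = None    # pending clear: just a recorded value
        self.writes = 0           # count actual memory traffic

    def clear(self, value):
        self.cleared_to = value   # O(1): no per-pixel work yet

    def resolve(self):
        if self.cleared_to is not None:
            for i in range(len(self.pixels)):  # the real cost lands here
                self.pixels[i] = self.cleared_to
                self.writes += 1
            self.cleared_to = None

fb = TileFramebuffer(4)
fb.clear(7)
assert fb.writes == 0  # the clear itself touched no memory
fb.resolve()
```

Discarding the framebuffer before clearing, in this model, corresponds to never triggering the resolve at all.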



One of the most important things to avoid when working on iOS is dynamic/streaming VBOs. It takes only a handful of updates per frame (around 6) to bring your game to a crawl (around 10 FPS). The same number of updates on the same vertex buffers without using a VBO allows around 200 updates per frame before it even drops below 60 FPS. Without GL_DYNAMIC_DRAW VBOs, I can update about 1,000 buffers per frame before it starts to slow to 10 FPS.

This applies even if you are double-buffering the VBO’s that you update, although to a slightly smaller extent. In one of our games, changing from GL_DYNAMIC_DRAW VBO’s to having no VBO’s increased the FPS from 21 to 32. That is a 16.369-millisecond difference, and there were only 4 or 5 VBO’s being updated.

This is not so harsh on the PC side, although PCs still benefit a bit from not using VBOs with GL_DYNAMIC_DRAW or GL_STREAM_DRAW.
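The cost pattern described above can be illustrated with a toy model of driver buffer orphaning: updating storage the GPU has already referenced this frame forces a fresh allocation instead of an in-place write (illustrative Python; all names are hypothetical, not a real driver API):

```python
class MockDriver:
    """Toy model of why mid-frame VBO updates are expensive."""
    def __init__(self):
        self.allocations = 0

    def create_buffer(self, data):
        self.allocations += 1
        return {"data": list(data), "in_flight": False}

    def draw(self, buf):
        buf["in_flight"] = True  # GPU now references this storage

    def update(self, buf, data):
        if buf["in_flight"]:
            # Can't overwrite storage the GPU may still read: orphan it
            # and allocate fresh backing memory for the new contents.
            self.allocations += 1
            buf["in_flight"] = False
        buf["data"] = list(data)

    def end_frame(self, bufs):
        for b in bufs:
            b["in_flight"] = False

drv = MockDriver()
vbo = drv.create_buffer([1, 2, 3])
drv.update(vbo, [4, 5, 6])  # before any draw: cheap, in place
drv.draw(vbo)
drv.update(vbo, [7, 8, 9])  # mid-frame, after a draw: forces a new allocation
```

This is why batching all buffer updates before the first draw call of the frame sidesteps the cost entirely.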



On the one hand you can argue that low-end PC hardware can still give you a rough idea of what to expect on iOS devices due to many optimizations being “a good idea” for both platforms.
On the other hand you might still get hit by something that hits iOS harder than it does PC (dynamic/streaming VBO’s, dependent texture reads) or you might entirely miss opportunities to optimize for iOS specifically (discarding before clearing) simply because your only experience with iOS was a port. You didn’t get into the meat of the system and really learn how it works and how to exploit it for best performance.



There is no substitute for a real device.


L. Spiro

#13 Krypt0n   Crossbones+   -  Reputation: 2571


Posted 30 June 2012 - 11:17 AM


There are just a few very specific optimizations that are down to the hardware; most are generic.

While many of these points are true for both platforms, not all are. One thing worth pointing out is that OpenGL drivers for PC are often quite shoddy since vendors are focusing more on DirectX, so it is quite hit-and-miss when saying how much of an impact certain things make. For example, sorting by shader first and textures second could be faster on one PC and slower on another.
Luckily iOS devices are more consistent, and it is always better to sort by textures first, then shaders.

Actually, on all platforms you should first sort by shader (at least if overdraw is not an issue, e.g. due to a z-pass). It's not related to drivers; it's how the hardware works.
E.g., from a Unity optimization report:

Rendering order of opaque geometry
  • Tegra: big occluders first - front to back, rest by shader
• iOS: sort by shader

http://blogs.unity3d...n_Unite2011.pdf

This is true for my PC as well, but on PC I add a 3rd condition when both textures and shaders are the same: depth. This extra condition actually slows down iOS so it is worth mentioning.

Changing depth order is completely irrelevant on iOS.

The biggest difference that pops out to me is in regards to clearing.
Clearing buffers on iOS devices just sets a flag. It doesn’t actually copy memory over the whole buffer etc.
It is instantaneous as long as it does not cause a resolve.

On PowerVR hardware, a clear is simply a big triangle covering the whole screen. I think you're confusing it with a resolve. While it affects the resolve behavior, it still has some cost.


This means that what makes for a long operation on PC is virtually free on iOS.

Please don't teach people that; it's simply not true. It has the exact same cost as drawing a full-screen triangle on iOS.

On PC it depends on what GPU you have; if you use one that has HiZ, it will just invalidate some areas.


One of the most important things to avoid on your life when working on iOS is dynamic/streaming VBO’s. It takes only a handful of updates per frame (around 6) to halt your game to a crawl (around 10 FPS). The same number of updates on the same vertex buffers without using a VBO allows you around 200 updates per frame before it even drops below 60 FPS. Without GL_DYNAMIC_DRAW VBO’s, I can update about 1,000 buffers per frame before it starts to slow to 10 FPS.

I think the PowerVR whitepaper says otherwise.
You should never update buffers once you start drawing: as the PowerVR GPU is deferred, it keeps all buffers until the end of the frame, so when you lock one, the driver will have to allocate new memory, etc. But you can update all buffers before you issue the first draw call without affecting the GPU/driver.

This applies even if you are double-buffering the VBO’s that you update, although to a slightly smaller extent. In one of our games, changing from GL_DYNAMIC_DRAW VBO’s to having no VBO’s increased the FPS from 21 to 32. That is a 16.369-millisecond difference, and there were only 4 or 5 VBO’s being updated.

It doesn't matter how many times you buffer; if you try to modify a buffer that was already used for drawing this frame, the driver will have to allocate a temporary new buffer, etc.

This performance behavior is true for nearly all mobile GPUs - PowerVR, Adreno, Mali... To my knowledge only Tegra has no deferred rendering and could be fine with updates in the middle of the frame, but even there you could stall until the hardware is done with the last draw call that was using that particular buffer.


On the one hand you can argue that low-end PC hardware can still give you a rough idea of what to expect on iOS devices due to many optimizations being “a good idea” for both platforms.
On the other hand you might still get hit by something that hits iOS harder than it does PC (dynamic/streaming VBO’s, dependent texture reads) or you might entirely miss opportunities to optimize for iOS specifically (discarding before clearing) simply because your only experience with iOS was a port. You didn’t get into the meat of the system and really learn how it works and how to exploit it for best performance.

You can still get a rough idea of how it will run on low-end devices. Optimizing states, shaders, textures, and meshes, improving culling, etc. will help on all platforms.

I suggest getting the Imagination SDK; it has an 'emulator' for OpenGL ES. (If you work on Android, you could also check out the Adreno and Tegra SDKs; they have quite a few nice tools/libs to optimize textures, index buffers, etc. for those platforms.)

#14 L. Spiro   Crossbones+   -  Reputation: 13600


Posted 01 July 2012 - 01:52 AM

Actually on all platform you should first sort by shader (at least if overdraw is not an issue e.g. due to a z-pass). it's not related to drivers, it's how hardware works.

No, you shouldn’t, and no, it isn’t.
http://lspiroengine.com/?p=96
Swapping shaders is no longer such a big deal on DirectX 10 and DirectX 11 due to vertex buffers being forced to precompute an input layout—something that was done on each shader swap on DirectX 9 and still is done in OpenGL.
What happens under the hood is very much dependent on the drivers. Swapping textures and shaders is not the same cost in OpenGL vs. DirectX 9 on the same hardware.
On my machine, texture swaps are more inefficient in OpenGL, so sorting by textures first and shaders second is faster. Neither the hardware nor the scene changed.

You should always do your own benchmarks, as I did, when it comes to different hardware. However, iOS devices are not different enough from each other that the results would ever favor "shader first, textures second" on one device and not on another.


changing depth order is completely irrelevant on iOS.

Which is why adding a sort slows it down. Extra work for nothing. Hence it is worth mentioning. That was my point.


on PowerVR hardware, a clear is simply a big triangle covering the whole screen.

No, it isn’t.
http://www.unrealeng...AA_Graphics.pdf

● Avoid buffer restore
● Clear everything! Color/depth/stencil
● A clear just sets some dirty bits in a register
● Avoid buffer resolve
● Use discard extension (GL_EXT_discard_framebuffer)



please don't teach people that, that's simply not true. it has the exact same cost as drawing a full screen triangle on iOS.

No, it doesn’t.
http://www.unrealeng...AA_Graphics.pdf

● Avoid buffer restore
● Clear everything! Color/depth/stencil
● A clear just sets some dirty bits in a register



You should never update buffers once you start drawing

If you can avoid it, it is best. True on all hardware.


as the PowerVR gpu is deferred, this means, it keeps all buffers until the end of the frame

This is not an entirely accurate description, since flushes can happen mid-frame if the command buffer is full, etc., but it is approximately correct.
Hodgman correctly points out below that this part of the pipeline is the same for all hardware, and that the special extra stages added for deferred tile-based rendering are completely separate. I had a brief absence of mind in my original reply.


It doesn't matter how many times you buffer; if you try to modify a buffer that was already used for drawing this frame, the driver will have to allocate a temporary new buffer, etc.

Again, approximately correct.
But without getting into the details of what else the driver could be doing (flushing etc.), my profiling revealed one function as the most costly: memcpy().
Internally, glBufferSubData() calls memcpy() (regardless of whatever else it is doing such as allocating, flushing, etc.) and this call was, in all of our samples, games, and test cases, the second most time-consuming call after glDrawElements().


you can still roughly get an idea of how it will run on low end devices. optimizing states, shaders, textures, meshes, improving culling etc. will help on all platforms.

That’s not the point. As I said, certain things are just a good idea in general, and you are going to do them regardless of what hardware you are actually using as your testbed.
The point is what you don’t do as a result of not using the real device. You end up just porting your code over and calling it good without investigating what else you can do to make things faster specifically for that hardware. And if you care about performance at all then you must use a real device because that is the only way to gain access to “Instruments” which actually help you find time-consuming functions, redundant OpenGL ES 2 calls, and various other performance problems.


I suggest to get the Imagination SDK, it has an 'emulator' for Opengl ES.

We tried working with it at work for a few months before finally abandoning it, due to the number of differences between it and a real device, plus several blatant bugs.


As I have been saying: there is no substitute for a real device. Not "similar" hardware, not SDK emulators, not even the iOS Simulator.


L. Spiro

Edited by L. Spiro, 01 July 2012 - 06:16 AM.


#15 Hodgman   Moderators   -  Reputation: 30388


Posted 01 July 2012 - 05:25 AM

You guys have already derailed the topic, so...

The biggest difference that pops out to me is in regards to clearing.
Clearing buffers on iOS devices just sets a flag. It doesn’t actually copy memory over the whole buffer etc.
It is instantaneous as long as it does not cause a resolve. You can avoid resolves by calling glDiscardFramebufferEXT() before glClear().
This means that what makes for a long operation on PC is virtually free on iOS.

please don't teach people that, that's simply not true. it has the exact same cost as drawing a full screen triangle on iOS.

on PC it depends on what GPU you have, if you use one that has HiZ, it will just invalidate some areas

@L. Spiro - I would expect PC GPUs to still have a "fast clear" optimisation in hardware. This was present in PC hardware 5 years ago, so it should still be around. It will likely depend on the texture format as to whether it's supported or not (more likely supported on depth and 8-bit channel formats).
@Krypt0n - HiZ can't possibly help when clearing a non-depth texture, and not every depth texture will necessarily be assigned a corresponding hierarchical representation (which yes, should support fast clear) -- there might only be a single HiZ "buffer" which the current depth target can make use of (but isn't permanently assigned to that target).

Actually on all platform you should first sort by shader (at least if overdraw is not an issue e.g. due to a z-pass). it's not related to drivers, it's how hardware works

On several widely popular (read: older) GPUs, a change to shader constants causes the same performance impacts as changing the shader program itself (which may or may not be a bottleneck for your scene, only sensible profiling would tell) -- so sorting by shader program isn't going to do anything on these GPUs if you're also changing any shader constants between draw-calls, as these are causing internal program switches anyway (unless grouping by shader helps you to reduce changes to shader-constants).

You should never update buffers once you start drawing, as the PowerVR gpu is deferred, this means, it keeps all buffers until the end of the frame, so when you lock, it will have to allocate new memory etc.
it doesn't matter how many times you buffer, if you try to modify a buffer that was used already in this frame for drawing, the driver will have to allocate a temporal new buffer etc.
This performance behavior is true for nearly all mobile GPUs, PowerVR, Adreno, Mali... to my knowledge only Tegra has no deferred rendering and it could be fine with updates in the middle of the frame, but even there, you could stall until the HW is done with the last drawcall that was using this particular buffer.

All regular PC GPUs do that (buffer your data/commands for at least one frame) -- this has nothing to do with mobile/deferred -- deferred/PowerVR style buffering is a completely different concept implemented between the primitive rasteriser and the pixel shader.

Edited by Hodgman, 01 July 2012 - 09:16 AM.


#16 L. Spiro   Crossbones+   -  Reputation: 13600


Posted 01 July 2012 - 07:57 AM

On several widely popular GPUs, changing shader constants causes the same performance impact as changing the shader program itself

Not just the GPU hardware; the drivers and API are also factors here.
Because my engine uses a custom shader language, it can fully parse shader files and generate buffers representing all of the shader constants, so when I set a shader constant it can perform automatic redundancy checks.

For small constants such as floats and vectors, this always improves performance.

But I tested larger constants—vec4 arrays and matrices. The results are a bit surprising.

Between DirectX 9, OpenGL, and OpenGL ES 2 for iOS, none shared the same results.

In DirectX 9 it is best not to redundancy-check matrices and larger types. Always calling SetMatrixTranspose() or what-have-you, even if the same matrix is being set, seems to be faster.

In OpenGL for the same hardware, it appears that redundancy checking is faster. Remember that this is the same scene with the same number of redundant matrices occurring every frame. The overall pipeline is identical.

In OpenGL ES 2 for iOS, I was not able to measure any difference, whether checking for redundancies or not. This might ultimately end up true, and it makes sense for the hardware, but I plan to test a few times more in the future under different circumstances.
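The redundancy check described above can be sketched as a small cache sitting in front of the real uniform-setting call (illustrative Python; the names are hypothetical, not a real GL binding, and as the measurements above show, whether the check pays off for large constants varies by API):

```python
class UniformCache:
    """Skip the underlying API call when the same value is already set."""
    def __init__(self, set_fn):
        self.set_fn = set_fn  # the real glUniform*-style call
        self.cache = {}
        self.api_calls = 0

    def set_uniform(self, name, value):
        if self.cache.get(name) == value:
            return  # redundant set: skip the driver entirely
        self.cache[name] = value
        self.api_calls += 1
        self.set_fn(name, value)

calls = []
uc = UniformCache(lambda n, v: calls.append((n, v)))
uc.set_uniform("u_color", (1.0, 0.0, 0.0, 1.0))
uc.set_uniform("u_color", (1.0, 0.0, 0.0, 1.0))  # filtered out
uc.set_uniform("u_color", (0.0, 1.0, 0.0, 1.0))
```

For matrices, the comparison itself has a cost, which is presumably why skipping the check can win on some APIs.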


Your results may vary, especially if you have one of those cards mentioned by Hodgman.
But this again shows that there are too many differences between drivers and APIs for "similar" hardware to have any real meaning.


L. Spiro



