Most of the time I use reasonably opaque vertex data as Dingleberry suggests. Any time I need to do operations on the mesh, though, I tend to split it up into several streams of data, e.g. positions, UVs, tangents etc. Alternatively, I also have functions that take e.g. a Vec3* to operate on positions along with a stride value that defines the size in bytes between consecutive vertices. So to step to the next vertex you do pPositions = (Vec3*)((char*)pPositions + stride) - note you can't write ((char*)pPositions) += stride directly, since the result of a cast isn't assignable.
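As a small sketch of that kind of strided helper (the Vec3 struct and the scale operation are just illustrative):

```cpp
#include <cassert>
#include <cstddef>

struct Vec3 { float x, y, z; };

// Scale every position in a strided vertex stream. 'stride' is the size in
// bytes between consecutive vertices (i.e. sizeof the whole vertex struct),
// so this works on interleaved data without knowing the full vertex layout.
void ScalePositions(Vec3* pPositions, size_t count, size_t stride, float s)
{
    char* p = (char*)pPositions;
    for (size_t i = 0; i < count; ++i)
    {
        Vec3* pos = (Vec3*)p;
        pos->x *= s; pos->y *= s; pos->z *= s;
        p += stride; // step to the next vertex, not the next Vec3
    }
}
```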
I use handles for pretty much every resource that needs a direct link. I have different ways of managing them, but most are 32-bit and store an index into an array of resources, along with 8 bits or so reserved for a "magic" ID so I can track resource lifetime (i.e. whether the resource is alive or freed). I think this might have been the first thing I read about it years ago: http://scottbilas.com/publications/gem-resmgr/. It's evolved since then, but the premise is the same.
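A sketch of the idea (the 24/8 bit split and the names are illustrative, not any particular engine's layout):

```cpp
#include <cassert>
#include <cstdint>

// 32-bit resource handle: 24 bits of array index plus 8 bits of "magic"
// (a generation counter) so stale handles can be detected after a slot
// has been freed and reused.
struct Handle
{
    uint32_t value;

    static Handle Make(uint32_t index, uint8_t magic)
    {
        return Handle{ (index & 0x00FFFFFFu) | ((uint32_t)magic << 24) };
    }
    uint32_t Index() const { return value & 0x00FFFFFFu; }
    uint8_t  Magic() const { return (uint8_t)(value >> 24); }
};

// Each slot stores its current magic; the magic is bumped whenever the slot
// is freed, so a handle is live only while the two values match.
bool IsLive(Handle h, const uint8_t* slotMagics)
{
    return slotMagics[h.Index()] == h.Magic();
}
```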
I tend to use hashes when serialising if one thing needs to look up another. I've started to use a unique string index that's created at build time along with a dictionary (i.e. list) of all strings used. That can be used for compares plus easy string lookup for tools and debugging, and saves having to keep the hash + string pointer around. However, it requires some additional management that I'm not sure is worth the pay-off yet.
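For the hash side of that, a simple string hash such as FNV-1a is a common choice (this is the standard 32-bit variant, not any specific engine's code):

```cpp
#include <cassert>
#include <cstdint>

// 32-bit FNV-1a string hash - simple and good enough for serialised name
// look-ups: hash the string at build time, compare 32-bit values at runtime.
uint32_t HashFnv1a(const char* str)
{
    uint32_t h = 2166136261u;       // FNV offset basis
    while (*str)
    {
        h ^= (uint8_t)*str++;       // xor in the next byte
        h *= 16777619u;             // multiply by the FNV prime
    }
    return h;
}
```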
If you pass in the debug flag when creating the device, do you get any warning/error messages?
I remember a bug from years ago when running a D3D9 game under the debugger: not closing down correctly would occasionally leave the device in an unhappy state, causing subsequent creation calls to fail. I've not seen anything like that since, though.
There are a few things you can implement to handle large scenes:
1) View frustum culling (as mentioned by vinterberg). With this you avoid rendering anything outside of the view frustum as a simple way to keep draw calls down. You typically do this by having objects bounded by a simple shape (e.g. Axis Aligned Bounding Box or Bounding Sphere), which are then intersected with the camera frustum. To improve this, a hierarchy or grid is often used e.g. you first check if the bounding shape of an entire block of buildings is visible, if so, then check the individual bounding volumes within. Look into grids, nested grids, quadtrees and octrees as structures often used to accelerate this process.
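A minimal sphere-vs-frustum test to illustrate the idea (assumes the six planes are normalised with their normals pointing into the frustum):

```cpp
#include <cassert>

struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; }; // dot(n, p) + d >= 0 for points inside

float Dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// A bounding sphere is outside the frustum if it lies entirely behind any
// one of the six planes. This is conservative: objects that straddle a
// plane are kept and rendered.
bool SphereInFrustum(const Plane planes[6], const Vec3& center, float radius)
{
    for (int i = 0; i < 6; ++i)
        if (Dot(planes[i].n, center) + planes[i].d < -radius)
            return false; // completely behind this plane -> cull
    return true;
}
```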
2) Level-of-Detail (LOD) systems: Instead of drawing a full-detail version of an object at all distances, you prepare simpler versions of the object which are used when it's far enough away from the camera that the extra detail wouldn't really be visible. You can also stop drawing small or thin items at a distance, e.g. if you look all the way down a street, the rubbish bins needn't be rendered after a few hundred metres. For very large scenes, people will sometimes use billboards or impostors (essentially quads that face the camera, very similar to particles) as the lowest level of detail. Generating the LOD models is often a manual process, but it can be automated in some cases.
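The distance-based switching can be as simple as this sketch (the thresholds and the "stop drawing beyond the last LOD" behaviour are illustrative):

```cpp
#include <cassert>
#include <cstddef>

// Pick an LOD index from camera distance. 'thresholds' is an ascending
// list of switch distances; anything beyond the last entry returns -1,
// meaning "don't draw at all" (e.g. small props far down the street).
int SelectLod(float distance, const float* thresholds, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        if (distance < thresholds[i])
            return (int)i;
    return -1; // too far: cull entirely
}
```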
3) Occlusion culling: Being inside a city, you can make use of the fact that buildings often obscure lots of the structures behind them. There are a number of techniques to do this:
i) you can break the level into parts and precompute which parts of a level are visible from each area (search for Precomputed Visibility Sets). This technique is fairly old fashioned as it requires quite a bit of precomputing and assumes a mostly static scene, but it can still be handy today in huge scenarios or on lower spec platforms.
ii) GPU occlusion queries - this is where you render the scene (ideally a much simpler version of it) and then use the GPU to determine which parts you actually need to render. Googling should provide you with lots of info, including the gotchas with this approach.
iii) CPU-based occlusion culling - this can be done by rasterizing a small version of the scene on the CPU into a tiny z-buffer-like array, which is then used to do occlusion queries much like the GPU version. This avoids the latency of the GPU version at the expense of more CPU cost.
iv) A mix of both GPU and CPU approaches where you re-use the previous frame's z-buffer.
There are other methods but I think these are the most common.
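To illustrate the CPU-side idea from iii), here's a toy version that splats occluder rectangles into a coarse depth grid instead of properly rasterizing triangles (a real implementation rasterizes simplified occluder geometry, but the query works the same way):

```cpp
#include <cassert>
#include <algorithm>

// Tiny "software z-buffer" occlusion sketch. Occluders write their depth
// into a coarse grid; a query then checks whether a screen-space rect
// could be visible at a given depth. Smaller depth = closer to the camera.
const int W = 16, H = 16;

void SplatOccluder(float* zbuf, int x0, int y0, int x1, int y1, float depth)
{
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            zbuf[y * W + x] = std::min(zbuf[y * W + x], depth);
}

// The rect is visible if any covered cell holds a depth further away than
// the query depth (i.e. nothing closer fully covers it there).
bool IsRectVisible(const float* zbuf, int x0, int y0, int x1, int y1, float depth)
{
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            if (depth < zbuf[y * W + x])
                return true;
    return false;
}
```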
Just throwing in another idea that's a slight change on the render key approach that I'm trying recently:
In my hobby engine I maintain a list of sorted material instances - they're sorted by state, shader, texture and hashed constant data. They don't need much sorting after being loaded (changing material params or streaming in new materials requires another small re-sort). I sort these much like you would with a render key and maintain a list of indices into the material instances. Additionally, each material instance also knows its sorted index.
When I render, rather than trying to pack lots of info into the render key, I make use of the fact that the material instances are already sorted and simply use the material instance's sorted index + mesh data (vb/ib handles munged together) to then sort the data. I can also pack depth in at this stage since I don't need much space for material index (14 bits currently and that's overkill for my needs).
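A sketch of that packing (I use 14 bits for the material index; the exact bit split is illustrative and up to your needs):

```cpp
#include <cassert>
#include <cstdint>

// Pack a 64-bit draw-sort key: the material's sorted index goes in the top
// bits so state changes dominate the sort order, mesh bits (vb/ib handles
// munged together) come next, then quantised depth in [0,1] fills the rest.
uint64_t MakeSortKey(uint32_t materialIndex, uint32_t meshBits, float depth01)
{
    uint64_t key = 0;
    key |= (uint64_t)(materialIndex & 0x3FFFu) << 50;    // 14 bits: material
    key |= (uint64_t)(meshBits & 0x3FFFFu) << 32;        // 18 bits: mesh
    key |= (uint64_t)(uint32_t)(depth01 * 4294967295.0); // 32 bits: depth
    return key;
}
```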
1) As far as I can tell, the metallic parameter controls how the primary colour is used i.e. whether it goes into diffuse (metallic == 0) or into specular (metallic == 1) or a mix somewhere in between. When metallic == 1, the diffuse colour is black (0); when metallic == 0, the specular colour is assumed to be 0.04 (typical for the majority of non-metals, which tend to lie between 0.02 and 0.05). Perhaps something similar to this in shader terms:
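Sketched here in C-style code rather than actual shader syntax (the 0.04 dielectric constant is from the paragraph above; names are illustrative):

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

Vec3 Lerp(const Vec3& a, const Vec3& b, float t)
{
    return Vec3{ a.x * (1.0f - t) + b.x * t,
                 a.y * (1.0f - t) + b.y * t,
                 a.z * (1.0f - t) + b.z * t };
}

// Metallic workflow: metallic == 0 -> diffuse = baseColour, specular = 0.04;
// metallic == 1 -> diffuse = black, specular = baseColour. Values in
// between blend the two interpretations.
void MetallicToDiffuseSpecular(const Vec3& baseColour, float metallic,
                               Vec3& diffuse, Vec3& specular)
{
    const Vec3 black          = { 0.0f,  0.0f,  0.0f  };
    const Vec3 dielectricSpec = { 0.04f, 0.04f, 0.04f };
    diffuse  = Lerp(baseColour, black, metallic);
    specular = Lerp(dielectricSpec, baseColour, metallic);
}
```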
2) I *think* the cavity map is multiplied by the resulting specular value to help darken it - light from an environment map tends to look a little too bright in small creases without proper occlusion being taken into account, so this is a reasonable workaround. They no longer need a separate specular input (except for special cases) as it's handled by the single colour input and the metallic parameter.
Regarding getting light values for things - I've had some success capturing my own with some of the light metering iOS apps (e.g. LightMeter by whitegoods). I doubt it's super accurate, but it does a good job illustrating how crazily different light values can be.
IIRC in a recent presentation on the new Killzone, one of Guerrilla's devs said that they'd modified the code to match their BRDF when doing the integration, so material roughness is treated uniformly for all light types.
Creating multiple sets of assets at differing resolutions seems to be a fairly popular solution to the problem - I can't remember the game right now, but I know one popular (although fairly old) RTS did that.
The other thing you can try is to break your UI elements up into corner, edge and center pieces that you manipulate in different ways. This is often referred to as scale 9 or 9-slice scaling (see here: http://jessewarden.c...ilverlight.html). It adds some complexity to things but is used in Flash regularly.
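Computing the slice boundaries is straightforward; a sketch for the horizontal axis (rows work the same way vertically, and all names here are illustrative):

```cpp
#include <cassert>

// 9-slice (scale 9): corner pieces keep their size, edges stretch along
// one axis and the center stretches both ways. This computes destination
// x-coordinates of the three columns for a given target width.
struct Slices { float x0, x1, x2, x3; }; // column boundaries, left to right

Slices SliceColumns(float targetWidth, float leftBorder, float rightBorder)
{
    Slices s;
    s.x0 = 0.0f;
    s.x1 = leftBorder;                // left corner column keeps its width
    s.x2 = targetWidth - rightBorder; // center column absorbs the stretch
    s.x3 = targetWidth;               // right corner column keeps its width
    return s;
}
```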
Depending on your UI style, you could create the entire thing from geometry and avoid bitmaps entirely.