Note that the first parameter of CreateSwapChain expects the command queue, not the device (even though the parameter name suggests the latter). Also note that you should pass DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL in the SwapEffect field of your swap-chain description (and therefore use at least 2 buffers). You should also definitely specify the SampleDesc (e.g. Count = 1, Quality = 0).
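As a minimal sketch (assuming a typical D3D12/DXGI setup; the factory, commandQueue and hwnd variables are placeholders from your own initialization code), the description could look like this:

```cpp
// Hypothetical setup: 'factory', 'commandQueue' and 'hwnd' come from your
// own initialization code.
DXGI_SWAP_CHAIN_DESC desc = {};
desc.BufferDesc.Width   = 1280;
desc.BufferDesc.Height  = 720;
desc.BufferDesc.Format  = DXGI_FORMAT_R8G8B8A8_UNORM;
desc.SampleDesc.Count   = 1;   // flip-model swap chains don't support MSAA
desc.SampleDesc.Quality = 0;
desc.BufferUsage        = DXGI_USAGE_RENDER_TARGET_OUTPUT;
desc.BufferCount        = 2;   // FLIP_SEQUENTIAL requires at least 2 buffers
desc.SwapEffect         = DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL;
desc.OutputWindow       = hwnd;
desc.Windowed           = TRUE;

IDXGISwapChain* swapChain = nullptr;
// Note: the command queue goes where the 'device' parameter appears to be.
HRESULT hr = factory->CreateSwapChain(commandQueue, &desc, &swapChain);
```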
Also try to enable the debug layer, so validation errors are logged directly to the Visual Studio output window:
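For example (a minimal D3D12 sketch; call this before creating the device):

```cpp
#include <d3d12.h>
#include <wrl/client.h>

// Enable the D3D12 debug layer; must happen before device creation.
void EnableDebugLayer()
{
    Microsoft::WRL::ComPtr<ID3D12Debug> debug;
    if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debug))))
        debug->EnableDebugLayer(); // messages then show up in the VS output window
}
```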
I'm currently working on a DirectX 11 port of an old DX9 renderer and facing the problem that DX11 seems really slow in comparison to the old code. I've already checked all the 'best practice' slides available around the internet (avoiding resource creation at runtime, update frequency for constant buffers, etc.), but none of these seem to be the real problem. Other engines I checked are much more careless in most of these cases, yet don't seem to have similar problems.
Profiling shows that the code is highly CPU-bound, since the GPU seems to be starving. GPUView confirms this: the CPU queue is empty most of the time, with only the occasional packet pushed onto it. The weird thing is that the main thread isn't stalling; it is active nearly the whole time. VTune shows that most of the samples are taken inside DirectX API calls, which take far too much time (the main bottlenecks seem to be DrawIndexed/DrawIndexedInstanced, Map and IASetVertexBuffers).
The next thing I thought about was sync points. The only source I can imagine is the update of the constant buffers, of which there are quite a few per frame. What I'm essentially doing is caching the shader constants in a CPU-side buffer and pushing the whole memory chunk into my constant buffers at once. The buffers are all dynamic and are mapped with 'discard'. I also tried creating 'default' buffers and updating them with UpdateSubresource, as well as a mix of both ('per frame' buffers dynamic, the rest default), but this resulted in roughly equal performance.
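For reference, a minimal sketch of the update pattern described above (D3D11; cpuCache and cacheSize are placeholder names for the cached constant data):

```cpp
// Hypothetical names: 'cpuCache'/'cacheSize' stand for the CPU-side cache
// of shader constants mentioned above.
D3D11_MAPPED_SUBRESOURCE mapped = {};
if (SUCCEEDED(context->Map(constantBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
{
    memcpy(mapped.pData, cpuCache, cacheSize); // push the whole cached chunk at once
    context->Unmap(constantBuffer, 0);
}
context->VSSetConstantBuffers(0, 1, &constantBuffer);
```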
The weird thing is that the old DX9 renderer produces much better results with the same render code. Maybe somebody has experienced similar behaviour and can give me a hint.
Splitting vertex data into multiple streams can be very handy if you don't always want to provide the full set of vertex attributes. Say you want to do a z-prepass: you don't need attributes like normal/tangent, color, etc. The proposed solution, however, can easily result in a large number of simultaneously bound vertex streams, which can badly hurt IA (input assembler) fetch performance.
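To make the trade-off concrete, a minimal two-stream split might look like this (D3D11; the semantic names, formats and buffer variables are assumptions): stream 0 carries positions only, stream 1 the remaining attributes, so the z-prepass can bind just stream 0.

```cpp
// Hypothetical two-stream layout: slot 0 = position, slot 1 = normal + UV.
D3D11_INPUT_ELEMENT_DESC layout[] =
{
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0,  0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "NORMAL",   0, DXGI_FORMAT_R32G32B32_FLOAT, 1,  0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT,    1, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};

// Z-prepass: bind only the position stream (paired with a position-only
// input layout for the prepass shader).
UINT stride0 = 12, offset0 = 0;
context->IASetVertexBuffers(0, 1, &positionVB, &stride0, &offset0);

// Main pass: bind both streams.
ID3D11Buffer* streams[2] = { positionVB, attributeVB };
UINT strides[2] = { 12, 20 };
UINT offsets[2] = { 0, 0 };
context->IASetVertexBuffers(0, 2, streams, strides, offsets);
```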