I am going to answer this as accurately as I can.
1) Does a D3D API call return right away, or only after it has completely executed? For example drawXXXX, setXXXBuffer.
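It returns immediately: the call is queued and the GPU executes it later. The exceptions are calls that need a result back from the GPU, such as the asynchronous (query) interfaces or reading a resource back.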
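To picture what "returns immediately" means, here is a toy C++ model of a command queue. It is only a sketch of the idea, not the D3D API and not how any particular driver works: ToyContext, Draw and ReadBackExecutedCount are made-up names for illustration.

```cpp
// Toy model only: this is NOT the D3D11 API, just a picture of deferred execution.
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a device context that records work instead of running it.
class ToyContext {
public:
    // "Draw" only records a command and returns at once; nothing runs yet.
    void Draw(int vertexCount) {
        pending_.push_back([this, vertexCount] {
            executed_.push_back("draw " + std::to_string(vertexCount));
        });
    }

    // Reading a result back needs every earlier command to have finished,
    // so this call has to wait (here: run the queue) before it can return.
    std::size_t ReadBackExecutedCount() {
        Flush();
        return executed_.size();
    }

    std::size_t PendingCount() const { return pending_.size(); }
    std::size_t ExecutedCount() const { return executed_.size(); }

private:
    // Run every queued command in submission order, then empty the queue.
    void Flush() {
        for (auto& cmd : pending_) cmd();
        pending_.clear();
    }

    std::vector<std::function<void()>> pending_;
    std::vector<std::string> executed_;
};

// Small walk-through; returns true if the model behaved as described.
bool DemoDeferredExecution() {
    ToyContext ctx;
    ctx.Draw(3);
    ctx.Draw(6);
    bool nothingRanYet = ctx.PendingCount() == 2 && ctx.ExecutedCount() == 0;
    bool readBackWaited = ctx.ReadBackExecutedCount() == 2 && ctx.PendingCount() == 0;
    return nothingRanYet && readBackWaited;
}
```

The two Draw calls cost almost nothing on the calling side. Only the read-back has to wait for the queued work, which is why read-backs are the calls to watch for stalls.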
2) Does the UpdateSubresource method modify the RAM on the graphics card directly, or a temporary buffer on the CPU side?
That is up to the driver, but AFAIK the driver generally keeps the data in system RAM until an appropriate time to upload it to VRAM.
3) If the buffers live in the RAM of the graphics card, does setXXXBuffer(buffer) just tell the GPU to use that buffer in graphics RAM?
AFAIK, from D3D10 onward the data can be moved out to system RAM and sent back when needed, because of VRAM constraints.
But yes, the data generally has to be in VRAM at the point the GPU uses it.
4) What is the difference between context->Map/Unmap and context->UpdateSubresource? Will context->Map cause a lock?
I don't recall the difference offhand; it would be best to read the API docs on what each of them does.
5) If I have n cbuffers, it sounds like context->SetConstantBuffer(0, n, &buffers) will be much faster than calling SetConstantBuffer one by one,
but some people say the per-frame cbuffer must be set once per frame and the per-object cbuffer will be set many times per frame. Is it necessary
for me to set the per-frame cbuffer together with the per-object cbuffer to gain the "one commit is faster than multiple commits" benefit, or should I set them independently?
The fewer commands sent to the driver/GPU, the better.
A buffer stays bound to the D3D pipeline until something else is bound in its place, but you can update its contents without having to rebind it.
A per-object cbuffer HAS to be updated for each object, hence its name, as it is useless to render object N+1 with N's data.
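As a rough illustration of "bind once, update contents per object", here is a small C++ toy model. It does not use the real D3D interfaces: ToyBuffer, ToyPipeline, FrameStats and RenderFrame are hypothetical names, and the slot layout (slot 0 per-frame, slot 1 per-object) is just an assumption for the sketch.

```cpp
// Toy model only: these types are hypothetical and merely mimic how binding slots behave.
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyBuffer { float value = 0.0f; };    // stands in for a constant buffer

struct FrameStats {
    int bindCalls = 0;                        // how many "set" commands were issued
    std::vector<float> drawn;                 // per-object value seen by each draw
};

struct ToyPipeline {
    std::array<const ToyBuffer*, 4> slots{};  // binding slots, initially empty
    FrameStats stats;

    // Bind `count` buffers starting at slot `start` with a single command.
    void SetConstantBuffers(std::size_t start, std::size_t count,
                            ToyBuffer* const* buffers) {
        for (std::size_t i = 0; i < count; ++i) slots[start + i] = buffers[i];
        ++stats.bindCalls;
    }

    // A draw reads whatever slot 1 (per-object) holds at that moment.
    void Draw() { stats.drawn.push_back(slots[1]->value); }
};

// Render `objectCount` objects: bind both buffers once, then only update contents.
FrameStats RenderFrame(int objectCount) {
    ToyPipeline pipe;
    ToyBuffer perFrame, perObject;
    ToyBuffer* both[] = { &perFrame, &perObject };
    pipe.SetConstantBuffers(0, 2, both);      // one bind command for the whole frame

    perFrame.value = 1.0f;                    // contents updated once per frame
    for (int i = 0; i < objectCount; ++i) {
        perObject.value = static_cast<float>(i);  // new contents, same binding
        pipe.Draw();
    }
    return pipe.stats;
}
```

In this model RenderFrame(3) issues a single bind command, yet every draw still sees its own per-object value, because only the buffer contents change between draws. Whether this layout actually wins in a real renderer is something to measure rather than assume.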
Also: Profile, profile, profile.
Beyond general API usage guidelines, not everything will affect the performance of your game/app in the same way.