1) Your backbuffer should not actually be MSAA-enabled. This is what's going to be displayed directly on the monitor, and it has no direct way to use the extra sample data. You can resolve an 'offscreen' MSAA surface to it directly via ResolveSubresource(), buuut there happens to be another way...
2) With the advent of D3D10+ you can actually do a custom resolve by way of a shader, which is extra useful because you mention a desire to use HDR rendering. (BTW-- use R11G11B10F as your target format; at 32 bits per pixel it halves write bandwidth compared to a 64-bit RGBA16F target, for little to no image quality loss unless you have insane lighting contrast.) I mention this because while a flat average will work just fine for standard LDR images, doing this pre-tonemapping can have a disastrous effect on the final image quality for much the same reason that naive normal map filtering creates alias city-- you're doing the blending at the wrong time. For an example of what this looks like, grab any early UE3 game (Gears of War or Unreal Tournament 3) and enable AA, then watch in horror as it does basically nothing aside from reduce performance. The short version is that the tonemap operator can result in radically different brightness values for adjacent pixels, and you've already thrown out the extra subsamples in the earlier resolve pass.
The 'proper' way to do things is to grab all your individual MSAA samples using the new MSAA texture feature (Texture2DMS in HLSL; unfortunately you'll need to create special shaders for each supported sample count, but #defines make this fairly easy), tonemap them all individually, *then* average with some multiply-adds. Emil Persson, a really clever ex-ATi demo guy, has a sample app that does exactly this, available from here. It includes source.
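To make the ordering problem concrete, here's a rough CPU-side sketch in C++ (not shader code, and not taken from the sample app mentioned above). It assumes a simple Reinhard-style operator, x / (1 + x), purely for illustration, since no particular tonemap is specified here; the function names are made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumed tonemap operator (simple Reinhard). Any real operator would do;
// this one is just an illustrative stand-in.
float tonemap(float hdr) {
    hdr = std::fmax(hdr, 0.0f);  // guard against negative input
    return hdr / (1.0f + hdr);
}

// Flat average of the HDR samples first, tonemap afterwards.
// This mimics a plain resolve followed by a tonemap pass.
float resolveThenTonemap(const std::vector<float>& samples) {
    assert(!samples.empty());
    float sum = 0.0f;
    for (float s : samples) sum += s;
    return tonemap(sum / static_cast<float>(samples.size()));
}

// Tonemap every sample individually, *then* average.
// This mimics the custom shader resolve described above.
float tonemapThenResolve(const std::vector<float>& samples) {
    assert(!samples.empty());
    float sum = 0.0f;
    for (float s : samples) sum += tonemap(s);
    return sum / static_cast<float>(samples.size());
}
```

Feed it a hypothetical 4x edge pixel where one sample hits a very bright light and three hit dark background, say {100.0f, 0.05f, 0.05f, 0.05f}: resolveThenTonemap comes out around 0.96 (nearly as bright as the light itself, so the edge stays jagged), while tonemapThenResolve comes out around 0.28, which is the roughly quarter-coverage blend you actually wanted.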
The final, ten-thousand-foot view:
1) Render your scene to an MSAA, HDR surface somehow.
2) Bind your swap chain buffer as a render target, and bind the scene texture to a shader resource slot. You can also merge your volumetric light pass result in with the main stuff here if you'd like. Additionally, if you want to do some extra postprocessing, you can just create another non-MSAA surface and render to that instead. The important bit is that it's not multisampled, not that it's the backbuffer. D3D10 is very cool about this, actually; it was much more of a pain in the ass in 9.
3) Draw a fullscreen triangle that reads all the MSAA samples, tonemaps them, then averages. The aforementioned volumetric light merge can be done a few ways. The cheapest, though not necessarily most correct, method would be to tonemap it, tonemap the main scene, then blend those two values somehow. The more correct, but slightly more expensive, approach would be to add the volumetric light value into each individual main scene MSAA sample, then tonemap/average the results. ALUs are pretty cheap nowadays, so if you're a correctness nutter you can probably get away with either one.
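The two volumetric merge options from step 3 could be sketched like this, again as CPU-side C++ rather than real shader code. Assumptions: the volumetric light is a single non-MSAA value per pixel, the tonemap is the same illustrative Reinhard-style x / (1 + x), and the 'blend somehow' in the cheap path is a clamped add, which is just one arbitrary choice. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Assumed tonemap operator (simple Reinhard), for illustration only.
float tonemapSample(float hdr) {
    hdr = std::fmax(hdr, 0.0f);
    return hdr / (1.0f + hdr);
}

// Cheap path: tonemap and average the scene samples, tonemap the volumetric
// value separately, then blend. The clamped add is an assumed blend.
float mergeCheap(const std::vector<float>& sceneSamples, float volumetric) {
    assert(!sceneSamples.empty());
    float sum = 0.0f;
    for (float s : sceneSamples) sum += tonemapSample(s);
    float scene = sum / static_cast<float>(sceneSamples.size());
    return std::min(1.0f, scene + tonemapSample(volumetric));
}

// More correct path: add the volumetric light into every scene sample in HDR
// space, then tonemap each sample and average.
float mergePerSample(const std::vector<float>& sceneSamples, float volumetric) {
    assert(!sceneSamples.empty());
    float sum = 0.0f;
    for (float s : sceneSamples) sum += tonemapSample(s + volumetric);
    return sum / static_cast<float>(sceneSamples.size());
}
```

With scene samples {2.0f, 0.5f, 0.5f, 0.5f} and a volumetric value of 1.0f, the cheap path lands at roughly 0.92 and the per-sample path at roughly 0.64, so the choice does visibly change the result. The per-sample version costs one extra add per sample, which is why it's only 'slightly' more expensive.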
EDIT: Aaand you ninja'd me, though I suppose this answers your new question too