what about merge and cull at the sub mesh level simultaneously? i use something like this to generate terrain chunks on the fly from underlying map data structures. A complex chunk can contain 5,000 meshes. items that pass cull get merged. then the results of the cull-merge get drawn. i also use an approach similar to Hodgman's "radial slicing" of the scene to cull entire chunks.
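a rough sketch of what a combined cull-merge pass over one chunk might look like. everything here is an assumption for illustration, not my actual code: the `SubMesh` and `MergedBatch` structs, the bounding-sphere test, and the plane convention (a point is on the visible side when `dot(n, p) + d >= 0`) are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical plane convention: dot(n, p) + d >= 0 means "inside / visible side".
struct Plane { Vec3 n; float d; };

// Hypothetical sub-mesh: a small triangle list plus a bounding sphere.
struct SubMesh {
    std::vector<Vec3> vertices;
    std::vector<std::uint32_t> indices;  // indices local to this sub-mesh
    Vec3 center;                         // bounding sphere center
    float radius;                        // bounding sphere radius
};

// The single mesh that actually gets drawn after cull + merge.
struct MergedBatch {
    std::vector<Vec3> vertices;
    std::vector<std::uint32_t> indices;
};

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// A sphere is culled only if it lies entirely behind at least one plane.
bool sphereVisible(const SubMesh& m, const std::vector<Plane>& frustum) {
    for (const Plane& p : frustum) {
        if (dot(p.n, m.center) + p.d < -m.radius) return false;
    }
    return true;
}

// Cull each sub-mesh, and append the survivors into one vertex/index buffer.
MergedBatch cullAndMerge(const std::vector<SubMesh>& chunk,
                         const std::vector<Plane>& frustum) {
    MergedBatch out;
    for (const SubMesh& m : chunk) {
        if (!sphereVisible(m, frustum)) continue;  // failed cull: skip entirely

        // Indices must be shifted by the number of vertices already merged.
        const std::uint32_t base = static_cast<std::uint32_t>(out.vertices.size());
        out.vertices.insert(out.vertices.end(), m.vertices.begin(), m.vertices.end());
        for (std::uint32_t i : m.indices) {
            assert(i < m.vertices.size());  // sanity check on the source data
            out.indices.push_back(base + i);
        }
    }
    return out;
}
```

the result would then be uploaded and drawn with one call per merged batch instead of one call per sub-mesh. in practice you'd probably keep one batch per texture or material, since only meshes that share state can be merged.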
in general i've found for questions like this, a good approach is:
1. try brute force. the Abrash and id Software way. if that doesn't cut it....
2. reorganize your data into the format the pipeline likes best - then draw. this would include culls, merges, etc.
sometimes a middle ground between 1 and 2 is good enough to get the job done.
BTW, lots of draw calls is something one can live with. a complex scene in my current title can tip the scales at over 18,000 calls per render - and that's fixed function with no shaders and no instancing. state changes almost seem to be a bigger performance issue than draw calls. but both are still issues.
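one common way to cut down on state changes is to sort the draw list by state before drawing. a minimal sketch, assuming a hypothetical `DrawItem` with integer texture and material ids (these names are made up for the example, and which state is most expensive to change depends on your API and hardware):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical draw list entry: which texture, material, and mesh to draw.
struct DrawItem { int texture; int material; int mesh; };

// Order by texture first, then material. Put whichever state is most
// expensive to change in your renderer first in the key.
bool stateLess(const DrawItem& a, const DrawItem& b) {
    return std::tie(a.texture, a.material) < std::tie(b.texture, b.material);
}

// Counts how many texture/material binds this draw order would need,
// assuming a bind is issued only when the value differs from the previous item.
std::size_t countStateChanges(const std::vector<DrawItem>& items) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].texture != items[i - 1].texture) ++changes;
        if (i == 0 || items[i].material != items[i - 1].material) ++changes;
    }
    return changes;
}

// Stable sort keeps the original relative order of items with equal state.
void sortByState(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(), stateLess);
    assert(std::is_sorted(items.begin(), items.end(), stateLess));
}
```

calling `countStateChanges` before and after `sortByState` gives a quick estimate of whether sorting is worth it for a given scene. sorting minimizes changes of the first key only; changes of the second key can go up, so measure rather than assume.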
Also, sometimes you'll find that cull helps render times, and sometimes it doesn't help that much. it all depends on what you're drawing. so start with brute force, then try some culling, then get jiggy with it and merge, etc.