You're not really providing enough information for someone to offer much in the way of help:
Your CPU usage will be high when you create the device with software vertex processing, as the CPU will be performing the transforms for you. You don't define what you mean by 'stuck'. Usually 'stuck' means a hang, but I don't think you mean that from your description, so I'm going to take it that you just mean the CPU usage is high (which should be the case).
Conversely, your CPU usage will be low with hardware vertex processing, as your hardware will be performing the transforms for you, leaving your CPU free for other things.
So what you've described is expected behaviour. If you think something else is going on, you should probably show the draw calls you're making.
Taking a guess: when you use software vertex processing, you must make sure the vertices you draw are consecutive in your vertex buffer, i.e. when you call DrawPrimitives the CPU will transform every vertex from the start vertex through the vertex count. So make sure your draw call covers only the vertices for your 100*100 triangles, rather than 100*100 triangles spread through a list of 100,000 or something (this applies even for indexed vertices). That's the only 'gotcha' I can think of given your description; if that's not it, I would suggest posting more details if possible.
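To make the indexed case concrete, here is a rough sketch of how you could work out the tightest vertex range for a draw call. I'm assuming 16-bit indices and a Direct3D 9 style indexed draw that takes a minimum vertex index and a vertex count; `VertexRange` and `tightVertexRange` are names I made up for illustration and are not part of any API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: the span of vertices an indexed draw touches.
struct VertexRange {
    std::uint32_t minIndex;     // lowest vertex index referenced
    std::uint32_t numVertices;  // size of the span, lowest to highest inclusive
};

// Scans indexCount indices starting at startIndex and returns the smallest
// consecutive vertex range that covers all of them. With software vertex
// processing, a range larger than this means extra vertices get transformed.
VertexRange tightVertexRange(const std::vector<std::uint16_t>& indices,
                             std::size_t startIndex,
                             std::size_t indexCount)
{
    assert(startIndex + indexCount <= indices.size());
    if (indexCount == 0) {
        return VertexRange{0, 0};
    }

    auto first = indices.begin() + static_cast<std::ptrdiff_t>(startIndex);
    auto last = first + static_cast<std::ptrdiff_t>(indexCount);
    auto bounds = std::minmax_element(first, last);

    std::uint32_t lo = *bounds.first;
    std::uint32_t hi = *bounds.second;
    return VertexRange{lo, hi - lo + 1};
}
```

If the range this reports is much bigger than the number of vertices your triangles actually use, the indices are scattered across the buffer and the CPU is probably transforming vertices you never draw. Packing the vertices for one draw call next to each other should shrink the range.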
Hopefully that helps some.