Yes, that sounds correct.
States can be anything that outputs a pose without needing an input pose, apart from some settings you can configure for that state (like which motion to use).
So anything that 'generates' an output pose can act as a state. Instead of a predefined motion node, you could also use, say, a procedurally generated motion node that produces a physics-based walking motion.
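A minimal sketch of this idea in Python, assuming a pose is just a list of floats (one value per joint) rather than real joint transforms. The class names `State`, `MotionState` and `ProceduralState` are hypothetical and not taken from EMotion FX or Unity; the point is only that every state shares one `evaluate` method that produces a pose from its own settings, with no input pose.

```python
import math
from abc import ABC, abstractmethod

# Hypothetical simplification: a pose is one float per joint.
Pose = list[float]


class State(ABC):
    """Anything that generates an output pose from its own settings."""

    @abstractmethod
    def evaluate(self, time: float) -> Pose:
        ...


class MotionState(State):
    """Plays back a predefined motion; which motion to use is a setting."""

    def __init__(self, keyframes: list[Pose], fps: float = 30.0):
        self.keyframes = keyframes
        self.fps = fps

    def evaluate(self, time: float) -> Pose:
        # Nearest-lower keyframe, looping (no interpolation between frames).
        index = int(time * self.fps) % len(self.keyframes)
        return list(self.keyframes[index])


class ProceduralState(State):
    """Stand-in for a procedural generator such as physics-based walking."""

    def __init__(self, num_joints: int, frequency: float = 1.0):
        self.num_joints = num_joints
        self.frequency = frequency

    def evaluate(self, time: float) -> Pose:
        # Arbitrary sine wave per joint, phase-shifted by the joint index.
        return [math.sin(2.0 * math.pi * self.frequency * time + j)
                for j in range(self.num_joints)]
```

Because both classes expose the same `evaluate` method, a state machine can hold either one without knowing how the pose was produced.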
These states can be connected with transitions. Transitions can have conditions on them, which define when they should be activated. For example, if you have an Idle state and a Move state, you can create a transition from Idle to Move and put a condition on it that triggers when the character's 'speed' parameter becomes greater than zero.
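The Idle/Move example could be sketched as below. This is a hypothetical structure, not the EMotion FX or Unity API: transitions are checked in order on each update, and the first one leaving the current state whose condition holds is taken. Parameters are assumed to be a plain dictionary of named floats set by the game.

```python
from dataclasses import dataclass
from typing import Callable

Params = dict[str, float]  # hypothetical: named parameters set by the game


@dataclass
class Transition:
    source: str
    target: str
    condition: Callable[[Params], bool]


class StateMachine:
    def __init__(self, initial: str, transitions: list[Transition]):
        self.current = initial
        self.transitions = transitions

    def update(self, params: Params) -> str:
        # Take the first transition out of the current state whose condition holds.
        for t in self.transitions:
            if t.source == self.current and t.condition(params):
                self.current = t.target
                break
        return self.current


sm = StateMachine("Idle", [
    Transition("Idle", "Move", lambda p: p.get("speed", 0.0) > 0.0),
    Transition("Move", "Idle", lambda p: p.get("speed", 0.0) <= 0.0),
])
print(sm.update({"speed": 0.0}))  # prints Idle
print(sm.update({"speed": 0.5}))  # prints Move
```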
So when your game then passes a value of, say, 0.5 for speed, the state machine automatically transitions from Idle to Move. During the transition, the output poses of both the Idle and the Move state are calculated, and the two are blended.
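A sketch of the blend during a transition, assuming a fixed transition duration and plain linear interpolation per value. Real systems blend joint rotations (for example with quaternion slerp or nlerp) rather than lerping raw numbers, so treat this as an illustration only. `TransitionBlend` is a hypothetical name; note that the caller has to evaluate both the source and the target state every frame while the transition is active.

```python
Pose = list[float]  # hypothetical: one float per joint


def lerp_pose(a: Pose, b: Pose, t: float) -> Pose:
    """Linear interpolation between two poses, value by value."""
    return [x + (y - x) * t for x, y in zip(a, b)]


class TransitionBlend:
    """Cross-fades from the source state's pose to the target state's pose."""

    def __init__(self, duration: float):
        self.duration = duration
        self.elapsed = 0.0

    @property
    def finished(self) -> bool:
        return self.elapsed >= self.duration

    def update(self, dt: float, source_pose: Pose, target_pose: Pose) -> Pose:
        # Both poses are passed in: each state is still evaluated every frame.
        self.elapsed = min(self.elapsed + dt, self.duration)
        weight = 1.0 if self.duration <= 0.0 else self.elapsed / self.duration
        return lerp_pose(source_pose, target_pose, weight)
```

Once `finished` is true, the state machine can switch fully to the target state and stop evaluating the source.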
The Move state could be a blend tree. A blend tree has a final node that you connect your other nodes to; this final node represents the output pose of the blend tree.
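A blend tree could be modelled as a small graph of nodes that is evaluated by pulling from the final node. The node types below are hypothetical, and poses are again simplified to lists of floats blended linearly; the final node simply forwards whatever is connected to it, which is what makes it the tree's output.

```python
Pose = list[float]  # hypothetical: one float per joint


class Node:
    def evaluate(self) -> Pose:
        raise NotImplementedError


class MotionNode(Node):
    """Leaf node returning a fixed pose (stand-in for sampling a motion clip)."""

    def __init__(self, pose: Pose):
        self.pose = pose

    def evaluate(self) -> Pose:
        return list(self.pose)


class BlendNode(Node):
    """Linearly blends two input nodes; weight 0 gives a, weight 1 gives b."""

    def __init__(self, a: Node, b: Node, weight: float):
        self.a, self.b, self.weight = a, b, weight

    def evaluate(self) -> Pose:
        pa, pb = self.a.evaluate(), self.b.evaluate()
        return [x + (y - x) * self.weight for x, y in zip(pa, pb)]


class FinalNode(Node):
    """Whatever is connected here becomes the output pose of the tree."""

    def __init__(self, source: Node):
        self.source = source

    def evaluate(self) -> Pose:
        return self.source.evaluate()


walk = MotionNode([0.0, 0.0])
run = MotionNode([1.0, 1.0])
tree_output = FinalNode(BlendNode(walk, run, weight=0.25))
print(tree_output.evaluate())  # [0.25, 0.25]
```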
A state can also be another state machine, in which case the output of that state is the output of the state machine it represents.
This way you can build hierarchies. I think Unity 5 will also support that. I find it amazing that they didn't/don't support hierarchies yet, as without them the system becomes almost useless and large setups are nearly impossible to manage.
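To illustrate the hierarchy, here is a sketch in which a state machine is itself a state, so it can be nested inside another state machine. The names, the parameter dictionary and the transition tuples are hypothetical, and transitions switch instantly here (no blending) to keep the example short.

```python
from typing import Callable

Pose = list[float]  # hypothetical: one float per joint
Params = dict[str, float]


class State:
    def evaluate(self, params: Params) -> Pose:
        raise NotImplementedError


class MotionState(State):
    def __init__(self, pose: Pose):
        self.pose = pose

    def evaluate(self, params: Params) -> Pose:
        return list(self.pose)


class StateMachine(State):
    """A state machine is a State whose output is its active child's output."""

    def __init__(self, states: dict[str, State], initial: str,
                 transitions: list[tuple[str, str, Callable[[Params], bool]]]):
        self.states = states
        self.current = initial
        self.transitions = transitions

    def evaluate(self, params: Params) -> Pose:
        for source, target, condition in self.transitions:
            if source == self.current and condition(params):
                self.current = target
                break
        # Delegate to the active child, which may itself be a state machine.
        return self.states[self.current].evaluate(params)


locomotion = StateMachine(
    {"Idle": MotionState([0.0]), "Move": MotionState([1.0])}, "Idle",
    [("Idle", "Move", lambda p: p["speed"] > 0.0)])
root = StateMachine(
    {"Locomotion": locomotion, "Jump": MotionState([5.0])}, "Locomotion",
    [("Locomotion", "Jump", lambda p: p["jump"] > 0.0)])

print(root.evaluate({"speed": 0.5, "jump": 0.0}))  # [1.0]
print(root.evaluate({"speed": 0.5, "jump": 1.0}))  # [5.0]
```

Each level only knows about its own children, which is what keeps a large graph manageable compared to one flat state machine.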
I have seen clients of our animation middleware, EMotion FX, with graphs of over 1500 nodes. Imagine having to place all of that in one state machine: it would end up as one big spaghetti of transitions.
Btw, to answer your question about whether you are over-complicating the use of blend trees: you seem to pick between different motions quite frequently.
In theory you wouldn't need a blend tree for that, but in practice a blend tree can do that as well.
You could either make that a state machine that picks the right motion or pose based on, say, what kind of stance you are in, or you could make a blend tree with a blend node that takes multiple input motions, where your weight represents which motion to pick. The blend node can then automatically blend between them. Alternatively, you could make a node that picks a given motion based on an input value, without blending.
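The two blend-tree options for picking a motion could look roughly like this. Both functions are hypothetical sketches: `blend_n` maps a weight in the range 0 to N-1 onto a non-empty list of motions and blends between the two nearest ones, while `pick` rounds the input value and returns a single motion without blending. Poses are once more simplified to lists of floats.

```python
import math

Pose = list[float]  # hypothetical: one float per joint


def lerp_pose(a: Pose, b: Pose, t: float) -> Pose:
    return [x + (y - x) * t for x, y in zip(a, b)]


def blend_n(poses: list[Pose], weight: float) -> Pose:
    """Integer weights select a motion exactly; fractional weights blend
    between the two neighbouring motions. Assumes poses is non-empty."""
    weight = max(0.0, min(weight, len(poses) - 1))
    lower = int(math.floor(weight))
    upper = min(lower + 1, len(poses) - 1)
    return lerp_pose(poses[lower], poses[upper], weight - lower)


def pick(poses: list[Pose], value: float) -> Pose:
    """Select one motion by rounding the input value half-up; no blending."""
    index = int(math.floor(value + 0.5))
    index = max(0, min(index, len(poses) - 1))
    return list(poses[index])
```

With stances ordered, for example, standing, crouching, prone, a weight of 1.0 gives exactly the second stance and 1.5 gives a half-way blend, whereas `pick` would snap 1.5 to the third.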
For your jump types you would most likely want a state machine rather than a blend tree, as you are unlikely to blend between different types of jumps.