Is this really my problem?
Maybe, maybe not. Only a runtime analysis (profiling) can tell how much time is spent where. However, all of the points I listed contribute to your runtime problem, some more than others.
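If no profiler is at hand, a crude first measurement can be done by timing the suspicious code sections yourself. The following is only a minimal sketch; `measureMicroseconds` is a hypothetical helper name, not something from your code, and a real profiler will give far better data.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in microseconds.
template <typename F>
double measureMicroseconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

Wrap e.g. the animation update and the transform update in separate calls and compare the numbers, ideally averaged over many frames.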
Why? Do you mean I shouldn't generally use them, or just in this case?
Whether you should use them depends on how often the code is invoked and how much time is available. You want to write a real-time application and have to expect several hundred or more invocations. The std containers are not necessarily bad, but they are made for general use and, more or less, for desktop applications.
You mean iterate through the animation data map, find the corresponding SceneNode for every element, and set its local (animation) matrix?
More or less, as long as "find" does not mean a search. As I wrote in a post above, let the animation sub-system manage its own data. If you need access back to the scene nodes, let the sub-system hold a list of pointers to all animated scene nodes. Iterating that list and accessing the associated scene node is then a fast operation.
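A minimal sketch of what I mean, under assumptions: `Matrix4`, `SceneNode` and `AnimationSystem` are made-up names for illustration, and `std::vector` is used here only as a plain contiguous array (any contiguous storage would do).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not taken from your code.
struct Matrix4 { float m[16]; };

struct SceneNode {
    Matrix4 local{};  // local (animation) matrix, written by the animation sub-system
};

class AnimationSystem {
public:
    // Register a node once, e.g. when an animation is attached to it.
    // The caller must keep the node alive while it is registered.
    void add(SceneNode* node) {
        targets_.push_back(node);
        poses_.push_back(Matrix4{});
    }

    // Slot i of poses_ belongs to targets_[i], so no search is needed.
    Matrix4& pose(std::size_t i) { return poses_[i]; }

    // Copy every evaluated pose into its scene node by plain iteration.
    void apply() {
        for (std::size_t i = 0; i < targets_.size(); ++i) {
            targets_[i]->local = poses_[i];
        }
    }

private:
    std::vector<SceneNode*> targets_;  // back-pointers to the animated nodes
    std::vector<Matrix4> poses_;       // animation data owned by the sub-system
};
```

The point is that the pointer is stored once at registration time, so the per-frame work is a linear walk over two arrays instead of one map lookup per node.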
This part has to be done recursively right?
Well, not really. If you iterate the tree top-down, then a parent's world transform is already calculated when you visit its children, so they can rely on its up-to-date state. No recursive calculation is necessary. If you follow the mentioned DoD (data-oriented design) approach for the parent/child relations as well, then order the matrices so that parent world transforms are hit before those of the associated children.
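The ordering idea could look roughly like the sketch below. It is built on assumptions: the names `TransformTable` and `updateWorld` are invented for illustration, matrices are stored row-major, and the world transform is taken as parent world times local. Adapt the multiplication order to your own convention.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 4x4 matrix, row-major storage (assumption).
struct Matrix4 { float m[16]; };

// Plain 4x4 matrix product r = a * b.
Matrix4 multiply(const Matrix4& a, const Matrix4& b) {
    Matrix4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            for (int k = 0; k < 4; ++k)
                r.m[row * 4 + col] += a.m[row * 4 + k] * b.m[k * 4 + col];
    return r;
}

// All three arrays have the same length. Entries are ordered so that
// a parent always has a smaller index than any of its children.
struct TransformTable {
    std::vector<Matrix4> local;
    std::vector<Matrix4> world;
    std::vector<int> parent;  // -1 marks a root
};

// One linear pass, no recursion: when entry i is visited, the world
// transform of its parent has already been written.
void updateWorld(TransformTable& t) {
    for (std::size_t i = 0; i < t.local.size(); ++i) {
        const int p = t.parent[i];
        t.world[i] = (p < 0) ? t.local[i] : multiply(t.world[p], t.local[i]);
    }
}
```

The parent-before-child order has to be established when nodes are inserted or re-parented; the per-frame update then never has to think about the hierarchy again.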