If two scopes have completely different lifetimes, they cannot share their contents without duplication/[font=courier new,courier,monospace]memcpy[/font]ing. Ref-counting is not used for most assets.
You can, however, form a parent/child hierarchy of scopes that share assets up the tree. E.g. if a "level-chunk" scope were parented under a "whole-level" scope and you asked the "level-chunk" scope to load an asset that already existed in its parent, the asset would not be reloaded (it would be fetched from the parent).
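To make the parent lookup concrete, here is a minimal sketch of how such a hierarchy might behave. `AssetScope`, `Asset`, `find`, `load` and `loadCount` are hypothetical names invented for illustration, not the framework's actual API, and the "load" is a stand-in that only records the asset name.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical asset payload; a real engine would hold texture/audio data here.
struct Asset {
    std::string name;
    std::vector<unsigned char> bytes;
};

// Hypothetical scope: owns the assets it loaded itself and can see, but never
// owns, assets held by an ancestor. The parent must outlive the child.
class AssetScope {
public:
    explicit AssetScope(const AssetScope* parent = nullptr) : parent_(parent) {}

    // Returns the asset if this scope or any ancestor already holds it.
    const Asset* find(const std::string& name) const {
        for (const AssetScope* s = this; s != nullptr; s = s->parent_) {
            auto it = s->assets_.find(name);
            if (it != s->assets_.end()) return it->second.get();
        }
        return nullptr;
    }

    // Loads into this scope only when no ancestor can supply the asset.
    const Asset* load(const std::string& name) {
        if (const Asset* existing = find(name)) return existing;
        auto asset = std::make_unique<Asset>();
        asset->name = name;  // stand-in for reading the data from disk
        const Asset* raw = asset.get();
        assets_.emplace(name, std::move(asset));
        ++loadCount_;
        return raw;
    }

    // Number of assets this scope actually had to load itself.
    int loadCount() const { return loadCount_; }

private:
    const AssetScope* parent_;
    std::unordered_map<std::string, std::unique_ptr<Asset>> assets_;
    int loadCount_ = 0;
};
```

Because the child only borrows from the parent, no reference count is needed: the lifetime nesting guarantees the parent's assets are still alive whenever the child uses them.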
The situation I've dealt with in the past is the user 'equipping' a skill or weapon in an RPG, which requires pre-emptively loading all assets that might subsequently be used for that selection. Assets are potentially shared between related scopes in this scenario, and in a heavily memory-bound system like the last generation of consoles, duplicating asset data is something to be avoided. The sharing needs to happen horizontally (between siblings), not vertically (between parent and child). I can't see a safe way to achieve this type of differential loading (load scope B, but reference any shared data from sibling scope A) without some kind of reference-counting mechanism. That is not to say reference counting couldn't be added relatively easily to the asset scope framework you have described, though it may not be suitable if the memory fragmentation associated with such an approach is a problem. It was the right fit for one specific project I worked on: preloading almost all level assets during the character/level select screens cut load times to a fraction of what they would otherwise have been.
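As a rough illustration of the sibling sharing I mean, here is a minimal sketch built on `std::shared_ptr`/`std::weak_ptr`. `SharedAssetCache`, `EquipScope`, `acquire`, `preload` and `unload` are made-up names for this example, not what the project actually used, and the disk load is faked.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct Asset {
    std::string name;
};

// Hypothetical cache shared by all sibling scopes. It keeps weak references
// only, so an asset is freed once the last scope holding it is unloaded.
class SharedAssetCache {
public:
    std::shared_ptr<Asset> acquire(const std::string& name) {
        auto it = live_.find(name);
        if (it != live_.end()) {
            // Still referenced by some sibling scope: share it, no reload.
            if (auto alive = it->second.lock()) return alive;
        }
        auto fresh = std::make_shared<Asset>(Asset{name});  // stand-in for a disk load
        live_[name] = fresh;
        ++loads_;
        return fresh;
    }

    // Number of real loads performed so far.
    int loads() const { return loads_; }

private:
    std::unordered_map<std::string, std::weak_ptr<Asset>> live_;
    int loads_ = 0;
};

// Hypothetical scope for one equipped skill or weapon.
class EquipScope {
public:
    explicit EquipScope(SharedAssetCache& cache) : cache_(cache) {}

    void preload(const std::vector<std::string>& names) {
        for (const auto& n : names) held_.push_back(cache_.acquire(n));
    }

    // Drops this scope's references; assets shared with a sibling survive.
    void unload() { held_.clear(); }

private:
    SharedAssetCache& cache_;
    std::vector<std::shared_ptr<Asset>> held_;
};
```

Loading scope B after scope A only pays for the assets A doesn't already hold. Each asset here is an individual heap allocation, which is exactly where the fragmentation concern comes from.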
What processing operations / data transformations does the entity node provide? In this case, if it's only required to group sub-objects in the editor, then it doesn't need to exist in the game -- the car will emerge simply by loading the actual components that were specified under the car node, without creating a representation of the car node itself.
It provides no data transformations; it exists only as a point of reference, a collection of properties (expressed as interfaces at the entity level) defining a logical entity. It will have a tree of objects below it (chassis, wheels, whatever), but the actual control interfaces (how it translates user inputs, or whether an AI operates it) will optionally be defined by interfaces assigned to that root 'car' node. If it has no interfaces associated with it, it will basically exist as a static object.
Just because your XML is a "scene graph" of nodes, that doesn't mean that your de-serialised XML data has to follow the same structure. You could dump all the car-components into the scene and connect them to each other without having a node retaining links to them.
Yes, though the car could be considered a component in and of itself, having intrinsic properties (or it might not). If not, you might choose instead to express a car as a collection of networked (linked) entities as you describe, though some designers might find this a bit cumbersome (if they want to link another entity to the 'car', can they still link it to the root car node given it has no direct runtime representation?).
Passing the map of factories into the entity only makes sense if the Entity's purpose for existing is "parsing XML files". In that case, before parsing an XML file, I'd construct a map containing all of the factories that your scene XML files could possibly require, and use that map during any deserialisation routine. However, in this case, I'd move this logic out of the Entity class altogether and into an XmlSceneParser instead... and again get rid of the Entity as a misplaced concept, or find its real purpose and make it do that.
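A minimal sketch of what that could look like, with the factory map owned by the parser rather than the Entity. Only the name XmlSceneParser comes from the post; `Component`, `Chassis`, `Wheel`, `registerFactory` and `build` are hypothetical, and actual XML tokenising is left out (the parser is handed element names directly).

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical base type for anything the scene can contain.
struct Component {
    virtual ~Component() = default;
    virtual std::string typeName() const = 0;
};

struct Chassis : Component {
    std::string typeName() const override { return "Chassis"; }
};

struct Wheel : Component {
    std::string typeName() const override { return "Wheel"; }
};

// The parser, not the Entity, owns the map from element name to factory.
class XmlSceneParser {
public:
    using Factory = std::function<std::unique_ptr<Component>()>;

    void registerFactory(const std::string& tag, Factory factory) {
        factories_[tag] = std::move(factory);
    }

    // Builds one component per element name, in document order.
    // Throws if an element has no registered factory.
    std::vector<std::unique_ptr<Component>> build(
            const std::vector<std::string>& tags) const {
        std::vector<std::unique_ptr<Component>> out;
        for (const auto& tag : tags) {
            auto it = factories_.find(tag);
            if (it == factories_.end()) {
                throw std::runtime_error("no factory for <" + tag + ">");
            }
            out.push_back(it->second());
        }
        return out;
    }

private:
    std::unordered_map<std::string, Factory> factories_;
};
```

Registering every factory up front, before any file is parsed, keeps the components unaware of the file format and leaves the scene free to store the results in whatever structure suits the runtime.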
I agree the loading is better done by some loading/parsing framework completely outside of the entity, except where you start drilling down to concrete implementations, for example loading an EngineDynamicModel component, where the data to load is a bit more private in nature.