I want to be helpful, but it's difficult for me to reason about how the architecture works from a high-level graph like that. For instance, what information do your scenes contain? What constitutes a scene? Is your front menu a scene? How will the scenes be loaded/unloaded? Will you support dynamic, asynchronous loading of level fragments as the player plays through the level? Is each one of those fragments a scene? If not, how do you keep track of which assets are actually needed and which aren't? Some sort of usage counting? These are the sorts of questions that generally give rise to an overall architecture. However, these questions are usually not answerable from a high-level graph of class names alone.
I tend to have more success building up levels of abstraction from the bottom-up. I know how to draw a scene using low-level API calls, but that is tedious and painful. What tools and abstractions would be helpful here? Identify the biggest pain points and fix them first. For instance, there's so much that goes into just loading and initializing a texture with all of the right parameters (size, dimension, mip levels, etc.). You may even need a small constellation/hierarchy of classes to make this less painful while retaining the flexibility you need. And how much flexibility will you need, by the way? It's important to keep your design goals in mind. Are there any extreme optimizations you have in mind in order to reduce texture state changes? Atlasing? Placing multiple same-sized textures into one array? Sparse virtual texturing? You'll want to make sure that your low-level design doesn't preclude these options when you get to a higher level.
I don't know, maybe that's just me. I just find it hard to *start* with the highest-level abstractions. It seems like, once you start to get to the granularity of concrete, single-purpose objects, you may find that the boundaries you drew at the higher levels aren't going to work so well. It looks like you've designed some engines before, though, so maybe you just have some domain knowledge (even if that knowledge is just your usual way of doing things) that I don't have.