Regarding scene management and related stuff...
You're mixing two different things together. The first is data management, i.e. deciding which data you need in memory and which you don't. Textures, meshes, etc. normally have to be in VRAM so you can render them; sometimes you don't use VRAM at all (for example with a software renderer), and sometimes you need the data in both VRAM and RAM (rendering on the GPU while doing some other magic on the CPU). So let's generally refer to this as "data in memory". The second is culling in general, i.e. deciding which objects to draw and which to skip.
Let's start with culling - because that one is simple and well defined.
Let's have a scene consisting of objects. I'll discuss just static scenes (i.e. scenes containing static objects only), but it's easy to extend this to dynamic objects too. There are a bunch of known algorithms to determine which objects are visible and must be rendered. We basically use two types of culling: frustum culling (detecting whether an object intersects the camera view volume) and occlusion culling (detecting whether the object is hidden by objects standing in front of it).
To make scene management easier we store the objects in some kind of hierarchy. The most common are either spatial hierarchies (octrees, kd-trees, grids, nested grids, ...) or object hierarchies (bounding volume hierarchies, bounding interval hierarchies, ...). Both have advantages and disadvantages, but using any hierarchy is basically always better than brute forcing. In the worst case you can also hand-place portals into your map, but it's bad - artists need to do this work and they will do it wrong, really.
Frustum culling is a simple test of whether an object's bounding volume intersects or is contained in the camera view volume (i.e. the frustum). This is where the hierarchy comes in: if a hierarchy node neither intersects the frustum nor is contained in it, none of its children are visible to the camera. Otherwise we go on and check the current node's children in the hierarchy.
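To make the hierarchy walk concrete, here is a minimal sketch in Python. Everything in it is my own assumption for illustration: the names (AABB, Node, outside_plane, collect_visible), the plane convention (nx, ny, nz, d) with normals pointing into the view volume, and the choice of axis-aligned boxes as bounding volumes. The test is conservative, so a box that merely overlaps a plane counts as visible.

```python
from dataclasses import dataclass, field


@dataclass
class AABB:
    lo: tuple  # (x, y, z) minimum corner
    hi: tuple  # (x, y, z) maximum corner


@dataclass
class Node:
    bounds: AABB
    objects: list = field(default_factory=list)   # objects stored in this node
    children: list = field(default_factory=list)  # child nodes


def outside_plane(box, plane):
    """True if the whole box lies on the negative (outer) side of the plane.

    A plane is (nx, ny, nz, d); points with n.p + d >= 0 count as inside.
    """
    nx, ny, nz, d = plane
    # Take the box corner that lies furthest along the plane normal.
    px = box.hi[0] if nx >= 0 else box.lo[0]
    py = box.hi[1] if ny >= 0 else box.lo[1]
    pz = box.hi[2] if nz >= 0 else box.lo[2]
    return nx * px + ny * py + nz * pz + d < 0


def collect_visible(node, planes, out):
    """Append the objects of every node that is not fully outside the frustum."""
    if any(outside_plane(node.bounds, p) for p in planes):
        return out  # the node and all of its children are skipped
    # Objects in a surviving node are accepted without a per-object test here.
    out.extend(node.objects)
    for child in node.children:
        collect_visible(child, planes, out)
    return out
```

A real frustum has six planes; the function does not care how many you pass in, so a toy "frustum" of two planes is enough to try it out.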
Occlusion culling is hellishly complicated. It also isn't worth it in a lot of cases (especially if you do very good Level-of-Detail). One of the more recent techniques, useful for dynamic and large outdoor scenes, is Hierarchical Z-buffer occlusion culling, because you don't need to precompute any data for it; everything can be computed on the fly. For static areas you probably want a Potentially Visible Set, or PVS (it's used in games with BSP maps): for each node in the map you precompute the set of other nodes visible from it. This computation can be done numerically using ray casting, and then it isn't 100% precise (for very small nodes next to very large nodes the results might be incorrect, so you have to use more rays = more samples, and the PVS computation takes longer), but it clearly works in practice (for example, Half-Life 2 uses a PVS).
Basically you want to put this into a nice Scene class and just call a GetListOfVisibleObjects method with a camera parameter (or the equivalent in other programming paradigms).
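To illustrate the sampled PVS idea, here is a toy 2D sketch in Python. It is only an assumption-laden illustration, not how any particular engine or map compiler does it: the "nodes" are unit cells of a grid, walls are whole blocked cells, and the names (ray_is_clear, sample_points, build_pvs) and the fixed-step ray march are all made up for this example.

```python
import itertools


def ray_is_clear(a, b, walls, steps=64):
    """March from point a to point b; False if any sample lands in a wall cell."""
    for i in range(steps + 1):
        t = i / steps
        x = a[0] + (b[0] - a[0]) * t
        y = a[1] + (b[1] - a[1]) * t
        if (int(x), int(y)) in walls:  # coordinates are non-negative here
            return False
    return True


def sample_points(cell, n):
    """n x n sample points spread evenly inside a unit cell."""
    cx, cy = cell
    offsets = [(i + 0.5) / n for i in range(n)]
    return [(cx + ox, cy + oy) for ox in offsets for oy in offsets]


def build_pvs(width, height, walls, samples_per_axis=2):
    """For every open cell, the set of cells reached by at least one clear ray."""
    open_cells = [(x, y) for x in range(width) for y in range(height)
                  if (x, y) not in walls]
    pvs = {cell: {cell} for cell in open_cells}
    for a, b in itertools.combinations(open_cells, 2):
        if any(ray_is_clear(p, q, walls)
               for p in sample_points(a, samples_per_axis)
               for q in sample_points(b, samples_per_axis)):
            pvs[a].add(b)
            pvs[b].add(a)
    return pvs
```

Raising samples_per_axis is the "more rays = more samples" knob: fewer visible pairs get missed, and the precomputation takes longer. At run time you only look up pvs[camera_cell].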
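A possible shape for such a class, sketched in Python. The names (SceneObject, Camera, Scene.add) and the bounding-sphere test are assumptions of mine; the method is brute force to keep the sketch short, and in practice it would walk whatever hierarchy you picked and apply occlusion culling as well.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    center: tuple   # (x, y, z) center of the bounding sphere
    radius: float   # radius of the bounding sphere


@dataclass
class Camera:
    # Each plane is (nx, ny, nz, d) with a unit normal pointing into the view volume.
    planes: list


class Scene:
    def __init__(self):
        self._objects = []

    def add(self, obj):
        self._objects.append(obj)

    def get_list_of_visible_objects(self, camera):
        """Return the objects whose bounding sphere touches the view volume."""
        return [o for o in self._objects if self._sphere_in_frustum(o, camera)]

    @staticmethod
    def _sphere_in_frustum(obj, camera):
        x, y, z = obj.center
        for nx, ny, nz, d in camera.planes:
            # Signed distance of the center from the plane; fully outside
            # means the center is more than one radius behind the plane.
            if nx * x + ny * y + nz * z + d < -obj.radius:
                return False
        return True
```

The renderer then only ever sees the returned list, so you can swap the culling strategy behind this one method without touching the drawing code.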
I mentioned Level-of-Detail, and we can slowly move towards data management, because this one sits between data management and rendering optimizations. Basically you have an object, let's imagine a stone, consisting of 10 000 triangles (i.e. it's a good stone). You also want versions with 1 000 and 100 triangles, so you create them in your favourite modelling application (or, if you're too much of a programmer, you generate the lower-detail versions). Now let's say we have this object in the middle of our world. If we're on the edge of the world, we don't need to render the 10 000 triangle version, we just render the 100 triangle version (we don't even have the higher-detail versions in memory at all). As we get closer to the stone, we load a better-detailed version and throw away the low-detail one ... and so on (if we get further away we just need a lower-detail version, if we get closer we just need a better or the highest-detail version).
This saves us both memory and computing power. Note that in reality we will most likely need to hold the currently needed level of detail and the lower levels of detail as well, because the object (like the stone) will most likely be placed at more locations in the world, and at some of them the lower-detail version will be the visible one.
With just level of detail you can achieve pretty large worlds. Basically any better data management of the scene is a sort of level-of-detail technique (where, for example, at some point you throw some data out of memory and don't render the object at all).
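A small Python sketch of picking the level of detail by distance, and of working out which levels have to stay in memory when the same stone is placed in several spots. The switch distances (50 m and 500 m) and the function names are hypothetical; only the triangle counts come from the example above.

```python
import bisect

# (maximum distance in meters, triangle count), most detailed first.
# The distances are made-up values for illustration.
STONE_LODS = [(50.0, 10_000), (500.0, 1_000), (float("inf"), 100)]


def select_lod(distance, lods=STONE_LODS):
    """Index of the first level whose distance limit covers `distance`."""
    limits = [limit for limit, _ in lods]
    return bisect.bisect_left(limits, distance)


def lods_to_keep_resident(instance_distances, lods=STONE_LODS):
    """Every level that some instance currently needs must stay in memory."""
    return sorted({select_lod(d, lods) for d in instance_distances})
```

With one stone 10 m away and another 7 km away, levels 0 and 2 stay loaded while level 1 can be thrown away until some instance needs it again.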
Now let's jump ahead to full data management. Imagine a HUGE world like, for example, Skyrim has. Let's make our world 10x10 kilometers. First of all, 10x10 kilometers with 1 height map pixel per meter means 100 000 000 pixels of data in the height map alone. That's approx. 381.5 MB (with a 32-bit floating point value per height map pixel, and counting 1 MB as 2^20 bytes).
Even if we can fit 381.5 MB of data in our VRAM, we don't want to spend that much just on terrain. Also, even if we're in the center of the map, the edges are 5 kilometers away and the corners about 7.071 kilometers away, and we don't need 1-vertex-per-meter precision at that distance. It would eat more memory than needed, not to mention other troubles (aliasing?).
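The arithmetic behind that number, as a quick Python check. The function name and parameters are just for illustration; the world size, sample spacing and 4 bytes per sample are the ones from the paragraph above.

```python
MIB = 1024 * 1024  # "MB" in this post means 2**20 bytes


def heightmap_bytes(width_m, depth_m, meters_per_sample, bytes_per_sample=4):
    """Size of a height map covering width_m x depth_m at the given spacing."""
    samples_x = round(width_m / meters_per_sample)
    samples_z = round(depth_m / meters_per_sample)
    return samples_x * samples_z * bytes_per_sample


full = heightmap_bytes(10_000, 10_000, 1.0)
print(f"{full} bytes = {full / MIB:.1f} MB")  # 400000000 bytes = 381.5 MB
```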
So we divide our world into squares of, let's say, 100x100 meters (that's 0.1x0.1 km); we now have 100x100 squares. For the NxN nearest squares we need high quality, i.e. 1 pixel per meter (where N is, for example, 5, i.e. some good small number; that needs about 0.95 MB, which is more than acceptable). For the rest of the tiles we can live with 1 pixel per 10 meters (for the whole world this gives us about 3.81 MB, which is acceptable). So now we can fit our world terrain height map into some 4.77 MB, which is a lot better.
Second optimization: for terrain further away we don't need high quality models of trees, castles, bridges, etc. We can use just low-detail versions and impostors, i.e. just a single billboarded quad (these work very well for trees; Oblivion used them, and Skyrim too). So we don't need to have the geometry or textures of high quality objects in memory. Of course we must load them when we load the high quality square.
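Here is the same budget worked out in Python, together with a sketch of choosing the height map spacing per tile. The names (terrain_budget, tile_spacing) and the rule "near means within N // 2 tiles of the camera tile on both axes" are my own assumptions; the sizes are the ones used above, and the coarse map is counted for the whole world, including the area under the near tiles.

```python
MIB = 1024 * 1024       # "MB" in this post means 2**20 bytes
BYTES_PER_SAMPLE = 4    # 32-bit float per height sample
WORLD_M = 10_000        # world is 10 x 10 km
TILE_M = 100            # tiles are 100 x 100 m


def terrain_budget(n_near=5, coarse_spacing_m=10):
    """Bytes for the N x N high-detail tiles and for the coarse whole-world map."""
    near_samples = n_near * n_near * TILE_M * TILE_M  # 1 sample per meter
    coarse_side = WORLD_M // coarse_spacing_m
    coarse_samples = coarse_side * coarse_side
    return near_samples * BYTES_PER_SAMPLE, coarse_samples * BYTES_PER_SAMPLE


def tile_spacing(tile, camera_tile, n_near=5, coarse_spacing_m=10):
    """Meters per height sample for a tile, given the tile the camera is in."""
    half = n_near // 2
    dx = abs(tile[0] - camera_tile[0])
    dz = abs(tile[1] - camera_tile[1])
    return 1 if max(dx, dz) <= half else coarse_spacing_m


near, coarse = terrain_budget()
print(f"near {near / MIB:.2f} MB + coarse {coarse / MIB:.2f} MB "
      f"= {(near + coarse) / MIB:.2f} MB")
```

As the camera crosses into a new tile, the tiles whose spacing changes from 10 to 1 get loaded in high detail and the ones going the other way get dropped.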
If there is enough interest I might even put together an article (or articles, as this is quite a big topic) on this (especially as the first part has quite a lot in common with my work).
EDIT: In the end I decided to put a little effort into this and write an actual article (or maybe articles) on optimization of 3D rendering. It might take a while, but I think I'll manage to put out a few useful articles.