First of all, profile to find out what your problem actually is.
Without hard data, what I have written below is pure speculation.
Is it because XNA is slower than DirectX?
This question makes no sense: DirectX is the API that XNA is built on.
Anyway, streaming data raises three problems: granularity, latency and pressure.
- Granularity: you know you can spend X ms computing a tile. By testing, we figure out that our "terrain tile" can be, say, 200x200 heightmap pixels / model vertices. This is of course a function of the algorithm used.
- Latency: this involves pulling data from mass storage. Even with an SSD, it will still be massively slow compared to RAM, let alone VRAM. In general, we will need deferred texel loading, because waiting for the disk to seek is rarely acceptable. If loading is assumed to be synchronous, it will eat into the processing time available to the LOD algorithm.
- Pressure: how much of the above you'll need to do per frame and per second.
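The latency and pressure points can be sketched as a loader that never blocks the frame: a background thread does the slow reads, and the frame loop only picks up finished tiles, at most a fixed number per frame. This is a minimal Python sketch for illustration only; `load_fn`, `DeferredTileLoader` and `max_uploads_per_frame` are hypothetical names, not part of XNA or any engine.

```python
import queue
import threading
import time


class DeferredTileLoader:
    """Loads tiles on a background thread so the frame loop never waits on disk.

    load_fn is a hypothetical stand-in for whatever reads a tile from mass
    storage: any callable taking a tile id and returning its data.
    """

    def __init__(self, load_fn, max_uploads_per_frame=2):
        self._load_fn = load_fn
        self._requests = queue.Queue()
        self._done = queue.Queue()
        self._pending = set()  # requested but not yet delivered (frame thread only)
        self.max_uploads_per_frame = max_uploads_per_frame  # caps per-frame pressure
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            tile_id = self._requests.get()
            data = self._load_fn(tile_id)  # slow I/O happens off the frame thread
            self._done.put((tile_id, data))

    def request(self, tile_id):
        # Queue a load unless one is already in flight; this never blocks.
        if tile_id not in self._pending:
            self._pending.add(tile_id)
            self._requests.put(tile_id)

    def poll(self):
        """Call once per frame: returns at most max_uploads_per_frame finished tiles."""
        ready = []
        while len(ready) < self.max_uploads_per_frame:
            try:
                tile_id, data = self._done.get_nowait()
            except queue.Empty:
                break
            self._pending.discard(tile_id)
            ready.append((tile_id, data))
        return ready
```

Until `poll` hands a tile back, the renderer keeps drawing whatever coarser representation it already has; the per-frame cap trades how fast new detail appears for a steady frame time.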
In my experience the only way to deal with pressure on low-end hardware is to have precomputed LOD representations. Let me elaborate.
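A precomputed LOD representation can be as simple as a mip-map-style pyramid built offline for each tile, so at runtime you fetch level n directly instead of deriving it. The sketch below assumes square power-of-two tiles and builds each level by averaging 2x2 blocks; averaging is just one possible filter, chosen here for illustration, and `build_lod_pyramid` is a hypothetical name.

```python
def build_lod_pyramid(heights):
    """Precompute LOD 0..n for a square heightmap tile (offline, not per frame).

    heights is a list of rows with a power-of-two side length. Each level
    halves the resolution by averaging 2x2 blocks, mip-map style, down to 1x1.
    """
    side = len(heights)
    if side == 0 or side & (side - 1):
        raise ValueError("side length must be a power of two")
    levels = [heights]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [
                (prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
                 + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]) / 4.0
                for x in range(half)
            ]
            for y in range(half)
        ])
    return levels
```

At runtime, `levels[n]` is simply stored and streamed like any other tile data, so a far-away tile never costs more than its own small level.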
Maybe someone who knows more than me can show me the directions and maybe answer the question about how you can manage memory with big height maps?
Your example misses a key point: by construction, the tiles are not homogeneous.
That is, let's assume tiles 4, 8, 12 require LOD 0, and say that even 3, 7, 11 ended up at LOD 0.
By contrast, tiles 2, 6, 10 are at the edge of the viewport. They cannot be at LOD 0, because that would mean you're brute-forcing everything. There are various ways to handle this, depending on the algorithm we use.
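One way to make the per-tile decision before any mesh work happens is a plain distance rule. This is a hypothetical rule chosen for illustration, not the one used by the article in question: LOD 0 within `lod0_distance` of the viewer, then one LOD coarser each time the distance doubles, clamped to `max_lod`.

```python
import math


def select_tile_lod(tile_center, viewer, lod0_distance, max_lod):
    """Pick a LOD for a tile from its distance to the viewer.

    Assumed rule: LOD 0 within lod0_distance, then one LOD coarser
    each time the distance doubles, never coarser than max_lod.
    """
    d = math.dist(tile_center, viewer)
    if d <= lod0_distance:
        return 0
    lod = math.floor(math.log2(d / lod0_distance)) + 1
    return min(lod, max_lod)
```

With a rule like this, edge tiles such as 2, 6, 10 get a coarse LOD up front, so nobody ever has to build them at LOD 0 first.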
In the case of octree-simplified terrains, the cell size stays constant and the polygon count is reduced (something I actually don't like at all for modern hardware, but let's carry on).
Many people just load LOD 0 anyway for 2, 6, 10 and then decimate polygons. This does not work, because generating a LOD n tile (n > 0) this way costs more than generating a LOD 0 tile, and it gets worse as n grows. I guess that's what you're doing?
It's worth noting that the article you're referring to is concerned with optimizing the representation (visualization) of a terrain node. Loading LOD 0 every time just to run decimation is unacceptable. There must be a step, before the visualization algorithm runs, that figures out what to work on.
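The cost argument can be made concrete with a deliberately simplified model (an assumption: it counts only heightmap samples read and ignores the decimation compute itself, which only adds to the runtime path). Decimating at runtime always reads the full LOD 0 tile; a precomputed level n has a side of `tile_side / 2**n`. `samples_touched` is a hypothetical helper.

```python
def samples_touched(tile_side, lod, precomputed):
    """Rough count of heightmap samples read to produce one tile at `lod`.

    Simplified model: runtime decimation must read the full LOD 0 tile
    whatever LOD is wanted; precomputed levels halve their side per LOD step.
    """
    if precomputed:
        side = max(1, tile_side >> lod)
        return side * side
    return tile_side * tile_side  # LOD 0 is always loaded, then decimated
```

For a 256x256 tile wanted at LOD 3, that is 65536 samples read versus 1024, and the gap grows by a factor of 4 per LOD step, which is why the decision has to come before loading, not after.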
So what do we do?
Personally, I'd suggest precomputing everything (oops, that's impossible for a viewpoint-dependent method) or switching to a regular-grid method, which allows easier decimation (less computation for the same granularity).
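On a regular grid, decimation can be as cheap as striding: keep every 2^lod-th sample in each direction, with no per-vertex error metric at all. This sketch assumes tiles with a side of 2^k + 1 samples so the border rows and columns survive every level; neighbouring tiles at different LODs would still need crack handling (skirts or stitching), which it ignores. `decimate_regular_grid` is a hypothetical name.

```python
def decimate_regular_grid(heights, lod):
    """Decimate a (2^k + 1)-sample square grid by keeping every 2^lod-th sample.

    Border samples are always kept, so tiles at the same LOD still share
    their edge vertices. No per-vertex work beyond copying is needed.
    """
    step = 1 << lod
    n = len(heights)
    if (n - 1) % step:
        raise ValueError("grid side minus one must be divisible by 2**lod")
    return [row[::step] for row in heights[::step]]
```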
Alternatively, use smaller tiles (less computation per tile, but more pressure).
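The granularity/pressure trade-off is simple arithmetic. Assuming processing cost grows linearly with vertex count, a per-tile budget and a measured throughput give the largest tile you can afford, and the view radius tells you how many such tiles you must keep around. The numbers below are made up; `vertices_per_ms` stands for whatever profiling measures for your algorithm, and both helpers are hypothetical.

```python
import math


def max_tile_side(budget_ms, vertices_per_ms):
    """Largest square tile (in vertices per side) that fits in budget_ms.

    Assumes cost is linear in vertex count; vertices_per_ms comes from profiling.
    """
    return math.isqrt(int(budget_ms * vertices_per_ms))


def tiles_to_cover(view_radius, tile_side_world):
    """Tiles in the square enclosing a circle of view_radius: a rough pressure measure."""
    per_axis = math.ceil(2 * view_radius / tile_side_world)
    return per_axis * per_axis
```

For example, a 4 ms budget at 10000 vertices/ms gives `max_tile_side(4, 10000) == 200`, i.e. the 200x200 tile from earlier. Halving the tile side quarters the work per tile but quadruples the tile count: `tiles_to_cover(1000, 200)` is 100, `tiles_to_cover(1000, 100)` is 400.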
Anything requiring non-trivial per-vertex work will run into trouble sooner or later; that's it. I'm very surprised the algorithm you refer to calls for a per-vertex visibility test.