Managing large amounts of data.

7 comments, last by SirWeeble 8 years, 5 months ago

I'm working on an infinite terrain generator, and so far, so good. I've got what is likely to be a problem in the future, though: the player save data.

The terrain itself is read from Perlin noise with some modifications that let me control its result. When the player changes anything, that object's relevant data is packed into a ulong and put in a list. Every so many loops, I then check the complete list to see whether each stored tile's position is within the render area. If it is, I put it into an array that corresponds to the tile, so my terrain maker can access the data without looping through the entire list. My saved data doesn't attempt to save the entire infinite world, just the player's alterations.
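
In code, that periodic pass looks roughly like the sketch below. It is only an illustration: the 24/24/16-bit packing, the helper names, and the render-area array layout are assumptions, not the actual format used here.

```csharp
using System.Collections.Generic;

static class ChangeList
{
    // Assumed packing: 24 bits of x, 24 bits of y, 16 bits of object id.
    // Coordinates are treated as non-negative here for brevity.
    public static ulong Pack(int x, int y, int objectId)
    {
        return (((ulong)(uint)x & 0xFFFFFF) << 40)
             | (((ulong)(uint)y & 0xFFFFFF) << 16)
             | ((ulong)(uint)objectId & 0xFFFF);
    }

    public static int UnpackX(ulong c)  { return (int)(c >> 40); }
    public static int UnpackY(ulong c)  { return (int)((c >> 16) & 0xFFFFFF); }
    public static int UnpackId(ulong c) { return (int)(c & 0xFFFF); }

    // The periodic "InRenderArea" pass: walk the whole change list and copy
    // matching entries into an array covering just the render window.
    public static void FillRenderArea(List<ulong> changes, int originX, int originY,
                                      int[,] renderArea)
    {
        System.Array.Clear(renderArea, 0, renderArea.Length);
        foreach (ulong c in changes)
        {
            int x = UnpackX(c) - originX;
            int y = UnpackY(c) - originY;
            if (x >= 0 && x < renderArea.GetLength(0) &&
                y >= 0 && y < renderArea.GetLength(1))
            {
                renderArea[x, y] = UnpackId(c);
            }
        }
    }
}
```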

Right now there are no problems whatsoever, but I'm just testing, so I have maybe 10-100 tiles in the list. When I do the periodic "InRenderArea" check, it only has to process those 10-100 items. But players could have 1,000 to 100,000, or more. Since 100,000 tiles would be 0.8 MB, I'm guessing that would have an impact on performance, since I'm trying to repopulate my render-area array every 10th loop. I'm not sure how much data is too much to process in a single loop.

I'm not sure of the best way to deal with this. I thought about a fairly complex system that splits the world up into grids and stores the data in structures that correspond to them, but seeing as I'm lazy, I'd rather find an easier solution. My lazy solution was going to be a "near list" and a "far list": every loop, process maybe 100-500 "far list" items, and if they're close to the render area, move them into the "near list".
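
Concretely, that lazy version might look something like the following sketch, assuming the packed-ulong changes from above and a caller-supplied "is this close to the render area" test; the budget and names are illustrative.

```csharp
using System;
using System.Collections.Generic;

static class NearFarScan
{
    const int ItemsPerLoop = 250;   // budget of far-list entries re-checked per loop
    static int cursor;              // rolling position in the far list

    public static void Update(List<ulong> farList, List<ulong> nearList,
                              Func<ulong, bool> isNearRenderArea)
    {
        for (int n = 0; n < ItemsPerLoop && farList.Count > 0; n++)
        {
            if (cursor >= farList.Count) cursor = 0;          // wrap around

            ulong change = farList[cursor];
            if (isNearRenderArea(change))
            {
                nearList.Add(change);
                // swap-remove so removal stays O(1) and the scan keeps moving
                farList[cursor] = farList[farList.Count - 1];
                farList.RemoveAt(farList.Count - 1);
            }
            else
            {
                cursor++;
            }
        }
    }
}
```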

Would this be sufficient to avoid a performance hit (or late updates resulting in missing tiles), or should I go for the more elaborate multi-grid solution?

I apologize for the lengthy preamble, but I figured details were necessary.


You could conceptually cut the world into larger areas, have each area save its own array of changes and only search areas visible to the player for rendering.

Why don't you try playing a longer game or generating a large change list, and then profile before deciding whether it's worth optimizing?

Scaling a world without limit brings more pitfalls than just the amount of data. You will soon hit floating-point precision/scale issues, which creates the need for a floating-point "multiverse". Also, if you need to process a registry of your universe to construct a scene, it is wise not to make that registry universal, but rather to create "load points" with their own local registries; this, however, raises the issue of "I need objects to be universal and explicitly unique".

But you can still take a simpler route: load "subverses" (sub-worlds) on demand, save the state of each subverse when it unloads, and save all progressing and transferring objects (player, actors, etc.) separately, kept in sync with the subverse states.

The question then is whether objects in your universe need some explicitly unique, universal property. Even if it is just a unique number you will eventually run out, and some universally exclusive information will be needed if subverses can relate/transfer their objects to each other. But you could still make it a 32-byte binary identifier, which would allow you... hey, even our universe is maybe not infinite :)

Scaling a world without limit brings more pitfalls than just the amount of data.
This is very true. I will assume that when you say "infinite" you really mean "kind of big, no hard-coded limit".

So the user may have 100,000 changes on arbitrary tiles. What now?

Spatial hashing to the rescue. The big advantage of spatial hashing is that it fits the "infinite, or rather very big" requirement very well. You basically have a grid, but you don't have to commit to a maximum size. Anything at any position, no matter how far away, will land in some bucket. If you have 1,000 buckets, each bucket contains about 1/1,000 of the total number of items (on average, give or take a few percent).

Which means the user may have 100,000 to 500,000 modifications all over the world, but with a spatial hash of 1,000 buckets you only ever need to look at the 100-500 or so items that fall into the relevant bucket.
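
A minimal sketch of that kind of spatial hash in C#; the cell size, bucket count, and hash function here are arbitrary choices for illustration, not prescriptions.

```csharp
using System.Collections.Generic;

class SpatialHash
{
    const int CellSize = 64;        // world units per hash cell (arbitrary choice)
    const int BucketCount = 1024;   // fixed, no matter how far the world extends

    readonly List<ulong>[] buckets = new List<ulong>[BucketCount];

    static int BucketFor(int x, int y)
    {
        // which grid cell the position falls into (handles negatives too)
        int cx = (int)System.Math.Floor(x / (double)CellSize);
        int cy = (int)System.Math.Floor(y / (double)CellSize);
        // any decent hash of the cell coords will do; this prime pair is common
        int h = unchecked(cx * 73856093 ^ cy * 19349663);
        return (int)((uint)h % (uint)BucketCount);
    }

    public void Add(int x, int y, ulong change)
    {
        int b = BucketFor(x, y);
        if (buckets[b] == null) buckets[b] = new List<ulong>();
        buckets[b].Add(change);
    }

    // When rendering, only buckets whose cells overlap the render area are walked.
    // Different cells can share a bucket, so each entry's position still gets
    // checked, but that's roughly 1/1,000 of the list instead of all of it.
    public List<ulong> GetBucket(int x, int y)
    {
        return buckets[BucketFor(x, y)] ?? new List<ulong>();
    }
}
```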


So the user may have 100,000 changes on arbitrary tiles.

I have been facing the issue of "too large saves" as well, and the best solution I came up with was that, globally, an object can only reference a single other object. That kept my saves from growing without bound, since I did not want to go the route of accumulating saves and instead store a compressed global state. The referenced object can be anything (it is mostly the player, or a greedy NPC), but it can also be an advanced detailing state, which is created rarely and so does not accumulate in any notable way. To optimize further I still batch changes together so I don't touch the save definition redundantly, but at least I have a bound on its size.

The universal definition of anything that exists there contains only information that is truly universal: 3D position + radius + unique ID + reference ID. All other definition is static and lives outside the universal registry, and if I want to modify an object further, I actually switch/evolve it into another predefined object.
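
In other words, the per-object save record stays tiny, something along these lines (the field types and names below are guesses for illustration, not the actual format):

```csharp
// Sketch of the kind of per-object record described above.
struct UniversalRecord
{
    public float X, Y, Z;    // 3D position
    public float Radius;     // bounding radius
    public uint Id;          // unique id of this object
    public uint ReferId;     // the single object this one may refer to (0 = none)
}
// Everything else (mesh, stats, behaviour) is static data looked up by type,
// so only these few fields have to be written into the save.
```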

you basically have two possible approaches:

1. terrain chunks, paged off disk most likely. this is what wintertime is talking about. you divide the world into a grid, and chunks near the player are kept in a cache in ram. when the player approaches a new chunk not in the cache, you check whether a modded version is on disk; if so, you load it, else you generate via perlin as usual. if the player mods a chunk in the cache, you write the change to disk. caveman uses terrain chunks in combo with change or exception lists. (see the sketch after this list.)

2. change or exception lists, i.e. what you do now. you keep a list of all the changes the player made, and when you draw you take the changes into account. caveman does this as well: things like huts, bedding, lean-tos, etc. cause nearby trees, rocks, and grass (stored in terrain chunks) not to be generated. but there's a max of 1000 dropped objects in the game at the moment, so iterating through the lists when you generate a chunk isn't that bad. and since it caches chunks instead of iterating through the changes each render, you only get a performance hit when a new chunk is generated. FYI, i use superimposed sinusoidal waves and a heightmap seam fixer, as opposed to perlin noise, for ground mesh generation.
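
A rough sketch of approach #1: a chunk cache that loads a modded chunk from disk if one exists and otherwise generates it. The file layout, names, and write-through saving below are assumptions for illustration, not how Caveman actually does it.

```csharp
using System.Collections.Generic;
using System.IO;

class ChunkCache
{
    const int ChunkSize = 64;
    readonly Dictionary<long, int[,]> cache = new Dictionary<long, int[,]>();

    static long Key(int cx, int cy) { return ((long)cx << 32) | (uint)cy; }
    static string PathFor(int cx, int cy)
    {
        return Path.Combine("saves", "chunk_" + cx + "_" + cy + ".bin");
    }

    public int[,] GetChunk(int cx, int cy)
    {
        int[,] chunk;
        if (cache.TryGetValue(Key(cx, cy), out chunk)) return chunk;

        string path = PathFor(cx, cy);
        chunk = File.Exists(path) ? LoadChunk(path) : GenerateFromNoise(cx, cy);
        cache[Key(cx, cy)] = chunk;     // eviction of far-away chunks not shown
        return chunk;
    }

    public void ModifyTile(int cx, int cy, int gx, int gy, int objectId)
    {
        int[,] chunk = GetChunk(cx, cy);
        chunk[gx, gy] = objectId;
        SaveChunk(PathFor(cx, cy), chunk);   // write-through; batching writes is another option
    }

    static int[,] GenerateFromNoise(int cx, int cy)
    {
        return new int[ChunkSize, ChunkSize];   // stand-in for the usual noise-based generation
    }

    static int[,] LoadChunk(string path)
    {
        var chunk = new int[ChunkSize, ChunkSize];
        using (var r = new BinaryReader(File.OpenRead(path)))
            for (int x = 0; x < ChunkSize; x++)
                for (int y = 0; y < ChunkSize; y++)
                    chunk[x, y] = r.ReadInt32();
        return chunk;
    }

    static void SaveChunk(string path, int[,] chunk)
    {
        Directory.CreateDirectory("saves");
        using (var w = new BinaryWriter(File.Create(path)))
            for (int x = 0; x < ChunkSize; x++)
                for (int y = 0; y < ChunkSize; y++)
                    w.Write(chunk[x, y]);
    }
}
```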

an optimization for large change lists is some sort of indexed change list. this is what samoth is talking about: you use an index system to reduce the number of list entries you have to search. caveman uses indexes for terrain chunks (5000 entries? indexed on texture, then mesh), the render queue (20,000 entries? indexed on texture then mesh), the list of caves (60,000 entries indexed on map square x,z), and the list of huts (28,000 entries indexed on map square x,z).

start with brute force. run a test with 100,000 changes or a million changes in the list. if it's ok, you're good to go!

if it's too slow, try an index system of some type. i tend to favor bucket sort type indexes myself. they tend to be the fastest.

if it's still too slow (which may be unlikely), try saving changes to disk on a per-chunk basis (method #1 above).

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php


Since 100,000 tiles would be 0.8 MB,

While I think wintertime is right, especially about profiling, I'd also like to point out that the worst-case scenario is 100,000 separate accesses to memory, which a modern average PC can handle 33 times over per frame at 60 Hz. So you should be alright.

The math:

Common RAM in PCs today: DDR3-1600

1,600,000,000 transfers per second / 8 (burst length) = 200,000,000 burst transfers per second

200,000,000 / 100,000 accesses = 2,000 full passes per second

2,000 / 60 fps = 33.3 full passes per frame

-potential energy is easily made kinetic-

As stated above, technically the size is limited by floating-point errors, so it's actually just really, really big. My save data stores positions as chunkX, chunkY, gridX, gridY, with a max of 9999 chunks per axis. The chunks are 64x64, so the actual size is 639,936 x 639,936 tiles. Right now it can also go into the negatives, which doubles that, but I may drop negative coordinates.
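
For reference, that addressing scheme fits comfortably in a single ulong. The exact bit layout below is an assumption based on those numbers (and it ignores negative coordinates, which would need an offset or sign bits):

```csharp
static class TileKey
{
    // chunkX/chunkY: 0..9999 -> 14 bits each; gridX/gridY: 0..63 -> 6 bits each.
    // That's 40 bits, leaving 24 bits of a ulong spare (enough for an object id).
    public static ulong Pack(int chunkX, int chunkY, int gridX, int gridY, int objectId)
    {
        return (((ulong)(uint)chunkX & 0x3FFF) << 50)
             | (((ulong)(uint)chunkY & 0x3FFF) << 36)
             | (((ulong)(uint)gridX  & 0x3F)   << 30)
             | (((ulong)(uint)gridY  & 0x3F)   << 24)
             | ((ulong)(uint)objectId & 0xFFFFFF);
    }

    public static int ChunkX(ulong key)   { return (int)(key >> 50); }
    public static int ChunkY(ulong key)   { return (int)((key >> 36) & 0x3FFF); }
    public static int GridX(ulong key)    { return (int)((key >> 30) & 0x3F); }
    public static int GridY(ulong key)    { return (int)((key >> 24) & 0x3F); }
    public static int ObjectId(ulong key) { return (int)(key & 0xFFFFFF); }
}
```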

I could expand that, but I figured it's pretty big already and unlikely that the average player would really ever find the edge of the world, and if they do, I can just make it loop around or put up some sort of barrier.

I hadn't heard of bucket sort; I just looked it up on Wikipedia. It seems like it would probably work well with what I've currently got. I think I'm going to give it a try and see how it works.

BTW, are you sure about your 0.8 MB figure? It seems on the low side.

-potential energy is easily made kinetic-

It was based on the fact that ulongs are 8 bytes each: 100,000 ulongs = 800,000 bytes = 0.8 MB. I'm also only saving player alterations; the rest of the world is generated by the algorithm.

I actually changed this, though. Now I'm saving the data as a list of int[3]s, which hold the x location, y location, and object ID, then putting them in an IEnumerable class and saving that class.

Previously I was parsing the data by reading individual bytes as strings, but that ate up too many resources, doing a bunch of string concatenations all at once. Now the files are larger and slower to read when I start the program, but access is much quicker during gameplay, and "slower" at 0.8 MB or even 8 MB at load time still isn't very bad.
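
One straightforward way to write that list of (x, y, object ID) triples to disk is a flat binary file: a count followed by the raw ints. The layout and names below are only an assumption, not the actual save code:

```csharp
using System.Collections.Generic;
using System.IO;

static class AlterationFile
{
    public static void Save(string path, List<int[]> alterations)
    {
        using (var w = new BinaryWriter(File.Create(path)))
        {
            w.Write(alterations.Count);
            foreach (int[] a in alterations)      // a[0]=x, a[1]=y, a[2]=object id
            {
                w.Write(a[0]); w.Write(a[1]); w.Write(a[2]);
            }
        }
    }

    public static List<int[]> Load(string path)
    {
        var alterations = new List<int[]>();
        using (var r = new BinaryReader(File.OpenRead(path)))
        {
            int count = r.ReadInt32();
            for (int i = 0; i < count; i++)
                alterations.Add(new[] { r.ReadInt32(), r.ReadInt32(), r.ReadInt32() });
        }
        return alterations;
    }
}
```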

