Infinite terrain generation in chunks - need hints on handling chunk edges

33 comments, last by Kylotan 11 years, 1 month ago

I'm doing some terrain generation, using the usual Perlin-style interpolated noise (although strictly speaking it's value noise, as I'm not using gradients). Rather than generating a large landmass in one go, though, I'm generating it chunk by chunk, e.g. in 100m x 100m areas, so it's important that the edges of adjacent chunks line up in terms of their generated height. (I think this is how Minecraft does it, but I couldn't find out exactly how the height values are calculated, especially across changing biomes. And I'm not using voxels, anyway.)

Most algorithms I've seen for noise-based terrain generation seem to assume that you're generating the whole landmass at once: that you have precalculated noise values spanning the entire world, which you then interpolate between (and repeat at higher frequencies and lower amplitudes, etc.). Since my world is unbounded I don't have this. I think the answer should be as simple as having some sort of 2D hash function that can return a noise value for any arbitrary x,y in the world, and sampling a single value that way is easy enough. But I'm having trouble seeing how to generalise this to the system of frequencies and amplitudes. If I just scale up x and y by the frequency each time, then I'm sampling values from elsewhere in the world, which seems logically incorrect. Can anybody clear this up for me? Maybe post some pseudocode?
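To make the question concrete, the octave loop I have in mind is roughly the following. This is only pseudocode, not something I actually have running, and noise2d stands for whatever hash-based 2D lookup I'd end up writing:

```python
def height(world_x, world_y):
    total = 0.0
    frequency = 1.0 / 64.0   # guessing at a base wavelength of 64m
    amplitude = 16.0
    for octave in range(5):
        # This is the step I'm unsure about: scaling the world coordinates by
        # the frequency means each octave hashes lattice points "elsewhere"
        # in the world. Is that legitimate?
        total += noise2d(world_x * frequency, world_y * frequency) * amplitude
        frequency *= 2.0
        amplitude *= 0.5
    return total
```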

I have one other aspect I'm trying to implement: each chunk also has a height value of its own, so that I can specify that certain parts of the world are higher than others before the noise algorithm runs. I'm contemplating calculating the height at each point as basically a bilinear interpolation of the chunk's height with that of its neighbours, and then adding the noise value for that position. I think this should be continuous both within chunks and across chunk boundaries, but I'd be interested to hear if there are any problems with this approach, ways it could be improved, or ways to combine it with the previous step to make the algorithm simpler and/or faster.
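Again in rough pseudocode, with chunk_height standing for whatever lookup returns the pre-assigned height of chunk (cx, cy), and height being the octave function from the sketch above:

```python
import math

def base_height(world_x, world_y, chunk_size=100.0):
    # Treat each chunk's assigned height as a sample at its centre and
    # bilinearly interpolate between the four nearest chunk centres.
    fx = world_x / chunk_size - 0.5
    fy = world_y / chunk_size - 0.5
    cx, cy = math.floor(fx), math.floor(fy)
    tx, ty = fx - cx, fy - cy
    h00 = chunk_height(cx,     cy)
    h10 = chunk_height(cx + 1, cy)
    h01 = chunk_height(cx,     cy + 1)
    h11 = chunk_height(cx + 1, cy + 1)
    bottom = h00 + (h10 - h00) * tx
    top    = h01 + (h11 - h01) * tx
    return bottom + (top - bottom) * ty

def final_height(world_x, world_y):
    # Broad per-chunk height first, per-point noise detail on top.
    return base_height(world_x, world_y) + height(world_x, world_y)
```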


If you think of the whole terrain as a single function, generating a chunk just means sampling that function in a particular region. If you then sample the region next to it, you'll magically get consistent borders.

As you say, the trick is to use a hash function of the coordinates for noise generation, instead of random numbers. I don't quite understand what part of this you find troublesome, though...
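Something along these lines is all there is to it. This is just a quick, untested sketch; the hash constants, frequencies and amplitudes are made-up numbers you'd tune for your own terrain:

```python
import math

def hash2(ix, iy, seed=1337):
    # Deterministic pseudo-random value in [0, 1) for an integer lattice point.
    # Any decent integer hash will do; the constants here are arbitrary.
    h = (ix * 374761393 + iy * 668265263 + seed) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    return ((h ^ (h >> 16)) & 0xFFFFFFFF) / 4294967296.0

def smoothstep(t):
    return t * t * (3.0 - 2.0 * t)

def value_noise(x, y, seed=1337):
    # Interpolate between the hashed values at the four surrounding lattice
    # corners -- the unbounded replacement for a pregenerated noise array.
    x0, y0 = math.floor(x), math.floor(y)
    tx, ty = smoothstep(x - x0), smoothstep(y - y0)
    a = hash2(x0,     y0,     seed)
    b = hash2(x0 + 1, y0,     seed)
    c = hash2(x0,     y0 + 1, seed)
    d = hash2(x0 + 1, y0 + 1, seed)
    bottom = a + (b - a) * tx
    top    = c + (d - c) * tx
    return bottom + (top - bottom) * ty

def terrain_height(world_x, world_y, octaves=5, base_freq=1.0 / 64.0,
                   amplitude=16.0, lacunarity=2.0, gain=0.5):
    # The "whole terrain as a single function": only world coordinates go in,
    # so any two chunks asking about the same point get the same answer.
    total, freq, amp = 0.0, base_freq, amplitude
    for _ in range(octaves):
        total += value_noise(world_x * freq, world_y * freq) * amp
        freq *= lacunarity
        amp *= gain
    return total

def generate_chunk(chunk_x, chunk_y, res=32, chunk_size=100.0):
    # Generating a chunk is just sampling that function over one region.
    # Adjacent chunks evaluate identical world coordinates along their shared
    # edge, so the borders match with no extra stitching.
    step = chunk_size / res
    return [[terrain_height(chunk_x * chunk_size + i * step,
                            chunk_y * chunk_size + j * step)
             for i in range(res + 1)]
            for j in range(res + 1)]
```

Note that scaling the inputs by the frequency doesn't mean you're "borrowing" terrain from elsewhere in the world; each octave just reads its own lattice at a different density, and because the hash is deterministic the sum is still a fixed function of (world_x, world_y).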

It might also be interesting to use multiple functions for various terrain features and see them clash in unexpected ways. A function can also be modified to sample from points around it.

As Alvaro says, noise normally handles this problem implicitly.

Imagine your chunk divided into 4 "sub-chunks". Do they line up? If you're using straight value noise, they shouldn't, as value noise isn't inherently continuous.

Hence, it's an intrinsic problem of the algorithm. Gradient and other offline noise algorithms get around that by being continuous, which naturally ensures that any arbitrary chunk boundary lines up.

As for your regions idea, I have a feeling that will lead to artifacts, but don't have the time on hand to evaluate it more closely. One way to do what you're after is to blend your noise function with another very low-frequency noise function. I believe this is how it's usually done.
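In sketch form, reusing the kind of value_noise and terrain_height functions sketched earlier in the thread (the 1000m wavelength, the separate seed, and the 120m amplitude are arbitrary numbers for illustration):

```python
def regional_height(world_x, world_y):
    # A second, much lower-frequency noise lookup supplies the broad
    # "this area is higher than that area" variation...
    return value_noise(world_x / 1000.0, world_y / 1000.0, seed=9001) * 120.0

def blended_height(world_x, world_y):
    # ...and the ordinary octave noise adds the local detail on top.
    return regional_height(world_x, world_y) + terrain_height(world_x, world_y)
```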

Alvaro - the problem is that the logic changes when you're no longer mapping an array of noise to an equal-sized array of height values. The algorithms rely not only on the noise being pregenerated but also on it being a fixed size.

E.g. in one example I've seen, 1D noise is sampled by taking x * frequency % numSamples, which I can't do directly because my world is unbounded and so there is no fixed numSamples. I've considered simply leaving out the numSamples wrapping, but I don't understand whether the output will be correct; after all, it means that f(x) is going to be based on the hash value of arbitrary multiples of x. Is that always going to work? As x approaches large numbers? And when x is zero?
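In other words, instead of noiseSamples[int(x * frequency) % numSamples] I'd presumably end up with something like the following, where hash1 is some deterministic integer-to-[0,1) hash, and I can't convince myself it's valid:

```python
def sample_1d(x, frequency):
    s = x * frequency
    i = math.floor(s)      # which pair of lattice points we're between
    t = s - i              # fractional position between them
    return hash1(i) + (hash1(i + 1) - hash1(i)) * t   # no "% numSamples" anywhere
```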

I'm sure the solution is simple but I'm having trouble convincing myself of the correct mathematics to get it working.

Maybe you can post some simple version of the code you are working with, and we can go from there...

SeraphLance, the subchunks should all line up, because I don't use value noise directly, but interpolate from one value to the next (as most noise algorithms would). I don't see there's any inherent problem with this method of generation. It's not as continuous as other types of noise but the interpolated output can still be C1 continuous.

Alvaro, I don't have any working code currently, just the rough sketches above, because I'm still trying to work out how to do it. It's not a bug in my code but a problem with my thinking.

It's not a bug in my code but a problem with my thinking.

I'm having trouble following your thinking.

Most noise functions take arbitrary (continuous) input coordinates, and produce continuous output values. Even in an infinite world, your input coordinates should be continuous, right?

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

My thinking is difficult to follow because I don't fully understand the problem myself! All I know is that there are plenty of implementations of fixed-width noise systems and I can't find one that works for arbitrary widths. The fixed-width ones all presume a periodic function that yields their pregenerated noise values, with indices that wrap around, and I can't prove to myself whether this matters or not.

The function should handle continuous input, but in order to decide what to return it picks discrete elements from a pregenerated array of noise samples. In my case I expect I will have to pass discrete values to a hash function instead, and it's knowing which discrete values to use that is tricky; I don't fully understand the theory around this. It doesn't help that the various pieces of sample code I find tend to use 'frequency' to mean frequency, wavelength, or period, depending on each particular implementer's (mis)understanding of the term! That makes it hard for me to work out what the input value should be for each octave.
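Spelled out, I think what I need per octave is something like this, though I'm not certain it's the standard way of doing it (the 64m base wavelength and the doubling per octave are guesses on my part):

```python
import math

def octave_inputs(world_x, octave, base_wavelength=64.0):
    frequency = (2 ** octave) / base_wavelength   # each octave doubles the frequency
    s = world_x * frequency
    i0 = math.floor(s)    # the two discrete lattice points whose hashes I need...
    i1 = i0 + 1
    t = s - i0            # ...and where this point sits between them
    return i0, i1, t
```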

This topic is closed to new replies.
