# Infinite terrain generation in chunks - need hints on handling chunk edges

## Recommended Posts

Kylotan    9997

I'm doing some terrain generation, using the usual Perlin-style interpolated noise (although strictly speaking it's value noise, as I'm not using gradients). However rather than generating a large landmass in one go, I am generating it chunk by chunk, eg. 100m x 100m areas. Therefore it's important that the edges of each chunk line up with each other in terms of their generated height. (I think this is how Minecraft does it, but I couldn't find out exactly how the height values are calculated, especially across changing biomes. And I'm not using voxels, anyway.)

Most algorithms that I've seen for terrain generation using noise seem to assume that you're generating across the whole landmass, that you have precalculated noise values that span the entire world which you then interpolate between (and repeat with higher frequencies at lower amplitudes, etc). But since my world is unbounded I don't have this. I think the answer should be as simple as having some sort of 2D hash function that can return a noise value for any arbitrary x,y in the world, and sampling that value itself is easy. But I'm having trouble thinking about how to generalise this to the system of frequency and amplitude. If I just scale up x and y by the frequency each time then I'm sampling values from elsewhere in the world which seems logically incorrect. Can anybody clear this up for me? Maybe post some pseudocode?

I have one other aspect I am trying to implement - each chunk has itself got a height value, so that I can specify that certain parts of the world are higher than others before the noise algorithm runs. I am contemplating calculating the height at each point as basically a bilinear interpolation of the chunk height with that of its neighbours, and then add the noise value for that position. I think this should be continuous both within chunks and across chunk boundaries, but I'd be interested to hear if there are any problems with this approach, ways it could be improved, or ways to combine it with the previous step to make the algorithm simpler and/or faster.

Edited by Kylotan

##### Share on other sites
alvaro    21263

If you think of the whole terrain as a single function, generating a chunk just means sampling that function in a particular region. If you then sample the region next to it, you'll magically get consistent borders.

As you say, the trick is to use a hash function of the coordinates for noise generation, instead of random numbers. I don't quite understand what part of this you find troublesome, though...

##### Share on other sites
Nercury    812

Maybe it could be interesting to use multiple functions for various terrain features and see them clash in unexpected ways. Also a function can be modified to use sampling from points around it.

Edited by Nercury

##### Share on other sites
SeraphLance    2603
As Alvaro says, noise normally handles this problem implicitly.

Imagine your chunk divided into 4 "sub-chunks". Do they line up? If you're using straight value noise, they shouldn't, as value noise isn't inherently continuous.

Hence, it's an intrinsic problem of the algorithm. Gradient and other offline noise algorithms get around that by being continuous, which naturally ensures that any arbitrary chunk boundary lines up.

As for your regions idea, I have a feeling that will lead to artifacts, but don't have the time on hand to evaluate it more closely. One way to do what you're after is to blend your noise function with another very low-frequency noise function. I believe this is how it's usually done. Edited by SeraphLance

##### Share on other sites
Kylotan    9997

Alvaro - the problem is that the logic changes when you're no longer mapping an array of noise to an equal-sized array of height values. The algorithms rely not only on the noise being pregenerated but also on it being a fixed size.

eg. In one example I've seen, 1D noise is sampled by taking x * frequency % numSamples - which I can't do directly because numSamples is unbounded. I've considered simply leaving out the numSamples wrapping factor, but I don't understand if the output will be correct - after all, it means that f(x) is going to be based on the hash value of arbitrary multiples of x. Is that always going to work? As x approaches large numbers? And when x is zero?

I'm sure the solution is simple but I'm having trouble convincing myself of the correct mathematics to get it working.

##### Share on other sites
alvaro    21263

Maybe you can post some simple version of the code you are working with, and we can go from there...

##### Share on other sites
Kylotan    9997

SeraphLance, the subchunks should all line up, because I don't use value noise directly, but interpolate from one value to the next (as most noise algorithms would). I don't see there's any inherent problem with this method of generation. It's not as continuous as other types of noise but the interpolated output can still be C1 continuous.

##### Share on other sites
Kylotan    9997

Alvaro, I don't have code currently because I'm trying to work out how to do it. It's not a bug in my code but a problem with my thinking.

##### Share on other sites
swiftcoder    18437

It's not a bug in my code but a problem with my thinking.

I'm having trouble following your thinking.

Most noise functions take arbitrary (continuous) input coordinates, and produce continuous output values. Even in an infinite world, your input coordinates should be continuous, right?

##### Share on other sites
Kylotan    9997

My thinking is difficult to follow because I don't fully understand the problem myself! All I know is that there are plenty of implementations of fixed-width noise systems and I can't find one that works for arbitrary widths. The fixed width ones all presume a periodic function that yields up their pregenerated noise values, the indexes into it also wrap around, and I can't prove to myself whether this matters or not.

The function should handle continuous input, but in order to decide what to return it picks discrete elements from a pregenerated array of noise samples. In my case I expect I will have to pass discrete values to a hash function. It's knowing which discrete values I need to use which is tricky. I don't fully understand the theory around this. It doesn't help that the various pieces of sample code I find tend to use 'frequency' to mean either frequency, wavelength, or period, depending on each particular implementer's (mis)understanding of the term! This makes it hard for me to understand the fundamental way of deciding what the input value should be for each octave.

##### Share on other sites
cr88192    1570
note that the sample-count is usually bounded and wraps, but applies over a potentially large area, and is not used up all at once.

going outside of the sample range, as can be noted, tends to result in repeating patterns.
a partial trick here is the use of a per-region noise function (in my case, a region is 512x512x128m), and interpolating over the region-edges via a windowing function. then a hash-function or similar can be used to calculate a per-region seed.

the idea would basically to multiply the values by a sin or sin^2 curve, where as the curve for one side is falling off, the curve for the other side is rising, say to ideally create a case where f(theta0)+f(theta1)=1, where the theta values will depend on the positions within the region.

except for potentially very large/low-frequency features (such as biomes), this works ok, but then potentially a person can use a few global noise functions (the world probably wont be big enough for this to matter).

##### Share on other sites
swiftcoder    18437

You can't generate an infinite non-repeating terrain from a fixed set of samples - entropy's a bitch.

Your choices are to loop the output (fine for fairly small worlds), or generate additional samples once you are far enough away from the origin.

##### Share on other sites
Kylotan    9997

Swiftcoder: I don't want to use a fixed set of samples. Not sure where it was implied that I did? The typical Perlin noise algorithms do use a fixed set, which I'm trying to do away with by using a hash function instead (as mentioned in my first post), but I'm not sure of the implications for what values I need to pass in for each of the different 'octaves'.

On reflection it might be that I'm thinking too hard about the usual implementation and forgetting that I just need to add together several continuous functions. It's just confusing because I've seen the 'frequency' parameter defined in terms of the original pregenerated noise sample map, and obviously that doesn't work. eg. The usual site: http://freespace.virgin.net/hugo.elias/models/m_perlin.htm - has frequency as basically 'how many samples to take from the whole set of noise' or 'how many samples to take across the whole area being generated', both of which are generally equal in the usual implementation, but which doesn't apply in my case.

I guess I need to think of it as a different type of value, as I don't think frequency itself has any meaning in my situation.

cr88192: I must admit I have no idea how any of that would work - although I'm not sure I would need it if I had a function that was defined for all possible values of x any y. I think "interpolating over the region-edges via a windowing function" is essentially what I described originally, bilinearly interpolating between the values for adjacent regions - right?

##### Share on other sites
swiftcoder    18437

The usual site: http://freespace.virgin.net/hugo.elias/models/m_perlin.htm

As I've been saying for years, that site ought to die in a fire. It contains more terminology confusion than it does useful information...

Yes, throw out pre-conceived notions of how "standard" noise generators are implemented. You just need to hash on some sort grid, and generate noise based on that.

For example, I have an infinite voronoi cell noise generator. It divides the world into a square grid at some input-defined resolution, and in each cell it generates a fixed number of points. The resulting noise is a function of the distance to the points in the 3x3 grid of cells surrounding the input coordinate.

##### Share on other sites
coderx75    435

Detailed generation of "infinite" worlds is a bitch.  I've put myself through it and learned much but, currently am not applying everything I've learned because I'd have to bring my entire project back to the drawing board.  It's a good thing you're asking because you really want to get this right early on.

First off, a noise function will not cut it.  Sampling a point at (x, y) from a noise function to get a height value will produce very boring results.  Having an infinite world is pretty pointless if it's all homogeneous.  You will need to implement different methods of creation.  Noise (spectral synthesis) is always useful.  Displacement (also known as distortion or perturbation) really helps to produce natural results.  The use of convex hull diagrams, such as Voronoi Diagrams, are very important.  The ability to use modifiers on output, from basic mathematical operations to more complex diagram conversion algorithms, will open you up to more flexibility.  All of this is hell on seems though.  Ex: displacement can really put the edges of a terrain patch far out of whack with it's neighbors.

Second, how you generate terrain, handle collision detection of terrain and render terrain should probably not be the same.  It's efficient to generate large patches of terrain but more efficient to render in smaller patches, especially for dynamic detailing.  Collision detection depends on the library you are using but I try to get everything in place as it's generated.  I say it's efficient to generate larger patches because you want to leave yourself plenty of resources for dealing with seems.  The smaller the patch, the more seems that have to be managed.

I would suggest basing the generation of the terrain on both elevation and vertex normals.  Generating patches separately will always leave you with visible seems, even if the elevations match.  For example, the resulting edge of one patch may be horizontal while the edge of it's neighbor rises sharply, creating a highly visible "L" shape along the seem.  So, rather than using something like midpoint averages, include the vertex normal in your calculations to keep a more consistent surface (ex: a surface of an elevation can "lean" away from the midpoint, resulting in a higher midpoint and smoother surface).  If a surface at one edge is horizontal (or anything else), the vertex normal insures that it's neighbor is the same.

Another solution would be to generate "areas" or "continents" before generating actual terrain.  The area can be defined by varying dimensions, by a particular method(s) of generation and produce a terrain with a central high point and lower-lying edges.  This could be overlapped by other areas, only the higher values of the two areas being used and everything else discarded.

There's a lot to this and I've taken tons of notes (much of it in exasperation).  Dealing with texture generation, shadowing, collision detection and, likely, multi-threading, can be a real nightmare.  Once I'm done with work, I'll try to go over my notes and get back to this thread.

##### Share on other sites
Kylotan    9997

I appreciate the mention of Voronoi methods and the like but all that is out of my scope - height-mapped noise is sufficient for my purposes. I may not have made it clear but I can't generate a whole world at any point - the idea is to be able to procedurally generate individual chunks as and when they're required, which requires a continuous function that I can sample at any point in the future to get results from.

I'm also glad that the link I cited is considered to have problems with it because that helps to explain why I've had trouble reconciling my idea of what the terms mean and my idea of how the algorithm would work.

I'm still slightly unclear on how to implement my algorithm however, given that every single resource I'm finding has some terminology issue or a fixed-size assumption. Given an input value 'x', how do I derive a height 'y', given an arbitrary number of 'octaves', ie. coherent noise functions with changing amplitude and frequency in inverse proportions? I figure that I need two adjacent hash values and the frequency value for the current octave determines an interpolated point between the two values - but I have been staring at this all day and really can't work out how to implement it properly. I'm sure it's a trivial one or two liner but I don't know how to do it.

##### Share on other sites
swiftcoder    18437
Given some arbitrary noise function [tt]Noise(x, y)[/tt], a fairly vanilla fractal function might look like this:
generate(x, y) {
frequency = 1.0
amplitude = 1.0
octaves = 8

result = 0.0

for i from 0 to octaves {
result += Noise(x*frequency, y*frequency) * amplitude

frecuency *= 2.0
amplitude *= 0.5
}

return result
}

Where octaves is the number of layers to combine, frequency is the frequency at which to sample the noise function, and amplitude is the magnitude of each layer.

I'm not seeing any dependency on world size here (at least until you run out of floating point precision). As long as your [tt]Noise(x, y)[/tt] is continuous, your world can be any size, and you can generate it in whatever increments you desire.

##### Share on other sites
Kylotan    9997

But I don't see that would work for an arbitrary noise function. Imagine the noise function is based around a hash function that takes an integer in and yields a float from 0-1 out. The only way I'd ever get any smoothing is if I pass in fractional values of X which force it to interpolate between two adjacent hash results. But in this example, the frequency is always a positive integer. So if I sample at x=1, x=2,... x=123456778, I'm always going to be handling whole numbers in the noise function and getting arbitrary non-interpolated numbers back out, making the whole coherent noise thing pointless.

It seems to me like I want to be dividing the input parameters by frequency rather than multiplying by it? That makes it "the frequency with which we sample between 2 points", which makes sense on some level. But nobody seems to be doing that, hence my (continued) confusion.

Or alternatively, the noise function doesn't expect to have random numbers exactly on the whole-number boundaries - but then I'm not sure how to construct it in a reasonable way.

##### Share on other sites
swiftcoder    18437
The point of something like fBm (which is roughly the function I reproduced above), is that successive octaves display increased detail - each layer decreases in both scale and amplitude (scale being the inverse of frequency).

Think of it this way: if you view your noise basis as a 2-dimensional image, then increasing the frequency is equivalent to zooming out the image. When you zoom out the image, visible features become smaller, which increases the visible detail.

If we go your route, and take it as the "frequency with which we sample between 2 points", then your noise basis needs to have infinite levels of detail from the get go (at which point you might as well just sample it once; it wouldn't need multiple octaves to build up fine detail on top of a coarse base). Edited by swiftcoder

##### Share on other sites
Kylotan    9997

I understand the principle but the implementation isn't working. I know the idea is to increase the detail at higher octaves; but as I see it, the detail is going to be at maximum right from the start if I'm sampling at integer boundaries (which I will - because ultimately I need to pass this data to a system that needs discrete values).

Example:

float hash(x) { return [0.56, 0.1, 0.99, 0.44, 0.63, 0.72, 0.1, 0.72, 0.01, 0.2]; } // pretend these are pseudo-random
float Noise1D(x) { return Interpolate(hash(x), hash(x+1), FractionalPartOf(x); }


Now if I call Noise1D with the numbers 0 and 1 consecutively, I get exactly 0.56 and 0.1 out.

If I double the frequency and call it with the numbers 0 and 1, I get exactly 0.56 and 0.99 out.

If I double it again and call it again, the results are 0.56 and 0.63 out.

None of these values will ever hit the interpolation part as FractionalPartOf(x) always returns 0. In this case the Noise function is no different from a discrete random number generator, and no smoothing is being performed. The detail is effectively at maximum all the time, for any whole-number value for frequency.

I need a system where I can call Noise(1) and Noise(2) and get 2 values that have some degree of coherence. Either this requires a rethinking of what exactly the 'frequency' is or I need a better definition of what Noise(x) is meant to look like. I can't see how the frequency value can be premultiplied with X and Y because surely the implementation of Noise needs both the X,Y and the frequency information to find 2 samples to interpolate from?

##### Share on other sites
cr88192    1570

Kylotan, on 25 Feb 2013 - 14:38, said:
cr88192: I must admit I have no idea how any of that would work - although I'm not sure I would need it if I had a function that was defined for all possible values of x any y. I think "interpolating over the region-edges via a windowing function" is essentially what I described originally, bilinearly interpolating between the values for adjacent regions - right?

it would be more like how one interpolates between the IMDCT output-blocks in something like MP3 or Ogg/Vorbis.
(may help give an idea at least: http://en.wikipedia.org/wiki/MDCT ).

basically, each region would have its own local set of interpolated sample points, which are used as its own local noise function.

now, when blending between regions, the local sample-values would be computed independently within each region, and the windowing would be between the various noise functions (simulating a larger-scale continuous function from a large number of smaller-scale periodic functions). Edited by cr88192

##### Share on other sites
swiftcoder    18437

None of these values will ever hit the interpolation part as FractionalPartOf(x) always returns 0. In this case the Noise function is no different from a discrete random number generator, and no smoothing is being performed. The detail is effectively at maximum all the time, for any whole-number value for frequency.

You can always choose to sample a function at such a low frequency that it does not appear continuous. I guess I'm confused why you insist on taking that frequency as your base case?

##### Share on other sites
JTippetts    12970

You're always going to have problems with the typical variants of noise if you insist on sampling at integer boundaries. In fact, if you use Perlin's gradient noise, that will get you values of 0 across the board, since gradient noise is constructed specifically to cross the 0 line at grid corners, thus shifting the "peaks" and "valleys" of the function off the grid points and out into the middle of the cells somewhere. If you only sample at grid corners there, you miss all the peaks and valleys completely. If nothing else, just divide your integral coordinates by the size of a chunk; this will map unit-sized pieces of the domain to your chunk, then you can modify the frequency of the individual functions comprising your generator to tune it for feature density.

A lot of implementations of noise that you find on the web have a relatively small period. The reference implementations of gradient and simplex Perlin noise, for example, have a period of (IIRC) 255 samples, since they hash unsigned char coordinates. For ANL, I built my hash using a FNV-1A hash that can be applied to arbitrarily sized arrays, using it to hash integer coords (or long, or even long long if I so desired, though I have yet to find a need to have a function with a larger period than an int type can provide) then XOR folding the hash down to access the 256-entry lookup table of gradients. (The 256-size table isn't really necessary; you can get away with an 8-entry table) This gives me an effectively unlimited function. After that, it's a simple matter of mapping out small pieces of it to chunks. The hash can be used to hash any required dimensionality of noise (ANL goes up to 6D, in order to implement a special kind of 3D seamless mapping if desired). It's a bit slower than the simple byte folding in the reference implementation, but not too bad, and I have done small scrolling tile-map style demos of infinite worlds just to test the concept, and it has been sufficient.

As far as your idea to have a base height that is interpolated, I don't really see how this is different from just adding in another very low frequency noise layer with the higher frequency detail layers on top of it. If your chunk size is 100x100, for example, and you are dividing the integral coordinates by the size (100) to map a unit-sized cell of noise to your chunk, then a base layer with an internal frequency of 1/100 would net you approximately 1 "feature" (a low or a high) per cell/chunk, smoothly interpolated by the basic noise generator to be continuous, upon which would be added the subsequent detail layers.

##### Share on other sites
Kylotan    9997

I don't get any choice of where I sample the function. I'm generating a heightmap and for that I need a value at x=1, a value at x=2, etc. I only have the choice of scaling the function to suit what I sample, or wrapping it in another function that does the scaling, but the combination will be equivalent. So really I need a function that is smooth to some degree across several integer values.

I'm having real trouble describing my problem and as a result nobody is quite understanding me. Let me try and describe it as if I'm a newbie and hopefully it'll be clearer.

• I have an infinite world, coordinates going from -infinity to +infinity. (Will describe as one-dimensional to keep it simple.)
• I generate chunks that are 100 units long, .eg. from -200 to -100.
• I need samples at every integer position along that range, -200, -199, -198, ..., -100
• I want to generate the height maps based on a simple sum of several rounds of interpolated value noise - not real Perlin noise, or Simplex noise, etc. - each weighted by its 'amplitude'. I don't need nor care about higher-level continuity or any of that.
• The highest frequency for any of my rounds of noise should be equal to the height map resolution. Values at that level should appear completely incoherent from one value to the next. The amplitude will be minimal for this round. The output at each position here could essentially be determined as hash(x) * amplitude.
• Each successive round should generate data with more apparent coherence than the last, up to round N which will be so smooth that adjacent values will always be similar.
• I do not know what function to use for the successive rounds. The value needs to be interpolated between 2 separate hashed values. I don't know how to select both the inputs to the 2 hash functions, based on the integer value of x. I need to know how to choose those 2 inputs in such a way that I get to vary a frequency parameter (by that name or another name) to alter the level of coherence between adjacent integer samples.

As far as your idea to have a base height that is interpolated, I don't really see how this is different from just adding in another very low frequency noise layer with the higher frequency detail layers on top of it.

The only real difference is that these base heights are predetermined outside of the algorithm, that's all. It might be simple to add this as a trivial final 'layer' - if only I could work out the logic to get the other layers first.

Edited by Kylotan

##### Share on other sites
JTippetts    12970

I don't get any choice of where I sample the function. I'm generating a heightmap and for that I need a value at x=1, a value at x=2, etc. I only have the choice of scaling the function to suit what I sample, or wrapping it in another function that does the scaling, but the combination will be equivalent. So really I need a function that is smooth to some degree across several integer values.

I'm having real trouble describing my problem and as a result nobody is quite understanding me. Let me try and describe it as if I'm a newbie and hopefully it'll be clearer.

• I have an infinite world, coordinates going from -infinity to +infinity. (Will describe as one-dimensional to keep it simple.)
• I generate chunks that are 100 units long, .eg. from -200 to -100.
• I need samples at every integer position along that range, -200, -199, -198, ..., -100
• I want to generate the height maps based on a simple sum of several rounds of interpolated value noise - not real Perlin noise, or Simplex noise, etc. - each weighted by its 'amplitude'. I don't need nor care about higher-level continuity or any of that.
• The highest frequency for any of my rounds of noise should be equal to the height map resolution. Values at that level should appear completely incoherent from one value to the next. The amplitude will be minimal for this round. The output at each position here could essentially be determined as hash(x) * amplitude.
• Each successive round should generate data with more apparent coherence than the last, up to round N which will be so smooth that adjacent values will always be similar.
• I do not know what function to use for the successive rounds. The value needs to be interpolated between 2 separate hashed values. I don't know how to select both the inputs to the 2 hash functions, based on the integer value of x. I need to know how to choose those 2 inputs in such a way that I get to vary a frequency parameter (by that name or another name) to alter the level of coherence between adjacent integer samples.
The only real difference is that these base heights are predetermined outside of the algorithm, that's all. It might be simple to add this as a trivial final 'layer' - if only I could work out the logic to get the other layers first.

Unless I'm mis-interpreting you here, what you are describing is standard fractal noise. The basis function (whether it be one of Perlin's variants, interpolated value noise, or whatever) is completely irrelevant. In fractal noise you pile up successive layers of noise, each layer having a smaller amplitude and higher frequency than the preceding layer. (The most common variant uses half the amplitude and twice the frequency of the preceding layer).

As far as using integer coordinates, if you absolutely must use them then just set the internal frequency of your generator function to, say, 1/100. Successive layers can have their own internal frequencies relative to that. That way, you can still use integral coordinates it's just that internally they are converted to smaller ranges before being passed to the generator.