• entries
3
10
• views
2104

# N-vector data structure to store and query for scatter points

739 views

#### Introduction

The impetus for this research topic is concerned with a data structure that efficiently stores and queries scatter point data. These points may originally be neatly arranged on a grid, or randomly scattered across the surface of a sphere (e.g. the earth), but it makes no assumption in that regard. The data structure could have member functions such as add(point) or add(points, amount). The points themselves contain three scalar values: the longitude, the latitude, and the data value itself. This data value could be geographical data such as land height or weather data such as air temperature, but again, the data structure is agnostic about the data itself other than as a scalar value.

#### Data Structure

The purpose of the data structure is ultimately to efficiently and accurately query for point(s) that is(are) nearby another point, the reason being to compute the average value at that point. Suppose a requirement is to create an image representing the air temperature (or rainfall, cloud cover, whatever the scalar value of each point represents) - this data structure could be queried for each point (i.e. pixel) in the image. Once nearby points are queried, the algorithm to average the scalar values of each point is outside the scope of this, but a good candidate is the inverse distance weighed algorithm.

Within the data structure, the data points will not be stored in spherical coordinates, but as n-vectors. The reason for this is that the sampling algorithm - finding N closest data points - is more efficient and accurate using vector algebra. While it is outside the scope of this, to summarize: it is easier to determine how close two points are on a sphere if these points are represented by unit vectors than by latitude and longitude. For example, two points sharing the same longitude are farther apart at the equator than above or below the equator. Even if we are just concerned with relative distances, and do not need to compute the square root of the distance formula, for this reason it is still not as accurate. Also the international date line, where longitude "resets", presents a problem that would have to be dealt with. Also either pole. But you get the idea - vectors are more stable, accurate, and efficient.

I've always been a fan of Binary Space Partitions - each branch node of a BSP has two areas, one below and one above (or one left and one right, or one in and one out, however you like to think of it). To go along with storing points as vectors, branch nodes within the structure partitions their space also using a vector, which in this case forms the normal of a plane that bisects the unit sphere. All data points in a node (or children of the node) will either be above or below this plane (points that are on the plane can be allocated to either side as is appropriate). Points that are below the plane are placed in the first child of the node, and points that are above placed in the second child. We call this plane normal the split axis of the node.

Querying the structure for the closest N points then becomes trivial. For any branch node, compute the dot product of the point in question and the split axis of the node. If it is negative (the point is below the split plane), recursively query with the first child node, and if positive (the point is above the split plane), with the second child node. For a leaf node, compute the dot product of the point in question with each data point contained in the node, keeping a sorted list of the closest N points. The one caveat is that in branch nodes, after recursing into one child node, it may be necessary to recurse into the other child if the farthest point found so far is farther than the other child node, since there may be closer points in the other child node. But this is trivial as we are comparing dot products. No expensive computations are necessary to compute the N closest points - just a bunch of dot products. As dot products of unit vectors range from -1 to 1 (-1 being farthest apart and 1 being equal), two points are closer if their dot product is higher.

Once complete, and the list of N points found, the actual distances can be calculated if necessary, which in this case is done by calculating the angles using the already-calculated dot products. This angle can then be multiplied by the radius of the earth to get an exact distance, again only if necessary. But for averaging, these extra calculations are not necessary.

As a final note, the data structure lends itself well to graphics hardware. In my particular case, rendering an image using such a data structure on the CPU may take several minutes, but on the GPU takes a fraction of a second.

Problem

The problem - common to any space partitioning tree - is how to initially create the data structure. Again, the points are not assumed to be arranged in any specific way, and as far as we are concerned, are a "point soup". They can be specified one at a time - addPoint(point) - or all at once - addPoints(points) - or both. Specifically, how can the determination of the split axis of any branch be efficient and provide an even split (the same or similar number of points on either side). The problem is unique here because the points are not arranged on a two-dimensional surface or in three-dimensional space, but lie on a unit sphere.

My current solution to the problem is not terribly elegant. For each two data points, compute the axis that splits them evenly. This is done by computing the point between the two (subtract one from the other) and normalizing it, then crossing this with the cross product of the two points. This cross product comprises the normal of the plane that evenly splits the two points. This normal is then tested against each other point in the node to get 1) the number of points on either side of the plane and 2) the distances of the points to the plane. A good split plane is one that 1) splits points evenly, 2) has the same/similar distances on other side, and 3) has large distances on other side. As this test is performed for each two data points, some big-O notation shows that determination of the split plane for nodes containing a large number of points will become prohibitive.  However, I have not been able to determine a better solution.

But the advantages of such a data structure still outweigh this expense. In my particular case, the time spent initially creating the tree is worth the time saved during queries. I should mention that if the data points are known ahead of time, it is faster to specify them all at once, so the tree can re-build itself once, rather than one at a time which may cause the tree to re-build itself multiple times.

I'm guessing this is just for a general scheme for nearest neighbour search on a sphere, but the kind of questions I would be asking are:

• how many points are we talking about
• what is the distance (relative to the sphere) over which to make a query for nearest neighbours
• what are the access patterns (are the queries random, or are you, for instance looking for the closest points to a moving object, where you can reuse the results of a previous search)?
• The best scheme may depend on how fast adding a point to the data structure needs to be, versus the speed of the queries.

As well as spatial partitioning schemes, you may also be able to simply store a list of nearest neighbours on each point (you could prestore the angular 'distance' too). Then you search becomes a case of find the nearest point, and recurse outward (with a bitmask for points already visited) until you have found all the nearest points of interest.

Another data structure which might be useful is a 2d grid of all the point combinations, with the angular distance between each point pre-calculated. Also consider that knowing the exact distance to neighbours may not be as important as finding the 'rough' neighbourhood, depending on the application, so you may be able to avoid most of those calculations.

The actual use case is a bit more involved than I describe, but not much different than what I use as examples; it will be for rasterizing averaged values of weather data over a geographic area. The weather data comes in pre-defined grids, where each point of the grid is defined by latitude/longitude and a data value, and the grid may or may not be the same format as what is rendered (i.e. the weather data grid may be Polar Stereographic, while the rendered grid may be Lambert Conformal). As such, the internal structure should be agnostic.

Another layer of complication is that the weather data also comes in one or more forecasts, where there is a grid for each forecasted unit of time X number of times. For example, 72 grids spaced out an hour apart from each other, containing wave height data for the Caribbean Sea.

To keep the implementation simplified, each BSP only contains one type of weather data, at one point in time. Since the time is the same for all points in a BSP, it is not stored for every point, but once for the whole BSP. I mention it because even though it is not stored with each point, it is taken into consideration as another "dimension" when finding nearby points. The end result looks something like this:

float target_time = ...; // render time
float max_distance = ...; // points outside this distance are not considered
int N = ...; // number of nearby points to consider

foreach (pixel) {
float target_geo[3] = ...; // unit vector (nvector) of pixel

// [0] = distance, [1] = value
float distance_and_value[N][2] = ...; // initialize to max_distance

// check BSPs
foreach (BSP close enough to target_time) {
// modifies distance_and_value parameter as smaller distances are found; keeps the array sorted based on distance
BSP.findClosest(target_geo, target_time, &distance_and_value);
}

float average = ...; // calculate using distance_and_value, stopping when >= max_distance
pixel = ...; // calculate using average
}

But I didn't mention all of this because it is tangential to the problem of selecting a partition for each BSP tree branch.

• how many points are we talking about - it varies from grid to grid. Some are as small as 350x150, and others as large as 4400x2200. It could be larger as more meteorological data is added. For now I'm only using freely available data from NOAA, but could eventually use proprietary data from other sources.
• what is the distance (relative to the sphere) over which to make a query for nearest neighbours - this is a parameter passed to the query function, and may differ based on application. For most cases, the end result should look "pretty" and show smooth gradients, whereas accuracy is preferred for "serious" uses such as maritime navigation.

• what are the access patterns (are the queries random, or are you, for instance looking for the closest points to a moving object, where you can reuse the results of a previous search)? - (see above code snippet)
• The best scheme may depend on how fast adding a point to the data structure needs to be, versus the speed of the queries. - you are right, it may be, although not for my current use case. Regardless of which is best, I would certainly implement faster BSP insertion if I knew how.
2 hours ago, lawnjelly said:

As well as spatial partitioning schemes, you may also be able to simply store a list of nearest neighbours on each point (you could prestore the angular 'distance' too). Then you search becomes a case of find the nearest point, and recurse outward (with a bitmask for points already visited) until you have found all the nearest points of interest.

That makes sense for rendering multiple images of the same geographic area, but not for the case where the area may be panned or zoomed, as the list would have to be re-built. Also how would the list of nearby neighbors be efficiently built without some partitioning scheme to begin with?

2 hours ago, lawnjelly said:

Another data structure which might be useful is a 2d grid of all the point combinations, with the angular distance between each point pre-calculated. Also consider that knowing the exact distance to neighbours may not be as important as finding the 'rough' neighbourhood, depending on the application, so you may be able to avoid most of those calculations.

I think this is also not effective for the same reasons. Pre-calculating the angular distance between data points is not necessary, as that is never needed (only need to determine distance between data points and another arbitrary point), and as in the previous paragraph, pre-calculating angular distance between data points and another point is only used once.

## Create an account

Register a new account

• ### Similar Content

• By _Nyu
Hello,
I'm trying to make a PBR vulkan renderer and I wanted to implement Spherical harmonics for the irradiance part (and maybe PRT in the future but that's another story).
the evaluation on the shader side seems okay (it look good if I hardcode the SH directly in the shader) but when I try to generate it from a .hdr map it output only gray scale.
It's been 3 days I'm trying to debug now I just have no clue why all my colour coefficients are gray.
Here is the generation code:

SH2 ProjectOntoSH9(const glm::vec3& dir) { SH2 sh; // Band 0 sh.coef0.x = 0.282095f; // Band 1 sh.coef1.x = 0.488603f * dir.y; sh.coef2.x = 0.488603f * dir.z; sh.coef3.x = 0.488603f * dir.x; // Band 2 sh.coef4.x = 1.092548f * dir.x * dir.y; sh.coef5.x = 1.092548f * dir.y * dir.z; sh.coef6.x = 0.315392f * (3.0f * dir.z * dir.z - 1.0f); sh.coef7.x = 1.092548f * dir.x * dir.z; sh.coef8.x = 0.546274f * (dir.x * dir.x - dir.y * dir.y); return sh; } SH2 ProjectOntoSH9Color(const glm::vec3& dir, const glm::vec3& color) { SH2 sh = ProjectOntoSH9(dir); SH2 shColor; shColor.coef0 = color * sh.coef0.x; shColor.coef1 = color * sh.coef1.x; shColor.coef2 = color * sh.coef2.x; shColor.coef3 = color * sh.coef3.x; shColor.coef4 = color * sh.coef4.x; shColor.coef5 = color * sh.coef5.x; shColor.coef6 = color * sh.coef6.x; shColor.coef7 = color * sh.coef7.x; shColor.coef8 = color * sh.coef8.x; return shColor; } void SHprojectHDRImage(const float* pixels, glm::ivec3 size, SH2& out) { double pixel_area = (2.0f * M_PI / size.x) * (M_PI / size.y); glm::vec3 color; float weightSum = 0.0f; for (unsigned int t = 0; t < size.y; t++) { float theta = M_PI * (t + 0.5f) / size.y; float weight = pixel_area * sin(theta); for (unsigned int p = 0; p < size.x; p++) { float phi = 2.0 * M_PI * (p + 0.5) / size.x; color = glm::make_vec3(&pixels[t * size.x + p]); glm::vec3 dir(sin(phi) * cos(theta), sin(phi) * sin(theta), cos(theta)); out += ProjectOntoSH9Color(dir, color) * weight; weightSum += weight; } } out.print(); out *= (4.0f * M_PI) / weightSum; }
outside of the SHProjectHDRImage function that's pretty much the code from MJP that you can check here:
https://github.com/TheRealMJP/LowResRendering/blob/2f5742f04ab869fef5783a7c6837c38aefe008c3/SampleFramework11/v1.01/Graphics/SH.cpp
I'm not doing anything fancy in term of math or code but I that's my first time with those so I feel like I forgot something important.
basically for every pixel on my equi-rectangular hdr map I generate a direction, get the colour and project it on the SH
but strangely I endup with a SH looking like this:
coef0: 1.42326 1.42326 1.42326
coef1: -0.0727784 -0.0727848 -0.0727895
coef2: -0.154357 -0.154357 -0.154356
coef3: 0.0538229 0.0537928 0.0537615
coef4: -0.0914876 -0.0914385 -0.0913899
coef5: 0.0482638 0.0482385 0.0482151
coef6: 0.0531449 0.0531443 0.0531443
coef7: -0.134459 -0.134402 -0.134345
coef8: -0.413949 -0.413989 -0.414021
with the HDR map "Ditch River" from this web page http://www.hdrlabs.com/sibl/archive.html
but I also get grayscale on the 6 other hdr maps I tried from hdr heaven, it's just different gray.
If anyone have any clue that would be really welcome.
• By pedro87
I am trying to understand calculating normal vectors in my game engine book. I am having issues understanding a specific section about using gradient to calculate the normal vector. I have attached an image of the page, basically i am unsure where the gradient is coming into play, or just really anything thats going on, on the page. Any help would be appreciated.

• Hello.
I'm trying to implement normal mapping. I've been following this: http://ogldev.atspace.co.uk/www/tutorial26/tutorial26.html
The problem is that my tangent vectors appear rather obviously wrong. But only one of them, never both. Here's my code for calculating the tangents:
this.makeTriangle = function(a, b, c) { var edge1 = VectorSub(b.pos, a.pos); var edge2 = VectorSub(c.pos, a.pos); var deltaU1 = b.texCoords[0] - a.texCoords[0]; var deltaV1 = b.texCoords[1] - a.texCoords[1]; var deltaU2 = c.texCoords[0] - a.texCoords[0]; var deltaV2 = c.texCoords[1] - a.texCoords[1]; var f = 1.0 / (deltaU1 * deltaV2 - deltaU2 * deltaV1); var vvec = VectorNormal([ f * (deltaV2 * edge1[0] - deltaV1 * edge2[0]), f * (deltaV2 * edge1[1] - deltaV1 * edge2[1]), f * (deltaV2 * edge1[2] - deltaV1 * edge2[2]), 0.0 ]); var uvec = VectorNormal([ f * (-deltaU2 * edge1[0] - deltaU1 * edge2[0]), f * (-deltaU2 * edge1[1] - deltaU1 * edge2[1]), f * (-deltaU2 * edge1[2] - deltaU1 * edge2[2]), 0.0 ]); if (VectorDot(VectorCross(a.normal, uvec), vvec) < 0.0) { uvec = VectorScale(uvec, -1.0); }; /* console.log("Normal: "); console.log(a.normal); console.log("UVec: "); console.log(uvec); console.log("VVec: "); console.log(vvec); */ this.emitVertex(a, uvec, vvec); this.emitVertex(b, uvec, vvec); this.emitVertex(c, uvec, vvec); }; My vertex shader:
precision mediump float; uniform mat4 matProj; uniform mat4 matView; uniform mat4 matModel; in vec4 attrVertex; in vec2 attrTexCoords; in vec3 attrNormal; in vec3 attrUVec; in vec3 attrVVec; out vec2 fTexCoords; out vec4 fNormalCamera; out vec4 fWorldPos; out vec4 fWorldNormal; out vec4 fWorldUVec; out vec4 fWorldVVec; void main() { fTexCoords = attrTexCoords; fNormalCamera = matView * matModel * vec4(attrNormal, 0.0); vec3 uvec = attrUVec; vec3 vvec = attrVVec; fWorldPos = matModel * attrVertex; fWorldNormal = matModel * vec4(attrNormal, 0.0); fWorldUVec = matModel * vec4(uvec, 0.0); fWorldVVec = matModel * vec4(vvec, 0.0); gl_Position = matProj * matView * matModel * attrVertex; } And finally the fragment shader:
precision mediump float; uniform sampler2D texImage; uniform sampler2D texNormal; uniform float sunFactor; uniform mat4 matView; in vec2 fTexCoords; in vec4 fNormalCamera; in vec4 fWorldPos; in vec4 fWorldNormal; in vec4 fWorldUVec; in vec4 fWorldVVec; out vec4 outColor; vec4 calcPointLight(in vec4 normal, in vec4 source, in vec4 color, in float intensity) { vec4 lightVec = source - fWorldPos; float sqdist = dot(lightVec, lightVec); vec4 lightDir = normalize(lightVec); return color * dot(normal, lightDir) * (1.0 / sqdist) * intensity; } vec4 calcLights(vec4 pNormal) { vec4 result = vec4(0.0, 0.0, 0.0, 0.0); \${CALC_LIGHTS} return result; } void main() { vec4 surfNormal = vec4(cross(vec3(fWorldUVec), vec3(fWorldVVec)), 0.0); vec2 bumpCoords = fTexCoords; vec4 bumpNormal = texture(texNormal, bumpCoords); bumpNormal = (2.0 * bumpNormal - vec4(1.0, 1.0, 1.0, 0.0)) * vec4(1.0, 1.0, 1.0, 1.0); bumpNormal.w = 0.0; mat4 bumpMat = mat4(fWorldUVec, fWorldVVec, fWorldNormal, vec4(0.0, 0.0, 0.0, 1.0)); vec4 realNormal = normalize(bumpMat * bumpNormal); vec4 realCameraNormal = matView * realNormal; float intensitySun = clamp(dot(normalize(realCameraNormal.xyz), normalize(vec3(0.0, 0.0, 1.0))), 0.0, 1.0) * sunFactor; float intensity = clamp(intensitySun + 0.2, 0.0, 1.0); outColor = texture(texImage, fTexCoords) * (vec4(intensity, intensity, intensity, 1.0) + calcLights(realNormal)); //outColor = texture(texNormal, fTexCoords); //outColor = 0.5 * (fWorldUVec + vec4(1.0, 1.0, 1.0, 0.0)); //outColor = vec4(fTexCoords, 1.0, 1.0); outColor.w = 1.0; } Here is the result of rendering an object, showing its normal render, the uvec, vvec, and texture coordinates (each commented out in the fragment shader code):

Normal map itself:

The uvec, as far as I can tell, should not be all over the place like it is; either this, or some other mistake, causes the normal vectors to be all wrong, so you can see on the normal render that for example there is a random dent on the left side which should not be there. As far as I can tell, my code follows the math from that tutorial. I use right-handed corodinates.
So what could be wrong?

• I need a measure for how close a collection of points is to forming a straight line in 2d.
Currently I use Principle Component Analysis in 2d and take the mean of the of square on the second axis.
It works fine, but I feel it is overkill.

I would like something simpler I could express in a li rary like Tensorflow.

Is there a simpler way?

• So I have hundreds of moving objects that need to check there speed. One of the reasons they need to check there speed is so they don't accelerate into oblivion, as more and more force is added to each object.
At first I was just using the Unity vector3.magnitude. However this is actually very slow; when used hundreds of times.
Next I tried the dot-product check:  vector3.dot(this.transform.foward, ShipBody.velocity) The performance boost was fantastic. However this only measures speed in the forward direction. Resulting in bouncing objects accelerating way past the allowed limit.

I am hoping someone else knows a good way for me to check the speed with accuracy, that is fast on the CPU. Or just any magnitude calculations that I can test when I get home later.

What if I used  vector3.dot(ShipBody.velocity.normalized, ShipBody.velocity)?
How slow is it to normalize a vector, compared to asking it's magnitude?
×