
Member Since 30 Aug 2006

#5294227 GCN Shader Extensions

Posted by JoeJ on Yesterday, 02:31 PM

I think OpenCL will stay - it has most of its applications outside games / graphics. Or are there any games using it?

Personally, I liked OpenCL a lot more than GLSL for compute: easier to use, more solid programming (less 'just a shader'), and it was always faster on any hardware I've tested.


SPIR-V already draws a line between OpenCL and Vulkan-Compute:

"Execution models include Vertex, GLCompute, etc. (one for each graphical stage), as well as Kernel for OpenCL kernels."

I don't know if there are technical or business reasons behind that.

Because there are no plans for OpenCL <-> Vulkan data sharing, we have no choice anyway.


But those extensions are exactly what I want, and there should be no big reason to look back to OpenCL.

#5294089 GCN Shader Extensions

Posted by JoeJ on 29 May 2016 - 06:12 PM



Just in case someone else missed this.

No need to wait for SM 6.0 :D


#5293568 Are Third Party Game Engines the Future

Posted by JoeJ on 26 May 2016 - 06:56 AM

"As technology improves and third party tools improve, do you think that the bigger AAA game studios that have internal engines will eventually switch to using third party engines or will the industry continue as is for the foreseeable future?"


What a nightmare: the end of game "development" and the rise of the game "maker" era.


Fortunately this will not happen so soon. If you look closely, UE4, Cryengine and Unity are mainly indie engines today.

(Nothing against that - it's very welcome)


For me it would be MUCH more work to tweak those engines to my needs than to write a new one from scratch.

Same for AAA companies: a few people are enough to do this, a fraction of what is needed for content creation.

It makes more sense to pay for something specialized like Umbra, NaturalMotion, Simplygon etc.


The only downside of an in-house engine is the missing public recognition. Lots of people out there fell in love with UE4 demos or still think Crysis has the best graphics ever.

They run around screaming 'downgrade!' and 'upscaled!' knowing nothing about the work they criticize or the limitations of their favorite engines.

At least that's my impression after reading sites related to PC gaming... something is going wrong here.

#5293499 Cracks between patches with the same LOD level.

Posted by JoeJ on 25 May 2016 - 11:28 PM

An indexing bug? Instead of using the same height on both sides of the shared border, do you use height(n) for one patch and height(n+1) for the other?
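A minimal sketch of how that off-by-one typically happens, assuming square patches of `kQuadsPerPatch` quads (so one more vertex than quads per row) that read from one shared heightmap. `kQuadsPerPatch`, `PatchVertexColumn` and `PatchVertexColumnBuggy` are hypothetical names; the point is only that neighboring patches must map their shared border to the same heightmap column.

```cpp
// Hypothetical layout: a patch spans kQuadsPerPatch quads, i.e. kQuadsPerPatch + 1 vertices per row,
// and neighboring patches share their border vertex column.
constexpr int kQuadsPerPatch = 4;

// Correct: the stride between patches is the quad count, so border vertices overlap.
int PatchVertexColumn(int patch, int localX) { return patch * kQuadsPerPatch + localX; }

// Typical off-by-one: striding by the vertex count skips one heightmap column,
// so the right border of patch n samples column c while the left border of patch n+1 samples c+1.
int PatchVertexColumnBuggy(int patch, int localX) { return patch * (kQuadsPerPatch + 1) + localX; }
```

The same applies to rows; with the buggy stride, the crack appears even when both patches use the same LOD level.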

#5293496 what good are cores?

Posted by JoeJ on 25 May 2016 - 11:15 PM

Reminds me of the learning process I've gone through with GPU compute and LDS.

Since then, I really wish I had something like control over the CPU cache.


But I also noticed most people would not want this. They don't want to code close to the metal; they just want the metal (or the compiler) to be clever enough to run their code efficiently.


Probably they are afraid of additional work.

Personally, I think the more control you have, the less work is necessary: less trial and error, guessing, hoping and profiling.


On the other hand, on GPU the LDS size limit became a big influence on which algorithms I choose.

E.g. if it grew twice as large with a new generation of GPUs, I'd need to change huge amounts of code in drastic ways to get the best performance again.


So, in the long run, maybe those other people are right? Man should rule the machine, not the other way around?

#5293239 what good are cores?

Posted by JoeJ on 24 May 2016 - 11:37 AM


Honestly, the fact you think this means you are a good 15 years behind the curve right now - threads have been a pretty big deal for some time and far from 'bling bling'.

Threads and cores are two different things imho; having hundreds of threads doesn't imply you need many cores.


I believe it's actually pretty hard to make good use of more than 1-2 cores full time.



I have observed this situation quite often in a preprocessing tool using OpenMP:


Do a very simple thing like building a mip-map level in parallel: the speed-up is 1.5... very disappointing.

Do a complex thing like ray tracing: the speed-up is 4... yep, that's the number of cores.
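For illustration, a minimal sketch of the 'very simple thing' from the first case: a 2x2 box-filter mip level built with an OpenMP parallel loop. Each output texel reads four input texels and does only three adds and a multiply, so the loop is bound by memory traffic rather than arithmetic, which fits the low speed-up. Grayscale float texels and even image sizes are assumptions for brevity, and `BuildMipLevel` is a hypothetical name. Without `-fopenmp` the pragma is ignored and the loop runs serially.

```cpp
#include <cstddef>
#include <vector>

// Downsample a w*h grayscale image (w, h even) into a (w/2)*(h/2) mip level with a 2x2 box filter.
std::vector<float> BuildMipLevel(const std::vector<float>& src, int w, int h) {
    const int dw = w / 2, dh = h / 2;
    std::vector<float> dst(static_cast<std::size_t>(dw) * dh);
    // Output rows are independent, so OpenMP can distribute them across cores.
    #pragma omp parallel for
    for (int y = 0; y < dh; ++y) {
        for (int x = 0; x < dw; ++x) {
            const float* r0 = &src[(2 * y) * w + 2 * x];     // top row of the 2x2 block
            const float* r1 = &src[(2 * y + 1) * w + 2 * x]; // bottom row of the 2x2 block
            dst[y * dw + x] = 0.25f * (r0[0] + r0[1] + r1[0] + r1[1]);
        }
    }
    return dst;
}
```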


My conclusion is that memory bandwidth limits hurt the mip-map generation.

I assume it would be faster to do mips and tracing at the same time, so the memory limit is hidden behind the tracing calculations.


Are there any known approaches where a job system tries to do this automatically with some info like job.m_bandWidthCost?

I've never heard of anything like that.

#5292783 Hybrid Frustum Traced Shadows

Posted by JoeJ on 21 May 2016 - 02:36 PM

Instead of rasterizing potential light-blocking triangles and doing the shadow test based on the rasterization result,

they do the shadow test directly on the triangles themselves, by testing each texel against the planes built from the triangle's edges and the light source position.

To ensure no edge-intersecting texel is missed, conservative rasterization is necessary.
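A minimal sketch of the geometric test itself, simplified to one world-space point, one triangle and a point light (the conservative rasterization of light-space texels is omitted). `V3` and `FrustumShadowTest` are hypothetical names. The three planes through the light and each triangle edge bound the shadow frustum; the point is shadowed if it lies inside all three and on the far side of the triangle's plane as seen from the light.

```cpp
struct V3 { float x, y, z; };

static V3 sub(V3 a, V3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static V3 cross(V3 a, V3 b) { return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x}; }
static float dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// True if point p is shadowed by triangle (a, b, c) with respect to a point light at l.
bool FrustumShadowTest(V3 p, V3 l, V3 a, V3 b, V3 c) {
    const V3 v[3] = {a, b, c};
    float s[3];
    for (int i = 0; i < 3; ++i) {
        // Plane through the light and edge (v[i], v[i+1]); all three normals share the winding's orientation.
        const V3 n = cross(sub(v[i], l), sub(v[(i + 1) % 3], l));
        s[i] = dot(n, sub(p, l));
    }
    const bool inside = (s[0] > 0 && s[1] > 0 && s[2] > 0) || (s[0] < 0 && s[1] < 0 && s[2] < 0);
    if (!inside) return false;
    // Inside the frustum: shadowed only if p and the light lie on opposite sides of the triangle plane.
    const V3 tn = cross(sub(b, a), sub(c, a));
    return dot(tn, sub(p, a)) * dot(tn, sub(l, a)) < 0;
}
```

Because the test is exact per sample, there is no shadow map resolution aliasing, which is where the pixel-perfect result comes from.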


Pro is robustness leading to pixel-perfect shadows (reminds me of shadow volumes).


Con is trashing the entire idea behind the efficiency of shadow maps.

There should be better ways to burn the PC vs. console advantage... just my opinion :)

#5292049 Theory behind spherical reflection mapping (aka matcap).

Posted by JoeJ on 17 May 2016 - 06:32 AM

Haha, great that someone else comes up with that stuff from the stone age :)


See here for an image showing the projection for the hemisphere variant: http://www.gamedev.net/topic/678670-whats-the-advantage-of-spherical-gaussians-used-in-the-order-vs-enviroment-map/


Seen from the front, it would simply be an orthogonal projection of the grid onto the sphere.
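A minimal sketch of the usual matcap lookup that follows from this, assuming a unit view-space normal: the texture coordinate is the normal's x and y (the orthographic projection seen from the front), remapped from [-1, 1] to [0, 1]. `Vec2`, `Vec3` and `MatcapUV` are hypothetical names, and v may need flipping depending on the texture origin convention.

```cpp
struct Vec2 { float u, v; };
struct Vec3 { float x, y, z; };

// Orthographic projection of a unit view-space normal onto the matcap image:
// the sphere's silhouette maps to the circle inscribed in the [0,1]^2 texture.
Vec2 MatcapUV(Vec3 viewNormal) {
    return { viewNormal.x * 0.5f + 0.5f, viewNormal.y * 0.5f + 0.5f };
}
```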

#5291796 Spherical interpolation for multi transfer basis

Posted by JoeJ on 16 May 2016 - 03:58 AM

In case you want more volume density, Remedy's paper about Quantum Break covers the use of trees with a branching factor of 16.

They also talk about how they store the lighting environment in different ways (handling sunlight separately, etc.)


Some older work with less complexity and good results is in this blog: http://copypastepixel.blogspot.co.at/


Did you think about 6*4 texel cubemaps? No SH issues, and the resolution should be good enough to overcome the ambient cube limit.

More memory, but if you reduce density in empty space it might be a robust option.

#5291690 Spherical interpolation for multi transfer basis

Posted by JoeJ on 15 May 2016 - 07:56 AM

The downside of the HL2 ambient cube is that the coefficient directions are hardcoded, while Spherical Harmonics are rotationally invariant.
In practice that means SH can capture the main direction to a bright light (sun) independent of its position,
while the ambient cube produces differing results depending on how closely the light direction matches one of the principal axes.
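For comparison, a sketch of the common HL2-style ambient cube evaluation: each axis contributes the face color on the side the normal points to, weighted by the squared normal component (for a unit normal the three weights sum to 1). Because the weights depend on how the normal lines up with the fixed axes, a light between two axes gets smeared over both faces, which is the hardcoding issue above. `Rgb`, `N3` and `EvalAmbientCube` are hypothetical names; the normal is assumed to be unit length.

```cpp
struct Rgb { float r, g, b; };
struct N3 { float x, y, z; };

// cube[0..5] = +X, -X, +Y, -Y, +Z, -Z face colors.
Rgb EvalAmbientCube(const Rgb cube[6], N3 n) {
    const float w[3] = { n.x * n.x, n.y * n.y, n.z * n.z }; // squared components as weights
    const Rgb& cx = cube[n.x >= 0.0f ? 0 : 1];              // pick the face the normal points to
    const Rgb& cy = cube[n.y >= 0.0f ? 2 : 3];
    const Rgb& cz = cube[n.z >= 0.0f ? 4 : 5];
    return { w[0] * cx.r + w[1] * cy.r + w[2] * cz.r,
             w[0] * cx.g + w[1] * cy.g + w[2] * cz.g,
             w[0] * cx.b + w[1] * cy.b + w[2] * cz.b };
}
```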

It's not clear to me if you want your 6 directions to be axis-aligned, or if you want to point them in arbitrary 'most important' directions.
In the latter case summing by dot product is not possible, because they are not orthonormal.
Would it work to find the 3 closest directions and interpolate with barycentric coords from their spherical triangle?

Edit: Nonsense and terrible nonsense :)


You might want to look up Spherical Gaussians as well...

#5291597 Mapping a Sphere to a Cube

Posted by JoeJ on 14 May 2016 - 01:32 PM

Assuming you use the simple transform from cube surface to sphere surface, vec3 sphereS = cubeS.Unit(),
going from sphere to cube is very easy. I'll show two methods:

1. intersection ray - plane

vec3 ray = (playerPos - planetCenter).Unit();
Find the largest of (fabs(ray.x), fabs(ray.y), fabs(ray.z)) and its sign to classify which cube face to select.
Build a plane for that face, e.g. planeNormal = vec3(1,0,0) if positive x has been chosen.
Assuming the cube has a side length of 2, we can also use the normal as the plane position.

vec3 origin (0,0,0);
float t = IntersectRayPlane (origin, ray, planeNormal, planeNormal); // returns the ray parameter, not the point
vec3 intersectionPoint = ray * t;

Get the coords for the quadtree from y and z, still assuming the normal is x:
float u = (intersectionPoint.y + 1.0f) / 2.0f;
float v = (intersectionPoint.z + 1.0f) / 2.0f;

inline float IntersectRayPlane (const vec3 &rO, const vec3 &rD, const vec3 &pO, const vec3 &pN)
{
	// rD does not need to be unit length
	float d = pN.Dot(rD);
	float n = pN.Dot(pO - rO);

	if (fabs(d) < FP_EPSILON) // ray parallel to plane
	{
		if (fabs(n) < FP_EPSILON) return 0; // ray lies in plane
		else return FLT_MAX; // no intersection
	}
	float t = n / d;
	return t;
}
... you can optimize and remove the branches, because those cases will not happen here (a ray from the cube center always hits the selected face).

2. intersection ray - box

struct AABox
{
	vec minmax[2]; // would be [(-1,-1,-1), (1,1,1)] for the same cube as above

	void DistanceRayFrontAndBackface (float &ffd, float &bfd, const vec &rayOrigin, const vec &rayInvDirection)
	{
		vec t0 = vec(minmax[0] - rayOrigin).MulPerElem (rayInvDirection);
		vec t1 = vec(minmax[1] - rayOrigin).MulPerElem (rayInvDirection);
		vec tMin = t0.MinPerElem (t1);
		vec tMax = t0.MaxPerElem (t1);
		ffd = tMin.MaxElem(); // front face distance (behind origin if inside box)
		bfd = tMax.MinElem(); // back face distance
	}

	bool IntersectRay (float &t, const vec &rayOrigin, const vec &rayInvDirection, float rayLength)
	{
		float ffd, bfd;
		DistanceRayFrontAndBackface (ffd, bfd, rayOrigin, rayInvDirection);
		t = (ffd > 0) ? ffd : bfd; // always the first intersection with a face
		return (ffd <= bfd) & (bfd >= 0.0f) & (ffd <= rayLength);
	}
};

That's an optimization utilizing the fact that our planes always match the coordinate system directions.
So no dot products, but the division is still there:
vec rayInvDirection = vec (1.0 / ray.x, 1.0 / ray.y, 1.0 / ray.z);

The math here is pretty basic, so I wonder why you ask.
Be sure to understand how it works if you use it.
It should become second nature, and you may well end up with a simpler way than what I've shown here ;)
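One simpler way, as a self-contained sketch: once the face is picked by the largest absolute component, the ray-plane intersection with the face plane at distance 1 is just a division by that component, so no plane test and no reciprocals for all three axes are needed. The face numbering and which two components become (u, v) on each face are assumptions; they only need to match how the quadtree faces were set up. `Dir3`, `FaceUV` and `SphereToCube` are hypothetical names, and the input direction must be non-zero.

```cpp
#include <cmath>

struct Dir3 { float x, y, z; };
struct FaceUV { int face; float u, v; }; // face: 0..5 = +X, -X, +Y, -Y, +Z, -Z

// Map a direction from the planet center (need not be unit length) to a face of the
// side-length-2 cube and (u, v) in [0, 1] on that face.
FaceUV SphereToCube(Dir3 d) {
    const float ax = std::fabs(d.x), ay = std::fabs(d.y), az = std::fabs(d.z);
    int face;
    float a, b, m; // the two in-face components and the dominant magnitude
    if (ax >= ay && ax >= az) { face = d.x >= 0 ? 0 : 1; m = ax; a = d.y; b = d.z; }
    else if (ay >= az)        { face = d.y >= 0 ? 2 : 3; m = ay; a = d.x; b = d.z; }
    else                      { face = d.z >= 0 ? 4 : 5; m = az; a = d.x; b = d.y; }
    // Dividing by the dominant component is the intersection with the face plane at distance 1.
    return { face, (a / m + 1.0f) * 0.5f, (b / m + 1.0f) * 0.5f };
}
```

For the +X face this gives the same u (from y) and v (from z) as method 1.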

A more interesting question would be:
how can I do that sphere <-> cube mapping in a way that all cube texels have similar area?
Currently there is more detail near the cube corners, and you don't want this.
I plan to work on this soon, in case I can't find a solution somewhere...
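To put a number on the corner issue, assuming a cube face at distance 1 from the center with face coordinates $u, v \in [-1, 1]$: a face point lies at distance $r = \sqrt{1 + u^2 + v^2}$ and is seen at an angle $\theta$ with $\cos\theta = 1/r$, so the solid angle covered per unit of face area is

$$\frac{d\Omega}{dA} = \frac{\cos\theta}{r^2} = \frac{1}{(1 + u^2 + v^2)^{3/2}}$$

At the face center this is 1; at a corner ($u = v = \pm 1$) it is $3^{-3/2} \approx 0.19$. So a corner texel covers roughly 5.2 times less solid angle than a center texel, which is where the extra detail near the corners comes from.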

#5291333 How to calculate lighting by point light with size of the lightsource like Sun?

Posted by JoeJ on 12 May 2016 - 02:53 PM

Just found another snippet, for just a sphere:
float unitRad = rad / dist; // disc radius of the sphere projected to the unit hemisphere == sin(halfAngle)
if (unitRad > 1.0f) unitRad = 1.0f;
float cosHalfAngle = sqrt(1.0f - unitRad*unitRad);
float halfAngle = asin (unitRad); // cone half-angle; the solid angle in steradians would be 2*PI*(1 - cosHalfAngle)
float unitArea = unitRad*unitRad*PI; // area on hemisphere projection
float planeArea = unitArea * ray[2]; // area on sample plane projection // ray[2] = sampleNormal.Dot(ray)
Those values are accurate as long as the sphere does not intersect the sample's normal plane.
(For the sun, that case should go totally unnoticed.)

Hope there's no bug (I've kept this in comments too, because the math is confusing).
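A self-contained version of the sphere snippet, handy for checking the numbers. It assumes `cosToNormal` plays the role of `ray[2]` (the sample normal dotted with the unit direction to the sphere center) and additionally returns the solid angle in steradians, 2*pi*(1 - cos(alpha)), where alpha = asin(rad / dist) is the cone half-angle. `SphereProjection` and `SphereLightProjection` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>

struct SphereProjection {
    float halfAngle;  // cone half-angle alpha, sin(alpha) = rad / dist
    float solidAngle; // steradians: 2*pi*(1 - cos(alpha))
    float unitArea;   // area of the disc the sphere projects to on the unit hemisphere
    float planeArea;  // that disc projected onto the sample's tangent plane
};

// rad: sphere radius, dist: distance to the sphere center,
// cosToNormal: dot(sampleNormal, unit direction to the center), assumed >= 0.
SphereProjection SphereLightProjection(float rad, float dist, float cosToNormal) {
    const float pi = 3.14159265358979f;
    const float unitRad = std::min(rad / dist, 1.0f); // sin(alpha), clamped when inside the sphere
    const float cosAlpha = std::sqrt(1.0f - unitRad * unitRad);
    SphereProjection p;
    p.halfAngle = std::asin(unitRad);
    p.solidAngle = 2.0f * pi * (1.0f - cosAlpha);
    p.unitArea = unitRad * unitRad * pi;
    p.planeArea = p.unitArea * cosToNormal;
    return p;
}
```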

#5291328 How to calculate lighting by point light with size of the lightsource like Sun?

Posted by JoeJ on 12 May 2016 - 02:21 PM

You can use the form factor of a disc pointing towards the sample position:
qVec3 diff = ePos - rPos; // emitter (disc) - receiving sample position
float dR = dot(rDir, diff); // rDir = receiver normal
if (dR > FP_EPSILON)
{
	float dE = -dot(eDir, diff); // eDir = disc normal
	if (dE > FP_EPSILON)
	{
		float sqD = dot(diff, diff);
		float formFactor = (dR * dE) / (sqD * (PI * sqD + discArea)) * discArea;

		qVec3 light = eLuminosity * eCol;
		sampleReceivedLight += formFactor * light;
	}
}
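Written with normalized cosines, the form factor in that snippet is the following, where $d^2$ is `sqD`, $A$ is `discArea`, and $\theta_r$, $\theta_e$ are the angles at the receiver and at the disc, so that `dR` $= d\cos\theta_r$ and `dE` $= d\cos\theta_e$:

$$F = \frac{d_R\, d_E}{d^2\,(\pi d^2 + A)}\,A = \frac{A\cos\theta_r\cos\theta_e}{\pi d^2 + A}$$

For a small disc ($A \ll \pi d^2$) this approaches the point-source term $A\cos\theta_r\cos\theta_e / (\pi d^2)$, while the $+A$ in the denominator keeps the result at most $\cos\theta_r\cos\theta_e$ as the sample approaches the disc.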
If that works for you, you could try to optimize for a sphere.
The code snippet below should help to understand the math.
It projects a sphere onto the hemisphere and then onto the normal plane of the sample (we want to know the area of the projected shape).
The disc is then approximated by a simple dot product (assuming orthographic projection and ignoring perspective).
A disc intersecting the normal plane is also approximated only with a dot product, so there is some error when the disc is large and close to the sample.

inline void ProjectDisc (
	qVec3 &ray, float &hemiSphereProjectedAreaByPI,
	const qVec3 &relPos, const qVec3 &discDir, const qMat3 &local) // todo: remove matrix and give localRay instead
{
	// Note: Disc direction is assumed to point towards the sample position!

	float sqDist = relPos.SqL() + FP_TINY;
	ray = local.Unrotate(relPos);
	hemiSphereProjectedAreaByPI = -(ray[2] * discDir.Dot(relPos)) /
		(sqDist * (PI * sqDist + discDir[3])) * discDir[3]; // discDir[3] = disc area

	ray /= sqrt(sqDist);

	//		unoptimized math:
	//			float unitRad = radius / sqrt(radius*radius + dist*dist); // sin(solidAngle); // radius / Hypotenuse
	//			float unitArea = unitRad*unitRad*PI;
	//			planeArea = unitArea * ray[2];
	//			planeArea *= fabs (discDir.Dot(ray));
	//			planeAreaByPI = planeArea / PI;
	//			float solidAngle = atan (radius * ooDist);
	//			RenderCone (qVec3(0,0,0), (qVec3&)ray, solidAngle, 0,0.5,1, 1,0,0);
	//			RenderCircle (sin(solidAngle), (qVec3&)(ray * cos(solidAngle)), (qVec3&)ray, 0,0.5,1);
}

#5290928 Getting objects within a range (In order, fast)

Posted by JoeJ on 10 May 2016 - 12:24 AM

If your data is dense and in a regular grid, you could do without any runtime sorting by using a precomputed indirection table, e.g. for a 5*5 area around the center cell.
Each entry holds its distance from the center cell, plus a second number for the grid index.
After sorting this by distance, you get a range of cells per distance, and adding your player's grid index to the resulting indices allows using the same table for any position.
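A minimal sketch of building such a table, assuming squared Euclidean distance as the sort key (the post doesn't fix the metric) and storing (dx, dy) offsets instead of a linear grid index, which keeps bounds checks at the grid edges simple. `CellOffset` and `BuildDistanceOrderedOffsets` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

struct CellOffset { int dx, dy, sqDist; };

// Precompute all offsets within a (2r+1)*(2r+1) window, sorted by distance to the center cell.
// Entries with equal sqDist form one distance range that can be processed together.
std::vector<CellOffset> BuildDistanceOrderedOffsets(int r) {
    std::vector<CellOffset> table;
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx)
            table.push_back({dx, dy, dx * dx + dy * dy});
    std::stable_sort(table.begin(), table.end(),
                     [](const CellOffset& a, const CellOffset& b) { return a.sqDist < b.sqDist; });
    return table;
}
```

At runtime you walk the table in order and visit cell (playerX + dx, playerY + dy), skipping cells outside the grid. With a fixed grid width W, `dy * W + dx` gives the linear index offset instead.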

If the data is sparse, a quadtree approach can do this too, by classifying the player position against the current node.
(For a distance-sorted traversal order, it matters in which order you visit the child nodes.)
If the player is inside the node, see which quadrant (child region) it's in.
If the player is outside, see which node corner or edge is closest.
This gives 12 possibilities times a sequence of 4 children, so the precalculated order table you need is just 24 bits.
(You can get better accuracy by combining quadrant / edge / corner, requiring 128 bits.)
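To make the classification concrete, a sketch that computes the near-to-far child order directly from the point's position, by sorting the 4 children by the squared distance from the point to each child rectangle; this is the kind of order a precomputed table would cache per case. The child numbering (bit 0 = right half, bit 1 = upper half) and the names `SqDistToRect` and `ChildVisitOrder` are assumptions.

```cpp
#include <algorithm>
#include <array>

// Squared distance from point (px, py) to the axis-aligned rectangle [x0,x1] x [y0,y1] (0 if inside).
static float SqDistToRect(float px, float py, float x0, float y0, float x1, float y1) {
    const float dx = std::max({x0 - px, 0.0f, px - x1});
    const float dy = std::max({y0 - py, 0.0f, py - y1});
    return dx * dx + dy * dy;
}

// Children of node [minX,maxX] x [minY,maxY] are numbered 0..3: bit 0 = right half, bit 1 = upper half.
// Returns the children ordered from nearest to farthest relative to the point.
std::array<int, 4> ChildVisitOrder(float px, float py, float minX, float minY, float maxX, float maxY) {
    const float cx = 0.5f * (minX + maxX), cy = 0.5f * (minY + maxY);
    std::array<float, 4> d;
    for (int c = 0; c < 4; ++c) {
        const float x0 = (c & 1) ? cx : minX, x1 = (c & 1) ? maxX : cx;
        const float y0 = (c & 2) ? cy : minY, y1 = (c & 2) ? maxY : cy;
        d[c] = SqDistToRect(px, py, x0, y0, x1, y1);
    }
    std::array<int, 4> order = {0, 1, 2, 3};
    std::stable_sort(order.begin(), order.end(), [&](int a, int b) { return d[a] < d[b]; });
    return order;
}
```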

Both ideas can be extended to 3D (grid->volume, quadtree->octree).
To avoid the problem of 'my object intersects two cells, but I don't want to store it in both', look up 'loose octree'; it works for grids too.

#5289838 Procedural Universe: The illusion of infinity

Posted by JoeJ on 03 May 2016 - 12:41 AM

>> Strongly disagree, on what experience do you base that assumption?

that you don't model such a large number of stars that you have to partition space to check collisions.
have you tried it yet? at what point (how many stars with how many planets per star on average and how many moons per planet on average) is an oct-tree required? would it apply to my case of 1000 stars with 10 planets and 10 moons per? or can i go 100,000 stars? or 100,000,000? when do i hit oct-tree requirements? if you haven't tried it, it would seem to me to be pre-mature optimization.

Some years back I was able to render about a million point sprites per frame (no culling), but a bright spot in the night sky can be a whole galaxy, not just a star.
For collisions, the octree was already worth it with fewer than 100 spheres in a box (even more years back).

With the complexity I have in mind when reading 'procedural infinity', some kind of hierarchy is needed.
Probably you assumed a lower, 'good enough for the purpose' complexity.

However, if I were a newbie reading your comment (or the DICE paper), I might conclude:
"Optimized brute force almost always wins over complex algorithms reducing work. No need to learn this difficult tree stuff properly.",
and that would be a bad thing.