Jump to content

  • Log In with Google      Sign In   
  • Create Account

We're offering banner ads on our site from just $5!

1. Details HERE. 2. GDNet+ Subscriptions HERE. 3. Ad upload HERE.


Fast distance based sort


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
4 replies to this topic

#1 Migi0027   Crossbones+   -  Reputation: 2110

Like
1Likes
Like

Posted 13 June 2014 - 06:02 PM

Hi,

 

I've stumbled across a problem, I have a bunch of particles that are rendered in 3d space, but transparency problems occur, during each frame the particle buffer ( a chunk of data sent to the gpu ), is updated, so I thought why not do the range based sorting ( from camera position to particle ) there for a nice and easy fix, but obstacles occur.

 

How I do it: ( Keep in mind that the following code is not done, its just a quick prototype, making it nicer is done later on )

D3DXVECTOR3 __g_vCam; // Yeah yeah, globals are bad, it's 1:40 at night here!

bool ceparticle_sort(CE_NAMESPACE::Physics::CEParticle &p0, CE_NAMESPACE::Physics::CEParticle &p1)
{
	float dist0 = D3DXVec3Length(&(p0.vPosition - __g_vCam));
	float dist1 = D3DXVec3Length(&(p1.vPosition - __g_vCam));

	if (dist1 < dist0) return true;
}

void CE_NAMESPACE::CEParticleSet::BuffersUpdate(DX11 *pDX11, D3DXVECTOR3 *pCam)
{
	std::vector<Physics::CEParticle> m_vParticles_Sort; 
	m_vParticles_Sort.resize(m_iMax); // m_iMax -> Max Nr of particles
	__g_vCam = *pCam; // store camera/eye position

	UNTIL(m_iMax)
	{
                // m_pParticles is a pointer to an array created at runtime
		m_vParticles_Sort[i] = m_pParticles[i]; // copy
	}

	// Sort Array
	std::sort(m_vParticles_Sort.begin(), m_vParticles_Sort.end(), ceparticle_sort);

	// Map
	pDX11->BufferMap(m_pBufferPoints, &m_vParticles_Sort[0], sizeof(Physics::CEParticle), m_iMax);
}

A way of speeding up the reaction would be to store pointers of particles and then processing that for faster memory management, but, the demands of the graphic apis are strict, it must have a pointer to an array that leads directly to the memory, not another point, so that's not an option.

 

In this sole operation the frame time increased by 2/3 as the cpu is slowing the operation down.

 

EDIT. pDX11 is a custom wrapper for DX11, by the graphics API I mean Direct3D 11.

 

So here I am, and my actual question is, how would I go about to approach this problem? What have you done?

 

Thank you for your time.

-MIGI0027


Edited by Migi0027, 13 June 2014 - 06:03 PM.

Hi! Cuboid Zone
The Rule: Be polite, be professional, but have a plan to steal all their shaders!

Sponsor:

#2 Pink Horror   Members   -  Reputation: 1227

Like
0Likes
Like

Posted 13 June 2014 - 10:23 PM

Why are you copying them into a temporary vector instead of sorting the base array?

I'm not sure where your problem is here, though my guess is that finding some way to remove the copy should help the most.

There's one small thing you can do no matter what: D3DXVec3Length can be replaced with D3DXVec3LengthSq without really changing anything else.

Also, your comparison function looks like it's missing the return false.

#3 C0lumbo   Crossbones+   -  Reputation: 2492

Like
1Likes
Like

Posted 14 June 2014 - 12:12 AM

It's perhaps more correct to calculate the z depth than the raw distance (or distance squared as Pink Horror points out). It's more or less the same cost (just do a dot product of the particle position with either the z-column or z-row (I forget which) of your view matrix). As an advantage it opens up the opportunity to quickly discard anything on the wrong sides of your near and far planes, which will save you time from submitting useless data to the GPU.

 

I think your choice of sort algorithm will be important. Radix sorts are awesome, so you could definitely do worse than go with that, you perhaps don't need super-high precision in the sort so you could speed it up further by shortening the z-depth to only 16-bits so you only need two passes.

 

Alternatively, depending on the approach you've used, you might be expecting your data to be nearly sorted (if there's coherency from the previous frame). If that's the case, then pick a sort algorithm that performs well for nearly sorted data.



#4 danbrown   Members   -  Reputation: 255

Like
0Likes
Like

Posted 14 June 2014 - 10:43 PM

I think your choice of sort algorithm will be important. Radix sorts are awesome, so you could definitely do worse than go with that, you perhaps don't need super-high precision in the sort so you could speed it up further by shortening the z-depth to only 16-bits so you only need two passes.

 

Worth noting that for situations where the sorted data is already almost in order (i.e. a list of depths with a short frame time and camera not moving very fast), the listed performances for sorts don't necessarily apply as they assume a full sort on unsorted data. An iterative 'early-out' sorting algorithm such as a plain bubble sort on almost-ordered data can be a very fast solution at up to O(n) if nothing has changed depths which is quite possible at good frame rates. The basic radix sort is O(kn) where k is the number of bits and can gain some speed with a hybrid version but still won't beat that. A full sort would still be necessary if the camera switched positions completely so best to have a choice of which to use depending on the situation.

 

However, as usual, beware of premature optimisation :). The frame time increasing by 2/3 sounds like a huge slowdown and even using a slow brute-force sorting algorithm shouldn't do that unless there are 100's of thousands of particles being sorted. I'd recommend the OP first confirms that the sort is what is causing the issue by commenting out just the std::sort line in the code and see if it still slow. It may be the fact that they are now doing per frame buffer updates, or that sorting back-to-front while using transparency could be massive per-pixel overdraw and therefore a huge shader hit (if you have been using a depth buffer for particles) or some other part.

 

If it is definitely the sort, I would first create a temporary std::vector containing just the m_pParticles index and squared distance, sort that, then make your final copy using the sorted array as the new index. You want to do the least work and copy the least data possible (yay, pointers) in the sort function as it could be called multiple times per item.

 

Something like -

struct temp_particle
{
    int index;
    float dist;
}

bool ceparticle_sort(temp_particle * p0, temp_particle * p1)
{
    if (p0->dist < p1->dist) return true;
}

void CE_NAMESPACE::CEParticleSet::BuffersUpdate(DX11 *pDX11, D3DXVECTOR3 *pCam)
{
    vCam = *pCam; // store camera/eye position

    std::vector<temp_particle *> sorted_array;
    sorted_array.resize(m_iMax); // m_iMax -> Max Nr of particles
    UNTIL(m_iMax)
    {
        sorted_array[i] = new temp_particle;
        sorted_array[i]->index = i;
        sorted_array[i]->dist = D3DXVec3LengthSq(&(m_pParticles[i].vPosition - vCam));
    }

    // Sort Array
    std::sort(sorted_array.begin(), sorted_array.end(), ceparticle_sort);

    std::vector<Physics::CEParticle> m_vParticles_Sort; 
    m_vParticles_Sort.resize(m_iMax); // m_iMax -> Max Nr of particles
    UNTIL(m_iMax)
    {
        // m_pParticles is a pointer to an array created at runtime
        m_vParticles_Sort[i] = m_pParticles[sorted_array[i]->index]; // copy
    }

    // Map
    pDX11->BufferMap(m_pBufferPoints, &m_vParticles_Sort[0], sizeof(Physics::CEParticle), m_iMax);
}

If it is definitely the sort and this doesn't help much, then it's time to worry about either the sorting algorithm or question how many particles you are trying to render.


Dan


#5 danbrown   Members   -  Reputation: 255

Like
0Likes
Like

Posted 14 June 2014 - 10:52 PM

 

I think your choice of sort algorithm will be important. Radix sorts are awesome, so you could definitely do worse than go with that, you perhaps don't need super-high precision in the sort so you could speed it up further by shortening the z-depth to only 16-bits so you only need two passes.

 

Worth noting that...<frame coherency stuff>

 

Doh! Somehow I completely missed that you mentioned that bit in the next paragraph :)

 

The rest still applies though.


Dan





Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.



PARTNERS