

Member Since 30 Aug 2006
Offline Last Active Aug 14 2015 03:15 PM

Topics I've Started

Slow down when using OpenMP with stl vector per thread

14 June 2015 - 12:05 PM



I'm new to OpenMP, and for the second time I notice a slowdown in code like this:


const Graph::GraphData &graph = graphPerPart[part];
#pragma omp parallel for
for (int borderIndex = 0; borderIndex < newBorders.size(); borderIndex++)
{
    int source = borders[borderIndex].FirstVertex(*mesh);
    int target = borders[borderIndex].LastVertex(*mesh);
    // declared inside the loop, so every iteration starts with empty vectors
    std::vector<float> min_distance;
    std::vector<int> previous;

    Graph::DijkstraComputePaths (source, graph.vertex_adjacency_list, min_distance, previous); // this grows both vectors

    newBorders[borderIndex].DoSomethingWithTheClosestPath (previous, target, ...); // this also grows some std::vectors
}


The OpenMP code runs only 0.75 times as fast as the single-threaded version (VS 2013, quad core).


I guess it is because each thread has to manage memory for its std::vectors. Can this be true, and is there a way to fix it?


This is for a preprocessing tool, and avoiding the STL surely isn't worth the time, but usually I get the expected speedups with very little effort, so I'm curious.


I'd also like to know whether OpenMP can get close to the speed of other multithreading techniques for runtime code, because it's so easy to use.
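If allocation inside the loop really is the bottleneck (an assumption, nothing in the post confirms it), one thing worth trying is to split `parallel for` into a `parallel` region with an inner `for`, so each thread creates its vectors once and reuses them for every iteration it handles. The sketch below is self-contained; `ProcessItem` and `ProcessAll` are made-up stand-ins for the Dijkstra call and the border loop, not the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-border work: fills a scratch buffer
// (like min_distance / previous in the post) and reduces it to one value.
static float ProcessItem (int index, std::vector<float> &scratch)
{
    scratch.clear ();   // keeps the capacity, so no reallocation after warm-up
    for (int i = 0; i <= index; i++) scratch.push_back (float (i));
    float sum = 0.0f;
    for (std::size_t i = 0; i < scratch.size (); i++) sum += scratch[i];
    return sum;
}

std::vector<float> ProcessAll (int count)
{
    assert (count >= 0);
    std::vector<float> results (count);   // sized up front; each iteration writes only its own slot

    #pragma omp parallel
    {
        std::vector<float> scratch;       // one buffer per thread, reused across iterations
        scratch.reserve (1024);           // arbitrary initial capacity for this sketch

        #pragma omp for schedule(dynamic)
        for (int index = 0; index < count; index++)
            results[index] = ProcessItem (index, scratch);
    }
    return results;
}
```

Built without OpenMP enabled, the pragmas are ignored and the code runs serially with the same results, which makes it easy to compare timings between the two builds.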



VC Debugging too slow

05 May 2015 - 08:33 AM



Debug mode became very slow for me sometime in the last few months, on both VS 2013 Community and VS 2012 Express.

Sometimes it takes a few seconds to step over a very simple line of code, but with STL containers it becomes a nightmare.




mesh->BlurVertexMap (minVal, maxVal, temp, vertexHeight);


-> Stepping over this function takes no longer than the key press itself,

even though the function uses a lot of std::vectors (its source is at the end of this post).



for (int i=0; i<mesh->mVertices.size(); i++) vertexHeight[i] += temp[i];


-> Stepping over this simple addition of two std::vector<float> (size 8000) takes 2 MINUTES :(




I've tried _ITERATOR_DEBUG_LEVEL=0, but it had no effect.

Is this a common problem?

The project is about 10 years old and links to old libs. Maybe something is messed up.









void BlurVertexMap (float &minVal, float &maxVal, std::vector<float> &out, const std::vector<float> &in) const
{
    minVal = FLT_MAX;
    maxVal = -FLT_MAX;

    for (int i=0; i<mVertices.size(); i++)
    {
        const Vertex &vert = mVertices[i];
        float area = mVertexArea[i];
        float val = in[i] * area;
        // accumulate the area-weighted values of all edge-connected neighbours
        for (int j=0; j<vert.edgeIndices.size(); j++)
        {
            int nI = mEdges[vert.edgeIndices[j]].OppositeVertex(i);
            area += mVertexArea[nI];
            val += in[nI] * mVertexArea[nI];
        }
        val /= area; // area-weighted average
        out[i] = val;
        if (val < minVal) minVal = val;
        if (val > maxVal) maxVal = val;
    }
}
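In an unoptimized build, `size()` and `operator[]` are typically real function calls on every iteration, so one cheap experiment is to take them out of the loop entirely. This is only a sketch: `AddInPlace` is a made-up helper name, and it probably would not explain a two-minute step on its own, but it helps to tell container overhead apart from debugger overhead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds src to dst element by element through raw pointers, so the loop body
// contains no std::vector member calls at all.
void AddInPlace (std::vector<float> &dst, const std::vector<float> &src)
{
    assert (dst.size () == src.size ());
    const std::size_t count = dst.size ();  // size() evaluated once, not per iteration
    float *d = dst.data ();
    const float *s = src.data ();
    for (std::size_t i = 0; i < count; i++) d[i] += s[i];
}
```

If stepping over a call like `AddInPlace (vertexHeight, temp);` is just as slow, the containers are probably not the cause.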

Can i do this using the fourier transform?

15 April 2015 - 07:06 AM



I'm working on a mesh preprocessing tool and have the idea of calculating neighbourhood information for each vertex to detect things like edge flow, convex / concave regions etc.

E.g. to sum up the convexity of neighbouring polygons, I'd project their centers onto the vertex normal plane to get three values:


Angle (representing position to vertex), polygon area / distance relation (=weight) and polygon convexity (=value).


To sum this up using a 1D FT: in the attached picture, the black dot represents a single polygon, once with a large area (red) and once with a small area (blue).


Is there a way to add such a sample to a FT for each polygon, one after another?


My current solution would be: don't use an FT at all and do this with a simple 1D array, without any cool math.

But if this is a common problem with an elegant solution please let me know - my math background is so weak i don't know what search terms to use :)
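Because the Fourier transform is linear, samples can be added one after another: each polygon is treated as a weighted impulse at its angle and contributes `weight * value * cos(k * angle)` and `weight * value * sin(k * angle)` to the k-th pair of coefficients. A minimal sketch of that idea follows. The names `AngularSeries`, `AddSample`, `Evaluate` and the cutoff `kMaxFreq` are made up for illustration, and whether a handful of frequencies is good enough for edge flow detection is an open question.

```cpp
#include <array>
#include <cassert>
#include <cmath>

const int kMaxFreq = 4;                 // arbitrary cutoff for this sketch
const float kPi = 3.14159265358979f;

// Truncated Fourier series over the angle around the vertex.
struct AngularSeries
{
    std::array<float, kMaxFreq + 1> a;  // cosine sums
    std::array<float, kMaxFreq + 1> b;  // sine sums

    AngularSeries () { a.fill (0.0f); b.fill (0.0f); }

    // Adds one polygon as a weighted impulse at 'angle' (radians).
    void AddSample (float angle, float weight, float value)
    {
        assert (weight >= 0.0f);
        const float amount = weight * value;
        for (int k = 0; k <= kMaxFreq; k++)
        {
            a[k] += amount * std::cos (k * angle);
            b[k] += amount * std::sin (k * angle);
        }
    }

    // Evaluates the band-limited (smoothed) sum of all samples at 'angle'.
    float Evaluate (float angle) const
    {
        float result = a[0] / (2.0f * kPi);
        for (int k = 1; k <= kMaxFreq; k++)
            result += (a[k] * std::cos (k * angle) + b[k] * std::sin (k * angle)) / kPi;
        return result;
    }
};
```

The coefficients are plain sums, so the order in which polygons are added only matters up to floating point rounding. A large polygon simply contributes a taller bump than a small one at the same angle.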

Compression questions

19 November 2014 - 09:27 AM

Hi there,


I have some questions about compression tricks.

Because trying them all is a lot of work, I'd like to hear from people who already have some experience.

I'm using OpenCL on GPU, but API should not matter here.



1. E.g. I have a normal in xyz and a radius in the w component of a vec4, but now I want a cone angle too. I see these options:

Either convert the normal to spherical coords, so that it needs only two values.

Or pack two half floats in the last component.

Or create another buffer or tex to store the additional stuff.


Is there an answer like 'mostly option 2 is the fastest way, because trig is slow',

Or is it more like 'it totally depends on usecase'?
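For reference, this is roughly what the first option costs: one `acos` and one `atan2` to encode, two `sin` and two `cos` to decode. The sketch is host-side C++ with made-up names (`Vec3`, `Spherical`, `EncodeNormal`, `DecodeNormal`); an OpenCL kernel would use the same math. It makes no claim about which option is fastest on a given GPU.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Spherical { float theta, phi; };  // polar angle from +z, azimuth around z

// Stores a unit normal as two angles, freeing one component of the vec4.
Spherical EncodeNormal (const Vec3 &n)
{
    assert (std::fabs (n.x * n.x + n.y * n.y + n.z * n.z - 1.0f) < 1e-3f);
    // clamp so rounding error cannot push acos out of its domain
    const float z = n.z < -1.0f ? -1.0f : (n.z > 1.0f ? 1.0f : n.z);
    Spherical s;
    s.theta = std::acos (z);
    s.phi = std::atan2 (n.y, n.x);
    return s;
}

// Rebuilds the unit normal from the two angles.
Vec3 DecodeNormal (const Spherical &s)
{
    const float sinTheta = std::sin (s.theta);
    Vec3 n = { sinTheta * std::cos (s.phi), sinTheta * std::sin (s.phi), std::cos (s.theta) };
    return n;
}
```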



2. I do GI and cover the geometry with samples at 10x10cm resolution.

Can I expect half floats for everything (position, direction, color...) to be accurate enough for a 1 km game world?


Is this usually done only for memory savings or can it help / hurt performance too?
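A quick way to check the position part is to look at the gap between neighbouring half values. IEEE 754 half precision stores 10 mantissa bits, so for a value in [2^e, 2^(e+1)) the gap is 2^(e-10). The helper below (`HalfSpacing` is a made-up name) computes that on the CPU, assuming positions are stored in metres as absolute world coordinates.

```cpp
#include <cassert>
#include <cmath>

// Distance between neighbouring representable IEEE 754 half precision
// values around x (normal range only: 2^-14 <= |x| <= 65504).
float HalfSpacing (float x)
{
    const float magnitude = std::fabs (x);
    assert (magnitude >= 6.103515625e-05f && magnitude <= 65504.0f);
    const int exponent = std::ilogb (magnitude);   // floor(log2(magnitude))
    return std::ldexp (1.0f, exponent - 10);       // 10 stored mantissa bits
}
```

Under that assumption, `HalfSpacing (1000.0f)` is 0.5, so between 512 m and 1024 m a half can only represent steps of 50 cm, which is coarser than the 10 cm sample grid. `HalfSpacing (1.0f)` is about 0.001, which suggests directions and colors in the 0..1 range are less of a problem. Positions would probably need to be stored relative to a local origin to fit.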



Problems when moving from Nvidia to ATI card / GPGPU performance comparison

10 September 2014 - 10:29 AM



I have a complex compute shader project that refuses to work since I replaced my GTX 670/480 with an R9 280X.

Hopefully at the end i can give some useful GPGPU performance comparison without the need to compare OpenCL against Cuda.



The first issue is: I'm unable to modify a shader storage buffer from a shader. Maybe I'm missing some stupid little thing...



The setup code is this:


    int sizeLists = sizeof(int) * 4096;
    gpuData.dataDbgOut = (int*)_aligned_malloc (sizeLists, 16);    

    gpuData.dataDbgOut[0] = 10;
    gpuData.dataDbgOut[1] = 20;
    gpuData.dataDbgOut[2] = 30;
    gpuData.dataDbgOut[3] = 40;

    glGenBuffers (1, &gpuData.ssbDbgOut);    
    glBindBuffer (GL_SHADER_STORAGE_BUFFER, gpuData.ssbDbgOut);
    glBufferData (GL_SHADER_STORAGE_BUFFER, sizeLists, gpuData.dataDbgOut, GL_DYNAMIC_COPY);
    glBindBufferBase (GL_SHADER_STORAGE_BUFFER, 1, gpuData.ssbDbgOut);

    gpuData.computeShaderTestATI = GL_Helper::CompileShaderFile ("..\\Engine\\shader\\gi_TestATI.glsl", GL_COMPUTE_SHADER, 1, includeAll);
    gpuData.computeProgramHandleTestATI = glCreateProgram();
    if (!gpuData.computeProgramHandleTestATI) { SystemTools::Log ("Error creating compute program object.\n"); return 0; }
    glAttachShader (gpuData.computeProgramHandleTestATI, gpuData.computeShaderTestATI);
    if (!GL_Helper::LinkProgram (gpuData.computeProgramHandleTestATI)) return 0;




Per frame code:


glBegin (GL_POINTS); glVertex3f (0,0,0); glEnd (); // <- remove this and it works


    glUseProgram (gpuData.computeProgramHandleTestATI);
    glDispatchCompute (1, 1, 1);
    glMemoryBarrier (GL_ALL_BARRIER_BITS);

    glBindBuffer (GL_SHADER_STORAGE_BUFFER, gpuData.ssbDbgOut);
    int* result = (int*) glMapBuffer (GL_SHADER_STORAGE_BUFFER, GL_READ_ONLY);    
    for (int i=0; i<4; i++) base_debug::logF->Print ("dbg: ", float(result[i]));





layout (local_size_x = 1) in;

layout (binding = 1, std430) buffer dbg_block
{
    uint dbgout[];
};

void main (void)
{
    dbgout[0] = 0;
    dbgout[1] = 1;
    dbgout[2] = 2;
    dbgout[3] = 3;
}




For the output I'd expect 0,1,2,3 as written by the shader, but it is still 10,20,30,40.


I've tried GL error checking but there are no errors. The shader program is definitely called, and there are no shader compiler errors.

Any idea what's wrong? Version is ok too:


OpenGL ok
GL Vendor : ATI Technologies Inc.
GL Renderer : AMD Radeon R9 200 Series
GL Version (string) : 4.3.12967 Compatibility Profile Context 14.200.1004.0
GL Version (integer) : 4.3
GLSL Version : 4.40



EDIT: Added the stupid little thing :)