
OpenGL rasterizing on the CPU?



#1 Conoktra   Members   -  Reputation: 140


Posted 09 May 2012 - 03:06 PM

Hello :)

I have a game that I am working on (screenshot). The code is very lightweight, and capping the frame rate at 60 FPS while rendering the GUI idles the CPU at around 1-2%. But the moment I make a single 3D draw call per frame (using a VBO and glDrawArrays), it jumps from 1-2% CPU usage to maxing out the entire core my rendering thread is on!

Ok, so it sounds like it is falling back to a software implementation of OpenGL and rasterizing on the CPU... but why on earth would it do that, and how do I fix it? Any help is appreciated!

Here is some information:
  • It is a 32-bit application built with MinGW.
  • It uses SDL for window, thread and timer management.
  • It uses GLEW for all OpenGL related stuff.
  • The test system runs a 64-bit version of Windows 7 and has the correct drivers installed.
Here is the output from various glGet() calls:
Vendor: NVIDIA Corporation

Version: 4.2.0

Renderer: GeForce GTX 460/PCIe/SSE2

Extensions: GL_ARB_base_instance GL_ARB_blend_func_extended GL_ARB_color_buffer_float GL_ARB_compatibility GL_ARB_compressed_texture_pixel_storage GL_ARB_conservative_depth GL_ARB_copy_buffer GL_ARB_depth_buffer_float GL_ARB_depth_clamp GL_ARB_depth_texture GL_ARB_draw_buffers GL_ARB_draw_buffers_blend GL_ARB_draw_indirect GL_ARB_draw_elements_base_vertex GL_ARB_draw_instanced GL_ARB_ES2_compatibility GL_ARB_explicit_attrib_location GL_ARB_fragment_coord_conventions GL_ARB_fragment_program GL_ARB_fragment_program_shadow GL_ARB_fragment_shader GL_ARB_framebuffer_object GL_ARB_framebuffer_sRGB GL_ARB_geometry_shader4 GL_ARB_get_program_binary GL_ARB_gpu_shader5 GL_ARB_gpu_shader_fp64 GL_ARB_half_float_pixel GL_ARB_half_float_vertex GL_ARB_imaging GL_ARB_instanced_arrays GL_ARB_internalformat_query GL_ARB_map_buffer_alignment GL_ARB_map_buffer_range GL_ARB_multisample GL_ARB_multitexture GL_ARB_occlusion_query GL_ARB_occlusion_query2 GL_ARB_pixel_buffer_object GL_ARB_point_parameters GL_ARB_point_sprite GL_ARB_provoking_vertex GL_ARB_robustness GL_ARB_sample_shading GL_ARB_sampler_objects GL_ARB_seamless_cube_map GL_ARB_separate_shader_objects GL_ARB_shader_atomic_counters GL_ARB_shader_bit_encoding GL_ARB_shader_image_load_store GL_ARB_shader_objects GL_ARB_shader_precision GL_ARB_shader_subroutine GL_ARB_shading_language_100 GL_ARB_shading_language_420pack GL_ARB_shading_language_include GL_ARB_shading_language_packing GL_ARB_shadow GL_ARB_sync GL_ARB_tessellation_shader GL_ARB_texture_border_clamp GL_ARB_texture_buffer_object GL_ARB_texture_buffer_object_rgb32 GL_ARB_texture_compression GL_ARB_texture_compression_bptc GL_ARB_texture_compression_rgtc GL_ARB_texture_cube_map GL_ARB_texture_cube_map_array GL_ARB_texture_env_add GL_ARB_texture_env_combine GL_ARB_texture_env_crossbar GL_ARB_texture_env_dot3 GL_ARB_texture_float GL_ARB_texture_gather GL_ARB_texture_mirrored_repeat GL_ARB_texture_multisample GL_ARB_texture_non_power_of_two GL_ARB_texture_query_lod GL_ARB_texture_rectangle GL_ARB_texture_rg GL_ARB_texture_rgb10_a2ui GL_ARB_texture_storage GL_ARB_texture_swizzle GL_ARB_timer_query GL_ARB_transform_feedback2 GL_ARB_transform_feedback3 GL_ARB_transform_feedback_instanced GL_ARB_transpose_matrix GL_ARB_uniform_buffer_object GL_ARB_vertex_array_bgra GL_ARB_vertex_array_object GL_ARB_vertex_attrib_64bit GL_ARB_vertex_buffer_object GL_ARB_vertex_program GL_ARB_vertex_shader GL_ARB_vertex_type_2_10_10_10_rev GL_ARB_viewport_array GL_ARB_window_pos GL_ATI_draw_buffers GL_ATI_texture_float GL_ATI_texture_mirror_once GL_S3_s3tc GL_EXT_texture_env_add GL_EXT_abgr GL_EXT_bgra GL_EXT_bindable_uniform GL_EXT_blend_color GL_EXT_blend_equation_separate GL_EXT_blend_func_separate GL_EXT_blend_minmax GL_EXT_blend_subtract GL_EXT_compiled_vertex_array GL_EXT_Cg_shader GL_EXT_depth_bounds_test GL_EXT_direct_state_access GL_EXT_draw_buffers2 GL_EXT_draw_instanced GL_EXT_draw_range_elements GL_EXT_fog_coord GL_EXT_framebuffer_blit GL_EXT_framebuffer_multisample GL_EXTX_framebuffer_mixed_formats GL_EXT_framebuffer_object GL_EXT_framebuffer_sRGB GL_EXT_geometry_shader4 GL_EXT_gpu_program_parameters GL_EXT_gpu_shader4 GL_EXT_multi_draw_arrays GL_EXT_packed_depth_stencil GL_EXT_packed_float GL_EXT_packed_pixels GL_EXT_pixel_buffer_object GL_EXT_point_parameters GL_EXT_provoking_vertex GL_EXT_rescale_normal GL_EXT_secondary_color GL_EXT_separate_shader_objects GL_EXT_separate_specular_color GL_EXT_shader_image_load_store GL_EXT_shadow_funcs GL_EXT_stencil_two_side GL_EXT_stencil_wrap GL_EXT_texture3D 
GL_EXT_texture_array GL_EXT_texture_buffer_object GL_EXT_texture_compression_dxt1 GL_EXT_texture_compression_latc GL_EXT_texture_compression_rgtc GL_EXT_texture_compression_s3tc GL_EXT_texture_cube_map GL_EXT_texture_edge_clamp GL_EXT_texture_env_combine GL_EXT_texture_env_dot3 GL_EXT_texture_filter_anisotropic GL_EXT_texture_format_BGRA8888 GL_EXT_texture_integer GL_EXT_texture_lod GL_EXT_texture_lod_bias GL_EXT_texture_mirror_clamp GL_EXT_texture_object GL_EXT_texture_shared_exponent GL_EXT_texture_sRGB GL_EXT_texture_sRGB_decode GL_EXT_texture_storage GL_EXT_texture_swizzle GL_EXT_texture_type_2_10_10_10_REV GL_EXT_timer_query GL_EXT_transform_feedback2 GL_EXT_vertex_array GL_EXT_vertex_array_bgra GL_EXT_vertex_attrib_64bit GL_EXT_import_sync_object GL_IBM_rasterpos_clip GL_IBM_texture_mirrored_repeat GL_KTX_buffer_region GL_NV_alpha_test GL_NV_blend_minmax GL_NV_blend_square GL_NV_complex_primitives GL_NV_conditional_render GL_NV_copy_depth_to_color GL_NV_copy_image GL_NV_depth_buffer_float GL_NV_depth_clamp GL_NV_explicit_multisample GL_NV_fbo_color_attachments GL_NV_fence GL_NV_float_buffer GL_NV_fog_distance GL_NV_fragdepth GL_NV_fragment_program GL_NV_fragment_program_option GL_NV_fragment_program2 GL_NV_framebuffer_multisample_coverage GL_NV_geometry_shader4 GL_NV_gpu_program4 GL_NV_gpu_program4_1 GL_NV_gpu_program5 GL_NV_gpu_program_fp64 GL_NV_gpu_shader5 GL_NV_half_float GL_NV_light_max_exponent GL_NV_multisample_coverage GL_NV_multisample_filter_hint GL_NV_occlusion_query GL_NV_packed_depth_stencil GL_NV_parameter_buffer_object GL_NV_parameter_buffer_object2 GL_NV_path_rendering GL_NV_pixel_data_range GL_NV_point_sprite GL_NV_primitive_restart GL_NV_register_combiners GL_NV_register_combiners2 GL_NV_shader_atomic_counters GL_NV_shader_buffer_load GL_NV_texgen_reflection GL_NV_texture_barrier GL_NV_texture_compression_vtc GL_NV_texture_env_combine4 GL_NV_texture_expand_normal GL_NV_texture_lod_clamp GL_NV_texture_multisample GL_NV_texture_rectangle GL_NV_texture_shader GL_NV_texture_shader2 GL_NV_texture_shader3 GL_NV_transform_feedback GL_NV_transform_feedback2 GL_NV_vertex_array_range GL_NV_vertex_array_range2 GL_NV_vertex_attrib_integer_64bit GL_NV_vertex_buffer_unified_memory GL_NV_vertex_program GL_NV_vertex_program1_1 GL_NV_vertex_program2 GL_NV_vertex_program2_option GL_NV_vertex_program3 GL_NVX_conditional_render GL_NVX_gpu_memory_info GL_OES_depth24 GL_OES_depth32 GL_OES_depth_texture GL_OES_element_index_uint GL_OES_fbo_render_mipmap GL_OES_get_program_binary GL_OES_mapbuffer GL_OES_packed_depth_stencil GL_OES_rgb8_rgba8 GL_OES_standard_derivatives GL_OES_texture_3D GL_OES_texture_float GL_OES_texture_float_linear GL_OES_texture_half_float GL_OES_texture_half_float_linear GL_OES_texture_npot GL_OES_vertex_array_object GL_OES_vertex_half_float GL_SGIS_generate_mipmap GL_SGIS_texture_lod GL_SGIX_depth_texture GL_SGIX_shadow GL_SUN_slice_accum GL_WIN_swap_hint WGL_EXT_swap_control



#2 mhagain   Crossbones+   -  Reputation: 7833


Posted 09 May 2012 - 05:27 PM

If you're getting GL 4.2 on Windows 7 it's most definitely not using a software implementation - you would be getting 1.1 if so. It's also the case that if you had rasterization on the CPU the symptoms would be a LOT more dramatic than just a jump to 100% usage on one core. You would be running at about 1 fps (or less), for example - software rasterization in a GL app really is that slow.

You should post some code - including your main loop and where your draw call(s) are being made - to enable further analysis. Depending on how your program is structured this may well be perfectly normal behaviour (as in it's using 100% CPU because your code is telling it to do so).



#3 Conoktra   Members   -  Reputation: 140


Posted 09 May 2012 - 07:00 PM

Thanks mhagain!

My rendering code/main loop is pretty complicated (18 rendering passes, post processing, threading, etc.) and I know that it's not the issue.

I have tested the code on another system (Windows 7 64-bit, Radeon 5650) and it does not suffer from the same issue. After a bunch of profiling work I have tracked it down to the shader calls (not the shaders themselves!), that is, calls to glGetUniformLocation and the like... which is odd. Commenting out all the shader calls drops the CPU usage back to 1-2%.


This code causes the 100% CPU usage:
void glsl_shader_uniform_texture(glsl_shader * shader, u32 slot) {
	static cc8 * slot_names[] = {
		"texture_0",
		"texture_1",
		"texture_2",
		"texture_3",
		"texture_4",
		"texture_5",
		"texture_6",
		"texture_7",
	};
	assert(slot < 8);
	//glActiveTextureARB(GL_TEXTURE0_ARB + slot);
	glClientActiveTextureARB(GL_TEXTURE0_ARB + slot);
	s32 id = glGetUniformLocationARB(shader->opengl_id, slot_names[slot]);
	glUniform1iARB(id, slot);
}

void glsl_shader_uniform_int(glsl_shader * shader, cc8 * name, s32 value) {
	s32 id = glGetUniformLocationARB(shader->opengl_id, name);
	glUniform1iARB(id, value);
}

void glsl_shader_uniform_float(glsl_shader * shader, cc8 * name, f32 value) {
	s32 id = glGetUniformLocationARB(shader->opengl_id, name);
	glUniform1fARB(id, value);
}

void glsl_shader_uniform_vec2f(glsl_shader * shader, cc8 * name, vec2f * value) {
	s32 id = glGetUniformLocationARB(shader->opengl_id, name);
	glUniform2fARB(id, value->x, value->y);
}

void glsl_shader_uniform_vec3f(glsl_shader * shader, cc8 * name, vec3f * value) {
	s32 id = glGetUniformLocationARB(shader->opengl_id, name);
	glUniform3fARB(id, value->x, value->y, value->z);
}

void glsl_shader_uniform_vec4f(glsl_shader * shader, cc8 * name, vec4f * value) {
	s32 id = glGetUniformLocationARB(shader->opengl_id, name);
	glUniform4fARB(id, value->x, value->y, value->z, value->w);
}


This code uses 1-2% of the CPU:
void glsl_shader_uniform_texture(glsl_shader * shader, u32 slot) {
	static cc8 * slot_names[] = {
		"texture_0",
		"texture_1",
		"texture_2",
		"texture_3",
		"texture_4",
		"texture_5",
		"texture_6",
		"texture_7",
	};
	assert(slot < 8);
	//glActiveTextureARB(GL_TEXTURE0_ARB + slot);
	glClientActiveTextureARB(GL_TEXTURE0_ARB + slot);
	//s32 id = glGetUniformLocationARB(shader->opengl_id, slot_names[slot]);
	//glUniform1iARB(id, slot);
}

void glsl_shader_uniform_int(glsl_shader * shader, cc8 * name, s32 value) {
	//s32 id = glGetUniformLocationARB(shader->opengl_id, name);
	//glUniform1iARB(id, value);
}

void glsl_shader_uniform_float(glsl_shader * shader, cc8 * name, f32 value) {
	//s32 id = glGetUniformLocationARB(shader->opengl_id, name);
	//glUniform1fARB(id, value);
}

void glsl_shader_uniform_vec2f(glsl_shader * shader, cc8 * name, vec2f * value) {
	//s32 id = glGetUniformLocationARB(shader->opengl_id, name);
	//glUniform2fARB(id, value->x, value->y);
}

void glsl_shader_uniform_vec3f(glsl_shader * shader, cc8 * name, vec3f * value) {
	//s32 id = glGetUniformLocationARB(shader->opengl_id, name);
	//glUniform3fARB(id, value->x, value->y, value->z);
}

void glsl_shader_uniform_vec4f(glsl_shader * shader, cc8 * name, vec4f * value) {
	//s32 id = glGetUniformLocationARB(shader->opengl_id, name);
	//glUniform4fARB(id, value->x, value->y, value->z, value->w);
}

Binding & using the shaders on the scene does not cause the issue. It is these calls to glGetUniformLocationARB() and glUniform*() that cause the 100% CPU usage. And it doesn't happen at all on my secondary computer.

Any ideas how to fix this?

EDIT - I was reading that on certain NVIDIA drivers calling glUniform forces a recompile of the shader, though I have no confirmation of this.

Edited by Conoktra, 09 May 2012 - 11:20 PM.


#4 mhagain   Crossbones+   -  Reputation: 7833


Posted 10 May 2012 - 05:06 AM

I think the NV recompiling shaders problem is confined to older hardware/drivers - GeForce FX series perhaps.

There are a couple of things you can do with uniforms that may help here. One is that a uniform location and value are associated with the program object, so for your sampler uniforms you just need to set them once and the values will stick for subsequent uses of that program, including if you change the current program (via glUseProgram), and until such time as the program is re-linked. This works in a similar manner to glTexParameter with texture objects, if that helps you to understand what's happening here.

You can also cache the uniform locations after load (just store them in some variables) and use those cached locations for subsequent glUniform calls, which can avoid having to call glGetUniformLocation every time. You could probably wrap all of this in a nice Material class if you wanted.

I believe that with GL4.2 you can set explicit uniform locations in your shader code too, so if you want to jump up to that level as a requirement that gives you another option.
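
Something like this, for example - a minimal sketch of the caching approach, reusing your own typedefs and ARB entry points; the texture_locations array is a hypothetical field you'd add to your glsl_shader struct, and the function names are just for illustration:

void glsl_shader_cache_texture_uniforms(glsl_shader * shader) {
	/* One string lookup per uniform, done once right after glLinkProgram. */
	static cc8 * slot_names[] = {
		"texture_0",
		"texture_1",
		"texture_2",
		"texture_3",
		"texture_4",
		"texture_5",
		"texture_6",
		"texture_7",
	};
	u32 slot;
	for (slot = 0; slot < 8; ++slot) {
		/* texture_locations is a hypothetical s32[8] member added to glsl_shader. */
		shader->texture_locations[slot] =
			glGetUniformLocationARB(shader->opengl_id, slot_names[slot]);
	}
}

void glsl_shader_uniform_texture(glsl_shader * shader, u32 slot) {
	assert(slot < 8);
	/* Per-frame path: no glGetUniformLocation call, just the cached location. */
	glUniform1iARB(shader->texture_locations[slot], slot);
}

And since the sampler values stick with the program object as described above, you could even call the per-frame setter just once per slot after linking and not touch those uniforms per frame at all.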

Edited by mhagain, 10 May 2012 - 05:07 AM.



#5 Hodgman   Moderators   -  Reputation: 29567


Posted 10 May 2012 - 05:27 AM

But the moment I make a single 3D draw call per frame (using a VBO and glDrawArrays), it jumps from 1-2% CPU usage to maxing out the entire core my rendering thread is on!

I'm guessing this is measured from task manager? Does the game still run at 60Hz despite the CPU usage increase?

I think the NV recompiling shaders problem is confined to older hardware/drivers - GeForce FX series perhaps

I know that at least up until the GeForce 8, pixel shader uniforms didn't actually exist in hardware, so whenever you changed a uniform value the nVidia driver would have to generate a whole new shader program with the new uniforms hard-coded into it...

Edited by Hodgman, 10 May 2012 - 05:29 AM.


#6 mhagain   Crossbones+   -  Reputation: 7833


Posted 10 May 2012 - 07:29 AM

I know that at least up until the GeForce 8, pixel shader uniforms didn't actually exist in hardware, so whenever you changed a uniform value the nVidia driver would have to generate a whole new shader program with the new uniforms hard-coded into it...

That recent? Urffff.

I guess a workaround might be to set the uniforms on your vertex shader and pass them as varyings to your fragment shader then.



#7 Conoktra   Members   -  Reputation: 140


Posted 10 May 2012 - 02:38 PM

Thanks for the help guys :). Eliminating the per-frame shader calls didn't help much. It appears that even a single call to glUniform*() per frame causes this horrendous CPU usage. Comment out that one call and it runs fine at 1-2% CPU usage.


Where would I go to report a bug in an NVIDIA driver?

#8 mhagain   Crossbones+   -  Reputation: 7833


Posted 10 May 2012 - 03:46 PM

I wouldn't quite describe 100% usage of a CPU core as "horrendous" - despite the fact that it doesn't happen on a second machine, it may still be perfectly normal and expected behaviour.

Before you do anything else, read this: http://www.gamedev.net/topic/445787-game-loop---free-cpu/ and this: http://www.gamedev.net/topic/193322-an-empty-gameloop-takes-100-cpu-usage/ and this: http://stackoverflow.com/questions/2363206/windows-game-loop-50-cpu-on-dual-core

Then check out Hodgman's first question: are you still getting 60fps despite this? Or at least performance that is in a comparable ballpark?

If it's still something that you want to avoid, you could try putting a "Sleep (1)" call into your main loop. This is normally not a good thing for a high-performance program that needs to be greedy for resources, but maybe your program doesn't fall into that category? If you still get good and smooth performance with it, then you know that you've just got a classic busy-wait loop and it's not a bug or anything like that. If everything turns jerky and uneven then it might be time to start looking elsewhere.
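
Something along these lines, sketched with SDL's timing calls since you're already using SDL (SDL_Delay is the portable equivalent of Sleep; the function name and loop body are just placeholders):

/* Frame-capped loop that yields the CPU instead of busy-waiting. */
void run_main_loop(void) {
	const Uint32 frame_ms = 1000 / 60;   /* target roughly 60 FPS */
	int running = 1;
	while (running) {
		Uint32 start = SDL_GetTicks();

		/* ...poll events, update, render, swap buffers... */

		Uint32 elapsed = SDL_GetTicks() - start;
		if (elapsed < frame_ms) {
			/* Hand the rest of the timeslice back instead of spinning on the clock. */
			SDL_Delay(frame_ms - elapsed);
		}
	}
}

Bear in mind SDL_Delay's granularity is only as good as the OS scheduler, so the cap won't be exact - but for this test that doesn't matter.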



#9 Conoktra   Members   -  Reputation: 140


Posted 10 May 2012 - 04:17 PM

Thanks again :).

Edited by Conoktra, 10 May 2012 - 04:38 PM.


#10 Sik_the_hedgehog   Crossbones+   -  Reputation: 1608


Posted 11 May 2012 - 12:51 AM

I think the NV recompiling shaders problem is confined to older hardware/drivers - GeForce FX series perhaps

I know that at least up until the GeForce 8, pixel shader uniforms didn't actually exist in hardware, so whenever you changed a uniform value the nVidia driver would have to generate a whole new shader program with the new uniforms hard-coded into it...

OUCH.

I know of a different issue, where the driver would rebuild the shader programs if the uniforms had a value of 0.0, 0.5 or 1.0, to be able to generate optimized shader code... except it backfired since it ate up a lot of CPU time, as you can guess. I think this applied to all shaders (not just pixel ones), but I'm not sure. This was ages ago so it may have been changed by now.

#11 l0calh05t   Members   -  Reputation: 691


Posted 11 May 2012 - 02:07 AM

Sounds like a VSync issue to me. Try turning Threaded Optimization OFF in the NVIDIA control panel and see if that changes anything.



