TwoNybble

Member

  • Content Count: 11
  • Joined
  • Last visited

Community Reputation: 186 Neutral

About TwoNybble

  • Rank: Member
  1. Thanks for the reply. I suspected that it might be optimization, but I had hoped there would be more control over this behavior. I suppose that until a better solution is found, I will have to make use of one of these workarounds. It's unfortunate that OpenGL does not specify optimization and other "standard" compiler behavior for GLSL, but I guess I understand why they chose not to.
  2. Yes, this was one of the sources I used for the double emulation in GLSL, and it did/does work for me as well. As I said, it worked on Debian Jessie with Intel HD 4000 graphics, and on my current Windows 7 computer with an AMD R9 390. However, it doesn't work on certain configurations (which I'm not totally sure of yet), such as Debian Testing (Linux kernel 4) on the same exact laptop, and a Dell Windows lab computer whose hardware details I still need to look up.

     You are correct here: this is a mistake left over from a std::setprecision(17) I used to look at some doubles earlier in the code. These are indeed both single precision numbers, and any digits past the first 7 are meaningless in this case. Rest assured that both calculations are single precision, as I stated.

     However, the other calculation (the third sample, in which the GPU version is just 0) shows that the CPU variant (which is entirely single precision floats) and the GPU variant (also single precision double emulation) do not agree despite doing the same thing. I realize that float may not conform to the IEEE standard on the GPU, but I should still have 23 stored bits (24 with the implicit leading bit) of precision. It is also strange to me that this behavior changed on the same exact GPU, just with different drivers or configuration. I suppose it's possible they changed their float implementation, but that seems odd.

     I'm afraid I've explained this quite poorly. Yes, I am emulating doubles, and nowhere in the double emulation am I using doubles as you mention (except for splitting on the CPU side). I do compute a double sum of the numbers for comparison purposes (not shown), but the two implementations I showed above are both single precision double emulation. The CPU double emulation comes within two decimal digits of the "correct" double sum. The GPU double emulation is no more accurate than if I had simply cast the inputs to floats.

     "Truncate" was a poor choice of word on my part. I meant that I simply used single precision types instead of my double emulation in the GLSL on my Windows build, to see what the effects of low precision are on the working simulation.

     To summarize, double emulation (built from single precision floats) behaves as follows:
       • Windows 7 (AMD R9 390): correct behavior, nearly "real" double precision.
       • Debian Jessie (Intel HD 4000): correct behavior, nearly "real" double precision.
       • Debian Testing (Intel HD 4000): incorrect behavior, precision no better than just using floats.

     You are correct that double emulation does not quite give the same precision as real doubles. However, my two working builds have been within 2 decimal digits of the double result, which is more than enough for my purposes. I know for a fact that when the double emulation is working correctly, there is enough precision and there is no staircase effect.

     I am stumped as to why two seemingly identical implementations (GPU/CPU) are giving such significantly different results.
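     As an aside, here is a minimal sketch of the setprecision point above: printing a single precision value with 17 significant digits produces digits past its roughly 7-digit precision that carry no information. It uses the d1/d2 inputs quoted elsewhere in this thread; the standalone program itself is illustrative only.

     #include <cstdio>

     int main() {
         // Test inputs from the original post in this thread.
         double d1 = 1.20385132193021958120313214;
         double d2 = 3.13942384018421752835103287;

         float  f_sum = (float) d1 + (float) d2;  // plain single precision sum
         double d_sum = d1 + d2;                  // reference double sum

         // Digits of the float sum past roughly the 7th significant digit
         // are artifacts of printing, not real precision.
         printf("float  sum: %.17g\n", f_sum);
         printf("double sum: %.17g\n", d_sum);
         return 0;
     }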
  3. Unfortunately, there are a fair number of double emulation calls, so I'd like to stick with them if I can get the emulation working consistently. I plan on making use of hardware doubles on GPUs on which they perform well.

     The doubles are not actually used on the GPU, or at all after they are split on the CPU side. The inputs to both the CPU and GPU ds_add functions (vec2 a, vec2 b) are simply a vec2 representing the "high" and "low" parts of the original double-type variable. The splitting function looks like this:

     void ds_split(double d, float& hi, float& lo) {
         hi = (float) d;
         lo = (float) (d - hi);
     }

     However, this is performed on the CPU for both the CPU and GPU tests, so the inputs to each are as expected. For example, the double d1 is decomposed into hi and lo pieces that are used to create the vec2 a:

     double d1 = 1.20385132193021958120313214;
     float hi, lo;
     ds_split(d1, hi, lo);
     glm::vec2 a(hi, lo);

     In the above example:

     a.x = 1.2038513422012329
     a.y = -2.0271013312367359e-08

     So yes, the doubles are being transformed into 2 floats, but I don't believe this is where the problem lies. Thanks for the response!
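     For completeness, a minimal sketch for sanity-checking the split on the CPU; it just reconstructs d1 from the hi/lo pair and compares that against casting d1 straight to float. This is an illustrative standalone program, not project code.

     #include <cstdio>

     // Same split as above: decompose a double into a high/low float pair.
     static void ds_split(double d, float& hi, float& lo) {
         hi = (float) d;
         lo = (float) (d - hi);
     }

     int main() {
         double d1 = 1.20385132193021958120313214;
         float hi, lo;
         ds_split(d1, hi, lo);

         // The pair should reconstruct d1 far more closely than a single float can.
         printf("hi        = %.17g\n", (double) hi);
         printf("lo        = %.17g\n", (double) lo);
         printf("hi + lo   = %.17g\n", (double) hi + (double) lo);
         printf("(float)d1 = %.17g\n", (double) (float) d1);
         printf("error     = %.3g\n", d1 - ((double) hi + (double) lo));
         return 0;
     }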
  4. I have been working on a space-scale game that makes use of double emulation on the GPU to compute positions relative to the camera (RTE). This has worked beautifully on both my Debian (Linux) laptop with an Intel HD 4000 and on my Windows PC (AMD R9 390). However, I recently upgraded my Debian distro, which I assume also installed a newer version of the Intel graphics drivers and Mesa (13.0.2). After this upgrade, my planet rendering lost all its precision, and now the surface looks like a "staircase" instead of the nice smooth land I had before. To confirm that this is a precision issue, I truncated the floating point operations on the working Windows build to only 32 bits, and it exhibited the same "staircase" surface from lack of precision.

     I'm having difficulty figuring out where these precision issues are coming from. I am using identical code with the floating point type in GLSL, and both implementations return a floating point precision of p=23 when I query the GLSL vertex shader precision from OpenGL. I also suspected at first that it might be an optimization problem, since double emulation uses algorithms that could mathematically be simplified but must be executed step by step because of floating point rounding. However, when I set MESA_GLSL=nopt, the Mesa environment variable setting that disables GLSL optimization, I still see the same effect.

     Finally, to get to the root of the problem, I began dissecting each individual calculation on the GPU and writing the results to a 32 bit floating point texture to read back on the CPU. I implemented the same function on the CPU side, which works as I expect. However, the results on the GPU side do not line up.

     GLSL:

     vec2 ds_add(vec2 a, vec2 b) {
         float t1 = a.x + b.x;
         float e = t1 - a.x;
         float t2 = ((b.x - e) + (a.x - (t1 - e))) + a.y + b.y;
         //the above is the standard DSFUN90 Knuth algorithm.
         vec2 ret;
         ret.x = t1 + t2;
         ret.y = t2 - (ret.x - t1);
         return ret;
     }

     Sample parts:

     b.x - e = 0
     t1 - e = 1.2038513422012329
     a.x - (t1 - e) = 0

     ret.x = 4.3432750701904297
     ret.y = 0

     CPU:

     glm::vec2 ds_add(glm::vec2 a, glm::vec2 b) {
         float t1 = a.x + b.x;
         float e = t1 - a.x;
         float t2 = ((b.x - e) + (a.x - (t1 - e))) + a.y + b.y;
         glm::vec2 ret;
         ret.x = t1 + t2;
         ret.y = t2 - (ret.x - t1);
         return ret;
     }

     Sample parts:

     b.x - e = 0
     t1 - e = 1.2038512229919434 (diverges from the GPU after 7 decimal places)
     a.x - (t1 - e) = 1.1920928955078125e-07 (the small bit I expect to see on the GPU, but it only shows up in the CPU implementation)

     ret.x = 4.3432750701904297
     ret.y = 9.1924000855669874e-08 (missing from the GPU result)

     From the above sample output, the GPU is somehow not lining up with the CPU results. The inputs to this function are split identically on the CPU side, so the problem can't be there. But for completeness, the inputs for this test were:

     double d1 = 1.20385132193021958120313214;
     double d2 = 3.13942384018421752835103287;

     Those are the two numbers being summed together. The CPU implementation above is very close to the pure double addition (to within 10^-14), while the GPU implementation is only within 10^-7, no better than the pure float implementation. What is going on? Why are the newer drivers or Mesa breaking my program's past behavior?
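     For reference, a minimal sketch of the readback path described above: render the value under test into a 1x1 GL_R32F color attachment and read it back with glReadPixels. It assumes an existing GL 3.x context and a loader (GLEW here), plus a shader that writes the intermediate value to the red channel; the function name and the elided draw call are placeholders, not the actual project code.

     #include <GL/glew.h>

     float readBackSingleValue(GLuint programWritingValue)
     {
         // 1x1 single-channel 32-bit float render target.
         GLuint tex = 0, fbo = 0;
         glGenTextures(1, &tex);
         glBindTexture(GL_TEXTURE_2D, tex);
         glTexImage2D(GL_TEXTURE_2D, 0, GL_R32F, 1, 1, 0, GL_RED, GL_FLOAT, nullptr);
         glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
         glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

         glGenFramebuffers(1, &fbo);
         glBindFramebuffer(GL_FRAMEBUFFER, fbo);
         glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                                GL_TEXTURE_2D, tex, 0);

         glViewport(0, 0, 1, 1);
         glUseProgram(programWritingValue);
         // ... draw a single point / fullscreen triangle here so the fragment
         // shader runs once and writes the intermediate value ...

         float value = 0.0f;
         glReadPixels(0, 0, 1, 1, GL_RED, GL_FLOAT, &value);

         glBindFramebuffer(GL_FRAMEBUFFER, 0);
         glDeleteFramebuffers(1, &fbo);
         glDeleteTextures(1, &tex);
         return value;
     }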
  5. Thanks guys, that looks to be what the problem was! I thought 2-bytes would be fine, but I was indeed substituting an odd number of texels (the pages are (2^n)+1, but the texture is POT). I'm glad it was a simple fix.
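     For anyone who runs into the same artifacts: the fix is presumably the usual unpack-alignment change, since the default GL_UNPACK_ALIGNMENT of 4 does not divide the row size of an odd-width page of 2-byte texels. A minimal sketch of that change (the page variables are illustrative):

     glPixelStorei(GL_UNPACK_ALIGNMENT, 2);          // or 1; the default of 4 breaks odd-width rows
     glTexSubImage2D(GL_TEXTURE_2D, 0,
                     pageOffsetX, pageOffsetY,       // destination offset in the larger texture
                     pageWidth, pageHeight,          // (2^n)+1 texels per row, so rows are not 4-byte aligned
                     GL_RED, GL_UNSIGNED_SHORT,      // single-component 16-bit height data
                     pageData);
     glPixelStorei(GL_UNPACK_ALIGNMENT, 4);          // restore the default if other uploads rely on it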
  6. As part of my terrain streaming system, I page heightmap detail levels into a larger texture using the same method I use for the normal- and splat-map datasets, which both seem to work perfectly fine. However, the heightmap data gets substituted in with visible artifacts, and the substitution also seems to insert or remove an extra pixel for every three. This leaves the page unusable and skewed by about a third.

     I suspect it has to do with the unorthodox internal format of the texture; it is a single-component, 16-bit texture (GL_RED, GL_R16, GL_UNSIGNED_SHORT). The other two datasets use fairly normal formats (GL_RGB8 for normals, and GL_RGBA8 for splat values), which would explain why the problem only appears in the height data. I've double-checked my code, and I'm quite sure this is not a problem with the data or the substitution routines.

     I am currently using Intel HD 4000 integrated graphics on a Linux system with Mesa drivers, both of which I realize could cause problems for me. That would still be a dilemma if it is the root of the problem, because I'd like to be able to support Intel HD 4000 graphics for my game. I've been thinking that, worst case, I can create a "normal" format texture (GL_RGB8) and simply use both R and G for my 16-bit height values, ignoring the B component (see the sketch below). I'm just wondering if anybody has experienced this and/or has found a simple fix that I've been overlooking?

     Here are some visual examples of this problem, taken directly from the textures themselves:

     [attachment=28317:02.png] A simple checkerboard dataset, which shows the 1/3 "skewing" effect, as well as some artifacts throughout. There was no size mismatch in the data during the substitution, which I at first suspected could be the cause of the misalignment.

     [attachment=28318:01.png] The actual height test dataset, which is a simple cosine 3D surface (better seen in the following image). The data should look more or less like the next image, but instead is skewed and has a strange triangular artifact. I have no idea where that might be originating from. The data looks fine when I plot it using other software.

     [attachment=28316:03.png] This is the test splatmap, which looks great. I'm not handling it any differently besides the obvious different formatting, etc.
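     For the record, a minimal sketch of the worst-case workaround mentioned above: pack each 16-bit height into the R and G bytes of a GL_RGB8 texture (B unused) and reassemble it in the shader, e.g. height = (texel.r * 65280.0 + texel.g * 255.0) / 65535.0. The helper below is illustrative only.

     #include <cstdint>
     #include <vector>

     std::vector<uint8_t> packHeightsRGB8(const uint16_t* heights, int count)
     {
         std::vector<uint8_t> rgb(count * 3);
         for (int i = 0; i < count; ++i) {
             rgb[i * 3 + 0] = static_cast<uint8_t>(heights[i] >> 8);    // high byte -> R
             rgb[i * 3 + 1] = static_cast<uint8_t>(heights[i] & 0xFF);  // low byte  -> G
             rgb[i * 3 + 2] = 0;                                        // B ignored
         }
         return rgb;
     }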
  7. I don't know if this is just a simplification of how you really split off the logic thread or not, but judging from what I see, this is likely a really bad way of doing it. If you run the logic thread and draw objects directly from the state kept on that thread, you are going to run into a lot of problems. Variables need to be synchronized (or locked) before they can be accessed from other threads to avoid race conditions and read/write trouble.

     Unfortunately, this means that if you want to keep the logic and rendering threads separate, you will need a much better, safer way of passing data between the two. Generally this is done with a thread-safe blocking queue or buffer through which you pass information about the objects to be drawn (a minimal sketch is below). This can be fairly complex to do correctly; bad implementations can actually end up worse than single-threaded execution, always waiting on the other thread and slowing the whole thing down.

     A good place to start researching a proper solution is the Replica Island blog and source code. You can ignore all of this if you are already implementing something like that.

     Just from looking at this snippet, I'd say the batch is likely not your problem, especially since it looks like you are only putting 8 triangles in it. Still, I would suggest skipping the intermediate container and building the FloatBuffer directly. If you are completely replacing the Buffer Object each time, I would recommend using glBufferData instead of the "Sub" variant. Also, try drawing directly with the FloatBuffer instead of the Buffer Object, just to eliminate the possibility of buffer objects being slow on your particular device.

     Lastly, consider keeping your backgrounds as static geometry in a static buffer and translating each one in the vertex shader. Rebuilding the VBO each time with new coordinates can be very slow, although with only 8 triangles I doubt this is where your slowdown is coming from.

     Worst case, try a few different devices if you can, to make sure it is not simply an inefficient implementation of some OpenGL ES functionality.
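     To illustrate what "thread-safe blocking queue" means here, a minimal C++ sketch of the pattern (your project is Java/Android, so you would use the equivalent there, e.g. a BlockingQueue from java.util.concurrent); this is a sketch of the idea, not a production implementation:

     #include <condition_variable>
     #include <mutex>
     #include <queue>

     // Logic thread pushes draw commands; render thread pops them.
     template <typename T>
     class BlockingQueue {
     public:
         void push(T item) {
             {
                 std::lock_guard<std::mutex> lock(mutex_);
                 queue_.push(std::move(item));
             }
             cond_.notify_one();   // wake the consumer (render thread)
         }

         T pop() {
             std::unique_lock<std::mutex> lock(mutex_);
             cond_.wait(lock, [this] { return !queue_.empty(); });
             T item = std::move(queue_.front());
             queue_.pop();
             return item;
         }

     private:
         std::queue<T> queue_;
         std::mutex mutex_;
         std::condition_variable cond_;
     };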
  8. You never actually mentioned it, but it looks to me like you are using OpenGL ES on Android. Your "game loop" is actually the onDrawFrame callback, which the system will try to call each time the screen is refreshed. If your frames take longer than this refresh interval (usually 16.67 milliseconds, i.e. 60 FPS), they will start to queue up on the GPU, resulting in dropped frames and stuttering like what you are experiencing.

     From your data, it seems that this may be what is happening. Notice that your frames are frequently longer than 0.0167 seconds, and that this error accumulates, several times exceeding 0.03 seconds. When the onDrawFrame callback gets that far behind, you're going to see some stuttering as the system tries to play catch-up. I suspect that you may have an inefficient implementation behind your "batch" object, or simply are not using a very powerful test device (if you're using an emulator, try a physical device to make sure this is even an issue in reality). A simple way to verify is to measure each frame's duration against the 16.67 ms budget, as in the sketch below.

     If you are sure that you are running as efficiently as possible on capable hardware, you may want to look into separating your update and rendering logic into their own threads. In that scheme, you can be updating the game state at the same time as you are rendering the previous frame, which can be much more efficient on Android devices, especially those with multiple cores.
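     A minimal sketch of that measurement, i.e. checking each frame's duration against the roughly 16.67 ms budget (shown in C++ for brevity; on Android the same measurement works with System.nanoTime() inside onDrawFrame). The loop body is a placeholder.

     #include <chrono>
     #include <cstdio>

     int main() {
         using clock = std::chrono::steady_clock;
         const double budget = 1.0 / 60.0;   // ~0.0167 s per frame at 60 FPS
         auto last = clock::now();

         for (int frame = 0; frame < 300; ++frame) {
             // ... update + render for one frame would happen here ...
             auto now = clock::now();
             double dt = std::chrono::duration<double>(now - last).count();
             last = now;
             if (dt > budget)
                 printf("frame %d took %.4f s (%.4f s over budget)\n",
                        frame, dt, dt - budget);
         }
         return 0;
     }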
  9. You could cast your epoch time to std::time_t and then use std::localtime or std::gmtime, which return a pointer to a std::tm struct.
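     For example, a minimal sketch (the epoch value is illustrative):

     #include <ctime>
     #include <cstdio>

     int main() {
         long long epochSeconds = 1700000000;              // epoch time in seconds (illustrative)
         std::time_t t = static_cast<std::time_t>(epochSeconds);

         std::tm* local = std::localtime(&t);              // broken-down local time
         // std::tm* utc = std::gmtime(&t);                // or UTC instead

         char buf[64];
         std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", local);
         printf("%s\n", buf);
         return 0;
     }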
  10. Making a tool for the engine

      I've experimented with adding a GUI to my map editor using Qt and a few other solutions that work with OpenGL. However, in the end I decided to stick with my engine and build the GUI myself, which was good enough for my needs.

      Most game engines will need some sort of interactive menu system, even if it is just a title menu or HUD. If your engine already has such a system in place, it should be easy enough to extend for this purpose. It may even give you a chance to create a more capable GUI that you can use in the game itself.

      If you do not have any type of menu system already in place, it may be worth starting here. It is not as difficult as it sounds for most simple purposes, and it may be a useful addition to your engine. It can be accomplished by rendering a 2D orthographic layer on top of your viewport (a minimal sketch of this is below).

      A homebrew GUI system obviously might not be good enough if you need a lot of complex features and want to avoid reinventing the wheel. Libraries like Qt can replace your windowing system (GLUT, GLFW, etc.) and provide you with a window for rendering OpenGL graphics. In this case, it should not affect the "features and stuff" you already have implemented in your engine, although you may need to add some listeners to receive menu events. This is a good place to start for an OpenGL window in Qt.

      Like Promit said, it's a big topic, and there are many different approaches for different situations. It seems that you are worried about it affecting the features already in your engine, but I think they should be fine in any case as long as they are reasonably well separated.
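      For example, a minimal sketch of the 2D orthographic overlay idea, assuming glm and a GUI shader with a projection uniform (the function name is illustrative):

      #include <glm/glm.hpp>
      #include <glm/gtc/matrix_transform.hpp>

      // Map GUI coordinates 1:1 to pixels, origin at the top-left corner.
      glm::mat4 uiProjection(int viewportWidth, int viewportHeight)
      {
          return glm::ortho(0.0f, (float) viewportWidth,
                            (float) viewportHeight, 0.0f,
                            -1.0f, 1.0f);
      }

      // Per-frame pass order (in comments, since it depends on your engine):
      //   1. draw the 3D scene with the normal perspective projection
      //   2. glDisable(GL_DEPTH_TEST) so the GUI draws on top
      //   3. upload uiProjection(w, h) to the GUI shader's projection uniform
      //   4. draw the quads/text for the menu or HUD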
  11. I've been working on a map editor for my 3D game written in C++/OpenGL. Up until now, I haven't been too concerned about having super elegant code in the editor, since it will mostly be used internally by me. But now that I've started adding new tools and brushes, I'm realizing that a lot of the code could be refactored so that I'm not constantly rewriting the same boilerplate. Normally this would not be a problem, but I have a unique set of constraints that makes this more complicated and frustrating.

      The map data consists of a single-component floating point array of elevation values and a 4-component byte array of splatmap texture weightings. I am using an LOD scheme similar to CDLOD that keeps both of these arrays as textures on the GPU, as well as locally in memory (I know it would be more efficient to stream the data in; that's what I do in the actual game). Because these textures can be quite large, I need to create a temporary array of the modified region to glTexSubImage2D into the texture.

      With that in mind, the "brushes" are where I am running into code reuse issues. The brushes are the tools used to modify the heightmap (elevate, etc.) or "paint" on the splatmap. Depending on which, a brush needs to modify and update one of the aforementioned arrays, both locally and on the GPU. The brushes can also have different shapes, for example rectangular or circular. The loops and calculations for the rectangular or circular areas should be reusable, as should the update code.

      A solution I've been trying to implement is a "BrushSelection" abstract class, from which I derive RectangularSelection and CircularSelection classes. These have an apply function that accepts a Brush class defining how it operates on the data (size of selection, heights or splatmap, etc.). This function loops through the selected region and calls a Brush function to apply it to the dataset. This has been plagued with problems, however, such as trying to build a temporary array to sub into the GPU.
      This may be clearer in my simplified attempted code:

      //Brush.h
      class Brush {
      public:
          bool paint;
          unsigned int size;

          Brush(bool paint, unsigned int size);
          virtual void apply(TerrData* td, float delta, unsigned int x, unsigned int y,
                             float cur_x, float cur_y) const = 0;
      };

      class ElevateBrush : public Brush {
      public:
          ElevateBrush();
          void apply(TerrData* td, float delta, unsigned int x, unsigned int y,
                     float cur_x, float cur_y) const;
      };

      class BrushSelection {
      public:
          virtual void applyBrush(const Brush* brush, TerrData* td, float delta,
                                  float x, float y) = 0;
      };

      class RectSelection : public BrushSelection {
      public:
          void applyBrush(const Brush* brush, TerrData* td, float delta, float x, float y);
      };

      //Brush.cpp
      void RectSelection::applyBrush(const Brush* brush, TerrData* td, float delta, float x, float y)
      {
          unsigned int start_x = (unsigned int) std::max(0.f, x - brush->size);
          unsigned int start_y = (unsigned int) std::max(0.f, y - brush->size);
          unsigned int end_x = (unsigned int) std::min(td->dimension - 1.f, x + brush->size);
          unsigned int end_y = (unsigned int) std::min(td->dimension - 1.f, y + brush->size);

          int width = end_x - start_x;
          int height = end_y - start_y;

          GLfloat* temp_ht;
          GLubyte* temp_sm;
          if(brush->paint) {
              temp_sm = new GLubyte[width * height * 4];
          } else {
              temp_ht = new GLfloat[width * height];
          }

          for(int j = 0; j < height; j++) {
              for(int i = 0; i < width; i++) {
                  brush->apply(td, delta, i + start_x, j + start_y, x, y);

                  unsigned int data_idx = (start_x + i) + (start_y + j) * td->dimension;
                  unsigned int idx = i + j * width;
                  if(brush->paint) {
                      temp_sm[idx * 4 + 0] = td->splatmap_0[data_idx * 4 + 0];
                      temp_sm[idx * 4 + 1] = td->splatmap_0[data_idx * 4 + 1];
                      temp_sm[idx * 4 + 2] = td->splatmap_0[data_idx * 4 + 2];
                      temp_sm[idx * 4 + 3] = td->splatmap_0[data_idx * 4 + 3];
                  } else {
                      temp_ht[idx] = td->heights[data_idx];
                  }
              }
          }

          if(brush->paint) {
              //sub temp_sm into respective texture
              delete [] temp_sm;
          } else {
              //sub temp_ht into respective texture
              delete [] temp_ht;
          }
      }

      TerrData is a struct containing the two data arrays, the two textures, and the side dimension of these 2D arrays. I'm not sure if this is a good way to pass this data around, or whether I should pass the temporary array instead; that of course runs into the problem of having separate functions for height values and splatmap values.

      struct TerrData {
          GLfloat* heights;
          GLubyte* splatmap_0;
          Texture* tex_heights;
          Texture* tex_splatmap_0;
          unsigned int dimension;
      };

      The problem with this approach is all of the code that has to be repeated in each BrushSelection subclass for substituting into the texture and handling the temporary dataset. The repeated boolean check on whether it is a splatmap brush or a heightmap brush also seems unnecessary to me. I keep thinking there MUST be a more elegant way to handle all of this, but I've been stuck trying and failing with new solutions. What is the best way to reuse both the temporary data upload and the selection code across different brushes and tools?