Member Since 28 Feb 2011

#5163912 very strange bug (when runing c basic arrays code)

Posted by Bacterius on 30 June 2014 - 01:50 PM

Okay. Not wanting to learn to use a debugger is the straw that broke the camel's back. I will not be replying to any of your threads in the future, fir - I've been patient, but I'm just about through with you. If other people somehow still want to help you after reading through your recent threads, it is their time to waste. I am not wishing you good luck, for you obviously will not need it, being too cool for debuggers and, in fact, anyone's advice that goes against your preconceptions. Do not bother replying to this post.

#5163809 very strange bug (when runing c basic arrays code)

Posted by Bacterius on 30 June 2014 - 06:15 AM

Check that your loop limit isn't overflowing (the C standard only guarantees that int can hold values from -32767 to +32767, and you did say you were using 32-bit XP in a previous thread). Set compiler warnings to maximum. Print the loop counter every iteration. Run through the code with a debugger and see which iteration fails. Is the bug consistent, does it always crash at the same place? If it sometimes succeeds, does it print the right answer or just garbage? What happens if you decrease the number of iterations? You know, the usual stuff. There's nothing wrong that I can spot with the code except the potential for overflow, and, indeed, it works just fine for me.
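For illustration, here is a minimal sketch of the first and third checks. The original code from the thread is not shown here, so the function names `limit_fits_in_int` and `checked_sum` and the summing loop body are hypothetical stand-ins, not the poster's program.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if `limit` can be stored in an int
 * on this platform, 0 if it would overflow. */
int limit_fits_in_int(long long limit)
{
    return limit >= INT_MIN && limit <= INT_MAX;
}

/* Hypothetical stand-in for the loop under test: sums 0..limit-1 and
 * prints the counter on every iteration, so a crash or hang can be
 * pinned to a specific iteration. stderr is used because it is not
 * buffered, so the last line printed is the last iteration reached. */
long long checked_sum(long long limit)
{
    long long total = 0;
    long long i;

    for (i = 0; i < limit; i++) {
        fprintf(stderr, "iteration %lld\n", i);
        total += i;
    }
    return total;
}
```

Printing every iteration is slow for large limits, so it is a diagnostic to remove again once the failing iteration is known.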


By the way, there is a difference between "doesn't crash" and "prints the right answer" - differentiating the two in your diagnostics usually helps. And also, please try to avoid tagging your thread "C language" when you are really compiling with a C++ compiler. The two languages are different and go by (often subtly) different rules - you will get into trouble eventually thinking they are interchangeable. Make up your mind on a language, be it C, C++, or C with classes, but please don't say you are using one language and then compile your code as another, that's just misleading for everyone involved.

#5163735 Can someone help me write a program?

Posted by Bacterius on 29 June 2014 - 06:07 PM

If you use Python and the terminal you can use the colorama library for coloring the different letters, which also happens to be cross-platform. For C# this functionality is built into the Console class, for C and C++ you can use rlutil, there's probably a library for it in Java, etc... If you are using a graphical window, color should be very easy to add. Hopefully this helps you set up a prototype. Remember to start small: first hardcode the colors for each letter, then think about how you could make them editable, and keep iterating.

#5163714 SDL.h: no such directory found

Posted by Bacterius on 29 June 2014 - 03:58 PM

Guys thanks I fixed the issue


Care to share, just so this thread isn't totally useless and other people coming across it in the future can hope to fix their problem too?

#5163601 Need more than 1 value to unpack?

Posted by Bacterius on 29 June 2014 - 06:23 AM

You need to delete the "python test11.py etc.." line from your script, since it doesn't belong there.

#5163443 decompositing and recompositing color (pixel)

Posted by Bacterius on 28 June 2014 - 08:15 AM



color = (red <<16) + (green<<8) + blue;

this strikes me both as an ugly and probably inefficient..

Yeah, me too...
You should really be using bitwise-or:
color = (red<<16) | (green<<8) | blue;
There we go -- much better.


why much? iznt add one cycle and well optymized?



No, there is no time difference: addition and bitwise OR take the same time on most hardware (see the Intel docs on throughput/latency of both instructions, they are identical and quite fast indeed). On (very) old hardware bitwise OR could even be slightly faster since you don't need to carry bits, but good luck measuring that. There is also no difference in the result as long as red, green and blue are no larger than a byte, because the shifted fields then never overlap and no carries occur. But bitwise OR is slightly more readable, because when packing bytes into a single word you are not really doing any addition in the usual sense, you're just.. packing bits. So in this sense bitwise OR is better than addition, not that it matters much (both will give wrong answers if red, green and blue are wider than 8 bits anyway).
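A minimal sketch of both variants, plus the reverse operation, assuming 8-bit channels stored in a 32-bit word as 0x00RRGGBB. The function names are made up for this example and the fixed-width types are my choice, not something from the thread.

```c
#include <stdint.h>

/* Packs three 8-bit channels into 0x00RRGGBB using bitwise OR. */
uint32_t pack_rgb_or(uint8_t red, uint8_t green, uint8_t blue)
{
    return ((uint32_t)red << 16) | ((uint32_t)green << 8) | (uint32_t)blue;
}

/* Same packing with addition. The result is identical because the
 * shifted fields never overlap, so no carry can occur. */
uint32_t pack_rgb_add(uint8_t red, uint8_t green, uint8_t blue)
{
    return ((uint32_t)red << 16) + ((uint32_t)green << 8) + (uint32_t)blue;
}

/* Splits 0x00RRGGBB back into its three channels. */
void unpack_rgb(uint32_t color, uint8_t *red, uint8_t *green, uint8_t *blue)
{
    *red   = (uint8_t)((color >> 16) & 0xFF);
    *green = (uint8_t)((color >> 8) & 0xFF);
    *blue  = (uint8_t)(color & 0xFF);
}
```

Taking `uint8_t` parameters makes the "no larger than a byte" precondition part of the interface instead of something the caller has to remember.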


And who knows? If you write it with | instead of + a compiler might actually recognize what you're trying to do and use a special CPU instruction that can pack bytes very quickly (unlikely, but perhaps on DSPs - digital signal processors). When writing C or C++ code, you're talking to the compiler, not the CPU. Without resorting to manually written assembly, your code will be going through the compiler, so if your goal is to make your code fast, you had better make your code as clear and to-the-point as possible, so that the compiler can understand your intent better (and, yes, it does try - compilers have many heuristics that recognize common code patterns). Writing convoluted code will just cause the compiler to give up and emit suboptimal code. As an added bonus, compiler-friendly code is also often human-friendly code. Yes, there are exceptions: in some cases you can produce faster code by writing code a certain way in bottleneck situations, and intrinsics are a nice middle ground between standard code and full-on assembly which can boost performance immensely if you use them just right. But to be blunt, going over your snippets in your various threads, you are really not at this stage yet.


How can you claim with a straight face that you've properly profiled your code "100x more" and identified likely bottlenecks when you are still questioning in this very thread whether bitwise OR is less "optymized" than addition? You keep getting tons of very useful advice that you really should follow, but you keep brushing it off as "propaganda" as if you were too good for it. It's getting very repetitive. If you think you know better, why are you asking for advice? If you are not looking for help, why are you making threads?


My final advice to you is: get off your high horse and face the possibility that you actually might not know everything (or anything) about optimization. Then try and modify your code and see what changes in the resulting assembly to learn what your compiler does and does not do. Read up a bit on how CPU hardware works, and get familiar with at least the basics of your own architecture (probably x86 Pentium 3 or Core 2). Find existing C/C++ code on github or whatever. There have to be dozens of software rasterizers online - you could study a few and see how they implemented various parts of their pipeline. Learn from other people's code, compare it to yours. It is hard work, yes. But asking vague questions on a forum unfortunately only gets you so far - to learn to write fast code, you must work at it. There's no secret. If you don't want to take this advice, your loss. I will have only wasted 15 minutes writing it.

#5163126 Need sugestion to optimize realtime bitblt->StretchBlt (scaling takes to...

Posted by Bacterius on 26 June 2014 - 06:32 PM

Scaling is expensive. This is why, for instance, Fraps only offers fullscreen or halfscreen resolution when it captures the screen, so that scaling is either unnecessary or very easy. It's just too costly to handle cases where you're not scaling down by a power-of-two factor, because you have to handle filtering of multiple overlapping pixels, which also blows your cache because the resulting memory access patterns are.. suboptimal to say the least. There is one thing you can try, which drastically helped me back when I was on my old laptop and trying to capture stuff, which is to set your screen's color depth to 16 bits.


Otherwise, what I would try is instead get a copy of the screen in a GPU-bound texture, using your favorite graphics API, scale it using the pixel shader (which ought to be fast enough, even with bilinear filtering) and read it back on the CPU. But this might incur significant capture latency, and might not be feasible, and you said you don't want to do this anyway, so...


I'm not sure what your specific needs are, but have you considered simply getting one of those screen recording devices? They don't have any system overhead and you might be able to get one to feed back into the computer to deliver the screen contents in realtime.

#5162966 Using 3rd party libraries or code your own?

Posted by Bacterius on 26 June 2014 - 02:46 AM

On big-endian architectures you can actually just do the final comparison for 4/8 byte chunks just as you would for a byte-by-byte approach (of course, most consumer hardware is little-endian). I would also imagine that unless your buffers are in cache to begin with, your implementation is likely going to be memory-bound. In short, memcmp is not the first function I would look at when micro-optimizing, compared to, say, memory allocation patterns, excessive use of virtual functions, poor cache usage, etc... but, yes, it will probably make you a better programmer to really understand how it is typically implemented on a variety of architectures.
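As a rough sketch of the chunked approach (not how any particular C library implements memcmp), the hypothetical `chunked_memcmp` below skips identical 8-byte chunks and then resolves the first differing chunk byte by byte. On a big-endian machine that fallback could be replaced by comparing the two loaded words directly, since the first byte in memory is then the most significant one. This version stays endianness-neutral.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: same contract as memcmp, returning a
 * negative, zero, or positive value. */
int chunked_memcmp(const void *lhs, const void *rhs, size_t n)
{
    const unsigned char *a = (const unsigned char *)lhs;
    const unsigned char *b = (const unsigned char *)rhs;
    size_t i = 0;

    /* Skip over 8-byte chunks that are identical. memcpy is used for
     * the loads so that unaligned buffers are handled safely. */
    while (n - i >= sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, a + i, sizeof wa);
        memcpy(&wb, b + i, sizeof wb);
        if (wa != wb)
            break; /* the first differing byte lies inside this chunk */
        i += sizeof(uint64_t);
    }

    /* Resolve the differing chunk (or the short tail) byte by byte. */
    for (; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}
```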

#5162522 Why does this matrix multiplication order matter?

Posted by Bacterius on 24 June 2014 - 06:34 AM

in words a column vector time M time V time P equals to row vector time transpose(M time V time P)



That is almost true (up to transposition, i.e. you'll get the same result, just as either a row or column vector, obviously, and you must know that a column vector is to be multiplied from the right i.e. matrix * column vector, not from the left like a row vector is). I get what you are saying. But it is not equivalent to your following statement that



P*(V*(M*p)) = p*P*V*M

and this means

P*(V*(M*p)) =!= P*V*M*p



Which is what we are desperately trying to tell you! What you really want to say is that if \(p\) is the vector in column notation, and \(p^T\) is the vector in row notation, where the \(^T\) indicates matrix transposition, then:


\( \left ( P \cdot V \cdot M \cdot p \right )^T = p^T \cdot \left ( P \cdot V \cdot M \right )^T \)


Which is trivially true from the properties of the matrix transpose. But note this is not at all what you end up saying... and what people have been trying to tell you. Now if you'd actually stopped and read what people were saying, you might have noticed your equations were missing the transpose bit that makes them true (or even just meaningful). I get that column/row vector semantics don't mean much to most programmers, but that doesn't make it right - factually, your equations are incorrect and misleading, and the second one (literally) contradicts associativity. I hope you understand now that clearly distinguishing between row and column vectors is essential or you can easily arrive at major contradictions (here, to be clear, the mistake is that you use "p" for both the row and column vector).
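To spell out the property being used: the transpose of a product reverses the order of the factors, \( (A \cdot B)^T = B^T \cdot A^T \). Applying it repeatedly to the expression above gives:

\( \left ( P \cdot V \cdot M \cdot p \right )^T = p^T \cdot M^T \cdot V^T \cdot P^T = p^T \cdot \left ( P \cdot V \cdot M \right )^T \)

So the row-vector form uses the transposed matrices in reverse order, which is why dropping the \(^T\) changes the meaning of the equation.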


And finally, you will notice this is clearly not the problem polyfrag is having, as has been explained on page one of this thread.

#5162453 Why does this matrix multiplication order matter?

Posted by Bacterius on 23 June 2014 - 08:07 PM

JohnnyCode, it's not that people are angry at you specifically, it's just that what you are saying is factually wrong. Matrix multiplication (and that includes vector multiplication because vectors are also matrices) is associative. In other words:


projection * (view * (model * position))


is the same as


projection * view * model * position


and is the same as


((projection * view) * model) * position


but it is (generally) not the same as multiplying them in another order, e.g. view * model * projection * position. This is what order means. Order does not mean the order of the multiplications within the brackets, which is irrelevant by associativity, it means whether you are doing left or right multiplication of the different pairs of matrices involved.


PERIOD. If you still deny this then you seriously need to open a linear algebra textbook (or even look it up online, or even do the math yourself to see they are always the same) because your knowledge of matrices and vectors is wrong and you are not helping the OP by giving incorrect information all over this thread (in addition to derailing it).
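For anyone who would rather check this numerically than by hand, here is a small sketch with 2x2 integer matrices (integers so the comparison is exact, with no floating-point rounding). The types `Mat` and `Vec` and the function names are made up for this example.

```c
#define N 2

typedef struct { int m[N][N]; } Mat;
typedef struct { int v[N]; } Vec;

/* Returns the matrix product a * b. */
Mat mat_mul(Mat a, Mat b)
{
    Mat out;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            out.m[i][j] = 0;
            for (int k = 0; k < N; k++)
                out.m[i][j] += a.m[i][k] * b.m[k][j];
        }
    }
    return out;
}

/* Returns a * p, treating p as a column vector. */
Vec mat_vec(Mat a, Vec p)
{
    Vec out;
    for (int i = 0; i < N; i++) {
        out.v[i] = 0;
        for (int k = 0; k < N; k++)
            out.v[i] += a.m[i][k] * p.v[k];
    }
    return out;
}

/* Returns 1 if both vectors have identical components, 0 otherwise. */
int vec_equal(Vec x, Vec y)
{
    for (int i = 0; i < N; i++) {
        if (x.v[i] != y.v[i])
            return 0;
    }
    return 1;
}
```

With projection = {{1,1},{0,1}}, view = {{1,0},{1,1}}, model = {{2,0},{0,1}} and position = (1,1), both `mat_vec(P, mat_vec(V, mat_vec(M, p)))` and `mat_vec(mat_mul(mat_mul(P, V), M), p)` give (5,3), while the reordered product view * model * projection * position gives (4,5).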




There are so many posts of not only yours full of bolocks now that I refuse to deal with this anymore.


Good, us too. Now go learn how matrix multiplication works before posting about it again, please.


Let's get back to the actual thread now, please.

#5162311 -march=pentium3 -mtune=generic -mfpmath=both ?

Posted by Bacterius on 23 June 2014 - 06:27 AM

by query performance counter ...


Uhh, you're going to have tons of latent cache effects if you just time random portions of code. Just use a sampling profiler, it will tell you exactly which functions your program spends the most time in by directly sampling the instruction pointer, minus the voodoo and uncertainty. You cannot guess performance by eyeballing how many multiplications you're doing or how many variables you're using in your code, hardware doesn't work that way anymore (though perhaps it still might for you, I don't know what you're running on...)




i was doubting if it will help as those operation were not complex


Then stop doubting - profile. With a real profiler, not a microbenchmark. Main thing is a profiler does a better job isolating actual realistic function runtimes, timing alone is very dependent on context (recently executed instructions and so on) so any minuscule gain you observe is usually illusory (or a cognitive bias) and will likely disappear the next time you refactor some code, or even reboot.




it took me probably about 3 hours of hard work of moving those variables in text editor :C


You spent three hours renaming and moving variables around? I hope that is just the forum acting up because, no offense, but the indentation is completely incoherent.

#5162218 Why does this matrix multiplication order matter?

Posted by Bacterius on 22 June 2014 - 07:48 PM

You mean if I concatenate the projection, view, and model matrix on the CPU side once before sending it to the shader?


It shouldn't make a difference (in terms of outcome) though it would probably be faster since you'd only be doing one matrix multiplication per shader invocation instead of three. But it might work around the bug.




All my other (4 or so) shaders are working with the first method without the bug while the other 3 must be written without brackets as said.


Are they written exactly the same? How are the matrix variables (projection, model, etc..) defined?


I've noted strange bugs in GLSL compilers before (including ambiguous and sometimes plain contradictory behaviour between two or more compilers) but, well, matrix multiplication is supposed to be associative. So either there's something in how the matrices are defined that is wrong but used to work by accident before, or (more likely) it's a compiler bug...


yes, first formula is more effective since you transform vector 3 times by a matrix, while second formula performs 2 matrix multiplications and than transforms a vector. I stress again that second formula performs reverse order of transformations, and it results in the same thing since matricies are (diagnosticly) transposed.


Matrix multiplication is associative. For any three matrices A, B, C, of any dimension (where multiplication is defined), A * (B * C) = (A * B) * C. There should be no difference in the result. The order of transformations is the same, this is not about commutativity.

#5162073 Best way to remove a substring from a C string?

Posted by Bacterius on 22 June 2014 - 06:33 AM


Also if you implement it manually make sure you validate your bounds, that is:

if ((b < a) || (b > strlen(str)))    /* abort! */
Otherwise you're just setting yourself up for scribbling all over your stack or heap.
Shouldn't the sanity checks be like:
len = strlen(str);
if ( (!len) || (b <= a) || (a >= len) ) {... abort}
if (b > len) b = len;



Depends how you want to use your function. I personally prefer to not allow nonsensical input at all, thus if b is beyond the string, I would reject it. If you prefer to clamp it to the string's length instead like some string functions do, that's fine too, as long as you document that behaviour. If you don't want to allow a == b, that's fine as well. And of course you should reject null char* pointers, forgot that one (though it is obvious). Notice that !((b < a) || (b > len)) implies len > 0 (or a = b = 0) since a and b are unsigned.

#5162058 Best way to remove a substring from a C string?

Posted by Bacterius on 22 June 2014 - 05:43 AM

Also if you implement it manually make sure you validate your bounds, that is:

if ((b < a) || (b > strlen(str)))
    /* abort! */

Otherwise you're just setting yourself up for scribbling all over your stack or heap.
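A minimal sketch of a manual implementation with that check in place, assuming the substring is given as a half-open index range [a, b). The function name `remove_range` and the convention of returning -1 on bad input are my own choices for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes the characters at indices [a, b) from
 * str in place. Returns 0 on success, or -1 (leaving str untouched)
 * if str is NULL or the range is not valid for the string. */
int remove_range(char *str, size_t a, size_t b)
{
    size_t len;

    if (str == NULL)
        return -1;

    len = strlen(str);
    if ((b < a) || (b > len))
        return -1; /* reject nonsensical input instead of clamping */

    /* Shift the tail left over the removed range. The "+ 1" moves the
     * terminating '\0' along with it. memmove is required because the
     * source and destination overlap. */
    memmove(str + a, str + b, len - b + 1);
    return 0;
}
```

For example, calling `remove_range(s, 1, 3)` on a buffer holding "abcdef" leaves "adef" in it.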

#5162038 Why do we need to iterate data structures as they are already in RAM

Posted by Bacterius on 22 June 2014 - 04:31 AM


Iteration means repeating something, usually refers to executing the body of a loop repeatedly (because it is a loop). It doesn't mean executing an algorithm. And a map data structure doesn't need to be implemented as a tree.

A loop is an algorithm too. Not very complex but it describes how something has to be done.


I dont said that the map is implemented in a tree. But even the traversal of a tree is done in steps that are done the same way in each step, so it is an iteration anyways you use a loop inside or recursive call to the traversal function.

Maps maybe implemented as hashtables. But even there you only get a good guess where to start the search, that afterwards is iterated through a collection of data to find the right one.



But it is not true. There are data structures which can perform constant time random access, notably arrays as mentioned earlier in the thread. To look up an element in the array from its position (index), you take the base address of the array, add to it the index of the element to be accessed multiplied by the element size in bytes, and that gives you the address of the element you are looking for, which you can then read. Directly. If you want to argue that this process consists of multiple steps and is implemented in assembly as a sequence of a handful of instructions and is thus a form of "iteration", you are free to do so, and you will have successfully redefined iteration to something different than what it means to every other programmer on the planet, and nobody will understand you.
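The address computation described above can be written out explicitly. The helper name `element_address` is hypothetical, and it does by hand what the expression `&array[index]` already does for you.

```c
#include <stddef.h>

/* Returns the address of element `index` in an array that starts at
 * `base` and whose elements are `element_size` bytes each:
 * base + index * element_size. No loop, no search. */
const void *element_address(const void *base, size_t index, size_t element_size)
{
    return (const unsigned char *)base + index * element_size;
}
```

For an `int arr[5]`, `element_address(arr, 3, sizeof arr[0])` yields the same pointer as `&arr[3]`, regardless of how long the array is.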