
Ohforf sake

  1. I'm amazed no one else has pointed this out yet, but does your girlfriend want to drive manual? I prefer manual (my only automatic experience was a rental in America) and I love to drive, but I wouldn't buy a car that my girlfriend refuses to use.   Heh, if only you could explain that to the insurance company. In my car, the ABS has this issue that once it kicks in, it won't let go unless you release the brake. Which means that if you brake over a pothole or railway track, you lose most of your braking force until you gain the mental presence to let go of the brake and hit it again. Which is exactly what ABS is supposed to prevent. It's a VW, btw...
  2. Hi everyone,   I have a question concerning the "proper" way of giving credit for CC-BY licensed work.   I would like to use a couple of 3D models in academic research and publish those results (including the files, which might require slight modifications). Some of those are licensed under CC-BY, which requires giving proper credit to the author. In addition, all sources in publications must be properly cited anyway, so the same issue also holds for e.g. CC-Zero.   My problem is that the files only contain the internet alias of the respective authors. The CC best practices guide seems to indicate that the nickname in combination with a link to the person's website is sufficient. This feels very wrong to me: The idea of giving credit is to, well, give credit to a person. A nickname usually only gives you anonymity. In addition, it is very uncommon in academia to cite a person's work by referring to the person's internet nickname.   How do you handle this in academic or non-academic projects? Do you use the nicknames? Do you email each and every author and ask them for their preferred handle? Do you have a special CC credits section on your project website that you refer to (this is mentioned very briefly in the best practices guide)?   Any ideas or advice is appreciated.
  3. GCC auto vectorizer alignment issues

    Thank you all for the feedback. ISPC looks interesting, but sadly the code is part of an elaborate template mechanism right now, so ISPC isn't really an option there. Still, it looks like a tool worth keeping in the toolbox. I was hoping that auto vectorization had progressed further after seeing some pretty impressive vectorizations for ARM NEON. But given how fragile it is, in your experience as well, I guess I'll go back to intrinsics.
  4. Hi everyone, I'm having a hard time getting the GCC auto vectorizer to auto vectorize. I believe that the problem has to do with its ability to figure out the stride/alignment of pointers. Consider the following minimal (not) working example:

```cpp
void func(const float *src, float *dst, const float *factors)
{
    const float * __restrict__ alignedSrc = (const float *)__builtin_assume_aligned(src, 32);
    float * __restrict__ alignedDst = (float *)__builtin_assume_aligned(dst, 32);
    const float * __restrict__ unaliasedFactors = factors;

    enum { NUM_OUTER = 4, NUM_INNER = 32 };

    for (unsigned k = 0; k < NUM_OUTER; k++) {
        const float factor = unaliasedFactors[k];
        const float * __restrict__ srcChunk = alignedSrc + k * NUM_INNER;
        float * __restrict__ dstChunk = alignedDst + k * NUM_INNER;
        for (int j = 0; j < NUM_INNER; j++)
            dstChunk[j] = srcChunk[j] * factor;
    }
}
```

It is two nested loops, sequentially looping over an array of size 32*4. It gets four factors and multiplies the first 32 elements by the first factor, the next 32 elements by the second, and so on. Results are stored sequentially in an output array. Now, I use "__builtin_assume_aligned" and "__restrict__" to tell the compiler that the arrays are 32 byte aligned and not aliased. This should be prime meat for a vectorizer.
Sadly, the output looks like this (compiled with -march=native -ffast-math -std=c++14 -O3 on gcc 4.9.2; raw hex bytes, alignment nops, and the repetitive middle of the listing trimmed for readability):

```
0000000000000000 <_ZN2ml3mlp4funcEPKfPfS2_>:
   0:	lea    0x8(%rsp),%r10
   5:	and    $0xffffffffffffffe0,%rsp
   9:	mov    %rsi,%r8
   ...
  3d:	mov    %rcx,%rax                   # compute src alignment at runtime
  40:	and    $0x1f,%eax
  43:	shr    $0x2,%rax
  47:	neg    %rax
  4a:	and    $0x7,%eax
  4d:	je     240 <...+0x240>
  53:	vmulss (%rcx),%xmm0,%xmm1          # scalar peel, one element at a time
  57:	vmovss %xmm1,(%r8)
   ...
  fa:	vbroadcastss %xmm0,%ymm1           # the actual vectorized core
  106:	vmulps 0x0(%r13),%ymm1,%ymm2
  10c:	vmovups %ymm2,(%rax)
  110:	vmulps 0x20(%r13),%ymm1,%ymm2
  116:	vmovups %ymm2,0x20(%rax)
  11b:	vmulps 0x40(%r13),%ymm1,%ymm2
  121:	vmovups %ymm2,0x40(%rax)
  12c:	vmulps 0x60(%r13),%ymm1,%ymm1
  132:	vmovups %ymm1,0x60(%rax)
   ...
  14a:	movslq %eax,%r11                   # scalar remainder, unrolled
  151:	vmulss (%rdi,%r11,1),%xmm0,%xmm1
  157:	vmovss %xmm1,(%rsi,%r11,1)
   ...
  22c:	vzeroupper
   ...
  23d:	retq
  240:	mov    $0x20,%r12d
   ...                                    # ~70 more instructions of peel/tail
   ...                                    # bookkeeping at various jump targets
```

There is some vectorization happening there, but most of the code is scalar and looks like some kind of Duff's device.
I played around with this and found out that the following "hint" produces the output that I want:

```cpp
void func(const float *src, float *dst, const float *factors)
{
    const float * __restrict__ alignedSrc = (const float *)__builtin_assume_aligned(src, 32);
    float * __restrict__ alignedDst = (float *)__builtin_assume_aligned(dst, 32);
    const float * __restrict__ unaliasedFactors = factors;

    enum { NUM_OUTER = 4, NUM_INNER = 32 };

    for (unsigned k = 0; k < NUM_OUTER; k++) {
        const float factor = unaliasedFactors[k];
        const float * __restrict__ srcChunk = alignedSrc + k * NUM_INNER;
        float * __restrict__ dstChunk = alignedDst + k * NUM_INNER;
        // <HINT>
        if (NUM_INNER % 8 == 0) { // the gcc tree vectorizer won't recognize this on its own?!?
            srcChunk = (const float *)__builtin_assume_aligned(srcChunk, 32);
            dstChunk = (float *)__builtin_assume_aligned(dstChunk, 32);
        }
        // </HINT>
        for (int j = 0; j < NUM_INNER; j++)
            dstChunk[j] = srcChunk[j] * factor;
    }
}
```

```
0000000000000000 <_ZN2ml3mlp4funcEPKfPfS2_>:
   0:	lea    0x200(%rdi),%rcx
   7:	lea    0x20(%rsi),%rax
   b:	vmovss (%rdx),%xmm0
   f:	cmp    %rdi,%rax
  12:	jbe    1d <...+0x1d>
  14:	lea    0x20(%rdi),%rax
  18:	cmp    %rsi,%rax
  1b:	ja     60 <...+0x60>
  1d:	vbroadcastss %xmm0,%ymm0
  22:	vmulps (%rdi),%ymm0,%ymm1
  26:	vmovaps %ymm1,(%rsi)
  2a:	vmulps 0x20(%rdi),%ymm0,%ymm1
  2f:	vmovaps %ymm1,0x20(%rsi)
  34:	vmulps 0x40(%rdi),%ymm0,%ymm1
  39:	vmovaps %ymm1,0x40(%rsi)
  3e:	vmulps 0x60(%rdi),%ymm0,%ymm0
  43:	vmovaps %ymm0,0x60(%rsi)
  48:	sub    $0xffffffffffffff80,%rdi
  4c:	add    $0x4,%rdx
  50:	sub    $0xffffffffffffff80,%rsi
  54:	cmp    %rcx,%rdi
  57:	jne    7 <...+0x7>
  59:	vzeroupper
  5c:	retq
  60:	xor    %eax,%eax
  68:	vmulss (%rdi,%rax,1),%xmm0,%xmm1
  6d:	vmovss %xmm1,(%rsi,%rax,1)
  72:	add    $0x4,%rax
  76:	cmp    $0x80,%rax
  7c:	jne    68 <...+0x68>
  7e:	jmp    48 <...+0x48>
```

This is more in line with what I wanted, and it is actually twice as fast. In my real code, the speed difference is even bigger. Both versions produce correct output. Note that for NUM_INNER % 8 == 0, alignedSrc + k * NUM_INNER is always 32 byte aligned iff alignedSrc is 32 byte aligned. This is something the compiler should be able to figure out on its own. Or am I missing something here? Do you have any experience with this, or any advice on how to fix it without resorting to lots of hand-crafted "hints" throughout the code? Do I really have to provide such alignment hints for every strided access that's happening? Thanks in advance for any help or advice with this.
  5. Two months is not a lot of time, especially since you (presumably?) won't be working full time on it.   Anyway, here are two programming-heavy ideas off the top of my head:   Global illumination is kinda hard, especially given your limited time frame and experience. However, this comes to mind: It is a neat approach that might be feasible in the two months if you push yourself a bit. There seems to be code if you get stuck, and you might be able to come up with some creative improvement.   Another idea I always wanted to implement that involves light, although in an unusual way, would be a content creation tool that helps with texturing models. Usually, triangle meshes are unwrapped and then the textures are painted directly. You could create a tool where, instead of directly painting the final texture(s), you set up a couple of projectors around the object, like spot lights which project an image (hence the need for light and shadows). The artists would then paint the images of those projectors, and the final model textures would be baked by your tool. To some degree this is already supported in the major modelling packages, but you could enhance it by allowing projectors to mix and combine colors so that reusable dirt or rust decals can be added on top. You could also allow the projectors to not only affect the color textures, but also the textures for the other material parameters. The downside is that such a tool can be rather GUI-heavy. The upside is that you can easily "scale" the project according to your progress. E.g. start with a purely non-GUI application that reads the projector positions and mesh from a Blender export, displays a preview, and performs the bake. Then add features like a GUI, different projector types, etc. until the two months are over.
  6. Thanks for sharing your code.   However, I do share the sentiment of the others that this can be a bad idea. While it is true that 99% of all players, artists, and programmers won't be able to tell that the normalmaps are broken, they will be able to tell that it looks bad, or at least not "right". Trust me, I've been there, done that. Also on a commercial project.   The real problem comes later, though. Once a significant amount of all materials have broken normalmaps, the spec/gloss maps are adapted to somehow counteract the effect. Then the lighting. All of a sudden you can no longer change individual assets to "good" normalmaps because it would break the entire setup. And before you know it, "bad looking" becomes your new art style that every new asset has to adhere to, because otherwise the game would not look coherent.   If you don't have the time or resources to make actual normalmaps, then not using normalmaps, or using funky normalmaps, might be the right choice. There are very good looking games out there that aren't photorealistic. But it should be a conscious choice.
  7. For large amounts of data, there are also SIMD intrinsics that can do this: half -> float: _mm_cvtph_ps and _mm256_cvtph_ps; float -> half: _mm_cvtps_ph and _mm256_cvtps_ph. Oh, I just noticed you aren't doing this on a PC. But some ARM processors support similar conversion instructions.
  8. I'm guessing here that by "decimal" you mean "as text"? If so, this is your problem: the ostream operator<< always outputs text, even if the stream is set to binary. This is a bit braindead, I know... Try:

```cpp
#include <cstdint>
#include <fstream>

std::uint8_t F = 0b10111001; // note the 0b prefix; your version was missing it
std::ofstream K("C:/Users/WDR/Desktop/kml.enc", std::ios::binary);
for (int i = 0; i < 256; i++) {
    K.write(reinterpret_cast<const char*>(&F), sizeof(F));
}
```

You should get a 256 byte long file where every byte is 0b10111001.
  9. Relation between TFLOPS and Threads in a GPU?

    Peak performance (in FLoating point OPerations per Second = FLOPS) is the theoretical upper limit on how many computations a device can sustain per second. If a Titan X were doing nothing other than computing 1 + 2 * 3, then it could do that 3 072 000 000 000 times per second, and since there are two operations in there (an addition and a multiplication) this amounts to 6 144 000 000 000 FLOPS, or about 6.144 TFLOPS. But you only get that speed if you never read any data, or write back any results, or do anything else other than a multiply followed by an addition.   A "thread" (and Krohm rightfully warned of its use as a marketing buzzword) is generally understood to be an execution context. If a device executes a program, this refers to the current state, such as the current position in the program, the current values of the local variables, etc.   Threads and peak performance are two entirely different things!   Some compute devices (some Intel CPUs, some AMD CPUs, Sun Niagara CPUs, and most GPUs) can store more than one execution context aka "thread" on the chip so that they can interleave the execution of both/all of them. This sometimes falls under the term "hardware threads", at least for CPUs. And this is done for performance reasons. But it does not affect the theoretical peak performance of the device, only how much of that peak you can actually use. And the direct relationship between the maximum number of hardware threads, the used number of hardware threads, and the achieved performance ... is very complicated. It depends on lots of different factors like memory throughput, memory latency, access patterns, the actual algorithm, and so on. So if this is what you are asking about, then you might have to look into how GPUs work and how certain algorithms make use of that.
  10. Playing Video With Alpha

    I'm not very fluent with Java, but I would be surprised if there wasn't a less "probabilistic" approach to playing videos ;-)   Anyhow, about the alpha channel: Have you considered using a second greyscale video stream for the alpha channel? You might have to "blur" the transparency borders of the color stream, similarly to how it's done with e.g. foliage textures. Bitrate would be slightly inferior to a specialized codec that exploits the coherence between alpha and color channels, but you can use pretty much any codec pair with whatever bitrate you choose. You can even use a different bitrate (or resolution) for the alpha channel.
  11. Github DDoS Attack two days ago?

    This is actually quite specific: Though there is something I don't get: As frob pointed out, one result of this is usually that the offending IP ranges are blocked and remain blocked for some time. Which, presumably, is exactly what they want: no access to those two GitHub projects from within China. But if that was their goal, why would they design the attack in a way that all the requests originate from outside of China?
  12. c++ Instantiating a class in another class

    Is it possible that you instantiate "TextField" before you initialize SDL? SDL might reset the Unicode behavior upon initialization.
  13. What do you have on your Desk?

    That's awesome! My first thought was "Nice render!!" Hmmm, it needs a different lighting model though; the Phong shading looks way too much like plastic.
  14. A* scanning extra nodes?

    Just to clarify, this is what can happen when the heuristic is (extremely) pessimistic: [attachment=26575:nonAdmissible.png]   Usually a small search space, but sometimes a non-optimal solution.