
# LarryKing

Member Since 25 Jan 2011
Offline Last Active Mar 03 2015 03:00 PM

### PinWheel Encryption

25 November 2013 - 09:47 PM

Edit: See the newest version in this post

New phases have been added: Turbulence and Avalanche.

--

Hello everybody, I'm looking for some feedback on an encryption technique/algorithm I've been working on for the past few days: PinWheel (PNWL) encryption.
Now, I've found that explaining how the technique works is a challenge in and of itself, so please bear with me – I've even included lots of pictures.
Before I get to how the algorithm works, here are some statistics:

• Operates on 256-byte blocks
• Makes heavy use of XOR
• “Spins” the data to achieve encryption
• Strength of encryption is exponentially proportional to the password length

Essentially, PNWL works by recursively splitting up 256 bytes of data into progressively smaller blocks. Sort of like this:

Thus, one block of 256 bytes (the main block) contains four blocks of 64 bytes; each of these contains four blocks of 16 bytes, and each of those in turn contains four blocks of 4 bytes.

To encrypt the data, each block's content is quartered and then spun clockwise. As the quartered block spins, its contents are internally XOR'd.

This hierarchy of spins is repeated for each character in the password, and the magnitude of each spin is determined by the corresponding password character.

The only exception to the “Spin” technique is the Block4's, which instead “roll.” The amount of roll is determined by a set of magic numbers:

`MAGIC[4][4] = { { 1, 3, 5, 7}, { 1, 7, 2, 9}, { 2, 3, 5, 7}, { 1, 9, 9, 6} };`

To encrypt:

For each character in the password:

• Roll Block4's Left
• Spin Block16's
• Spin Block64's
• Spin the Block256

To decrypt:

Reverse the password, and then for each character:

• Spin the Block256 in reverse
• Spin the Block64's in reverse
• Spin the Block16's in reverse
• Roll Block4's Right

Anyway, enough talk; here's the code, which is also attached (note: requires SSE3).

```
//Copyright (C) 2013 Laurence King
//
//Permission is hereby granted, free of charge, to any person obtaining a
//copy of this software and associated documentation files (the "Software"),
//to deal in the Software without restriction, including without limitation
//the rights to use, copy, modify, merge, publish, distribute, sublicense,
//and/or sell copies of the Software, and to permit persons to whom the
//Software is furnished to do so, subject to the following conditions:
//
//The above copyright notice and this permission notice shall be included
//in all copies or substantial portions of the Software.
//
//THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
//INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
//PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
//HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
//OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
//SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

#pragma once

#include <intrin.h>

#ifdef _MSC_VER
#define ALIGN( n )	__declspec( align( n ) )
#else
#define ALIGN( n ) alignas( n )
#endif

namespace PinWheel
{
typedef				int int32;
typedef unsigned	int uint32;

// PNWL Magic constants
const uint32 MAGIC[4][4] = { { 1, 3, 5, 7}, { 1, 7, 2, 9}, { 2, 3, 5, 7}, { 1, 9, 9, 6} };

// Bit masks used to extract the four 2-bit spin amounts from each key byte
#define PNWL_MASK1 0x03
#define PNWL_MASK2 0x0C
#define PNWL_MASK3 0x30
#define PNWL_MASK4 0xC0

ALIGN(16)
struct Block16
{
union
{
uint32		Data[4];
__m128i		vData;
};

void Spin0 (void);
void Spin1 (void);
void Spin2 (void);
void Spin3 (void);

void rSpin0 (void);
void rSpin1 (void);
void rSpin2 (void);
void rSpin3 (void);
};

ALIGN(16)
struct Block64
{
union
{
uint32		_Data[16];
Block16		Blocks[4];
__m128i		vData [4];
};

void Spin0 (void);
void Spin1 (void);
void Spin2 (void);
void Spin3 (void);

void rSpin0 (void);
void rSpin1 (void);
void rSpin2 (void);
void rSpin3 (void);

};

ALIGN(16)
struct Block256
{
union
{
uint32		_Data[64];
__m128i		_vData[16];
Block16		_Block16[16];
Block64		Blocks[4];
};

void Spin0 (void);
void Spin1 (void);
void Spin2 (void);
void Spin3 (void);

void rSpin0 (void);
void rSpin1 (void);
void rSpin2 (void);
void rSpin3 (void);

void Forward(const char *);
void Reverse(const char *);
};

#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32-(n))))
#define ROTATE_RIGHT(x, n) (((x) >> (n)) | ((x) << (32-(n))))

void Block16::Spin0(void)
{
Data[3] ^= Data[0];
Data[2] ^= Data[0];
Data[1] ^= Data[0];
Data[0] = ~Data[0];
}
void Block16::Spin1(void)
{
Data[3] ^= Data[0];

vData = _mm_shuffle_epi32(vData, _MM_SHUFFLE(2, 1, 0, 3));
}
void Block16::Spin2(void)
{
Data[3] ^= Data[0];
Data[2] ^= Data[0];
Data[0] = ~Data[0];

vData = _mm_shuffle_epi32(vData, _MM_SHUFFLE(1, 0, 3, 2));
}
void Block16::Spin3(void)
{
Data[3] ^= Data[0];
Data[2] ^= Data[0];
Data[1] ^= Data[0];

vData = _mm_shuffle_epi32(vData, _MM_SHUFFLE(0, 3, 2, 1));
}
void Block16::rSpin0(void)
{
Data[0] = ~Data[0];
Data[1] ^= Data[0];
Data[2] ^= Data[0];
Data[3] ^= Data[0];
}
void Block16::rSpin1(void)
{
vData = _mm_shuffle_epi32(vData, _MM_SHUFFLE(0, 3, 2, 1));

Data[3] ^= Data[0];
}
void Block16::rSpin2(void)
{
vData = _mm_shuffle_epi32(vData, _MM_SHUFFLE(1, 0, 3, 2));

Data[0] = ~Data[0];
Data[2] ^= Data[0];
Data[3] ^= Data[0];
}
void Block16::rSpin3(void)
{
vData = _mm_shuffle_epi32(vData, _MM_SHUFFLE(2, 1, 0, 3));

Data[3] ^= Data[0];
Data[2] ^= Data[0];
Data[1] ^= Data[0];
}

void Block64::Spin0(void)
{
vData[3] = _mm_xor_si128(vData[0], vData[3]);
vData[2] = _mm_xor_si128(vData[0], vData[2]);
vData[1] = _mm_xor_si128(vData[0], vData[1]);
}
void Block64::Spin1(void)
{
__m128i val_a = vData[0];

vData[0] = _mm_xor_si128(vData[3], val_a);
vData[3] = vData[2]; // _mm_xor_si128(vData[2], val_a);
vData[2] = vData[1]; // _mm_xor_si128(vData[1], val_a);
vData[1] = val_a;
}
void Block64::Spin2(void)
{
__m128i val_ab = vData[0];

vData[0] = _mm_xor_si128(vData[2], val_ab);
vData[2] = val_ab;

val_ab = vData[1];

vData[1] = _mm_xor_si128(vData[3], val_ab);
vData[3] = val_ab;
}
void Block64::Spin3(void)
{
__m128i val_a = vData[0];

vData[0] = _mm_xor_si128(vData[1], val_a);
vData[1] = _mm_xor_si128(vData[2], val_a);
vData[2] = _mm_xor_si128(vData[3], val_a);
vData[3] = val_a;
}
void Block64::rSpin0(void)
{
vData[3] = _mm_xor_si128(vData[0], vData[3]);
vData[2] = _mm_xor_si128(vData[0], vData[2]);
vData[1] = _mm_xor_si128(vData[0], vData[1]);
}
void Block64::rSpin1(void)
{
__m128i val_a = vData[1];

vData[1] = vData[2];
vData[2] = vData[3];
vData[3] = _mm_xor_si128(vData[0], val_a);
vData[0] = val_a;
}
void Block64::rSpin2(void)
{
__m128i val_ab = vData[2];

vData[2] = _mm_xor_si128(vData[0], val_ab);
vData[0] = val_ab;

val_ab = vData[3];

vData[3] = _mm_xor_si128(vData[1], val_ab);
vData[1] = val_ab;
}
void Block64::rSpin3(void)
{
__m128i val_a = vData[3];

vData[3] = _mm_xor_si128(vData[2], val_a);
vData[2] = _mm_xor_si128(vData[1], val_a);
vData[1] = _mm_xor_si128(vData[0], val_a);
vData[0] = val_a;
}

void Block256::Spin0(void)
{
_vData[0x4] = _mm_xor_si128(_vData[0x0], _vData[0x4]);
_vData[0x8] = _mm_xor_si128(_vData[0x0], _vData[0x8]);
_vData[0xC] = _mm_xor_si128(_vData[0x0], _vData[0xC]);

_vData[0x5] = _mm_xor_si128(_vData[0x1], _vData[0x5]);
_vData[0x9] = _mm_xor_si128(_vData[0x1], _vData[0x9]);
_vData[0xD] = _mm_xor_si128(_vData[0x1], _vData[0xD]);

_vData[0x6] = _mm_xor_si128(_vData[0x2], _vData[0x6]);
_vData[0xA] = _mm_xor_si128(_vData[0x2], _vData[0xA]);
_vData[0xE] = _mm_xor_si128(_vData[0x2], _vData[0xE]);

_vData[0x7] = _mm_xor_si128(_vData[0x3], _vData[0x7]);
_vData[0xB] = _mm_xor_si128(_vData[0x3], _vData[0xB]);
_vData[0xF] = _mm_xor_si128(_vData[0x3], _vData[0xF]);
}
void Block256::Spin1(void)
{
__m128i val_ = _vData[0];

_vData[0x0] = _mm_xor_si128(val_, _vData[0xC]);
_vData[0xC] = _vData[0x8];
_vData[0x8] = _vData[0x4];
_vData[0x4] = val_;

val_ = _vData[1];

_vData[0x1] = _mm_xor_si128(val_, _vData[0xD]);
_vData[0xD] = _vData[0x9];
_vData[0x9] = _vData[0x5];
_vData[0x5] = val_;

val_ = _vData[2];

_vData[0x2] = _mm_xor_si128(val_, _vData[0xE]);
_vData[0xE] = _vData[0xA];
_vData[0xA] = _vData[0x6];
_vData[0x6] = val_;

val_ = _vData[3];

_vData[0x3] = _mm_xor_si128(val_, _vData[0xF]);
_vData[0xF] = _vData[0xB];
_vData[0xB] = _vData[0x7];
_vData[0x7] = val_;
}
void Block256::Spin2(void)
{
__m128i val_ = _vData[0];

_vData[0x0] = _mm_xor_si128(val_, _vData[0x8]);
_vData[0x8] = val_;

val_ = _vData[1];

_vData[0x1] = _mm_xor_si128(val_, _vData[0x9]);
_vData[0x9] = val_;

val_ = _vData[2];

_vData[0x2] = _mm_xor_si128(val_, _vData[0xA]);
_vData[0xA] = val_;

val_ = _vData[3];

_vData[0x3] = _mm_xor_si128(val_, _vData[0xB]);
_vData[0xB] = val_;

val_ = _vData[4];

_vData[0x4] = _mm_xor_si128(_vData[0x8], _vData[0xC]);
_vData[0xC] = val_;

val_ = _vData[5];

_vData[0x5] = _mm_xor_si128(_vData[0x9], _vData[0xD]);
_vData[0xD] = val_;

val_ = _vData[6];

_vData[0x6] = _mm_xor_si128(_vData[0xA], _vData[0xE]);
_vData[0xE] = val_;

val_ = _vData[7];

_vData[0x7] = _mm_xor_si128(_vData[0xB], _vData[0xF]);
_vData[0xF] = val_;

}
void Block256::Spin3(void)
{
__m128i val_ = _vData[0];

_vData[0x0] = _mm_xor_si128(val_, _vData[0x4]);
_vData[0x4] = _mm_xor_si128(val_, _vData[0x8]);
_vData[0x8] = _mm_xor_si128(val_, _vData[0xC]);
_vData[0xC] = val_;

val_ = _vData[1];

_vData[0x1] = _mm_xor_si128(val_, _vData[0x5]);
_vData[0x5] = _mm_xor_si128(val_, _vData[0x9]);
_vData[0x9] = _mm_xor_si128(val_, _vData[0xD]);
_vData[0xD] = val_;

val_ = _vData[2];

_vData[0x2] = _mm_xor_si128(val_, _vData[0x6]);
_vData[0x6] = _mm_xor_si128(val_, _vData[0xA]);
_vData[0xA] = _mm_xor_si128(val_, _vData[0xE]);
_vData[0xE] = val_;

val_ = _vData[3];

_vData[0x3] = _mm_xor_si128(val_, _vData[0x7]);
_vData[0x7] = _mm_xor_si128(val_, _vData[0xB]);
_vData[0xB] = _mm_xor_si128(val_, _vData[0xF]);
_vData[0xF] = val_;
}
void Block256::rSpin0(void)
{
_vData[0x4] = _mm_xor_si128(_vData[0x0], _vData[0x4]);
_vData[0x8] = _mm_xor_si128(_vData[0x0], _vData[0x8]);
_vData[0xC] = _mm_xor_si128(_vData[0x0], _vData[0xC]);

_vData[0x5] = _mm_xor_si128(_vData[0x1], _vData[0x5]);
_vData[0x9] = _mm_xor_si128(_vData[0x1], _vData[0x9]);
_vData[0xD] = _mm_xor_si128(_vData[0x1], _vData[0xD]);

_vData[0x6] = _mm_xor_si128(_vData[0x2], _vData[0x6]);
_vData[0xA] = _mm_xor_si128(_vData[0x2], _vData[0xA]);
_vData[0xE] = _mm_xor_si128(_vData[0x2], _vData[0xE]);

_vData[0x7] = _mm_xor_si128(_vData[0x3], _vData[0x7]);
_vData[0xB] = _mm_xor_si128(_vData[0x3], _vData[0xB]);
_vData[0xF] = _mm_xor_si128(_vData[0x3], _vData[0xF]);
}
void Block256::rSpin1(void)
{
__m128i val_ = _vData[4];

_vData[0x4] = _vData[0x8];
_vData[0x8] = _vData[0xC];
_vData[0xC] = _mm_xor_si128(val_, _vData[0x0]);
_vData[0x0] = val_;

val_ = _vData[5];

_vData[0x5] = _vData[0x9];
_vData[0x9] = _vData[0xD];
_vData[0xD] = _mm_xor_si128(val_, _vData[0x1]);
_vData[0x1] = val_;

val_ = _vData[6];

_vData[0x6] = _vData[0xA];
_vData[0xA] = _vData[0xE];
_vData[0xE] = _mm_xor_si128(val_, _vData[0x2]);
_vData[0x2] = val_;

val_ = _vData[7];

_vData[0x7] = _vData[0xB];
_vData[0xB] = _vData[0xF];
_vData[0xF] = _mm_xor_si128(val_, _vData[0x3]);
_vData[0x3] = val_;
}
void Block256::rSpin2(void)
{
__m128i val_ = _vData[8];

_vData[0x8] = _mm_xor_si128(val_, _vData[0x0]);
_vData[0x0] = val_;

val_ = _vData[9];

_vData[0x9] = _mm_xor_si128(val_, _vData[0x1]);
_vData[0x1] = val_;

val_ = _vData[0xA];

_vData[0xA] = _mm_xor_si128(val_, _vData[0x2]);
_vData[0x2] = val_;

val_ = _vData[0xB];

_vData[0xB] = _mm_xor_si128(val_, _vData[0x3]);
_vData[0x3] = val_;

val_ = _vData[0xC];

_vData[0xC] = _mm_xor_si128(_vData[0x0], _vData[0x4]);
_vData[0x4] = val_;

val_ = _vData[0xD];

_vData[0xD] = _mm_xor_si128(_vData[0x1], _vData[0x5]);
_vData[0x5] = val_;

val_ = _vData[0xE];

_vData[0xE] = _mm_xor_si128(_vData[0x2], _vData[0x6]);
_vData[0x6] = val_;

val_ = _vData[0xF];

_vData[0xF] = _mm_xor_si128(_vData[0x3], _vData[0x7]);
_vData[0x7] = val_;

}
void Block256::rSpin3(void)
{
__m128i val_ = _vData[0xC];

_vData[0xC] = _mm_xor_si128(val_, _vData[0x8]);
_vData[0x8] = _mm_xor_si128(val_, _vData[0x4]);
_vData[0x4] = _mm_xor_si128(val_, _vData[0x0]);
_vData[0x0] = val_;

val_ = _vData[0xD];

_vData[0xD] = _mm_xor_si128(val_, _vData[0x9]);
_vData[0x9] = _mm_xor_si128(val_, _vData[0x5]);
_vData[0x5] = _mm_xor_si128(val_, _vData[0x1]);
_vData[0x1] = val_;

val_ = _vData[0xE];

_vData[0xE] = _mm_xor_si128(val_, _vData[0xA]);
_vData[0xA] = _mm_xor_si128(val_, _vData[0x6]);
_vData[0x6] = _mm_xor_si128(val_, _vData[0x2]);
_vData[0x2] = val_;

val_ = _vData[0xF];

_vData[0xF] = _mm_xor_si128(val_, _vData[0xB]);
_vData[0xB] = _mm_xor_si128(val_, _vData[0x7]);
_vData[0x7] = _mm_xor_si128(val_, _vData[0x3]);
_vData[0x3] = val_;
}

void Block256::Forward(const char * key)
{
for(char c = *(key++); c != 0; c = *(key++))
{
uint32 amnt0 =	c & PNWL_MASK1;
uint32 amnt1 = (c & PNWL_MASK2 ) >> 2;
uint32 amnt2 = (c & PNWL_MASK3 ) >> 4;
uint32 amnt3 = (c & PNWL_MASK4 ) >> 6;

#pragma region BLOCK4

for(int i = 0; i < 64; i+=4)
{
_Data[i] =		ROTATE_LEFT( _Data[i] ,		MAGIC[amnt3][0] );
_Data[i + 1] =	ROTATE_LEFT( _Data[i + 1] ,	MAGIC[amnt3][1] );
_Data[i + 2] =	ROTATE_LEFT( _Data[i + 2] ,	MAGIC[amnt3][2] );
_Data[i + 3] =	ROTATE_LEFT( _Data[i + 3] ,	MAGIC[amnt3][3] );
}
#pragma endregion

#pragma region BLOCK16
switch (amnt0)
{
case 0:
for(int i = 0; i < 16; i++)
_Block16[i].Spin0();
break;
case 1:
for(int i = 0; i < 16; i++)
_Block16[i].Spin1();
break;
case 2:
for(int i = 0; i < 16; i++)
_Block16[i].Spin2();
break;
case 3:
for(int i = 0; i < 16; i++)
_Block16[i].Spin3();
break;
}
#pragma endregion

#pragma region BLOCK64
switch (amnt1)
{
case 0:
Blocks[0].Spin0();
Blocks[1].Spin0();
Blocks[2].Spin0();
Blocks[3].Spin0();
break;
case 1:
Blocks[0].Spin1();
Blocks[1].Spin1();
Blocks[2].Spin1();
Blocks[3].Spin1();
break;
case 2:
Blocks[0].Spin2();
Blocks[1].Spin2();
Blocks[2].Spin2();
Blocks[3].Spin2();
break;
case 3:
Blocks[0].Spin3();
Blocks[1].Spin3();
Blocks[2].Spin3();
Blocks[3].Spin3();
break;
}
#pragma endregion

#pragma region BLOCK256
switch (amnt2)
{
case 0:
Spin0();
break;
case 1:
Spin1();
break;
case 2:
Spin2();
break;
case 3:
Spin3();
break;
}
#pragma endregion

}

}

// Expects the key to already have been reversed
void Block256::Reverse(const char * rKey)
{
for(char c = *(rKey++); c != 0; c = *(rKey++))
{
uint32 amnt0 =	c & PNWL_MASK1;
uint32 amnt1 = (c & PNWL_MASK2 ) >> 2;
uint32 amnt2 = (c & PNWL_MASK3 ) >> 4;
uint32 amnt3 = (c & PNWL_MASK4 ) >> 6;

#pragma region BLOCK256
switch (amnt2)
{
case 0:
rSpin0();
break;
case 1:
rSpin1();
break;
case 2:
rSpin2();
break;
case 3:
rSpin3();
break;
}
#pragma endregion

#pragma region BLOCK64
switch (amnt1)
{
case 0:
Blocks[0].rSpin0();
Blocks[1].rSpin0();
Blocks[2].rSpin0();
Blocks[3].rSpin0();
break;
case 1:
Blocks[0].rSpin1();
Blocks[1].rSpin1();
Blocks[2].rSpin1();
Blocks[3].rSpin1();
break;
case 2:
Blocks[0].rSpin2();
Blocks[1].rSpin2();
Blocks[2].rSpin2();
Blocks[3].rSpin2();
break;
case 3:
Blocks[0].rSpin3();
Blocks[1].rSpin3();
Blocks[2].rSpin3();
Blocks[3].rSpin3();
break;
}
#pragma endregion

#pragma region BLOCK16
switch (amnt0)
{
case 0:
for(int i = 0; i < 16; i++)
_Block16[i].rSpin0();
break;
case 1:
for(int i = 0; i < 16; i++)
_Block16[i].rSpin1();
break;
case 2:
for(int i = 0; i < 16; i++)
_Block16[i].rSpin2();
break;
case 3:
for(int i = 0; i < 16; i++)
_Block16[i].rSpin3();
break;
}
#pragma endregion

#pragma region BLOCK4
for(int i = 0; i < 64; i+=4)
{
_Data[i] =		ROTATE_RIGHT( _Data[i] ,		MAGIC[amnt3][0] );
_Data[i + 1] =	ROTATE_RIGHT( _Data[i + 1] ,	MAGIC[amnt3][1] );
_Data[i + 2] =	ROTATE_RIGHT( _Data[i + 2] ,	MAGIC[amnt3][2] );
_Data[i + 3] =	ROTATE_RIGHT( _Data[i + 3] ,	MAGIC[amnt3][3] );
}
#pragma endregion

}
}
}
```

And here is how you would encrypt some data:

```
PinWheel::Block256 * blocks = reinterpret_cast<PinWheel::Block256 *>(memblock);

for(int i = 0; i < blockcount; i++)
{
	blocks[i].Forward(password); // password is the null-terminated key string
}
```

Now for some visual examples of PNWL encryption in action:
(For illustration purposes, these were created by encrypting the image portion of either 24bpp bitmaps or grayscale bitmaps)

Mona: (image)

Simple Triangles: (image)

Flower (grayscale bitmap): (image)

Where I can see improvement: PNWL was designed to make use of SIMD instructions, but it can also be implemented without them.

I don't have a processor that supports AVX2, but I predict a ~30% boost if it were used on, for example, the Roll portion. Furthermore, multithreading could yield excellent returns.

Attached is the source code for PNWL and a quick console app to test it out.

Thank you

03 January 2013 - 06:37 PM

Hello all, I'm looking for some feedback regarding a model format and loader I've been working on over the past few days. I'm transitioning to C++ from C#, and this has been a great exercise so far, but I feel this post might end up a bit lengthy.

Some background as to what I wanted to accomplish with the file format:

• The model can contain a large number of meshes.
• Each mesh can have an arbitrary number of vertex buffers; not that a mesh should have 8, 16, or even 65,535 vertex streams, just that it could.
• Each mesh can have an arbitrary number of textures; again, see above.
• Each mesh, and the model itself, should have some sort of bounding volume.
• Fast loading; the model should use local pointers and require minimal live processing.
• Possibility for compression

After some research, it appears that loading a file directly into memory and then adjusting its local pointers is one of the fastest ways to load an object. So the entire format mirrors the objects that make up my model.

A model contains: a pointer to some ModelTextures, a pointer to the MeshHeaders, a pointer to the MeshCullDatas and a pointer to the MeshDrawDatas.

I've tried to implement some Data Oriented Design – a very different concept coming from C#. I've split up the meshes into arrays of data needed for different operations: Culling and Drawing.

Furthermore, I'm attempting to implement this as part of my content manager, so a ModelTexture is really just a wrapper around a shared_ptr<Texture2D> that is retrieved from another content cache.

All right, so here is what the model format looks like, I made a diagram!
*sorry it's so tall...

The actual files are exported from a tool I've written in C#. I'm loading Collada files via Assimp, calculating any user requested data and displaying the model via SharpDX in a WinForms app.

In the end, the model gets exported to the file by the exporter first writing each mesh data object to “virtual memory,” adjusting all of the pointers and finally using a binary writer to spit out the finished file.

Pretty straightforward for me, as I'm used to C#, but the scary stuff happens when we get to actually loading the file in C++.

First I load the entire file with an ifstream into a char[]. Then I cast the char[] to a Model. Now I need to offset the local pointers so that the model will work in memory; however, I read somewhere that you can't add pointers in C++, only subtract them, yet to offset local pointers you need to add!
After much internet searching, I finally found a type, ptrdiff_t, that I could obtain from a pointer, add to, and then cast back to a pointer. The question then became, “Is what I'm doing legal?” For a full day I pondered before quizzically deciding that it should(?) be legal. I mean, how else would you offset pointers when you shouldn't just cast to an int?
The next problem arrived when I realized that I needed to somehow delete the model from memory as well. Again, since I had cast a char[] to a Model, I wasn't sure if I could delete the model. I pretended I could and wrote the destructor. Miraculously, it seemed to work! The “Memory” window in Visual Studio seemed to show that the object had been deleted, although I'm still not sure if I need to call delete on the model's pointers, as they weren't created with new.

So now, I have all this code for loading a model, but I'm not sure if it's legal, safe, or even sensible!

```
std::shared_ptr<Ruined::Graphics::Model> ModelLoader::Load(const std::string &name)
{
Ruined::Graphics::Model * model;

std::ifstream file (m_BaseDirectory + name, std::ios::in|std::ios::binary|std::ios::ate);
if (file.is_open())
{
// Get the file's total size
unsigned int size = file.tellg();
// Create a char[] of the size to load the file into
char* memblock = new char [size];
// Seek to the beginning and read the file
file.seekg (0, std::ios::beg);
file.read (memblock, size);
// Finally close the file
file.close();

// Cast the char[] to a Ruined::Graphics::Model pointer
model = static_cast<Ruined::Graphics::Model *>((void*)memblock);

// The location of the model in memory
ptrdiff_t memOffset = (ptrdiff_t)model;

// Offset the model's local pointers
ptrdiff_t intOffset;

// Mesh Culling Datas
// intOffset = (ptrdiff_t)model->MeshCullDatas;
model->MeshCullDatas = (Ruined::Graphics::MeshCullData*)(memOffset + (ptrdiff_t)model->MeshCullDatas);

// Mesh Drawing Datas
// intOffset = (ptrdiff_t)model->MeshDrawDatas;
model->MeshDrawDatas = (Ruined::Graphics::MeshDrawData*)(memOffset + (ptrdiff_t)model->MeshDrawDatas);

// Model's Ruined::Graphics::ModelTexture pointer
// intOffset = (ptrdiff_t)model->Textures;
model->Textures = (Ruined::Graphics::ModelTexture*)(memOffset + (ptrdiff_t)model->Textures);

for(int t = 0; t < model->TextureCount; t++)
{
// Offset TextureName pointers
// intOffset = (ptrdiff_t)(model->Textures[t].TextureName);
model->Textures[t].TextureName = (char*)(memOffset + (ptrdiff_t)(model->Textures[t].TextureName));
}

HRESULT hresult;
Ruined::Graphics::MeshDrawData * tempMeshD = nullptr;
for(int m = 0; m < model->MeshCount; m++)
{

// Build the buffers
tempMeshD = &model->MeshDrawDatas[m];

// Offset Index Buffer
// intOffset = (ptrdiff_t)tempMeshD->IndexBuffer;
tempMeshD->IndexBuffer = (ID3D11Buffer*)(memOffset + (ptrdiff_t)tempMeshD->IndexBuffer);

// Offset Vertex Buffer
// intOffset = (ptrdiff_t)tempMeshD->VertexBuffer;
tempMeshD->VertexBuffers = (ID3D11Buffer**)(memOffset + (ptrdiff_t)tempMeshD->VertexBuffers);

// Offset Strides
// intOffset = (ptrdiff_t)tempMeshD->Strides;
tempMeshD->Strides = (unsigned int*)(memOffset + (ptrdiff_t)tempMeshD->Strides);

// Offset Resources
intOffset = (ptrdiff_t)tempMeshD->Resources;

// Convert Resources * to unsigned int *
unsigned int * index = (unsigned int*)(memOffset + intOffset);

// Assign the pointers from the model's textures
for(int t = 0; t < model->MeshHeaders[m].ResourceCount; t++)
{
// hook index[t] up to the matching entry in model->Textures here
}

// Desc for the index buffer
D3D11_BUFFER_DESC indexBufferDesc;
indexBufferDesc.Usage = D3D11_USAGE_DEFAULT;
indexBufferDesc.ByteWidth = tempMeshD->IndexCount * (tempMeshD->IndexFormat == DXGI_FORMAT_R16_UINT ? sizeof(unsigned short) : sizeof(unsigned int));
indexBufferDesc.BindFlags = D3D11_BIND_INDEX_BUFFER;
indexBufferDesc.CPUAccessFlags = 0;
indexBufferDesc.MiscFlags = 0;

D3D11_SUBRESOURCE_DATA indexData;
indexData.pSysMem = tempMeshD->IndexBuffer;
indexData.SysMemPitch = 0;
indexData.SysMemSlicePitch = 0;

hresult = p_Graphics->GetDevice()->CreateBuffer(&indexBufferDesc, &indexData, &tempMeshD->IndexBuffer);
if(FAILED(hresult))
{
OutputDebugStringA("Failed to create Index Buffer");
}

// Create each vertex buffer
Ruined::Graphics::MeshBufferDesc * tempDesc = (Ruined::Graphics::MeshBufferDesc*)(tempMeshD->VertexBuffers);
for(unsigned int b = 0; b < tempMeshD->VertexBufferCount; b++)
{

// Each buffer gets a desc
D3D11_BUFFER_DESC bufferDesc;
bufferDesc.Usage = D3D11_USAGE_DEFAULT;
bufferDesc.ByteWidth = tempDesc[b].BufferWidth;
bufferDesc.BindFlags = D3D11_BIND_VERTEX_BUFFER;
bufferDesc.CPUAccessFlags = 0;
bufferDesc.MiscFlags = 0;

// Each buffer needs a subresource data
D3D11_SUBRESOURCE_DATA subData;
subData.pSysMem = (void*)((ptrdiff_t)tempDesc[b].Data + memOffset);
subData.SysMemPitch = 0;
subData.SysMemSlicePitch = 0;

hresult = p_Graphics->GetDevice()->CreateBuffer(&bufferDesc, &subData, &(tempMeshD->VertexBuffers[b]));
if(FAILED(hresult))
{
OutputDebugStringA("Failed to create Vertex Buffer");
}
}

}

}
else
{
std::string errorMsg = "Failed to load Model: ";
errorMsg += m_BaseDirectory + name + "\n";
OutputDebugStringA(errorMsg.c_str());

// Set model equal to something
model = new Ruined::Graphics::Model();
}

std::shared_ptr<Ruined::Graphics::Model> sModel(model);

return sModel;
}
```

So that this makes a little more sense, here is Model.h:

```
#ifndef _MODEL_H_
#define _MODEL_H_

#include "MeshCullData.h"
#include "MeshDrawData.h"
#include "MeshHeader.h"
#include "ModelTexture.h"

#include <DirectXCollision.h>
#include <memory>

namespace Ruined
{
namespace Graphics
{
// Combine Model Header and Model Pointers
struct __declspec(dllexport) Model
{

public:
unsigned short _FILETYPE;
unsigned short _FILEVERSION;
unsigned int ModelSize;

unsigned short TextureCount;
unsigned short MeshCount;
DirectX::BoundingBox BoundingBox;

// Pointers
ModelTexture * Textures;
MeshHeader * MeshHeaders;
MeshCullData * MeshCullDatas;
MeshDrawData * MeshDrawDatas;

public:
Model(void);

~Model(void);
};
}
}

#endif
```

Here is ModelTexture.h:

```
#pragma once
#ifndef _MODELTEXTURE_H_
#define _MODELTEXTURE_H_

// Includes //
#include "Texture2D.h"

#include <memory>

namespace Ruined
{
namespace Graphics
{
struct __declspec(dllexport) ModelTexture
{
public:
char* TextureName;
std::shared_ptr<Texture2D> TextureContent;
};
}
}

#endif
```

Here are the mesh objects:

```
#ifndef _MESHCULLDATA_H_
#define _MESHCULLDATA_H_

#include <DirectXCollision.h>

namespace Ruined
{
namespace Graphics
{
struct __declspec(dllexport) MeshCullData
{
public:
DirectX::BoundingBox BoundingBox;
};
}
}

#endif
```
```
#ifndef _MESHHEADER_H_
#define _MESHHEADER_H_

#include "MeshBufferDesc.h"

namespace Ruined
{
namespace Graphics
{
// Per-mesh content flags
enum MeshFlags
{
Undefined       = 0x0000,
Texture         = 0x0001,
UVCoord         = 0x0002,
Color           = 0x0004,
Normal          = 0x0008,
Tangent         = 0x0010,
Binormal        = 0x0020,
BoneIndices     = 0x0040,
BoneWeights     = 0x0080,
SplitBuffers    = 0x0100,
AlphaBlend      = 0x4000
};

struct __declspec(dllexport) MeshHeader
{
public:
unsigned char UVStreamCount;
unsigned char ColorStreamCount;
unsigned short ResourceCount;
};
}
}

#endif
```
```
#ifndef _MESHDRAWDATA_H_
#define _MESHDRAWDATA_H_

#include <d3d11.h>

namespace Ruined
{
namespace Graphics
{
struct __declspec(dllexport) MeshDrawData
{
public:
unsigned int VertexBufferCount;
ID3D11Buffer ** VertexBuffers;
unsigned int * Strides;
ID3D11Buffer * IndexBuffer;
DXGI_FORMAT IndexFormat;
unsigned int IndexCount;
void * Resources; // resource index table; the loader above rebases this and reads it as unsigned int *
};
}
}

#endif
```
```
#ifndef _MESHBUFFERDESC_H_
#define _MESHBUFFERDESC_H_

namespace Ruined
{
namespace Graphics
{
// Used for creating vertex buffers.
// Only accessed at load time.
struct __declspec(dllexport) MeshBufferDesc
{
public:
unsigned int BufferWidth;
void * Data;
};
}
}

#endif
```

Lastly here is the Model destructor:

```
Model::~Model(void)
{
if(Textures != nullptr)
{
for(int t = 0; t < TextureCount; t++)
{
Textures[t].TextureContent.reset();
Textures[t].TextureName = nullptr;
}
}

if(MeshDrawDatas != nullptr)
{
for(int m = 0; m < MeshCount; m++)
{
if(MeshDrawDatas[m].IndexBuffer != nullptr)
{
MeshDrawDatas[m].IndexBuffer->Release();
MeshDrawDatas[m].IndexBuffer = nullptr;
}

for(unsigned int v = 0; v < MeshDrawDatas[m].VertexBufferCount; v++)
{
if(MeshDrawDatas[m].VertexBuffers[v] != nullptr)
{
MeshDrawDatas[m].VertexBuffers[v]->Release();
MeshDrawDatas[m].VertexBuffers[v] = nullptr;
}
}
}
}
}
```

Holy cow! That's one long post.
If anyone could take the time to read this, even just part of it, and lend me a hand, I would be very thankful.

### Point-light normal issue

18 June 2012 - 02:31 PM

So in my attempt to shrink my G-Buffer while maintaining the most detail, I thought I had made a breakthrough.
Previously I had been using 3 render-targets for my light pre-pass implementation and 1 more render-target for shadows.

My G-Buffer / Render-targets = ~7.5 MB
• Depth-map: R32 1024 x 576
• Light-map: ARGB32 1024 x 576
• Output : ARGB32 1024 x 576
• Shadows: R16 512 x 512
What I was doing looked something like this:
• Draw scene depth to the depth-map
• Draw the shadow buffer to its render target
• Draw/Calculate AO and shadows using ONLY the depth-map, to the Light-map
• Blur the light-map using the output render-target as a temporary buffer
• Draw scene lighting using depth
• Draw the scene to the Output buffer using the light-map
• Ping-pong between the light-map and output for post-process effects

Now this looked FINE, but I wasn't satisfied. I had completely ignored normals and as a result was stuck with only simple point-lights.
So I got to thinking about how I could squeeze in a normal-buffer. The most obvious idea of just making a new render-target would leave me squeezed at ~9.5 MB, which is NOT OK!
So after much pondering I finally came up with a way to do it in the render-targets I already had!
I can use the SAME G-BUFFER!

What I'm doing now, and what I'm stuck on:
• Draw scene depth to the depth-map and scene Normals to the output render-target
• Draw the shadow buffer to its render target
• Draw/Calculate AO and shadows using ONLY the depth-map, to the Light-map
• Blur the light-map using the Shadow render-target as a temporary buffer (Both AO and Shadows are only a representation of lightness / darkness and can easily be represented as just 1 byte / pixel)
• Draw scene lighting using depth and Normals!
• Draw the scene to the Output buffer using the light-map
• Ping-pong between the light-map and output for post-process effects

Now theoretically this should work, and it does... sort of.
I can go all the way through the render-cycle, and when I go to draw lights at step 5, I have both a depth and a normal buffer!
Unfortunately, when I try to draw lights taking into account the normal-buffer, the lights are calculated incorrectly.

Now after much pondering, searching and tweaking, I have still yet to get over this little obstacle.
I had previously been using only point-lights, calculating the attenuation with a depth-reconstructed world position, and that worked fine.
If I remember correctly, to take normals into account for an omni-directional light I just have to multiply my current output by the dot product of the light vector and the surface normal, right?

So here's the pixel shader for the point light:
[source lang="cpp"]
float4 PixelShaderFunction(VertexShaderOutput input) : COLOR0
{
	float2 screenSpace = GetScreenCoord(input.ScreenSpace);
	float lightDepth = input.Depth.x / input.Depth.y;
	float sceneDepth = tex2D(Depth, screenSpace).r;
	clip(sceneDepth > lightDepth || sceneDepth == 0 ? -1 : 1);

	float4 position;
	position.x = screenSpace.x * 2 - 1;
	position.y = (1 - screenSpace.y) * 2 - 1;
	position.z = sceneDepth;
	position.w = 1.0f;
	position = mul(position, iViewProjection);
	position.xyz /= position.w;
	// Surface world position is calculated correctly

	// Light position is in world space
	float Distance = distance(LightPosition, position);
	clip(Distance > LightRadius ? -1 : 1);

	float3 normal = (2.0f * tex2D(Normal, screenSpace).xyz) - 1.0f;
	float3 lightVector = normalize(LightPosition - position);
	float lighting = saturate(dot(normal, lightVector));

	return lighting * (1.1f - Distance / LightRadius) * LightColor;
}
[/source]

Here is the little bit of shader code that generates the Depth and Normal buffers:
[source lang="cpp"]
PreVertexShaderOutput PreVertexShaderFunction(PreVertexShaderInput input)
{
	PreVertexShaderOutput output;

	float4 worldPosition = mul(input.Position, World);
	float4 viewPosition = mul(worldPosition, View);
	output.Position = mul(viewPosition, Projection);
	output.Depth.xy = output.Position.zw;
	output.Normal = mul(input.Normal, World);

	return output;
}

PrePixelOutput PrePixelShaderFunction(PreVertexShaderOutput input)
{
	PrePixelOutput output;

	output.Normal.xyz = (normalize(input.Normal).xyz * 0.5f) + 0.5f;
	output.Normal.a = 1;
	output.Depth = input.Depth.x / input.Depth.y;

	return output;
}
[/source]
Now, a couple of pictures would be nice to illustrate what I'm seeing.

- I overrode the final scene drawing to get the normal buffer to show up for this pic, so I am getting some sort of normal value,

- And here's what it looks like with the incorrect lighting, the lights are all squeezed and distorted.

- Here's what it should/used to look like in the light-buffer

So even after typing ALL this up, I still haven't been able to figure this out...
Is it the normals? is it the light? I don't know...

Any help would be greatly appreciated

### Navigation Mesh - "Which Polygon?"

15 May 2012 - 07:50 PM

Okay, so I have a working implementation of Navigation-Mesh path finding in my game - using A*
It works great!

I've gotten to the stage where I want to start making some AI actually use the navigation mesh to find the player - always a good point to be at.

Now this is where the problem arises:

I have to somehow determine which polygon in the mesh contains the player and each-and-every AI entity.

( So I can find the path from the entity to the player )

Regarding my "Meshes"

- All of the navigation-mesh's polygons are convex, containing up to 15 vertices

- A "polygon" is nothing more than a collection of the vertices (and, in turn, triangles) that make it up, plus "pointers" to its neighbors.

I don't know if I should do some sort of ray-cast, compare distances, or some other approach.

Ray-Cast:

- Can be accurately done (my polygons follow a winding order)

- Could be very expensive

- Has to be done for each and every triangle and entity

Distance:

- Cheaper

- Not nearly as accurate

- How would I even get this to return tangible results?

So, I'm stuck as to which approach to choose, a hybrid solution, or a completely different method...

-or simply put, "How is this usually done, efficiently?"

Any ideas or suggestions would be greatly appreciated

You can see more and get a better idea - PICTURES - of what I'm doing on my blog (the last few posts have been about navigation meshes)

-Thanks Again

P.S. I'm using XNA, so I have a slight performance overhead...

### Light pre-pass problem

14 January 2012 - 10:36 AM

Okay, so I've set up a light pre-pass render system once before, except my new one is slightly different.
I'm trying to implement it without using a normal buffer. In theory this would work well for the toon-shaded game I'm making, except I've run into a problem.

When I render a point-light (as a sphere) in the lighting pass, I'm sampling from the depth-map and comparing the sphere's depth, clipping any pixels where the scene is closer to the camera than the light. Sadly, this only works for one side of the point-light.

BTW: I'm using XNA, but that's irrelevant to this problem...

```
VertexShaderOutput VertexShaderFunction(VertexShaderInput input)
{
	VertexShaderOutput output;

	float4 worldPosition = mul(input.Position, World);
	float4 viewPosition = mul(worldPosition, View);
	output.Position = mul(viewPosition, Projection);

	output.Depth = 1 - (viewPosition.z / viewPosition.w);

	return output;
}

float4 PixelShaderFunction(VertexShaderOutput input, float2 screenSpace : VPOS) : COLOR0
{
	screenSpace /= float2(1280, 720);

	float depth = tex2D(Depth, screenSpace);

	clip(depth < input.Depth ? -1 : 1);

	return LightColor;
}
```

When I render with this code, the clip function works as expected, but I don't know how to stop the sphere from lighting parts of the scene that are behind it, beyond its reach.

Here's a pic showing that the depth behind the sphere isn't being taken into account:
The farthest cube shouldn't be receiving any light, and the plane shouldn't be shaded that much in the rear.

Here's another picture showing the problem a bit more visibly. Top is the final image, bottom is the light buffer:

I think I'm just missing the last step for determining the influence the light has, but I just can't figure it out!
Any help would be greatly appreciated.
