lomateron

Member Since 03 Feb 2012
Offline Last Active Apr 29 2016 11:18 AM

#5236651 trying to make GPU physics deterministic

Posted by lomateron on 24 June 2015 - 07:41 PM

I am doing sphere collision physics on the GPU.

I am doing the multiplayer by only sending user input, so determinism is very important.

One PC has a new NVIDIA card and the other has an old ATI card.

When I use floats in the physics code, I only have to wait less than 1 second to see all the balls in completely different positions, and as I said, IEEE strictness (/Gis) doesn't work.

Now that I use uint/int, it works: I waited 5 minutes in a very chaotic scene, 1000 balls with explosions and collisions, and all ball positions stay exactly the same between the 2 PCs.
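To illustrate why integers help: integer add and multiply give bit-identical results on any hardware, while float rounding can differ between GPUs. Below is a minimal CPU-side sketch in C++, assuming a 16.16 fixed-point layout. The post does not say which layout was actually used, and `Fixed`, `fxMul` and `stepPosition` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical 16.16 fixed-point type: real value = raw / 65536.
typedef int32_t Fixed;

const Fixed FX_ONE = 1 << 16;

// Multiply two 16.16 numbers through a 64-bit intermediate.
// Integer division truncates toward zero in C++11 and later,
// so every conforming compiler produces the same bits.
Fixed fxMul(Fixed a, Fixed b) {
    int64_t wide = static_cast<int64_t>(a) * static_cast<int64_t>(b);
    return static_cast<Fixed>(wide / 65536);
}

// One explicit Euler step, pos += vel * dt, entirely in fixed point.
Fixed stepPosition(Fixed pos, Fixed vel, Fixed dt) {
    return pos + fxMul(vel, dt);
}
```

Shader Model 4 has no 64-bit integer type, so a GPU version of `fxMul` would have to build the wide product from 32-bit pieces or keep operands small enough that the product fits in 32 bits.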




#5236451 how to get length of a int vector without overflow on 32 bit int

Posted by lomateron on 23 June 2015 - 04:20 PM

Every solution I tried was either too slow, or fast but a bad approximation.

This is in HLSL 4; the biggest problem was dividing a 62 bit uint made of 2x32 bit by a 32 bit uint.

But then I thought... I only need a 64 bit uint when the length of the vector is bigger than 2^16.

So I just have to scale the vector down when any of (x,y,z) is bigger than 2^16.

And that is my solution.
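A rough sketch of the idea in C++, not the actual HLSL from this post. It assumes a stricter threshold of 2^15 per component (instead of 2^16) so that the sum of three squares is guaranteed to fit in 32 bits, and `isqrt32`, `magnitude` and `approxLength` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>

// Floor of the square root of a 32-bit value, integer operations only.
uint32_t isqrt32(uint32_t n) {
    uint32_t result = 0;
    uint32_t bit = 1u << 30;  // highest power of four that fits in 32 bits
    while (bit > n) bit >>= 2;
    while (bit != 0) {
        if (n >= result + bit) {
            n -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return result;
}

// Absolute value as unsigned, also safe for INT32_MIN.
uint32_t magnitude(int32_t v) {
    return v < 0 ? 0u - static_cast<uint32_t>(v) : static_cast<uint32_t>(v);
}

// Approximate length of (x, y, z) without any 64-bit arithmetic.
uint32_t approxLength(int32_t x, int32_t y, int32_t z) {
    uint32_t ax = magnitude(x), ay = magnitude(y), az = magnitude(z);
    uint32_t largest = std::max(ax, std::max(ay, az));

    // Shift the vector down until every component is below 2^15,
    // so 3 * (2^15 - 1)^2 cannot overflow a 32-bit unsigned int.
    uint32_t shift = 0;
    while ((largest >> shift) >= 32768u) ++shift;

    uint32_t sx = ax >> shift, sy = ay >> shift, sz = az >> shift;
    uint32_t sum = sx * sx + sy * sy + sz * sz;

    // Scale the result back up; the low 'shift' bits of precision are lost.
    return isqrt32(sum) << shift;
}
```

For example, `approxLength(3, 4, 0)` gives 5 exactly because no shifting is needed, while long vectors lose the low bits that were shifted away.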




#5234531 trying to make GPU physics deterministic

Posted by lomateron on 12 June 2015 - 05:32 PM

GCN clearly doesn't follow modulo-2 overflow rules for 32 bit int multiplication

 

Can you give an example please? I don't understand. Are there other int operations that aren't deterministic?

 

By determinism I mean across different GPUs (AMD, NVIDIA, Intel): making them all produce the same output at every step of my updates, using the same .exe and .fxo




#5234452 trying to make GPU physics deterministic

Posted by lomateron on 12 June 2015 - 09:25 AM

I have to tryyyyyy

 

I'm stuck with DirectX 10

 

1 update every 1/250 seconds

every 4 updates the input changes

 

IEEE strictness (/Gis) doesn't work

 

so I thought about using int textures

every integer from 0 to 2^24 can be represented exactly in a float (meaning I could still use float textures but operate on them as ints)
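The 2^24 limit comes from the 24-bit significand of a 32-bit float. A small C++ check, assuming the usual IEEE 754 single-precision `float` (`roundTripsThroughFloat` is a hypothetical helper name):

```cpp
#include <cstdint>

// True if n survives a conversion to 32-bit float and back unchanged.
bool roundTripsThroughFloat(uint32_t n) {
    float f = static_cast<float>(n);
    // Compare in 64 bits so a float rounded up to 2^32 still converts safely.
    return static_cast<uint64_t>(f) == n;
}
```

With this helper, 16777216 (2^24) round-trips, but 16777217 does not, because it has no exact float representation.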

 

the resolution of the physics space will be 2^24 places, velocity and force vectors will have this resolution too

 

but then sqrt() in HLSL only works with floats; there are some functions I could use, like http://stackoverflow.com/questions/4930307/fastest-way-to-get-the-integer-part-of-sqrtn

 

but in the end, will this work? Will it be too slow? What do you think? Any recommendations?




#5221575 triangle always pointing at the camera

Posted by lomateron on 06 April 2015 - 01:56 AM

Many people do it the wrong way, even the people at NVIDIA doing FleX: when you get very close to a ball, it kind of "lets you pass".

 

solution:

 

If you are using D3DXMatrixLookAtLH, in the HLSL code you only need the View matrix data. Before applying the world, view and projection matrices, do this:

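// Camera forward axis: third column of the view matrix built by D3DXMatrixLookAtLH
mMatrx[2] = float3(View._13, View._23, View._33);
// Right axis: perpendicular to world up and the camera forward axis
mMatrx[0] = normalize(cross(float3(0.0f,1.0f,0.0f), mMatrx[2]));
// Up axis: completes the orthonormal basis
mMatrx[1] = cross(mMatrx[2], mMatrx[0]);

// Rotate the vertex so the triangle faces the camera
input.Pos.xyz = mul(input.Pos.xyz, mMatrx);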
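For checking the math outside the shader, here is a small C++ sketch of the same basis construction. It assumes a row-major 4x4 view matrix laid out as D3DXMatrixLookAtLH produces it; `Vec3`, `Basis` and `billboardBasis` are hypothetical names, not part of any D3D API.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

struct Basis { Vec3 right, up, forward; };

// view[row][col]; the third column holds the camera's forward axis.
Basis billboardBasis(const float view[4][4]) {
    Vec3 forward = { view[0][2], view[1][2], view[2][2] };
    Vec3 worldUp = { 0.0f, 1.0f, 0.0f };
    Vec3 right = normalize(cross(worldUp, forward));
    Vec3 up = cross(forward, right);
    return { right, up, forward };
}
```

With an identity view matrix this returns the unit axes unchanged. The construction breaks down when the camera looks straight up or down, because the cross product with world up then has zero length.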



#5189632 synthetic instruments tutorials?

Posted by lomateron on 28 October 2014 - 02:20 AM

Does anyone know good tutorials on how to create synthetic instruments from scratch, like the ones you find in FL Studio?

 

I have tried by myself, but the most complex sounds I have made are just sine waves of different frequencies and volumes added and multiplied together, with these variables also changing over time. I get meh sounds.




#5180774 how to represent this number

Posted by lomateron on 16 September 2014 - 12:19 PM

so is the problem clear now?


#5180176 1D and 3D textures useless?

Posted by lomateron on 13 September 2014 - 10:29 PM

UPDATE:

Just reporting back: the change was successful. The game now runs a little slower when the balls get random, but I think the massive increase in texture size (from 225*225*225 to 400*400*200) is related to this too.




#5178952 deferred shading and clearing the swap chain render target

Posted by lomateron on 08 September 2014 - 03:59 PM

I did it!

without using the stencil test

The light spheres' faces have to be flipped so only the inner faces get rendered.

"z" gets divided by "w" before the depth test, so in the vertex shader I output z=w*z;

There is still the depth clip test "0 <= z <= w", so I have to disable it by creating a rasterizer state with DepthClipEnable set to FALSE




#5157778 algorithm that predicts how a song continues

Posted by lomateron on 03 June 2014 - 05:02 AM

I come here again asking for someone with a good CPU to run this code I have created

You give it a .wav file with 44100 samples per second, 2 channels (stereo), 16 bits per sample

Use songs that are around 30 seconds long

 

This line: std::cout << h << ", " << dataz << "\n";

Is the one that tells the progress, the program finishes when "h" reaches "dataz"

But if you want the code to run faster, delete that line

 

The algorithm uses the first half of the song to predict how the other half continues

It uses only the data of the left channel

 

I use __int64 because 64-bit integers are necessary

#include <stdio.h>
#include <stdlib.h>   // abs() overload for 64-bit integers
#include <string.h>   // strcmp()
#include <iostream>

__int64 wichMate[65536];

int main()
{
	FILE* fp = fopen("a.wav", "rb");
	FILE* FW = fopen("output.wav", "wb");
	short* dutu=0;

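	// Stop if either the input or the output file could not be opened
	if (fp == NULL || FW == NULL)
	{
		if (fp) fclose(fp);
		if (FW) fclose(FW);
		return 1;
	}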

	char id[5];
	unsigned long size;
	short format_tag, channels, block_align, bits_per_sample;
	unsigned long format_length, sample_rate, avg_bytes_sec, data_size;

	fread(id, sizeof(char), 4, fp);
	fwrite(id, sizeof(char), 4, FW);
	id[4] = '\0';

	if (!strcmp(id, "RIFF"))
	{
		fread(&size, sizeof(unsigned long), 1, fp);
		fread(id, sizeof(char), 4, fp);
		fwrite(&size, sizeof(unsigned long), 1, FW);
		fwrite(id, sizeof(char), 4, FW);
		id[4] = '\0';

		if (!strcmp(id, "WAVE"))
		{
			fread(id, sizeof(char), 4, fp);
			fread(&format_length, sizeof(unsigned long), 1, fp);
			fread(&format_tag, sizeof(short), 1, fp);
			fread(&channels, sizeof(short), 1, fp);
			fread(&sample_rate, sizeof(unsigned long), 1, fp);
			fread(&avg_bytes_sec, sizeof(unsigned long), 1, fp);
			fread(&block_align, sizeof(short), 1, fp);
			fread(&bits_per_sample, sizeof(short), 1, fp);


			fwrite(id, sizeof(char), 4, FW);
			fwrite(&format_length, sizeof(unsigned long), 1, FW);
			fwrite(&format_tag, sizeof(short), 1, FW);
			fwrite(&channels, sizeof(short), 1, FW);
			fwrite(&sample_rate, sizeof(unsigned long), 1, FW);
			fwrite(&avg_bytes_sec, sizeof(unsigned long), 1, FW);
			fwrite(&block_align, sizeof(short), 1, FW);
			fwrite(&bits_per_sample, sizeof(short), 1, FW);

			fread(id, sizeof(char), 4, fp);
			fread(&data_size, sizeof(unsigned long), 1, fp);

			fwrite(id, sizeof(char), 4, FW);
			fwrite(&data_size, sizeof(unsigned long), 1, FW);

			dutu = new short[data_size / 2];
			fread(dutu, 1, data_size, fp);
			fwrite(dutu, 1, data_size / 2, FW);
		}
	}
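	fclose(fp);

	// Stop if the file was not RIFF/WAVE: dutu and data_size would be unset
	if (dutu == 0)
	{
		fclose(FW);
		return 1;
	}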

	data_size /= 4;
	__int64 dataz = __int64(data_size);

	__int64* arrre = new __int64[dataz];
	__int64* mumu = new __int64[dataz];

	for (__int64 i = 0; i < dataz; i++){ mumu[i] = __int64(dutu[i * 2]); arrre[i] = 0; }

	__int64 div = dataz / 2;
	__int64 din = div-1;

	for (__int64 h = 1; h < dataz; h++)
	{
		//std::cout << h << ", " << dataz << "\n";
		__int64 cok = mumu[h];
		
		if (h > din)
		{ 
			__int64 kno = 0;
			for (__int64 g = din - 1; g >= 0; g--)
			{
				arrre[kno] = arrre[kno + 1] + (65536-abs(mumu[g] - cok));
				kno++;
			}
		}
		else
		{
			__int64 kno = 0;
			for (__int64 g = h - 1; g >= 0; g--)
			{
				arrre[kno] += (65536-abs(mumu[g] - cok));
				kno++;
			}
		}

		if (h >= din)
		{
			for (__int64 i = 0; i < 65536; i++)
			{
				wichMate[i] = 0;
			}

			__int64 dede = din;
			for (__int64 i = 0; i < din; i++)
			{
				__int64 luko = mumu[dede] + 32768;
				__int64 mult2 = arrre[i];
				__int64 multt = mult2 * 65535;
				
				for (__int64 n = 0; n < 65536; n++)
				{
					wichMate[n] += multt - (abs(luko - n)*mult2);
				}
				dede--;
			}

			__int64 max = 0;

			for (__int64 i = 1; i < 65536; i++)
			{
				if (wichMate[i] >wichMate[max])
				{
					max = i;
				}
			}

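			// Store the prediction as the next sample; skip it on the
			// last iteration so the write stays inside the array
			if (h + 1 < dataz)
			{
				mumu[h + 1] = max - 32768;
			}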
		}
	}
	

	for (__int64 i = div; i < dataz; i++)
	{
		short dni = short(mumu[i]);
		fwrite(&dni, 2, 1, FW);
		fwrite(&dni, 2, 1, FW);
	}

	// Release the sample buffers before exiting
	delete[] dutu;
	delete[] arrre;
	delete[] mumu;

	fclose(FW);

	return 0;
}



#5144935 change 1D address to 2D address

Posted by lomateron on 07 April 2014 - 01:10 AM

You could use a Texture1D instead of Texture2D: x is not defined by a texture

Why are w/x always floats?: because that's how it is

 

I thought you people used this conversion a lot too, and already had one at hand to share, like:

"ohh, on NVIDIA hardware I will use this code... but on AMD I will use this one that works anywhere"




#5144928 change 1D address to 2D address

Posted by lomateron on 07 April 2014 - 12:50 AM

gives 0 answers and wants my topic deleted

 

Does anyone else think this question is not worth an answer?




#5144516 who has the biggest score

Posted by lomateron on 05 April 2014 - 12:09 AM

whut, can I know why the downvotes?




#5144464 who has the biggest score

Posted by lomateron on 04 April 2014 - 04:40 PM

In D3D10

I will render around 100,000 vertices from one VertexBuffer holding just one vertex, using DrawInstanced() and D3D10_PRIMITIVE_TOPOLOGY_POINTLIST

the output of the vertex shader is a random ID that can go from 0 to 2^32, and a random score that can go from 0 to 2^32

Several vertices can output the same ID; the total score of an ID is the sum of all scores with that ID

I want to know which ID has the highest score, just using the GPU
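To pin down the expected result, here is a CPU reference in C++ that a GPU version could be checked against. It is only a sketch of the problem statement, not a GPU solution; `Entry` and `bestId` are hypothetical names, and breaking ties toward the smaller ID is an assumption the post does not specify.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// One vertex shader output: an ID and a score, both 32-bit.
struct Entry { uint32_t id; uint32_t score; };

// Returns {winning ID, its total score}; {0, 0} for empty input.
// Totals are 64-bit because summing many 32-bit scores can overflow 32 bits.
std::pair<uint32_t, uint64_t> bestId(const std::vector<Entry>& entries) {
    std::unordered_map<uint32_t, uint64_t> totals;
    for (const Entry& e : entries) {
        totals[e.id] += e.score;
    }

    std::pair<uint32_t, uint64_t> best(0u, 0u);
    bool found = false;
    for (const auto& kv : totals) {
        // Prefer the higher total; on a tie, prefer the smaller ID (assumption).
        bool better = !found || kv.second > best.second ||
                      (kv.second == best.second && kv.first < best.first);
        if (better) {
            best = std::make_pair(kv.first, kv.second);
            found = true;
        }
    }
    return best;
}
```

For the entries (7,10), (3,8), (7,1), (3,4) the totals are 11 for ID 7 and 12 for ID 3, so ID 3 wins.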




#5143374 a better computer

Posted by lomateron on 30 March 2014 - 10:35 PM

So the brain is pretty good at thinking because it has a lot of connections between neurons

the limitation is the number of connections, the more the better

when building a computer, is there a way of making one that doesn't have this limit of a physical "cable" connection between two places?

instead of using "cables" it will use the electromagnetic force, so problems like simulating the gravitational force between particles will be solved in O(1)

does a computer that works this way have a name?

 





