New professional programming language

Started by
24 comments, last by MrJoshL 10 years, 9 months ago

Thanks for the feedback and compliment.

Not sure what you mean by the out flag ignoring case. Do you mean that the file that gets written doesn't have the same case you passed into the option?

If you embed your shader files as resources in the program then you will be able to view them. Pass them in as input files to the compiler and then reference them in program with r"MyShader.glsl" (the 'r' prefix being the important thing.) I do actually plan to make the compiler build shaders that are passed in this way so you can see errors of them without having to run your program. The only reason I haven't done this yet is because in Windows you have to make a window before you can run the shader compiler (I think) which is a pain.

At least you got a callstack smile.png but I'll put in a proper error.

Any problems you get please let me know and I'll try to fix them asap.

Advertisement

If you embed your shader files as resources in the program then you will be able to view them.

Please ignore this comment - it's total garbage. Tell me what file extensions you use for your shaders and I'll make it happen though.

Not sure what you mean by the out flag ignoring case. Do you mean that the file that gets written doesn't have the same case you passed into the option?

Yes, that's what I meant. It's not a big deal on Windows, but it might cause some trouble when you port to Linux.

What I meant was, when opening C-UP Studio(\cuprt debug opengl.cue), I can't view the contents of the shader files, just the .cup file.

Also ran into another error with C-UP Studio up, wasn't debugging the OpenGL sample but I got this:


Unhandled exception: Arithmetic error
At:
void EndSwRender(Window*) (line 348) : module Window
void DrawInvalidatedElementsR(Element*) (line 1669) : module Element
void DrawInvalidatedElementsR(Element*) (line 1681) : module Element
int Main(string[]) (line 1362) : module Main
int* .Invoke(Delegate_47150,void*[]) (line 1298) : module Main
void* Invoke(const(Function)&,const(void&)[] local) (line 1038) : module Reflection
void anon-func_39(void*) (line 406) : module Program
void FiberEntry(Fiber.EntryPoint) (line 102) : module Fiber
Unknown
Unknown
Unknown

C-UP studio displayed a white window after that.

I've fixed all of the issues you raised. The new version is on the website (I just overwrote the existing one.)

The crash was a divide by zero when the debugger was minimized (yes, it has been that well tested.)

Files will open in the debugger if they have one of these extensions: .cup .cl .frag .vert .glslv .glslf .glv .glf .txt

A couple of other things have changed:

- The optimiser works now (at least on all the tests I've run). It doesn't do that much optimisation yet, but some.

- Windows raw input is used instead of DirectInput so input.dll is no longer needed and you don't need to call Input.Init/Shutdown/Poll any more.

Thanks again,

Si

Hodgman mentioning radix sort got me wondering so I've knocked up an implementation.

On my 2 core laptop (AMD E-450 APU 1.65GHz), sorting 1024*1024 items takes:

1 core = 678ms

2 core = 368ms (1.84x speedup)

On my 4 core desktop (AMD Athalon 2 X4 630 2.8GHz), sorting the same number of items takes:

1 core = 333ms

2 core = 190ms (1.75x)

3 core = 136ms (2.45x)

4 core = 106ms (3.14x)

Pretty reasonably speedups.

Anyway, here's the radix sort code. It recursively subdivides the problem to generate the parallelism as described here:

http://software.intel.com/sites/default/files/m/b/3/6/3/8/22396-ParallelRadixSort_andreyryabov.pdf

It doesn't do parallel scatter as Hodgman suggested for reasons commented on in the code. I'd be interested to know if there's a better approach than I took.

As well as it being parallel internally you can also run multiple (non-overlapping) sorts in parallel if you need to.


// parallel radix sort
// in-place stable sort of items array into ascending key order
// type to be sorted must have RadixSortKey uint property
// you can pass in temporary workspace otherwise it will allocate one internally on the stack
public parallel void RadixSort<T>(T[] local items, T[] local itemsTemp = null)
{
	// workspace required for double buffering
	T[] local itemsAlt = itemsTemp;
	if (!itemsAlt)
		itemsAlt = new local T[items.Length];// = void;			// = void prevents zeroing the allocated memory (in theory - currently crashes compiler)

	if (items.Length > 4096)
		RadixSortP<T>(items, itemsAlt, 0);
	else
		RadixSortNP<T>(items, itemsAlt, 0);
}

private parallel void RadixSortP<T>(T[] local items, T[] local itemsAlt, int shift)
{
	// calculate prefix values
	int[] local prefixes = new local int[256];

	// if lots of keys then calculate prefixes in parallel
	if (items.Length >= 32768)
	{
		int split = items.Length / 4;
		int[] local [4] prefixesPar;
		for (int i = 0; i < 4; i++)
			prefixesPar = new local int[256];
		parallel
		{
			int s = 0;
			for (int i = 0; i < 3; i++, s += split)
				RadixSortCountPrefixes<T>(prefixesPar, items[s .. s + split], shift);
			// remainder
			RadixSortCountPrefixes<T>(prefixesPar[3], items[s .. items.Length], shift);
		}

		// sum the prefixes using simd add (stack allocated arrays are at least 16 byte aligned)
		foreach (int[] local pfxPar; prefixesPar)
		{
			int8[] local pfxSrc8 = cast(int8[] local)pfxPar;
			int8[] local pfxDst8 = cast(int8[] local)prefixes;
			foreach (int8& pfx, i; pfxSrc8)
				pfxDst8 += *pfx;
		}
	}
	// if not a large number of prefixes do them in serial
	else
	{
		RadixSortCountPrefixes<T>(prefixes, items, shift);
	}

	// prefix is sum of all previous values
	int sum = 0;
	foreach (int& prefix, i; prefixes)
	{
		int newSum = sum + *prefix;
		*prefix = sum;
		sum = newSum;
	}

	// scatter values out to destination array
	// can be done in parallel if shared memory is used but:
	//	- you'd need to use atomic increments which would kill performance if you have lots of the same key
	//	- would make the sort unstable
	// (maybe there's another way to partition the work I'm not seeing that avoids these issues)
	foreach (T& item; items)
	{
		int o = prefixes[(item.RadixSortKey << shift) >> 24]++;
		itemsAlt[o] = *item;
	}

	// done?
	if (shift < 24)
	{
		// now sort the sub-arrays in parallel if they're reasonably large or non-parallel if they're not
		parallel
		{
			int s = 0;
			for (int i = 0; i < 256; i++)
			{
				int e = prefixes;
		
				if ((e - s) > 4096)
					RadixSortP<T>(itemsAlt[s .. e], items[s .. e], shift + 8);
				else if ((e - s) > 1)
					RadixSortNP<T>(itemsAlt[s .. e], items[s .. e], shift + 8);
				else if ((e - s) > 0)
					items[s] = itemsAlt[s];

				s = e;
			}
		}
	}
}

// non-parallel version called by the master version above when arrays are sufficiently small
private void RadixSortNP<T>(T[] local items, T[] local itemsAlt, int shift) : parallel
{
	// calculate prefix values
	int[256] prefixes;

	RadixSortCountPrefixes<T>(&prefixes, items, shift);

	// prefix is sum of all previous values
	int sum = 0;
	foreach (int& prefix, i; prefixes)
	{
		int newSum = sum + *prefix;
		*prefix = sum;
		sum = newSum;
	}

	// scatter values out to destination array
	foreach (T& item; items)
	{
		int o = prefixes[(item.RadixSortKey << shift) >> 24]++;
		itemsAlt[o] = *item;
	}

	// if not done sort the sub-arrays
	if (shift < 24)
	{
		int s = 0;
		for (int i = 0; i < 256; i++)
		{
			int e = prefixes;

			if ((e - s) > 1)
				RadixSortNP<T>(itemsAlt[s .. e], items[s .. e], shift + 8);
			else if ((e - s) > 0)
				items[s] = itemsAlt[s];

			s = e;
		}
	}
}

// radix sort helper
parallel void RadixSortCountPrefixes<T>(int[] local prefixes, const(T)[] local items, int shift)
{
	foreach (const(T)& item; items)
		prefixes[(item.RadixSortKey << shift) >> 24]++;
}

And here's the little testbed I used to generate the results at the top:


import System.IO.Console;

uint randSeed = 0;

public void SeedRand(uint seed)
{
	randSeed = seed;
}

public uint Rand()
{
	randSeed = (randSeed * 1103515245 + 12345) & int.MaxValue;
	return randSeed;
}

public struct RadixItem
{
	private uint key;
	private int value;

	public this()
	{
		key = Rand();
		value = Rand();
	}

	public uint RadixSortKey() const : parallel
	{
		return key;
	}
}

public int Main(string[] args)
{
	SeedRand(12345);

	static const int numItems = 1024 * 1024;
	static const int numRuns = 10;

	RadixItem[] items = new RadixItem[numItems];
	RadixItem[] itemsTmp = new RadixItem[numItems];

	// uncomment this line to set number of threads job queue can use (it clamps internally to 1 .. number of physical cores on machine)
	//System.Profile.JobQueue.SetNumThreads(2);

	// run radix sort performance test
	Console.WriteLine("Timing radix sort of %d items", &numItems);
	ulong totalCycles = 0;

	// 10 runs
	for (int r = 0; r < numRuns; r++)
	{
		foreach (RadixItem& item; items)
			*item = RadixItem();

		ulong startCycle = GetProcessorCycleCount();

		RadixSort<RadixItem>(items, itemsTmp);

		ulong endCycle = GetProcessorCycleCount();
		totalCycles += endCycle - startCycle;

		// check sorted
		for (int i = 1; i < items.Length; i++)
			Assert(items.RadixSortKey >= items[i - 1].RadixSortKey);
	}

	// report average time taken
	float totalMs = cast(float)totalCycles / cast(float)GetProcessorCyclesPerSecond() / cast(float)numRuns * 1000.0f;
	Console.WriteLine("Average time taken over %d runs = %.2fms", &numRuns, &totalMs);

	return 0;
}

I keep mentioning shared memory. Here's how the scatter part of RadixSortP above might look if shared memory was used to parallelise it. Note the shared flag on the two output arrays of RadixScatter. All shared does is prevents the job manager making a dependency between this shared memory and any other shared memory, which matters in this case because we're passing the entire itemsAlt and prefixes arrays to each call to RadixScatter and without shared the calls would all depend on each other and be serialised.

I haven't compiled the code below but it should illustrate the concept.


// parallelism occurs at function scope, so we must introduce a new scatter function
parallel void RadixScatter<T>(shared T[] local itemsOut, shared int[] local prefixes, const(T)[] local itemsIn, int shift)
{
	foreach (const(T)& item; itemsIn)
	{
		// AtomicAdd returns the old value
		int o = AtomicAdd(&prefixes[(item.RadixSortKey << shift) >> 24], 1);
		itemsOut[o] = *item;
	}
}


// and here's the modified scatter part in RadixSortP
int split = items.Length / 4;	// arbitrarily split 4 ways
parallel
{
	int s = 0;
	for (int i = 0; i < 3; i++, s += split)
		RadixScatter<T>(itemsAlt, prefixes, items[s .. s + split], shift);
	// remainder
	RadixScatter<T>(itemsAlt, prefixes, items[s .. items.Length], shift);
}

It just occurred to me reading this that when I talk about releasing source code I just mean the c++ that comprises the compiler itself and runtime systems (garbage collector, job manager, etc.)

All of the actual C-UP source code is already available as it's embedded in the object files that are part of the download. It can be extracted out into individual files using the 'extract' command as described in the user guide. This includes source for things like the rigid body library, the debugger itself, collections, parallel collections, string formatting, opengl/cl bindings, etc.

You can also just use the debugger to browse source code if you wish (and even view the generated assembly language interleaved with the source.)

Hi folks,

As promised version 0.3 has just been uploaded to the website with a full source code release.

This version has libraries for audio (OpenAL) and networking (based on sockets but wrapped in classes.)

It also has support for touch input (see the OpenGL sample) and improvements to the way generics are implemented.

Please let me know if you have any problems or other feedback.

Thanks.

Hi C-UP,
I like studying graphic design. Inspired by your product, I've done some exercises on it.
I don't know what to do with these, but if you have any use for them please feel free:

2rncsw8.png

se2id1.png

a4ndxw.png

I've put an archive with the vectors here: [attachment=16524:C-UP Logos.zip]
Regards.

Thanks Kryzon, that's put a big smile on my grumpy old face :-)

The 3rd one is my favourite - what font is it? However, I do also like the way the UP is superscripted in the first two, and they'll fit better on the website.

Which one do you prefer?

I'll put something on the website when I get a chance (it's milestone week at real work so it'll have to wait 'till the weekend.)

This topic is closed to new replies.

Advertisement