New professional programming language

c-up · 2013-07-01T23:50:15

Hi game developers, I've invented a new programming language based on over 20 years of experience writing games for a living and I'd like to invite you all to give it a try. The language is called C-UP, which is a play on the classic gaming terminology for lives/turns/power-ups: 1-up, 2-up, etc. It's a C style imperative language so it should look very familiar to most of you but I hope it introduces enough new concepts to make it worthwhile. I guess the headline features are safe automatic parallelism, full support for SIMD vector types, proper arrays with slicing, multi-methods and generics, but there's lots of other good stuff. Vector math is all SIMD based with intrinsic functions for things like dot, cross, min, max, sin, cos, etc. There are libraries for collections, matrices, quaternions, window management, OpenGL, OpenCL, input, file system and 3d rigid body dynamics. Audio and networking are at the top of my list of libraries to implement next although you are free to make your own. There's also a debugger written in C-UP itself. You can download it for free (Windows only) at: www.c-up.net. Sorry for the basic website implementation but all of my time is being devoted to the language itself. Unfortunately this means that there are no forums or anything yet but you can email me at the address given on the site and of course discuss here. I'll try to answer any questions you have but it is just me and I do have a full time job in the industry as well so please bear with me. Thanks in advance for any time you're able to spend looking at this project - I hope you like it, Si P.S. Here's a little C-UP program to give you a taste of the language. This example draws some 2d coloured squares in parallel using simd vector maths and 2d arrays with slicing. // entry point int Main(string[] args) { // allocate a 2d dynamic array of 4 byte simd vector type // (a comprehensive range of simd types is supported) byte4[,] image = new byte4[64, 64]; parallel { // fill the top-left quarter in red (array slicing with .. allows us to make a sub-array aliased over the master array) BlendRect(image[0..32, 0..32], byte4(255, 0, 0, 255)); // fill the top-right quarter in green BlendRect(image[0..32, 32..64], byte4(0, 255, 0, 255)); // fill the bottom-left quarter in blue BlendRect(image[32..64, 0..32], byte4(0, 0, 255, 255)); // fill the bottom-right quarter in yellow BlendRect(image[32..64, 32..64], byte4(255, 255, 0, 255)); // blend a 50% alpha white square over the middle BlendRect(image[16..48, 16..48], byte4(255, 255, 255, 128)); // The calls to the parallel BlendRect function don't execute until the parallel block { } exits // and then they execute in parallel with automatic dependency checking. // In this example the first 4 squares will be filled in parallel and the final one will not happen until // they are all complete because it overlaps them all } return 0; } // fill an alpha blended rectangle parallel void BlendRect(byte4[,] local rect, byte4 colour) { // expand source colour to 16 bits per channel short4 srcColour = colour; // foreach on a 2d array gives each row in turn as a 1d array foreach (byte4[] row; rect) { // foreach on a 1d array yields each element in turn foreach (byte4& pixel; row) { // expand destination colour to 16 bits per channel short4 dstColour = *pixel; // perform blend - vector component swizzling and write masking is supported like glsl pixel.rgb = cast(byte4)(dstColour + ((srcColour - dstColour) * srcColour.aaaa) >> 8); } } }

Engines and Middleware Programming

Started by c-up April 29, 2013 07:50 PM

24 comments, last by MrJoshL 10 years, 9 months ago

c-up

324

Author

May 04, 2013 02:59 PM

Thanks for the feedback and compliment.

Not sure what you mean by the out flag ignoring case. Do you mean that the file that gets written doesn't have the same case you passed into the option?

If you embed your shader files as resources in the program then you will be able to view them. Pass them in as input files to the compiler and then reference them in program with r"MyShader.glsl" (the 'r' prefix being the important thing.) I do actually plan to make the compiler build shaders that are passed in this way so you can see errors of them without having to run your program. The only reason I haven't done this yet is because in Windows you have to make a window before you can run the shader compiler (I think) which is a pain.

At least you got a callstack but I'll put in a proper error.

Any problems you get please let me know and I'll try to fix them asap.

c-up

324

Author

May 04, 2013 03:09 PM

If you embed your shader files as resources in the program then you will be able to view them.

Please ignore this comment - it's total garbage. Tell me what file extensions you use for your shaders and I'll make it happen though.

Mussi

4,453

May 04, 2013 04:58 PM

Not sure what you mean by the out flag ignoring case. Do you mean that the file that gets written doesn't have the same case you passed into the option?

Yes, that's what I meant. It's not a big deal on Windows, but it might cause some trouble when you port to Linux.

What I meant was, when opening C-UP Studio(\cuprt debug opengl.cue), I can't view the contents of the shader files, just the .cup file.

Also ran into another error with C-UP Studio up, wasn't debugging the OpenGL sample but I got this:


Unhandled exception: Arithmetic error
At:
void EndSwRender(Window*) (line 348) : module Window
void DrawInvalidatedElementsR(Element*) (line 1669) : module Element
void DrawInvalidatedElementsR(Element*) (line 1681) : module Element
int Main(string[]) (line 1362) : module Main
int* .Invoke(Delegate_47150,void*[]) (line 1298) : module Main
void* Invoke(const(Function)&,const(void&)[] local) (line 1038) : module Reflection
void anon-func_39(void*) (line 406) : module Program
void FiberEntry(Fiber.EntryPoint) (line 102) : module Fiber
Unknown
Unknown
Unknown

C-UP studio displayed a white window after that.

c-up

324

Author

May 05, 2013 11:41 AM

I've fixed all of the issues you raised. The new version is on the website (I just overwrote the existing one.)

The crash was a divide by zero when the debugger was minimized (yes, it has been that well tested.)

Files will open in the debugger if they have one of these extensions: .cup .cl .frag .vert .glslv .glslf .glv .glf .txt

A couple of other things have changed:

- The optimiser works now (at least on all the tests I've run). It doesn't do that much optimisation yet, but some.

- Windows raw input is used instead of DirectInput so input.dll is no longer needed and you don't need to call Input.Init/Shutdown/Poll any more.

Thanks again,

c-up

324

Author

May 08, 2013 09:55 PM

Hodgman mentioning radix sort got me wondering so I've knocked up an implementation.

On my 2 core laptop (AMD E-450 APU 1.65GHz), sorting 1024*1024 items takes:

1 core = 678ms

2 core = 368ms (1.84x speedup)

On my 4 core desktop (AMD Athalon 2 X4 630 2.8GHz), sorting the same number of items takes:

1 core = 333ms

2 core = 190ms (1.75x)

3 core = 136ms (2.45x)

4 core = 106ms (3.14x)

Pretty reasonably speedups.

Anyway, here's the radix sort code. It recursively subdivides the problem to generate the parallelism as described here:

http://software.intel.com/sites/default/files/m/b/3/6/3/8/22396-ParallelRadixSort_andreyryabov.pdf

It doesn't do parallel scatter as Hodgman suggested for reasons commented on in the code. I'd be interested to know if there's a better approach than I took.

As well as it being parallel internally you can also run multiple (non-overlapping) sorts in parallel if you need to.


// parallel radix sort
// in-place stable sort of items array into ascending key order
// type to be sorted must have RadixSortKey uint property
// you can pass in temporary workspace otherwise it will allocate one internally on the stack
public parallel void RadixSort<T>(T[] local items, T[] local itemsTemp = null)
{
	// workspace required for double buffering
	T[] local itemsAlt = itemsTemp;
	if (!itemsAlt)
		itemsAlt = new local T[items.Length];// = void;			// = void prevents zeroing the allocated memory (in theory - currently crashes compiler)

	if (items.Length > 4096)
		RadixSortP<T>(items, itemsAlt, 0);
	else
		RadixSortNP<T>(items, itemsAlt, 0);
}

private parallel void RadixSortP<T>(T[] local items, T[] local itemsAlt, int shift)
{
	// calculate prefix values
	int[] local prefixes = new local int[256];

	// if lots of keys then calculate prefixes in parallel
	if (items.Length >= 32768)
	{
		int split = items.Length / 4;
		int[] local [4] prefixesPar;
		for (int i = 0; i < 4; i++)
			prefixesPar = new local int[256];
		parallel
		{
			int s = 0;
			for (int i = 0; i < 3; i++, s += split)
				RadixSortCountPrefixes<T>(prefixesPar, items[s .. s + split], shift);
			// remainder
			RadixSortCountPrefixes<T>(prefixesPar[3], items[s .. items.Length], shift);
		}

		// sum the prefixes using simd add (stack allocated arrays are at least 16 byte aligned)
		foreach (int[] local pfxPar; prefixesPar)
		{
			int8[] local pfxSrc8 = cast(int8[] local)pfxPar;
			int8[] local pfxDst8 = cast(int8[] local)prefixes;
			foreach (int8& pfx, i; pfxSrc8)
				pfxDst8 += *pfx;
		}
	}
	// if not a large number of prefixes do them in serial
	else
	{
		RadixSortCountPrefixes<T>(prefixes, items, shift);
	}

	// prefix is sum of all previous values
	int sum = 0;
	foreach (int& prefix, i; prefixes)
	{
		int newSum = sum + *prefix;
		*prefix = sum;
		sum = newSum;
	}

	// scatter values out to destination array
	// can be done in parallel if shared memory is used but:
	//	- you'd need to use atomic increments which would kill performance if you have lots of the same key
	//	- would make the sort unstable
	// (maybe there's another way to partition the work I'm not seeing that avoids these issues)
	foreach (T& item; items)
	{
		int o = prefixes[(item.RadixSortKey << shift) >> 24]++;
		itemsAlt[o] = *item;
	}

	// done?
	if (shift < 24)
	{
		// now sort the sub-arrays in parallel if they're reasonably large or non-parallel if they're not
		parallel
		{
			int s = 0;
			for (int i = 0; i < 256; i++)
			{
				int e = prefixes;
		
				if ((e - s) > 4096)
					RadixSortP<T>(itemsAlt[s .. e], items[s .. e], shift + 8);
				else if ((e - s) > 1)
					RadixSortNP<T>(itemsAlt[s .. e], items[s .. e], shift + 8);
				else if ((e - s) > 0)
					items[s] = itemsAlt[s];

				s = e;
			}
		}
	}
}

// non-parallel version called by the master version above when arrays are sufficiently small
private void RadixSortNP<T>(T[] local items, T[] local itemsAlt, int shift) : parallel
{
	// calculate prefix values
	int[256] prefixes;

	RadixSortCountPrefixes<T>(&prefixes, items, shift);

	// prefix is sum of all previous values
	int sum = 0;
	foreach (int& prefix, i; prefixes)
	{
		int newSum = sum + *prefix;
		*prefix = sum;
		sum = newSum;
	}

	// scatter values out to destination array
	foreach (T& item; items)
	{
		int o = prefixes[(item.RadixSortKey << shift) >> 24]++;
		itemsAlt[o] = *item;
	}

	// if not done sort the sub-arrays
	if (shift < 24)
	{
		int s = 0;
		for (int i = 0; i < 256; i++)
		{
			int e = prefixes;

			if ((e - s) > 1)
				RadixSortNP<T>(itemsAlt[s .. e], items[s .. e], shift + 8);
			else if ((e - s) > 0)
				items[s] = itemsAlt[s];

			s = e;
		}
	}
}

// radix sort helper
parallel void RadixSortCountPrefixes<T>(int[] local prefixes, const(T)[] local items, int shift)
{
	foreach (const(T)& item; items)
		prefixes[(item.RadixSortKey << shift) >> 24]++;
}

And here's the little testbed I used to generate the results at the top:


import System.IO.Console;

uint randSeed = 0;

public void SeedRand(uint seed)
{
	randSeed = seed;
}

public uint Rand()
{
	randSeed = (randSeed * 1103515245 + 12345) & int.MaxValue;
	return randSeed;
}

public struct RadixItem
{
	private uint key;
	private int value;

	public this()
	{
		key = Rand();
		value = Rand();
	}

	public uint RadixSortKey() const : parallel
	{
		return key;
	}
}

public int Main(string[] args)
{
	SeedRand(12345);

	static const int numItems = 1024 * 1024;
	static const int numRuns = 10;

	RadixItem[] items = new RadixItem[numItems];
	RadixItem[] itemsTmp = new RadixItem[numItems];

	// uncomment this line to set number of threads job queue can use (it clamps internally to 1 .. number of physical cores on machine)
	//System.Profile.JobQueue.SetNumThreads(2);

	// run radix sort performance test
	Console.WriteLine("Timing radix sort of %d items", &numItems);
	ulong totalCycles = 0;

	// 10 runs
	for (int r = 0; r < numRuns; r++)
	{
		foreach (RadixItem& item; items)
			*item = RadixItem();

		ulong startCycle = GetProcessorCycleCount();

		RadixSort<RadixItem>(items, itemsTmp);

		ulong endCycle = GetProcessorCycleCount();
		totalCycles += endCycle - startCycle;

		// check sorted
		for (int i = 1; i < items.Length; i++)
			Assert(items.RadixSortKey >= items[i - 1].RadixSortKey);
	}

	// report average time taken
	float totalMs = cast(float)totalCycles / cast(float)GetProcessorCyclesPerSecond() / cast(float)numRuns * 1000.0f;
	Console.WriteLine("Average time taken over %d runs = %.2fms", &numRuns, &totalMs);

	return 0;
}

c-up

324

Author

May 09, 2013 07:57 AM

I keep mentioning shared memory. Here's how the scatter part of RadixSortP above might look if shared memory was used to parallelise it. Note the shared flag on the two output arrays of RadixScatter. All shared does is prevents the job manager making a dependency between this shared memory and any other shared memory, which matters in this case because we're passing the entire itemsAlt and prefixes arrays to each call to RadixScatter and without shared the calls would all depend on each other and be serialised.

I haven't compiled the code below but it should illustrate the concept.


// parallelism occurs at function scope, so we must introduce a new scatter function
parallel void RadixScatter<T>(shared T[] local itemsOut, shared int[] local prefixes, const(T)[] local itemsIn, int shift)
{
	foreach (const(T)& item; itemsIn)
	{
		// AtomicAdd returns the old value
		int o = AtomicAdd(&prefixes[(item.RadixSortKey << shift) >> 24], 1);
		itemsOut[o] = *item;
	}
}


// and here's the modified scatter part in RadixSortP
int split = items.Length / 4;	// arbitrarily split 4 ways
parallel
{
	int s = 0;
	for (int i = 0; i < 3; i++, s += split)
		RadixScatter<T>(itemsAlt, prefixes, items[s .. s + split], shift);
	// remainder
	RadixScatter<T>(itemsAlt, prefixes, items[s .. items.Length], shift);
}

c-up

324

Author

May 15, 2013 10:04 AM

It just occurred to me reading this that when I talk about releasing source code I just mean the c++ that comprises the compiler itself and runtime systems (garbage collector, job manager, etc.)

All of the actual C-UP source code is already available as it's embedded in the object files that are part of the download. It can be extracted out into individual files using the 'extract' command as described in the user guide. This includes source for things like the rigid body library, the debugger itself, collections, parallel collections, string formatting, opengl/cl bindings, etc.

You can also just use the debugger to browse source code if you wish (and even view the generated assembly language interleaved with the source.)

c-up

324

Author

June 22, 2013 11:43 AM

Hi folks,

As promised version 0.3 has just been uploaded to the website with a full source code release.

This version has libraries for audio (OpenAL) and networking (based on sockets but wrapped in classes.)

It also has support for touch input (see the OpenGL sample) and improvements to the way generics are implemented.

Please let me know if you have any problems or other feedback.

Thanks.

Kryzon

4,684

June 27, 2013 12:15 AM

Hi C-UP,
I like studying graphic design. Inspired by your product, I've done some exercises on it.
I don't know what to do with these, but if you have any use for them please feel free:

I've put an archive with the vectors here: [attachment=16524:C-UP Logos.zip]
Regards.

c-up

324

Author

June 27, 2013 09:11 AM

Thanks Kryzon, that's put a big smile on my grumpy old face :-)

The 3rd one is my favourite - what font is it? However, I do also like the way the UP is superscripted in the first two, and they'll fit better on the website.

Which one do you prefer?

I'll put something on the website when I get a chance (it's milestone week at real work so it'll have to wait 'till the weekend.)

New professional programming language

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

New professional programming language

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines