Implementation of a neural network


Hi.

I made a basic framework for my neural network implementations, but I don't really know if it's done the right way. I made it using matrices for the weights and neurons, but since one of the most basic concepts of neural networks is parallel processing, I want to know how I can do that. Should I do it using threading, or using different cores, or ...?

I use C# as I work with Unity.

Can you give a good example of coding this in Unity?

My next question is: what is the best, most complete neural network library for C#?

Thank you for helping.

Consider your inputs to a feed-forward, fully-connected neural network as a column vector with real-valued entries. A typical layer performs this operation:

output = non_linearity(matrix * input + biases)

Here `output', `input' and `biases' are column vectors, and `non_linearity' is a function that applies a non-linear transformation to each coordinate in the vector (typically tanh(x) or max(0,x)).

For non-trivial neural networks the bulk of the work comes from the `matrix * input' operation, which can already be parallelized to some extent. However, you get much better parallelism if you compute your network on multiple data samples at the same time (a so-called "minibatch"). It turns out you can just replace the column vectors with matrices, so each column represents a separate data sample from the minibatch, and the formulas are essentially the same. This allows for much more efficient use of parallel hardware, especially if you are using GPUs. All you need to do is use a well-optimized matrix library.
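If it helps to see the formula in code: here is a minimal, unoptimized C# sketch of one such layer, with the minibatch stored as the columns of a 2D array. The array shapes and the choice of max(0,x) as the non-linearity are just assumptions for illustration; in practice you would hand the `matrix * input' part to a well-optimized matrix library instead of writing the loops yourself.

    using System;

    static class DenseLayer
    {
        // weights: [outputs, inputs], input: [inputs, batch], biases: [outputs]
        // Returns output: [outputs, batch], one column per sample in the minibatch.
        public static float[,] Forward(float[,] weights, float[,] input, float[] biases)
        {
            int outputs = weights.GetLength(0);
            int inputs  = weights.GetLength(1);
            int batch   = input.GetLength(1);
            var output  = new float[outputs, batch];

            for (int o = 0; o < outputs; o++)
            {
                for (int b = 0; b < batch; b++)
                {
                    float sum = biases[o];
                    for (int i = 0; i < inputs; i++)
                        sum += weights[o, i] * input[i, b]; // matrix * input
                    output[o, b] = Math.Max(0f, sum);       // non_linearity: max(0, x)
                }
            }
            return output;
        }
    }

The innermost loop is exactly the matrix multiplication that dominates the cost, which is the part you want a parallel library or the GPU to do for you.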

I know nothing about C# or Unity, sorry.

Implementing genetic algorithms works very well on the graphics card because you can try a million alternatives in each draw call and then iterate on the best results.

Try to find a way to express your data as an image and define the processing steps as image manipulations in a read paradigm.

For a single actor:

A pixel's color or location might represent a direction of some action.

Make a utility pass that gets the score of each action in a new image.

A 1D texture lying along one row can look for the best result. Keep the height at 1 if you use a 2D texture to represent it, since padding is applied per row.

Read the 1D texture back to the CPU. (This will eat up 70% of your time unless you use DirectX 12 or Vulkan to run multiple things at the same time.)

Do the last sweep on the CPU, take the best option, and execute the action.
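To illustrate that last step, here is a minimal sketch of the CPU sweep, assuming the per-action scores have already been read back into a plain float array (the array layout and names are hypothetical):

    static class ActionPicker
    {
        // scores[i] is assumed to hold the GPU-computed score of action i.
        public static int BestAction(float[] scores)
        {
            int best = 0;
            for (int i = 1; i < scores.Length; i++)
                if (scores[i] > scores[best])
                    best = i;
            return best; // execute whatever action this index maps to
        }
    }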

For multiple actors:

Each column can represent a unit and each row can represent a possible action.
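A sketch of the same sweep for that layout, assuming the scores end up in a [action, unit] array after readback (the indexing convention is just an assumption matching the row/column description above):

    static class MultiActionPicker
    {
        // scores[action, unit]: one column of action scores per unit.
        public static int[] BestActionPerUnit(float[,] scores)
        {
            int actions = scores.GetLength(0);
            int units   = scores.GetLength(1);
            var best    = new int[units];
            for (int u = 0; u < units; u++)
                for (int a = 1; a < actions; a++)
                    if (scores[a, u] > scores[best[u], u])
                        best[u] = a;
            return best;
        }
    }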

The code to implement this will look a lot like spaghetti assembler code when fully optimized, which is why I abandoned that strategy game when nothing more could be added.

I tried genetic algorithms on the GPU using DirectX and got really nice results, but the shader compiler would fail 90% of the time when trying to do anything advanced like physical simulation, because it was only tested for graphics.

Vulkan is still in an alpha stage with drivers missing, but there you will be able to choose other compilers using SPIR-V intermediate code. What I don't like about Vulkan is that sampler states are connected to views and you have to create pipeline objects. Vulkan might win over DirectX 12 simply by supporting Windows 7 and phones.

A chokepoint for GPUs is often complex functions which aren't easily handled by the simplified instruction sets used by these highly parallel processors. How well do the usual NN sigmoid activation functions work within GPU instruction sets (and might some table lookup be substituted to get around that)?




For any decently sized neural net, the vast majority of the time is spent doing matrix multiplications (matrix by column vector if you are feeding it a single sample, or matrix by matrix if you feed it multiple samples at a time, which is typically how training is done). Imagine a neuron with 1,000 inputs. Computing its activation takes 1,000 multiply-adds and one single call to the activation function.

Also, rectified linear units are increasingly common, so instead of 1/(1+exp(-x)), you simply need to compute max(0,x).
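As a rough sketch of that balance in C# (the 1,000-input neuron is just the example above, and the names are hypothetical):

    using System;

    static class SingleNeuron
    {
        // The loop does n multiply-adds; the activation is one cheap call at the end.
        public static float Activate(float[] weights, float[] inputs, float bias)
        {
            float sum = bias;
            for (int i = 0; i < weights.Length; i++)
                sum += weights[i] * inputs[i];

            // Sigmoid would be: 1f / (1f + (float)Math.Exp(-sum));
            // A rectified linear unit is even cheaper:
            return Math.Max(0f, sum);
        }
    }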

