Problem: How to make a good fitness function?

Assumptions:

- There is a specially designed programming language.
- The program will not directly modify itself, but indirectly through the use of genetic algorithms.
- Using mutations combined with child generation from random parents.
- The language is fairly resilient to bit mutations.
- The merger of two parents will have a high chance of producing a new algorithm that is not completely broken.

A problem with using randomly created algorithms, compared to e.g. neural networks, is that these algorithms are unlikely to produce any usable output. At least initially. Designing a fitness function for a numerical output can be easy, which could be a measurement of the distance from the result to the target. But what about a fitness function for a sorting algorithm?

Using a fitness function limited to the outputs 0 and 1 is not good enough. It will almost always produce 0, not providing any help for the genetic algorithms. Given my preconceptions about sorting, I could create a fitness function that gives some score for actually reproducing some input on the output. And gradually increase the score the better the output is ordered.

But there are drawbacks with this kind of fitness functions:

- It presumes that a certain intermediary result is good, a step on the way to a final solution. But that may prevent the algorithm from finding the non obvious solutions I didn't know about.
- It requires a lot of effort to formulate. A different function will be needed for different algorithm problems.

I am not sure there a solution to this problem. Any feedback or references are welcome!