Abstract

The Evolution Framework is a Java class library for genetic algorithms. The following sections will explain why mankind needed it.

Contents

Preface
1 Genetic algorithms
1.1 Genome derivation
1.2 Selection
1.3 Common problems
1.3.1 Genetic depletion
1.3.2 Genetic chaos
1.4 Realization
2 The Evolution Framework
2.1 PoolImpl
2.2 GeneticFactory
3 Availability
4 Frequently asked questions

Preface


I do not expect this framework to be used ``industrially", where I would prefer a native implementation tailored for the specific problem, which implementation could be much more efficient than my framework. (Actually, I wrote a GraphMeister in C (in 1995) and in C++ (in 1996) on the Macintosh platform, not caring for reusage of classes, so I can tell the difference in performance.) I regard it rather as an input into the academic discussion of genetic algorithms.

I developed this stuff with Metrowerks CodeWarrior Pro 3 on Power Macintosh, complemented by Inprise JBuilder 2 on WinDOS NT (for cases where the CodeWarrior debugger was too buggy). It uses JDK 1.1.7 and JGL 3.1.
 

1 Genetic algorithms

The general idea is the following. We have to solve a problem, but we do not know how to. Solving the problem means finding a solution. If we have no idea how to find a solution, all we can do is guess (randomly), and then see if we, by chance, guessed a solution.

Now suppose we have a numeric criterion, let's call it fitness, telling us how good our guess is, and we are able to characterize our guesses by some digital information which I'd like to call genetic information or genome.
Then we can proceed as follows: Make a number n of guesses (the population) and evaluate them, i.e. compute their fitness values. Choose a number k < n, and replace the k worst guesses with new guesses, the genetic information of which is somehow derived from the genetic information of the better guesses. Repeat this procedure over and over again. If the fitness function and the algorithm for deriving genetic information are properly chosen, chances are good that this process produces better and better guesses and converges to a solution. That's all.
 

1.1 Genome derivation

A new genome will be constructed from a fixed number of "parent" genomes. If this number is 1, the evolution is non-sexual, and the child genome is merely a mutation of its parent genome. If the number is > 1, we have a sexual evolution, and the child genome has input from multiple parent genomes. Usually the genomes are subdivided into smaller pieces, called genes, and the child genome is constructed by copying each gene from a random parent. If the genes are sequentially ordered, there might be some fixed probability that the next gene is taken from the same parent as the previous one. I call this probability inertia. (Biologists probably have a better word for it.) The inertia may be regarded as glue coupling adjacent genes.
 

1.2 Selection

It's time to introduce the name individual for the "guesses" I was referring to in the introductory section. The first selection is that of the individuals that have to be replaced in a new generation. Obviously those with smaller fitness values should have a higher probability to be selected for replacement. But, as will be pointed out in 1.3, it can be wise to give some small survival chance even to the weak individuals, i.e. to not necessarily replace exactly the k weakest individuals. The second selection is that of parents. Obviously the fitter individuals should have a higher probability to be selected for reproduction. But, again, it's not a bad idea to give weak individuals a small chance.
 

1.3 Common problems

Design errors causing that the evolution does not converge to a solution. The following remarks are of a heuristic nature, but I have no doubt that it would be possible to base these considerations on a rigorous mathematical theory. As far as I know, this theory does not exist yet.
 

1.3.1 Genetic depletion

The individuals are becoming genetically more and more similar to each other, the overall genetic information decreases, the genetic variety shrinks down. May also be viewed as convergence to a local optimum that is far from the global optimum (if it is the global optimum, it's not a problem :-) Reason: genetic information that is not contained in the best individuals is rapidly deleted from the genetic pool, and there is insufficient creation of new genetic "ideas". Can be caused by excessive selection pressure (a fitness function that is to steep), a mutation rate that is too small, bad selection strategies.
 

1.3.2 Genetic chaos

This is the contrary: A short lifetime of all, even the good, genes. No genetic tendency is able to survive, lest to dominate. The evolution does not converge at all. Reasons: Insufficient selection pressure, too high mutation rate, too small inertia, poor definition of the genetic data. (When I developed the genetic core of the sample application GraphMeister during a Summer School on K-Theory in Italy in 1994, a major problem to solve was to find a definition of genes of a path in a graph which did not cause the children's phaenotype to be totally different from their parents' phaenotype, and which did not cause tiny mutations to change the phaenotype radically.)
 

1.4 Realization

Genetic algorithms for big problems are usually very time/memory-consuming, but what do you expect if you have no deterministic algorithm to solve your problem? Genetic algorithms are only a slightly more sophisticated form of randomly guessing.
 

The Evolution Framework

Actually it is not a framework in the Hollywood sense ("Don't call us. We'll call you."), but a class library providing an evolution engine that can be used for varying concrete evolution algorithms.

The JavaDoc API documentation is the reference to be consulted for questions on the API. I hope that a lot of advice on the correct usage of the Evolution Framework will arise from the study of my sample application GraphMeister.
 

2.1 PoolImpl

PoolImpl, the default implementation of the interface Pool, is the evolution engine. It hosts the population and is responsible for breeding new generations. It needs to be feeded with an implementation of GeneticFactory and evolution parameters (EvolutionParm).

The fitness values are positive real numbers. The individuals are kept in a 1-dimensional array, ordered by fitness. The probability for an individual to survive is its fitness divided by the sum of the fitnesses of all individuals (the "total fitness"). Once an individual is selected for replacement, the parents of the new individual are chosen. The probability to be chosen is the individual fitness divided by the total fitness, but an individual may not be chosen if it is already among the parents (in other words, an individual may not play multiple roles in the copulation :-). We do not prevent an individual from participating in the production of multiple childs in the process of breeding a new generation.
 

2.2 GeneticFactory

It is the client code's responsibility to implement this interface and pass the implementation to the PoolImpl. The client implementation might want to derive from the partial implementation GeneticFactoryImpl. The GeneticFactory's task is to create Individuals from scratch or from parents. The framework cannot know and does not care what the client's individuals look like.
 

Availability

The Evolution Framework is in the public domain. Nevertheless I would appreciate suggestions, corrections, bug reports, love confessions, and any other constructive comments (even if they are only addressing my English, which is not my native language).

This is work in progress (especially the documentation). I do not consider it to be finished yet.

Please download the library byte code, the source code.
 

Frequently asked questions

Why did I do this? (Obviously a question asked by myself frequently :-)

Just for fun. Who doesn't want to write his own class library? If you need some idea for your Ph.D. Thesis, "steal" it from me. I won't care.