Abstract
The Evolution Framework is a Java class library for genetic algorithms.
The following sections will explain why mankind needed it.
Contents
Preface
1 Genetic algorithms
1.1 Genome derivation
1.2 Selection
1.3 Common problems
1.3.1 Genetic depletion
1.3.2 Genetic chaos
1.4 Realization
2 The Evolution Framework
2.1 PoolImpl
2.2 GeneticFactory
3 Availability
4 Frequently asked questions
Preface
I do not expect this framework to be used ``industrially", where
I would prefer a native implementation tailored for the specific problem,
which implementation could be much more efficient than my framework. (Actually,
I wrote a GraphMeister in C (in 1995) and in C++ (in 1996) on the Macintosh
platform, not caring for reusage of classes, so I can tell the difference
in performance.) I regard it rather as an input into the academic discussion
of genetic algorithms.
I developed this stuff with Metrowerks CodeWarrior Pro 3 on Power Macintosh,
complemented by Inprise JBuilder 2 on WinDOS NT (for cases where the CodeWarrior
debugger was too buggy). It uses JDK 1.1.7 and JGL 3.1.
1 Genetic algorithms
The general idea is the following. We have to solve a problem, but we do
not know how to. Solving the problem means finding a solution. If we have
no idea how to find a solution, all we can do is guess (randomly), and
then see if we, by chance, guessed a solution.
Now suppose we have a numeric criterion, let's call it fitness,
telling us how good our guess is, and we are able to characterize our guesses
by some digital information which I'd like to call genetic information
or
genome.
Then we can proceed as follows: Make a number n of guesses
(the population) and evaluate them, i.e. compute their fitness values.
Choose a number k < n, and replace the k
worst guesses with new guesses, the genetic information of which is somehow
derived from the genetic information of the better guesses. Repeat this
procedure over and over again. If the fitness function and the algorithm
for deriving genetic information are properly chosen, chances are good
that this process produces better and better guesses and converges to a
solution. That's all.
1.1 Genome derivation
A new genome will be constructed from a fixed number of "parent" genomes.
If this number is 1, the evolution is non-sexual, and the child genome
is merely a mutation of its parent genome. If the number is > 1, we have
a sexual evolution, and the child genome has input from multiple parent
genomes. Usually the genomes are subdivided into smaller pieces, called
genes,
and the child genome is constructed by copying each gene from a random
parent. If the genes are sequentially ordered, there might be some fixed
probability that the next gene is taken from the same parent as the previous
one. I call this probability inertia. (Biologists probably have
a better word for it.) The inertia may be regarded as glue coupling adjacent
genes.
1.2 Selection
It's time to introduce the name individual for the "guesses" I was
referring to in the introductory section. The first selection is that of
the individuals that have to be replaced in a new generation. Obviously
those with smaller fitness values should have a higher probability to be
selected for replacement. But, as will be pointed out in 1.3, it can be
wise to give some small survival chance even to the weak individuals, i.e.
to not necessarily replace exactly the
k weakest individuals.
The second selection is that of parents. Obviously the fitter individuals
should have a higher probability to be selected for reproduction. But,
again, it's not a bad idea to give weak individuals a small chance.
1.3 Common problems
Design errors causing that the evolution does not converge to a solution.
The following remarks are of a heuristic nature, but I have no doubt that
it would be possible to base these considerations on a rigorous mathematical
theory. As far as I know, this theory does not exist yet.
1.3.1 Genetic depletion
The individuals are becoming genetically more and more similar to each
other, the overall genetic information decreases, the genetic variety shrinks
down. May also be viewed as convergence to a local optimum that is far
from the global optimum (if it is the global optimum, it's not a
problem :-) Reason: genetic information that is not contained in the best
individuals is rapidly deleted from the genetic pool, and there is insufficient
creation of new genetic "ideas". Can be caused by excessive selection pressure
(a fitness function that is to steep), a mutation rate that is too small,
bad selection strategies.
1.3.2 Genetic chaos
This is the contrary: A short lifetime of all, even the good, genes. No
genetic tendency is able to survive, lest to dominate. The evolution does
not converge at all. Reasons: Insufficient selection pressure, too high
mutation rate, too small inertia, poor definition of the genetic data.
(When I developed the genetic core of the sample application GraphMeister
during a Summer School on K-Theory in Italy in 1994, a major problem to
solve was to find a definition of genes of a path in a graph which did
not cause the children's phaenotype to be totally different from their
parents' phaenotype, and which did not cause tiny mutations to change the
phaenotype radically.)
1.4 Realization
Genetic algorithms for big problems are usually very time/memory-consuming,
but what do you expect if you have no deterministic algorithm to solve
your problem? Genetic algorithms are only a slightly more sophisticated
form of randomly guessing.
2 The Evolution Framework
Actually it is not a framework in the Hollywood sense ("Don't call us.
We'll call you."), but a class library providing an evolution engine that
can be used for varying concrete evolution algorithms.
The JavaDoc API documentation is the
reference to be consulted for questions on the API. I hope that a lot of
advice on the correct usage of the Evolution Framework will arise from
the study of my sample application GraphMeister.
2.1 PoolImpl
PoolImpl, the default
implementation of the interface Pool,
is the evolution engine. It hosts the population and is responsible for
breeding new generations. It needs to be feeded with an implementation
of GeneticFactory
and evolution parameters (EvolutionParm).
The fitness values are positive real numbers. The individuals are kept
in a 1-dimensional array, ordered by fitness. The probability for an individual
to survive is its fitness divided by the sum of the fitnesses of all individuals
(the "total fitness"). Once an individual is selected for replacement,
the parents of the new individual are chosen. The probability to be chosen
is the individual fitness divided by the total fitness, but an individual
may not be chosen if it is already among the parents (in other words, an
individual may not play multiple roles in the copulation :-). We do not
prevent an individual from participating in the production of multiple
childs in the process of breeding a new generation.
2.2 GeneticFactory
It is the client code's responsibility to implement this interface and
pass the implementation to the PoolImpl.
The client implementation might want to derive from the partial implementation
GeneticFactoryImpl.
The GeneticFactory's
task is to create Individuals
from scratch or from parents. The framework cannot know and does not care
what the client's individuals look like.
3 Availability
The Evolution Framework is in the public domain. Nevertheless I would appreciate
suggestions, corrections, bug reports, love confessions, and any other
constructive comments (even if they
are only addressing my English, which is not my native language).
This is work in progress (especially the documentation). I do not consider
it to be finished yet.
Please download the library byte code, the
source code.
4 Frequently asked questions
Why did I do this? (Obviously a question asked by myself frequently
:-)
Just for fun. Who doesn't want to write his own class library? If you
need some idea for your Ph.D. Thesis, "steal" it from me. I won't care.