Evolution of Complexity
Matthew Barnes
Recorded Lectures
Introduction
Evolutionary Algorithms
Representation of Individuals
Fitness Function
Selection
Variation Operators
Variations on a theme
Evolution History
Pre-Darwinian Evolution
Natural Selection
The Modern Synthesis
Fitness Landscapes
Definitions
Intuition
Properties
Epistasis
Examples: Synthetic Landscapes
Functions of unitation
NK landscapes
Royal Roads and Staircases
Long paths
Examples: Biological Landscapes
Complications
Sex
Linkage Equilibrium
Fisher / Muller Hypothesis
Muller’s Ratchet
Crossover in EAs
Focussing Variation
Schemata and Building Blocks
Schemata Theorem
Building Block Hypothesis
Two-Module Model
Coevolution
Pairwise vs Diffuse
Sasaki & Godfray Model
Coevolution in GAs
Why use Coevolution
Problems with Coevolution
Cooperative Coevolution
What is Cooperative Coevolution
Competitive vs Cooperative
Divide and Conquer
Modularity
Types of modularity
Concepts of modularity
Examples
Simon’s “nearly-decomposable” systems
Modularity in EC
Hierarchical IFF as a Dynamic Landscape
Representations
Graphs (TSP)
Structured Representations
Variable-size Parameter Sets
Trees
Grammar Based Representations
Cellular Encoding
Turtle Graphics
Advanced Representations
Artificial Life
Concept, Motivation, Definitions
Examples
Karl Sims Creatures
Framsticks
Darwin Pond
Cellular Automata Based
Computer Program Based
Arrow of Complexity
Information Entropy
Kolmogorov Complexity
Structural Complexity
McShea’s Passive Diffusion Model
Hierarchical Structure
Natural Induction (not examinable)
Compositional Evolution
Gradual vs Compositional
Complex Systems
HIFF
Random Genetic Map and SEAM
Recorded Lectures
-
At the time of writing this, it’s tricky to find the
recorded lectures for a given topic.
-
Therefore, I compiled this table.
Week num (date) | Monday 9AM | Monday 11AM | Tuesday 1PM
1 (1/2)
|
Richard had really bad connection
problems.
|
Richard forgot to record.
|
Evolutionary Algorithms part 1
-
Assignment 1
-
Hillclimber
-
Four requirements for natural
selection
-
Evolutionary Algorithms (EAs)
-
Pseudocode of EA
-
Components of EA
-
Representation
-
Fitness functions
-
Selection (roulette wheel)
|
2 (8/2)
|
Evolutionary Algorithms part 2
-
Boltzmann
-
Stochastic Universal Sampling
-
Tournament
-
Truncation
-
Rank-based
-
Steady state vs generational
|
Evolutionary Algorithms part 3
-
EA schools
-
Parameters of EAs
-
Assignment 1
|
Evolution History part 1
-
Levels of life
-
Carolus Linnaeus’ “Systema
Naturae” 1735
-
Early Evolutionary Theory
-
Lamarck
-
Pre-Darwinian Views
-
Voyage of the Beagle
-
Natural Selection
-
Darwin’s “On the Origins of
Species”
-
Trouble with Inheritance
-
Post-Darwin
|
3 (15/2)
|
Evolution History part 2
-
Questions on evolution history
-
The Modern Synthesis
-
Neo-Darwinism
-
Fisher vs Wright
-
Contention over
‘speciation’
-
Punctuated Evolution
|
Evolution History part 3 and Fitness
Landscapes part 1
-
Contention over ‘level of
selection’
-
Growing knowledge
|
Fitness Landscapes part 2
-
Definition
-
Wright’s genotype sequence
space
-
Wright’s fitness landscape
-
Properties of landscapes
-
Epistasis
-
Synthetic landscapes (unitation)
|
4 (22/2)
|
Fitness Landscapes part 3
-
Epistasis networks
-
NK-landscapes
-
Royal Road / Staircase
-
Path lengths
-
Probing high-dimensional landscapes
-
Fitness distance scatter plots
|
Fitness Landscapes part 4
-
Biological examples
-
Visualisations
-
Protein folding
-
“And then it gets
complicated...”
-
Individuals vs Populations
-
Neighbourhoods and variations
-
Designing a landscape
|
Assignment 1 solution
-
Richard’s results of Assignment
1
-
GAs vs hillclimbers
|
5 (1/3)
|
|
Sex part 1
-
Two-fold cost of sex
-
Sexual reproduction
-
Sexual recombination
-
Crossover
-
Genetic linkage
-
Linkage (dis)equilibrium
|
Assignment 2
-
Assignment 2 explanation
-
Reading through papers
|
6 (8/3)
|
Sex part 2
-
Fisher/Muller hypothesis
-
Muller’s Ratchet
|
Sex part 3 and Crossover in EAs part 1
-
Doesn’t go over deterministic
mutation hypothesis
-
Sex from POV of an EC
-
Natural Selection and Variation study and
review
-
“Focussing search”
-
Hurdles problem
-
Schemata and building blocks
|
Crossover in EAs part 2
-
Schemata and building blocks
-
GA vs hillclimber on RRs
-
Two-module model
-
Richard’s weird PDF simulation
|
7 (15/3)
|
Crossover in EAs part 3 and Coevolution
part 1
-
Analysis with + without recombination
-
Analysis mutation-only vs crossover
-
Definition of coevolution
-
Normal vs reciprocal
-
Pairwise and diffuse coevolution
-
Sasaki & Godfray model
-
Coevolutionary algorithm basics
-
Why use them
|
Coevolution part 2
-
Why use them
-
Coevolution examples
-
Funny YouTube video of evolved 3D
creatures
-
Subjective vs objective fitness
-
Types of coevolutionary failure
-
Cool Java simulation animation
|
Cooperative Coevolution
-
Cooperative coevolution
-
Shared domain model
-
Divide and conquer
-
Talking about papers from Assignment
2
|
🐰 EASTER 🥚
|
-
Types of modularity
-
Concepts of modularity
-
Simon’s
“nearly-decomposable” systems +
examples
-
Modularity in EC
-
Hierarchical IFF
-
Genotype → phenotype →
fitness
-
Types of representation
-
Structured genotypes
-
Tree encodings
-
Grammar-based encodings
-
Turning strings into structure
-
Strong / Weak
-
Wet / Dry
-
Sims creatures, Framsticks
-
Darwin Pond
-
Cellular automata based
-
Computer program based
-
Information Entropy
-
Kolmogorov complexity
-
Structural complexity (object, process,
(non)hierarchical)
-
McShea’s passive diffusion
model
-
Hierarchical structure
-
Major transitions in evolution
-
(Not examinable; tbh I barely understand
this lecture)
-
Motivation
-
What is Natural Induction
-
Examples
-
Gradual vs Compositional
-
Complex systems
-
HIFF
-
Genetic maps
-
SEAM
|
Introduction
-
Hi there! 👋
-
Welcome to the Evolution of Complexity Notes.
-
In case you didn’t realise, I’m human!
-
I am the result of hundreds of millions of years of
evolution and natural selection.
-
If you’re not a screen-reader, then you are,
too.
-
Sure, you might look like your parents, but if we go really
far back, we’ll find some things that look nothing
like you.
-
Like really small protein compounds from the primordial
soup.
-
But enough about soup. Let’s compare you with a
calculator.
A calculator (one that you’re allowed to use in exams) vs. you, probably
-
Hmm... yes... that’s definitely a calculator, and
that’s definitely a person.
-
What’s the difference?
-
One is engineered; designed by us, and (if engineered properly) composed of modular components, with low coupling and
high cohesion.
-
The other is living; designed by nature, with loads of
tightly-packed organs that are very highly coupled.
-
What I’m trying to say here is that evolved things
are very complex.
-
We want to make our own things that are this complex.
-
Hence, we have evolutionary algorithms!
-
With it, we can make things like this neural net that plays snake.
-
Evolution by natural selection is the best explanation we
have for the complexity of living things.
-
But remember, that’s just a theory. A ga- uh, a scientific theory.
-
It’s not perfect, and it’s your duty as a
scientist to find holes in it....
-
... or, just do what the lecturers tell you and get decent
marks.
Evolutionary Algorithms
-
Do you remember stochastic local search?
-
If not, here’s a refresher on hill-climbing:
-
Start with a random solution
-
Make a small change to it
-
If that change was better, keep it. If not, discard.
-
Repeat until we reach a solution we like
-
It’s an algorithm used to search a solution space and
find a local maximum.
A computer scientist climbing a solution space
to its maximum using a stochastic local search algorithm.
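Here’s that refresher as runnable Python (a minimal sketch: the 20-bit genome, counting-1s fitness, and step count are illustrative choices, not from the lectures):

```python
import random

def hillclimb(fitness, length=20, steps=2000):
    """Stochastic hill-climber: start random, flip one bit at a time,
    and keep the change only if fitness doesn't get worse."""
    sol = [random.randint(0, 1) for _ in range(length)]   # random solution
    for _ in range(steps):
        neighbour = sol[:]
        neighbour[random.randrange(length)] ^= 1          # small change
        if fitness(neighbour) >= fitness(sol):            # better (or equal)?
            sol = neighbour                               # keep it
    return sol

best = hillclimb(sum)   # with counting-1s fitness, this climbs towards all 1s
```

With counting 1s there are no local optima, so this reaches the maximum; on rugged landscapes it gets stuck on whatever hill it started on.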
-
This is kind of like evolution, right?
-
There’s generations, and mutations, and stuff like
that.
-
Evolutionary algorithms (EA) are like hill-climbers, but with a population.
-
Some individuals in the population might be luckier than
others.
-
We need to concentrate on those, so their children can
survive, and “discard” the others.
-
It sounds cruel, but that’s nature.
-
There are four requirements for evolution by natural selection:
- Reproduction
-
Individuals can replicate themselves.
-
Copy(A) → B
- Heritability
-
Offspring is similar to their parents.
-
B ≈ A
- Variation
-
There are differences, or “mutations”, in
children.
-
Vary(Copy(A)) → B1, B2, B3
-
B1 ≠ B2 ≠ B3
-
Change in fitness
-
The variations cause our fitness to change.
-
Vary(Copy(A)) → B
-
Fitness(A) ≠ Fitness(B)
-
Here is the outline of an evolutionary algorithm (EA):
-
Initialise all individuals of a population
-
Until (satisfied or no further improvement)
-
Evaluate all individuals:
For i = 1 to P; Quality(Pop[i]) → Fitness[i]; End
-
Reproduce with variation:
For i = 1 to P;
  Pick individual j with probability proportional to fitness
  Pop2[i] = Pop[j];
  Pop2[i] = mutate(Pop2[i]);
End
Pop2 → Pop;
-
Output best individual
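The outline above, translated into a minimal Python sketch (the population size, mutation rate, and counting-1s fitness are illustrative assumptions):

```python
import random

def evolve(fitness, P=30, length=20, generations=100, mut_rate=0.05):
    """A direct translation of the EA outline above."""
    # Initialise all individuals of a population
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(P)]
    for _ in range(generations):             # until satisfied / out of budget
        # Evaluate all individuals
        fit = [fitness(ind) for ind in pop]
        pop2 = []
        for _ in range(P):
            # Pick individual j with probability proportional to fitness
            j = random.choices(range(P), weights=fit)[0]
            # Reproduce with variation (mutation only, for now)
            child = [1 - g if random.random() < mut_rate else g for g in pop[j]]
            pop2.append(child)
        pop = pop2                            # Pop2 -> Pop
    # Output best individual
    return max(pop, key=fitness)
```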
-
The necessary components of an EA are:
-
Representation of individuals
-
Fitness function
- Selection
-
Variation operators
-
Let’s go through each one.
Representation of Individuals
-
As a computer scientist, you know that computers store
information as data.
-
That means, to represent our individuals, we need to encode
them into data.
-
There are many ways to do this:
Problem
|
Individual encoding
|
A function that takes 20 binary
arguments.
Maximise the function.
|
A 20-bit binary number
e.g. 00101110101000010111
|
A graph theory problem, like TSP
|
A string of node names, making a path
e.g. BHGADCEF
|
Where to place x number of items
|
A set of (x,y) coordinates
e.g. (12,34), (56,78), (90,12)
|
-
As you can see, there’s loads of ways to encode
individuals as data.
-
If we can encode an individual into data, and we can copy data...
-
... that means we can copy an individual.
-
That covers our reproduction requirement of natural selection!
Fitness Function
-
In evolution, each individual has fitness.
-
That means we need a fitness function that takes an
individual and returns a fitness.
-
Our fitness function needs to directly correlate to our
“goal”, or what we want to achieve out of our
algorithm.
-
That way, when we maximise our fitness, we’re closer
to achieving our goal.
- For example:
🤔 Problem
|
🏁 Goal
|
💪 Fitness function
|
A function that takes 20 binary
arguments.
Maximise the function.
|
We want to find a set of arguments that
maximises our function.
|
The value outputted by the function.
|
Travelling Salesperson Problem (TSP)
|
Travel all nodes in the shortest length
|
Length of the tour
(we would have to minimise our fitness function
in this case, e.g. make it a cost
function)
|
Position of drilling sites on an oil
field
|
We want to extract as much oil as we can.
|
Oil yield from a simulation of the oil
field.
|
An individual with a high fitness.
Selection
-
How do we select what individuals to, uh...
“mate” and pass on to the next generation?
-
There’s a variety of ways to pick.
-
One of these ways is using fitness proportionate selection (FPS).
-
The slides compare it to a roulette wheel, but it’s
basically individuals with a higher fitness have a higher
chance of being selected.
-
For example, if we had a population of 2, and #1 had a
fitness of 6, and #2 had a fitness of 4, then we’d
have a 60% chance of selecting #1 and a 40% chance of
selecting #2.
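A quick sketch of the roulette wheel in Python, reusing the fitness-6-vs-4 example (the 10,000-spin check is just for illustration):

```python
import random

def roulette_select(population, fitnesses):
    """Fitness proportionate selection: each individual's chance of being
    picked is its share of the total fitness (one spin of the wheel)."""
    return random.choices(population, weights=fitnesses, k=1)[0]

# The population-of-2 example from above: fitnesses 6 and 4
counts = {"#1": 0, "#2": 0}
for _ in range(10_000):
    counts[roulette_select(["#1", "#2"], [6, 4])] += 1
# counts["#1"] should come out near 6,000 (a 60% chance per spin)
```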
-
Another way is Boltzmann Selection: a more refined version of the fitness proportionate
selection above.
-
The formula goes like this, where P(i) is the probability of picking individual i:
P(i) = e^(s·f(i)/σ) / Σ_j e^(s·f(j)/σ)
-
f(i) - the fitness function of individual i
-
σ - standard deviation in distribution of fitness
-
s - controls selection strength
-
s = 0 means weak (neutral) selection; all individuals have an equal chance
-
s → ∞ means strong (choose the fittest) selection; only the highest fitness will be picked
-
Yet another way is Stochastic Universal Sampling (SUS).
-
Where FPS chooses several solutions from the population by
repeated random sampling, SUS uses a single random value to
sample all of the solutions by choosing them at evenly spaced intervals (source).
-
Comparing it to the roulette wheel, FPS spins the wheel
multiple times to get samples.
-
However, SUS has multiple, evenly-spaced pointers on the
wheel, and the wheel is spun once.
-
What’s good about this is that weaker individuals
have a chance to reproduce too, and the fittest members
won’t saturate the candidate space.
-
In other words, there’s less variation in the number
of offspring any one individual has.
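A sketch of SUS (the population and fitnesses are the earlier 6-vs-4 example; the exact loop shape is an illustrative implementation):

```python
import random

def sus_select(population, fitnesses, n):
    """Stochastic Universal Sampling: spin the wheel once, then read off
    n samples from evenly spaced pointers."""
    total = sum(fitnesses)
    step = total / n                         # spacing between pointers
    start = random.uniform(0, step)          # the single spin
    chosen, acc, i = [], 0.0, 0
    for k in range(n):
        pointer = start + k * step
        while acc + fitnesses[i] < pointer:  # advance the wheel to this pointer
            acc += fitnesses[i]
            i += 1
        chosen.append(population[i])
    return chosen

# With fitnesses 6 and 4 and 10 pointers, #1 fills 6 slots and #2 fills 4:
sample = sus_select(["#1", "#2"], [6, 4], 10)
```

Notice there is no spread in the outcome: the evenly spaced pointers pin each individual’s offspring count to (almost exactly) its expected value.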
-
Huh? You want even more selection operators?
-
Fine. Here’s three more:
-
Tournament selection: pick k individuals, and choose the individual with the highest fitness
-
Truncation selection: pick any individual in the top half of the population
-
Rank-based selection: assign a rank to each individual based on fitness (e.g. worst fitness has rank 1), then pick an individual with probability proportional to rank (or 1/rank, if that’s what you need).
-
How is this different from FPS?
-
In rank-based selection, only the ordinal assignment of fitness matters, not cardinal (it only matters that fitness is greater; not how much
by).
-
For example, say i1 has a fitness of 1, and i2 has a fitness of 2.
-
i1 would have rank 1, and i2 would have rank 2.
-
FPS and rank-based selection wouldn’t be any
different.
-
Now, what would happen if i2 had a fitness of 2,000?
-
FPS would pick i2 waaaay more times over i1!
-
However, their ranks are still equal, so the probabilities
of rank-based selection wouldn’t change.
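Minimal sketches of those three operators (the tournament size k and the top-half cutoff are common conventions, not prescribed by the notes):

```python
import random

def tournament(pop, fit, k=3):
    """Pick k individuals at random; the one with the highest fitness wins."""
    contenders = random.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fit[i])]

def truncation(pop, fit):
    """Pick uniformly from the top half of the population by fitness."""
    ranked = sorted(range(len(pop)), key=lambda i: fit[i], reverse=True)
    return pop[random.choice(ranked[: len(pop) // 2])]

def rank_based(pop, fit):
    """Pick with probability proportional to rank (worst fitness has rank 1).
    Only the ordering matters: fitnesses (1, 2) and (1, 2000) behave the same."""
    worst_first = sorted(range(len(pop)), key=lambda i: fit[i])
    ranks = list(range(1, len(pop) + 1))     # rank 1 = worst
    return pop[random.choices(worst_first, weights=ranks)[0]]
```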
Variation Operators
-
There are two main variation operators. One of which is mutation.
-
With mutation, we slightly change the properties of an
individual.
-
How much we change by is completely random. Some might
change a lot, and some might not change at all.
-
For example, we could mutate the sentence “methinks
it is like a weasel” by going through each character and
changing it to a different one with probability
1/28 (28 being the number of characters in the sentence).
-
Therefore, on average, we only change 1 letter per
mutation, but it’s still possible for some sentences
to change a lot and some to not change at all.
-
How volatile our mutations can be (mutation rate) is a set variable for our evolutionary algorithm that
we can change. We can tweak it to maximise the performance
of our algorithm.
Polydactyly; a rare mutation in humans
that gives us more than 5 fingers / toes.
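A sketch of that per-character mutation (the lowercase-plus-space alphabet is an assumption):

```python
import random

LETTERS = "abcdefghijklmnopqrstuvwxyz "

def mutate(sentence, rate=None):
    """Mutate each character independently with probability 1/len(sentence),
    so on average one character changes, but zero or several are possible."""
    rate = 1 / len(sentence) if rate is None else rate
    return "".join(
        random.choice(LETTERS) if random.random() < rate else c
        for c in sentence
    )

child = mutate("methinks it is like a weasel")   # 28 characters, so rate 1/28
```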
-
The other kind of variation is crossover, or recombination.
-
This refers to using information from two or more
individuals to make a brand new individual.
-
There’s three versions of this:
-
One-point crossover: randomly select one point on the string. Everything above the point is from parent 1, and everything below the point is from parent 2.
-
Two-point crossover: randomly select two points on the string. Everything to the left of the first point and to the right of the second point is from parent 1, and everything between the first and second points is from parent 2.
-
Uniform crossover: randomly select genes from parent 1 and 2 with even probabilities for each.
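Sketches of the three versions (commonly called one-point, two-point, and uniform crossover) on list genomes:

```python
import random

def one_point(p1, p2):
    """One cut: parent 1 supplies everything before the point, parent 2 the rest."""
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def two_point(p1, p2):
    """Two cuts: parent 1 supplies the ends, parent 2 the middle segment."""
    a, b = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:a] + p2[a:b] + p1[b:]

def uniform(p1, p2):
    """Each gene comes from either parent with equal probability."""
    return [random.choice(pair) for pair in zip(p1, p2)]

child = one_point([0] * 8, [1] * 8)   # e.g. a run of 0s followed by a run of 1s
```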
-
Here is the outline of our evolutionary algorithm 2.0, with crossover:
-
Initialise all individuals of a population
-
Until (satisfied or no further improvement)
-
Evaluate all individuals:
For i = 1 to P; Quality(Pop[i]) → Fitness[i]; End
-
Reproduce with variation:
For i = 1 to P;
  Pick individual j with probability proportional to quality
  Pick individual k with probability proportional to quality
  Pop2[i] = crossover(Pop[j], Pop[k]);
  Pop2[i] = mutate(Pop2[i]);
End
Pop2 → Pop;
-
Output best individual
Variations on a theme
-
There are two main kinds of genetic algorithm:
-
Steady state: offspring replace one member of the population each time they are generated
-
Pick parents for reproduction
-
Generate offspring with variation
-
Put offspring back into population
-
Generational: for each generation, P offspring are generated for the next population
-
Pick parents for reproduction
-
Generate offspring with variation
-
Put offspring into NEW population
-
Current population = new population
-
For some weird reason, academics like to argue about
evolutionary algorithms.
-
So much so that they split into factions, or
“evolutionary algorithm schools”.
-
Here’s a range of evolutionary algorithm
schools:
-
GA: genetic algorithms
-
Binary representation
-
Crossover is important
-
ES: evolutionary strategies
-
Real-valued representations
-
No crossover
-
Evolved mutation rate
-
GP: genetic programming
-
For evolving programs
-
Specifically, LISP s-expressions
-
EP: evolutionary programming
-
Finite state automata
-
Mutation only
-
According to Richard, the schools GA, ES, and EP came
together to form their own conference, but the one in charge
of GP said no and continued doing their own thing.
-
Setting the parameters of an evolutionary algorithm is
something of a black art.
-
There’s:
-
Population size
-
If it’s too big, you waste computation
-
If it’s too small, there’s no useful
diversity
-
Mutation rate / crossover rate
-
If it’s too low, there’s not enough
innovation
-
If it’s too high, there’s too much
destruction
-
Choosing a good fitness function
-
Selection pressure
-
Fitness scaling via: Boltzmann selection, rank-based,
tournament size, truncation fraction
-
Diversity maintenance methods
Evolution History
-
Nowadays, evolution and natural selection are pretty
well-known and generally accepted.
-
But how did we get here? This section covers the history of
evolution up to this point.
Pre-Darwinian Evolution
1735
-
Carolus Linnaeus published Systema Naturae.
-
It was a book that described all living things that Carolus
had studied, and put them into categories, like animals and
plants.
1758
-
Carolus Linnaeus published the 10th edition of Systema Naturae.
-
It set up the formal classifications that are used (with few changes) today.
-
With that, systematic biology took off (finding living things and figuring out what kind of a
thing it is in Linnaeus’ book).
1785
-
James Hutton published Theory of the Earth, or an Investigation of the Laws
Observable in the Composition, Dissolution and Restoration
of Land upon the Globe.
Around 1800
-
Jean-Baptiste Lamarck put forward the theory of inheritance of acquired characteristics, or “Lamarckism”.
-
He did not see all life sharing a common ancestral
tree.
-
An early proponent of evolution was Charles Darwin’s
grandfather, Erasmus Darwin.
-
The two main problems back then were:
- Religion
-
How can evolution create such complex organisms?
-
“Natural theology”, advocated by the likes of
William Paley, used complexity of life as evidence for
God.
1830
-
Charles Lyell published Principles of Geology.
-
The Biblical view of creation is starting to turn
over.
-
The world is far older than traditionally believed.
Natural Selection
1831
-
Darwin sailed out on The Beagle (a boat).
-
He published The Voyage of the Beagle.
-
It raised questions which led to the theory of natural
selection.
-
When he came back, he had examined a collection of
finches.
-
He thought they were all one species, but they proved to be
many species.
-
The simplest explanation was that a single migrant had adapted to the habitat on different islands.
-
He came up with a mechanistic explanation: natural selection:
“Natural variations in populations that provide
small
selective advantages will take over the
population.”
-
It was strongly influenced by Thomas Malthus’ Essay on the Principle of Population (1798).
-
Darwin then spent the next 20 years collecting
evidence.
-
During this, he published definitive work on cirripedes (marine invertebrates including barnacles).
-
Another naturalist, Alfred Wallace, who also thought
Malthus was cool, arrived at the same theory of natural
selection.
-
He was also inspired by Darwin’s The Voyage of the Beagle. He had explored South America and Malaya.
-
After the last trip, he sent his ideas to his mentor,
Charles Darwin.
-
Wallace agreed to publish jointly with Darwin.
1859
-
Darwin published On the Origin of Species.
-
Everyone accepted the theory of evolution because:
-
Darwin was popular
-
He had 20 years worth of evidence
-
He had a lot of arguments, including:
-
power of artificial selection by human breeders
-
population of isolated islands by chance migrations
-
relatedness of species in a tree like structure
-
He also addressed problems with evolution:
-
missing fossil record
-
difficulty of evolving complex structures (e.g. the eye)
-
People believed evolution, but not natural selection.
-
They didn’t think it could make complex
structures.
-
Darwin wrote six editions of the Origin of Species, with the final editions accepting possibilities of mechanisms
besides natural selection.
-
Darwin could never explain inheritance.
-
His problem was:
-
Under crossover, any variation should be averaged
out.
-
How can there be enough variation for natural selection to
work?
-
Gregor Mendel had the answer with his Experiments in Plant Hybridization (1865) (the theory of discrete genetics), but it was neglected until 1900.
|
Darwin’s Problem
Let’s say we have a population of tall and short people.
When a tall person and a short person have a kid, under reproductive crossover, that kid will be medium height.
After a while, the population of tall and short people will average out to medium people.
With that, how will there be enough variation in the population to achieve natural selection?
(Of course, this isn’t how genetics works, but they didn’t know that.)
|
-
Everyone was stuck arguing about natural selection for the
next 70-100 years.
1882
-
After Darwin died, advocates for natural selection shut up
for a bit.
-
Lamarck’s theory of “acquired
characteristics” was revived.
-
They came up with another theory:
-
Large mutations called “saltations” produce
dramatic changes to organism structure, which explains
speciation.
-
This contrasted the gradual, small changes of conventional
Darwinism.
-
When Mendelian genetics was rediscovered in 1900, it was
seen as a blow to natural selection.
-
Basically, natural selection was on the ropes.
-
It took three young Turks to turn things around...
The three young Turks who turned things around.
-
Ronald A Fisher
-
Sewall Wright
-
John B S Haldane
-
They read the Origin of Species and ignored the establishment thinking.
-
They said “wait... what if:
-
Natural selection occurs
-
and genetic inheritance occurs”
-
And thus, the modern synthesis was born.
The Modern Synthesis
A cool picture from Wikipedia illustrating modern
synthesis
1930
-
Fisher wrote the book The Genetical Theory of Natural Selection.
-
It didn’t catch on until around 17 years later.
-
Haldane helped push these ideas into mainstream
biology.
1960
-
Natural selection and the modern synthesis, often called neo-Darwinism, triumphed.
-
But people still argued.
-
Fisher and Wright argued.
-
Wright said inbreeding is important, and small populations
were necessary to explore complex fitness landscapes.
-
People also argued about what causes speciation.
-
Saltationists put this down to large mutations.
-
Geographical isolation was considered one of the major
causes of speciation.
-
Ernst Mayr says that speciation by isolation is one of the
driving forces of invention by evolution (and says that small populations in peripheral locations
are more likely to be sources of innovation).
1972
-
Eldredge and Gould proposed the theory of punctuated evolution.
-
This is evolution that consists of periods of rapid
evolution in speciation events, followed by long periods of
stasis.
-
It explains rapid changes found in the fossil record.
-
People also argue about what level selection operates
at.
-
Richard Dawkins and his fanbase say it’s only at the
gene level.
-
But it may be more than that... who knows?
Fitness Landscapes
Definitions
-
Genome / Chromosome: A vector / string of genes
-
e.g. a bit string
-
“001111”
-
Gene: one variable in a genome
-
Allele: The value of a gene
-
‘tall’, ‘blue’, 0, 1, A, a
-
Locus: The position of a gene in a genome
-
The locus of ‘e’ is further from
‘a’ in “abcde”
-
Genotype: A type of genome
-
The difference between a genome and a genotype is that
genomes refer to each individual string of genes, and
genotypes refer to what the genome represents (the
semantic).
-
For example, in the population:
-
0001111
-
0001111
- 1001000
-
1100011
-
There are 4 genomes, but 3 genotypes.
-
A fitness landscape consists of:
-
A set of genotypes
-
A neighbourhood
-
We arrange genotypes in a space such that:
-
Similar genotypes are close
-
Different genotypes are far
-
e.g. neighbouring genotypes differ by a single allele
-
Height of surface in this space = Fitness of each
genotype
-
By rendering a fitness landscape in a graph, we can
visualise how changes in the genome affect fitness.
-
Here’s an example of a fitness landscape:
-
As you can see, some genotypes do not yield good fitness,
and some yield really good fitness.
-
There are also hills that a population can climb to achieve
better fitness.
-
Wright’s genotype sequence space is a representational DAG (directed acyclic graph) of all the different genotype sequences, and how they
relate to each other.
-
Here’s a few examples:
-
These graphs go from left to right, and map out every
possible derivable genome.
-
Each node represents a possible derived genome.
-
Each edge adds one new gene at a time.
-
The distance between two nodes represents the number of
genetic changes needed to get from the source genome to the
destination genome.
-
The number of genes determines the number of dimensions a
genotype sequence space has.
-
Alright, these Crystal Maze looking things are cool, but
with lots of genes, there’s going to be lots of
dimensions.
-
It’ll become really hard to visualise with lots of
dimensions.
An IRL genotype sequence space, with people inside it.
-
Wright’s fitness landscape is like a normal fitness landscape, but represented
with contours instead of a fancy 3D graph:
-
The contours represent fitness elevation in a
two-dimensional fitness landscape.
-
The +’s represent peaks in fitness, and the -’s
represent valleys in fitness.
Intuition
-
We said before that evolutionary algorithms are like
hillclimbers, but with a population.
-
But how does that actually change anything?
-
Let’s have an example:
-
With hillclimbing, we have a single individual and we
“climb up the hill” we are currently sitting on,
in the fitness landscape.
-
But look! Just to the right of us, there’s an even
higher hill.
-
Because hillclimbing only cares about the hill we’re
on, we’ll miss the chance to get an even better
maximum.
-
Because evolutionary algorithms have a population, and in
that population we have diversity, we can go up different
hills.
-
The higher hill has greater fitness than the lower hill, so
the population going up the higher hill will eventually be
favoured.
-
That doesn’t mean we’ll always get the best
maximum though: at the start, the subset that is going up
the taller hill could die out faster than the subset going
up the smaller hill, giving us the smaller maximum
again.
Properties
-
Fitness landscapes have several properties:
-
Ruggedness / correlation: how similar neighbouring points are in terms of fitness
-
In other words, how “rough” or “spiky” the fitness
landscape surface looks
-
Local optima, global optima
-
The global optimum is the very highest fitness peak in the
landscape.
-
Local optima are peaks, but not the highest peaks.
-
Basin of attraction: how likely is it that I arrive at a particular optimum?
-
In other words, how likely, if I started from somewhere on
the space, would I end up at this optimum?
-
If the peak has a greater base width, then it’s more
likely that a random starting point would land on that peak.
-
So it also refers to the base width of the peak at the
bottom.
-
Neutrality and redundant mappings
-
Is the gradient zero?
-
In other words, is the fitness landscape elevation
completely flat?
-
If so, then all the surrounding genotypes map to the same
fitness value.
-
This is also called a neutral plateau.
-
Paths of (monotonically) increasing fitness
-
Do I need to go down in fitness and cross a ‘fitness
valley’?
-
One of the biggest problems of all stochastic local search
methods is going from one peak to another (typically a higher peak).
-
In this two-peak fitness landscape, if we’re stuck at
the local optimum, how can we reach the global
optimum?
-
This is a problem with natural evolution in biology,
too.
Epistasis
-
Gross... that’s not what I meant!
-
Let’s look at the next definition:
-
Uh... how about I just tell you instead?
-
Epistasis: the fitness effect of an allele depending on the genetic
context of that allele.
-
It creates ruggedness in a fitness landscape.
- Huh?
-
In other words, the change in fitness due to substituting
one allele for another at one locus, is sensitive to the
alleles that are present at other loci.
-
You can think of epistasis as the property of genes having
“dependencies” on each other.
- What?
-
Let me just show you an example instead...
Examples: Synthetic Landscapes
-
There are two kinds of landscapes:
-
Artificial landscapes we design ourselves, for
demonstration and testing purposes
-
Fitness landscapes based on real-world data
-
First, we’re going to look at some synthetic
landscapes.
-
There’s many different kinds...
Functions of unitation
-
The name means ‘from many dimensions to
one’.
-
One of these functions involves counting the number of
1’s in a bitstring:
-
This is very similar to our “methinks it is like a
weasel” coursework.
-
You can also tweak this slightly to make a “trap
function”:
-
By having this fitness valley, it’s a great landscape
to test EAs on.
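Both can be sketched in a few lines (this trap form, where all-1s is optimal but every other genotype is rewarded for having fewer 1s, is one common variant, not necessarily the lecture’s exact function):

```python
def onemax(bits):
    """Function of unitation: fitness depends only on the number of 1s."""
    return sum(bits)

def trap(bits):
    """A trap function: all-1s is the global optimum, but every other
    genotype is rewarded for having FEWER 1s, so hillclimbing is led
    away from the optimum across a fitness valley."""
    u, n = sum(bits), len(bits)
    return n if u == n else n - 1 - u

# For n = 5, fitness by number of 1s (u = 0..5) is [4, 3, 2, 1, 0, 5]:
values = [trap([1] * u + [0] * (5 - u)) for u in range(6)]
```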
NK landscapes
-
Before we go over NK landscapes, we need to cover epistasis
networks.
-
An epistasis network is a graph that represents the epistasis in a
genome.
-
The nodes represent the problem variables (or genes).
-
The edges represent dependencies / epistasis between
variables.
-
An NK landscape is a fitness landscape with epistasis, with
two variables controlling the level of epistasis:
-
N - number of variables
-
K - number of dependencies between variables
-
In other words, each variable has a fitness contribution
that is a function of its own state and the states of K other
variables.
Example: NK landscapes are Kim Jong-un’s favourite
fitness landscape
|
-
We have this genome (N = 8), and it’s got epistasis, where:
Locus:  1 2 3 4 5 6 7 8
Allele: 0 1 1 1 0 0 1 1
-
So that means each gene also depends on
three other genes for their fitness (K = 3).
-
The dependencies between genes are
represented by colour.
-
As a graph, that looks like this:
|
-
When K = 0, there are no dependencies.
-
The landscape looks like one big mountain, like Mt.
Fuji.
-
As K increases, the landscape gets more rugged; more peaks
in a smaller space.
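A sketch of how an NK landscape can be generated (random dependency choices and random lookup tables are the usual construction; the details here, like the fixed seed and lazy tables, are illustrative):

```python
import random

def make_nk(n, k, seed=0):
    """Build a random NK landscape: gene i's fitness contribution is a
    random function of its own allele and the alleles of k other genes."""
    rng = random.Random(seed)
    # for each gene, pick the k other genes it depends on
    deps = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    tables = [{} for _ in range(n)]          # lazily filled lookup tables

    def fitness(genome):
        total = 0.0
        for i in range(n):
            key = (genome[i],) + tuple(genome[j] for j in deps[i])
            if key not in tables[i]:
                tables[i][key] = rng.random()   # random contribution in [0, 1)
            total += tables[i][key]
        return total / n                      # mean contribution
    return fitness

f = make_nk(n=8, k=3)   # like the 8-gene, K = 3 example above
```

With k = 0 each gene contributes independently (the Mt. Fuji case); raising k entangles the contributions and the landscape gets rugged.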
Royal Roads and Staircases
-
Royal Roads split the variables into sub-systems / groups /
blocks, so the fitness only goes up if a whole sub-system is
completed.
-
You can think of royal roads as special kinds of NK
landscapes, where the dependencies are in adjacent
groups.
-
Here’s an example with some genomes, using
counting 1’s:
Bit string | Fitness
1111 0000 0000 0000 | 8
0000 1111 0000 0000 | 8
0000 0000 1111 0000 | 8
0000 0000 0000 1111 | 8
1111 1111 0000 0000 | 16
0000 0000 1111 1111 | 16
1111 0000 1111 1111 | 24
1111 1111 1111 1111 | 32
-
The epistasis networks of royal roads have clear
separations between the groups:
-
With royal roads, you can solve any of the groups in any
order.
-
But with royal staircases, you must solve them in a certain
order.
-
A royal staircase is a special kind of royal road where a completed
block only contributes fitness if the blocks before it are
all completed also.
-
Here’s another example with counting 1’s:
Bit string | Fitness
1111 0000 0000 0000 | 8
1111 1111 0000 0000 | 16
1111 1111 1111 0000 | 24
1111 1111 1111 1111 | 32
0000 1111 0000 0000 | 0
0000 0000 1111 0000 | 0
1111 0000 1111 1111 | 8
1111 1111 0000 1111 | 16
-
Why are they called royal staircases?
-
It’s because their fitness landscapes look like
stairs.
-
Each fitness “step” is a neutral plateau.
-
Genotypes that take a population from one neutral plateau to the
next are called “portal genotypes”.
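-
The two fitness functions can be sketched straight from the tables above (a sketch; the block size and the 2-points-per-bit scoring are read off the 16-bit examples):

```python
def royal_road(bits, block=4):
    """Fitness = 2 * block points for every fully-completed block of ones."""
    blocks = [bits[i:i + block] for i in range(0, len(bits), block)]
    return sum(2 * block for blk in blocks if all(c == '1' for c in blk))

def royal_staircase(bits, block=4):
    """Like royal_road, but a block only scores while every earlier
    block is also complete."""
    total = 0
    for i in range(0, len(bits), block):
        if all(c == '1' for c in bits[i:i + block]):
            total += 2 * block
        else:
            break
    return total

print(royal_road("1111000011111111"))       # 24: three complete blocks
print(royal_staircase("1111000011111111"))  # 8: the staircase stops at block 2
```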
Long paths
-
An adaptive walk is the path a population takes from a point on the
fitness landscape up to a peak, where fitness is highest for
that hill.
-
What’s the longest an adaptive walk could be?
-
For a problem like counting 1’s, it would be N (the genome
length), because you’re linearly setting each gene to 1.
-
Is there a problem where the length of an adaptive walk is
greater than N?
-
Given only one gene changes per generation, we can
construct a fitness landscape whose adaptive walk length is
greater than N.
-
Here’s a small example where N = 5:
Bit string | Fitness
00000      | 1
10000      | 2
11000      | 3
11100      | 4
11110      | 5
11111      | 6
01111      | 7
00111      | 8
00011      | 9
10011      | 10
...        |
-
These are called “Long Path Problems”.
-
Usually, the path needed to get to the optimum is
exponentially long.
-
This is because lots of genes have to be changed, and then
reverted, and then changed again etc. to get to the
optimum.
-
In some situations, landscapes are just too
high-dimensional for us to visualise.
-
How do we “probe” some useful information in
these situations?
-
There’s two methods:
-
Adaptive walks from a peak
-
By measuring how long it takes to walk to the peak of a
hill, we can figure some things out.
-
For example, if the adaptive walk length is really small,
K is likely to be really high (a rugged landscape with nearby peaks).
-
Or, if it’s greater than N, it’s likely to be a long path problem.
-
Fitness distance correlation
-
Plotting fitness against Hamming distance in scatter plots
can help visualise the shape of the fitness landscape:
Examples: Biological Landscapes
-
A biological example of a fitness landscape would be beak
size:
-
Different beak sizes match different seeds, hence why there
are multiple peaks.
-
This graph is in trait space, which means the axes are traits of the individual, such
as bill depth.
-
But, if we think about this for a second, we realise things
aren’t quite so simple.
-
What actually affects these “fitness”
values?
-
There’s a whole sequence of things that derive fitness for individuals:
-
Nucleotide sequences (ACTGGGGACAC)
-
→ Amino acid sequence
-
→ Protein shape
-
→ Protein function
-
→ Trait value
-
→ Fitness contribution
-
That’s a lot of steps!
-
Evolution only really affects the nucleotide sequences, so
those are the things we should be graphing.
-
But that’s so complex, that we’ll be spending
the rest of our lives drawing one graph!
-
Plus, that fitness contribution is still epistatically
dependent on other proteins.
-
So, what? We cry and switch modules?
-
Not today! We create small windows on real
landscapes.
Complications
-
When we’re trying to reason with real-life evolution,
we can’t map out an entire fitness landscape.
-
Instead, we “peek” and look at small windows of
the real landscape.
- How?
-
There are different ways to do this.
-
One way is looking at populations of different gene
mutations:
-
... and then constructing accessible evolutionary paths
from that data:
-
Another way is a series of one-step peeks into a
high-dimensional space.
-
So, instead of mapping the whole landscape, we map points
with a radius (like a flashlight in a horror game):
-
We can even use protein folding.
-
These long protein strings ultimately determine
fitness.
-
We can “fold” these proteins, and each protein
folds in a different way.
-
They can fold well, or they can fold... not so well.
-
If the protein folds the way it should, that means that
protein has a low fold energy, and if it doesn’t fold
the way it’s supposed to, that protein has a high fold
energy.
-
We can map this fold energy to fitness.
-
We want proteins that fold properly, so a low fold energy
is high fitness, and a high fold energy is low
fitness.
-
It gets even more complicated...
-
There’s the whole Wright vs Fisher and the relevance
of epistasis.
- Things like:
-
Spaces and operators
-
Individuals vs Populations
-
Sexual populations vs Asexual populations
-
Noise and dynamic landscapes
- Coevolution
-
Designing landscapes
-
“If we’re on a hill anyway, we’re always
going to go uphill.”
-
“Other peaks and optima do not matter.”
-
“So epistasis doesn’t matter,
either.”
-
“Dumbass.”
-
“Epistasis and local optima are a big deal for
natural populations.”
-
“They’re biologically significant too,
dickhead.”
Fisher and Wright, hard at work.
-
How do populations go from one peak to another?
-
To do that, we need to partially ignore the local fitness
gradients.
-
This can happen in small populations (or multi-dimensional populations).
-
We can represent populations in a certain way.
-
We can represent it as a point in an allele frequency space: the set of all possible vectors
giving the frequency of each allele at each locus of the
genome.
-
Movements due to selection in sexual populations move to
adjacent points in this space... sometimes.
-
Doesn’t make sense? Not a problem, here’s an
example:
Example
|
Population (the blue rows; each genome is a bit string):
0 0 0 1 1 1 1
0 0 0 1 1 1 1
1 0 0 1 0 0 0
1 1 0 0 0 1 1
Allele frequency vector (the red row; at each column, the
value is the sum of the 1 values above):
2 1 0 3 2 3 3
-
So the first entry is ‘2’ because there are
two 1’s in the first column.
-
This vector is our representation of the
population in our allele frequency space.
|
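-
That red-row computation is just a column-wise sum; a one-liner sketch (data from the example above):

```python
def allele_frequency_vector(population):
    """Count of 1-alleles at each locus across the population."""
    return [sum(col) for col in zip(*population)]

pop = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 1],
]
print(allele_frequency_vector(pop))  # [2, 1, 0, 3, 2, 3, 3]
```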
-
You know, all this time we’ve been assuming that
distance = genetic differences (Hamming distance).
-
Like, the genotype AAB has a distance of 1 away from
ABB.
-
But what if that’s not strictly true?
-
For example, could you get from 000000000000000000 to
000000000111111111 in just one step?
-
With mutation, no.
-
With crossover...?
-
000000000000000000 +
-
111111111111111111 =
-
000000000111111111
-
So, we can get from that genome to that other genome in one
move!
-
If that’s so, then they should have a distance of 1,
right?
-
That means they should be neighbouring, right?
-
Terry Jones, back in 1995 when he was writing his PhD,
would say “yes”.
-
Richard would say “no”, though.
-
So as you can see, the representation of a genome is very
important.
-
It determines what movements are natural and what fitness
landscapes exist.
-
When you design EAs, you need to choose your encoding
wisely.
Sex
-
Just quickly, let’s go over the two-fold cost of sex, or the cost of males.
-
Let’s say individuals can produce 4 eggs per
generation.
-
Asexuals can produce 1 → 4 → 16 → 64 eggs in
4 generations
-
With sexuals, it’s halved, because half of
individuals are males who can’t produce eggs.
-
So sexuals can produce 1 → 2 → 4 → 8 in 4
generations. Far less!
-
In summary, males contribute genes but they don’t
increase egg production or population growth.
-
So, why do we have males?
-
It’s a bit more complicated than just maths.
-
Males also provide parental investment.
-
Plus, egg production in females might be increased to
compensate, so egg production is the same after all.
-
But why do humans sexually populate at all? Why don’t
we asexually reproduce?
-
Let’s find that out...
-
There’s two ways to make new individuals with
sex:
- Reproduction
-
Recombination
-
With reproduction, each parent is diploid: it carries two
copies of each chromosome.
-
Its sex cells (gametes) take one of the two copies, making
them haploid.
-
When they “have sex”, a haploid from
‘daddy’ and a haploid from ‘mummy’
join together to form a diploid again: a new
individual.
-
We’re not really making any new genes; we’re
shuffling them around.
-
With recombination, we make new genes for the children.
-
There’s two kinds of recombination:
-
Chromosomal reassortment: whole chromosomes are shuffled between
parents; the split points are fixed (the chromosome boundaries),
the same every time
-
Crossover: chromosomes are split and recombined at randomly selected
cutoff points
Linkage Equilibrium
-
When you have crossover, the chance that two genes are
still together after recombination depends on their physical linkage.
-
Physical Linkage: the tendency for alleles to travel together during
recombination due to physical proximity on the
chromosome.
-
In uniform recombination there is no physical linkage, because all genes are
chosen at random.
-
In one-point crossover, there is physical linkage, because adjacent genes on one
side of the cutoff point remain together.
-
When you do recombination, you can either join
alleles:
-
How do the frequencies of these combinations change?
-
Linkage Equilibrium: joint frequencies of alleles are product of marginal
frequencies
-
In other words, we say that there is no over (or under)
representation of combinations of alleles.
-
Using maths, it means f(AB) = f(A) * f(B)
-
f(A) means the frequency of ‘A’ genes in the
chromosomes
-
Linkage Disequilibrium is the deviation of the observed frequency of gene pairings
from the expected frequency.
-
Basically, it’s any deviation from linkage
equilibrium.
-
What does that really mean?
-
Let me use an example from Wikipedia.
Example: Two-loci and two-alleles
|
-
Let’s say you have a chromosome with
two loci (positions) and two possible
alleles at each locus, giving four haplotypes:
ab, aB, Ab, AB
-
Each haplotype has an observed frequency:
Haplotype | Frequency
ab        | f(ab)
aB        | f(aB)
Ab        | f(Ab)
AB        | f(AB)
-
Those are the frequencies of the haplotypes
(frequency of the combination of alleles
together).
-
The alleles also have individual (marginal)
frequencies, e.g. f(a) = f(ab) + f(aB).
-
If the loci and alleles are independent of
each other, then you could say that:
f(ab) = f(a) × f(b)
-
In other words, the frequency of haplotype
‘ab’ is just the frequency where
the first locus picks ‘a’ times the frequency
where the second locus picks ‘b’.
-
That’s linkage equilibrium.
-
However, linkage disequilibrium is where
the loci and alleles are not independent of
each other, and observed frequencies like
f(ab) don’t follow this pattern.
|
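-
The equilibrium condition f(AB) = f(A) × f(B) gives a direct measure of disequilibrium, usually written D. A sketch (the frequencies below are made-up illustrative numbers):

```python
def linkage_disequilibrium(freqs):
    """D = f(AB) - f(A) * f(B): how far the observed joint frequency
    of A-with-B deviates from the product of the marginal frequencies
    (D = 0 means linkage equilibrium)."""
    f_A = freqs['AB'] + freqs['Ab']  # marginal frequency of allele A
    f_B = freqs['AB'] + freqs['aB']  # marginal frequency of allele B
    return freqs['AB'] - f_A * f_B

# Illustrative haplotype frequencies at equilibrium:
# f(A) = 0.3, f(B) = 0.2, so f(AB) = 0.06 and D is (about) zero.
eq = {'AB': 0.06, 'Ab': 0.24, 'aB': 0.14, 'ab': 0.56}
print(linkage_disequilibrium(eq))  # ≈ 0.0
```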
-
Remember physical linkage?
-
Uniform crossover has no physical linkage because random
loci are picked for the child.
-
That means there’s no way for two joint alleles to be
over or under represented, because they have an equal chance
of being selected or not selected.
-
So uniform crossover is really good at returning alleles to
linkage equilibrium.
-
However, what about one-point crossover?
-
If the joint alleles are close together, they’ll have
high physical linkage, so they’ll be over
represented.
-
If the joint alleles are far apart, they’ll have low
physical linkage, so they’ll be closer to linkage
equilibrium.
-
How do we get out of linkage disequilibrium?
-
Have sex! (use sexual recombination, I mean)
-
A benefit of sexual recombination is that it reduces
linkage disequilibrium.
-
Under-represented combinations are created more than
they’re destroyed.
-
Over-represented combinations are destroyed more than
they’re created.
-
Why do organisms have sex? Why can’t they just
asexually mutate for everything?
-
What’s so good about reducing linkage
disequilibrium?
Fisher / Muller Hypothesis
-
Let’s take our two-loci two-allele example
again.
-
Let’s say we have a big population of them.
-
Upper-case alleles are more beneficial than lower-case
ones.
-
However, through single mutations, individuals can only go
from ab to Ab or to aB; they can’t jump straight to AB.
This is linkage disequilibrium: the frequency of AB does
not correlate with the frequency of A and B individually
because mutation can’t get to AB.
-
With an asexual population, we get:
-
Individuals mutate and get Ab and aB, but since they
can’t mutate to be both, there’s
competition.
-
Eventually, Ab wins, and the beneficial B allele dies
out.
-
However, with a sexual population:
-
With crossover, we don’t need mutation to reach AB;
an Ab could mate with an aB and get an AB!
-
This removes competition between the two segregating
alleles.
-
In other words, sex is advantageous because it allows
alleles to be selected for independently.
-
Let’s look at this with bit-strings.
-
Let’s say you have two initial genotypes:
00100000000010000001 (3)
00000100000010001000 (3)
-
With asexuals, you can only produce more of these two
genotypes.
-
Yeah, there’s mutation, but that’s quite
slow.
-
But sexuals can produce all of these:
00100000000010000001 (3)
00100000000010000000 (2)
00100000000010001001 (4)
00100000000010001000 (3)
00100100000010000001 (4)
00100100000010000000 (3)
00100100000010001001 (5)
00100100000010001000 (4)
00000000000010000001 (2)
00000000000010000000 (1)
00000000000010001001 (3)
00000000000010001000 (2)
00000100000010000001 (3)
00000100000010000000 (2)
00000100000010001001 (4)
00000100000010001000 (3)
-
Way more variation!
-
Variation is good, because it means we can explore the
search space better.
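-
Those 16 genotypes can be enumerated mechanically: agreeing loci are fixed, and each differing locus can come from either parent. A sketch (function name is mine):

```python
from itertools import product

def uniform_crossover_offspring(a, b):
    """Every child reachable from parents a and b by uniform crossover:
    each differing locus independently takes either parent's allele."""
    choices = [(x,) if x == y else (x, y) for x, y in zip(a, b)]
    return {''.join(c) for c in product(*choices)}

kids = uniform_crossover_offspring('00100000000010000001',
                                   '00000100000010001000')
print(len(kids))  # 16 = 2^4: the parents differ at four loci
print(max(kids, key=lambda k: k.count('1')))  # best child has 5 ones
```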
Muller’s Ratchet
-
Muller’s Ratchet is like the opposite of the Fisher /
Muller hypothesis.
-
Instead of good alleles coming together, this is about
removing bad alleles.
-
Let’s say you have a population of good
genotypes.
-
That means almost any mutation would make the individual
worse rather than better.
-
Like in the “methinks it is like a weasel”
where you have a sentence that’s only one character
off from being perfect.
-
With asexuals, mutations that bring the fitness down cannot
be recovered (at least not easily). That’s like a ‘click’ of the ratchet
you can’t go back on.
-
But with sexuals, you can cross two half-unmutated
individuals to recover the original unmutated
individual.
-
The ratchet relies on some assumptions:
-
Populations are finite
- No elitism
-
No back mutations
-
No beneficial mutations (mostly because all genotypes are
good)
-
This is cool because deleterious mutations are more common
than beneficial ones.
-
However, this only applies to finite populations, and
according to Charlesworth, the time for a ratchet click is
very long in large populations.
|
💡 NOTE
|
|
|
Richard skipped over the deterministic mutation
hypothesis, so I won’t go over it in the
notes.
Let’s hope it doesn’t come up in
the exams!
|
|
-
Ok sex is cool and all but what about from a computing
point of view?
-
Is using sex any better than a simple hill-climber or
something?
-
The examples I used before had no local optima, so a
hill-climber would make all our problems go away, plus
it’d be easier to implement.
-
In the next section, we’ll go over why sex is good
for computer scientists too.
Crossover in EAs
Focussing Variation
-
Let’s have a look at uniform crossover again.
-
Let’s say we’re uniform crossover-ing these two
bit-strings:
0
|
0
|
1
|
1
|
0
|
1
|
0
|
1
|
0
|
1
|
1
|
1
|
0
|
0
|
1
|
1
|
-
Have a look at the first bit. It’s always going to be
zero, won’t it?
-
No matter which bit-string uniform crossover picks, that
bit will always be zero.
-
All bits where the two individuals agree on the same bit
won’t change:
0
|
0
|
1
|
1
|
0
|
1
|
0
|
1
|
0
|
1
|
1
|
1
|
0
|
0
|
1
|
1
|
-
|
?
|
-
|
-
|
-
|
?
|
?
|
-
|
-
What uniform crossover is effectively doing is just
mutation on those bits where the individuals are different (where I’ve put the question marks).
-
This concept is called focussing variation, or ‘focussing search’.
I know it’s hard to, but you must focus!
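-
The “focussing” can be made concrete: uniform crossover only ever varies the loci where the parents disagree (helper name is mine):

```python
def differing_loci(a, b):
    """Indices where two parents disagree -- the only loci uniform
    crossover can vary; everywhere else the child is fixed."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

print(differing_loci('00110101', '01110011'))  # [1, 5, 6] (0-indexed)
```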
-
Let’s have a look at the Hurdles problem.
-
The X axis is the number of zeroes.
-
The Y axis is the fitness.
-
You’ll notice that there’s dips when the number
of zeroes is odd, and peaks when it’s even.
-
Let’s say we have an individual on a peak. How do we
get to the next highest peak?
-
For example, let’s say, we have 18 ones and 2
zeroes.
11111011111101111111
|
-
Let L be the length of the bit-string.
-
How would asexuals (mutation) vs sexuals (uniform
crossover) fare with this problem?
Asexuals
|
Let’s say each bit has a 1/L chance of being mutated.
We have two zeroes we need to mutate to get to the next peak.
However, we can’t just mutate one of the two zeroes, because
we’d fall into the dip, get worse fitness, and this individual
will die out.
We must mutate both zeroes at the same time.
The chance of that is (1/L) × (1/L), or 1/L².
To generalise, the chance of mutation getting to the next peak is
(1/L)^k, where k is how big the hurdle is.
|
Sexuals
|
Let’s say there’s two individuals at the two-zeroes peak:
11111111111101101111
11011011111111111111
Remember, uniform crossover is the same as mutation on the bits
that are different:
11111111111101101111
11011011111111111111
--?--?------?--?----
If we were to do mutation on those differing bits, we’d need all
4 of them to come out as ones to get to the next peak.
The chance of that is (1/2)⁴, or 1/16.
To generalise, the chance of uniform crossover getting to the
next peak is (1/2)^(2k), where k is how big the hurdle is.
|
-
Sexuals are way more likely to solve the problem quicker
than asexuals!
-
The chance a sexual solves the problem does not depend on
the length of the bit string, but asexuals do.
-
In other words, sexuals are way better at
“jumping”, or performing better in fitness
landscapes with gaps.
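-
Plugging numbers into the two jump probabilities (as reconstructed above: a per-bit mutation rate of 1/L for asexuals, a coin flip per differing bit for uniform crossover — a sketch under those assumptions):

```python
def p_asexual_jump(L, k):
    """Chance mutation flips exactly the k zeroes in one go, with a
    per-bit mutation rate of 1/L (ignoring the other bits staying put)."""
    return (1.0 / L) ** k

def p_sexual_jump(k):
    """Chance uniform crossover sets all 2k differing bits correctly:
    each is an independent coin flip, regardless of string length."""
    return 0.5 ** (2 * k)

for L in (20, 200, 2000):
    print(L, p_asexual_jump(L, k=2), p_sexual_jump(k=2))
# The asexual chance collapses as L grows; the sexual chance
# stays at 0.0625 no matter how long the string is.
```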
-
Can hill-climbers do that? Pfft... no.
-
Let’s compare one point crossover with uniform
crossover.
-
We have two bit-strings:
00100000000010000001
00000100000010001000
--?--?----------?--?
-
With uniform crossover, you can make up to 16 different
bit-strings from these two.
-
With one-point crossover, you can make 3 unique
bit-strings.
-
That sounds like crap, but hear me out.
-
The chance of getting each of these bit-strings from
one-point crossover is roughly equal (on average; assuming all
differing genes are equidistant), so each useful child turns up
with a probability that shrinks only polynomially, not
exponentially, with the number of differing bits.
-
Let’s say that both uniform crossover and one-point
crossover gives you the children you want.
-
Which one are you gonna pick:
-
The one with exponential complexity or
-
The one with polynomial complexity
-
You’re going to pick the polynomial one, right?
-
If one-point crossover is going to give you the children
you want anyway, you might as well use that and get the
individuals you want faster than with uniform crossover or
mutation.
-
Yeah, uniform crossover gives more variation, but
it’s variation you don’t need. The group of
resulting individuals will be ‘diluted’ with
useless individuals of low fitness.
-
What’s the difference between:
-
the individuals you’d get from one-point crossover
and
-
the individuals you’d get from uniform
crossover?
-
Differing bits that are close together are unlikely to be
recombined.
-
Here’s an example:
00000000010000000000
00000000001000000000
---------??---------
-
With one-point crossover, what’s the chance of
getting a child with two ones?
-
It’d be very low, because the cutoff point would have
to hit that sweet spot between the two ones to get the one from
the first parent and the one from the second parent.
-
What about now?
01000000000000000000
00000000000000000010
-?----------------?-
-
Now there’s a huge chance that the bits are
recombined, because there’s a much bigger window for
the cutoff point.
-
In fact, it’s unlikely that they won’t get
recombined.
-
In summary, two differing bits that are close together are unlikely to be recombined (separated), because of their physical linkage.
-
This is the bias of one-point crossover.
-
What if we could use this, though...?
-
What if the bits we want to keep the same are kept close
together?
Schemata and Building Blocks
Schemata Theorem
-
A schema is a general representation of a set of individuals that
share common bits.
-
For example, given this set of bit-strings:
110101000010010000
010101000010101001
110101001010010011
110101001101011011
Schema: *1010100*******0**
-
If you’re sick of these coloured tables already,
it’s this:
-
The genes that are the same for individuals in the schema
are represented as ‘0’ or ‘1’.
-
The genes that can be anything for individuals in the
schema are represented as ‘*’.
-
The order of a schema is how many specific genes there are in the
schema.
-
For example, **1**0*1* has an order of 3.
-
The defining length is the distance from the first specified gene to the last
specified gene.
-
For example, **101** has a shorter defining length than 1***0.
-
You can also call a schema “short” if it has a
small defining length.
-
The schema fitness is the average fitness of all genotypes that fit that
schema.
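-
The definitions above in code (a sketch; function names are mine):

```python
def order(schema):
    """Number of specified (non-*) positions in the schema."""
    return sum(c != '*' for c in schema)

def defining_length(schema):
    """Distance from the first specified position to the last."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0] if fixed else 0

def matches(schema, bits):
    """True if the bit-string fits the schema."""
    return all(s == '*' or s == b for s, b in zip(schema, bits))

print(order('**1**0*1*'))         # 3
print(defining_length('**101**'))  # 2
print(defining_length('1***0'))    # 4
```

Schema fitness would then just be the average fitness over all genotypes for which `matches` returns True.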
Building Block Hypothesis
-
Holland (1975), Goldberg (1989) and others said that
schemas are the reason why GAs work so well.
-
They say that:
-
GAs find small schemas
-
Crossovers put small schemas together to make bigger
ones
-
It keeps making bigger schemas until the whole problem is
solved
-
These are Building Blocks: low-order, short defining length schema of above average
fitness, that can be put together with crossover to make
bigger building blocks.
-
Mitchell and Forrest set out to test this.
-
They created an idealised problem with a special kind of
building block.
-
You’ve seen it before: the Royal Road!
Bit string          | Fitness
1111 0000 0000 0000 | 8
0000 1111 0000 0000 | 8
0000 0000 1111 0000 | 8
0000 0000 0000 1111 | 8
1111 1111 0000 0000 | 16
0000 0000 1111 1111 | 16
1111 0000 1111 1111 | 24
1111 1111 1111 1111 | 32
-
As you can see, if you
had:
-
1111 0000 0000 0000 (8) and
-
0000 0000 1111 1111 (16)
-
You can put these two building blocks together with
crossover to form 1111 0000 1111 1111 (24).
-
Mitchell and Forrest did this, but it’s still nothing
a hill-climber can’t do.
-
In fact, a hill-climber does this better than a GA!
-
Plus, it’s a simpler algorithm!
-
How embarrassing...
-
So what can GA do better than hill-climber?
-
Well, that whole ‘sexuals being good at
jumping’ thing from earlier was pretty good.
-
That’s from Shapiro and Prugel-Bennet (1997)’s
‘two wells’/’basin with a barrier’,
and Thomas Jansen (2001)’s
‘Gap’/’jump’ function.
-
Also, Watson himself made a hierarchical modular
building-block problem (called H-IFF) where a hill-climber needs
exponential time, whereas a GA with tight linkage can solve it in
polynomial time.
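-
For reference, H-IFF’s fitness is usually defined recursively: every block whose bits all agree scores its length, at every level of a balanced binary hierarchy. A sketch under that standard definition:

```python
def hiff(bits):
    """Hierarchical If-and-only-If: every all-equal block, at every
    level of a balanced binary hierarchy, adds its length to fitness.
    Assumes len(bits) is a power of two."""
    if len(bits) == 1:
        return 1
    half = len(bits) // 2
    score = hiff(bits[:half]) + hiff(bits[half:])
    if all(b == bits[0] for b in bits):  # the whole block agrees
        score += len(bits)
    return score

print(hiff("11111111"))  # 32: every block at every level is complete
print(hiff("11110000"))  # 24: both halves complete, top block disagrees
```

Note the two global optima (all-ones and all-zeros): assembling agreeing blocks into bigger agreeing blocks is exactly the kind of job crossover with tight linkage is good at.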
-
That’s great and all, but so what?
-
Who cares? Do YOU care?
-
GAs can solve some things better than hill-climbers,
but...
-
... it’s not a standard GA, or even a
‘standard’ problem.
-
We’re just solving problems we’ve invented
ourselves.
-
We’re not solving any practical engineering problem,
and it has nothing to do with biology.
-
Can we show that GAs are the best thing ever in a more
simple model?
Two-Module Model
-
We’re going a bit biological in this one.
-
Nucleotide sites within a gene are grouped both
functionally and physically by virtue of shared
transcription and translation.
-
In other words, this is a simple form of biologically-real
modularity.
|
gene 1 (determines how strong you are): 1 1 0 1 0 0
gene 2 (determines how smart you are):  1 1 1 1 1 1
|
-
Obviously, we want to be strong and smart 😎 so we’d want to maximise the
ones in both these genes.
-
Let’s say that mutations in the same gene have
stronger synergy than mutations in different genes.
-
In other words, fitness increases exponentially with each
good mutation; the closer we are to a perfect gene, the more
important the mutations are.
-
You wanna see the fitness functions of this?
Fitness Function 1 (gene 1 only)
|
This fitness function only takes into account gene 1.
As you increase the number of beneficial mutations in gene 1,
the fitness of the whole thing exponentially increases.
|
Fitness Function 2 (gene 1 and gene 2)
|
This fitness function takes into account gene 1 and gene 2.
As you increase the number of beneficial mutations in gene 1 and
gene 2, the fitness of the whole thing exponentially increases.
Notice there’s a spike where gene 1 and gene 2 are perfected.
That’s our global optimum! We want our individuals to reach it.
The problem here is that a hill-climber can solve this really
easily. It can just go up the slope of gene 1 or gene 2, and then
slide along the edge of the landscape towards the global optimum.
What’s Richard gonna do about that?
|
Noise
|
He added noise!
This isn’t a fitness function, but whatever. This is the noise
he added.
More specifically, it’s random epistasis among all loci.
It was generated with random fitnesses between 0.5 and 1.
|
Combined
|
This is what the fitness function looks like after overlaying
(by multiplying) the noise.
Now, on the ridge of the fitness landscape, there are many small
peaks, or local sub-optima.
Hill-climbers are going to get stuck in there.
|
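-
A sketch of the kind of landscape described (the exponential base, gene length, and the per-genotype noise construction are my assumptions, not Richard’s exact model):

```python
import random

GENE_LEN = 6

def two_module_fitness(genome, base=2.0, seed=0):
    """Exponential within-gene synergy for two genes, multiplied by
    per-genotype random epistatic noise in [0.5, 1.0] that pits the
    ridge with small local optima."""
    g1 = sum(genome[:GENE_LEN])          # good mutations in gene 1
    g2 = sum(genome[GENE_LEN:])          # good mutations in gene 2
    smooth = base ** g1 + base ** g2     # ridge with a spike at (6, 6)
    noise_rng = random.Random(hash((seed, tuple(genome))))
    return smooth * noise_rng.uniform(0.5, 1.0)

best = two_module_fitness([1] * 12)
print(best)  # global optimum region: at least (2**6 + 2**6) * 0.5 = 64
```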
-
So, who’s going to win this one?
-
The hill-climbers?
-
Or the genetic algorithms?
Hill-Climbers
|
A hill-climber can easily find the peak of one gene.
The problem is finding the peak of the other gene without
disrupting the first one.
Given the genes are n loci long, it’ll take a hill-climber on
the order of n steps to get to one peak.
The hill-climber will then have to mutate just the right loci,
all at once, to ‘jump’ to the highest peak.
It’ll take exponentially many steps to do that.
Therefore, hill-climbers can solve this in exponential time.
|
Genetic Algorithms
|
Like hill-climbers, GAs can find the peak of one of the genes
easily.
It takes on the order of n steps to reach one of the peaks.
However, once two individuals reach the peaks of the two
different genes, they can crossover and make a child at the
global peak. This only takes one (lucky) step!
Therefore, GAs can solve this in polynomial time.
|
-
Hill-climbers reach the individual gene peaks, but
couldn’t find the global optimum, even after 1000
generations.
-
GAs with uniform crossover are similar: they struggle to
find the global optimum.
-
GAs with one-point crossover find the global optimum at
generation 58.
Richard’s 3000-page report PDF on the two-module
model
-
In conclusion: if you want to be strong and smart, use
one-point crossover.
-
Mutation within genes finds good building blocks
-
Recombination of building blocks finds good genotypes
-
Mutation on individual nucleotides finds good genes
-
Recombination of genes finds good genotypes
Coevolution
-
There are two definitions of coevolution:
“Evolution as a result of biological
interactions”
“An evolutionary change in a trait of the individuals in
one population in response to a trait of the individuals in a
second population, followed by an evolutionary response by the
second population to the change in the first”
-
The main thing to take away from this is reciprocity.
-
Hence, coevolution can also be called reciprocal evolution.
-
What’s the difference between normal and reciprocal
evolution?
Normal evolution
|
Coevolution
|
With normal evolution, the agent of selection
does not change as a result of the evolutionary
response it causes.
In other words, individuals are always selected
based on the same criteria, no matter what has
happened.
|
With coevolution, the agent of selection does
change as a result of the evolutionary response
it causes.
In other words, individuals are selected based
on the situation and what’s happened so
far in the algorithm.
^ Positive feedback mechanism
|
-
In layman’s terms, with coevolution, you have two (or maybe more) populations that engage in “arms races”
to improve and increase in fitness together.
-
Think of it this way:
-
When you catch a cold, it sucks. But it doesn’t last
forever; eventually, you recover from it. Your body
remembers that strain of cold so you won’t catch it
again.
-
However, some of the cold viruses in your body survived;
these survivors will go on to be the next generation of
stronger cold viruses, that’ll one day come back
around and infect you again.
-
This cycle continues, and forms the basis of competitive
coevolution.
Missed opportunity for a covid joke? Maybe.
Pairwise vs Diffuse
-
In pairwise coevolution, we just have two homogeneous populations that try to
one-up each other, similar to the common cold example from
before.
-
Here’s the example from the slides:
-
We have antelopes and a field of grass.
-
The antelopes eat the grass. Oof.
-
The grass doesn’t like being eaten, so it evolves to
be tougher.
-
The antelopes have a harder time eating it, so they evolve
harder teeth.
-
This pairwise arms race continues on.
-
In diffuse coevolution, we have multiple populations on both sides, and one
population could affect any of the other populations,
directly or indirectly.
-
Here’s an adaptation of the example from the
slides:
-
We have red, blue, and green antelopes, as well as orange,
yellow, and magenta grass.
-
A case of diffuse coevolution could be:
-
Let’s say the red antelopes eat the orange grass.
-
The orange grass starts to evolve a poison that makes the red antelopes sick.
-
The red antelopes stop eating the orange grass and start eating the yellow and magenta grass, and some of them evolve stronger stomachs.
-
However, a blue antelope has tasted the orange grass and finds it delicious. The poison for the red antelopes is a taste enhancer for the blue antelopes!
-
Now, all the blue antelopes stop eating the other grasses and only eat orange grass.
-
The orange grass evolving poison has affected:
-
Red antelopes: they start eating other grass / evolve stronger
stomachs
-
Blue antelopes: they only eat orange grass now. New diet could affect
them in other ways
-
Yellow and magenta grass: they are no longer being eaten by blue antelopes, but they are being eaten by red antelopes, so the pressure to evolve defence mechanisms will
change to suit red antelopes
Sasaki & Godfray Model
-
Let’s say we have two populations: parasites and
hosts.
-
The hosts have a resistance to parasites.
-
The parasites have a virulence (counter-resistance) to
hosts.
-
However, resistance and virulence each come at a fitness cost.
-
So hosts and parasites need to “spend” some of
their fitness to increase resistance / virulence.
-
They enter a cycle, where:
-
Hosts increase in resistance so parasites won’t
infect them
-
Parasites increase in virulence so they can infect
hosts
-
They both reach a point where they can’t increase
resistance and virulence any longer
-
Hosts give up increasing resistance, because it
doesn’t matter either way anymore: virulence is so
high that high resistance and low resistance doesn’t
matter, so we might as well have low resistance so we get
our fitness back.
-
Parasites give up increasing virulence for the same reason:
they don’t need so much virulence anymore, so they get
fitness back.
-
The cycle continues again.
Heat maps when resistance and virulence have low fitness
cost
Heat maps when resistance and virulence have high fitness
cost
Coevolution in GAs
-
In a normal GA, you have a population of individuals and a
fitness function that takes one individual.
-
In coevolution, you have two populations, and two fitness
functions that return the fitness of an individual given the
other individual in the other population.
-
Can you answer this question: how good are you at
chess?
-
It’s hard, right? It depends.
-
How about this question: how good are you at chess compared
to a grandmaster?
-
What about this: how good are you at chess compared to a
complete novice?
-
This is subjective (internal) fitness: how good an individual is compared to a sample of
individuals in the other population.
-
There’s also objective (external) fitness, which is the raw fitness of an individual, but this
isn’t needed for coevolution; it’s mostly an
analysis thing.
Have you heard of Watson’s Gambit? It’s where
you
make a coevolutionary algorithm play for you.
-
There are different kinds of fitness evaluations:
-
Controllers: e.g. pursuer / evader
-
Instances / classifiers: e.g. lists / sorters
-
Symbol production and understanding: e.g.
communication
-
Two player games: chess, draughts, backgammon
-
More complicated games, like poker
-
To calculate subjective fitness for an individual, they
need to play against opponents.
-
Who should they play against?
-
Everyone? That’s expensive...
-
One opponent? The results will be noisy and
inaccurate...
-
A sample? Now you’re talking!
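A minimal sketch of sample-based subjective fitness (names are mine; `play` is assumed to return 1 for a win and 0 otherwise):

```python
import random

def subjective_fitness(individual, opponents, play, sample_size=5, rng=random):
    # Score against a random sample of the opposing population:
    # not everyone (too expensive), not just one (too noisy).
    sample = rng.sample(opponents, min(sample_size, len(opponents)))
    return sum(play(individual, opp) for opp in sample) / len(sample)
```

The returned value is a win rate against the sample, so it only means anything relative to the current opposing population.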
-
So, you’ve co-evolved a population and now
they’re all kick-ass chess players.
-
But is there really one best individual who can beat
everyone?
-
What if you had one individual who was good at winning by
being aggressive...
-
... and another individual who was good at winning by
planning ahead?
-
Surely, different individuals will only do well against
different opponents.
-
If the population represents a diverse set of strategies,
how should scores against many players be combined?
🤔
-
... what? You’re asking me?
-
I don’t know. I just write the notes.
Why use Coevolution
-
There are several reasons why:
-
Implicit fitness function
-
You don’t have to explicitly define a fitness
function.
-
For example, you don’t have to define what makes a good chess
player.
-
Each population defines the problem for the other, and the
fitness is determined by the rules of the game.
-
Gradient engineering - arms race
-
Even if you did have a fitness function, how would you know
if it provided an appropriate gradient?
-
In other words, even if you did define what makes a good
chess player, how would you know if individuals would be
able to evolve and climb their way to being a
grandmaster?
-
With coevolution, the populations gradually increase in
difficulty automatically.
-
Even if you did have a fitness function and it had a good
gradient, would it be open ended?
-
In other words, when an individual maxes out your fitness
function, are they really the best they could possibly be at
chess? Or is there still room for improvement?
-
Because coevolution doesn’t define a strict fitness
function, the populations are free to explore the limits of
the game itself.
-
The results may surprise you!
-
There are more examples of coevolution:
-
Input list creator vs list sorter
-
Cellular automata classifiers vs initial conditions

-
Coevolution has so much freedom that, sometimes, it might
not evolve what you want it to evolve.
-
What do you actually want?
-
A player that beats every possible strategy?
-
A player that would rather win and lose than draw all the
time?
-
A player that gets the best average score against all
players?
-
A player that writes the notes for you? (I wish...)
Problems with Coevolution
-
It’s not the best thing in the world, though.
-
There are problems with it:
-
Disengagement / loss of gradient
-
This is where one population beats all individuals of the
other population, so nobody ever improves.
-
Imagine playing against a grandmaster chess player.
-
Assuming you’re not a grandmaster also, you’re
not gonna learn anything because you’ll just get your
arse handed to you.
-
The grandmaster won’t learn anything either because
it’ll be easy for them.
-
Overspecialisation / focusing
-
This is where a population’s strategies are too
specific to the opponents they’re playing against.
-
If you do machine learning, this is very similar to
overfitting.
-
Imagine you play against a player who always plays
aggressively.
-
The only strategy you’ll learn is how to hammer
someone who plays too aggressively.
-
So when you play against a different opponent, your
strategy won’t work so well, and you’ll be
crap.
-
Cyclic dynamics (intransitivity)
-
This is where populations follow a cycle of
strategies.
-
Think rock paper scissors:
-
Rock → Scissors
-
Scissors → Paper
-
Paper → Rock
-
If pop1 all plays rock, pop2 will all play scissors.
-
Now that pop2 all plays scissors, pop1 will all play
paper.
-
Now that pop1 all play paper, pop2 will all play
rock.
-
This cycle will continue forever and no populations will
actually reach a good optimum.
-
Collusion
-
This is where populations collude to be mediocre
together.
-
If both populations get the same scores whether they try
hard or not, why should they try hard?
-
Both populations will just play mediocre and get the same
scores.
-
These are called mediocre stable states.
Cooperative Coevolution
-
I wanna talk about bumblebees.
-
Aren’t they cool? They go into flowers and collect
nectar, and create honey.
Me busting my ass writing these notes
-
The system with the whole flowers producing nectar to
entice bees while also spreading pollen is a pretty cool
thing.
-
Almost too cool...
-
How did they get like that?
-
Did bees and flowers hold a meeting to decide on a
mechanism that benefits them both?
-
No! Bees and flowers evolved like this through cooperative coevolution!
What is Cooperative Coevolution
-
Cooperative coevolution is where individuals from a number of populations
collectively solve some problem together.
-
So, rather than chess players, hosts / parasites, pursuers
/ evaders etc., we have multiple agents working on
different parts of the same problem.
-
With competitive coevolution, the fitness function returns
the fitness of the individual in the context of an opponent
individual.
-
With cooperative coevolution, we have a “shared domain model” and we often have more than two populations.
-
When we find the fitness of an individual, we pick the best
individuals of every other group to calculate it with.
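A minimal sketch of that evaluation scheme (names are mine; individuals are just numbers here and `evaluate` scores the assembled team):

```python
def coop_fitness(ind, pop_index, populations, evaluate, best=max):
    # Team this individual up with the current best individual
    # from every other population, then score the assembled
    # solution as a whole.
    team = [best(pop) for pop in populations]
    team[pop_index] = ind
    return evaluate(team)

# e.g. with evaluate=sum: coop_fitness(2, 0, [[1, 3], [5, 4]], sum) == 7
```

Note the individual is only ever scored in the context of collaborators, which is exactly the "fitness given the other populations" idea from competitive coevolution.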
-
However, the main difference is how we interpret the whole
thing.
Competitive vs Cooperative
-
In competitive coevolution:
-
The solution to the problem is an individual from one of
the populations
-
In cooperative coevolution:
-
The solution to the problem is a collection of individuals,
one from each population
-
However, the individuals don’t know the
difference!
-
It’s not like we tell them “ok you gotta
cooperate with these lot”
-
Individuals perform exactly the same whether it’s
competitive or cooperative.
-
The only thing that’s different is how you interpret
the results: get a solution from one individual, or a
group.
-
In both cases, individuals’ fitnesses are sensitive
to the genetics of the other populations.
-
Competition only exists within populations.
-
When you think about it, opponent individuals simply
“set the scene” or “change the criteria of
success” for their competition.
-
In other words, population 2 is just the (dynamic) environment population 1 evolves with respect
to.
Example: Opponents “setting the
scene”
|
-
Let’s say we have two
populations.
-
We want to find the fitness of an
individual in population 1; call it
a.
-
We sample one opponent to calculate this
with (crap, I know, but whatever), and we select the opponent
b from population 2.
-
So we can calculate fitness like
this: f(a, b)
-
Easy, right?
-
Well, let’s change the way we look at
it.
-
What if we made a new function: f_b(a) = f(a, b)
-
You could say that, in this very instant,
the individuals in population 1 are evolving
with respect to the fitness function
f_b.
-
Our opponent,
b, simply plays a part in defining the
fitness function of population 1.
-
Do you see how opponents “change the
criteria of success” for their
competition?
|
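In code terms, the opponent just curries the two-argument game into a one-argument fitness function (a toy sketch; `game` and the names are mine):

```python
def game(x, y):
    # toy pairwise fitness: how much x beats y by
    return x - y

def landscape_for(opponent):
    # population 2's individual fixes the fitness landscape
    # that population 1 is (momentarily) evolving against
    return lambda x: game(x, opponent)

f_b = landscape_for(3)
# f_b(5) == 2: same game, different framing
```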
-
In summary, the mechanics behind competitive and
cooperative coevolution are the same.
-
The only difference is how you interpret the results.
-
Here’s an example:
-
Let’s say you’re doing coevolution against two
badminton players.
-
What do you want out of this algorithm?
-
The best badminton player for singles?
-
Then it’s competitive; pick an individual from either
population.
-
This is adversarial coevolution.
-
The best badminton pair for doubles?
-
Then it’s cooperative; pick the best individuals from
both populations.
-
This is compositional coevolution.
Badminton yeeaaaaa 🏸🏸🏸
Divide and Conquer
-
Competitive coevolution is about gradient
engineering.
-
Ensuring that populations are as good as each other, so
they can improve on one another.
-
However, cooperative coevolution is different.
-
Cooperative coevolution is about problem
decomposition.
-
We want each population to focus on a subproblem.
-
Their fitness is represented by how well each
individual’s sub-solutions come together to
collectively solve the whole problem.
-
Sound familiar? This is the divide and conquer approach in
general computer science!
-
Divide and conquer is used everywhere, including:
-
Designing cars 🚗
-
Designing space shuttles 🚀
-
Designing processors 💻
-
Managing businesses 💼
-
Town planning
-
Algorithm design (especially modular ones)
-
You can even apply this to optimisation, if it’s
modular enough:
-
Train schedule timetabling for each city
independently
-
Postal routes
-
Optimising a design by optimising sub-systems
-
Is there a case where divide and conquer doesn’t
work?
-
Yeah: consider the prisoner’s dilemma. It’s a classic example of how optimising the
subproblems does not always optimise
the whole problem.
-
If we evolve prisoners independently, they’ll both
evolve to betray each other.
-
However, if we evolve pairs of prisoners (basically modelling a pair of prisoners as a single
individual), they’ll evolve to ally.
If you’ve done Intelligent Agents and are doing Game
Theory,
you’re probably sick of prisoner’s dilemma by
now.
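A toy sketch of why: with a standard payoff matrix (numbers are illustrative; higher is better), defecting is the best response to everything, yet mutual cooperation beats mutual defection.

```python
# payoff[(mine, theirs)] for C(ooperate) / D(efect)
payoff = {("C", "C"): 3, ("C", "D"): 0,
          ("D", "C"): 5, ("D", "D"): 1}

def best_response(theirs):
    # optimise my sub-problem independently of yours
    return max("CD", key=lambda mine: payoff[(mine, theirs)])

# best_response("C") == "D" and best_response("D") == "D",
# yet payoff[("C", "C")] > payoff[("D", "D")]
```

Optimising each prisoner's sub-problem gives (D, D); optimising the pair as one individual gives (C, C).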
-
So, divide and conquer will only work if individual
optimisations of each part provide optimisation of the
whole.
-
But how do we actually split up a problem? That’s the
real question.
-
We’ll go into more detail next topic, in Modularity!
-
When you think about it, cooperative coevolution is very
much like building blocks and crossover:
-
Problems are decomposed into subproblems
-
Subproblems are put together to solve the bigger
problem
-
There’s a big difference though:
-
Building block sub-problems are independent, whereas in
cooperative coevolution, they might not be independent (sub-problems might only make sense in the context of the
bigger problem).
-
There might be no way of evaluating a sub-solution on its
own.
Modularity
Types of modularity
-
There are two kinds of modularity:
-
Repetition
-
Where parts of the evolved structure have repeated
modules
-
Fingers on a hand
-
Components on a circuit board
-
Cells of a multicellular organism
Remember this?
-
Functional decoupling
-
Where different modules have different roles in the bigger
system.
- Examples:
-
Nervous system and digestive system
-
Peripherals for a computer
-
Procedures in software code
-
Proteins
-
Modularity implies re-use.
-
Repetition → re-use in a single device / organism
-
Functional decoupling → re-use in other devices /
variants
-
Either way, there is self-contained functionality.
-
Modularity reflects the inherent structure of the problem
space.
-
Repetition implies self-similarity in sub-problems.
-
For example, say you had the problem of driving 10 cars to
Southampton.
-
You could split this problem up with repetition: the
problem of driving 1 car to Southampton, but 10 times.
-
Functional decoupling does not imply this.
-
Identifying modular subnetworks in bigger, complex networks
is useful for understanding how they work (network dynamics).
-
It may look like this:
Modular subnetworks in a complex network
-
This is an example of structural representation: with connections between
modules.
-
In evolving complex systems, we can identify modules based
on how much they affect other parts of the system.
- For example:
-
If we modified one bit of the system and there was no
interference to this other bit, they’re probably in
different subsystems.
-
If we modified one bit and there was loads of interference
in this other bit, they’re probably in the same
subsystems.
-
This is an example of functional representation: with changes affecting modules.
Concepts of modularity
-
There are two ways to view modularity:
-
Structural (topological / connectedness)
-
This refers to the number of connections within a module
(intramodular), or between modules (intermodular)
-
Functional (dynamics, behaviour, fitness)
-
This refers to how changes in one module affect the results of
other modules, in terms of dynamics, behaviour, fitness
etc.
-
Keep in mind that structural is not the same as
functional.
-
They are implicitly related, though.
-
When you think of modularity, you may think that modules
barely affect each other at all.
-
But that’s not necessarily true.
-
Sure, there are more intramodular dependencies (connections inside the module itself) than intermodular dependencies (connections between modules), but that doesn’t mean they’re not
important.
Your nervous system and digestive system are
two different subsystems of your body.
What would happen if you used your nervous
system to stop your breathing and die?
Your digestive system would shut down,
too.
After all, you can’t eat anything if
you’re dead.
|
And no: let’s assume zombies don’t
exist.
|
-
If that’s too weird of an example of modules
critically affecting each other, try this:
UNLIMITED POWER
|
In a tower PC, the power supply
‘supplies’ power to the components
through sleeved cables.
You could say the ‘power system’ is
separate from other ‘modules’, like
the motherboard or the RAM.
But what if your power supply fails? Nothing
would be able to run anymore.
In that sense, the power system still
critically affects other systems.
|
-
Modularity is when intramodular dependencies have a greater
effect on fitness than intermodular dependencies,
right?
-
How can inter-connections be critical, yet systems exhibit
meaningful modularity?
-
Isn’t that high coupling?
-
It’s complicated...
Examples
Simon’s “nearly-decomposable” systems
-
A “nearly-decomposable” system is a system with two properties:
-
“the short-run behaviour of each of the component
subsystems is approximately independent of the short-run
behaviour of the other components”
-
“in the long run the behaviour of any one of the
components depends in only an aggregate way on the behaviour
of the other components”
-
Uhh... ok? What does that mean?
-
It means that:
-
In the short run, components are pretty much independent of
each other.
-
In the long run, however, the behaviour of components
depends on the aggregate of the other components.
-
Some examples will help clear this up.
-
Here’s two examples that Richard came up with in the
lecture:
The Moon revolves around the Earth, and Saturn
exists too, with its 82 moons.
In the short run, the Moon revolves with little
to no impact from Saturn and its moons, and
vice versa.
In the long run, they still do affect each
other. Perhaps, after 100 years, our moon is
shifted a couple metres from its orbit because
of the gravity from Saturn and its moons (calculated by taking Saturn’s, and its
moons’, mass as an aggregate).
|
|
|
In the short run, people from the UK do not
affect what people from the US think and do (unless they’re explicitly talking to
them).
In the long run, however, the collective
opinions and actions of people from the UK do
affect what people in the US think and do, and
vice versa for them.
Maybe.
|
-
If you don’t like those examples, here’s some
that Simon came up with:
The Combination Lock
|
Let’s say you had a combination lock with
10 dials, and each having 100 positions (so a lot more advanced than the picture on the
left).
To unlock it, you need to start with the first
dial, guess the right number for it, and
sequentially go through and guess each dial
until you’ve got them all.
You always hear a ‘click’ whenever
you get one dial right.
The problem here is that modules are completely independent of each other.
|
|
The Watchmaker’s Parable
|
|
We have a watchmaker, Mr. Casio, who creates
watches by putting together 1000 components
sequentially.
If Mr. Casio is interrupted, he has to start
from scratch.
Richard, on the other hand, puts together 10
components, made out of 10 sub-components, made
out of 10 modules.
If Richard is interrupted, he can just start
again on the latest module / sub-component /
whatever he was working on.
Richard has intermediate stable modules, but
Mr. Casio does not, meaning he builds watches
more efficiently.
We have modules and hierarchy, but the problem
here is that making modules is just a
deterministic sequence of steps.
You can’t create a fitness function for
this, so there’s no way to show the
functional representation of modules.
|
Heat Exchange
|
We have a building with many rooms and many
cubicles.
Cubicles start at different temperatures, and
heat travels between them.
Heat exchange rate is greater for cubicles in
the same room than in different rooms.
In the short run, a cubicle’s temperature
is only affected by other cubicles’
temperatures in the same room.
In the long run, cubicles can also be affected
by the average temperatures of cubicles in the
other rooms: the heat travels from those rooms
to our room.
It’s a hierarchical modular dynamic
system, which is nice, but the problem here is
that there’s no real problem to solve. It
has nothing to do with evolvability.
|
Heat exchange coefficients of cubicles in
rooms
|
-
Picky picky... there’s always a problem, isn’t
there?
-
We want to show GAs with crossover perform better than GAs
without crossover using an example with modularity.
-
These examples aren’t the answer, but we can draw
some inspiration.
Modularity in EC
-
Do you remember NK landscapes?
-
It’s a fitness landscape with two variables:
-
N = number of variables
-
K = number of dependencies each variable has on others
-
It has inter-dependencies, but no modularity.
NK landscapes
|
Royal Roads
|
NKC (clustered)
|
|
|
|
Inter-dependencies between loci
No modularity or hierarchy
|
Inter-dependencies in building blocks
Building blocks are the modules
No hierarchy
|
Inter-dependencies between modules
There are lower level and higher level
modules
Most benefit comes from optimising lowest level
modules
|
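The plain NK fitness from the left column can be sketched like this (a minimal version; the contribution tables are filled lazily with random values, and each locus depends on itself plus its K right-hand neighbours, wrapping around):

```python
import random

def make_nk(n, k, seed=0):
    rng = random.Random(seed)
    # one contribution table per locus, keyed by its (k+1)-bit neighbourhood
    tables = [{} for _ in range(n)]
    def fitness(genome):
        total = 0.0
        for i in range(n):
            key = tuple(genome[(i + j) % n] for j in range(k + 1))
            if key not in tables[i]:
                tables[i][key] = rng.random()
            total += tables[i][key]
        return total / n
    return fitness
```

With K > 0, every locus's contribution depends on other loci (epistasis), but the dependencies are spread uniformly: no modules anywhere.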
-
You wanna see some more examples of modularity in synthetic
landscapes?
-
The intra-block column represents the fitness function of one
individual module.
-
The inter-block column represents the fitness function of the whole
landscape with each successive complete module.
Name
|
Intra-block
|
Inter-block
|
Resultant landscape
|
Royal Roads
|
|
|
|
Concatenated Traps
|
|
|
|
NKCs
|
|
|
|
Hierarchical IFF
|
|
|
|
-
In royal roads, a module only affects the fitness once it’s
complete, which is why the intra-block landscape spikes
right at the end.
-
As you complete modules, the fitness increases linearly,
which is why the inter-block landscape is just a linear
function.
-
In concatenated traps, a module has a dip, but goes back up right at the end,
hence the name ‘trap’.
-
As you complete the modules, like the royal roads, the
fitness increases linearly.
-
In NKCs, it’s a bit hard to discern because the epistasis is
random, but the fitness still increases in proportion to the
completed modules.
-
In Hierarchical IFF, a module is complete when it has all ones or all zeroes,
but not a mixture.
-
The fitness of the overall landscape depends on how you
complete the modules. We’ll see this in more detail in
a sec, but the point is, it’s not linear like the
others.
Hierarchical IFF as a Dynamic Landscape
-
Let’s go into more detail in this Hierarchical IFF (HIFF) function.
-
Let’s say we have two genes, and they can either be 0
or 1.
-
If they’re both 1, we get more fitness!
-
If they’re both 0, we get more fitness!
-
If they’re different, then we don’t get as much
fitness.
-
Here’s the fitness function of HIFF:
-
If that’s hard to visualise, here’s the same
function flipped on its side:
-
The two peaks are where both genes are 11 or 00, and the
dip is where they’re different (01 or 10).
-
We’ll visualise it like this:
Pairwise inter-dependency
|
|
|
-
Using this as a base, we’ll make a landscape
with:
-
Intra-dependencies that are more important than the
inter-dependencies (modularity)
-
Inter-dependencies that are still important
-
In a way you can actually understand?!
-
You may think it’s crazy, but read on; things are
about to start making sense.
-
Next, we’re going to increase the complexity of this
landscape a bit.
Uniform inter-dependency
|
|
|
-
We’ve added two more genes, and all genes are
dependent on each other.
-
However, it’s still not modular.
-
Then how about we add some modules?
Modular inter-dependency
(4 genes, 2-storey hierarchy)
|
|
|
-
Now, 1-2 and 3-4 are modules!
-
They have strong intra-dependencies, and weaker (but still
prevalent) inter-dependencies.
If you have incomplete modules, you will be
down in a dip.
|
If you have complete modules, but the
modules’ aggregate (all 1’s or all
0’s) don’t match, you’ll get a
local optimum.
|
If you’ve completed all modules and all
module aggregates agree, you’ll get the
global optimum.
|
-
That’s pretty cool! Now we have:
-
Intra-dependencies matter more than inter-dependencies (modules: complete modules are better than incomplete
ones)
-
Inter-dependencies are still important (getting all modules correct is better than getting a few
correct)
-
We can go even further, and add more genes and larger
modules:
Modular inter-dependency
(8 genes, 2-storey hierarchy)
|
|
|
-
It’s pretty much the same as before, but with more
genes, larger modules and wider fitness saddles.
-
In general, things are more
“fine-grained”.
-
We have enough space to add another hierarchy, by creating
another sub-module in our existing modules:
Modular inter-dependency
(8 genes, 3-storey hierarchy)
|
|
|
-
Here, we’ve got more local optima!
-
So, what’s so good about HIFF? We already have
modularity and stuff in the royal roads.
-
The difference is that hill-climbers can solve royal roads,
but not HIFF.
-
Hill-climbers can climb up the steps of the fitness
landscape pretty easily.
-
However, hill-climbers have to flip a lot of bits to get
from one optimum to the next.
-
Look at the fitness landscape in the 8-gene 3-storey
example above.
-
If you’re on the local optimum a quarter from the
left/right (with a half-complete module and a complete module), you’d have to flip 2 bits to get to the next local
optimum (middle).
-
From the middle, you’d have to flip 4 bits to get to
the global optimum (sides).
-
With HIFF, the number of bits you need to flip doubles each
optimum. No way a hill-climber is going to do that!
|
💡 NOTE
|
|
|
Of course, if you’re on a quarter local
optimum, you can flip two bits to get to the
global maximum.
However, HIFFs are meant to have way more
dimensions (or ‘storeys’) than just
these three. Have a look at the HIFF landscape
just above where I compared it to royal roads, concatenated trap, and NKC!
|
|
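The HIFF function we’ve been drawing can be sketched recursively: a block scores its length if all its bits agree, plus whatever its two halves score on their own.

```python
def hiff(s):
    # s: list of bits, length a power of two
    if len(s) == 1:
        return 1
    # a block contributes only when all its bits agree (all 0s or all 1s)
    score = len(s) if all(bit == s[0] for bit in s) else 0
    mid = len(s) // 2
    return score + hiff(s[:mid]) + hiff(s[mid:])
```

Note how `hiff([0]*8)` and `hiff([1]*8)` both score 32 (the two global optima), while a mixed string like `[0,1]*4` only picks up the single-bit scores.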
-
There are many different ways of representing this
clustered dependency network:
Name that I’ve given it
|
Image
|
Really Messy Graph
Similar to the diagrams above, it’s just
all the genes represented as nodes, and the
dependencies represented as edges.
|
|
Plum Pudding
(no relation to the J. J. Thomson model)
Genes are represented as nodes again, but
sub-modules are circle-grouped genes and modules
are circle-grouped sub-modules and so on.
|
|
Hierarchical Chocolate Chip Cookie (HCCC) Network
A more hierarchical representation. The bottom
graphs are just Really Messy Graphs for each
submodule, then the next levels up show the
inter-dependencies.
|
|
Tree
It’s just a hierarchical tree showing the
structure of the modules and their
dependencies.
|
|
-
There are other ways to represent this too, that
demonstrate aggregate effects better:
Normal Graph
|
Aggregate Graph
|
Squiggly Tree
|
Normal Tree
|
|
|
|
|
Modular system of four variables (like
above).
|
Instead of uniformly connecting every node to
each other, we abstract a whole module with an
oval and the dependencies to-and-from it are
based on aggregate states.
|
Instead of using ovals, aggregate states are
represented as different shades of nodes.
Higher tree levels represent higher levels of
abstraction, in other words,
‘phenotypic’.
|
The same as Squiggly Tree, but with edges
instead of squiggly brackets.
|
-
In summary, modularity is epic.
-
It appears everywhere through nature, and helps simplify
bigger problems.
-
It also gives us an example where GAs with crossover
triumph over hill-climbers.
-
However, building block genes need to be next to each other
for a GA with crossover to work.
-
We’ve been looking at building blocks that look
like:
-
But what if it looks like:
-
They’re still modules! Their loci just aren’t
adjacent.
-
Crossover wouldn’t be able to help us here.
-
The difference here is the representation.
-
Oh my! Would you look at the name of the next
section!
Representations
-
Before, we’ve been representing individuals as
strings:
-
Bit strings: 0101101010
-
Normal strings: Methinksiusdfhdsf7w4ruinjkfvbh
-
Letters as genes: ACTGACACACGTAGCTAC
-
This isn’t what biology does, though.
- In biology:
-
DNA is transcribed into RNA, which is translated into amino acids
-
Amino acids fold into proteins
-
Proteins control cell division and differentiation,
creating a variety of cells
-
The cells form together to make a creature
-
The creature exhibits phenotypes which determine
fitness
-
In our algorithms, we have a direct function from genotypes
to fitness.
-
In real life, genotypes (DNA) determine the phenotype (eye colour, arm length etc.), which determines fitness.
-
The mapping from genotypes → phenotypes is pretty
complicated. It depends on:
-
Epistasis: effect of a mutation depends on genetic background (other genes)
-
Pleiotropy: one gene influences many traits
-
Redundancy: many genotypes map to a particular phenotype
-
Neutrality: some mutations on some backgrounds have no fitness
effect
-
This all affects the evolvability of a species.
-
Those are the properties that affect the mapping, but what
are the properties of the mapping that affects the
evolvability?
-
Connectivity (or selective accessibility): functionally similar beetles should have very similar
genetics
-
Bias: good phenotypes are evolved to have a bias in the mapping (beetles with horns are biased over beetles with wings on
their ankles)
-
Modularity - Functional Decomposition: genes that affect function of the heart should be mostly
independent of genes that affect function of the ears
-
Modularity - Repetition: mutations in genes that affect your fingernails will
probably affect your toenails too (nails are repeated modules)
Graphs (TSP)
-
In the graph colouring problem, we can represent a possible solution with a string.
-
We number the nodes from 1 to n, and in our string, the
i-th position is the colour of the i-th node.
-
Easy game, easy life!
-
We can even use mutation and crossover on this.
-
Thing is, it won’t make for a very good solving
algorithm.
-
For TSP, though, a solution is a permutation of nodes, and here mutation and crossover won’t work:
-




(invalid: permutation has node 2 twice)
-
,
-


(invalid: permutation has node 2 twice)
-


(invalid: permutation has node 3 twice)
-
So what are we going to do? 😭 Switch modules?
-
No! Don’t give up yet, we got this!
-
Our problem lies in our representation.
-
Remember, GAs are heuristic search mechanisms that work by
exploring points on a search space with a similar cost to
the previous points.
-
To do this, our variation operators (mutation and crossover) need to produce alterations that change cost as
little as possible.
-
In TSP, it’s not the nodes that incur a cost:
it’s the edges.
-
Instead of representing nodes, we can represent the edges
instead.
-
Then, we can build operators that preserve as many edges as
possible.
-
To do crossover with edges, we can use edge recombination.
-
We take some edges from parent 1
-
We take some edges from parent 2 that don’t clash
with our existing choices
-
We fill in the gaps to make it a legal tour again
-
To do mutation with edges, we use the algorithm called 2-Opt.
-
Split the path into three partitions
CD DE EF FA AB BC
-
Reverse the middle partition
CD AF FE ED AB BC
-
We’re almost done; time for the actual mutation!
Tweak the ‘neighbours’ of the second partition
in the first and third partitions so that they enter and
leave the reversed partition.
CA AF FE ED DB BC
-
Boom! You’ve mutated the path.
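As a sketch, a 2-opt move on a tour stored as a list of nodes (reversing a middle segment, which implicitly rewires the two boundary edges):

```python
def two_opt(tour, i, j):
    # Reverse tour[i:j]. Only the edges entering and leaving the
    # reversed segment change, so the new cost can be derived from
    # just those two edges rather than re-summing the whole tour.
    return tour[:i] + tour[i:j][::-1] + tour[j:]

# two_opt(list("CDEFAB"), 2, 5) reverses the E-F-A segment
```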
-
However, sometimes 2-opt isn’t enough.
-
In those cases, you can use 3-opt, which is similar but with more partitions.
-
It’s more powerful, but also more disruptive.
-
You can use this to perform a fast neighbourhood search,
actually.
-
It’s fast because you don’t have to recompute
the cost of paths with 2-opt and 3-opt; you can derive the
new cost based on what edges you’ve changed.
Structured Representations
-
Before, we were representing paths and solutions.
-
How could we represent structure, like a graph?
-
We could use matrices!
-
With artificial neural networks, we could have a matrix of weights.
-
We can go from matrices to strings by putting each row next
to each other:
-
Since we can use normal mutation and crossover on strings,
we can use the same operations on matrices.
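A sketch of that row-by-row flattening, and its inverse for an n×n weight matrix (helper names are mine):

```python
def to_string(matrix):
    # lay each row end-to-end to get one flat genome
    return [w for row in matrix for w in row]

def to_matrix(genome, n):
    # inverse: cut the genome back into n rows of n weights
    return [genome[i * n:(i + 1) * n] for i in range(n)]
```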
Variable-size Parameter Sets
-
One problem is variable-size parameter sets.
-
In other words, what if we don’t want our matrix to
be fixed?
-
What if we want it to be of a variable size?
-
If we tried to crossover a 10x10 with a 15x15, it’s
not clear what to do.
-
We can fix this with relative connections!
-
Instead of explicitly saying
-
“node 1 is connected to node 3 with weight
4”
-
“node 1 is connected to the node 2 spaces along, with
weight 4”.
-
Where “the node 2 spaces along” means
“node 1 + 2”, which is “node
3”.
node 1
|
node 2
|
node 3
|
+2, 0.3
|
+3, -2.0
|
-1, 1.2
|
-2, 0.4
|
-
Now, when you do crossover with two graphs, it
doesn’t matter how many nodes there were.
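A sketch of decoding relative connections back into ordinary edges (names are mine; wrapping offsets around the node list is an assumption for illustration):

```python
def decode(genome):
    # genome[i] is a list of (offset, weight) pairs for node i;
    # "the node 2 spaces along" from node i is node i + 2
    n = len(genome)
    edges = []
    for i, connections in enumerate(genome):
        for offset, weight in connections:
            edges.append((i, (i + offset) % n, weight))  # wrap-around assumed
    return edges

# decode([[(2, 0.3)], [(-1, 1.2)], []]) == [(0, 2, 0.3), (1, 0, 1.2)]
```

Because offsets are relative, the same gene still decodes to *some* edge after crossover between differently-sized graphs.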
Trees
-
When we represent trees, we need to define a mutation and
crossover operation.
-
For an example, let’s look at abstract syntax trees.
(4 * x) + (3 - y)
-
When you evolve these, that’s called Genetic Programming (GP).
-
One possible problem that can be solved with GP is system identification: finding an equation that describes a system.
-
When you mutate a tree, you can change a leaf or a binary
node:
(3 * x) + (3 - y)
|
(4 * x) * (3 - y)
|
-
When you do crossover, you exchange subtrees:
-
This is cool and all, but variation can be quite
disruptive.
(29 * x + 33) - (5 + x)
→
(29 * x + 33) * (5 + x)
-
However, in binary trees, most nodes are leaves, so
it’s not so bad.
-
Sometimes, subtrees won’t contribute to the
formulae.
-
Like subtrees that are multiplied by zero:
-
Through evolution, these subtrees will grow and bloat the tree.
-
It’s not all bad though: when you have bloat, it
reduces the disruptiveness of the mutation.
-
Sometimes, a bit of bloat is an advantage to balance out
the disruptiveness of mutation.
-
Of course, bloat has two downsides:
-
Reduces interpretability of the formulae (it gets confusing)
-
Produces over-complicated formulae (leads to overfitting)
-
You can reduce bloat by incurring a fitness cost when trees
get too big.
Grammar Based Representations
-
Do you remember context-free grammars?
-
If not, go refresh yourself on them in the ToC notes.
-
Here, we use something similar called L-systems (the rules are applied to all symbols in parallel).
-
They look like this:
-
However, that’s not useful for evolution.
-
What would be useful for evolution would be something like
this:
-
Body → Arm Arm
-
Arm → Hand
-
Hand → Finger Finger Finger Finger Thumb
- ...
-
How do we represent these?
Cellular Encoding
-
One way is encoding a grammar in a tree, known as cellular encoding.
-
This is a way to represent artificial neural nets with a
grammar.
-
How do we get from a tree to a grammar?
-
This is a special tree that modifies a neural net with just
one internal node.
-
Uh... huh? Trees that modify neural nets?
-
We have a grammar for our trees:
-
Command → PAR | SEQ | END | ...
-
SEQ → Command Command
-
PAR → Command Command
- ...
-
Then, we evolve instantiations of this grammar as
trees.
-
We use these trees to generate ANNs.
-
I’ll show you:
|
So here, we have one cell in the neural net,
and it’s pointing to the root of our
tree.
The root node has the command SEQ, and that
command will be applied to our cell here.
SEQ means divide the cell sequentially in two, and make those two cells point to
the children.
|
|
Now, our ‘a’ cell was split into
two, ‘a’ and ‘b’, and
they point to the root node’s children,
PAR and SEQ.
Let’s perform the PAR command on that
‘a’ cell.
PAR means divide the cell in parallel and make the children cells point to the
children nodes.
|
|
‘a’ was split into ‘a’
and ‘c’ in parallel, meaning they
both come from the input, instead of before, where
‘b’ came after ‘a’.
This keeps on going until the neural net is
fully generated.
|
-
This exhibits properties of repeated modularity and
self-similarity, which can be exploited for tasks that have
said self-similarity.
-
For example, if we were evolving a frog with four legs, you
could use the “PAR” node twice to split a node
representing a leg into four nodes representing four
legs.
-
Or something.
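To make the SEQ/PAR walkthrough concrete, here’s a minimal Python sketch. This is my own simplification, not Gruau’s full instruction set: the network is just a dict mapping each node to its set of successors, and the two commands divide a cell the way the pictures above describe.

```python
# A minimal sketch of cellular encoding (my own simplification of Gruau's
# scheme): the network is a dict mapping each node to its set of successors.

def seq(graph, cell, new):
    # Sequential division: `new` takes over the cell's outputs,
    # and the cell's single output becomes `new`.
    graph[new] = graph[cell]
    graph[cell] = {new}

def par(graph, cell, new):
    # Parallel division: `new` copies the cell's inputs and outputs.
    graph[new] = set(graph[cell])
    for node, outs in graph.items():
        if cell in outs and node != new:
            outs.add(new)

# Start with one cell 'a' between the input and output:  IN -> a -> OUT
graph = {"IN": {"a"}, "a": {"OUT"}, "OUT": set()}
seq(graph, "a", "b")   # IN -> a -> b -> OUT
par(graph, "a", "c")   # IN -> {a, c} -> b -> OUT
print(graph)
```

Evolving the tree of commands then amounts to evolving which divisions happen in which order, and the network falls out of executing them.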
Turtle Graphics
-
Do you remember the Logo programming language?
-
Basically, there’s a cursor on the screen, and you
can use commands to move the cursor around to draw
stuff.
-
It’s pretty cool, but it’s only used for
drawing stuff.
-
You can create grammars to draw stuff in Logo:
-
S → Forward 10, Right 1, S
-
→ Forward 10, Right 1, Forward 10, Right 1, S
-
→ Forward 10, Right 1, Forward 10, Right 1, Forward
10, Right 1, S
- ...
-
This grammar keeps drawing forward and going right
slightly, to make a circle:
-
You can do branching too:
-
Forward 10, [Right 10, Forward 20], Left 10, Forward
20
-
You can make even more complex grammars and make actual
trees:
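The derivations above are just string rewriting, which you can sketch in a few lines. F and R are my own shorthand for “Forward 10” and “Right 1”; S is the nonterminal being expanded.

```python
# Expand an L-system-style rule repeatedly. Symbols with no rule (like the
# terminals F and R) are copied through unchanged.
rules = {"S": "FRS"}

def expand(symbols, steps):
    for _ in range(steps):
        symbols = "".join(rules.get(s, s) for s in symbols)
    return symbols

print(expand("S", 3))  # -> FRFRFRS: three forward-and-turn steps, then S
```

Brackets like [ and ] would pass through unchanged too, and the turtle would interpret them as push/pop of its position, which is how you get branching.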
Advanced Representations
-
Remember how we represented graphs?
-
What if those graphs were, like, lifelike mechanisms?
-
And the edges could expand, like rods?
-
So the mechanisms could move?
-
You can make fish!
-
Then evolve those fish and make land creatures!
-
Then evolve those land creatures and make dinosaurs!
-
Then evolve those dinosaurs and make apes!
-
Then evolve those apes and make humans!
-
Do you like automata?
-
I like automata.
-
How about an automaton that can make a tree? (Structure-defining tree-based FSMs)
-
Or a ribcage (I know, it’s a bug with six legs)?
-
Or even you...?
-
We can represent ANNs with grammars, but how do we actually
evolve them?
-
There’s this algorithm called NEAT (NeuroEvolution of Augmenting Topologies), and this guy used it to play Pac-Man!
-
In short, NEAT is a genetic algorithm for evolving neural
nets.
-
Instead of evolving just the neural net’s weights, it
also evolves the structure, with the nodes and
everything.
Artificial Life
-
Do you watch Futurama?
-
Have you seen the episode “A Clockwork
Origin”?
-
In it, nanobots are put on a lifeless planet, and they
evolve into greater and greater robotic creatures.
-
That’s an example of artificial life.
-
There are many different kinds of artificial life, as
we’ll see...
Concept, Motivation, Definitions
-
What actually is artificial life?
“Artificial Life is a field of study devoted to
understanding life by attempting to abstract the fundamental
dynamical principles underlying biological phenomena, and
recreating these dynamics in other physical media - such as
computers - making them accessible to new kinds of experimental
manipulation and testing.”
-
TL;DR, it means representing the abstract principles of
life, and recreating them (usually in a computer) so we can manipulate and test them.
-
Life on earth: “life as we know it”
-
Artificial life: “life as it could be”
-
Why do you survive? Why does anything survive?
-
It depends on how good an organism is at:
- Finding food
-
Avoiding predators
-
Finding mates (for sexual species)
-
Among other things, like:
-
Metabolic functions
-
Lifecycle functions
-
However, there’s nothing hard-coded into us that says “if you’re good at finding food, you’re
allowed to pass on your genes”.
-
In our algorithms, we hard-code this, with things like “if you have the most ones in your bit-string,
you’re selected for reproduction”.
-
The difference is:
-
Implicit fitness / selection: an individual that produces many offspring, in the
context of its environment, is fit.
-
In real life, there are no set rules or fitness functions;
the organisms that manage to reproduce are the ones that
carry on their genes.
-
Explicit fitness / selection: an individual that performs well at a task/problem has
‘high fitness’ and is permitted to
reproduce.
-
In simulations, if you’re bad at the given task, you
will not reproduce.
-
There are two ways of thinking about artificial life:
-
Strong artificial life: It doesn’t matter if it’s biological or
mechanical; if they both exhibit the same dynamical
properties, they’re both living.
-
Weak artificial life: Mechanical life doesn’t have the same complexity
and isn’t made of the right stuff; it’s not
living.
-
There are two categories of artificial life:
-
Dry artificial life: artificial life made mechanically, like computer
simulations, robots etc.
-
Wet artificial life: artificial life made with biological components and
‘wet stuff’
-
There are different approaches to wet artificial
life:
-
Minimal genome: take a real bacterium, remove all unnecessary genetic
material by trial and error until you get a super-minimal
bacterium
-
Artificial genome: synthesise a minimal metabolic system based on logical
design (create your own custom genome of a bacteria, take an
existing bacteria and replace its DNA with your custom
genome).
-
Self-organisation in biomolecules: get a flask of biological ‘goo’, shake it
around, and let it ‘self-organise’ and emerge
spontaneously (people think this doesn’t work because there’s
too much entropy and not enough
compartmentalisation)
-
Synthetic DNA: synthesise DNA and put it into an artificial cell
That’s not what I meant by ‘wet artificial
life’...
Examples
-
Have people actually made artificial life before?
-
Of course they have! Let’s look at a few.
Karl Sims Creatures
-
Karl Sims made box creatures in a physical world with
sensors and actuators.
-
He trained them to do different things, like swim around,
move around, chase laser pointers etc.
-
I’ve already shown you this video before, but whatever.
-
Sims used an explicit fitness function, for things like how
far they could move, how close they moved to the laser
pointer etc.
He even made, um... sperm?
Framsticks
-
Framsticks is a freeware artificial life simulator.
-
It’s more modern than Karl Sims, and you can download it and use it yourself!
-
However, the creatures are more simplified.
-
You can make them:
-
Search for food
-
Reproduce spontaneously
-
Conserve energy
Framstick snakes 🐍
Darwin Pond
-
The previous two were physically
“realistic”.
-
Those creatures tried to resemble real creatures.
-
Darwin Pond is a bit more abstract.
-
It’s a 2D world, with only swimming physics and
restricted morphology.
-
However, it has implicit fitness!
-
There’s no hard-coded “if you find food, your
genes will pass on”.
-
The creatures that find food themselves will live on to
reproduce.
They’re making a remake of this called Richard.io.
Cellular Automata Based
-
What about something completely abstract?
-
Something that looks nothing like a creature?
-
Sure. We’ll do that next.
-
Before, reproduction was an automatic pre-defined behaviour
given by the programmer.
-
Is there a way to code the ‘physics’ of the
world to enable reproduction implicitly?
-
Sure! We can do that, with things like John Conway’s Game of Life.
-
If you don’t know what it is, it’s a big grid
of on/off cells that are either turned on or off in the next
generation based on the neighbouring cells.
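Those neighbourhood rules fit in a few lines. Here’s a standard Game of Life step on a bounded (non-wrapping) grid, which is my own choice of boundary condition for the demo:

```python
# One update step of Conway's Game of Life on a bounded grid of 0/1 cells:
# a dead cell with exactly 3 live neighbours is born; a live cell with
# 2 or 3 live neighbours survives; everything else is dead next generation.
def life_step(grid):
    h, w = len(grid), len(grid[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            n = sum(grid[rr][cc]
                    for rr in range(max(r - 1, 0), min(r + 2, h))
                    for cc in range(max(c - 1, 0), min(c + 2, w))
                    if (rr, cc) != (r, c))
            row.append(1 if n == 3 or (n == 2 and grid[r][c]) else 0)
        out.append(row)
    return out

# A "blinker" oscillates between a horizontal and a vertical bar:
blinker = [[0, 0, 0],
           [1, 1, 1],
           [0, 0, 0]]
print(life_step(blinker))             # -> [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(life_step(life_step(blinker)))  # -> back to the horizontal bar
```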
-
This isn’t really evolution though.
-
How about Langton loops?
-
It’s similar to Game of Life, but it’s capable
of making copies of loops.
-
So, we have reproduction, but it’s all just identical
copies of loops.
-
There’s no variation, and there’s no change in
fitness either.
-
There can’t be any variation because it’s a
completely deterministic system.
-
The last one is Hiroki Sayama’s Evoloop.
-
There are 9 states for each cell, and the update rules are
defined over a 5-cell (von Neumann) neighbourhood.
-
There’s nothing explicit coded except those cell
rules, but it has:
-
Representation (loops)
-
Fitness (staying alive and having kids, I guess)
-
Reproduction (loops self-reproduce)
-
Variation (loops of different sizes)
-
As the algorithm goes on, you’ll see that the loops
get smaller and smaller.
-
That’s because it’s an evolutionary advantage
to get smaller, because it’s easier for them to exist (cells travel a shorter distance around the loop), and therefore, to reproduce.
At the start of the algorithm vs towards the end
-
All the properties of evolution are present, but it’s
all implicit.
-
Is that it? Have we made living things now?
-
Not quite... This is cool and all, but it’s not very
creative.
-
The individuals are just loops.
-
Can’t we implicitly evolve something that’s a
bit more expressive?
-
No, I didn’t mean loops with feelings... I meant
complex individuals.
Computer Program Based
-
Tom Ray created a world called Tierra that consists of computer programs replicating
themselves.
-
They can read and write to areas of memory, and they can
copy themselves to other parts of memory and run themselves
there, called ‘forking’.
-
Random errors are sprinkled everywhere in memory at a low
rate, creating mutations.
-
There’s a ‘reaper’ that kills old
processes.
-
The first ever program in Tierra, named the Tierran
ancestor, is a simple program that replicates itself with a
copy function.
-
It’s written in an assembly-like language, so to run
its copy function, it searches for a ‘copy’
label.
-
They let it run for a while, and eventually, parasites
started to appear.
-
No, not computer viruses.
-
These parasites were imperfect copies of the ancestor with
no copy function.
-
How can it replicate with no copy function?
-
When it tries to copy, the program counter keeps
sequentially searching for a copy label until it reaches one
that’s owned by another program: a host.
-
It then uses that host’s copy function to replicate, and because
parasites are much smaller, there can be more of them and they
can be copied much more quickly.
-
However, there must be a host for them to survive,
otherwise there’d be no copy functions to reproduce
with.
-
They let it run for a bit longer, and hosts started to
build resistance to parasites.
-
There were also hyper-parasites: programs that turned the tables
on the parasites trying to use them, stealing their CPU time to
make copies of themselves instead.
-
How does that even work?
-
I don’t know. I just write the notes.
-
So, we have implicit evolution!
-
However, it’s a bit underwhelming, don’t you
think?
-
All they make are parasites. That’s not very
fun.
-
Where are those cool blocky box grabbers and sperm that
Karl Sims made?
-
Where’s the complexity?
-
In fact, how do we even know that evolution increases
complexity at all?
-
That’s the topic of this module, and the reasoning
behind the name Evolution of Complexity!
Arrow of Complexity
-
Does evolution naturally tend to increase the complexity of
evolved things?
One view: life on Earth is more complex now than it was
back in the primordial soup, and evolution is the only process
capable of making that complexity, so evolution must increase
the complexity of things.
-
The other view: evolution simply favours individuals with a
high fitness; that’s all. It doesn’t favour complexity at all.
Tierra and Evoloop only created parasites. If evolution
increased complexity, Tierra and Evoloop would create much
more.
-
Who’s right? Don’t you wanna know the answer to
this burning question?
-
What?! You really couldn’t care less?
-
Tough luck, chump. It’s part of the module.
-
Let’s have a look at research regarding this
question...
-
First, how do we measure complexity?
Information Entropy
-
Information entropy: how much information there is in a signal.
-
Entropy is like a measure of how much surprise there is, or
how predictable something is.
-
Given a fair coin toss, there are two possibilities.
There’s only two options, so it’s fairly
predictable. That’s 1 bit of entropy.
-
Given two coin tosses, there are four possibilities.
That’s a bit less predictable than with one coin.
That’s 2 bits of entropy.
-
Given another coin toss, but both sides are heads, it could
be either heads or... heads. There’s no surprise there
at all. That’s 0 bits of entropy.
-
If that doesn’t make a lot of sense, check out this short video explaining it.
-
Totally predictable and ordered → low entropy
-
Totally random and unpredictable → high entropy
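The coin examples follow straight from Shannon’s formula, H = sum of p·log2(1/p) over the outcome probabilities:

```python
import math

# Shannon entropy in bits: H = sum(p * log2(1/p)) over outcome probabilities.
def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy([0.25] * 4))   # two fair coins (4 equal outcomes): 2.0 bits
print(entropy([1.0]))        # double-headed coin: 0.0 bits (no surprise)
```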
-
Is this a good way to measure complexity? Hmm...
Kolmogorov Complexity
-
The Kolmogorov complexity of something is the length of the shortest program (for universal Turing machines) that could generate it.
-
Therefore, the string “hello world” has more
Kolmogorov complexity than the number 6.
-
There’s a problem with this though.
-
It’s generally incomputable.
-
For example, could you calculate the Kolmogorov complexity
of:
-
An actual rainforest?
-
A Pacman frog?
-
Jonathan Joestar?
-
Richard Watson?
-
How would you even begin to generate those with a
computer?
How many lines of code would it take to
perfectly 3D model a Pacman frog?
-
Kolmogorov complexity is maximised by randomness.
-
A truly random string requires the longest program to
generate it.
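You can see this with a compressor. Compressed size is a crude, computable stand-in for description length (a standard proxy, not something from the lectures), and it inherits exactly this problem: random data compresses worst.

```python
import random
import zlib

# Compressed size as a rough, computable upper bound on description length.
def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))

random.seed(0)  # deterministic "random" bytes for the demo
structured = b"ab" * 500                                   # 1000 patterned bytes
noise = bytes(random.getrandbits(8) for _ in range(1000))  # 1000 random bytes

print(compressed_size(structured))  # tiny: the pattern is easy to describe
print(compressed_size(noise))       # ~1000 bytes: randomness resists compression
```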
-
Entropy is also maximised by randomness.
-
A completely random 50:50 coin has the highest entropy a
coin can have.
-
We don’t want it to be maximised by
randomness...
-
We want it to be maximised by pattern, structure, or
organisation.
-
What’s the point of using a complexity representation
that says a random number is more complex than a
human?
-
Bro... you live in a world with quantum mechanics.
You’re made up of random numbers.
-
Alright, how about this?
-
If randomness is the problem, let’s say that instead
of a Turing machine, we use a sort of
‘probabilistic’ Turing machine.
-
Transitions aren’t deterministic; they’re
random instead.
-
The size of a machine to generate a random coin toss would
be tiny.
-
Therefore, coin tosses would be low complexity!
-
That’s cool I guess, but it’s still
incomputable, like with normal Turing machines.
-
I mean, we wouldn’t even be able to represent the
complexity of Richard Watson.
Structural Complexity
-
How about we go less theoretical and think more
structurally?
-
We could quantify complexity by the number of parts an
individual has.
-
What do we describe as a “part”?
-
A part could be either an object or a process (connection between objects).
-
We could also incorporate hierarchies in this
complexity.
-
Hmm... maybe. We’re getting there.
McShea’s Passive Diffusion Model
-
Has life on Earth actually increased in complexity?
-
Sure, we’re pretty complex, but we still have:
-
Non-living molecules
-
Replicating molecules
- Protocells
-
Simple (prokaryote) cells / bacteria
-
Eukaryotic cells
-
Basically, bacteria that are still not very complex
exist in our world today.
-
Why haven’t they evolved to be complex, if evolution
leads to complexity?
Yes, we still have these things in our world.
-
One explanation is that evolution doesn’t really drive complexity...
-
It just sort of happens. Some things evolve to be complex.
Some don’t.
-
You could say complexity is a result of “passive diffusion”.
-
Complex things are made, but not because complexity is
adaptive (adaptive meaning complexity is an evolutionary
advantage).
-
How do we even prove such a thing?
-
To prove this, we need to provide three things:
-
Trend: An observable increase or decrease in a measured
quantity
-
Generic Trend: robust regularity in trend (repeatable, like a ‘law’, not
accidental)
-
Mechanism: something that explains why the generic trend
happens
-
Alright, let’s set up the initial structure of our
proof:
-
Trend: complexity of life on Earth has increased
-
Generic Trend: it could be generic... who knows?
-
Mechanism: evolution, ONLY IF complexity is adaptive
-
We don’t know if this trend is generic
-
We don’t know if evolution is the mechanism, because
we don’t know if complexity is adaptive.
-
Oof.. this is pretty tricky.
-
What do the great evolutionary minds think?
-
Darwin: “from [evolution by natural selection] the production
of higher animals directly follows”
-
Dennett: complexity increase is “forced move”
-
Stephen Jay Gould: it's all ‘historical contingency’
(accidental) – passive diffusion (McShea 1994) at
most
-
John Maynard Smith: no expectation of increased complexity from natural
selection
-
Stuart Kauffman: there is a generic trend, but the mechanism is
self-organisation
-
Mark Bedau: this is all rhetoric, if you think you know a mechanism,
show me the model that exhibits the generic trend. We
haven't seen one yet!
-
Basically, Bedau says all this discussion is pointless
until someone presents a model that exhibits the generic
trend.
-
So, McShea demonstrated the “passive diffusion
model”, with the complexity on the X axis and time on
the Y axis.
-
This graph shows that there isn’t any
“pressure” to be more complex.
-
There’s simply a distribution of individuals, with
more individuals being simple than complex.
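A minimal sketch of the idea (my own toy model, not McShea’s actual figure): give every lineage an unbiased random walk in “complexity”, with a floor at zero because you can’t be less complex than nothing. No step favours complexity, yet the maximum keeps drifting out, purely because the walk can only diffuse away from the wall.

```python
import random

# Passive diffusion toy model: unbiased +/-1 steps, reflected at zero.
random.seed(1)
lineages = [0.0] * 1000
for _ in range(500):
    lineages = [max(0.0, x + random.choice((-1.0, 1.0))) for x in lineages]

lineages.sort()
print(lineages[500])   # median lineage: stays low relative to the maximum
print(lineages[-1])    # maximum: has diffused far out to the right
```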
-
Boom! That shut Bedau up.
- Or did it?
-
Well, is the X axis a reasonable dimension?
-
The X axis’ representation of complexity could be
anything:
-
Number of legs
-
Number of wheels
-
Number of pink stripes
-
This model doesn’t actually prove anything about
complexity, or produce anything complex.
-
It just differentiates between the various concepts we
might talk about.
Hierarchical Structure
-
The Tierra world from Artificial Life had all the makings
of evolution:
-
Natural selection
-
Implicit fitness
-
Infinite genetic space
-
Computational universality
-
But all it made were really small parasite-like programs.
What gives?
-
Where’s the complexity? Where’s the loop-based
Skynet trying to take over our computers?
-
Perhaps we’re missing some “magic
ingredient”?
Tierrans with the “magic ingredient”
-
Perhaps that magic ingredient is hierarchical
structure?
-
Here are the major transitions in evolution:
-
Self-replicating molecules → chromosomes
-
Proto-cells → cells
-
Prokaryote cells → Eukaryote cells
-
Single celled organisms → multicellular
organisms
-
Asexual populations → sexual populations
-
Homogeneous populations → eusocial colonies
-
Basically, we start out as really small self-replicating
molecules from the primordial soup, and we go through stages,
or “hierarchies”, until we get to the social
human beings we are today.
-
When an organism reaches the next hierarchy, it can now
only self-replicate as part of the larger whole (e.g. when single celled organisms become multicellular, if
any more single celled organisms reproduce, that’ll
just form a bigger part of the multicellular organism).
-
Our buddy McShea from earlier, and their friend Changizi,
put these hierarchies into a graph and made sense of things
on a cellular level:
1) Individuals start as just a cell.
2) They clump together to form homogeneous
organisms.
3) They mutate to form heterogeneous
organisms.
4) The types of cells move apart and form
compartments.
5) The organisms stick together and form
colonies.
-
The different degrees of individuation are:
-
Monomorphic aggregate
-
Differentiated aggregate
-
Differentiated aggregates with intermediate level
parts
-
This hierarchical evolution is pretty cool...
-
Maybe it really is a generic trend.
-
However, standard models of EA don’t support this
kind of hierarchical organisation.
-
Plus, nobody has demonstrated a generic trend for
hierarchical complexity in artificial life models.
-
Back to the drawing board...
-
If increases in complexity really are a generic trend,
what’s the mechanism that causes them?
-
If evolution really does increase complexity, what are we
missing in our current models?
-
Is it any of these?:
-
Rich physical environment
- Coevolution
-
Self-organisation
-
Dynamical systems theory
-
Mechanisms of encapsulation
- Hierarchy
-
Changes in the unit of selection
Natural Induction (not examinable)
-
Apparently, this section is not examinable, so read this if
you’re interested.
-
In addition, I don’t fully understand natural
induction, so I’m giving a very brief abstraction of
what natural induction is.
-
Please correct anything if it’s wrong!
-
In a population, we have a fitness landscape.
-
We want our individuals to be at the dips of the landscape (minimising function), because that means they have the highest fitness (lowest cost) they can have.
-
Thing is, they usually don’t go to the highest
fitness levels.
-
Individual hill-climbers are selfish, and only remain on
local optima that maximise their individual utility.
-
If these hill-climbers were to work together and
collaborate, they could probably reach better optima.
(These graphs are minimisation functions, so lower is
better.)
-
They don’t want to work together because that
won’t maximise their individual utilities.
-
They need to be part of a ‘social organisation’
where they work together for the greater good.
-
So how do we cause individual incentives to become better
aligned with social incentives?
-
How do we make “working for the greater good”
also benefit individual utility?
-
We use natural induction!
-
First of all, what is induction?
-
Induction is the production of general rules from specific
examples.
-
Some examples are:
-
This colourful thing tastes good 🍓
-
That colourful thing taste good 🍩
-
Therefore all colourful things taste good
🍓🍩🍌🍊🍬🍭🍫
-
Or mathematical induction:
-
The sum of natural numbers up to 1 is 1 = (1·2)/2
-
If the sum of natural numbers up to n is n(n+1)/2, then the sum up to n+1 is (n+1)(n+2)/2
-
Therefore the sum of natural numbers up to any n is n(n+1)/2
There’s even an example on Richard’s
T-shirt:
-
Of course, induction isn’t always logically
sound.
-
It’s just the idea behind the concept of natural
induction.
-
Natural induction is a process by which a dynamical system becomes adaptively
organised spontaneously.
-
It doesn’t require natural selection, or some kind of
system-level objective function.
-
There’s three requirements for natural induction to
occur in a dynamical system:
-
Hidden state - interface variables and internal state
-
Imperfect elastic interactions
-
PULSE - repeated exposure to stress or perturbation
-
If that makes no sense, read on: I’ll try and make
heads or tails of it.
What is ‘hidden state’?
|
Let’s go back to our minimisation
function example.
Let’s say we have an individual
that’s stuck on the landscape
somewhere.
From all our previous examples of individuals
travelling across the fitness landscape, our
individual is stuck in that local optimum
dip.
But what if we could change the
individual’s perception of their utility function?
In other words, all the individuals have their
own mutable, shared “copy” of the
fitness function that they travel across?
We could change the individuals’
perception of their utilities to travel up peaks
they normally wouldn’t want to travel
across!
This blue ‘perception’ of the
utility function is the hidden state; it’s
an invisible force that drives the variables we
can see (namely, the individuals), and it’s called hidden state because it isn’t directly visible in those variables.
|
What is ‘imperfect elastic
interactions’?
|
How much do we want to change our hidden
state?
If the hidden state is too rigid, it
won’t change at all, and there’d be
no point having it.
If the hidden state is too plastic-y,
it’ll be like clay and change too
much.
It’s not changing, so
we’re not moving
anywhere.
|
I didn’t want to change
it that much!
|
We want a sweet spot.
We want to be able to change our hidden state;
not so rigid that we can’t budge it, but
not so soft that there’s zero resistance
and it goes everywhere.
That’s what we mean by ‘imperfect
elastic interactions’.
If this is a little hard to visualise, think of
it like this:
Steel pole: impossible to bend. Too
rigid.
|
Wet spaghetti: always falls out of shape.
Too soft.
|
Copper wire: it’s malleable enough
to bend, and it’s tough
enough to retain its shape.
Nice! We want our hidden state
to be like this.
What is ‘PULSE’?
|
So we have a hidden state that lets us control
where the individuals go as a group, and we can
bend it to our will.
But what do we change the hidden state
to?
Onto that in a second! First, let’s
define what we mean by
‘PULSE’.
Do you remember simulated annealing from Intelligent
Systems? If not, it’s about using temperature to
‘scramble’ the individuals to escape
local optima.
Big β, low temperature | Small β, high temperature
That’s pretty much what PULSE is.
It’s the ability to ‘heat up’
individuals and make them scatter about to find
other optima.
|
-
Great, so how do these properties result in natural
induction?
-
Remember when I talked about increasing the heat of the
population?
-
When we do that, the individuals will, on average, evenly
fall all across the fitness landscape.
-
Because they all fall evenly, they form a
‘distribution’ of all the local optima in this
landscape.
-
The distribution of the optima affects how much the hidden
state is ‘pushed’ at each point in the landscape
by the individuals.
-
Then, the individuals are heated up again and the cycle
continues.
-
As this continues, the dips with the wider basins get
greater and greater prevalence in the hidden state, so
individuals land in the dips with the wider basins
more.
-
Now, we’ve pushed the hidden state so much that
there’s only one really big dip.
-
That big dip drives all the individuals to the global
optimum.
-
You see? With the power of teamwork, all the individuals
explored the fitness space, and drove the hidden state in
such a way that the global optimum came out on top and drove
out the smaller local optima.
-
That’s a super simplified explanation of natural
induction.
-
If you’re:
- Masochistic
-
Richard Watson
-
Want to do a PhD with Richard Watson (a.k.a the first point)
...then the full natural induction literature is for you.
Compositional Evolution
“If it could be demonstrated that any complex organ
existed which could not possibly have been formed by numerous,
successive, slight modifications, my theory would absolutely
break down.” - Darwin, 1859
-
Hmm... could we?
-
In more detail, Darwin says that every individual evolved
through the succession of many ‘proto-systems’
that are gradually increasing in function.
-
Proto-systems are intermediary creatures in the evolutionary
process that lead up to a more complex creature, like you or
me.
-
We will show that it’s possible to make creatures
without proto-systems, but that doesn’t mean
Darwin’s theory is wrong.
-
Are proto-systems required?
-
Proto-systems leading up to complex individuals are the
result of gradual natural selection: new individuals are
created by small variations, and whoever has the highest
fitness reproduces.
-
But there are other ways to make new individuals:
-
Sexual recombination / hybridisation
-
Uniform crossover, one-point crossover etc.
-
You can get really big genetic changes in one step, unlike
mutation.
-
Horizontal gene transfer: genes are transferred across species
barriers.
-
For example, a bacterium transferring antibiotic resistance
genes to other species of bacteria through either transformation, transduction, or conjugation.
-
This is mostly done by bacteria.
-
Symbiogenesis: one organism merging with, and becoming a part
of, another organism.
-
The hierarchical major transitions I mentioned back in
Arrow of Complexity.
-
Self-replicating molecules → chromosomes
-
Proto-cells → cells
-
Prokaryote cells → Eukaryote cells
-
Single celled organisms → multicellular
organisms
-
Asexual populations → sexual populations
-
Homogeneous populations → eusocial colonies
-
These ways of making individuals are less about mutating an
existing individual, and more about putting two individuals
together to make a new one.
-
We call this compositional evolution.
Gradual vs Compositional
-
Compositional evolution doesn’t use proto-systems,
but it’s still capable of evolving life, like Darwin
thought.
-
Therefore, a succession of gradually changing
proto-systems, monotonically increasing in function, is not required for evolvability of certain kinds of complex
systems.
-
Let’s look into this a little deeper:
-
Gradual evolution: driven predominantly by small modifications,
hence ‘successive slight modifications’.
-
Individuals are reached by linear accumulation of random
variations.
-
Variation → genetic mutation
-
Compositional evolution: involves the combination of
pre-evolved systems or subsystems of genetic material that were
pre-adapted in different individuals.
-
Variation → sexual recombination, horizontal gene
transfer, symbiogenesis
-
On a fitness landscape, you can think of gradual evolution
as like a hill-climber:
-
You can think of compositional evolution as two individuals
forming a new individual on a completely different
landscape:
-
How about on an evolutionary tree?
-
One with time on the Y axis and variation on the X
axis.
-
Gradual evolution looks like this:
-
Simple enough. Everything starts at a common ancestor (the root), and through time, the variation of individuals changes,
and species branch off.
-
How about compositional evolution?
-
Well, what do Doolittle and Margulis think?
Doolittle | Margulis
-
Neither of them is exactly right.
-
With Doolittle, it seems like individuals from one branch
are “becoming” individuals from another branch.
That’s not quite right; we’re making a whole new
individual altogether.
-
With Margulis, it looks like the two species keep evolving
until they both evolve into the same thing. That’s
also not right, because they’re meant to merge; not
meet in the middle.
-
In reality, it should look like this:
-
Here, two individuals form a completely new
individual.
-
In sexual recombination, subdivisions are subpopulations in a species
-
integration is done by sexual recombination
-
Examples:
-
Shifting Balance Theory (Wright 1931)
-
Building Block Hypothesis (Holland 1975)
-
In symbiotic encapsulation,
-
subdivisions are different species
-
integration is done by horizontal gene transfer and
endosymbiosis (bacteria going inside, and becoming a part of, a host
cell)
-
Examples:
-
Major evolutionary transitions
-
Problem decomposition by coevolution
Complex Systems
-
The different kinds of dependencies between variables can
affect the system in various ways:
-
Arbitrary interdependencies → algorithmic paradigm:
exhaustive search / random search; complexity: K^N;
evolutionary analogue: impossible / ‘intelligent design’
(it’s virtually impossible to make an evolutionary algorithm
find a global optimum in this)
-
Modular interdependencies → algorithmic paradigm:
divide and conquer problem decomposition; complexity: roughly
N·K^k for modules of size k; evolutionary analogue:
compositional evolution
-
Few / weak interdependencies → algorithmic paradigm:
hill-climbing, accumulation of small variations; complexity:
K·N; evolutionary analogue: gradual evolution (need to find
smooth paths)
-
Could we come up with a problem that can be solved with
compositional evolution, but not with gradual
evolution?
-
It needs to have the elements that make it hard for gradual
evolution:
- Ruggedness
-
Lots of local optima
-
Broad fitness saddles
-
It needs to have the elements that make it easy for
compositional evolution:
- Modular interdependency
-
Subsystems that can be solved independently and
combined
-
Epistasis prevents linear accumulation of subsystems
-
How about the HIFF function we looked at back in
Modularity, to show that there exists a problem that
crossover can do better than a hill-climber?
HIFF
-
If you forgot, HIFF looks like this:
-
It’s specially designed so one-point crossover can
solve it really quickly, but hill-climbers cannot.
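HIFF is compact enough to write down exactly, following the recursive definition: a block scores its length if all its bits agree, plus the scores of its two halves. Both all-0s and all-1s are global optima.

```python
# HIFF (Hierarchical-IF-and-only-IF) for bit-strings of length 2^n.
def hiff(bits):
    if len(bits) == 1:
        return 1
    half = len(bits) // 2
    score = hiff(bits[:half]) + hiff(bits[half:])
    if all(b == bits[0] for b in bits):  # the whole block agrees
        score += len(bits)
    return score

print(hiff([0] * 8))            # -> 32: a global optimum
print(hiff([0] * 4 + [1] * 4))  # -> 24: both halves solved, top level not

# One-point crossover can combine two half-solved parents in a single step:
p1 = [0] * 4 + [1] * 4
p2 = [1] * 4 + [0] * 4
child = p1[:4] + p2[4:]
print(hiff(child))              # -> 32: the two solved modules combined
```

A hill-climber would have to flip a whole half-block through a fitness valley to do what that single crossover did.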
-
If hill-climbing relates to gradual evolution, and crossover
relates to compositional evolution, won’t the latter
be better than the former at HIFFs?
-
Here’s a graph:
-
Yep! GAs with one-point crossover are way better.
-
So compositional evolution is way better at HIFFs than
gradual evolution.
-
But we already know that from Modularity.
-
How about we make the problem slightly more complex?
Random Genetic Map and SEAM
-
Before, we assumed that all modules are located together on
the genome.
-
Not all real-world problems are going to have a convenient
genome like that.
-
What if we scrambled the locations of the modules?
-
How do the algorithms fare now?
- Oof...
-
We need an algorithm that not only uses compositional
evolution, but also identifies modules regardless of
location on the genome.
-
Introducing SEAM!
-
SEAM uses coevolution of different individuals to cover
different parts of the problem space.
-
When it finds a successful group (in other words, a set of loci that
behaves like a module), it encapsulates that group and remembers it.
-
That’s called symbiotic encapsulation.
-
In a more general sense, it finds the modules no matter how
the loci are permuted on the genome, and then evolves with
those modules in mind.
-
How well does SEAM perform?
-
That looks good!
-
It’s low at the start because it’s still
learning where all the modules are.
-
Once it knows, it performs just as well as a one-point
crossover GA.
-
This makes SEAM really good for:
-
Divide and conquer with compositional evolution
-
Evolving parts and putting them together
-
The only problem is that it’s only good for
HIFF.
-
A problem that we invented.
- 🙃