# Algorithm

### Evolution as a search mechanism

The evolutionary mechanism depicted by Neo-Darwinism can be interpreted as a strategy to explore new physical features of organisms without seriously compromising the ability to exploit existing successful features, as genotype variation is gradual and multiple copies of the set of so-far successful genes are in any case retained by the population. Evolution as theorised by Neo-Darwinism can therefore be regarded, in a sense, as a search for the genotype corresponding to an optimal phenotype, the one that ensures the survival and the best adaptation of a species to a given environment. Quoting Fogel (1994):

“Darwinian evolution is intrinsically a robust search and optimization mechanism”
Fogel (1994), page 3

and also:

“evolution is an obvious optimizing problem-solving process”
Fogel (1994), page 4

Evolutionary algorithms or evolutionary computing (Winkler et al. 2007) are the result of the attempts to transpose the evolutionary and genetic mechanisms implied by Neo-Darwinism into an algorithm that can be used to perform a directed search or exploration of a user-defined space. The basic components shared by all evolutionary algorithms are (Sims 1993, Michalewicz 1996, Eiben and Schoenauer 2002):

1. GENOTYPE-PHENOTYPE MAPPING, also called representation. Evolutionary algorithms handle and refine programs, be they vectors, mathematical equations or other entities. A way to represent such programs is key for the artificial evolution process to be able to evolve them. The chosen representation has to allow for evolvability, as it “must give a non-vanishing likelihood that variation produces performances improvement” (Altenberg 1994). Evolvability is not always easy to guarantee (Affenzeller et al. 2004), as this property should be ensured by the user through appropriate coding and representation.
2. POPULATION of genotypes, also called individuals or chromosomes. Evolutionary algorithms require several individuals to be able to search effectively for the optimal genotype. As reported in Zhang and Mühlenbein (1995), “the evolutionary approach differs from most other search techniques in that it makes a parallel search simultaneously involving hundreds or thousands of points in the search space”. A way to generate an initial population of individuals from which the evolution can start is then required.
3. FITNESS FUNCTION, a function that introduces selection pressure, according to the Darwinian concept of environment. Usually the fitness value associated with a genotype is computed by a computer according to specific criteria, but there are examples of evolutionary algorithms used in art where the fitness value is evaluated by the user (Sims 1993, Johanson and Poli 1998).
4. SELECTION process that probabilistically selects the genotypes having the best fitness (Fogel 1994). Darwinian selection relies on heritability, or the assumption that “better individuals are more likely to produce better offspring” (Blickle 1996, page 361). Quoting Blickle (1996) (page 361), “… without heritability, selection of better individuals makes no sense”.
5. GENETIC OPERATORS that either copy or alter parents’ genotypes to produce offspring’s genotypes.
6. PARAMETERS that define the population size, the maximum number of generations and a few other settings related to the practical implementation of the algorithm (Michalewicz 1996).

The basic set of operations that are common to all evolutionary algorithms are shown in Table 1 (Koza 1992, Banzhaf et al. 1998, Brameier and Banzhaf 2007, Poli et al. 2008).

Table 1: basic set of operations of evolutionary algorithms

1. Initialization of the population
2. Fitness evaluation (may require execution)
3. Selection
4. Reproduction
   – replication
   – crossover
   – mutation
5. Termination criterion check: if met go to 6, otherwise go to 2
6. Stop (maximum number of generations reached or termination criterion met)

The algorithm starts with the random generation of the initial population of genotypes (1), from which an iterative process develops. At each iteration, also called generation, each individual of the current population undergoes fitness evaluation (2) (which may require the execution of the program), selection (3) and reproduction (crossover and mutation) (4). The algorithm stops when a particular termination criterion is met (5), i.e. a sufficiently fit individual is found, or when the maximum number of generations is reached (6).
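The loop described above can be sketched in a few lines of Python. This is a minimal, illustrative skeleton of the generic evolutionary algorithm of Table 1, not the specific GP implementation discussed later: here the genotype is assumed to be a list of floats, the fitness a simple error to be minimised, and reproduction uses mutation only.

```python
import random

def evolve(init, fitness, select, reproduce, pop_size=20, max_gen=50, target=1e-6):
    population = [init() for _ in range(pop_size)]            # 1. initialisation
    for gen in range(max_gen):
        scored = [(fitness(ind), ind) for ind in population]  # 2. fitness evaluation
        scored.sort(key=lambda pair: pair[0])
        best_fit, best = scored[0]
        if best_fit <= target:                                # 5. termination check
            return best, best_fit, gen
        parents = select(scored)                              # 3. selection
        population = reproduce(parents, pop_size)             # 4. reproduction
    return best, best_fit, max_gen                            # 6. stop

# Toy instantiation: minimise the squared distance of a 3-vector from the origin.
random.seed(0)
init = lambda: [random.uniform(-1, 1) for _ in range(3)]
fitness = lambda ind: sum(x * x for x in ind)
select = lambda scored: [ind for _, ind in scored[: len(scored) // 2]]  # keep best half

def reproduce(parents, n):
    children = []
    while len(children) < n:
        parent = random.choice(parents)
        children.append([x + random.gauss(0, 0.1) for x in parent])  # mutation only
    return children

best, err, gens = evolve(init, fitness, select, reproduce)
```

The structure makes explicit that initialisation, fitness, selection and reproduction are interchangeable components, which is why the same loop underlies GA, ES and GP.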

### Tree-based genetic programming for symbolic regression

Genetic programming is a specific implementation of the evolutionary algorithm introduced above, in which individuals take the form of syntax trees. The aim of the evolution is the generation of (meta)models in explicit form. In the following, the specific elements of tree-based GP are described.

1. Initialization of the population

Population initialisation is the process through which a certain number of syntax trees (also known as individuals) is randomly generated by combining a set of symbols (Lew et al. 2006, Poli et al. 2008), called primitives, provided by the user. These symbols represent either mathematical functions (functional set F), which require a positive number of arguments (arity > 0), or variables and numerical constants (terminal set T).
The initialisation process is critical to the success of the evolution. The convergence rate, or number of generations needed to produce an acceptable individual, usually improves if as many different combinations of primitives as possible are generated, since this increases the likelihood of finding right from the first generation a set of “good” individuals, i.e. syntax trees corresponding to metamodels with high accuracy and generalisation ability. According to the experiments performed by Langdon (2000), GP populations appear to retain a long-term memory of the way they are initialised: Langdon (2000, p. 110) showed that the average tree size to depth ratio in the population tends to remain constant and equal to the value of this ratio set during the initialisation.

Many techniques have been developed to ensure variability in the initial population. The most common are the no-limit method, the full method, the grow method and the ramped half-and-half method (Koza 1992).
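The full, grow and ramped half-and-half methods can be sketched as follows, assuming trees are stored as nested tuples `(function, child, child)` and using small illustrative primitive sets F and T (the representation and the sets are our assumptions, not prescribed by the methods themselves).

```python
import random

F = ["+", "-", "*"]       # functional set, all of arity 2 (illustrative)
T = ["x", "y", 1.0]       # terminal set (illustrative)

def full(depth):
    """Full method: every branch reaches exactly the maximum depth."""
    if depth == 0:
        return random.choice(T)
    return (random.choice(F), full(depth - 1), full(depth - 1))

def grow(depth, p_terminal=0.3):
    """Grow method: branches may stop early, as terminals can be
    picked before the maximum depth is reached."""
    if depth == 0 or random.random() < p_terminal:
        return random.choice(T)
    return (random.choice(F), grow(depth - 1), grow(depth - 1))

def ramped_half_and_half(pop_size, max_depth=4):
    """Half the trees via full, half via grow, over a range of depths."""
    population = []
    for i in range(pop_size):
        d = 2 + i % (max_depth - 1)        # depths cycle in 2..max_depth
        population.append(full(d) if i % 2 == 0 else grow(d))
    return population

population = ramped_half_and_half(10)
```

Mixing the two methods over a ramp of depths is precisely what gives the initial population the variety of shapes and sizes discussed above.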

Initialisation might not be necessarily random, and any prior knowledge regarding the shape or primitives of the desired solution can be exploited at this initialisation stage. For example, high-quality metamodels generated by other metamodelling techniques can be embedded in the initial population, hopefully providing important genetic material that can boost the search for an even better metamodel (Poli et al. 2008).

2. Fitness function

The fitness function has the role of creating “environmental pressure” in the artificial world populated by syntax trees. It is a function that assesses the quality of each individual phenotype and assigns it a score, called fitness value. It is assumed that a smaller fitness value corresponds to a more accurate metamodel. Historically, the first metric used to measure fitness was the sum of the absolute errors (Koza 1992) (examples may also be found in Nordin et al. (1996) and Smits and Kotanchek (2004)):

$F(k,t)=\sum_{j=1}^{m}|f(\mathbf{x}_{j})-\tilde{f}_{k,t}(\mathbf{x}_{j})| \quad \quad (1)$

where:

• $F(k,t)$ is the fitness value of individual $k$ at generation $t$
• $m$ is the number of fitness cases in the training data set
• $f(\mathbf{x}_{j})$ is the actual response of the system at the fitness case $\mathbf{x}_{j}$
• $\tilde{f}_{k,t}(\mathbf{x}_{j})$ is the value returned by individual $k$ at generation $t$ for the fitness case $\mathbf{x}_{j}$

The metric defined by Eq. (1) has also been used in its averaged form (average of the sum of the absolute errors) (Langdon et al. 1999). Many other metrics have been introduced, based on the 2-norm (root mean square error or sum of the squared errors (Banzhaf 1994, Chellapilla 1997)), the p-norm, the inf-norm (the p-norm with p = ∞, i.e. the magnitude of the maximum error on the training data set) or hits (errors counted only beyond a given threshold) (Smits and Kotanchek 2004). One of the most common fitness functions is nevertheless the root mean square error (RMSE), the square root of the averaged squared 2-norm:

$F(k,t)= \sqrt{\dfrac{1}{m} \sum_{j=1}^{m} {\lvert f(\mathbf{x}_{j})-\tilde{f}_{k,t}(\mathbf{x}_{j}) \rvert}^2 } \quad \quad (2)$

where the symbols have the same meaning indicated for Eq. (1). Regardless of the norm chosen to define the fitness function, dividing the norm by the number of fitness cases is generally recommended in case the number of fitness cases fed into the GP implementation is increased throughout the evolution, to avoid sudden increments in fitness value due only to the addition of new fitness cases.
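Both metrics are direct transcriptions of the formulas above. The sketch below assumes the individual has already been compiled to a Python callable and that the fitness cases are given as (x_j, f(x_j)) pairs; both assumptions are ours.

```python
import math

def abs_error_fitness(individual, cases):
    """Sum of absolute errors over the fitness cases, as in Eq. (1)."""
    return sum(abs(f_x - individual(x)) for x, f_x in cases)

def rmse_fitness(individual, cases):
    """Root mean square error: the 2-norm of the errors, squared,
    averaged over the m fitness cases, then square-rooted."""
    m = len(cases)
    return math.sqrt(sum((f_x - individual(x)) ** 2 for x, f_x in cases) / m)

# Example: evaluate the candidate metamodel x^2 against the true response x^2 + 1.
cases = [(x, x * x + 1.0) for x in range(5)]   # five fitness cases
candidate = lambda x: x * x
print(rmse_fitness(candidate, cases))          # constant error of 1 -> RMSE is 1.0
```

Note the division by `m` inside `rmse_fitness`: this is exactly the averaging recommended above when the number of fitness cases grows during the evolution.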

3. Selection

The selection process identifies the fittest individuals in the population and grants them the privilege of passing their genotypes to the new generations through the reproduction stage. Better fitness values (lower error) determine a higher selection probability, whereas worse fitness values (higher error) reduce such probability. As the probabilistic nature of the selection process is another assumption of the evolutionary paradigm, it is important to remember that even though selection encourages the evolution of the fittest individuals, it does not prevent low-fitness individuals from spreading their genotype through reproduction.

Fitness proportionate selection (Koza 1992, Holland 1992, Whigham 1995, Nordin et al. 1996, Lew et al. 2006) is the original selection method proposed by J. R. Koza. This method assigns each individual a probability of being selected that is proportional to its fitness value. The selection process can be easily implemented using the normalised fitness $n_{k,t}$ defined in Koza (1992): the fitness range [0, 1] is split into a number of subintervals equal to the number of individuals, each having width $n_{k,t}$. The subinterval containing a number randomly generated in the range [0, 1] determines the selected individual. This process is often compared to the spin of a roulette wheel having slices of different angles, proportional to the individuals’ fitness values, and so it is also found in the literature as roulette wheel selection (Fogel 1994, Ferreira 2001, Lew et al. 2006).
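A sketch of roulette wheel selection follows. Since in this chapter lower raw fitness (error) is better, the errors are first mapped to an adjusted fitness 1/(1 + error) in the spirit of Koza (1992) and then normalised so they sum to 1; the cumulative normalised values split [0, 1] into the subintervals described above. The mapping and the toy data are illustrative assumptions.

```python
import random

def roulette_select(population, errors):
    """Spin the wheel once and return the selected individual."""
    adjusted = [1.0 / (1.0 + e) for e in errors]       # lower error -> larger slice
    total = sum(adjusted)
    normalised = [a / total for a in adjusted]         # widths summing to 1
    spin = random.random()                             # random point in [0, 1)
    cumulative = 0.0
    for individual, width in zip(population, normalised):
        cumulative += width
        if spin <= cumulative:
            return individual
    return population[-1]                              # guard against rounding

random.seed(1)
pop = ["A", "B", "C"]
errors = [0.1, 1.0, 10.0]                              # "A" is by far the fittest
picks = [roulette_select(pop, errors) for _ in range(1000)]
```

Repeating the spin many times shows the probabilistic nature of the method: the fittest individual dominates the picks, yet the worst one is still occasionally selected.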

Another basic selection strategy is truncation selection (Zhang and Mühlenbein 1995, Blickle and Thiele 1997). In this method a subset of the population, usually defined by a percentage of the best individuals, is admitted into a mating pool, the rest of the population being excluded from any further operation. Individuals undergoing genetic operations are randomly selected in the mating pool with uniform probability.
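Truncation selection is even simpler to sketch: only the best fraction of the population enters the mating pool, from which parents are then drawn uniformly at random. The 30% truncation threshold and the toy data are illustrative assumptions.

```python
import random

def truncation_select(population, errors, fraction=0.3):
    """Admit the best `fraction` of the population into the mating pool,
    then pick a parent uniformly at random from the pool."""
    ranked = [ind for _, ind in sorted(zip(errors, population))]  # best first
    pool = ranked[: max(1, int(len(ranked) * fraction))]          # mating pool
    return random.choice(pool)                                    # uniform in pool

pop = ["A", "B", "C", "D", "E"]
errors = [0.5, 0.1, 0.9, 0.3, 0.7]
parent = truncation_select(pop, errors)   # pool holds only "B", the fittest
```

Unlike roulette wheel selection, fitness plays no role inside the pool: the selective pressure is controlled entirely by the truncation fraction.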

Selection has a profound effect on evolution (Luke and Spector 1997, 1998), and should therefore be performed in a balanced way. If selection gives too much evolutionary advantage to highly fit individuals, population variability may be excessively reduced and the evolutionary search constrained to a subregion of the design space. On the other hand, if selective pressure is too low and the probability of an individual being selected is only loosely related to its fitness value, the risk of a blind or undirected exploration of the search space is high. In both cases the chances of success are dramatically reduced: the evolution is likely to converge prematurely to a “globally suboptimal solution” (Koza 1992, p. 104). Such behaviour has been observed in all dialects of evolutionary computation (see Section 2.3.1, Chapter 2), and for this reason many conclusions obtained in GA and ES can be applied to GP. This possibility is also underpinned by the independence of the selection mechanism from the rest of the GP implementation (Blickle and Thiele 1997, p. 362).

4. Genetic operators

The role of genetic operators is to generate a new population from the privileged set of individuals identified by the selection process. The primary operators commonly used in genetic programming are reproduction, crossover and mutation.

Reproduction consists in copying a selected individual to the new population without changes to the genotype. Reproduction is usually associated with the concept of elite. The elite is a group of highly fit individuals that are propagated between generations without alteration, its use being termed elitism. In evolutionary algorithms elitism is a common way to keep a memory of the best solutions found during the exploration. It is usually recommended when at each generation extensive variations are introduced in selected individuals’ genotypes, to prevent such huge variations from having destructive effect on the good individuals found (Zhang and Mühlenbein 1995, Ferreira 2001, Lew et al. 2006, Vladislavleva 2008). Elitism is usually implemented by maintaining an archive, which may be part of the population or a separate population (Teller and Veloso 1996, Sætrom and Hetland 2003, Vladislavleva 2008). In either case, the archive is usually updated with the best individuals at each generation.
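Elitism via a separate archive can be sketched in a few lines: at each generation the archive is merged with the new population and trimmed back to the overall best individuals seen so far, which are then copied unchanged into the next generation. The archive size and the toy data are illustrative assumptions.

```python
def update_archive(archive, population, errors, size=2):
    """Merge the current archive with the scored population and keep
    the `size` best (error, individual) pairs seen so far."""
    merged = list(archive) + list(zip(errors, population))
    merged.sort(key=lambda pair: pair[0])      # lower error first
    return merged[:size]

# Two toy generations: the archive survives across them unchanged by
# genetic operators, preserving a memory of the best solutions found.
archive = []
archive = update_archive(archive, ["A", "B", "C"], [0.5, 0.2, 0.9])
archive = update_archive(archive, ["D", "E"], [0.1, 0.8])
print(archive)   # -> [(0.1, 'D'), (0.2, 'B')]
```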

Crossover is historically considered the primary search operator in genetic programming (Koza 1992). The most common version of the crossover operator, generally referred to as standard crossover or subtree crossover, generates two offspring from two parents, swapping the subtrees branching out of two randomly selected nodes in each parent’s syntax tree, called crossover points. The animation below shows an example of a subtree crossover between two parents generating two offspring:

Subtree crossover

The result of the operation is two new syntax trees that resemble the parents but at the same time present new features. The extent of the variation produced by subtree crossover on a syntax tree depends on the choice of the crossover points.
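Subtree crossover can be sketched as follows, assuming trees are stored as nested lists `[function, child, child]` with strings or numbers as terminals; the helper names are our own, not standard GP terminology.

```python
import copy
import random

def nodes(tree, path=()):
    """Yield (path, subtree) for every node in the tree, root first."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def get(tree, path):
    """Return the subtree at the given path of child indices."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new_subtree):
    """Return tree with the node at `path` replaced by `new_subtree`."""
    if not path:
        return new_subtree                 # crossover point is the root
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = new_subtree
    return tree

def subtree_crossover(parent1, parent2):
    """Swap the subtrees rooted at two randomly chosen crossover points."""
    child1, child2 = copy.deepcopy(parent1), copy.deepcopy(parent2)
    path1 = random.choice([p for p, _ in nodes(child1)])   # crossover point 1
    path2 = random.choice([p for p, _ in nodes(child2)])   # crossover point 2
    sub1 = copy.deepcopy(get(child1, path1))
    sub2 = copy.deepcopy(get(child2, path2))
    return replace(child1, path1, sub2), replace(child2, path2, sub1)

p1 = ["+", "x", ["*", "y", 2.0]]
p2 = ["-", 3.0, "y"]
c1, c2 = subtree_crossover(p1, p2)
```

Since the two subtrees are simply exchanged, the total number of nodes in the two offspring always equals the total in the two parents, whatever crossover points are chosen.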

Mutation generates a single offspring from a single parent. A range of different strategies can be used to modify the parent genotype: mutation operators can change just a few nodes in a syntax tree or radically alter the individual (Van Belle and Ackley 2002). Mutation can be useful to escape suboptimal individuals, but it can also be lethal, as it may turn a successful individual into a low-quality one in a single generation. Due to the unpredictable effect of mutation on fitness value, different methods have been developed to control the amount of variation introduced in the genotype (Chellapilla 1997). All of them require the choice, usually random, of a node, called the mutation point, where the genotype modification has to take place. Point mutation and subtree mutation lie at the opposite ends of the spectrum regarding the amount of variation introduced in a genotype.

Point mutation is the mutation operator that produces the smallest “visible” variation in a syntax tree. It replaces the content of a node in the selected tree with another primitive of the same kind: a function is replaced by another function having the same number of arguments (arity), and a variable or constant is replaced by another element of the terminal set (Sims 1993, Zhang and Mühlenbein 1995, Chellapilla 1997). The simple animation provided below shows how point mutation is performed:

Point mutation
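Point mutation can be sketched as below, assuming trees stored as nested lists `[function, child, child]` and small illustrative primitive sets (both are our assumptions). Here each node is independently mutated with a small probability, one common way to apply the operator; since a function symbol is only ever swapped for another function of the same arity and a terminal for another terminal, the shape of the tree is always preserved.

```python
import random

F = ["+", "-", "*"]          # binary functions (illustrative primitive set)
T = ["x", "y", 1.0, 2.0]     # terminals (illustrative primitive set)

def point_mutation(tree, rate=0.2):
    """Independently replace each node's symbol with one of the same kind."""
    if not isinstance(tree, list):                            # terminal node
        return random.choice(T) if random.random() < rate else tree
    head = random.choice(F) if random.random() < rate else tree[0]
    return [head] + [point_mutation(child, rate) for child in tree[1:]]

def shape(tree):
    """Structure of a tree, ignoring the node contents."""
    return [shape(c) for c in tree[1:]] if isinstance(tree, list) else 0

tree = ["+", "x", ["*", "y", 1.0]]
mutant = point_mutation(tree)
```

However many symbols change, `shape(mutant)` always equals `shape(tree)`: point mutation alters content, never structure.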

Subtree mutation (Koza 1992, p. 106) produces a larger “visible” variation on the parent’s genotype than point mutation, as it replaces the branch stemming from the mutation point with a randomly generated subtree, as shown in the example below.

Subtree mutation
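A sketch of subtree mutation follows, again assuming trees stored as nested lists `[function, child, child]` with illustrative primitive sets: the branch rooted at a random mutation point is replaced by a freshly grown random subtree.

```python
import copy
import random

F = ["+", "-", "*"]          # binary functions (illustrative primitive set)
T = ["x", "y", 1.0]          # terminals (illustrative primitive set)

def grow(depth, p_terminal=0.3):
    """Grow a random subtree, allowing branches to stop early."""
    if depth == 0 or random.random() < p_terminal:
        return random.choice(T)
    return [random.choice(F), grow(depth - 1), grow(depth - 1)]

def nodes(tree, path=()):
    """Yield (path, subtree) for every node in the tree, root first."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def subtree_mutation(tree, max_depth=3):
    """Replace the branch at a random mutation point with a grown subtree."""
    child = copy.deepcopy(tree)
    path = random.choice([p for p, _ in nodes(child)])   # mutation point
    new_branch = grow(max_depth)
    if not path:
        return new_branch                  # the whole tree is replaced
    node = child
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = new_branch
    return child

tree = ["+", "x", ["*", "y", 1.0]]
mutant = subtree_mutation(tree)
```

Unlike point mutation, the offspring's size and shape may differ arbitrarily from the parent's, which is why subtree mutation sits at the high-variation end of the spectrum.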

5. Termination criteria

Termination criteria define the conditions under which the evolutionary search is stopped. The simplest and most commonly used criterion is to terminate the evolution when the quality of the best individual, measured by fitness (for example RMSE) or by the number of hits (Koza 1992), reaches a given threshold. The number of hits is a score associated with an individual, ranging from 0 to the number of fitness cases in the training data set. A hit is scored when the response produced by a syntax tree for a fitness case is within a given tolerance of the corresponding actual response (Banzhaf 1994, Chellapilla 1997).
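The hit count can be sketched directly from this definition; the tolerance and the toy data below are illustrative assumptions.

```python
def hits(individual, cases, tolerance=0.01):
    """Number of fitness cases whose prediction falls within
    the tolerance of the actual response."""
    return sum(1 for x, f_x in cases if abs(individual(x) - f_x) <= tolerance)

# Toy data: the true response is 2x over ten fitness cases.
cases = [(x, 2.0 * x) for x in range(10)]
exact = lambda x: 2.0 * x          # matches every case
off = lambda x: 2.0 * x + 0.5      # always outside the tolerance
print(hits(exact, cases), hits(off, cases))   # -> 10 0
```

Terminating when `hits(best, cases)` equals the number of fitness cases is the threshold criterion described above in its simplest form.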

Criteria based on the measurement of the current quality of the best individual evolved, like the ones described above, fail to detect whether the population as a whole still has the potential to further improve the quality of the best individual. As a result, they may prematurely terminate the evolution, should scarce knowledge of the problem lead the user to set an undemanding threshold. A few termination criteria have been developed to stop the evolution automatically when the full potential of the evolution is considered exhausted. These approaches are based on different definitions of the population’s “hidden potential”:

• hidden potential as the impossibility to improve the population (Affenzeller and Wagner 2004, Yin et al. 2007). This approach assumes that convergence is reached when it is not possible to generate a certain number of individuals that have a better fitness value than their parents
• hidden potential as the impossibility to perturb the population (Banzhaf et al. 1996). It is usually observed that in the final stage of the evolution the quality of the best individuals hardly improves or degrades, due to the dramatic decrease of both destructive and constructive crossover rates (Nordin et al. 1996). The proportion of destructive crossovers performed at each generation can then be used as a signal of the impossibility to further perturb the population, and so of convergence. This strategy was used in Banzhaf et al. (1996): the rate of destructive crossover was monitored throughout the evolution, and when that rate fell under 10% the evolution was stopped. It is interesting to note that this approach is slightly different from the previous one, as it is based on the idea that convergence is a condition in which the fitness of the best individuals of the population can neither be increased nor decreased using crossover.
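The second criterion can be sketched as follows: a crossover is counted as destructive when the offspring's fitness (error) is worse than its parent's, and the evolution stops once the destructive rate of a generation falls below the 10% threshold of Banzhaf et al. (1996). The pairing of each offspring with a single parent and the toy data are illustrative simplifications.

```python
def destructive_rate(parent_fitnesses, offspring_fitnesses):
    """Fraction of crossovers whose offspring is worse (higher error)
    than the corresponding parent."""
    destructive = sum(1 for p, o in zip(parent_fitnesses, offspring_fitnesses)
                      if o > p)
    return destructive / len(parent_fitnesses)

def should_stop(parent_fitnesses, offspring_fitnesses, threshold=0.10):
    """Banzhaf et al. (1996) criterion: stop when the destructive
    crossover rate falls under the threshold."""
    return destructive_rate(parent_fitnesses, offspring_fitnesses) < threshold

parents =   [1.0, 1.0, 1.0, 1.0, 1.0]
offspring = [0.9, 0.8, 1.0, 0.7, 0.9]   # only neutral or constructive results
print(should_stop(parents, offspring))  # -> True (rate 0.0 < 0.10)
```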

Acknowledgements

Thanks to http://www.stykz.net/index.php for the free animation software!

References

• M. Affenzeller and S. Wagner. SASEGASA: A new generic parallel evolutionary algorithm for achieving highest quality results. Journal of Heuristics, 10:243–267, 2004.
• L. Altenberg. Emergent phenomena in genetic programming. In A. V. Sebald and L. J. Fogel, editors, Proceedings of the 3rd annual conference on evolutionary programming, San Diego, California, USA, 1994. World Scientific, Singapore.
• W. Banzhaf, F.D. Francone, R.E. Keller, and P. Nordin. Genetic programming: an introduction: on the automatic evolution of computer programs and its applications. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1998.
• T. Blickle. Evolving compact solutions in genetic programming: A case study. In H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwefel, editors, Parallel Problem Solving From Nature IV. Proceedings of the International Conference on Evolutionary Computation, volume 1141 of LNCS, pages 564–573, Berlin, Germany, 1996. Springer-Verlag.
• M. Brameier and W. Banzhaf. Linear Genetic Programming. Number XVI in Genetic and Evolutionary Computation. Springer, 2007.
• A. E. Eiben and M. Schoenauer. Evolutionary computing. Information Processing Letters, 82:1–6, 2002.
• D. B. Fogel. An introduction to simulated evolutionary optimization. IEEE Transactions on Neural Networks, 5(1):3 –14, 1994.
• E. Johanson and R. Poli. GP-Music: An interactive genetic programming system for music generation with automated fitness raters. Technical Report CSRP-98-13, University of Birmingham, School of Computer Science, 1998.
• J. R. Koza. Genetic programming: on the programming of computers by means of natural selection. MIT press, Cambridge, MA, USA, 6th edition, 1992.
• Z. Michalewicz. Genetic Algorithm + Data Structures = Evolution Programs. Springer-Verlag, New York, USA, 3rd edition, 1996.
• R. Poli, W. B. Langdon, and N. F. McPhee. A field guide to genetic programming. Published via http://lulu.com and freely available at http://www.gp-field-guide.org.uk, 2008.
• K. Sims. Interactive evolution of equations for procedural models. The Visual Computer, 9(8):466–476, 1993.
• S. Winkler, H. Efendic, L. Del Re, M. Affenzeller, and S. Wagner. Online modelling based on genetic programming. International Journal of Intelligent Systems Technologies and Applications, 2(2/3):255–270, 2007.
• B.-T. Zhang and H. Mühlenbein. Balancing accuracy and parsimony in genetic programming. Evolutionary Computation, 3(1), 1995.