Genetic Algorithm for Hyper-Parameter Tuning
Charles Darwin: “Natural Selection” is a manuscript, in which he presented his theory of natural selection and its role in biological evolution.
Darwin regarded Natural Selection as his main work, while On the Origin of Species was written for a wider audience. He always intended to finish Natural Selection, but because of frail health, the publicity and work involved in publishing six editions of On the Origin of Species, plus other research and publications, he never got around to finish it.
Natural Selection was transcribed after Darwin’s death, and first published in 1975.
“Survival of the fittest” is a phrase that originated from Darwinian evolutionary theory as a way of describing the mechanism of natural selection. The biological concept of fitness is defined as reproductive success. In Darwinian terms the phrase is best understood as “Survival of the form that will leave the most copies of itself in successive generations.”
Gregor Mendel: “Father of modern genetics”. Mendelian inheritance is a type of biological inheritance that follows the principles originally proposed by Gregor Mendel. Ronald Fisher combined these ideas with the theory of natural selection in his 1930 book The Genetical Theory of Natural Selection , putting evolution onto a mathematical footing and forming the basis for population genetics within the modern evolutionary synthesis.
Fisher being the first to argue that “Mendelism therefore validates Darwinism” and stating with regard to mutations that “The vast majority of large mutations are deleterious; small mutations are both far more frequent and more likely to be useful”
Optimization using Genetic Algorithm:
Genetic algorithms (GAs) are search algorithms based on the principles of natural selection and genetics. Genetic algorithms abstract the problem space as a population of individuals, and try to explore the fittest individual by producing generations iteratively. GA evolves a population of initial individuals to a population of high quality individuals, where each individual represents a solution of the problem to be solved. The quality of each rule is measured by a fitness function as the quantitative representation of each rule’s adaptation to a certain environment. The procedure starts from an initial population of randomly generated individuals. During each generation, three basic genetic operators are sequentially applied to each individual with certain probabilities, i.e. selection, crossover and mutation.
- [Start] Generate random population of n chromosomes (suitable solutions for the problem).
- [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.
- [New population] Create a new population by repeating following steps until the new population is complete
a. [Selection] Select two parent chromosomes from a population according to their fitness (the better fitness, the bigger chance to be selected).
b. [Crossover] With a crossover probability cross over the parents to form a new offspring (children). If no crossover was performed, offspring is an exact copy of parents.
c. [Mutation] With a mutation probability mutate new offspring at each locus (position in chromosome).
d. [Accepting] Place new offspring in a new population.
4. [Replace] Use new generated population for a further run of algorithm.
5. [Test] If the end condition is satisfied, stop, and return the best solution in current population.
6. [Loop] Go to step 2.
Optimizing Machine Learning models with GA:
Library “darwin-mendel” from PyPi uses the GA technique to perform hyper-parameter tuning for several Regression and Classification models. Below is the example of Random Forest Classification using a common dataset from sklearn.
Initial population is created by randomly picking the values of hyper-parameters from individual’s universal sets (Provided by user as the param dictionary). The size of population is defined by the user (Default is 50). Below is the table that signifies the initial population.
Fitness function is defined by the accuracy matrix, selected by the user. If MAPE is selected for a regression model then it will become the criteria for selecting the next batch of off-springs and this is how the minimal MAPE is reached. If accuracy is selected for a classification model then the aim of the algorithm would be to select and recombine chromosomes that will produce children to maximize accuracy.
Mutation rate by default is 5% (user can change that also) which is done to take impurities from outside to reach the global minima quickly.
The more number of generation is a GA, the better will be accuracy but time taken will also be increased. To take a decision, user can plot a graph between number of generation and accuracy, the point of inflection would be clear (Default value is 10).
Code for darwin-mendel:
Code for RandomizedSearchCV:
Code for GridSearchCV:
Keeping the data and range of hyper-parameters same, 3 different algorithms were tried. PFB the results of them.
From the above table, it’s evident that the accuracy and time taken by GA is very efficient. On the large ranges of parameters, GA gets to the best combination keeping optimization of accuracy as the objective very quickly. Both random search and grid search are not a smart way of tuning hyper-parameters, GA is a guided way to achieve that.