In this article we discuss about the concepts of Genetic Algorithm, how they works and how to use them to improve a Deep-Learning Neural Network.
No technical details or rocket science in this article.
Do not expect to get code or formulas in this article. I talk about the concepts and use references for the most complicated parts.
Concepts
Genetic Algorithm is (on my point of view) one of the best examples of how seek in the nature the knowledge necessary to improve and evolve the computer science.
As the name looks like, Generic Algorithms mimics to behavior and expertise that nature uses (with obviously success) to evolve and prevail on world and on the universe.
From billion years of evolution, the genomic way to evolve is combining information (aka Genes) from an Parent and an Mother, or in other worlds, from progenitors. Also, the genomic recombination (Crossover) is susceptible to mutations, and despite the name, mutations can be better.
Parents
Everything starts with the parents. Two of them to be more exactly. They provide the necessary structure and information to create the new children’s.
As i say in the beginning, this article uses an Deep Learning Neural Network to explain Generic Algorithm and now is time to add it into the explanation.
For education purpose, we create a Full Dense Neural Network with 5 Inputs and 3 Outputs, doesn’t matter the purpose of this Neural Network.
Initially our network has 2 fixed layers (input and output) and a variable number of hidden layers each with a variable number of neurons and with unknown Activation Function. We can call these combinations as Chromosome invoking the real name of that “”object/thing/element“” that’s carry our biological information.
But, in our case, we have one digital Chromosome holding the parameters that we want to optimize.
Quick review of a Neuron
So, usually we define the hidden layer structure using intuition and past experience. That works, but are not the best way to do that. Some options are in place to optimize and seek the best hidden layers structure and setup.
Traditionally we choose the brute force strategy, just creating models with all options (layer count, layer size, activation function and model optimizer), but that option consumes a tremendous amount of time and a lot of computational resources.
One smart strategy is to use a Generic Algorithm to crossover the candidates over generations until we can determinate the best combination for our model.
Chromosome Structure
Well, we have chromosomes, but, what they carry?
They carry information, information about our hidden layers, our neurons and about the model.
There many many many and many formats of a Chromosome, so i present that i use for my projects.
##############################################################
# Model Optimizer #
##############################################################
######Layer Layout (One Line for Each Layer) #################
# Number of Neurons, Layer Type, Neuron Activation Function #
# Number of Neurons, Layer Type, Neuron Activation Function #
# Number of Neurons, Layer Type, Neuron Activation Function #
##############################################################
Now in English.
My Chromosome stores:
- Model Optimizer (adam/rmsprop)
- Hidden Layer Layout
- Number of neurons (10, 20, 30, 40, 50, 60)
- Layer Type (Dense)Layer Layout (With each layer info). For each layer:
- Activation Function (relu/tanh/sigmoid)
The values above are examples, you should choose for you needs, but, usually the Model Optimizer and Activation Function options we don’t change. As example, in my next article (Using Generic Algorithm with Keras – Link as Soon), that’s are my parameters:
# Define Hyperparameters for NN
HIDDEN_LAYER_COUNT = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
HIDDEN_LAYER_DENSE_NEURONS = [8, 16, 24, 32, 64, 128, 256, 512]
HIDDEN_LAYER_DROPOUT_RATE = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
HIDDEN_LAYER_ACTIVATIONS = ['tanh', 'relu', 'sigmoid', 'linear', 'softmax']
HIDDEN_LAYER_TYPE = ['dense', 'dropout']
MODEL_OPTIMIZER = ['adam', 'rmsprop']
How It Works?
As you can see in the diagram, the Genetic Algorithm implementation consist on 8 distinct steps. Now, let’s explore each step in details.
Initial Population
So, where the first parents came from? The simple answer is: Generate it randomly.
Yes, randomly. It works, trust me. Works like a charm.
Define your population size (maybe 20 or 30 individuals in population) and generate each one with totally random configuration. Randomly choose each parameters on chromosome.
That’s looks crazy i know. The first evaluation of they random agents will be disastrous. But, remember that nature starts that way in the beginning of life, using random chemical reactions.
Evaluate Individuals
We need to define success. But, what is success?
Best detection? Best classification? Lower False Positive? Speed of evaluation? Processing Time? Maybe the combination of all that information?
You should define an evaluator, a way to classify all models and score them.
As example, think about a Spam Detector Neural Network. What are the best evaluator? Maybe the Lower False Positives with High True Positive Detection.
Run evaluation on each model in current generation. Apply your evaluator and sort the individuals by rank.
Stop Decision
Defines a Stop condition. What are satisfactory condition?
Maybe after 10/20/100 cycles of processing (Generations) you choose the best of the best Neural Network Model.
Personally, i usually use 2 conditions:
- 100 generations; and
- after N (maybe 4 or 5) generations the individuals regress on the evolutionary process (Yes, that risk may happen). In other means, if some generations the best models are worst then previous models.
As you can see above, in generations 19, 20, 21 and 22 our model reverse they evolution.
Choosing Best Individuals + Random Individuals
After evaluate each individual in the population, select the N top fittest individuals from the population (Usually 4 in an population of 20). And also selected Y (usually 2 in an population of 20) randomly selected individuals from the non-top performers.
Choosing a Non Fittest individuals will keep us away from getting stuck in the local maximum and provides genetic variety on evolutionary processing.
Creating the Children’s, Crossover
After previous step we need to create new children’s to complete our new population through randomly (yes, random again) selection of each information of each of 2 parents.
For example, we took the 2 parents above. The result children are composed by parts from Parent 1 and parts from Parent 2 chosen randomly (in blue). We repeat the children generation until we complete our new population. That population should like that:
Mutations, for Good or not
Now, we have our children’s created. It’s time to apply some mutations with low random probability.
That’s is required to maintain some amount of randomness in the genetic algorithm and give the chance to appears a new specie that our initial generation was not able to create. As a random process, mutation can make good or bad changes in each chromosome. If the processes make a good changes, the chromosome will survive to evaluation process, if not, well, the chromosome will disappear on next generation.
Again, the probability to apply mutation varies, but i use exactly 10% of probability to apply mutation on each Chromosome parameter.
TIP: 10% of probability works well in an population of 20 individuals. If you uses a larger population, reduces the probability.
Under an 100 individuals population, consider choose between 5% and 2% of probability.
As you see on image above, our algorithm make 2 mutations. First on layer 2 and the second one in layer 3, changing the Activation Function and Neurons Count respectively.
Saving Model
Now, you just save the best model that you can find. You can save they to disk or database. That’s are important because the model contains all structure, parameters, weights and other configurations to run it in production environment.
Results
Our Neural Network show a fast evolution process starting on generation 2 (check Generation 1 Result, that’s is terrible) and the maximum of evolution was reach on Generation 13. After that we don’t find any improvements at Generation 36.
Our Model performs well in both Score/Fitness conditions: Average Score over Generation and Total Score over Generation.
So, i hove you enjoy it. 😉