Genetic Algorithms and the Knapsack Problem: A Beginners’ Guide | by Egor Howell

[ad_1]

Get hands-on experience with genetic algorithms and learn how to solve the knapsack problem step by step

In one of my previous articles, we introduced and discussed the genetic optimization algorithm. This method is inspired by the theory of evolution where the fittest solutions in a population survive and pass on their more optimal characteristics. As a result, the weaker characteristics become extinct over time and we converge to better solutions.

For this post, I assume the reader has some basic knowledge of the inner workings of the genetic algorithm. If you are unfamiliar, I highly suggest checking out my previous post linked detailing the algorithm linked below:

In this post, we will carry out a walkthrough on how you can apply the genetic algorithm to a famous combinatorial optimization problem, the knapsack problem. The knapsack problem has many real-life applications such as inventory management, traffic control, and supply chain efficiency. Therefore, it is an important problem and concept that Data Scientists should be aware of.

The statement the knapsack problem poses is very simple:

‘There are several items with corresponding weights and values. We need to determine which set of items leads to the highest total value subject to a weight constraint’

This problem sounds simple to solve. However, the number of solutions is subject to a combinatorial explosion as the number of items increases. Therefore, the problem quickly becomes intractable to solve by brute force. This is where we turn to approximate solution heuristic algorithms like the genetic algorithm. These don’t guarantee in finding the global optimum but will find a reasonable solution (local optimum) in a sufficient duration of time.

Translating the Problem

In this walkthrough, we will be considering the simplest knapsack version, the 0–1 knapsack problem. This is where an item is either wholly in the knapsack, 1 or True, or it is not at all in the knapsack, 0 or False.

In the case of the genetic algorithm, the allele is 0 or 1 (in or not in), the gene is a specific item and the chromosomes are potential solutions. The diagram below visualizes this translation:

Setup

We begin by setting up our problem by listing the number of items with their corresponding weights and values and also the maximum weight the knapsack can hold.

GitHub Gist by author.

Algorithm

After setting up our problem, we need to decide on the operators for our genetic algorithm. For this problem, we will use quite a simple set:

Selection: Roulette wheel selection. This technique samples from a probability distribution of the current solutions in the population, N, where each solution’s probability, p_s, of being selected is related to its fitness, f_s:

Mutation: The mutation will be very simple. Each gene will have a 10% chance of changing alleles. This may be too high depending on the length of the chromosome as it is often recommended the mutation rate be 1/length.

It is best practise to code all these operators as their own methods in a GeneticAlgorithm class:

GitHub Gist by author.

Results

We can now run our algorithm and plot the fitness score as a function of generations (iterations):

GitHub Gist by author.

As we can see, over the generations the algorithm is finding a better and better solution. However, the main caveat, as with all heuristic algorithms, is that we do not know if this solution is the global optimum.

Another key point with genetic algorithms is that there are so many different operators with sundry versions. The ones selected here were pretty arbitrary, however, for big industry-related problems it is important to try several operators to see if one suits the problem better than another. This process is called hyper-heuristics and is an interesting emerging field between machine learning and optimization.

In this article, we have shown how to apply the genetic optimization algorithm to the knapsack problem. This problem is very simple to understand but due to its combinatorial nature, it quickly explodes in complexity. The genetic algorithm is a heuristic that finds a reasonable solution to this problem in sufficient time.

The full code used in this blog can be found on my GitHub here:

(All emojis designed by OpenMoji — the open-source emoji and icon project. License: CC BY-SA 4.0)

[ad_2]

Source link