The Ultimate Guide to Linear Programming for Big Data Optimization | by Rahulraj Singh

[ad_1]

Merging statistics, linear algebra, and data science for optimal decision making

Imagine working for an oil company in the Middle East. A large part of the region’s economy is based on the rigging your organization does and consequently, that is based on whether or not you find oil when you dig a new site. Digging up a new site, and setting up equipment for oil extraction is very costly, so you do not want to be wrong in choosing the site, especially since the company has a limited budget allocated to researching and finding new sites. How do you decide where to dig? How do you finalize the resources you will be allocating to this new exploration so you do not incur losses? You guessed it right, Linear Programming is here to rescue you. Having said that, it won’t find the best result every time, but it will ensure that your profits for every site are maximized. Still unclear on how this will work? No worries! I will solve this problem for you towards the end of this story.

The way I see it is, if you can formulate a problem in terms of linear equations and specify numeric constraints on all the variables part of this linear equation, you have a problem that can be solved using Linear Programming.

I was asked in an interview once, how you would explain Linear Programming (LP) to a 6-year-old. Of course, my answer was garbage, so I asked my interviewer how he would do it and this is what my interviewer said:

Linear programming solves problems by finding the best way to establish a task pertaining to certain rules. Imagine having a lot of toys and wanting to put them all in one box, but the box can only fit a limited number of toys. There is also a weight limit on how much the box can accommodate or it would break. Linear programming can help you figure out and select toys to put in the box so you are able to fit the most toys possible without breaking the box.

Let us now get a little technical and try to solve a problem using an LP technique.

Formulation of a Linear Program

For the formulation of a linear program, it is essential to keep the following concepts in mind:

The relationships that need to be simplified must be linear.
The values, either in number or in properties must be subject to a constraint.
The goal is to find a set of values that maximize or minimize the objective function while satisfying all the constraints.
The value of decision variables should be non-negative.

Strategies to Solve a Linear Program

Graphical Method: When you have two decision variables for solving a problem, the graphical method[5] can be used. It is essentially a graph that plots the constraints as lines and then tries to find an area on the graph that satisfies all the constraints presented by the individual lines. Then, you replace the variables in your objective function with the points referenced at the corner points of the feasible region area selected from the graph. The corners that yield the highest or lowest value (based on a maximization or minimization problem) is the result.
Simplex Method: The simplex method[6] can be used on any number of decision variables. Imagine the same graph we created for the graphical method above, and select a random point in the feasible region. Now, iteratively move to adjacent feasible solutions until you find the optimal solution. For doing this, you would need a pivot variable that will be used to perform row operations on the constraints as we move to adjacent, new feasible solutions until the optimal (maximum or minimum) value is reached.
Branch and Bound Method: For problems that only involve integers, the branch and bound method[7] might be a good bet. As the name suggests, it divides the problem into multiple small sub-problems and then solves each sub-problem individually. The solution would then be the combination of each sub-problem’s result in a way that minimizes or maximizes the final solution based on the requirement.
Cutting Plane Method: The cutting plane method[8] is an extension of the graphical method where it can introduce more than one constraint by cutting the feasible region plane by the newly added constraints. By doing this, it automatically reduces the optimal region that might contain the optimized result. And then, it follows the same process as the graphical method for finding the result.
Duality Theory: Duality theory[9] converts an LP problem into a dual problem and then attempts to solve it based on the properties of the dual problem. This method relies on the intuition that solving a dual problem optimally also solves the LP problem.
Stochastic Programming: If some parameters of the original problem are uncertain or stochastic, the stochastic programming[10] method can be used to solve the LP problem. Algorithms and techniques such as Monte Carlo simulation and scenario analysis can then be used to find the optimal solution.

Why is LP so important for Data Scientists?

It all comes down to the applications of linear programming and more to the fact that problems we face in the world today can efficiently be reduced to linear equations.

Solving a Decision Problem using Linear Programming

Irrespective of whether your industry is Technology, Manufacturing, or Finance, there is a very high probability that you are dealing with data and optimization problems.

Every decision relies on optimization because the idea is, you want to be doing better today then you were doing yesterday. How do you achieve that improvement? You optimize the exisiting solution.

Now, let’s return back to our oil rigging example. We are trying to maximize the profits for an oil rigging company and they need to understand how to allocate resources to best return the maximum profits.

Let say, they have 4 potential sites in their horizon and products that will potentially come out of these sites will sell at the following profits $25, $14, $38, and $20, respectively.
Since the manpower required in drilling and storing oil into containers, the maximum number of daily stored units cannot exceed 60.
Now for oil to be produced, these sites also require some raw products, named A and B (which is the resource input from the company’s costs). For each unit of production, the products require the following input consumption:

The first product needs three units of the raw material A.
The second product requires two units of the raw material A and only one unit of the raw material B.
The third product needs one unit of raw material A and two units of raw material B.
The fourth product needs three units of raw material B.
Also, the raw materials used in the oil industry are costly. So the company can only use up to a 100 units of the raw material A and up to a 120 units of raw material B in one day.

Let us now form our equations based on the conditions explained above:

Objective Function

Maximize (25 x_1 + 14 x_2 + 38 x_3 + 20 x_4) — Profit

Conditionals

x_1 + x_2 + x_3 + x_4 <= 60 (Manpower constraints)

3.x_1 + 2.x_2 + x_3 <= 100 (Raw Material A)

x_2 + 2.x_3 + 3.x_4 <= 120 (Raw Material B)

x_1, x_2, x_3, x_4 >= 0 (Non-zero constraint)

Using Scipy and PuLP to solve the linear program

There are two well-built libraries that allow us to solve Linear problems and we will use both of them on our problem to ascertain the results between them and come to a solid conclusion.

Let’s first import the necessary libarieis and packages needed:

pip install scipy
pip install pulp
from scipy.optimize import linprog
from pulp import LpMaximize, LpProblem, LpStatus, lpSum, LpVariable

1. Using Linprog optimization from Scipy

object_variables = [-25, -14, -38, -20]# Define a matrix for the left-side of our equations
matrix_1 = [[1, 1, 1, 1],   # Manpower Constraints
[3, 2, 1, 0],    # Raw Material Constraints for A
[0, 1, 2, 3]]    # Raw Material Constraints for B
# Define a matrix for the right-side of the equations 
matrix_2 = [60,     # Manpower Constraints 
100,     # Raw Material Constraints for A
120]     # Raw Material Constraints for B
optimization_function = linprog(c=object_variables, A_ub=matrix_1, 
b_ub=matrix_2, method="revised simplex")

Results

con: array([], dtype=float64)
fun: -2280.0
message: 'Optimization terminated successfully.'
nit: 1
slack: array([ 0., 40.,  0.])
status: 0
success: True
x: array([ 0.,  0., 60.,  0.])

2. Using PuLP optimization

# Define the model
model = LpProblem(name="resource-allocation", sense=LpMaximize)# Decision Variables
decision_variable = i: LpVariable(name=f"decision_variablei", 
lowBound=0) for i in range(1, 5)
# Conditionals (Constraints from the problem definition)
model += (lpSum(decision_variable.values()) <= 60, "Manpower_Constraints")
model += (3 * decision_variable[1] + 2 * decision_variable[2] + 
decision_variable[3] <= 100, "Raw_Material_Constraints_A")
model += (decision_variable[2] + 2 * decision_variable[3] + 
3 * decision_variable[4] <= 120, "Raw_Material_Constraints_B")
# Objective function
model += 25 * decision_variable[1] + 14 * decision_variable[2] 
+ 38 * decision_variable[3] + 20 * decision_variable[4]
# Solve the optimization problem and print results
status = model.solve()
print(f"status: model.status, LpStatus[model.status]")
print(f"objective: model.objective.value()")
for x in decision_variable.values():
print(f"x.name: x.value()")
for name, constraint in model.constraints.items():
print(f"name: constraint.value()")

Results

status: 1, Optimal
objective: 2280.0
decision_variable1: 0.0
decision_variable2: 0.0
decision_variable3: 60.0
decision_variable4: 0.0
Manpower_Constraints: 0.0
Raw_Material_Constraints_A: -40.0
Raw_Material_Constraints_B: 0.0

Analysis of Results

From the results, we can conclude that the maximum possible profit from the setup is 2280, and this can be confirmed by both the models that we are using above.
It also suggests that we should only rig the oil field once, for 60 units of Product C, which will yield the most profit.
It might be be disadvantageous to dig the other rigs, as the cost of creation exceeds the profits they will generate.
The first slack is 0 (represented by manpower_constraints in the pulp model) which means the factories should work at maximum capacity manpower, i.e. 60.
The second slack is 40, meaning the raw material A is being used for 60 units out of the total 100.
The final slack is 0, which means the factory should use all of raw material B, i.e. 120 units.

[ad_2]

Source link