Introduction to Gradient Descent with Python

In this article, I’m going to talk about a popular optimization algorithm in machine learning: gradient descent. I’ll explain what gradient descent is, how it works, and then we’ll write the gradient descent algorithm from scratch in Python. This article assumes you are familiar with derivatives in Calculus.

What is Gradient Descent?

Gradient descent is an optimization algorithm for finding the minimum of a function. It does this by checking the steepness of the graph at its current position and using that slope to move, step by step, towards the lowest point, where the slope is presumably 0. In machine learning terms, it's an algorithm for finding the lowest cost.

You can think of gradient descent as akin to a blindfolded hiker on top of a mountain trying to get to the bottom. The hiker must feel the incline, or the slope, of the mountain in order to get an idea of where she is going. If the slope is steep, the hiker is closer to the peak and can take bigger steps. If the slope is less steep, the hiker is closer to the bottom and takes smaller steps. If the hiker feels flat ground (a zero slope), she can assume she’s reached the bottom, or minimum.

So given a function with a convex graph, the gradient descent algorithm attempts to find the minimum of the function by using the derivative to check the steepness of points along the line and slowly move towards a slope of zero. After all, “gradient” is just another word for slope.
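In pseudocode, the single update that gradient descent repeats over and over looks like this (the names here are just for illustration; we'll use very similar ones in the Python version below):

new_guess = current_guess - learning_rate * slope_at_current_guess

The flatter the slope gets, the smaller each adjustment becomes, so the guesses naturally settle near the minimum.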

Implement Gradient Descent in Python

Before we start, import the SymPy library and create a “symbol” called x. We’ll be needing these lines later when we are working with math functions and derivatives.

from sympy import *
x = Symbol('x')

We create our gradient_descent function and give it three parameters: cost_fn, initial_guess, and learning_rate. The cost_fn is the math function that we want to find the minimum of. The initial_guess parameter is our first guess for the x-value of the minimum of the function. We will update this variable to be our new guess after each learning iteration. The last parameter is the learning rate.

def gradient_descent(cost_fn, initial_guess, learning_rate):
    df = cost_fn.diff(x)    # symbolic derivative of the cost function
    df = lambdify(x, df)    # turn it into a callable Python function

    new_x = initial_guess

    for n in range(100):
        # Step 1: Predict (Make a guess)
        previous_x = new_x

        # Step 2: Calculate the error
        gradient = df(previous_x)

        # Step 3: Learn (Make adjustments)
        new_x = previous_x - learning_rate * gradient

Inside the function, we first get the derivative of the cost function that was passed in as a parameter using the diff function of the SymPy library. We store the derivative in the df variable. Then, we use the lambdify function because it allows us to plug our predictions into the derivative function. Read my article on calculating derivatives in Python for more info on this.
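To see what those two lines do on their own, here's a quick standalone sketch (the fn and slope names are just for illustration, not part of our gradient_descent function):

from sympy import Symbol, lambdify

x = Symbol('x')
fn = x**2                # an example cost function
df = fn.diff(x)          # symbolic derivative: 2*x
slope = lambdify(x, df)  # turn the symbolic expression into a callable Python function

print(slope(3))          # prints 6, the slope of x**2 at x = 3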

In the for loop, our gradient descent function is following the 3-step algorithm that is used to train many machine learning tools:

  1. Predict (Make a guess)
  2. Calculate the error
  3. Learn (Make adjustments)

You can learn more about this process in this article on how machines “learn.”

In the for loop, the first step is to make a guess for the x-value of the minimum of the function. We do this by setting previous_x to new_x, which starts out as the user’s initial guess. previous_x will help us keep track of the preceding prediction value as we make new guesses.

Next, we calculate the error or, in other words, we see how far our current guess is from the minimum of the function. We do this by calculating the derivative of the function at the point we guessed, which will give us the slope at that point. If the slope is large, the guess is far from the minimum. But if the slope is close to 0, the guess is getting closer to the minimum.
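For example, with the cost function f(x) = x² that we'll use later, the derivative is 2x. A guess of x = 3 gives a slope of 6, so we're still far from the minimum at x = 0; a guess of x = 0.01 gives a slope of just 0.02, so we're very close.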

Next, we “learn” from the error. In the previous step, we calculated the slope at the x-value that we guessed. We multiply that slope by the learning_rate and subtract that from the current guess value stored in previous_x. Then, we store this new guess value back into new_x.
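Continuing the same example with a learning rate of 0.1: the slope at x = 3 is 6, so the new guess becomes 3 - 0.1 * 6 = 2.4. On the next pass, the slope at 2.4 is 4.8, so the guess moves to 2.4 - 0.48 = 1.92, and the guesses keep creeping towards the minimum at 0.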

We repeat these steps over and over in the for loop until all 100 iterations have run.

Before we run our gradient descent function, let’s add two print statements at the end of the function, after the loop, so we can see the values at the minimum of the function. When the loop finishes, new_x holds our final guess, so that’s the value we print.

    print('Minimum occurs at x-value:', new_x)
    print('Slope at the minimum is:', df(new_x))

Now, let’s run our gradient descent function and see what type of output we get with an example. In this example, the cost function is f(x) = x², the initial guess for x is 3, and the learning rate is 0.1.

my_fn = x**2
gradient_descent(my_fn, 3, 0.1)
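If you run this, you should see an x-value extremely close to 0, the true minimum of x². Each iteration multiplies the guess by 1 - 0.1 * 2 = 0.8, so after 100 iterations the guess is roughly 3 * 0.8^100, which is on the order of 10^-10, and the printed slope is about twice that.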

Currently, we are running the learning loop an arbitrary number of times. In this example, the loop runs 100 times. But maybe we don’t need to run the loop this many times. Oftentimes you already know ahead of time how precise a calculation you need. You can tell the loop to stop running once a certain level of precision is met. There are many ways you can implement this, but I’ll show you using the for loop we already have.

precision = 0.0001

for n in range(100):
    previous_x = new_x
    gradient = df(previous_x)
    new_x = previous_x - learning_rate * gradient

    # How far this iteration moved the guess
    step_size = abs(new_x - previous_x)

    # Stop early once the guess barely changes between iterations
    if step_size < precision:
        break

First, we define a precision value: once our guesses change by less than this amount, we’ll consider the algorithm close enough to the minimum. You can also make this a parameter to the function if you choose.

Inside the loop, we create a new variable called step_size: the distance between previous_x and new_x, the new guess that was just calculated in the “learning” step. We take the absolute value of this difference in case it’s negative.

If the step_size is less than the precision we specified, the loop will finish, even if it hasn’t reached 100 iterations yet.
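Putting it all together, here's one way the finished function might look with precision (and the iteration cap) pulled out as parameters. This is just a sketch of the approach described above; the max_iter name and its default are my own choices:

from sympy import Symbol, lambdify

x = Symbol('x')

def gradient_descent(cost_fn, initial_guess, learning_rate, precision=0.0001, max_iter=100):
    df = lambdify(x, cost_fn.diff(x))  # callable derivative of the cost function

    new_x = initial_guess
    for n in range(max_iter):
        previous_x = new_x                             # Step 1: Predict (make a guess)
        gradient = df(previous_x)                      # Step 2: Calculate the error (the slope)
        new_x = previous_x - learning_rate * gradient  # Step 3: Learn (adjust the guess)

        # Stop early once the guess barely changes between iterations
        if abs(new_x - previous_x) < precision:
            break

    print('Minimum occurs at x-value:', new_x)
    print('Slope at the minimum is:', df(new_x))
    return new_x

my_fn = x**2
gradient_descent(my_fn, 3, 0.1)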

Instead of solving for the minimum of a cost function analytically, the gradient descent algorithm converges on it numerically, one small step at a time. Like a blindfolded hiker, the algorithm walks down the valley (the graph of the cost function), following the slope until it reaches the minimum point.
