In this article, I’m going to talk about a popular optimization algorithm in machine learning: gradient descent. I’ll explain what gradient descent is, how it works, and then we’ll write the gradient descent algorithm from scratch in Python. This article assumes you are familiar with derivatives in Calculus.
What is Gradient Descent?
Gradient descent is an optimization algorithm for finding the minimum of a function. The algorithm does this by checking the steepness of the graph at its current guess and using that slope to slowly move towards the lowest point, which presumably has a slope of 0.
You can think of gradient descent as akin to a blindfolded hiker on top of a mountain trying to get to the bottom. The hiker must feel the incline, or the slope, of the mountain in order to get an idea of where she is going. If the slope is steep, the hiker is closer to the peak and can take bigger steps. If the slope is less steep, the hiker is closer to the bottom and takes smaller steps. If the hiker feels flat ground (a zero slope), she can assume she’s reached the bottom, or minimum.
So given a function with a convex graph, the gradient descent algorithm attempts to find the minimum of the function by using the derivative to check the steepness of points along the line and slowly move towards a slope of zero. After all, “gradient” is just another word for slope.
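Concretely, each step of the algorithm takes the slope (the derivative) at the current guess, scales it down by a small constant called the learning rate, and subtracts it from the guess:

new_x = previous_x - learning_rate * slope

We’ll reuse these names in the Python implementation below.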

Implement Gradient Descent in Python
Before we start, import the SymPy library and create a “symbol” called x. We’ll be needing these lines later when we are working with math functions and derivatives.
from sympy import *

x = Symbol('x')
We create our gradient_descent function and give it three parameters: cost_fn, initial_guess, and learning_rate. The cost_fn is the math function that we want to find the minimum of. The initial_guess parameter is the number that is our first guess for the x-value of the minimum of the function. We will update this variable to be our new guess after each learning iteration. The last parameter is the learning rate.
def gradient_descent(cost_fn, initial_guess, learning_rate):
    df = cost_fn.diff(x)
    df = lambdify(x, df)
    new_x = initial_guess
    for n in range(100):
        # Step 1: Predict (Make a guess)
        previous_x = new_x
        # Step 2: Calculate the error
        gradient = df(previous_x)
        # Step 3: Learn (Make adjustments)
        new_x = previous_x - learning_rate * gradient
Inside the function, we first get the derivative of the cost function that was passed in as a parameter using the diff function of the SymPy library. We store the derivative in the df variable. Then, we use the lambdify function because it turns the symbolic derivative into a regular Python function, which allows us to plug our predictions into it. Read my article on calculating derivatives in Python for more info on this.
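To see what those two calls do on their own, here’s a minimal sketch (the function x**2 is just an example):

from sympy import Symbol, lambdify

x = Symbol('x')
f = x**2              # an example function
df = f.diff(x)        # symbolic derivative: 2*x
df = lambdify(x, df)  # turn the SymPy expression into a callable Python function
print(df(3))          # slope at x = 3, prints 6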
In the for loop, our gradient descent function is following the 3-step algorithm that is used to train many machine learning tools:
- Predict (Make a guess)
- Calculate the error
- Learn (Make adjustments)
You can learn more about this process in this article on how machines “learn.”
In the for loop, the first step is to make a guess for the x-value of the minimum of the function. We do this by setting previous_x to new_x, which holds the user’s initial guess on the first iteration. previous_x will help us keep track of the preceding prediction value as we make new guesses.
Next, we calculate the error or, in other words, we see how far our current guess is from the minimum of the function. We do this by calculating the derivative of the function at the point we guessed, which will give us the slope at that point. If the slope is large, the guess is far from the minimum. But if the slope is close to 0, the guess is getting closer to the minimum.
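For example, if the cost function is f(x) = x², its derivative is f′(x) = 2x. At a guess of x = 3 the slope is 6, so we are still far from the minimum at x = 0, while at x = 0.1 the slope is only 0.2, so we are nearly there.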
Next, we “learn” from the error. In the previous step, we calculated the slope at the x-value that we guessed. We multiply that slope by the learning_rate and subtract the result from the current guess value stored in previous_x. Then, we store this new guess value back into new_x.
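Continuing the example above: with a guess of x = 3, a slope of 6, and a learning rate of 0.1, the new guess is 3 − 0.1 × 6 = 2.4, one step closer to the minimum at x = 0.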
We then repeat these steps in our for loop, in this case for 100 iterations.
Before we run our gradient descent function, let’s add some print statements at the end so we can see the values at the minimum of the function. After the loop finishes, new_x holds our final guess, so we store it in min_x before printing:

min_x = new_x
print('Minimum occurs at x-value:', min_x)
print('Slope at the minimum is: ', df(min_x))
Now, let’s run our gradient descent function and see what type of output we get with an example. In this example, the cost function is f(x) = x². The initial guess for x is 3 and the learning rate is 0.1.
my_fn = x**2
gradient_descent(my_fn, 3, 0.1)
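You can sanity-check the result by hand: for f(x) = x² the update is new_x = previous_x − 0.1 × 2 × previous_x = 0.8 × previous_x, so each iteration shrinks the guess by a factor of 0.8. After 100 iterations the guess is about 3 × 0.8¹⁰⁰ ≈ 6 × 10⁻¹⁰, so the printed x-value and slope should both be vanishingly close to 0, the true minimum.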
Currently, we are running the learning loop an arbitrary number of times. In this example, the loop runs 100 times. But maybe we don’t need to run the loop this many times. Oftentimes you already know ahead of time how precise a calculation you need. You can tell the loop to stop running once a certain level of precision is met. There are many ways you can implement this, but I’ll show you using the for loop we already have.
precision = 0.0001

for n in range(100):
    previous_x = new_x
    gradient = df(previous_x)
    new_x = previous_x - learning_rate * gradient
    step_size = abs(new_x - previous_x)
    if step_size < precision:
        break
First, we define a precision value: once the algorithm’s steps become smaller than this value, we consider it done. You can also make this a parameter to the function if you choose.
Inside the loop, we create a new variable called step_size, which is the distance between previous_x and new_x, the new guess that was just calculated in the “learning” step. We take the absolute value of this difference in case it’s negative.
If the step_size is less than the precision we specified, the loop will finish, even if it hasn’t reached 100 iterations yet.
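Putting everything together, here’s one way the complete function could look (a sketch assembled from the pieces above; making precision a keyword parameter with a default of 0.0001 is one choice, as suggested earlier):

from sympy import *

x = Symbol('x')

def gradient_descent(cost_fn, initial_guess, learning_rate, precision=0.0001):
    # Differentiate the cost function and make the derivative callable
    df = cost_fn.diff(x)
    df = lambdify(x, df)
    new_x = initial_guess
    for n in range(100):
        # Step 1: Predict (Make a guess)
        previous_x = new_x
        # Step 2: Calculate the error
        gradient = df(previous_x)
        # Step 3: Learn (Make adjustments)
        new_x = previous_x - learning_rate * gradient
        # Stop early once the steps get smaller than our precision
        step_size = abs(new_x - previous_x)
        if step_size < precision:
            break
    min_x = new_x
    print('Minimum occurs at x-value:', min_x)
    print('Slope at the minimum is: ', df(min_x))

my_fn = x**2
gradient_descent(my_fn, 3, 0.1)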
Instead of solving for the minimum of a cost function analytically, the gradient descent algorithm converges on it numerically, one small step at a time. Like a blindfolded hiker, the algorithm goes down the valley (the cost function), following the slope of the graph until it reaches the minimum point.