Introduction to Gradient Descent with Python

In this article, I’m going to talk about a popular optimization algorithm in machine learning: gradient descent. I’ll explain what gradient descent is, how it works, and then we’ll write the gradient descent algorithm from scratch in Python. This article assumes you are familiar with derivatives in Calculus.

What is Gradient Descent?

Gradient descent is an optimization algorithm for finding the minimum of a function. The algorithm does this by checking the steepness of the slope along the graph of the function and using that to slowly move towards the lowest point, which has a slope of 0. In machine learning, the function being minimized is usually a cost function, so the goal is to find the lowest cost.

You can think of gradient descent as akin to a blind-folded hiker on top of a mountain trying to get to the bottom. The hiker must feel the incline, or the slope, of the mountain in order to get an idea of where she is going. If the slope is steep, the hiker is closer to the peak and can take bigger steps. If the slope is less steep, the hiker is closer to the bottom and takes smaller steps. If the hiker feels flat ground (a zero slope), she can assume she’s reached the bottom, or minimum.

So given a function with a convex graph, the gradient descent algorithm attempts to find the minimum of the function by using the derivative to check the steepness at points along the curve and slowly move towards a slope of zero. After all, “gradient” is just another word for slope.

Implement Gradient Descent in Python

Before we start, import the SymPy library and create a “symbol” called x. We’ll be needing these lines later when we are working with math functions and derivatives.

from sympy import *
x = Symbol('x')

We create our gradient_descent function and give it three parameters: cost_fn, initial_guess, and learning_rate. The cost_fn is the math function that we want to find the minimum of. The initial_guess parameter is the number that is our first guess for the x-value of the minimum of the function. We will update our guess after each learning iteration. The last parameter, learning_rate, controls how big a step we take on each iteration.

def gradient_descent(cost_fn, initial_guess, learning_rate):
    df = cost_fn.diff(x)
    df = lambdify(x, df)

    new_x = initial_guess

    for n in range(100):
        # Step 1: Predict (Make a guess)
        previous_x = new_x

        # Step 2: Calculate the error
        gradient = df(previous_x)

        # Step 3: Learn (Make adjustments)
        new_x = previous_x - learning_rate * gradient

Inside the function, we first get the derivative of the cost function that was passed in as a parameter, using the diff function of the SymPy library. We store the derivative in the df variable. Then, we use the lambdify function to turn the symbolic derivative into a regular Python function, which lets us plug our guesses into it. Read my article on calculating derivatives in Python for more info on this.

In the for loop, our gradient descent function is following the 3-step algorithm that is used to train many machine learning tools:

  1. Predict (Make a guess)
  2. Calculate the error
  3. Learn (Make adjustments)

You can learn more about this process in this article on how machines “learn.”

In the for loop, the first step is to make an arbitrary guess for the x-value of the minimum of the function. We do this by setting previous_x to new_x, which on the first iteration is the user’s initial guess. previous_x will help us keep track of the preceding guess as we make new guesses.

Next, we calculate the error or, in other words, we see how far our current guess is from the minimum of the function. We do this by calculating the derivative of the function at the point we guessed, which will give us the slope at that point. If the slope is large, the guess is far from the minimum. But if the slope is close to 0, the guess is getting closer to the minimum.

Next, we “learn” from the error. In the previous step, we calculated the slope at the x-value that we guessed. We multiply that slope by the learning_rate and subtract that from the current guess value stored in previous_x. Then, we store this new guess value back into new_x.
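Written as a formula, the update in this step looks like the following, where x_n is the current guess, α is the learning rate, and f' is the derivative of the cost function. These symbols are just my shorthand for previous_x, learning_rate, and df in the code above.

```latex
x_{n+1} = x_n - \alpha \, f'(x_n)
```

For example, with f(x) = x^2, a learning rate of 0.1, and a current guess of 3, the derivative at the guess is 6, so the next guess is 3 - 0.1 * 6 = 2.4.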

Then, we run these steps over and over in our for loop until the loop is over.

Before we run our gradient descent function, let’s add some print statements at the end of the function so we can see the x-value and the slope at the minimum of the function.

    print('Minimum occurs at x-value:', new_x)
    print('Slope at the minimum is: ', df(new_x))

Now, let’s run our gradient descent function and see what type of output we get with an example. In this example, the cost function is f(x) = x^2. The initial guess for x is 3 and the learning rate is 0.1.

my_fn = x**2
gradient_descent(my_fn, 3, 0.1)
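To get a feel for what the loop does on this example, here is a minimal sketch that traces the first few guesses without SymPy. It assumes the derivative of x^2 is hard-coded as 2x; the names derivative and trace_descent are made up for illustration and are not part of the function above.

```python
def derivative(x_value):
    # Derivative of f(x) = x**2, written by hand for this sketch
    return 2 * x_value


def trace_descent(initial_guess, learning_rate, steps):
    """Return the list of guesses produced by repeated gradient steps."""
    guesses = [initial_guess]
    for _ in range(steps):
        previous_x = guesses[-1]
        gradient = derivative(previous_x)
        guesses.append(previous_x - learning_rate * gradient)
    return guesses


guesses = trace_descent(3, 0.1, 100)
print("First guesses:", [round(g, 4) for g in guesses[:4]])
print("Final guess:", guesses[-1])
```

With these settings each step multiplies the guess by 0.8 (that is, 1 - 0.1 * 2), so the guesses move steadily towards the minimum at x = 0.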

Currently, we are running the learning loop an arbitrary number of times. In this example, the loop runs 100 times. But maybe we don’t need to run the loop this many times. Oftentimes you already know ahead of time how precise a calculation you need. You can tell the loop to stop running once a certain level of precision is met. There are many ways you can implement this, but I’ll show you using the for loop we already have.

precision = 0.0001

for n in range(100):
    previous_x = new_x
    gradient = df(previous_x)
    new_x = previous_x - learning_rate * gradient
    step_size = abs(new_x - previous_x)
    if step_size < precision:
        break

First, we define a precision value that the gradient descent algorithm should be within. You can also make this a parameter to the function if you choose.

Inside the loop, we create a new variable called step_size which is the distance between previous_x and new_x, which is the new guess that was just calculated in the “learning” step. We take the absolute value of this difference in case it’s negative.

If the step_size is less than the precision we specified, the loop will finish, even if it hasn’t reached 100 iterations yet.
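Putting the pieces together, one way to write the early-stopping version is sketched below. To keep it runnable without SymPy, it takes the derivative as a plain Python function. The name gradient_descent_with_precision, the max_iterations parameter, and the returned iteration count are assumptions of this sketch, not part of the original function.

```python
def gradient_descent_with_precision(derivative, initial_guess, learning_rate,
                                    precision=0.0001, max_iterations=100):
    """Run gradient descent until the step is smaller than `precision`."""
    new_x = initial_guess
    iterations = 0

    for _ in range(max_iterations):
        previous_x = new_x
        gradient = derivative(previous_x)
        new_x = previous_x - learning_rate * gradient
        iterations += 1

        # Stop early once the guesses have (almost) stopped moving
        step_size = abs(new_x - previous_x)
        if step_size < precision:
            break

    return new_x, iterations


# f(x) = x**2 has derivative 2x
min_x, iterations = gradient_descent_with_precision(lambda x: 2 * x, 3, 0.1)
print("Minimum occurs at x-value:", min_x)
print("Iterations used:", iterations)
```

Returning the iteration count is optional, but it makes it easy to check that the loop really did stop before reaching the maximum.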

Instead of solving a cost function analytically, the gradient descent algorithm converges on the minimum of a function iteratively, one small step at a time. Like a blind-folded hiker, the algorithm goes down the valley (the cost function), following the slope of the graph until it reaches the minimum point.

How Do Machines Learn?

You’ve probably heard of machine learning models that can read human handwriting or understand speech. You might know that these models had to be trained in order to accomplish these tasks– they had to learn. But how exactly does a machine “learn”? What are the steps involved?

In this article, I’m going to be giving a high-level overview of how the “learning” in machine learning happens. I’m going to talk about fundamental ML concepts including cost functions, optimization, and linear regression. I’ll outline the basic framework used in most machine learning techniques.

Data is the foundation of any machine learning model. In a nutshell, the data scientist feeds a bunch of data into the ML model and, as it starts to “learn” from the data, the model will eventually develop a solution. What is the solution? The solution is typically a function that describes the relationship in the data. For a given input, the function should be able to provide the expected output.

In the case of linear regression, one of the most basic ML models, the regression model “learns” two parameters: the slope and the intercept. Once the model learns these parameters to the desired extent, the model can be used to compute the output y for a given input X (in the linear regression equation y = b0 + b1*X). If you’re unfamiliar with linear regression, take a look at my article on linear regression to understand this better.

So now that we know what the goal of machine learning is, we can talk about how exactly the learning happens. The machine learning model usually follows three core steps in order to “learn” the relationship in the data as described by the solution function:

  1. Predict
  2. Calculate the error
  3. Learn

The first step is for the model to make a prediction. To start, the model may make arbitrary guesses for the values that it is solving for in the solution function. In the case of linear regression, the ML model would make guesses for the values of the slope and intercept.

Next, the model would check its prediction against the actual data and see how good or bad the prediction was. In other words, the model calculates the error in its prediction. In order to compare the prediction against the data, we need to find a way to measure how “good” our prediction was.

Finally, the model will “learn” from its error by adjusting its prediction to have a smaller error.

The model will repeat these 3 steps– predict, calculate error, and learn– a bunch of times and slowly come to the best coefficients for the solution. This simple 3-step algorithm is the basis for training most machine learning models.
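As a concrete illustration of the three steps, here is a small sketch that learns the intercept and slope of a line from a handful of points. The data, the learning rate, the number of iterations, and the use of mean squared error as the error measure are all assumptions chosen for this example.

```python
# Toy data that follows y = 1 + 2 * x exactly (made up for this sketch)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

b0, b1 = 0.0, 0.0          # arbitrary first guesses for intercept and slope
learning_rate = 0.05
n = len(xs)

for _ in range(2000):
    # Step 1: Predict with the current parameters
    predictions = [b0 + b1 * x for x in xs]

    # Step 2: Calculate the error (prediction minus actual value)
    errors = [pred - y for pred, y in zip(predictions, ys)]

    # Step 3: Learn by nudging each parameter against the gradient
    # of the mean squared error
    grad_b0 = 2 * sum(errors) / n
    grad_b1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    b0 -= learning_rate * grad_b0
    b1 -= learning_rate * grad_b1

print("intercept:", round(b0, 4), "slope:", round(b1, 4))
```

Because the toy data lies exactly on a line, the learned parameters should end up very close to an intercept of 1 and a slope of 2.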

When I talked about calculating error earlier, I didn’t talk about the ways in which we measure how “good” or “bad” our predictions are. That leads me to the next topic: cost functions. In machine learning, a cost function is a mechanism that returns the error between predicted outcomes and the actual outcomes. Cost functions measure the size of the error to help achieve the overall goal of optimizing for a solution with the lowest cost.

The objective of an ML model is to find the values of the parameters that minimize the cost function. Cost functions will be different depending on the use case but they all have this same goal.

The Residual Sum of Squares is an example of a cost function. In linear regression, the Residual Sum of Squares is used to measure the error in predicted coefficient values. It does this by summing the squared gaps between the predicted values on the linear regression line and the actual data point values (check out this article for more detail). The lowest sum indicates the most accurate solution.

Cost functions fall under the broader category of optimization. Optimization is a term used in a variety of fields, but in machine learning it is defined as the process of progressing towards the defined goal, or solution, of an ML model. This includes minimizing “bad things” or “costs”, as is done in cost functions, but it also includes maximizing “good things” in other types of functions.

In summary, machine learning is typically done with a fundamental 3-step process: make a prediction, calculate the error, and learn / make adjustments. The error in a prediction is calculated using a cost function. Once the error is minimized, the model is done “learning” and is left with a function that should provide the expected result for future data.

Introduction to Linear Regression

In this article, I will define what linear regression is in machine learning, delve into linear regression theory, and go through a real-world example of using linear regression in Python.

What is Linear Regression?

Linear regression is a machine learning algorithm used to measure the relationship between two variables. The algorithm attempts to model the relationship between the two variables by fitting a linear equation to the data.

In machine learning, these two variables are called the feature and the target. The feature, or independent variable, is the variable that the data scientist uses to make predictions. The target, or dependent variable, is the variable that the data scientist is trying to predict.

Before attempting to fit a linear regression to a set of data, you should first assess if the data appears to have a relationship. You can visually estimate the relationship between the feature and the target by plotting them on a scatterplot.

If you plot the data and suspect that there is a relationship between the variables, you can verify the nature of the association using linear regression.

Linear Regression Theory

Linear regression will try to represent the relationship between the feature and target as a straight line.

Do you remember the equation for a straight line that you learned in grade school?

y = mx + b, where m is the slope (the number describing the steepness of the line) and b is the y-intercept (the point at which the line crosses the vertical axis)

Equation of a Straight Line

Equations describing linear regression models follow this same format.

The slope m tells you how strong the relationship between x and y is. The slope tells us how much y will go up or down for a given increase or decrease in x, or, in this case, how much the target will change for a given change in the feature.

In theory, a slope of 0 would mean there is no relationship at all between the data. The weaker the relationship is, the closer the slope is to 0. But if there is a strong relationship, the slope will be a larger positive or negative number. The stronger the relationship is, the steeper the slope is.

Unlike in pure mathematics, in machine learning, the relationship denoted by the linear equation is an approximation. That’s why we refer to the slope and the intercept as parameters and we must estimate these parameters for our linear regression. We even use a different notation in which the intercept constant is written first and the parameters are written with Greek symbols:

y = β₀ + β₁x, where β₀ is the intercept and β₁ is the slope

Even though the notation is different, it’s the exact same equation of a line y=mx+b. It is important to know this notation though because it may come up in other linear regression material.

But how do we know where to make the linear regression line when the points are not straight in a row? There are a whole bunch of lines that can be drawn through scattered data points. How do we know which one is the “best” line?

There will usually be a gap between the actual value and the line. In other words, there is a difference between the actual data point and the point on the line (fitted value/predicted value). These gaps are called residuals. The residuals can tell us something about how “good” of an estimate our line is making.

Look at the size of the residuals and choose the line with the smallest residuals. Now, we have a clear method for the hazy goal of representing the relationship as a straight line. The objective of the linear regression algorithm is to calculate the line that minimizes these residuals.

For each possible line (slope and intercept pair) for a set of data:

  1. Calculate the residuals
  2. Square them to prevent negatives
  3. Add up the squared residuals

Then, choose the slope and intercept pair that minimizes the sum of the squared residuals, also known as Residual Sum of Squares.
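The procedure above can be sketched in a few lines. The data points and the two candidate lines below are invented for illustration, and residual_sum_of_squares is a hypothetical helper name.

```python
def residual_sum_of_squares(xs, ys, slope, intercept):
    """Sum of squared gaps between actual values and the line's predictions."""
    total = 0.0
    for x, y in zip(xs, ys):
        residual = y - (slope * x + intercept)   # 1. calculate the residual
        total += residual ** 2                   # 2. square it, 3. add it up
    return total


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]                 # these points lie exactly on y = 2x + 1

candidates = [(2, 1), (1, 3)]     # (slope, intercept) pairs to compare
best = min(candidates, key=lambda line: residual_sum_of_squares(xs, ys, *line))
print("Best line (slope, intercept):", best)
```

In practice there are infinitely many candidate lines, so the search is done with an optimization method or a closed-form solution rather than by listing pairs, but the quantity being minimized is the same.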

Linear regression models can also be used to estimate the value of the dependent variable for a given independent variable value. Using the classic linear equation, you would simply substitute the value you want to test for x in y = mx + b; y would be the model’s prediction for the target for your given feature value x.

Linear Regression in Python

Now that we’ve discussed the theory around Linear Regression, let’s take a look at an example.

Let’s say we are running an ice cream shop. We have collected some data for daily ice cream sales and the temperature on those days. The data is stored in a file called temp_revenue_data.csv. We want to see how strong the correlation between the temperature and our ice cream sales is.

import pandas
from pandas import DataFrame 

data = pandas.read_csv('temp_revenue_data.csv')

X = DataFrame(data, columns=['daily_temperature'])
y = DataFrame(data, columns=['ice_cream_sales'])

Next, import LinearRegression from scikit-learn (a machine learning library for Python). This will allow us to run linear regression models in just a few lines of code.

from sklearn.linear_model import LinearRegression

Next, create a LinearRegression() object and store it in a variable.

regression = LinearRegression()

Now that we’ve created our object we can tell it to do something:

regression.fit(X, y)

The fit method runs the actual regression. It takes in two parameters, both of type DataFrame. The feature data is the first parameter and the target data is the second. We are using the X and y DataFrames defined above.

The slope and intercept that were calculated by the regression are available in the following properties of the regression object: coef_ and intercept_. The trailing underscore is necessary.

# Slope Coefficient
regression.coef_

# Intercept
regression.intercept_
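For a single feature, the least-squares slope and intercept also have a well-known closed-form solution, which can help demystify what the regression computes. The sketch below uses made-up numbers rather than the ice cream data, and fit_line is a hypothetical helper, not a scikit-learn function.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # slope = covariance(x, y) / variance(x)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance

    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


temperatures = [20, 25, 30, 35]       # invented example values
sales = [200, 260, 290, 350]          # invented example values
slope, intercept = fit_line(temperatures, sales)
print("slope:", slope, "intercept:", intercept)
```

The slope here reads as "extra sales per one-degree increase in temperature", which is the same interpretation that applies to coef_.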

How can we quantify how “good” our model is? We need some kind of measure or statistic. One measure that we can use is called R² (R-squared), also known as the coefficient of determination, which is a measure of goodness of fit.

regression.score(X, y)
output: 0.5496...

The above output number, expressed as a percentage (about 55%), is the amount of variation in ice cream sales that is explained by the daily temperature.
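For intuition, R² can be computed by hand as 1 minus the ratio of the Residual Sum of Squares to the total variation of the target around its mean. Here is a minimal sketch, using invented actual and predicted values and a hypothetical helper name r_squared:

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    residual_ss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    total_ss = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - residual_ss / total_ss


actual = [200, 260, 290, 350]        # invented example values
predicted = [203, 251, 299, 347]     # predictions from some fitted line
print("R squared:", r_squared(actual, predicted))
```

A value of 1 means the predictions match the data exactly, while a value near 0 means the line explains little more than simply predicting the average.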

Note: The model is very simplistic and should be taken with a grain of salt. It especially does not do well on the extremes.

Complete Guide to JavaScript ES6 Destructuring

Destructuring assignment is a JavaScript technique in which you take the values from an array, or properties from objects, and assign them to local variables.

Using destructuring allows you to write code that is cleaner, more concise, and more readable.

We will first look at destructuring rules for arrays then for objects.


Let’s say we have an array of emojis:

const emojis = ['🐶', '🐱', '🐭'];

We want to pull out the values and assign them to local variables. We have two options for doing this:

 Option 1: One-by-One

const dog = emojis[0]; 
const cat = emojis[1];
const mouse = emojis[2];

By using destructuring assignment, you can accomplish the same thing with just one simple line.

 Option 2: Destructuring

const [dog, cat, mouse] = emojis;

Syntax for Destructuring an Array

Follow the const with a set of brackets []. Inside the brackets, you can assign a variable name for each index in the array. The variable’s value will coincide with the index at which it matches in the array.

Omit values from destructuring

If there is an array value that you don’t want to assign to a variable, you can omit it from the destructuring by adding a comma without a variable name to skip that index. In the example below, we are omitting cat:

const [dog, , mouse] = emojis;

Put remaining values in separate array

If you only want to name the first couple of values and collect the rest into a smaller array, use the rest syntax (...).

In the example below, we are creating a variable for the first element then putting the rest in a shortened array:

const [dog, ...rest] = emojis;
console.log(rest);        // ['🐱', '🐭']

Provide a default value

You can set a default value for the element at an index in case the value in the array is undefined. Think of it like a “fallback” value.

const emojis = [undefined, '🐱', '🐭'];
const [dog = '🐕', cat, mouse] = emojis;
console.log(dog);         // '🐕'


Let’s say we have an object describing a person:

const person = {
    name: 'Alex',
    age: 23,
    inSchool: true
};

We want to pull out the properties of the object and assign them to local variables. We have two options for doing this:

 Option 1: One-by-One

const name = person.name;
const age = person.age;
const inSchool = person.inSchool;

 Option 2: Destructuring

const { name, age, inSchool } = person;

Syntax for Destructuring an Object

Follow the const with a set of braces {}. Inside the braces, put the names of object properties that you want.

The variable names in the destructured object (on the left of the equals) must match the property names exactly.

Use custom variable name for property

To use a different name for the variable than what’s provided on the object itself, put the property name then a colon followed by the new variable name.

const { name: personName, age, inSchool } = person;
console.log(personName);       // 'Alex'

Defining a new name is useful when dealing with name collisions or when destructuring JSON objects whose property names are not valid variable names (for example, names that contain hyphens or spaces).

Nested property destructuring

You can also access the properties of objects within objects through destructuring.

const people = {
    person: {
        name: 'Bob'
    }
};
const { person: { name } } = people;
console.log(name); // 'Bob'

Provide a default value

You can provide a default value for a property in case the object does not have that property.

Since the person object we defined above doesn’t have a job property, it will be set to the default value ( ‘Unemployed’):

const { name, age, inSchool, job = 'Unemployed' } = person;


We have an array of dogs:

const dogs = [
    { name: "Sally", age: 6, children: { name: "Blue", age: 1 } },
    { name: "Fido", age: 4 },
    { name: "Sissy", age: 3 },
];

How would you use destructuring to assign a variable for Sally’s child’s name with just two lines?


const [sallyInfo] = dogs;
const { children: { name: blueName } } = sallyInfo;
console.log(blueName);       // 'Blue'

I hope you enjoyed this lesson. Thanks for reading! Comment any questions.

Flutter Error: The argument type ‘String’ can’t be assigned to the parameter type ‘Uri’

The Error

If you are using a string URI when dealing with the http package in Flutter, you may be seeing this error:

The argument type 'String' can't be assigned to the parameter type 'Uri' at .... (argument_type_not_assignable)

This error is due to an update in the package.

The Solution

Parse the String to be an explicit URI by using the Uri.parse() method:

http.get(yourString) becomes http.get(Uri.parse(yourString))

Here it is in an example:

String dataURL = "";
http.Response response = await http.get(Uri.parse(dataURL));

To improve compile-time type safety, the http package (version 0.13.0) introduced changes that made all functions that previously accepted Uris or Strings now accept only Uris instead.

You will need to explicitly use Uri.parse to convert Strings to Uris. In the previous version, the http package called that for you behind the scenes.

How to Use MediaStreams in React

Do you need to access the user’s webcam for video chat or the user’s microphone for a recording? In this simple tutorial, I’ll show you how to access and use the computer’s media inputs with React.

The MediaDevices interface provides access to connected media input devices like cameras and microphones.

Get access to user’s media input

After getting the user’s permission, the MediaDevices.getUserMedia() method produces a MediaStream. This stream can have multiple tracks containing the requested types of media. Examples of tracks include video and audio tracks.

The getUserMedia() method takes in a constraints parameter with two members: video and audio, describing any configuration settings for the tracks. It returns a Promise that resolves to a MediaStream object. You can set your video element’s srcObject to the stream.

// get the user's media stream
const startStream = async () => {
    const newStream = await navigator.mediaDevices.getUserMedia({
        video: true,
        audio: true,
    });
    webcamVideo.current.srcObject = newStream;
};

Here are some examples of preferences that you can customize in the stream:

// Requests default video and audio
{ audio: true, video: true }

// Requests video with a preference for 1280x720 camera resolution. No audio
{
  audio: false,
  video: { width: 1280, height: 720 }
}

// Requires minimum resolution of 1280x720
{
  audio: true,
  video: {
    width: { min: 1280 },
    height: { min: 720 }
  }
}

// Browser will try to get as close to ideal as possible
{
  audio: true,
  video: {
    width: { min: 1024, ideal: 1280, max: 1920 },
    height: { min: 576, ideal: 720, max: 1080 }
  }
}

// On mobile, browser will prefer front camera
{ audio: true, video: { facingMode: "user" } }

// On mobile, browser will require the rear camera
{ audio: true, video: { facingMode: { exact: "environment" } } }

Save user’s media stream in a variable

After you get the user’s media stream from .getUserMedia(), you should save the stream in a state variable. This is so that you can manipulate the stream later (to stop it, get a track from it, etc.)

For example, if you want to stop the stream, get all of the stream’s tracks using the MediaStream.getTracks() method and call the .stop() method on them.

If you want to access the audio separately, use the MediaStream.getAudioTracks() method. To access video separately, use MediaStream.getVideoTracks().

You should also have state that controls if media input is on or off. You should use the useRef hook to control the video element in the DOM.

This is the final code:

import React, { useState, useRef } from 'react';

const App = () => {
    // controls if media input is on or off
    const [playing, setPlaying] = useState(false);

    // controls the current stream value
    const [stream, setStream] = useState(null);

    // controls if audio/video is on or off (separately from each other)
    const [audio, setAudio] = useState(true);
    const [video, setVideo] = useState(true);

    // controls the video DOM element
    const webcamVideo = useRef();

    // get the user's media stream
    const startStream = async () => {
        const newStream = await navigator.mediaDevices.getUserMedia({
            video: true,
            audio: true,
        });
        webcamVideo.current.srcObject = newStream;
        setStream(newStream);
        setPlaying(true);
    };

    // stops the user's media stream
    const stopStream = () => {
        stream.getTracks().forEach((track) => track.stop());
        setPlaying(false);
    };

    // enable/disable audio tracks in the media stream
    const toggleAudio = () => {
        setAudio(!audio);
        stream.getAudioTracks()[0].enabled = !audio;
    };

    // enable/disable video tracks in the media stream
    const toggleVideo = () => {
        setVideo(!video);
        stream.getVideoTracks()[0].enabled = !video;
    };

    return (
        <div className="container">
            <video ref={webcamVideo} autoPlay playsInline></video>
            <button onClick={playing ? stopStream : startStream}>
                Start webcam
            </button>
            <button onClick={toggleAudio}>Toggle Sound</button>
            <button onClick={toggleVideo}>Toggle Video</button>
        </div>
    );
};

export default App;

How to Format on Save in VSCode

If format on save was previously not working for you, this is what fixed it for me.

  1. Install the Prettier extension in VSCode
  2. Open the Settings pane (Command + , on macOS, Ctrl + , on Windows and Linux)
  3. Make sure the Editor: Format on Save property is enabled (You can search and find the property in the search bar)
  4. Set the Editor: Default Formatter to Prettier – Code formatter

Next.js Global State w/ React Context API

The easiest way to implement global state management in your Next.js application is the React Context API.

I made a new file called contexts/MyProvider.js and created a Context object called MyContext:

import React, { useState, useContext } from 'react';

const MyContext = React.createContext();

export function useMyContext() {
    return useContext(MyContext);
}

export function MyProvider({ children }) {
    const [myValue, setMyValue] = useState(false);

    return (
        <MyContext.Provider value={{ myValue, setMyValue }}>
            {children}
        </MyContext.Provider>
    );
}
Once this is done, go back to pages/_app.js and wrap your component with the context provider:

import { MyProvider } from '../contexts/MyProvider';

function MyApp({ Component, pageProps }) {
    return (
        <MyProvider>
            <Component {...pageProps} />
        </MyProvider>
    );
}

export default MyApp;

Now I can use the state within MyContext in every part of the application.

Be careful about how much you put into Context. Every component that consumes the context re-renders when its value changes, so you don’t want unnecessary re-renders across pages when you could just share that state between the specific components that need it.

Choosing the Best Type of Website for You

There are different types of websites/web apps that come with their own advantages and tradeoffs. The main categories of websites include:

  1. Static websites
  2. Single page applications (SPAs)
  3. Server side rendered (SSR) sites
  4. Static site generator (SSG) sites

When choosing which type of website is best for your needs, you need to look at many factors including SEO, speed, ease of maintenance, technical skill, hosting, and more.

In this article, I will be comparing these four different types of websites based on SEO, speed, and ease of maintenance.

SEO (Search Engine Optimization) relates to the process of increasing the traffic to a website from search engines.

Speed refers to how fast your website loads upon initial request and on any subsequent page requests.

Ease of maintenance deals with how convenient it is to update your code; is it modular or repeated?

Static websites

Static websites are made up of static HTML pages (which may include JavaScript and CSS).

Static simply means the site is not generated on the server. Instead of being rendered dynamically, the HTML (and CSS/JS) files just sit on the server waiting to be sent off to a client.

These website pages are uploaded to a CDN / web host. To view a new page on the site, a new request needs to be made to the server.

Static websites have good SEO because web crawlers can view all of the HTML content since the fully populated pages are sent from the server.

On the other hand, static sites are more annoying to update because they require more code re-writing. For example, if you want to change the design of a navigation bar used on all pages, you must manually update it on each page.

These sites can also be slow when navigating to a new page because you need to make a new request to the server each time.

These websites also are not the best at handling and displaying dynamic data.

Single Page Application (SPA)

A single page application is a web application that loads only a single page and updates the body content dynamically using client-side JavaScript.

Only a single server request is made for an initial mostly-empty HTML page. The pages are populated with content using JS on the client side.

Some examples of SPA frameworks are React and Vue. These frameworks control all of the content, pages, and data fetching from the browser, not the server.

SPAs are faster than traditional static sites when navigating between pages because SPAs only require a single server request for the page itself.

SPAs utilize component-driven design, which makes updating the UI easier because you only have to update it in one place.

SPAs are not SEO friendly because the server sends back a mostly empty HTML page, so web crawlers may not see the content.

Server Side Rendered (SSR)

Server side rendered pages are rendered on the server, on the fly, after every page request.

When a page request is made, the server first gets any data for the page from a database, then puts that data into a template file, and sends back the resulting HTML page to the browser.

SSR pages are good for SEO and they are easy to re-design because they use templating.

Because a fresh request must be made for every page, these types of sites can be slow.

Static Site Generator

Static Site Generation describes the process of compiling and rendering a website at build time before the site is deployed to the server.

Static pages are compiled at build-time (before deployment)

Before deployment, the SSG framework builds the site and fetches any data that needs to be put into pages. The framework then spits out static pages and a JS bundle that can be deployed to a CDN or static host.

The initial request for the site requires the server to send the files (similar to a static site), but the site behaves more like a SPA afterwards: all routing is handled by the JS bundle.

This makes SSGs good for SEO, speed, and updatability.

You will have to decide which type of website is best for you based on the strengths and weaknesses of each type and how important these factors are to the website you’re building.