LeetCode #125: Valid Palindrome (Solution in Python & Explanation)

In this article, I will be explaining how to solve the Valid Palindrome problem on LeetCode. This is an Easy-level problem.

The Problem

A phrase is a palindrome if, after converting all uppercase letters into lowercase letters and removing all non-alphanumeric characters, it reads the same forward and backward. Alphanumeric characters include letters and numbers.

Given a string s, return true if it is a palindrome, or false otherwise.

Example 1:

Input: s = "A man, a plan, a canal: Panama"
Output: true
Explanation: "amanaplanacanalpanama" is a palindrome.

Example 2:

Input: s = "race a car"
Output: false
Explanation: "raceacar" is not a palindrome.

Example 3:

Input: s = " "
Output: true
Explanation: s is an empty string "" after removing non-alphanumeric characters.
Since an empty string reads the same forward and backward, it is a palindrome.

Solution

Alphanumeric characters are the lowercase letters a-z, capital letters A-Z, and integers 0-9.

So, to solve this, we first want to remove all non-alphanumeric characters, including spaces, punctuation, etc., from the input string. Then, we need to check the new string to see if it’s a palindrome. We have two ways we can do this.

Solution #1

For this solution, we create a new string which contains all of the alphanumeric characters from the input string (lowercased). Then, we compare this new string to its reversed version. If they are the same, the function returns true. If not, it returns false.

In order to create the new string, we loop through the characters in the original input string and, if the character is alphanumeric, we append it to the new string.

Finally, we can check if the string (without any special characters or spaces) is a palindrome by using a boolean expression to equate it to its reverse. We can get the reverse of a string by using str[::-1]

def isPalindrome(self, s: str) -> bool:
    newStr = ""
    
    for char in s:
        if char.isalnum():
            newStr += c.lower()
    return newStr == newStr[::-1]

This solution has a time complexity of O(n) because we have to loop through the string to remove non-alphanumeric characters. It also has a space complexity of O(n) because we are creating another string, which could be at most the size of the input string.

Solution #2

For this solution, we will create pointers to the beginning and end of the array which will make sure that the two sides have the same characters in the same order.

First, create two pointers— one which points to the beginning of the string and the other which points to the end. The two pointers will move inwards, checking each character until the reach the middle or overlap with each other.

As the pointers loop through the characters, they will skip any characters that are not alphanumeric. To do this, we use while loops that will continue to run until the character at the left/right index is alphanumeric. We also add a check to make sure that the left/right index is in bounds by checking left < right / right > left

We must make sure to compare the lowercase versions of the characters to each other. If any character at the left pointer does not match the corresponding character at the right pointer, then we know that it’s not a palindrome and can return False. If the entire loop finishes then that must mean the string is a palindrome and we can return True.

def isPalindrome(self, s: str) -> bool:
    left, right = 0, len(s) - 1 
 
    while left < right:
        while left < right and not s[left].isalnum():
            left +=1
        while right > left and not s[right].isalnum():
            right -= 1
        if s[left].lower() != s[right].lower():
            return False 
        left, right = left + 1, right - 1
    return True

The time complexity for this solution is O(n) because you are still looping through the entire string. The space complexity is O(1) because you are not making a new string.

LeetCode: Contains Duplicates (Solution in Python & Explanation)

In this article, I will be explaining how to solve the Contain Duplicates problem on LeetCode. This is an Easy-level question for Arrays.

The Problem

Given an integer array nums, return true if any value appears at least twice in the array, and return false if every element is distinct.

Example 1:

Input: nums = [1,2,3,1]
Output: true

Example 2:

Input: nums = [1,2,3,4]
Output: false

Example 3:

Input: nums = [1,1,1,3,3,4,3,2,4,2]
Output: true

Solution

This problem requires that we loop through the array. We may not have to loop through the entire array depending on when / if we find the first duplicate.

As we loop through the array, we check each array item to see if we’ve seen it before. How can we check if we’ve seen a value in the array before? The naive method would be to do a second loop through the array for each. The program will have a complexity of O(n2) because you are looping through each array element and then, within that loop, you are looping through each array element again.

def containsDuplicate(self, nums: List[int]) -> bool:
        dups = set()
        for n in nums:
            for s in nums:
                if n == s:
                   return True
            dups.add(n)
        return False

This solution is slow and we can solve this problem in a more efficient way.

As we loop through the array, we can keep track of the values that we’ve seen thus far. Then, as we continue looping through and checking new values, we can check if they are already in the set of values that we’ve seen before. How can we keep track of the values that we’ve seen? We can use a hash set. This solution takes O(n) time because we are only looping through the array once instead of twice.

def containsDuplicate(self, nums: List[int]) -> bool:
        dups = set()
        for n in nums:
            if n in dups:
                return True
            dups.add(n)
        return False

There is another solution that is even more simple in terms of readability. It is only one line. Basically, we convert the list of numbers into a hash set and compare the length of the set to the length of the original list. If the lengths are different then that must mean that there are duplicate items in the original list because the duplicates are taken out of the set.

This solution uses the no duplicates property of hash sets to solve the problem in an extremely simple way. The time complexity of this solution is also O(n) because that is the complexity of converting a list to a set.

def containsDuplicate(self, nums: List[int]) -> bool:
        return len(set(nums))!=len(nums)

Overlapping Histograms with Matplotlib Library in Python

In this tutorial, you will learn how to plot overlapping histograms on the same graph. This is helpful when you want to show a comparison between two sets of data.

Step 1: Import the matplotlib library and matplotlib.pyplot interface

import pandas as pd

import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt

Step 2: Load the dataset

baby_df = pd.read_csv('baby.csv')
baby_df.head(5)

Step 3: Plot overlapping histograms

# Split the dataframe by column value
smoker_df = baby_df.loc[baby_df['Maternal Smoker'] == True]
nonsmoker_df = baby_df.loc[baby_df['Maternal Smoker'] == False]

# Generate histogram plot
plt.hist(smoker_df["bmi"], 
         label='Maternal Smokers BMI')
  
plt.hist(nonsmoker_df['bmi'], 
         label='Maternal Non-smokers BMI')
  
plt.legend(loc='upper right')
plt.title('Mother BMI for smokers and non-smokers')
plt.show()

We’ve generated overlapping histograms! But, in this graph, it’s hard to see the blue histogram. We can give the histograms an opacity value less than 1.0 so that they become translucent, or see-through. This will allow us to see both of them.

Set alpha values

The only difference is adding the optional alpha parameter to the hist method. The alpha value can be any decimal between 0 and 1. Each plot can have a unique alpha value.

# Generate histogram plot
plt.hist(smoker_df["bmi"], 
         alpha=0.5,
         label='Maternal Smokers BMI')
  
plt.hist(nonsmoker_df['bmi'], 
         alpha=0.5
         label='Maternal Non-smokers BMI')

You can also have more than 2 overlapping plots. In this case, it can be helpful to manually set the color for each histogram. We can do this by adding the color parameter to each hist call:

plt.hist(dataset['value'], 
         alpha=0.5, 
         label='val 1',
         color='red') # customized color parameter
  
plt.hist(dataset['value2'], 
         alpha=0.5,
         label='val 2',
         color='green')
  
plt.hist(dataset['value3'], 
         alpha=0.5,
         label='val3',
         color='yellow')

How to Split Pandas DataFrame into Multiple DataFrames by Column Value in Python

In this tutorial, I will show you how to create a separate DataFrame for each value for a given column.

Use the pandas.DataFrame.groupby(column) method to group the DataFrame by the values found in the column named column.

grouped_df = df.groupby('Column')

This method returns a GroupBy object. You can see this yourself by printing the new dataframe by running print grouped_df:

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7efe3ec55670>

With the GroupBy object from the previous function, call the DataFrameGroupBy.get_group(group) method on the new dataframe grouped_df

This method will return a Dataframe of all the rows that have the value group in the column named column.

grouped_df.get_group('column_value')

Alternatively, you can call both methods in one line:

value_df = df.groupby('Column').get_group('column_value')

Return the n largest/smallest values for column in DataFrame

Get n-largest values from a particular column in Pandas DataFrame

df.nlargest(5, 'Gross')

Return the first n rows with the smallest values for column in DataFrame

df.nsmallest(5, ['Age'])

To order by the smallest values in column “Age” and then “Salary”, we can specify multiple columns like in the next example.

df.nsmallest(5, ['Age', 'Salary'])

There is also an optional keep parameter for the nlargest and nsmallest functions. keep has three possible values: {'first', 'last', 'all'}. The default is 'first'

Where there are duplicate values:

  • first : take the first occurrence.
  • last : take the last occurrence.
  • all : do not drop any duplicates, even it means selecting more than n items.
df.nlargest(5, 'Gross', keep='last')

Working with a New Dataset / DataFrame

When you are working with a new Pandas DataFrame, these attributes and methods will give you insights into key aspects of the data.

The dir function let’s you look at all of the attributes that a Python object has.

dir(df)

The shape attribute returns a tuple of integers indicating the number of elements that are stored along each dimension of an array. For a 2D-array with N rows and M columns, shape will be (N,M). 

df.shape

You may be working with a dataframe that has hundreds or thousands of rows. To get a glimpse of the data inside a dataframe without printing out all of the values you can use the head and tail methods.

Returns the first n rows in the dataframe

df.head() # returns rows 0-4
df.head(n) # returns the first n rows

Returns the last n rows in the dataframe

df.tail()
df.tail(n)

The count method of a dataframe shows you the number of entries for each column

df.count()

Check if there are any missing values in any of the columns

pd.isnull(df).any()

The info method of the dataframe gives a bunch of information. It tells

  1. The number of entries in the df
  2. The names of the columns
  3. The number of columns
  4. The number of entries in each column
  5. The dtype of each column
  6. If there are null values in a column
df.info()

Different Ways to Create Pandas DataFrames

A Pandas DataFrame is a 2D labeled data structure with columns of potentially different types.

There are a variety of different methods and syntaxes that can be used to create a pd.DataFrame.

Firstly, make sure you import the pandas module:

import pandas as pd

Method 1: Creating DataFrame from list of lists

# initialize list of lists
data = [['bob', 20], ['jane', 30], ['joe', 40]]
 
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Age'])
df

Output:

Method #2: Creating DataFrame from dictionary of lists

In this method, you define a dictionary which has the column name as the key which corresponds to an array of row values.

# initialize dictionary of lists
data = {'Name': ['Bob', 'Joe', 'Jane', 'Jack'],
        'Age': [30, 30, 21, 40]}
 
# Create DataFrame
df = pd.DataFrame(data)
df

Output:

You can use custom index values for the DataFrame by adding a parameter to the pd.DataFrame function. Set the optional index parameter of the pd.DataFrame function to an array of strings for the index values.

df = pd.DataFrame(data, index=['first',
                                'second',
                                'third',
                                'fourth'])
df

Output:

In the same way that we just defined the index values, you can also define the column names separately. Set the optional columns parameter of the pd.DataFrame function to an array of strings for the column values.

Notice that the row values are now defined as a list of lists rather than a dictionary of lists. This is because the column values are no longer being defined with them.

df = pd.DataFrame(
    [[4,5,6],
     [7,8,9],
     [10,11,12]],
    index = ['row_one','row_two','row_three'],
    columns=["a","b","c"]
    )

df

Output:

Method #3: Creating DataFrame using zip() function.

The zip function returns an iterator of tuples where the corresponding items in each passed iterator is paired together. By calling the list function on the object returned from the zip function, we convert the object to a list which can be passed into the pd.DataFrame function.

name = ["Bob", "Sam", "Sally", "Sue"]
age = [19, 17, 51, 49]

data = list(zip(name, age))

df = pd.DataFrame(data,
                  columns = ['Name', 'Age'])

df

Output:

How to Split Training and Test Data in Python

In this article, I’ll be explaining why you should split your dataset into training and testing data and showing you how to split up your data using a function in the scikitlearn library.

If you are training a machine learning model using a limited dataset, you should split the dataset into 2 parts: training and testing data.

The training data will be the data that is used to train your model. Then, use the testing data to see how the algorithm performs on a dataset that it hasn’t seen yet.

If you use the entire dataset to train the model, then by the time you are testing the model, you will have to re-use the same data. This provides a slightly biased outcome because the model is somewhat “used” to the data.

We will be using the train_test_split function from the Python scikitlearn library to accomplish this task. Import the function using this statement:

from sklearn.model_selection import train_test_split

This is the function signature for the train_test_split function:

sklearn.model_selection.train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)

The first parameters to the function are a sequence of arrays. The allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.

So the first argument is gonna be our features variable and the second argument is gonna be our targets.

# X = the features array
# y = the targets array
train_test_split(X, y, ...)

The next parameter test_size represents the proportion of the dataset to include in the test split. This parameter should be either a floating point number or None (undefined). If it is a float, it should be between 0.0 and 1.0 because it represents the percentage of the data that is for testing. If it is not specified, the value is set to the complement of the train size.

This is saying that I want the test data set to be 20% of the total:

train_test_split(X, y, test_size=0.2)

train_size is the proportion of the dataset that is for training. Since test_size is already specified, there is no need to specify the train_size parameter because it is automatically set to the complement of the test_size parameter. That means the train_size will be set to 1 – test_size. Since the test_size is 0.2, train_size will be 0.8.

The function has a shuffle property, which is set to True by default. If shuffle is set to True, the function will shuffle the dataset before splitting it up.

What’s the point of shuffling the data before splitting it? If your dataset is formatted in an ordered way, it could affect the randomness of your training and testing datasets which could hurt the accuracy of your model. Thus, it is recommended that you shuffle your dataset before splitting it up.

We could leave the function like this or add another property called random_state.

random_state controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls. We are using the arbitrary number 10. You can really use any number.

train_test_split(X, y, test_size=0.2, random_state=10)

The function will return four arrays to us: a training and testing dataset for the feature(s), and a training and testing dataset for the target.

We can use tuple unpacking to store the four values that the function returns:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)

Now, you can verify that the splitting was successful.

The percent of the training set will be the number of rows in X_train divided by the total number of rows in the dataset as a whole:

len(X_train)/len(X)

The percent of the testing dataset will be the number of rows in X_test divided by the total number of rows in the dataset:

len(X_train)/len(X)

The numbers returned by these calculations will probably not be exact numbers. For example, if you are using an 80/20 split, then this division by give you numbers like 0.7934728 instead of 0.80 and 0.1983932 instead of 0.20.

That’s it!

How to set Python default version to 3.x on macOS

In this tutorial, I’ll show you how to update your Mac’s default version of Python from 2.x to 3.x.

If Python 3.X is already installed on your system, even if it’s not the default version, then you should be able to run version 3.X using the python3 command in your Terminal. But in this tutorial, I’ll be showing you how you can run version 3.X using the default python command.

Step 0: Install Homebrew

Homebrew is a very popular package manager for macOS. If you don’t already have it installed, you can install it by running this command:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Step 1: Install Python with Homebrew

This command will Python 3.

brew install python

Step 2:

Next, we’ll check the symlinks for Python 3.x. A symlink or a Symbolic Link is simply enough a shortcut to another file.

ls -l /usr/local/bin/python*

This should output something like the following:

Look at the first line. It shows default python being symlinked to the brew installed python3. If it does not show this exactly, then set it as the default python symlink run the following:

ln -s -f /usr/local/bin/python3 /usr/local/bin/python

You should probably run this command just to be sure. Run ls -l /usr/local/bin/python* again and make sure you have the desired output.

Step 3: Verify Python 3.X install

Run the following command to see which executable Python binary will launch when you issue a python command to the shell:

which python

The output should be this: /usr/local/bin/python

Finally, check if the default Python version has changed:

python --version

The terminal should output the newest version of Python 3.x. For me, it is Python 3.9.7.

That’s it! Hope this helps.

How to Create 3-D Charts with Matplotlib in Jupyter Notebook

In this article, I will show you how to work with 3D plots in Matplotlib. Specifically, we will be making a 3D line plot and surface plot.

First, import matplotlib and numpy. %matplotlib inline just sets the backend of matplotlib to the ‘inline’ backend, so the output of plotting commands shows inline (directly below the code cell that produced it).

import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

Add this import statement to work with 3D axes in matplotlib:

from mpl_toolkits.mplot3d.axes3d import Axes3D

Now, let’s generate an empty 3D plot

fig = plt.figure()
ax = plt.axes(projection='3d')

plt.show()

3-D Line Plot

Now, it’s time to put a graph on the plot. We’ll start by making a 3D line plot.

We need to define values for all 3 axes:

z = np.linspace(0, 1, 100)
x = z * np.sin(25 * z)
y = z * np.cos(25 * z)

If we print out the shape of the arrays we just created, we’ll see that they are one-dimensional arrays.

>>> print('Z Array: ', z.shape)
Z Array:  (100,)

This is important because the plot3D function only accepts 1D arrays as inputs. Now, we can add the plot!

ax.plot3D(x, y, z, 'blue')

plt.show()

Final code:

import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d.axes3d import Axes3D
%matplotlib inline

fig = plt.figure()
ax = plt.axes(projection='3d')

z = np.linspace(0, 1, 100)
x = z * np.sin(25 * z)
y = z * np.cos(25 * z)

ax.plot3D(x, y, z, 'blue')

plt.show()

3-D Surface Plot

Now, we will make a 3D surface graph on the plot.

We need to define values for all 3 axes:

x = np.linspace(start=-2, stop=2, num=200)
y = x_4.copy().T
z = 3**(-x_4**2-y_4**2)

The x, y, z arrays we just created are all 1D arrays. The surface plot function requires 2D array inputs so we need to convert the numpy arrays to be 2D. We can use the reshape function for this.

x = np.reshape(x,(1, x.size))
y = np.reshape(y,(1, y.size))
z = np.reshape(z,(1, z.size))

So now if we print our arrays, we see that they’re 2D.

>>> print('X Array: ', x.shape)
X Array:  (200, 200)

Now, we can plot the surface graph!

ax.plot_surface(x, y, z)

plt.show()

Final code:

import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d.axes3d import Axes3D
%matplotlib inline

fig = plt.figure()
ax = plt.axes(projection='3d')

x = np.linspace(start=-2, stop=2, num=200)
y = x_4.copy().T
z = 3**(-x_4**2-y_4**2)

x = np.reshape(x,(1, x.size))
y = np.reshape(y,(1, y.size))
z = np.reshape(z,(1, z.size))

ax.plot_surface(x, y, z)

plt.show()

We can also add a title and axis labels to the graph:

ax.set_title('My Graph')

ax.set_xlabel('X', fontsize=20)
ax.set_ylabel('Y', fontsize=20)
ax.set_zlabel('Z', fontsize=20)

This is just a basic intro to 3D charting with Matplotlib. There are a variety of other types of plots and customizations that you can make. Happy graphing!