df – Joshua Bowen's Notes

Get n-largest values from a particular column in Pandas DataFrame

df.nlargest(5, 'Gross')

Return the first n rows with the smallest values for column in DataFrame

df.nsmallest(5, ['Age'])

To order by the smallest values in column “Age” and then “Salary”, we can specify multiple columns like in the next example.

df.nsmallest(5, ['Age', 'Salary'])

There is also an optional keep parameter for the nlargest and nsmallest functions. keep has three possible values: {'first', 'last', 'all'}. The default is 'first'

Where there are duplicate values:

first : take the first occurrence.
last : take the last occurrence.
all : do not drop any duplicates, even it means selecting more than n items.

df.nlargest(5, 'Gross', keep='last')

When you are working with a new Pandas DataFrame, these attributes and methods will give you insights into key aspects of the data.

The dir function let’s you look at all of the attributes that a Python object has.

dir(df)

The shape attribute returns a tuple of integers indicating the number of elements that are stored along each dimension of an array. For a 2D-array with N rows and M columns, shape will be (N,M).

df.shape

You may be working with a dataframe that has hundreds or thousands of rows. To get a glimpse of the data inside a dataframe without printing out all of the values you can use the head and tail methods.

Returns the first n rows in the dataframe

df.head() # returns rows 0-4
df.head(n) # returns the first n rows

Returns the last n rows in the dataframe

df.tail()
df.tail(n)

The count method of a dataframe shows you the number of entries for each column

df.count()

Check if there are any missing values in any of the columns

pd.isnull(df).any()

The info method of the dataframe gives a bunch of information. It tells

The number of entries in the df
The names of the columns
The number of columns
The number of entries in each column
The dtype of each column
If there are null values in a column

df.info()

Tag: df

Return the n largest/smallest values for column in DataFrame

Working with a New Dataset / DataFrame