Python count unique values in column

To count unique values within a column using Python's pandas library, there are several effective approaches including using methods like Series.unique(), Series.nunique(), and value_counts(). We can also manage multiple columns, eliminate duplicates, and tally unique values within individual rows.

Welcome to our blog on counting distinct values in a pandas DataFrame column using Python! If you've ever worked with data, you know that counting unique values is a common task that can reveal valuable insights. In this article, we'll walk you through several ways to count unique values in a column: Series.unique(), Series.nunique(), value_counts(), and drop_duplicates().

Moreover, we'll explore how to handle multiple columns and even count unique values in each row. Let's dive in and unlock the power of Python to handle your data with ease!


Python count unique values in column using Series.unique()

The Series.unique() method returns the unique values of a column as an array (a NumPy array for most dtypes), in the order in which they first appear. Let's assume we have a DataFrame df with a column called "Courses" that contains various course names. To get the count of unique courses, we call unique() and then read the size attribute of the resulting array.

# Example DataFrame
import pandas as pd

data = {'Courses': ['Math', 'Science', 'History', 'Math', 'Geography']}
df = pd.DataFrame(data)

# Get Unique Count using Series.unique()
count = df['Courses'].unique().size
print("Number of unique courses:", count)

In this example, the output will be Number of unique courses: 4, as we have four distinct courses in the "Courses" column.
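To see what unique() hands back before we count it, here is a minimal sketch. It rebuilds the same course names as a standalone Series (the variable names are ours, chosen for illustration) and shows that the values come back in order of first appearance rather than sorted.

```python
import pandas as pd

# Same course names as in the example DataFrame, as a standalone Series
courses = pd.Series(['Math', 'Science', 'History', 'Math', 'Geography'], name='Courses')

# unique() returns an array-like of distinct values, in order of first appearance
unique_courses = courses.unique()

print(list(unique_courses))  # ['Math', 'Science', 'History', 'Geography']
print(len(unique_courses))   # 4
```

Because the result is array-like, len(unique_courses) and unique_courses.size give the same count.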

Python count unique values in column using Series.nunique()

The Series.nunique() method is another handy way to count unique values in a column. Instead of calling the unique() method and then calculating the size, we can directly use nunique() to get the count of unique elements in the column. Here's how to do it:

# Example DataFrame (continuation from the previous example)
count = df['Courses'].nunique()
print("Number of unique courses:", count)

The output will be the same as before, Number of unique courses: 4. The nunique() method makes it more convenient to get the count of unique values in a column.
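One difference worth knowing: nunique() ignores missing values by default, while unique() keeps them. The sketch below uses a small made-up Series containing NaN to show how the two counts can diverge, and how the dropna parameter brings them back in line.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing entries
courses = pd.Series(['Math', 'Science', np.nan, 'Math', np.nan])

# By default, nunique() leaves NaN out of the count
count_default = courses.nunique()

# dropna=False counts NaN as one additional distinct value
count_with_nan = courses.nunique(dropna=False)

# unique() keeps NaN, so its size matches the dropna=False count
count_via_unique = courses.unique().size

print(count_default, count_with_nan, count_via_unique)  # 2 3 3
```

If your column may contain missing data, decide up front whether NaN should count as a value, and pick the call accordingly.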

Python count unique values in column and frequency of each value

Sometimes, we may need to know how many times each unique value appears in a column. The value_counts() method comes in handy for this task. Let's take our previous DataFrame and find the frequency of each course:

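# Example DataFrame (continuation from the previous example)
frequency = df['Courses'].value_counts()
print("Frequency of each course:")
print(frequency)  # prints each course with its count, most frequent first

Output:

Frequency of each course:
Math         2
Science      1
History      1
Geography    1
Name: Courses, dtype: int64

In pandas 2.0 and later, the index is labelled Courses and the last line reads Name: count, dtype: int64.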
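value_counts() can also report proportions instead of raw counts, and the length of its result is yet another way to get the number of unique values. A minimal sketch, using the same course names as a standalone Series:

```python
import pandas as pd

courses = pd.Series(['Math', 'Science', 'History', 'Math', 'Geography'], name='Courses')

# Raw counts per course, most frequent first
counts = courses.value_counts()

# normalize=True returns each course's share of all rows instead of a count
shares = courses.value_counts(normalize=True)

print(counts['Math'])  # 2
print(shares['Math'])  # 0.4
print(len(counts))     # 4 distinct courses
```

Like nunique(), value_counts() skips missing values unless you pass dropna=False.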

Using drop_duplicates() to remove duplicate values from dataframe

The drop_duplicates() method removes repeated entries and returns a new object without them. Called on a DataFrame, it drops duplicate rows; called on a single column (a Series), as we do here, it drops duplicate values. We can then count the remaining elements using the size attribute. Let's see how to do it:

# Example DataFrame (continuation from the previous example)
count = df['Courses'].drop_duplicates().size
print("Number of unique courses:", count)

The output will be Number of unique courses: 4, the same as before, because the second "Math" entry is removed before counting.
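Unlike unique(), drop_duplicates() returns a Series that keeps the original index labels, and its keep parameter controls which occurrence survives. The sketch below is only an illustration on the same course names; the variable names are ours.

```python
import pandas as pd

courses = pd.Series(['Math', 'Science', 'History', 'Math', 'Geography'], name='Courses')

# Default keep='first': the first 'Math' (index 0) stays, the one at index 3 goes
first_seen = courses.drop_duplicates()

# keep='last': the later 'Math' (index 3) stays instead
last_seen = courses.drop_duplicates(keep='last')

# keep=False: every value that occurs more than once is removed entirely
never_repeated = courses.drop_duplicates(keep=False)

print(list(first_seen.index))  # [0, 1, 2, 4]
print(list(last_seen.index))   # [1, 2, 3, 4]
print(list(never_repeated))    # ['Science', 'History', 'Geography']
```

For a plain count, keep='first' and keep='last' give the same size; keep=False answers a different question, namely how many values appear exactly once.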

Python count unique values from multiple columns

Now, let's explore how to count unique values across multiple columns. In this example, we use two columns: "Courses" and "Fee". Our df so far has only a "Courses" column, so we first add a "Fee" column (the fee values here are made up for illustration). We then select both columns, drop duplicate rows, and count the rows that remain:

# Example DataFrame (continuation from the previous example)
# Add a hypothetical "Fee" column; assign() returns a copy and leaves df unchanged
df_fees = df.assign(Fee=[20000, 25000, 22000, 20000, 24000])
# Keep one row per distinct (Courses, Fee) combination
df_multi = df_fees[['Courses', 'Fee']].drop_duplicates()
count = df_multi.shape[0]
print("Number of unique rows with 'Courses' and 'Fee':", count)

The output will be Number of unique rows with 'Courses' and 'Fee': 4, because the two "Math" rows share the same fee and collapse into a single row when we drop the duplicates.
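There are a few other ways to look at uniqueness across several columns. The sketch below builds its own small DataFrame (the "Fee" values are hypothetical) and shows per-column counts with DataFrame.nunique(), the number of distinct combinations with groupby(...).ngroups, and the frequency of each combination with DataFrame.value_counts().

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Math', 'Science', 'History', 'Math', 'Geography'],
    'Fee': [20000, 25000, 22000, 20000, 24000],  # hypothetical fees
})

# Number of distinct values in each column, counted independently
per_column = df.nunique()

# Number of distinct (Courses, Fee) combinations
pair_count = df.groupby(['Courses', 'Fee']).ngroups

# How often each (Courses, Fee) combination occurs
pair_freq = df.value_counts(subset=['Courses', 'Fee'])

print(per_column['Courses'], per_column['Fee'])  # 4 4
print(pair_count)                                # 4
print(pair_freq[('Math', 20000)])                # 2
```

Note that per-column counts and combination counts answer different questions: two columns with 4 unique values each can form anywhere from 4 to 16 distinct pairs.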

Python count unique values in each row

In some cases, we might be interested in counting the number of unique values in each row. To achieve this, we can use the nunique() method along with axis=1. 

# Example DataFrame (continuation from the previous example)
row_unique_counts = df.nunique(axis=1)
print("Number of unique values in each row:")
print(row_unique_counts)

Output:

Number of unique values in each row:
0    1
1    1
2    1
3    1
4    1
dtype: int64

Since df has a single column, every row contains exactly one value, so the result is 1 for each row.
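Row-wise counting becomes more informative once a DataFrame has several columns. Here is a minimal sketch with a made-up table of course choices (the column names and data are hypothetical), where rows contain different amounts of repetition.

```python
import pandas as pd

# Hypothetical data: each row lists one student's three course choices
picks = pd.DataFrame({
    'first_choice': ['Math', 'Science', 'History'],
    'second_choice': ['Math', 'History', 'History'],
    'third_choice': ['Science', 'Math', 'History'],
})

# axis=1 counts distinct values across the columns of each row
row_unique_counts = picks.nunique(axis=1)

print(list(row_unique_counts))  # [2, 3, 1]
```

The first row repeats "Math", the second has three different courses, and the third repeats "History" in every column.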

Conclusion

In this article, we have explored different methods to count unique values in a column using Python. We have covered techniques like Series.unique(), Series.nunique(), and value_counts(). Additionally, we have learned how to handle multiple columns and count unique values in each row using appropriate methods. By mastering these techniques, you can efficiently analyze data, gain valuable insights, and make informed decisions in your data-driven projects. 
