DataFrame to List in Python

To convert a pandas DataFrame to a list in Python, we can use methods like `values.tolist()`, `to_dict(orient='records')`, or list comprehension with `iterrows()`. These methods allow us to extract the DataFrame data and represent it in a list format for easier manipulation and analysis.

In Python, a DataFrame is a two-dimensional tabular data structure provided by libraries like pandas. It offers powerful tools to manipulate and analyze data efficiently. There are scenarios where we need to convert a DataFrame, or a single column of it, into a list, either to work with the data in a different format or to pass it to code that expects plain Python objects. In this blog, we'll explore several methods to convert a pandas DataFrame to a list in Python.
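The single-column case mentioned above is the simplest one. Here is a minimal sketch, assuming the same sample data used later in this post; the variable names `names` and `ages` are only illustrative:

```python
import pandas as pd

# Sample data (same shape as the examples later in this post)
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [28, 24, 22]})

# Selecting one column gives a Series; tolist() turns it into a plain Python list
names = df['Name'].tolist()
ages = df['Age'].tolist()

print(names)  # ['John', 'Alice', 'Bob']
print(ages)   # [28, 24, 22]
```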

Converting a pandas DataFrame to a List using the `values` property and `tolist()`

In this method, we use the `values` attribute of a DataFrame to get its data as a NumPy array, and then convert that array to a nested Python list. The `values` attribute returns the underlying data as a two-dimensional NumPy array in which each row corresponds to a row of the DataFrame and each column corresponds to a DataFrame column. The index and the column labels are not included.

Here's a step-by-step explanation of the code:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [28, 24, 22]}
df = pd.DataFrame(data)

We start by creating a sample DataFrame `df` with two columns: 'Name' and 'Age'.

# Converting DataFrame to a list

data_list = df.values.tolist()

Using the `values` property, we extract the underlying data from the DataFrame as a NumPy array. Then, we call the `tolist()` method on the NumPy array to convert it into a Python list. The result is the `data_list`, which contains the DataFrame data in list format.

The `data_list` will be:

[['John', 28], ['Alice', 24], ['Bob', 22]]
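The pandas documentation recommends `DataFrame.to_numpy()` over `.values` in new code. The sketch below, which is an addition to the original example, shows it producing an equivalent nested list, along with one possible way to keep the column names that the array drops. The names `rows` and `rows_with_header` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [28, 24, 22]})

# to_numpy() gives a 2-D array equivalent to .values for this DataFrame
rows = df.to_numpy().tolist()

# Column names are not part of the array; prepend them if they are needed
rows_with_header = [df.columns.tolist()] + rows

print(rows)              # [['John', 28], ['Alice', 24], ['Bob', 22]]
print(rows_with_header)  # [['Name', 'Age'], ['John', 28], ['Alice', 24], ['Bob', 22]]
```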

Converting a pandas DataFrame to a List using the `to_dict` method

In this method, we call the DataFrame's `to_dict` method with the `orient` parameter set to `'records'`. The `'records'` orient returns a list of dictionaries, one per row, keyed by column name.

Here's a detailed breakdown of the code:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [28, 24, 22]}
df = pd.DataFrame(data)

As in the first method, we create the sample DataFrame `df`.

# Converting DataFrame to a list of dictionaries

data_dict = df.to_dict(orient='records')

Using the `to_dict` method with `orient` set to `'records'`, we get a list of dictionaries, where each dictionary represents a row in the DataFrame and maps column names to values. Despite its name, `data_dict` is already a Python list at this point.

# Copying the result into a new list (optional)

data_list = list(data_dict)

Because `to_dict(orient='records')` already returns a list, the call to `list()` only makes a shallow copy and can be omitted. Unlike the first method, the resulting `data_list` is a list of dictionaries rather than a list of lists, so the column names are preserved:

[{'Name': 'John', 'Age': 28}, {'Name': 'Alice', 'Age': 24}, {'Name': 'Bob', 'Age': 22}]
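To make the shape of this result concrete, here is a small sketch that inspects it and, for contrast, shows the `'list'` orient, which groups values by column instead of by row. The variable names `records` and `columns` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [28, 24, 22]})

records = df.to_dict(orient='records')

# The result is already a list; each element is a dict keyed by column name
print(type(records).__name__)  # list
print(records[0]['Name'])      # John

# orient='list' instead maps each column name to a list of that column's values
columns = df.to_dict(orient='list')
print(columns)  # {'Name': ['John', 'Alice', 'Bob'], 'Age': [28, 24, 22]}
```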

Converting a pandas DataFrame to a List using the `iterrows` method

In this method, we use the iterrows method of the DataFrame to iterate over its rows one by one. For each row, we convert it to a Python list and append it to the final list containing all rows' data. Here's a step-by-step explanation of the code:

import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [28, 24, 22]}
df = pd.DataFrame(data)

Again, we create the sample DataFrame `df`.

# Converting DataFrame rows to a list using iterrows

data_list = [list(row) for index, row in df.iterrows()]

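We use a list comprehension to iterate over the DataFrame with the `iterrows` method, which yields an (index, row) pair for every row. For each pair, we ignore the index and convert the row, a pandas Series, to a Python list; the comprehension collects these lists into `data_list`.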

The resulting `data_list` will be:

[['John', 28], ['Alice', 24], ['Bob', 22]]
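`iterrows()` builds a Series for every row, which is comparatively slow and, as the pandas documentation notes, does not preserve dtypes across a row. As an alternative that this post does not otherwise cover, `itertuples()` yields lightweight tuples. A minimal sketch of the same conversion:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'],
                   'Age': [28, 24, 22]})

# index=False leaves out the index; name=None yields plain tuples
# instead of namedtuples
data_list = [list(row) for row in df.itertuples(index=False, name=None)]

print(data_list)  # [['John', 28], ['Alice', 24], ['Bob', 22]]
```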

Conclusion

In this blog, we discussed three different methods to convert a DataFrame to a list in Python. Each method has its advantages, and the best choice depends on your use case and the size of your dataset. The `values` property and the `to_dict` method are the most direct: the first gives a list of lists, and the second gives a list of dictionaries that keeps the column names. The `iterrows` method offers more control over how each row is converted, but it is slower on large DataFrames because it builds a Series for every row. Choose the method and the resulting data structure that best fit your needs.
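To compare the approaches on your own data, a rough sketch like the following can help. It assumes a repeated copy of the sample data (3,000 rows, an arbitrary size chosen for illustration) and uses the standard `timeit` module; actual timings depend on the machine and the pandas version, so none are shown here:

```python
import timeit
import pandas as pd

# Larger sample built by repeating the three example rows (size is arbitrary)
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'] * 1000,
                   'Age': [28, 24, 22] * 1000})

via_values = df.values.tolist()
via_records = [list(d.values()) for d in df.to_dict(orient='records')]
via_iterrows = [list(row) for _, row in df.iterrows()]

# All three approaches carry the same row data
assert via_values == via_records == via_iterrows

candidates = [
    ('values.tolist()', lambda: df.values.tolist()),
    ("to_dict('records')", lambda: df.to_dict(orient='records')),
    ('iterrows()', lambda: [list(row) for _, row in df.iterrows()]),
]

# Time a few runs of each approach and print the total in seconds
for label, fn in candidates:
    seconds = timeit.timeit(fn, number=3)
    print(f'{label}: {seconds:.4f} s')
```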
