Sorting Data Frames Based on Column Values While Dealing With Complex Decimal Formats Using pandas in Python.

Sorting Data Frames Based on Column Values

In this article, we will explore how to sort a pandas data frame based on column values while dealing with complex formats such as decimal numbers with two digits after the decimal point.

Creating the Data Frame

To demonstrate our solution, let’s create a sample data frame with the col1 column in string format. We’ll shuffle the data randomly for illustration purposes.

data = ['9.1', '10.1', '10.2', '10.11', '9.8', '10.0', '9.10', '9.7', '9.9']
df = pd.DataFrame(data, columns=['col1'])

Splitting the Column

The key to sorting our data frame is splitting the col1 column into two separate columns, each representing the whole number and fractional parts of the decimal value.

new = df['col1'].str.split('.', expand=True)

Here, we’re using the .str.split() function from pandas to split the values in the col1 column. The expand=True argument tells pandas to create new columns instead of returning a Series with multiple elements.

The resulting new DataFrame will have two columns: 0 and 1, each containing the whole number and fractional parts of the original decimal value, respectively.

#    0   1
#0   9   1
#1  10   1
#2  10   2
#3  10  11
#4   9   8
#5  10   0
#6   9  10
#7   9   7
#8   9   9

Adding New Columns to the Data Frame

Now that we have our new DataFrame, let’s add these new columns back into our original data frame.

df['num0'] = new[0].astype(int)
df['num1'] = new[1].astype(int)

Here, we’re using the .astype() function to convert the values in our new DataFrame from strings to integers. We’re then assigning these new columns to our original data frame.

Sorting the Data Frame

With our new columns in place, we can now sort our data frame based on the whole number (num0) and fractional parts (num1).

df = df.sort_values(['num0', 'num1'])

Here, we’re using the .sort_values() function to sort our data frame. We’re passing a list of column names as the key argument, which tells pandas how to compare values when sorting.

The resulting sorted data frame will have the original columns in order, with the col1 column sorted based on the whole number and fractional parts.

#    col1  num0  num1
#0    9.1     9     1
#7    9.7     9     7
#4    9.8     9     8
#8    9.9     9     9
#6   9.10     9    10
#5   10.0    10     0
#1   10.1    10     1
#2   10.2    10     2
#3  10.11    10    11

Optional Steps: Removing Intermediate Columns and Resetting the Index

If you don’t want to keep the intermediate columns num0 and num1, you can remove them from your sorted data frame like so:

df = df.sort_values(['num0', 'num1'])['col1']

Alternatively, if you want to reset the index of your data frame while keeping the sorted order, you can use the .reset_index(drop=True) function.

df = df.sort_values(['num0', 'num1']).reset_index(drop=True)

Conclusion

In this article, we demonstrated how to sort a pandas data frame based on column values while dealing with complex formats such as decimal numbers with two digits after the decimal point. By splitting the col1 column into separate whole number and fractional parts columns, and then sorting based on these new columns, we were able to achieve our desired result.


Last modified on 2024-03-26