Python Snippets: Dropping Infinite Values From Dataframes in Pandas
Infinite values can occur more often than people expect, especially for calculated data.
For example, in a recent post I calculated the Twitter Follower-Friend ratio by dividing the followers_count series by the friends_count series.
But what happens when friends_count is zero? The result is inf.
In that particular case, I wanted to drop the rows. Here’s how to do it:
import pandas as pd
import numpy as np
# example dataframe
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [9, 0, 8, 0]})
df["c"] = df["a"] / df["b"]
df
|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 9 | 0.111111 |
| 1 | 2 | 0 | inf |
| 2 | 3 | 8 | 0.375000 |
| 3 | 4 | 0 | inf |
# replace inf with NaN then dropna
df.replace([np.inf, -np.inf], np.nan).dropna(subset=["c"], how="all")
|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 9 | 0.111111 |
| 2 | 3 | 8 | 0.375000 |
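If you would rather not route through NaN, a boolean mask is another option. The sketch below is a different technique from the replace-then-dropna approach above: it uses np.isfinite to keep only the rows where c is a finite number. The name finite_rows is just illustrative, and note that this also drops any rows where c was already NaN.

```python
import numpy as np
import pandas as pd

# same example dataframe as above
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [9, 0, 8, 0]})
df["c"] = df["a"] / df["b"]

# np.isfinite is False for inf, -inf, and NaN, so the mask keeps only "real" numbers
finite_rows = df[np.isfinite(df["c"])]
```

Either way, the original df is left untouched, so assign the result to a name if you want to keep working with it.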
Mind you, this is only helpful if you want to discard rows with inf values.
Otherwise, df.replace() can be used to “fix” your values to something that makes sense for the application without discarding the row.
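As a minimal sketch of that second option, here is the same example with the infinite values replaced instead of dropped. The replacement value of 0 is purely an assumption for illustration; what actually makes sense depends on your application.

```python
import numpy as np
import pandas as pd

# same example dataframe as above
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [9, 0, 8, 0]})
df["c"] = df["a"] / df["b"]

# assumption for illustration: treat an undefined ratio as 0 and keep the row
fixed = df.replace([np.inf, -np.inf], 0)
```

All four rows survive, and the two rows that previously held inf in column c now hold 0.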