Python: replacing NaN with pandas Series.map(dict)
I'm following a pandas tutorial that shows how to replace the values in a column by passing a dictionary to the Series.map method. Here is a snippet from the tutorial:
However, when I try this:
cols = star_wars.columns[3:9]
# Booleans for column values
answers = {
    "Star Wars: Episode I The Phantom Menace": True,
    "Star Wars: Episode II Attack of the Clones": True,
    "Star Wars: Episode III Revenge of the Sith": True,
    "Star Wars: Episode IV A New Hope": True,
    "Star Wars: Episode V The Empire Strikes Back": True,
    "Star Wars: Episode VI Return of the Jedi": True,
    NaN: False
}
for c in cols:
    star_wars[c] = star_wars[c].map(answers)
I get a NameError: name 'NaN' is not defined.
So what am I doing wrong?
Edit: to better explain my goal, my columns look like this:
I'm trying to replace NaN with False and every non-NaN value with True.
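If the columns hold real missing values rather than the string 'nan', the goal above (NaN to False, anything else to True) can be reached without a mapping dict at all, by swapping Series.map for DataFrame.notna. A minimal sketch, assuming a small hypothetical frame raw whose made-up columns seen_1 and seen_2 stand in for star_wars.columns[3:9]:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the six film columns: a seen film is recorded by
# its title, an unseen one is a real NaN.
raw = pd.DataFrame({
    "seen_1": ["Star Wars: Episode I The Phantom Menace", np.nan,
               "Star Wars: Episode I The Phantom Menace"],
    "seen_2": [np.nan, "Star Wars: Episode II Attack of the Clones", np.nan],
})

# notna() gives True where a value is present and False where it is NaN,
# i.e. "non-NaN -> True, NaN -> False" with no dictionary lookup involved.
seen = raw.notna()
print(seen)
```

On the real data this would be something like star_wars[cols] = star_wars[cols].notna(). One caveat for notebooks: running that line a second time is not harmless, because after the first run the cells hold booleans, and False is not a missing value, so notna() turns it into True.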
Edit 2: here is an image of the problem I'm still facing after changing NaN to np.nan:
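Then, if I rerun the mapping cell and display the output again, all the False and NaN values flip.
Quite simply, Python has no built-in name NaN. NumPy does, however, so you can use np.nan and your mapping will no longer throw an error. There is also math.nan, which, as Jon pointed out, is equivalent to float('nan'):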
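The difference between the undefined name and the spellings that do exist can be checked directly. A small sketch using only NumPy and the standard library:

```python
import math

import numpy as np

# Python has no built-in name NaN, so referencing it raises NameError.
try:
    NaN  # deliberately undefined
except NameError as exc:
    error_message = str(exc)
print("NameError:", error_message)

# These three names do exist, and each one is a plain Python float holding NaN.
candidates = [np.nan, math.nan, float("nan")]
for value in candidates:
    print(type(value).__name__, math.isnan(value))
```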
import numpy as np

answers = {
    "Star Wars: Episode I The Phantom Menace": True,
    "Star Wars: Episode II Attack of the Clones": True,
    "Star Wars: Episode III Revenge of the Sith": True,
    "Star Wars: Episode IV A New Hope": True,
    "Star Wars: Episode V The Empire Strikes Back": True,
    "Star Wars: Episode VI Return of the Jedi": True,
    np.nan: False
}
But don't stop there, because this still won't work.
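The other tricky part is that nan is technically not equal to anything, not even itself, so using it as a key in a mapping like this won't be effective: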
>>> np.nan == np.nan
False
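The comparison rule above can be seen with plain floats. One nuance worth knowing: Python dicts check identity before equality, so a NaN key is still found when it is looked up with the very same object, but not with a different NaN object. This sketch covers plain Python semantics only and makes no claim about how pandas matches keys inside Series.map.

```python
import math

a = float("nan")
b = float("nan")

# NaN compares unequal to everything, including itself.
print(a == a, a == b, a != a)  # False False True

# Use an explicit test instead of ==.
print(math.isnan(a))  # True

# Dict lookups try identity first, then equality.
lookup = {a: False}
print(a in lookup)  # True: same object
print(b in lookup)  # False: different NaN object, and b != a
```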
So the NaN values in your DataFrame won't be picked up by np.nan as a key; they simply stay NaN. Also, I'd bet your nan values are actually the string 'nan'.
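Whether the "missing" cells are real NaN or the literal string 'nan' is easy to check, and the strings can be converted before mapping. A minimal sketch with a hypothetical column col; one possible cause of such strings (an assumption, not something the question confirms) is an astype(str) call somewhere upstream, which renders NaN as 'nan':

```python
import numpy as np
import pandas as pd

# Hypothetical column whose "missing" entries are the text "nan", not real NaN.
col = pd.Series(["Star Wars: Episode IV A New Hope", "nan", "nan"])

print(col.isna().tolist())      # [False, False, False]: strings are never NA
print((col == "nan").tolist())  # [False, True, True]

# Replace the placeholder strings with real missing values first.
cleaned = col.replace("nan", np.nan)
print(cleaned.isna().tolist())  # [False, True, True]
```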
Minimal demo:
>>> df
0 1
0 Star Wars: Episode I The Phantom Menace nan
1 Star Wars: Episode IV A New Hope nan
2 Star Wars: Episode IV A New Hope Star Wars: Episode IV A New Hope
>>> for c in df.columns:
...     df[c] = df[c].map(answers)
>>> df
0 1
0 True NaN
1 True NaN
2 True True
# notice we're still stuck with NaN, as our nan strings weren't picked up
>>> answers = {
...     "Star Wars: Episode I The Phantom Menace",
...     "Star Wars: Episode II Attack of the Clones",
...     "Star Wars: Episode III Revenge of the Sith",
...     "Star Wars: Episode IV A New Hope",
...     "Star Wars: Episode V The Empire Strikes Back",
...     "Star Wars: Episode VI Return of the Jedi",
... }
>>> df
0 1
0 Star Wars: Episode I The Phantom Menace nan
1 Star Wars: Episode IV A New Hope nan
2 Star Wars: Episode IV A New Hope Star Wars: Episode IV A New Hope
>>> df.isin(answers)
0 1
0 True False
1 True False
2 True True
Better solution
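That said, this doesn't seem like a great use of a dict or map. You could just define a set of the Star Wars strings and then use isin across the whole slice of the columns you're interested in: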
answers = {
    "Star Wars: Episode I The Phantom Menace",
    "Star Wars: Episode II Attack of the Clones",
    "Star Wars: Episode III Revenge of the Sith",
    "Star Wars: Episode IV A New Hope",
    "Star Wars: Episode V The Empire Strikes Back",
    "Star Wars: Episode VI Return of the Jedi",
}
star_wars.iloc[:, 3:9].isin(answers)
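The isin approach can be tried end to end on a tiny frame. A minimal sketch, assuming a hypothetical star_wars with made-up column names, where the film columns sit at positions 2 and 3 instead of 3:9 as in the real data:

```python
import numpy as np
import pandas as pd

# The six titles as a set; every entry needs its own trailing comma.
answers = {
    "Star Wars: Episode I The Phantom Menace",
    "Star Wars: Episode II Attack of the Clones",
    "Star Wars: Episode III Revenge of the Sith",
    "Star Wars: Episode IV A New Hope",
    "Star Wars: Episode V The Empire Strikes Back",
    "Star Wars: Episode VI Return of the Jedi",
}

# Hypothetical stand-in for star_wars: two leading columns, then two film
# columns holding a title, a real NaN, or the string "nan".
star_wars = pd.DataFrame({
    "respondent": [1, 2, 3],
    "seen_any": ["Yes", "Yes", "No"],
    "seen_1": ["Star Wars: Episode I The Phantom Menace", np.nan, "nan"],
    "seen_2": [np.nan, "Star Wars: Episode II Attack of the Clones", np.nan],
})

# isin() is True only for cells whose value is one of the titles, so a real
# NaN and the string "nan" both come out False. The result goes into a new
# variable, leaving star_wars itself untouched.
seen = star_wars.iloc[:, 2:4].isin(answers)
print(seen)
```

Keeping the result in a separate variable matters in a notebook: if it were written back into star_wars and the cell rerun, the booleans would be checked against the titles again and every True would become False.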
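My problem with the other solution is that, because of the way it works, the code doesn't behave the same way after the first run. I'm working in a Jupyter notebook, so I want something that can be run multiple times. I'm only a Python beginner, but the following code seems to survive multiple runs and only changes the values on the first run: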
import numpy as np

cols = star_wars.columns[3:9]
# Booleans for column values; the True/False keys make a rerun map
# already-converted booleans onto themselves
answers = {
    "Star Wars: Episode I The Phantom Menace": True,
    "Star Wars: Episode II Attack of the Clones": True,
    "Star Wars: Episode III Revenge of the Sith": True,
    "Star Wars: Episode IV A New Hope": True,
    "Star Wars: Episode V The Empire Strikes Back": True,
    "Star Wars: Episode VI Return of the Jedi": True,
    True: True,
    False: False,
    np.nan: False
}
for c in cols:
    star_wars[c] = star_wars[c].map(answers)
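Another way to get a cell that is safe to rerun, sketched here as an alternative rather than a correction of the code above, is to never overwrite the raw columns: derive the boolean columns from the untouched data each time. The data and the helper name seen_flags below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical raw survey data; in a notebook this would come from the cell
# that loads the file, which is not rerun when the mapping cell is rerun.
raw = pd.DataFrame({
    "respondent": [1, 2, 3],
    "seen_1": ["Star Wars: Episode I The Phantom Menace", np.nan, np.nan],
    "seen_2": [np.nan, "Star Wars: Episode II Attack of the Clones", np.nan],
})


def seen_flags(frame, cols):
    """Return a new boolean frame: True where a title is present, False for NaN.

    The input frame is never modified, so calling this any number of times
    (for example by rerunning the cell) gives the same answer.
    """
    return frame[cols].notna()


cols = raw.columns[1:3]
first_run = seen_flags(raw, cols)
second_run = seen_flags(raw, cols)  # simulates rerunning the cell
print(first_run.equals(second_run))
```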