Python: Replacing NaN with pandas Series.map(dict)

python pandas dictionary dataframe

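I am following a pandas tutorial that shows how to replace the values in a column by passing a dictionary to the Series.map method. Here is a snippet from the tutorial:

However, when I try this: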

cols = star_wars.columns[3:9]

# Booleans for column values
answers = {
        "Star Wars: Episode I  The Phantom Menace":True, 
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        NaN:False
        }

for c in cols:
    star_wars[c] = star_wars[c].map(answers)

I get

NameError: name 'NaN' is not defined

So what am I doing wrong?

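Edit: To better explain my goal, I have columns like the following:

I am trying to replace NaN with False and every non-NaN value with True.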

Edit 2: Here is an image of the problem I still face after changing NaN to np.NaN:

Then, if I rerun the mapping cell and display the output again, all the False and NaN values flip.

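Simply put, Python has no built-in name NaN. NumPy does, however, so you can use np.nan so that your mapping does not throw an error. There is also math.nan which, as Jon pointed out, is equivalent to float('nan').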

answers = {
        "Star Wars: Episode I  The Phantom Menace":True, 
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        np.nan:False
        }
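As a quick check of the names mentioned above, the following sketch confirms that np.nan, math.nan and float('nan') are all plain Python floats holding a NaN value. The variable names (candidates, all_floats, all_nan) are made up for illustration.

```python
import math

import numpy as np

# Three spellings of "not a number" (labels are for illustration only).
candidates = {
    "np.nan": np.nan,
    "math.nan": math.nan,
    "float('nan')": float("nan"),
}

# Each one is an ordinary Python float ...
all_floats = all(isinstance(value, float) for value in candidates.values())
# ... and each one is recognised as NaN by math.isnan.
all_nan = all(math.isnan(value) for value in candidates.values())

print(all_floats, all_nan)
```

Any of the three can therefore stand in for the undefined name NaN without raising a NameError.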

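But don't stop here, because this may still not work. Another tricky issue is that nan is technically not equal to anything, not even itself, so using it as a key in a mapping like this is unreliable:

>>> np.nan == np.nan
False

As a result, NaN values in the DataFrame may not be picked up by the np.nan key and may simply stay NaN. A further explanation of this was linked in the original answer. In addition, I would bet that your nan values are actually the string "nan".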
Minimal demo

>>> df
                                          0                                  1
0  Star Wars: Episode I  The Phantom Menace                                nan
1         Star Wars: Episode IV  A New Hope                                nan
2         Star Wars: Episode IV  A New Hope  Star Wars: Episode IV  A New Hope
>>> for c in df.columns:
...     df[c] = df[c].map(answers)
...
>>> df
      0     1
0  True   NaN
1  True   NaN
2  True  True
# notice we're still stuck with NaN, as our nan strings weren't picked up
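One way to test the suspicion that a column holds the text "nan" rather than a real missing value is to compare Series.isna() with an equality check against the string. This is only a sketch: the column col below is made up and is not the survey data.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing a title, the literal text "nan", and a real NaN.
col = pd.Series(
    ["Star Wars: Episode IV  A New Hope", "nan", np.nan],
    dtype=object,
)

# True only where the value is genuinely missing.
is_missing = col.isna()

# True only where the value is the three-character string "nan".
is_nan_string = col == "nan"

print(is_missing.tolist())
print(is_nan_string.tolist())
```

If is_nan_string has any True entries, a np.nan key in the mapping cannot match those cells, because they are ordinary strings.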

Better solution

That said, this does not seem like a great use case for a dict or for map. You can just define a set of the Star Wars strings and then use isin on the whole block of columns you are interested in:

answers = {
    "Star Wars: Episode I  The Phantom Menace",
    "Star Wars: Episode II  Attack of the Clones",
    "Star Wars: Episode III  Revenge of the Sith",
    "Star Wars: Episode IV  A New Hope",
    "Star Wars: Episode V  The Empire Strikes Back",
    "Star Wars: Episode VI  Return of the Jedi",
}

star_wars.iloc[:, 3:9].isin(answers)
Minimal demo

>>> answers = {
...     "Star Wars: Episode I  The Phantom Menace",
...     "Star Wars: Episode II  Attack of the Clones",
...     "Star Wars: Episode III  Revenge of the Sith",
...     "Star Wars: Episode IV  A New Hope",
...     "Star Wars: Episode V  The Empire Strikes Back",
...     "Star Wars: Episode VI  Return of the Jedi",
... }
>>> df
                                          0                                  1
0  Star Wars: Episode I  The Phantom Menace                                nan
1         Star Wars: Episode IV  A New Hope                                nan
2         Star Wars: Episode IV  A New Hope  Star Wars: Episode IV  A New Hope
>>> df.isin(answers)
      0      1
0  True  False
1  True  False
2  True   True
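The isin call above only returns a new boolean frame. To write the result back into the original columns, something like the following could be used. It is a minimal sketch on a made-up frame: demo, titles, and the column names respondent, seen_1 and seen_4 are assumptions for illustration, not the real survey columns.

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical set of accepted titles.
titles = {
    "Star Wars: Episode I  The Phantom Menace",
    "Star Wars: Episode IV  A New Hope",
}

# Hypothetical stand-in for the survey data, with a real NaN and a "nan" string.
demo = pd.DataFrame(
    {
        "respondent": [1, 2, 3],
        "seen_1": ["Star Wars: Episode I  The Phantom Menace", np.nan, "nan"],
        "seen_4": [
            "Star Wars: Episode IV  A New Hope",
            "Star Wars: Episode IV  A New Hope",
            np.nan,
        ],
    }
)

# Select the answer columns and overwrite them with membership booleans.
cols = demo.columns[1:3]
demo[cols] = demo[cols].isin(titles)

print(demo)
```

Both real NaN values and the string "nan" come out as False here, since neither is a member of the set.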

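My problem with the other solution is that, because of the way it works, the code does not behave the same way after the first run. I am working in a Jupyter notebook, so I wanted something that can be run multiple times. I am only a Python beginner, but the following code seems to be able to run multiple times and only changes the values on the first run:

import numpy as np

cols = star_wars.columns[3:9]

# Booleans for column values; True/False map to themselves so reruns are harmless
answers = {
    "Star Wars: Episode I  The Phantom Menace": True,
    "Star Wars: Episode II  Attack of the Clones": True,
    "Star Wars: Episode III  Revenge of the Sith": True,
    "Star Wars: Episode IV  A New Hope": True,
    "Star Wars: Episode V  The Empire Strikes Back": True,
    "Star Wars: Episode VI  Return of the Jedi": True,
    True: True,
    False: False,
    np.nan: False,
}

for c in cols:
    star_wars[c] = star_wars[c].map(answers)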
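The rerun behaviour described above can be checked on a small example by applying the same mapping twice and comparing the results. This is a sketch under assumptions: col is a made-up column that contains a real np.nan (not the string "nan"), and the dict is shortened to one title.

```python
import numpy as np
import pandas as pd

# Shortened version of the mapping, including the identity entries for booleans.
answers = {
    "Star Wars: Episode IV  A New Hope": True,
    True: True,
    False: False,
    np.nan: False,
}

# Hypothetical column with one title and one genuinely missing value.
col = pd.Series(["Star Wars: Episode IV  A New Hope", np.nan], dtype=object)

first = col.map(answers)     # first run: title -> True, NaN -> False
second = first.map(answers)  # second run: booleans map to themselves

print(first.tolist(), second.tolist())
```

Without the True: True and False: False entries, the second pass would find no matching keys for the booleans and turn them into NaN.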