Python: replacing NaN with pandas Series.map(dict)

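I'm following a pandas tutorial that shows how to replace the values in a column by passing a dictionary to the Series.map method. Here is a snippet from the tutorial:

However, when I try this: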

cols = star_wars.columns[3:9]

# Booleans for column values
answers = {
        "Star Wars: Episode I  The Phantom Menace":True, 
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        NaN:False
        }

for c in cols:
    star_wars[c] = star_wars[c].map(answers) 
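I get:
NameError: name 'NaN' is not defined

So what am I doing wrong?

Edit: To better explain my goal, I have columns like this:

I'm trying to replace NaN with False and every non-NaN value with True.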

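Edit 2: Here is an image of the problem I still face after changing NaN to np.NaN:


Then, if I re-run the mapping cell and display the output again, all the False and NaN values change.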

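Quite simply, Python has no built-in name NaN. NumPy does, however, so you can use np.nan and your mapping will no longer throw an error. There is also math.nan which, as Jon pointed out, is equivalent to float('nan').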

answers = {
        "Star Wars: Episode I  The Phantom Menace":True, 
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        np.nan:False
        }
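But don't stop there, because that alone won't work. The other tricky part is that nan is technically not equal to anything, so using it as a key in a mapping like this won't be effective: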

>>> np.nan == np.nan 
False
As a result, the NaN values in the DataFrame won't be picked up by the np.nan key and will stay NaN. On top of that, I'd bet your nan values are actually the string nan.
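The two points above (NaN never compares equal, and a column may hold the text "nan" rather than a real missing value) can be checked directly. The following is a minimal sketch, assuming a small hypothetical Series named col rather than the real survey data:

```python
import math

import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself,
# so use a dedicated test instead of ==.
print(np.nan == np.nan)       # False
print(math.isnan(np.nan))     # True
print(pd.isna(float("nan")))  # True

# Hypothetical column mixing a real NaN with the literal string "nan".
col = pd.Series(["Star Wars: Episode IV  A New Hope", np.nan, "nan"])

real_nan = col.isna()      # True only where the value is a real missing value
string_nan = col == "nan"  # True only where the value is the text "nan"

# Turn the text "nan" into a real missing value, then flag what is present.
cleaned = col.mask(string_nan)
seen = cleaned.notna()
print(seen.tolist())
```

If string_nan contains any True values, the column holds text rather than missing values, and a lookup keyed on np.nan cannot match those entries.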

Minimal demo

>>> df
                                          0                                  1
0  Star Wars: Episode I  The Phantom Menace                                nan
1         Star Wars: Episode IV  A New Hope                                nan
2         Star Wars: Episode IV  A New Hope  Star Wars: Episode IV  A New Hope

>>> for c in df.columns:
        df[c] = df[c].map(answers)


>>> df
      0     1
0  True   NaN
1  True   NaN
2  True  True

# notice we're still stuck with NaN, as our nan strings weren't picked up
# use a set of titles instead (note the comma after every entry)
>>> answers = {
            "Star Wars: Episode I  The Phantom Menace",
            "Star Wars: Episode II  Attack of the Clones",
            "Star Wars: Episode III  Revenge of the Sith",
            "Star Wars: Episode IV  A New Hope",
            "Star Wars: Episode V  The Empire Strikes Back",
            "Star Wars: Episode VI  Return of the Jedi",
            }

>>> df
                                          0                                  1
0  Star Wars: Episode I  The Phantom Menace                                nan
1         Star Wars: Episode IV  A New Hope                                nan
2         Star Wars: Episode IV  A New Hope  Star Wars: Episode IV  A New Hope

>>> df.isin(answers)

      0      1
0  True  False
1  True  False
2  True   True
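Better solution

That said, this doesn't look like a good use case for a dict or for map. You can simply define a set of the Star Wars title strings and then call isin on the whole block of columns you are interested in: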

answers = {
        "Star Wars: Episode I  The Phantom Menace",
        "Star Wars: Episode II  Attack of the Clones",
        "Star Wars: Episode III  Revenge of the Sith",
        "Star Wars: Episode IV  A New Hope",
        "Star Wars: Episode V  The Empire Strikes Back",
        "Star Wars: Episode VI  Return of the Jedi",
        }

# True where a cell holds one of the titles, False otherwise (including NaN)
star_wars.iloc[:, 3:9].isin(answers)
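Note that isin returns a new boolean DataFrame and leaves the original untouched, so the result has to be assigned back to take effect. A minimal sketch of that step, assuming a hypothetical two-column frame df standing in for star_wars.iloc[:, 3:9]; the column names seen_1 and seen_4 are made up for illustration:

```python
import numpy as np
import pandas as pd

titles = {
    "Star Wars: Episode I  The Phantom Menace",
    "Star Wars: Episode IV  A New Hope",
}

# Hypothetical stand-in for the six survey columns.
df = pd.DataFrame({
    "seen_1": ["Star Wars: Episode I  The Phantom Menace", np.nan],
    "seen_4": [np.nan, "Star Wars: Episode IV  A New Hope"],
})
cols = df.columns[0:2]

# Membership test: True where the cell is one of the known titles.
by_membership = df[cols].isin(titles)

# If every non-missing cell is a valid title, a plain "is a value
# present?" test yields the same flags without listing the titles.
by_presence = df[cols].notna()

# Assign back so the columns actually become booleans.
df[cols] = by_membership
print(df)
```

The notna variant only matches the stated goal (NaN becomes False, everything else True) when the missing cells are real NaN values rather than the text "nan".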

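My problem with the other solution is that, because of the way it works, the code does not behave the same way after the first run. I'm working in a Jupyter notebook, so I want something that can be run multiple times. I'm only a Python beginner, but the following code seems to be safe to run repeatedly and only changes the values on the first run: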

cols = star_wars.columns[3:9]

# Booleans for column values
answers = {
        "Star Wars: Episode I  The Phantom Menace":True,
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        # already-converted values map to themselves, so re-running is harmless
        True:True,
        False:False,
        np.nan:False
        }

for c in cols:
    star_wars[c] = star_wars[c].map(answers)
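Another way to get a cell that is safe to re-run is to skip columns that have already been converted. This is only a sketch under assumptions: mark_seen and star_wars_demo are hypothetical names, and it uses notna instead of the dictionary above, so it relies on missing answers being real NaN values.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_bool_dtype


def mark_seen(frame, cols):
    """Return a copy in which each column in cols is True where a value is present."""
    out = frame.copy()
    for c in cols:
        if is_bool_dtype(out[c]):
            continue  # already converted on an earlier run, leave it alone
        out[c] = out[c].notna()
    return out


# Hypothetical stand-in for the survey columns.
star_wars_demo = pd.DataFrame({
    "seen_1": ["Star Wars: Episode I  The Phantom Menace", np.nan],
    "seen_4": [np.nan, "Star Wars: Episode IV  A New Hope"],
})

once = mark_seen(star_wars_demo, star_wars_demo.columns)
twice = mark_seen(once, once.columns)  # second run changes nothing
print(once.equals(twice))
```

Because the function returns a copy, the raw data stays available, which also makes it easy to start over in a notebook.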