Error when using the pandas DataFrame map function in an IPython notebook


python pandas


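I've just started using Python and ran into a problem while working with the Kaggle Titanic data.

Here is what I entered in the IPython notebook (train.csv is the Titanic data from the Kaggle link above):

Then I went on to check whether the 'Sex' column contains any bad data: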

df['Sex'].value_counts()

Returns:

male      577
female    314
dtype: int64
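The next step in the question maps the strings in 'Sex' to integers in a new 'Gender' column. A minimal sketch of that kind of mapping, using a small made-up frame instead of train.csv (the frame and its four rows are assumptions for illustration, and the exact command the asker ran may have differed):

```python
import pandas as pd

# Hypothetical stand-in for the 'Sex' column of train.csv.
df = pd.DataFrame({"Sex": ["male", "female", "female", "male"]})

# Every value in 'Sex' has a key in the dict, so map() produces no NaN
# and the cast to int succeeds.
df["Gender"] = df["Sex"].map({"female": 0, "male": 1}).astype(int)
print(df)
```

Because both categories are covered by the dict and the column has no missing values, the cast has nothing to trip over.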


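Next, I map the 'Sex' values to integers in a new 'Gender' column. This doesn't produce any errors. To verify that it created a new column named 'Gender' with integer values, I run: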

df


   PassengerId  Survived  Pclass  Name                                               Sex     Age  SibSp  Parch  Ticket            Fare     Cabin  Embarked  Gender
0  1            0         3       Braund, Mr. Owen Harris                            male    22   1      0      A/5 21171         7.2500   NaN    S         1
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38   1      0      PC 17599          71.2833  C85    C         0
2  3            1         3       Heikkinen, Miss. Laina                             female  26   0      0      STON/O2. 3101282  7.9250   NaN    S         0
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35   1      0      113803            53.1000  C123   S         0


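This worked: the Gender column is appended at the end, with 0 for female and 1 for male. Now I create a new pandas DataFrame that is a subset of the df DataFrame: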

df2 = df[ ['Survived', 'Pclass', 'Age', 'Gender', 'Embarked'] ]
df2


   Survived  Pclass  Age  Gender  Embarked
0  0         3       22   1       S
1  1         1       38   0       C
2  1         3       26   0       S
3  1         1       35   0       S
4  0         3       35   1       S
5  0         3       NaN  1       Q

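Checking the 'Embarked' column with value_counts() shows that it has 3 unique values (S, C, Q).

However, when I try what I believe is the same kind of operation as when I converted male/female to 1/0, I get an error: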

df2['Embarked_int'] = df2['Embarked'].map( {'S': 0, 'C': 1, 'Q': 2}).astype(int)

    ValueError                                Traceback (most recent call last)
<ipython-input-29-294c08f2fc80> in <module>()
----> 1 df2['Embarked_int'] = df2['Embarked'].map( {'S': 0, 'C': 1, 'Q': 2}).astype(int)

C:\Anaconda\lib\site-packages\pandas\core\generic.pyc in astype(self, dtype, copy, raise_on_error)
   2212 
   2213         mgr = self._data.astype(
-> 2214             dtype=dtype, copy=copy, raise_on_error=raise_on_error)
   2215         return self._constructor(mgr).__finalize__(self)
   2216 

C:\Anaconda\lib\site-packages\pandas\core\internals.pyc in astype(self, dtype, **kwargs)
   2500 
   2501     def astype(self, dtype, **kwargs):
-> 2502         return self.apply('astype', dtype=dtype, **kwargs)
   2503 
   2504     def convert(self, **kwargs):

C:\Anaconda\lib\site-packages\pandas\core\internals.pyc in apply(self, f, axes, filter, do_integrity_check, **kwargs)
   2455                                                  copy=align_copy)
   2456 
-> 2457             applied = getattr(b, f)(**kwargs)
   2458 
   2459             if isinstance(applied, list):

C:\Anaconda\lib\site-packages\pandas\core\internals.pyc in astype(self, dtype, copy, raise_on_error, values)
    369     def astype(self, dtype, copy=False, raise_on_error=True, values=None):
    370         return self._astype(dtype, copy=copy, raise_on_error=raise_on_error,
--> 371                             values=values)
    372 
    373     def _astype(self, dtype, copy=False, raise_on_error=True, values=None,

C:\Anaconda\lib\site-packages\pandas\core\internals.pyc in _astype(self, dtype, copy, raise_on_error, values, klass)
    399             if values is None:
    400                 # _astype_nansafe works fine with 1-d only
--> 401                 values = com._astype_nansafe(self.values.ravel(), dtype, copy=True)
    402                 values = values.reshape(self.values.shape)
    403             newb = make_block(values,

C:\Anaconda\lib\site-packages\pandas\core\common.pyc in _astype_nansafe(arr, dtype, copy)
   2616 
   2617         if np.isnan(arr).any():
-> 2618             raise ValueError('Cannot convert NA to integer')
   2619     elif arr.dtype == np.object_ and np.issubdtype(dtype.type, np.integer):
   2620         # work around NumPy brokenness, #1987

ValueError: Cannot convert NA to integer


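Any idea why I get this error the second time I use the map function but not the first? According to value_counts(), there are no NaN values in the Embarked column. I'm guessing this is a noob question :)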

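By default, value_counts() does not count NaN values. You can change that by running df['Embarked'].value_counts(dropna=False).

I looked at the value_counts() for your Sex column (577 + 314 = 891) and for your Embarked column (644 + 168 + 77 = 889). They differ by 2, which means there must be 2 NaN values.

So you can either drop them first (using dropna) or fill them with the value you want (using fillna).

Also, the astype(int) is redundant, since you are mapping to ints anyway.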
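The diagnosis and both fixes above can be sketched on a small made-up Series. The five values below are assumptions for illustration and not the Titanic data, and filling with 'S' is an arbitrary choice:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'Embarked' column: three ports plus one missing value.
embarked = pd.Series(["S", "C", "Q", np.nan, "S"], name="Embarked")

# value_counts() skips NaN by default; dropna=False adds a row for it.
print(embarked.value_counts())
print(embarked.value_counts(dropna=False))

codes = {"S": 0, "C": 1, "Q": 2}

# NaN has no key in the dict, so map() leaves it as NaN and the result is float64.
mapped = embarked.map(codes)
print(mapped.dtype)

# Casting a float column that contains NaN to int is what raises the ValueError.
try:
    mapped.astype(int)
except ValueError as exc:
    print("astype(int) failed:", exc)

# Option 1: drop the missing rows before mapping.
dropped = embarked.dropna().map(codes).astype(int)

# Option 2: fill the missing values before mapping.
filled = embarked.fillna("S").map(codes).astype(int)
```

The exact wording of the exception depends on the pandas version, but it is a ValueError in each case. Which option fits depends on whether the rows with a missing port should stay in the analysis.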
I just ran into this problem on the same dataset. Removing the .astype(int) solved the whole problem.
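One caveat on dropping the cast, sketched on a made-up Series (the values are assumptions for illustration): the exception goes away, but the missing entries are still there and the new column is float rather than integer.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value, standing in for 'Embarked'.
embarked = pd.Series(["S", "C", np.nan, "Q"])

# No exception without .astype(int), but the missing entry is still NaN
# and the column comes back as float64 rather than an integer dtype.
embarked_int = embarked.map({"S": 0, "C": 1, "Q": 2})
print(embarked_int.dtype)
print(embarked_int.isna().sum())
```

If a later step needs real integers, the NaN values still have to be dropped or filled first.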
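I don't think you need those astype(int) calls, by the way. Something doesn't add up: your Sex value_counts() covers 891 rows, but your Embarked one covers only 889, which means you must have NaN values that are not being counted. You can confirm this by trying df['Embarked'].value_counts(dropna=False). You therefore need to fill the NaNs before calling map.

The last line of the error message says it: ValueError: Cannot convert NA to integer. You probably have to remove the NAs from the DataFrame.

As for the redundant .astype(int), I thought it looked odd there. I copied it from Kaggle's tutorial. Thanks for the clarification!

@Ami I didn't realize that value_counts() drops NaN values by default. I added the command df2 = df2.dropna(subset=['Embarked']), which removed the NaNs, but now I get a different error when I try to run the map command: `C:\Anaconda\lib\site-packages\IPython\kernel\__main__.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead`

You get that warning because df2 is a copy of your original df, because of this line: df2 = df[['Survived', 'Pclass', 'Age', 'Gender', 'Embarked']]. So you need to do the Embarked mapping before making the copy.

I don't get the same error when I run df['Embarked_int'] = df['Embarked'].map({'S': 0, 'C': 1, 'Q': 2}) (thanks). Why does IPython care whether I run the command on a copy rather than the original DataFrame? I would have assumed there is no link... Thinking about it further, if I want a separate DataFrame with different columns, I'm better off just re-importing the data and assigning it to a new DataFrame rather than making a copy.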
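On the last question: the warning comes from pandas, not IPython. It is raised when pandas cannot tell whether an assignment targets an independent frame or something derived from another one. Re-importing the data should not be necessary. Below is a minimal sketch of two ways around it, using a made-up miniature frame (all values are assumptions for illustration). The first follows the suggestion in the thread. The second, calling .copy() explicitly, is not from the thread but is a common alternative.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the Titanic frame (values are illustrative only).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 1],
    "Embarked": ["S", "C", np.nan, "Q"],
})

codes = {"S": 0, "C": 1, "Q": 2}

# Approach suggested in the thread: add the column to the original frame first,
# then select the columns for the subset.
df["Embarked_int"] = df["Embarked"].map(codes)  # float64 here because of the NaN
subset_a = df[["Survived", "Pclass", "Embarked_int"]]

# Alternative: make the subset an explicit copy, so later assignments
# clearly target a new, independent frame.
subset_b = df[["Survived", "Pclass", "Embarked"]].dropna(subset=["Embarked"]).copy()
subset_b["Embarked_int"] = subset_b["Embarked"].map(codes).astype(int)

# The original frame keeps all of its rows; only the copy lost the NaN row.
print(len(df), len(subset_b))
```

With the explicit copy, changes to subset_b never touch df, which is the "no link" behavior the asker expected.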