Python 2.7 pandas drop() - error - labels [] not contained in axis
I am using drop() to remove rows that have junk values (NaN, NaT, '') in certain columns:
    for index, row in user_data_to_clean.iterrows():
        # row.email != row.email is True only when the value is NaN
        if row.email != row.email or row.email == '' or row.email == ' ':
            user_data_to_clean.drop(index, inplace=True)
            email_count = email_count + 1
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-22-bb0cb6d83902> in <module>()
24
25 if row.email != row.email or row.email == '' or row.email == ' ':
---> 26 user_data_to_clean.drop(index, inplace=True)
27 email_count = email_count + 1
28
/home/eyebell/local_bin/janacare/virtenv/lib/python2.7/site-packages/pandas/core/generic.pyc in drop(self, labels, axis, level, inplace, errors)
1871 new_axis = axis.drop(labels, level=level, errors=errors)
1872 else:
-> 1873 new_axis = axis.drop(labels, errors=errors)
1874 dropped = self.reindex(**{axis_name: new_axis})
1875 try:
/home/eyebell/local_bin/janacare/virtenv/lib/python2.7/site-packages/pandas/indexes/base.pyc in drop(self, labels, errors)
2964 if errors != 'ignore':
2965 raise ValueError('labels %s not contained in axis' %
-> 2966 labels[mask])
2967 indexer = indexer[~mask]
2968 return self.delete(indexer)
ValueError: labels [124] not contained in axis
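What is the problem here?
I know that slicing is another way to achieve my goal, but I would like to know what went wrong here.

IIUC, you can use vectorized boolean indexing instead of drop with iterrows(), because iterrows() is very slow.
To mask NaN and NaT, use notnull().
Sample: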
import pandas as pd
import numpy as np
user_data_to_clean = pd.DataFrame({'email':['','aa',' ', np.nan, 'dd'],
                                   'a':[7,5,6,4,7],
                                   'b':[7,8,9,1,2]})
print (user_data_to_clean)
   a  b email
0  7  7
1  5  8    aa
2  6  9
3  4  1   NaN
4  7  2    dd
Boolean mask:
print ((user_data_to_clean.email != '') &
       (user_data_to_clean.email != ' ') &
       (user_data_to_clean.email.notnull()))
0    False
1     True
2    False
3    False
4     True
Name: email, dtype: bool
print (user_data_to_clean[(user_data_to_clean.email != '') &
                          (user_data_to_clean.email != ' ') &
                          (user_data_to_clean.email.notnull()) ])
   a  b email
1  5  8    aa
4  7  2    dd
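The three comparisons above can also be collapsed into a single mask by stripping whitespace first, which additionally catches strings made of several spaces. The sketch below is an illustration rather than part of the original answer: the names `is_valid` and `cleaned` are made up here, and `email_count` is assumed to mean the number of removed rows, as in the question's loop.

```python
import numpy as np
import pandas as pd

user_data_to_clean = pd.DataFrame({'email': ['', 'aa', ' ', np.nan, 'dd'],
                                   'a': [7, 5, 6, 4, 7],
                                   'b': [7, 8, 9, 1, 2]})

email = user_data_to_clean['email']

# True where the email is present and not blank after stripping whitespace.
# For NaN, .str.strip() yields NaN, so notnull() is what excludes those rows.
is_valid = email.notnull() & (email.str.strip() != '')

# Assumption: email_count counts removed rows, like the counter in the question.
email_count = int((~is_valid).sum())
cleaned = user_data_to_clean[is_valid]

print(email_count)
print(cleaned)
```

Because the mask is computed once for the whole column, no row is visited twice and no label can be dropped twice.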
I would do it this way. Test DF:
In [43]: df = pd.DataFrame({'email':['x@x.x', 'aaa@aaa.com',' ', np.nan, 'a@mail.com', '1', 'xxx@gmail.com', '', np.nan], 'col': np.random.randint(0,100,9)})
In [44]: df
Out[44]:
   col          email
0   89          x@x.x
1   81    aaa@aaa.com
2   82
3   43            NaN
4   71     a@mail.com
5    3              1
6   48  xxx@gmail.com
7   48
8   71            NaN
Cleaning:
In [53]: df = df[(df.email.notnull()) & (df.email.str.strip().str.len() > 5)]
In [54]: df
Out[54]:
   col          email
1   97    aaa@aaa.com
4   77     a@mail.com
6   47  xxx@gmail.com
(col is filled with random numbers, so its values in Out[54] presumably come from a different run than Out[44]; the surviving index labels are what matter.)
PS: if you want serious, robust (but slow) email validation, use the validate_email module.
If you need email_count, compute it after cleaning:
email_count = len(df)
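Between the length check above and a full validation module there is a middle ground: a rough pattern match with Series.str.match. The pattern below is an assumption made for illustration (something, `@`, something, a dot, something) and is not a complete email validator; `looks_like_email` and `cleaned` are hypothetical names. The `col` column uses fixed numbers here instead of random ones so that the result is reproducible.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'email': ['x@x.x', 'aaa@aaa.com', ' ', np.nan, 'a@mail.com',
                             '1', 'xxx@gmail.com', '', np.nan],
                   'col': range(9)})

# Rough shape check only: local part, '@', then a domain containing a dot.
pattern = r'^[^@\s]+@[^@\s]+\.[^@\s]+$'

# na=False makes missing values count as "no match" instead of propagating NaN.
looks_like_email = df['email'].str.strip().str.match(pattern, na=False)

cleaned = df[looks_like_email]
email_count = len(cleaned)  # rows kept, as in the answer above
print(cleaned)
```

Unlike the `len() > 5` filter, this keeps `x@x.x`, since that string has the expected shape even though it is only five characters long.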
Can you check user_data_to_clean.loc[124] instead? iloc looks at the position of the row, not its label. You may be trying to drop a row that you already dropped.
@ayhan: Thanks, it turned out to be my mistake. I was dropping rows that had already been dropped.
Thanks, I am using the validate_email module. It works great.
I am now using the boolean masking method. iterrows was running very slowly. Thank you very much.
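The cause identified in the comments can be reproduced in a few lines. This is a minimal sketch, not the asker's actual data: the frame and its index labels are made up to mirror the label 124 from the traceback. Older pandas versions raise ValueError as in the traceback, while newer ones raise KeyError, so the example catches both.

```python
import pandas as pd

# Hypothetical frame whose index contains the label from the traceback.
df = pd.DataFrame({'email': ['a@b.com', '', 'c@d.com']}, index=[123, 124, 125])

df.drop(124, inplace=True)          # first drop succeeds

try:
    df.drop(124, inplace=True)      # label 124 is no longer in the index
    second_drop_failed = False
except (KeyError, ValueError) as exc:
    second_drop_failed = True
    print('second drop failed: %s' % exc)

# errors='ignore' skips labels that are missing instead of raising.
df.drop(124, inplace=True, errors='ignore')
print(list(df.index))
```

Passing errors='ignore' only hides the symptom; building one boolean mask and filtering once, as in the answers, avoids the repeated drop altogether.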