Python: select only the numeric or integer records from a DataFrame

python numpy pandas

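I have a DataFrame (df) in which column A has dtype object; some of its values are numeric strings and others are not.

I would like to select only the records whose value in A is an integer or numeric, to get: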

     A    B
0    1    green
1    2    red
3    3    yellow

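Thanks.

Call apply on the DataFrame (note the double square brackets, df[['A']] rather than df['A']) and call the string method isdigit() on each value, setting the parameter axis=1 so that the lambda function is applied row-wise. The result is a boolean Series, which is then used as a mask to index the frame: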

In [66]:
df[df[['A']].apply(lambda x: x.iloc[0].isdigit(), axis=1)]
Out[66]:
       A       B
Index           
0      1   green
1      2     red
3      3  yellow
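
A self-contained sketch of the same idea, in case it helps to run it. The contents of rows 2 and 4 are not shown in the question, so the values used for them here are placeholders, and the row value is read by label (row["A"]) rather than by position.

```python
import pandas as pd

# Hypothetical sample data: rows 2 and 4 hold placeholder non-numeric values.
df = pd.DataFrame(
    {
        "A": ["1", "2", "x", "3", "y"],
        "B": ["green", "red", "blue", "yellow", "black"],
    }
)

# Row-wise apply over a one-column frame yields one boolean per row.
mask = df[["A"]].apply(lambda row: row["A"].isdigit(), axis=1)

# Boolean indexing keeps only the rows where the mask is True.
result = df[mask]
print(result)
```

Note that this assumes every value in A is a string; a real int in the column would have no isdigit method.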

Update

If you are using a sufficiently recent version of pandas, the following will also work:

In [6]:
df[df['A'].astype(str).str.isdigit()]

Out[6]:
   A       B
0  1   green
1  2     red
3  3  yellow

Here we use astype to convert the Series to str and then call the vectorised str.isdigit.
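
One thing worth knowing about this approach is that str.isdigit only accepts strings made up entirely of digit characters, so negative numbers and decimals are rejected. A small sketch with made-up values (none of them come from the question):

```python
import pandas as pd

# Hypothetical values chosen to show where str.isdigit() is strict.
s = pd.Series(["1", "-2", "3.5", "abc", 4], dtype=object)

# astype(str) lets the .str accessor handle the non-string entry 4 as "4".
mask = s.astype(str).str.isdigit()

print(mask.tolist())  # [True, False, False, False, True]
```

If "-2" and "3.5" should count as numeric, a parsing-based check such as pd.to_numeric is likely a better fit.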

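Also note that convert_objects is deprecated; on version 0.17.0 or newer, use to_numeric instead.

You can use convert_objects, which, when convert_numeric=True, forces every non-numeric value to NaN. Dropping those rows and indexing with what remains gives the result.

This is much faster than using apply on a larger frame, because it is all implemented in Cython.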

In [30]: df[['A']].convert_objects(convert_numeric=True)
Out[30]: 
    A
0   1
1   2
2 NaN
3   3
4 NaN

In [31]: df[['A']].convert_objects(convert_numeric=True).dropna()
Out[31]: 
   A
0  1
1  2
3  3

In [32]: df[['A']].convert_objects(convert_numeric=True).dropna().index
Out[32]: Int64Index([0, 1, 3], dtype='int64')

In [33]: df.iloc[df[['A']].convert_objects(convert_numeric=True).dropna().index]
Out[33]: 
   A       B
0  1   green
1  2     red
3  3  yellow
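
For readers on a pandas release where convert_objects is unavailable, the same convert, drop and index steps can be sketched with pd.to_numeric. The values for rows 2 and 4 are placeholders, not from the original post. df.loc is used here rather than df.iloc because dropna().index holds labels; the two only coincide when the index is the default 0..n-1.

```python
import pandas as pd

# Hypothetical sample data: rows 2 and 4 hold placeholder non-numeric values.
df = pd.DataFrame(
    {
        "A": ["1", "2", "x", "3", "y"],
        "B": ["green", "red", "blue", "yellow", "black"],
    }
)

# Step 1: coerce to numbers; anything unparseable becomes NaN.
numeric = pd.to_numeric(df["A"], errors="coerce")

# Step 2: drop the NaN entries and keep the surviving index labels.
kept_labels = numeric.dropna().index

# Step 3: select those rows from the original frame by label.
result = df.loc[kept_labels]
print(result)
```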

Personally, I think that, compared with .apply() ...

Note that convert_objects is deprecated:

>>> df[['A']].convert_objects(convert_numeric=True)
__main__:1: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.

As of 0.17.0, use pd.to_numeric with errors='coerce' so that values that fail to parse become NaN. Then use notnull to get a boolean mask to apply to the original DataFrame:

>>> df[pd.to_numeric(df.A, errors='coerce').notnull()]
   A       B
0  1   green
1  2     red
3  3  yellow
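
The question asks for values that are "integer or numeric", and pd.to_numeric treats negatives and decimals as numeric too. If only integer-valued entries are wanted, one possible refinement is to also check for a zero fractional part. This is a sketch with made-up values, not something from the answers above:

```python
import pandas as pd

# Hypothetical values covering an integer, a negative, a decimal and text.
s = pd.Series(["1", "-2", "3.5", "abc"])

# Unparseable entries ("abc") become NaN.
numeric = pd.to_numeric(s, errors="coerce")

# Numeric: anything that parsed successfully.
is_numeric = numeric.notna()

# Integer-valued: parsed successfully and has no fractional part.
is_integer = is_numeric & (numeric % 1 == 0)

print(is_numeric.tolist())  # [True, True, True, False]
print(is_integer.tolist())  # [True, True, False, False]
```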

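It works great. I tried using df.apply(lambda x: isinstance(df[A], (int, float)), axis=1), but it always returns False. Your function works better.

The first solution didn't work for me, but the second one did (version 0.24.1).

More concise doesn't mean better. Many built-in functions employ optimizations that external Python functions (such as map) may not have access to.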
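
A likely reason the isinstance attempt in the comments always returns False is that it tests the whole column (a Series) rather than the value in each row. A sketch of an element-wise version, assuming a hypothetical column that mixes real Python numbers with strings:

```python
import pandas as pd

# Hypothetical column mixing real Python numbers with strings.
df = pd.DataFrame(
    {"A": [1, 2.5, "x", "3"], "B": ["green", "red", "blue", "yellow"]}
)

# df["A"] is a whole Series, so this is False regardless of the row.
always_false = isinstance(df["A"], (int, float))

# Testing each element instead gives one boolean per row.
mask = df["A"].map(lambda v: isinstance(v, (int, float)))

print(always_false)   # False
print(mask.tolist())  # [True, True, False, False]
```

Unlike pd.to_numeric, this check does not count the string "3" as numeric, so it only helps when the column holds actual numbers rather than numeric strings.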
