Pandas: find a string in multiple columns?


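I have a DataFrame with 3 columns: tel1, tel2, tel3. I want to keep the rows that contain a specific value in one or more of these columns.

For example, I want to keep the rows where tel1, tel2, or tel3 starts with "06".

How can I do this?

Thanks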

Let's use this df as an example DataFrame:

In [53]: import pandas as pd

In [54]: df = pd.DataFrame({'tel{}'.format(j): 
                            ['{:02d}'.format(i+j) 
                             for i in range(10)] for j in range(3)})

In [71]: df
Out[71]: 
  tel0 tel1 tel2
0   00   01   02
1   01   02   03
2   02   03   04
3   03   04   05
4   04   05   06
5   05   06   07
6   06   07   08
7   07   08   09
8   08   09   10
9   09   10   11

You can find the values in df['tel0'] that start with '06' with df['tel0'].str.startswith('06').
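As a minimal sketch of that single-column check, rebuilding the same example df so the snippet runs on its own (the name starts_06 is only illustrative):

```python
import pandas as pd

# Rebuild the example frame: columns tel0..tel2 of zero-padded strings.
df = pd.DataFrame({'tel{}'.format(j): ['{:02d}'.format(i + j) for i in range(10)]
                   for j in range(3)})

# Boolean Series: True where the value in tel0 starts with '06'.
starts_06 = df['tel0'].str.startswith('06')

# Index labels of the matching rows.
print(starts_06[starts_06].index.tolist())  # prints [6]
```

The result has one boolean per row, which is what makes it usable as a mask later on.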

To combine two boolean Series with logical OR, use |:

In [73]: df['tel0'].str.startswith('06') | df['tel1'].str.startswith('06')
Out[73]: 
0    False
1    False
2    False
3    False
4    False
5     True
6     True
7    False
8    False
9    False
dtype: bool

Alternatively, to combine a list of boolean Series with logical OR, you can use reduce:

In [79]: import functools
In [80]: import numpy as np
In [80]: mask = functools.reduce(np.logical_or, [df['tel{}'.format(i)].str.startswith('06') for i in range(3)])

In [81]: mask
Out[81]: 
0    False
1    False
2    False
3    False
4     True
5     True
6     True
7    False
8    False
9    False
Name: tel0, dtype: bool
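If the columns to search are known by name, one alternative sketch avoids reduce altogether: run the string test column by column with DataFrame.apply, then collapse the result with any(axis=1). The list cols is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'tel{}'.format(j): ['{:02d}'.format(i + j) for i in range(10)]
                   for j in range(3)})

cols = ['tel0', 'tel1', 'tel2']  # assumed list of columns to search

# A DataFrame has no .str accessor, so run the test on each column (a Series)...
per_column = df[cols].apply(lambda col: col.str.startswith('06'))
# ...then keep a row if any of its columns matched.
mask = per_column.any(axis=1)

print(mask[mask].index.tolist())  # prints [4, 5, 6]
```

This yields the same three rows as the reduce version, and the column list can be changed without touching the rest of the code.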

Once you have the boolean mask, you can use df.loc to select the relevant rows:

In [75]: df.loc[mask]
Out[75]: 
  tel0 tel1 tel2
4   04   05   06
5   05   06   07
6   06   07   08

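Note that there are many other vectorized string methods besides startswith. You may find str.contains useful for finding which rows contain a string. Note that str.contains interprets its argument as a regex pattern by default: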

In [85]: df['tel0'].str.contains(r'6|7')
Out[85]: 
0    False
1    False
2    False
3    False
4    False
5    False
6     True
7     True
8    False
9    False
Name: tel0, dtype: bool
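Because the pattern is a regular expression, characters such as . or | carry special meaning. The following sketch, using made-up sample values, contrasts the default behaviour with regex=False (plain substring search) and with a leading ^ (anchoring the match at the start of the string):

```python
import pandas as pd

s = pd.Series(['0.6', '006', '0612', '1206'])  # made-up sample values

# '.' is a regex wildcard here, so '006' matches as well as '0.6'.
as_regex = s.str.contains('0.6')
# With regex=False the pattern is a literal substring; only '0.6' matches.
as_literal = s.str.contains('0.6', regex=False)
# '^06' matches only at the start of the string, like str.startswith('06').
anchored = s.str.contains('^06')

print(as_regex.tolist())    # prints [True, True, False, False]
print(as_literal.tolist())  # prints [True, False, False, False]
print(anchored.tolist())    # prints [False, False, True, False]
```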

I like to use DataFrame.apply in cases like this:

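# search multiple columns of a DataFrame

import random as r
import pandas as pd

# generate some random numbers
rand_numbers = [[r.randint(100000, 999999) for i in range(3)] for j in range(20)]
df = pd.DataFrame.from_records(rand_numbers, columns=['tel1', 'tel2', 'tel3'])
df.head()

# a very simple search function
# if you need speed, use Cython here ;-)
def searchfilter(row, search='5'):
    # df.apply passes each row (or column) as a Series
    for value in row:
        # value is a number here, so we have to cast it to str
        if str(value).startswith(search):
            return True
    # no value in the row matched
    return False

# apply the search function to each row
result_bool_array = df.apply(searchfilter, axis=1)  # axis=1 runs it row-wise
df[result_bool_array]

# another search, using a lambda inside apply
result_bool_array = df.apply(lambda row: searchfilter(row, search='6'), axis=1)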
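A more compact variant of the same row-wise idea, as a sketch: the built-in any() stops at the first matching value. The helper name row_startswith and the fixed sample numbers are assumptions, chosen so the result is reproducible.

```python
import pandas as pd

# Small fixed frame instead of random numbers, so the result is reproducible.
df = pd.DataFrame({'tel1': [512345, 612345, 712345],
                   'tel2': [698765, 798765, 598765],
                   'tel3': [711111, 811111, 911111]})

def row_startswith(row, search='5'):
    # True if any value in the row, cast to str, starts with `search`.
    return any(str(value).startswith(search) for value in row)

result_bool_array = df.apply(row_startswith, axis=1)
print(result_bool_array.tolist())  # prints [True, False, True]
print(df[result_bool_array])
```

Row-wise apply calls a Python function once per row, so on large frames the vectorized .str methods are usually the faster choice.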

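Thanks for your answer. functools is useful, but it doesn't seem to work with NaN values (cannot index with vector containing NA/NaN values). I'm surprised there is no simple solution, something like: df[['TEL1', 'TEL2', 'BOB', 'FOO']].str.startswith('06')

The error message

ValueError: cannot index with vector containing NA / NaN values

can occur if the DataFrame's index contains NaN values. In general it is best to have unique, non-NaN values in the index. To make the index unique, you can use df = df.reset_index(). This moves the old index into a new column (or columns, for a MultiIndex). If you don't want to change your index, another option is to index by ordinal position instead of by boolean: df.iloc[np.where(mask)[0]].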
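Another possible source of this error, offered as a sketch rather than a diagnosis of the case above: when a column contains missing values, .str.startswith returns NaN for those entries, so the mask itself can contain NaN. Passing na=False treats missing values as non-matches. The sample frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'tel1': ['0611', np.nan, '0145'],
                   'tel2': [np.nan, '0622', '0177']})

# na=False makes the result False (instead of NaN) wherever the value is
# missing, so the combined mask is purely boolean.
mask = (df['tel1'].str.startswith('06', na=False)
        | df['tel2'].str.startswith('06', na=False))

print(df.loc[mask])

# Positional indexing with np.where selects the same rows.
print(df.iloc[np.where(mask)[0]])
```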
