Python: How to print the occurrences of given strings in a DataFrame column?

I have the following DataFrame:

import pandas as pd

data = [['Alexa',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
df

To check whether certain characters occur in the 'Name' column:

mylist=['a','e']
pattern = '|'.join(mylist)
df['contains']=df['Name'].str.contains(pattern)

The code above gives True or False depending on whether any value from mylist is present in the name.

How can I also get a column with the matched letters in the output?

    Name    Age contains  letters
0   Alexa   10  True      e a 
1   Bob     12  False     
2   Clarke  13  True      a e

You can use set intersection and a list comprehension here, which will be faster than the pandas string methods:

check = set('ae')
df.assign(letters=[set(n.lower()) & check for n in df.Name])
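
The set-intersection approach yields Python sets, while the desired output in the question shows a space-separated string. A minimal sketch of one way to bridge that gap, assuming alphabetical order is acceptable (the question's own sample shows both "e a" and "a e"); the variable name `found` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame([['Alexa', 10], ['Bob', 12], ['Clarke', 13]],
                  columns=['Name', 'Age'])
check = set('ae')

# Intersect each lowercased name with the target letters;
# sort so the order is stable (sets are unordered).
found = [sorted(set(name.lower()) & check) for name in df['Name']]

# An empty intersection means none of the letters occurred.
df['contains'] = [bool(f) for f in found]
# Join the matched letters into one space-separated string per row.
df['letters'] = [' '.join(f) for f in found]

contains_flags = df['contains'].tolist()
letters = df['letters'].tolist()
```

Computing the intersection once and deriving both columns from it avoids a second pass with `str.contains`.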


Another option is:

df.assign(letters=df.Name.str.findall(r'(?i)(a|e)'))


The second approach A) will include duplicates and B) will be slower:

In [89]: df = pd.concat([df]*1000)

In [90]: %timeit df.Name.str.findall(r'(?i)(a|e)')
2.34 ms ± 93.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [91]: %timeit [set(n.lower()) & check for n in df.Name]
1.45 ms ± 23.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
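
To make point A concrete, the following sketch runs both approaches on the three sample names and keeps the results side by side. The variable names are illustrative only.

```python
import pandas as pd

names = pd.Series(['Alexa', 'Bob', 'Clarke'])
check = set('ae')

# findall returns every match in order of appearance,
# preserving the original case and any repeats.
with_duplicates = names.str.findall(r'(?i)(a|e)').tolist()

# Set intersection keeps each letter at most once
# (lowercased here, because the names are lowercased first).
unique_letters = [set(n.lower()) & check for n in names]
```

For 'Alexa', `findall` reports both the leading 'A' and the trailing 'a', while the set version collapses them into a single 'a'.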

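How do I make it a column in the DataFrame?

Assign the list directly:

df['letters'] = [set(n.lower()) & check for n in df.Name]

The output below, with lists and a repeated letter, appears to come from the findall variant rather than the set version:
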
     Name  Age    letters
0   Alexa   10  [A, e, a]
1     Bob   12         []
2  Clarke   13     [a, e]
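
The follow-up question arises because `df.assign` returns a new DataFrame instead of modifying `df`. A short sketch of the difference, using the sample data from the question (the names `new_df`, `columns_after_assign`, and `columns_after_setitem` are hypothetical):

```python
import pandas as pd

df = pd.DataFrame([['Alexa', 10], ['Bob', 12], ['Clarke', 13]],
                  columns=['Name', 'Age'])
check = set('ae')
letters = [set(n.lower()) & check for n in df['Name']]

# assign returns a new DataFrame and leaves df untouched.
new_df = df.assign(letters=letters)
columns_after_assign = list(df.columns)

# Direct assignment adds the column to df itself.
df['letters'] = letters
columns_after_setitem = list(df.columns)
```

Either keep the result of `assign` (for example `df = df.assign(...)`) or use direct assignment when the column should persist on `df`.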