Pandas: print a 'match' rather than just a boolean result


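I have a column containing text and a list of substrings. The goal is to iterate over the text and, if there is a match, print that match in a new column rather than just getting a True/False result. How can I do this? Current code: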

sLength = len(dfEx5)
substring = ['AmericanAir', 'JetBlue', 'SouthwestAir', 'united','USAirways', 'VirginAmerica']
dfEx5['mentions'] = pd.Series(1, index=dfEx5.index) #Add a new column 'mentions' with 1's
pd.options.mode.chained_assignment = None #To deal with the 'SettingWithCopyWarning'

dfEx5['mentions'] = next((substring for substring in dfEx5['text'] if substring in dfEx5['text']), True)

where dfEx5['text'] is a pandas.core.series.Series.
Use the apply method to pass a custom function:

import pandas as pd

substring = ['AmericanAir', 'JetBlue', 'SouthwestAir', 'united', 'USAirways', 'VirginAmerica']
df = pd.DataFrame([["AmericaAir5", "JetBlue2"], ["JetBlue2", "SouthwestAir"]], columns=['text', 'what'])

def searchr(x, s):
    # Return the first entry of s that occurs in x; falls through to None if nothing matches
    for i in s:
        if i in x:
            return i

df["mentions"] = df['text'].apply(searchr, args=(substring,))
Alternatively, you can use a regular expression:

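import re
# Build one alternation with a single capture group; re.escape guards against regex metacharacters
r = '(' + "|".join(re.escape(s) for s in substring) + ')'
# expand=False returns a Series (NaN where nothing matches) instead of a one-column DataFrame
df["m"] = df['text'].str.extract(r, expand=False)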

The first method appears to be faster than the regex approach built by string concatenation.
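To see that the two approaches agree, here is a minimal self-contained sketch that runs both on the same column. The sample tweets, the helper name first_match, and the column names mentions_apply and mentions_regex are made up for illustration and are not from the question's data.

```python
import re
import pandas as pd

airlines = ['AmericanAir', 'JetBlue', 'SouthwestAir', 'united', 'USAirways', 'VirginAmerica']

# Hypothetical sample data: at most one airline mention per row
df = pd.DataFrame({'text': ['@JetBlue thanks for the upgrade',
                            'flight delayed again @united',
                            'no airline mentioned here']})

def first_match(text, candidates):
    """Return the first candidate contained in text, or None if there is none."""
    for name in candidates:
        if name in text:
            return name
    return None

# Approach 1: apply a custom function to every row
df['mentions_apply'] = df['text'].apply(first_match, args=(airlines,))

# Approach 2: one regex alternation with a single capture group
pattern = '(' + '|'.join(re.escape(name) for name in airlines) + ')'
df['mentions_regex'] = df['text'].str.extract(pattern, expand=False)

print(df[['mentions_apply', 'mentions_regex']])
```

Rows without a match come back as missing values in both columns. If a text mentions several airlines, the two approaches can disagree: the loop returns the first name in list order, while the regex returns the match that occurs earliest in the text.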

What you want is str.extract.
Thanks for the answer, I think I can accept it. Give me a little more time to try it out.