Pandas: How to search a column for a set of strings?

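I have a DataFrame with a column of tweets. The texts contain so-called "@" mentions. I want to add a new column to this DataFrame that holds the specific "@" handle found in each row. Code: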

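import numpy as np  # use NumPy directly; pd.np is deprecated and removed in pandas 2.0
import pandas as pd

# Convert all elements in the text column to strings (astype returns a new Series, so assign it back)
dfEx5['text'] = dfEx5['text'].astype(str)

# Nested np.where: the first matching handle wins; np.where needs both branches, so None marks "no match"
dfEx5['mentions'] = np.where(dfEx5.text.str.contains("@AmericanAir"), "@AmericanAir",
                    np.where(dfEx5.text.str.contains("@JetBlue"), "@JetBlue",
                    np.where(dfEx5.text.str.contains("@SouthwestAir"), "@SouthwestAir",
                    np.where(dfEx5.text.str.contains("@united"), "@united",
                    np.where(dfEx5.text.str.contains("@USAirways"), "@USAirways",
                    np.where(dfEx5.text.str.contains("@VirginAmerica"), "@VirginAmerica", None))))))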

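First, I convert all elements to strings. Then, if a row's text contains "@AmericanAir", "@AmericanAir" is written to the mentions column, and so on for the other handles.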
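A flatter alternative to the nested np.where calls is NumPy's np.select. It takes a list of boolean conditions and a matching list of values, and for each row it picks the value of the first condition that is true. This is a swapped-in technique, not the asker's code. The dfEx5 rows below are hypothetical sample data, and the empty-string default for rows with no match is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's dfEx5
dfEx5 = pd.DataFrame({"text": [
    "@AmericanAir @JetBlue thanks",
    "delayed again @united",
    "no airline here",
]})

handles = ["@AmericanAir", "@JetBlue", "@SouthwestAir",
           "@united", "@USAirways", "@VirginAmerica"]

# One boolean mask per handle; regex=False treats each handle as a literal substring
conditions = [dfEx5["text"].str.contains(h, regex=False) for h in handles]

# np.select uses the choice of the FIRST true condition (same priority as the
# nested np.where order); rows where no condition is true get the default
dfEx5["mentions"] = np.select(conditions, handles, default="")
print(dfEx5)
```

The order of handles sets the priority when a tweet mentions more than one airline. This matches the nested np.where version, not the order in which the mentions appear in the text.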

Thanks for your help.

pandas.Series.str.findall

I would find all the mentioned names that are in my watch list, then take the first one:

df.text.str.findall('|'.join(watch)).str[0]

0      @AmericanAir
1          @JetBlue
2     @SouthwestAir
3           @united
4        @USAirways
5    @VirginAmerica
Name: text, dtype: object
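Below is the same idea as a self-contained sketch. Two things in it are assumptions, not part of the answer. First, the handles go through re.escape, which changes nothing here but is safer if a handle ever contains regex metacharacters. Second, the sample rows are hypothetical. The sketch also shows that .str[0] returns NaN for a tweet with no watched mention.

```python
import re
import pandas as pd

watch = ["@SouthwestAir", "@VirginAmerica", "@united",
         "@JetBlue", "@USAirways", "@AmericanAir"]

# Hypothetical rows, including one with no watched mention
df = pd.DataFrame({"text": ["@united @SouthwestAir", "hello world"]})

# Build one alternation pattern; re.escape guards against regex metacharacters
pattern = "|".join(map(re.escape, watch))

# findall returns a list per row (matches in order of appearance);
# .str[0] takes the first element and yields NaN when the list is empty
first = df["text"].str.findall(pattern).str[0]
print(first)
```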

Add it as a new column via assign:

df.assign(mentions=df.text.str.findall('|'.join(watch)).str[0])

                    text        mentions
0  @AmericanAir @JetBlue    @AmericanAir
1               @JetBlue        @JetBlue
2          @SouthwestAir   @SouthwestAir
3  @united @SouthwestAir         @united
4             @USAirways      @USAirways
5         @VirginAmerica  @VirginAmerica
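Note that assign returns a new DataFrame and leaves the original unchanged. To keep the mentions column, the result has to be bound back to a name (or the column set directly with df["mentions"] = ...). Here is a minimal sketch on hypothetical rows:

```python
import pandas as pd

watch = ["@SouthwestAir", "@VirginAmerica", "@united",
         "@JetBlue", "@USAirways", "@AmericanAir"]

# Hypothetical sample rows
df = pd.DataFrame({"text": ["@AmericanAir @JetBlue", "@USAirways"]})

# assign does not modify df in place; this result is discarded
df.assign(mentions=df.text.str.findall("|".join(watch)).str[0])
print("mentions" in df.columns)  # False: df itself is unchanged

# Rebind the name to keep the new column
df = df.assign(mentions=df.text.str.findall("|".join(watch)).str[0])
print(df["mentions"].tolist())  # ['@AmericanAir', '@USAirways']
```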

Or, if you prefer, you can keep all the mentions:

df.assign(mentions=df.text.str.findall('|'.join(watch)))

                    text                  mentions
0  @AmericanAir @JetBlue  [@AmericanAir, @JetBlue]
1               @JetBlue                [@JetBlue]
2          @SouthwestAir           [@SouthwestAir]
3  @united @SouthwestAir  [@united, @SouthwestAir]
4             @USAirways              [@USAirways]
5         @VirginAmerica          [@VirginAmerica]
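You may later need the list column in long form, with one row per mention (for example, to count mentions per airline). In that case Series.explode can unpack the lists. This goes a step beyond the answer and is only a sketch on hypothetical rows:

```python
import pandas as pd

watch = ["@SouthwestAir", "@VirginAmerica", "@united",
         "@JetBlue", "@USAirways", "@AmericanAir"]

# Hypothetical sample rows
df = pd.DataFrame({"text": ["@AmericanAir @JetBlue",
                            "@united @SouthwestAir",
                            "@JetBlue"]})

# Keep every mention as a list, as in the answer
mentions = df.text.str.findall("|".join(watch))

# explode gives one row per list element, repeating the original index
long_form = mentions.explode()
counts = long_form.value_counts()
print(counts)
```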

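Use dfEx5.text.str.extract(r'(@AmericanAir|@JetBlue…)') etc.

What do you expect to happen if there are multiple mentions?

@Jon Clements I assumed each tweet is about one company. But you're right: if a tweet mentions several companies, I would use the first one mentioned.

Hmm, it doesn't seem to work yet. How do I search the "text" column of "dfEx5" for a mention of one of the six companies, and how does that name then get written into the "mentions" column? Thanks!

I can't help you unless you give a concrete example. Please read … and edit your post accordingly.

Setup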

import pandas as pd

watch = [
    '@SouthwestAir',
    '@VirginAmerica',
    '@united',
    '@JetBlue',
    '@USAirways',
    '@AmericanAir'
]
text = """\
@AmericanAir @JetBlue
@JetBlue
@SouthwestAir
@united @SouthwestAir
@USAirways
@VirginAmerica
"""
df = pd.DataFrame(dict(text=text.splitlines()))

df

                    text
0  @AmericanAir @JetBlue
1               @JetBlue
2          @SouthwestAir
3  @united @SouthwestAir
4             @USAirways
5         @VirginAmerica
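The str.extract suggestion from the comments can be sketched against the same setup data. extract returns the leftmost match of the capture group, so it yields the same first mention as findall(...).str[0], with NaN where nothing matches. The comment elides the full handle list, so building the alternation from watch (escaped with re.escape) is an assumption.

```python
import re
import pandas as pd

watch = [
    '@SouthwestAir', '@VirginAmerica', '@united',
    '@JetBlue', '@USAirways', '@AmericanAir',
]
text = """\
@AmericanAir @JetBlue
@JetBlue
@SouthwestAir
@united @SouthwestAir
@USAirways
@VirginAmerica
"""
df = pd.DataFrame(dict(text=text.splitlines()))

# One capture group around the alternation; expand=False returns a Series
pattern = "(" + "|".join(map(re.escape, watch)) + ")"
df["mentions"] = df.text.str.extract(pattern, expand=False)
print(df)
```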