Python: Subsetting a DataFrame using a list comprehension

python pandas

I have a DataFrame A with a column named text that holds long strings. I want to keep the rows of A whose text contains any of the strings in the list author_id.

A data frame:
Dialogue   Index   author_id   text
10190      0       573660      How is that even possible?
10190      1       23442       @573660 I do apologize.
10190      2       573661      @AAA do you still have the program for free checked bags?

author_id list:
[573660, 573678, 5736987]

Since 573660 is in the author_id list and also appears in the text column of A, my expected result keeps only the second row of DataFrame A:

 Dialogue   Index   author_id   text
 10190        1       23442     @573660 I do apologize.

The most naive solution I can think of is:

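 new_A = pd.DataFrame()
 for id in author_id:
     # concat returns a new DataFrame, so the result must be reassigned;
     # str(id) is needed because the ids in the list are integers
     new_A = pd.concat([new_A, A[A['text'].str.contains(str(id), na=False)]])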

But this takes a long time.

So I came up with this solution:

[id in text for id in author_id for text in df['text'] ]

But this does not work for subsetting the DataFrame, because I get a True/False value for every combination of author id and string in df['text'].

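So I created a new column in the DataFrame that combines Dialogue and Index, so that I could return it from the list comprehension, but this raises an error that I don't know how to interpret: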

df["DialogueIndex"] = df["Dialogue"].map(str) + df["Index"].map(str)

newA = [did for did in df["DialogueIndex"]  for id in author_id if df['text'].str.contains(id)  ]

error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Please help.
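
A note on the ValueError above: df['text'].str.contains(id) returns a whole boolean Series (one value per row), and an `if` clause needs a single True or False, so pandas refuses to guess. The sketch below is one way around it, not the only one. It assumes the sample data from the question, converts the integer ids to strings, and ORs one row mask per id into a running mask:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Dialogue": [10190, 10190, 10190],
    "Index": [0, 1, 2],
    "author_id": [573660, 23442, 573661],
    "text": [
        "How is that even possible?",
        "@573660 I do apologize.",
        "@AAA do you still have the program for free checked bags?",
    ],
})
author_id = [573660, 573678, 5736987]

# Start with an all-False mask, one entry per row.
mask = pd.Series(False, index=df.index)
for aid in author_id:
    # str.contains yields a boolean Series; OR it into the running mask
    # instead of using it directly as an `if` condition.
    mask |= df["text"].str.contains(str(aid), regex=False, na=False)

result = df[mask]
print(result)
```

The mask has one entry per row, so it can be used directly to index the DataFrame.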

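Just use str.contains to check whether text contains any of the authors in the specified list (by joining all the authors with |). You can then use the result to mask the original DataFrame: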

df[df.text.str.contains('|'.join(list(map(str, author_id_list))))]
#   Dialogue  Index  author_id                     text
#1     10190      1      23442  @573660 I do apologize.

If your author_id_list already contains strings, you can drop the list(map(...)) and just join the original list.
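
For completeness, here is a self-contained sketch of the joined-pattern approach on the sample data from the question. The use of re.escape and na=False are additions of mine, as precautions in case an id contains regex metacharacters or the text column has missing values:

```python
import re
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Dialogue": [10190, 10190, 10190],
    "Index": [0, 1, 2],
    "author_id": [573660, 23442, 573661],
    "text": [
        "How is that even possible?",
        "@573660 I do apologize.",
        "@AAA do you still have the program for free checked bags?",
    ],
})
author_id_list = [573660, 573678, 5736987]

# Build a single alternation pattern: "573660|573678|5736987".
pattern = "|".join(re.escape(str(a)) for a in author_id_list)

# na=False treats missing text as "no match" instead of propagating NaN.
subset = df[df["text"].str.contains(pattern, na=False)]
print(subset)
```

Keep in mind that this is substring matching, so a shorter id such as 5736 would also match @573660. If mentions always start with @, anchoring the pattern on that character is one way to tighten the match.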

You can use apply and check, for each row, whether any item in author_id_list appears in the text:
df[df.text.apply(lambda x: any(str(e) in x for e in author_id_list))]


Dialogue    Index   author_id   text
1   10190   1   23442   @573660 I do apologize.

There may be a faster way to do this, but I believe this gets you the answer you want!
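
A slightly more defensive variant of the apply approach, as a sketch: it converts the ids to strings once instead of on every row, and treats non-string cells (such as NaN) as non-matching. The helper name mentions_any is hypothetical, chosen only for this illustration:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Dialogue": [10190, 10190, 10190],
    "Index": [0, 1, 2],
    "author_id": [573660, 23442, 573661],
    "text": [
        "How is that even possible?",
        "@573660 I do apologize.",
        "@AAA do you still have the program for free checked bags?",
    ],
})
author_id_list = [573660, 573678, 5736987]

# Convert the ids to strings once, rather than inside the per-row lambda.
id_strings = [str(a) for a in author_id_list]

def mentions_any(text):
    """Return True if text is a string containing any of the ids."""
    # Non-string cells (e.g. NaN) cannot contain a mention.
    return isinstance(text, str) and any(s in text for s in id_strings)

subset = df[df["text"].apply(mentions_any)]
print(subset)
```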