Python: reference only the DataFrame rows where a condition is True


python pandas dataframe


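Similar to another question, but slightly different (that answer did not work for me). I am trying to reference only the rows of a DataFrame where a condition is true. In my case, the condition is whether a string contains a word from a word bank. If a word is in the string, I want to be able to use that particular row later (for example, if true, pull out the link and keep searching). So I have: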

import feedparser
import pandas as pd

wordBank = ["bomb", "explosion", "protest",
            "port delay", "port closure", "hijack",
            "tropical storm", "tropical depression"]
rss = pd.read_csv('RSSfeed2019.csv')
# print(rss.head())
feeds = []  # list of feed objects
for url in rss['url'].head(5):
    feeds.append(feedparser.parse(url))
# print(feeds)
posts = []  # list of posts [(title1, link1, summary1), (title2, link2, summary2) ...]
for feed in feeds:
    for post in feed.entries:
        if hasattr(post, 'summary'):
            posts.append((post.title, post.link, post.summary))
        else:
            posts.append((post.title, post.link))
df = pd.DataFrame(posts, columns=['title', 'link', 'summary'])
if (df['summary'].str.find(wordBank)) or (df['title'].str.find(wordBank)):
    print(df['title'])

An attempt based on the other question:

df = pd.DataFrame(posts, columns=['title', 'link', 'summary'])
for word in wordBank:
    mask = (df['summary'].str.find(word)) or (df['title'].str.find(word))
    df.loc[mask, 'summary'] = word
    df.loc[mask, 'title'] = word

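How do I get it to print the titles of the rows whose summary or title contains one of the words? I want to be able to manipulate those rows further. With the current code it prints every title in the DataFrame, I think because as soon as one comparison is true it prints them all. How do I reference only the titles for which the condition is true?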
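For context on why neither attempt works as a row filter, here is a minimal sketch. It assumes a hypothetical two-row frame made up for illustration, not the asker's RSS data. It shows that `str.find` returns positions rather than booleans, and that Python's `or` cannot combine two Series, whereas `str.contains` combined with `|` gives an element-wise mask.

```python
import pandas as pd

# Hypothetical two-row frame, only for illustration.
df = pd.DataFrame({
    "title": ["Global protest Breaks Record", "Global TEU Breaks Record"],
    "summary": ["Fleet passes 23 million TEU", "Fleet passes 23 million TEU"],
})

# str.find returns the position of the substring, or -1 when it is absent,
# so the result is an integer Series rather than a boolean mask.
positions = df["title"].str.find("protest")
print(positions.tolist())  # [7, -1]

# Python's `or` asks for the truth value of a whole Series, which pandas refuses.
try:
    combined = df["summary"].str.find("protest") or positions
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Element-wise alternative: boolean Series combined with `|`.
mask = (df["summary"].str.contains("protest", case=False)
        | df["title"].str.contains("protest", case=False))
print(mask.tolist())  # [True, False]
```

Because -1 is truthy in Python, a "not found" position cannot stand in for False, which is one more reason to prefer a boolean method such as `str.contains` here.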

Given the following setup:

posts = [["Global protest Breaks Record", 'porttechnology.org/news/global-teu-breaks-record/', "The world’s total cellular containership fleet has passed 23 million TEU for the first time, according to shipping experts Alphaliner."],
         ["Global TEU Breaks Record", 'porttechnology.org/news/global-teu-breaks-record/', "The world’s total cellular containership fleet has passed 23 million TEU for the first time, according to shipping experts Alphaliner."],
         ["Global TEU Breaks Record", 'porttechnology.org/news/global-teu-breaks-record/', "There is a tropical depression"]]

df = pd.DataFrame(posts, columns=['title', 'link', 'summary'])
print(df)

Setup output

                          title  ...                                            summary
0  Global protest Breaks Record  ...  The world’s total cellular containership fleet...
1      Global TEU Breaks Record  ...  The world’s total cellular containership fleet...
2      Global TEU Breaks Record  ...                     There is a tropical depression

You can do:

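# build one regex that matches any entry of wordBank as a whole word;
# the (?:...) group makes \b apply to every alternative, not only the first and last
pattern = rf"\b(?:{'|'.join(wordBank)})\b"

# create mask
mask = df['summary'].str.contains(pattern, case=False) | df['title'].str.contains(pattern, case=False)

# extract titles
titles = df['title'].values

# print them
for title in titles[mask]:
    print(title)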

Output

Global protest Breaks Record
Global TEU Breaks Record

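Note that the first row has protest in its title and the last row has tropical depression in its summary. The key idea is to use a regular expression that matches any one of the alternatives in wordBank. See the regex documentation for more information.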
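To make the regex idea concrete, here is a small sketch of how such a pattern can be built and what it matches. The shortened word list and the sample texts are assumptions made for illustration. `re.escape` and `na=False` are optional additions that are not part of the answer above: the first guards against special characters in a phrase, the second treats a missing summary as "no match".

```python
import re
import pandas as pd

wordBank = ["protest", "port delay", "tropical depression"]  # shortened, illustrative list

# Escape each phrase, then wrap the alternation in a non-capturing group so
# that \b applies to every alternative.
pattern = r"\b(?:" + "|".join(re.escape(w) for w in wordBank) + r")\b"

texts = pd.Series([
    "Global protest Breaks Record",
    "Protesters gather at the port",   # 'protest' is only a prefix here
    "There is a tropical depression",
    None,                              # a post without a summary
])

# na=False treats missing text as "no match" instead of propagating a missing value.
mask = texts.str.contains(pattern, case=False, na=False)
print(mask.tolist())  # [True, False, True, False]
```

The word boundaries are what keep "Protesters" from matching; dropping them would turn the check into a plain substring search.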
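Can you provide an example of the posts?
See the edit. I removed the posts because they looked a bit long, but if you could include the first 10 or so, or even a dummy example, it would be easier to verify that the proposed solution really works as intended.
Oh, my mistake, I see what you mean. Here is one:
52 "Global TEU Breaks Record" https://www.porttechnology.org/news/global-teu-breaks-record/?utm_source=Feeds&utm_campaign=News&utm_medium=rss The world’s total cellular containership fleet has passed 23 million TEU for the first time, according to shipping experts Alphaliner.
OK, thank you, this covers 99% of what I need. I have one small question. Since you are extracting the titles, are they detached from the DataFrame? (Not sure I am asking this right.) In other words, how do I create a new DataFrame from the original data? For example:
for title in titles[mask]:
    hits = pd.DataFrame[titles[mask]]
    print(hits['link'])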
You can do df[mask], which creates a new DataFrame containing only the matching rows, with all of their columns.
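A minimal sketch of that last suggestion, assuming the setup from the answer with abbreviated summaries and a two-word pattern chosen only for illustration. The name `hits` comes from the comment above, and the `.copy()` call is an optional addition.

```python
import pandas as pd

# Setup from the answer, with the long summaries abbreviated for illustration.
posts = [
    ["Global protest Breaks Record", "porttechnology.org/news/global-teu-breaks-record/", "Fleet passes 23 million TEU"],
    ["Global TEU Breaks Record", "porttechnology.org/news/global-teu-breaks-record/", "Fleet passes 23 million TEU"],
    ["Global TEU Breaks Record", "porttechnology.org/news/global-teu-breaks-record/", "There is a tropical depression"],
]
df = pd.DataFrame(posts, columns=["title", "link", "summary"])

pattern = r"\b(?:protest|tropical depression)\b"
mask = (df["summary"].str.contains(pattern, case=False)
        | df["title"].str.contains(pattern, case=False))

# Boolean indexing keeps every column of the matching rows; .copy() makes the
# result independent of df, so later edits to hits do not touch the original.
hits = df[mask].copy()

print(hits.index.tolist())  # [0, 2]
for link in hits["link"]:
    print(link)
```

The original row labels are preserved in `hits`; calling `hits.reset_index(drop=True)` renumbers them from zero if that is more convenient.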