Python 3.x: How to ignore the text inside a class when scraping data from a website


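I am trying to extract comments from a website. Whenever someone replies to a comment, the earlier post is included in the comment as a quote. I want to ignore those quoted replies when extracting.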

import requests
from bs4 import BeautifulSoup

url = "https://www.f150forum.com/f118/do-all-2018-f150-trucks-come-adaptive-cruise-control-369065/index2/"
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

# Every post body, including any quoted reply nested inside it
comments_lst = soup.find_all('div', attrs={"class": "ism-true"})
comments = []
for item in comments_lst:
    result = [item.get_text(strip=True, separator=" ")]
    comments.append(result)

# The quoted blocks on their own
quotes = []
for item in soup.find_all('div', attrs={"class": "panel alt2"}):
    result = [item.get_text(strip=True, separator=" ")]
    quotes.append(result)

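In the final result, I don't want the data from the quotes list to appear in my comments. I tried using an if statement, but the result was incorrect.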

For example, comments[6] gives the following result:

'Quote: Originally Posted by jeff_the_pilot What the difference between adaptive cruise control on 2018 versus 2017? I believe mine brakes if I encroach another vehicle. It will work in stop and go traffic!'
My expected result:

It will work in stop and go traffic!

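You need to add some logic that removes the text contained in divs with the class panel alt2 (the := operator used here requires Python 3.8 or later):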

comments =[]
for item in comments_lst:
    result = [item.get_text(strip=True, separator=" ")]
    if div := item.find('div', class_="panel alt2"):
        result[0] = ' '.join(result[0].split(div.text.split()[-1])[1:])
    comments.append(result)

>>> comments[6]
[' It will work in stop and go traffic!']
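As an alternative to splitting the string, the quoted div can be removed from the parse tree before the text is read. The sketch below is not part of the original answer: it runs on a small hypothetical HTML snippet (an assumption modeled on the structure described in the question, not the forum's real markup) so it works without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, simplified from the structure described in the question
html = """
<div class="ism-true">
  <div class="panel alt2">Quote: Originally Posted by jeff_the_pilot I believe mine brakes if I encroach another vehicle.</div>
  It will work in stop and go traffic!
</div>
<div class="ism-true">No quote in this one.</div>
"""

soup = BeautifulSoup(html, "html.parser")

comments = []
for item in soup.find_all("div", class_="ism-true"):
    # Drop every quoted block from the tree before reading the text
    for quote in item.find_all("div", class_="panel alt2"):
        quote.decompose()
    comments.append(item.get_text(strip=True, separator=" "))

print(comments)
```

Because the quote is removed as a tag rather than as a substring, this version does not depend on what the quoted text happens to contain. Note that decompose() modifies soup in place, so collect the quotes list first if you still need it.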

This will get all messages without the quoted text:

import requests
from bs4 import BeautifulSoup

url = "https://www.f150forum.com/f118/do-all-2018-f150-trucks-come-adaptive-cruise-control-369065/index2/"

soup = BeautifulSoup(requests.get(url).content, 'html.parser')

msgs = []
for msg in soup.select('[id^="post_message_"]'):
    for div in msg.select('div:has(> div > label:contains("Quote:"))'):
        div.extract()
    msgs.append( msg.get_text(strip=True, separator='\n') )

#print(msgs) # <-- uncomment to see all messages without Quoted messages

print(msgs[6])
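To try the selector logic without hitting the site, here is a minimal sketch on hypothetical markup (the nesting of the label and wrapper divs is an assumption chosen to match the selector, not a copy of the forum's HTML). It uses :-soup-contains, which newer Soup Sieve releases recommend in place of the deprecated :contains; on an older installation, :contains as in the answer above may be the one that works.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a wrapper div holding a "Quote:" label and the quoted post
html = """
<div id="post_message_1">
  <div>
    <div><label>Quote:</label></div>
    <div class="panel alt2">Originally Posted by jeff_the_pilot I believe mine brakes.</div>
  </div>
  It will work in stop and go traffic!
</div>
<div id="post_message_2">No quote here.</div>
"""

soup = BeautifulSoup(html, "html.parser")

msgs = []
for msg in soup.select('[id^="post_message_"]'):
    # Match the wrapper whose child div directly contains the "Quote:" label
    for div in msg.select('div:has(> div > label:-soup-contains("Quote:"))'):
        div.extract()
    msgs.append(msg.get_text(strip=True, separator="\n"))

print(msgs)
```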

It says invalid syntax near ":=". I tried using "!=" instead, but it threw an error saying "name 'div' is not defined".

@anonymous13 Oops, I was using Python 3.8 syntax. You can replace it with
div = item.find('div', class_="panel alt2")
followed by
if div:
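Written out in full, the loop without the := operator might look like the sketch below. It runs on a hypothetical one-post HTML snippet (an assumption for illustration, not the forum's real markup) so the behavior can be checked offline on Python versions older than 3.8.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one forum post with a quoted reply
html = """
<div class="ism-true">
  <div class="panel alt2">Originally Posted by jeff_the_pilot I believe mine brakes if I encroach another vehicle.</div>
  It will work in stop and go traffic!
</div>
"""

soup = BeautifulSoup(html, "html.parser")
comments_lst = soup.find_all("div", attrs={"class": "ism-true"})

comments = []
for item in comments_lst:
    result = [item.get_text(strip=True, separator=" ")]
    # Plain assignment followed by a truth test replaces "if div := ..."
    div = item.find("div", class_="panel alt2")
    if div:
        # Keep only what follows the last word of the quoted block
        last_word = div.text.split()[-1]
        result[0] = " ".join(result[0].split(last_word)[1:])
    comments.append(result)

print(comments)
```

The leading space in the result is kept, as in the answer's own output. This approach relies on the last word of the quote not appearing earlier inside the quoted text; if it does, part of the quote will remain in the result.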