Python BeautifulSoup - Can't filter list results by punctuation (python, beautifulsoup, python-3.7)

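I am trying to exclude question marks and colons from my Python results, but they keep showing up in the final output. The results are filtered for None, but not for punctuation.

Any help would be much appreciated.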

import requests
from bs4 import BeautifulSoup

#Scrape BBC for Headline text
url = 'https://www.bbc.co.uk/news'
res = requests.get(url)
html_page = res.content
soup = BeautifulSoup(html_page, 'html.parser')

tags = soup.find_all(class_='gs-c-promo-heading__title')
#print(headlines)
headlines = list()

for i in tags:
    if i.string is not None:
        if i.string != ":":
            if i.string != "?":
                headlines.append(i.string)

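You are comparing the whole string against a single character, but what you want to know is whether the string contains that character. If you really want to filter that way, just use "not in":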
if ':' not in i.string:
    if '?' not in i.string:
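
To make the difference concrete, here is a small sketch on a made-up list of headlines. The sample strings and both function names are illustrative assumptions, not data from the BBC page. An equality test only rejects a headline that consists of nothing but the character, whereas a membership test rejects any headline with the character anywhere in it.

```python
# Hypothetical sample data standing in for the scraped tag strings.
samples = [
    "Budget 2020: what it means",
    "Who won the election?",
    "Storm hits coast",
    None,
]

def filter_by_equality(strings):
    # Mirrors the question's logic: drops only strings that ARE ":" or "?".
    return [s for s in strings if s is not None and s != ":" and s != "?"]

def filter_by_membership(strings):
    # Drops every string that CONTAINS ":" or "?".
    return [s for s in strings if s is not None and ":" not in s and "?" not in s]

print(filter_by_equality(samples))    # all three real headlines survive
print(filter_by_membership(samples))  # only the headline without punctuation survives
```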

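The problem with your approach is that you skip those results entirely. I think it is better to clean each result inside the loop by replacing the characters: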
for i in tags:
    if i.string is not None:
        print(i.string.replace(':', '').replace('?', ''))
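
If chaining replace calls becomes unwieldy, one alternative (a swapped-in technique, not something the answer itself proposes) is str.translate with a deletion table built by str.maketrans. The sketch below assumes plain strings as input; the names UNWANTED, TABLE and strip_punctuation are made up for illustration.

```python
# Characters to strip; extend this string as needed (illustrative choice).
UNWANTED = ":?"

# With three arguments, str.maketrans maps every character in the third
# argument to None, which means "delete it" when the table is applied.
TABLE = str.maketrans("", "", UNWANTED)

def strip_punctuation(text):
    # Returns a copy of text with every character in UNWANTED removed.
    return text.translate(TABLE)

print(strip_punctuation("Budget 2020: what it means?"))
```

Inside the scraping loop this would be called as strip_punctuation(i.string), after the check that i.string is not None.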

If you want to strip more characters, a regular expression may be a better approach.
Example
import requests
from bs4 import BeautifulSoup
url = 'https://www.bbc.co.uk/news'
res = requests.get(url)
html_page = res.content
soup = BeautifulSoup(html_page, 'html.parser')

tags = soup.find_all(class_='gs-c-promo-heading__title')
#print(headlines)
headlines = list()

for i in tags:
    if i.string is not None:
        if ':' not in i.string:
            if '?' not in i.string:
                headlines.append(i.string)
headlines

Here is a regex formatting function that strips ? and : from a string:
def hd_format(text):
   return re.sub(r"\?|\:", "", text)

You can add any other characters you want to exclude: separate them with | and escape special characters with \.
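
As a sketch of how that extension might look without escaping each character by hand, the pattern can be built from a string of unwanted characters using re.escape and a character class. The helper name make_stripper and the sample text are assumptions for illustration, and the helper assumes a non-empty string of characters.

```python
import re

def make_stripper(chars):
    # re.escape backslash-escapes anything that is special in a regex;
    # the square brackets form a character class matching any one of them.
    # Assumes chars is non-empty (an empty class "[]" is not a valid pattern).
    pattern = re.compile("[" + re.escape(chars) + "]")

    def strip(text):
        # Remove every occurrence of the unwanted characters.
        return pattern.sub("", text)

    return strip

clean = make_stripper("?:!")
print(clean("Really? Yes: really!"))
```

Building the pattern once and reusing the compiled object keeps the loop over the tags simple: each headline is passed through clean before being appended.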
Full code
import re
import requests
from bs4 import BeautifulSoup

#Scrape BBC for Headline text
url = 'https://www.bbc.co.uk/news'
res = requests.get(url)
html_page = res.content
soup = BeautifulSoup(html_page, 'html.parser')

tags = soup.find_all(class_='gs-c-promo-heading__title')
#print(headlines)
headlines = []

def hd_format(text):
    return re.sub(r"\?|\:", "", text)

for i in tags:
    if i.string is not None:
        headlines.append(hd_format(i.string))

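Unfortunately, this still lets through the punctuation I need to remove. Maybe this works on an older version of BeautifulSoup. On line 4, my version does not recognise the "bs" abbreviation. I really appreciate your help, though.

If you only want to remove ? and :, my code should work. "bs" is how I imported it. I will update the code to match your import.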