Python BeautifulSoup - Can't filter list results by punctuation

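I am trying to exclude question marks and colons from my results in Python, but they keep appearing in the final output. The results are filtered for None, but not for the punctuation.

Any help would be greatly appreciated.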

import requests
from bs4 import BeautifulSoup

#Scrape BBC for Headline text
url = 'https://www.bbc.co.uk/news'
res = requests.get(url)
html_page = res.content
soup = BeautifulSoup(html_page, 'html.parser')

tags = soup.find_all(class_='gs-c-promo-heading__title')
#print(headlines)
headlines = list()

for i in tags:
    if i.string is not None:
        if i.string != ":":
            if i.string != "?":
                headlines.append(i.string)

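You are comparing the whole string with a single character, but what you want to know is whether the string contains that character. If that is really what you want, just use not in: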

if ':' not in i.string:
    if '?' not in i.string:
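
To make the difference concrete, here is a minimal offline sketch. The sample headlines and the two helper function names are made up for illustration and stand in for the scraped tag strings; nothing here touches the network.

```python
# Hypothetical sample data standing in for the i.string values from the page.
sample = [
    "Budget 2020: What it means for you",
    "Is the high street dying?",
    "Storm hits the north",
    None,
]

def filter_by_equality(items):
    # Drops only strings that are exactly ":" or "?", so real headlines all pass.
    return [s for s in items if s is not None and s != ":" and s != "?"]

def filter_by_membership(items):
    # Drops any string that contains ":" or "?" anywhere.
    return [s for s in items if s is not None and ":" not in s and "?" not in s]

kept_eq = filter_by_equality(sample)
kept_in = filter_by_membership(sample)
print(kept_eq)
print(kept_in)
```

The equality version keeps all three headlines, because none of them is exactly one punctuation character; the membership version keeps only the one without a colon or question mark.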
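The problem with filtering this way is that you skip those headlines entirely. I think it is better to clean the results in the loop by replacing the characters: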

for i in tags:
    if i.string is not None:
        # strip both colons and question marks
        print(i.string.replace(':', '').replace('?', ''))
If you want to clean out more characters, there may be a better way using regular expressions.
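
As a non-regex alternative for removing many characters at once (a different technique from the chained replace calls above, not something the original answer used), a translation table can do it in one pass. This is only a sketch; UNWANTED and strip_chars are illustrative names.

```python
# Characters to delete; extend this string as needed (an illustrative choice).
UNWANTED = ":?"

# With three arguments, str.maketrans maps every character
# in the third argument to None, i.e. deletes it.
DELETE_TABLE = str.maketrans("", "", UNWANTED)

def strip_chars(text):
    # Return text with every character from UNWANTED removed.
    return text.translate(DELETE_TABLE)

print(strip_chars("Brexit: What happens next?"))
```

The table is built once and reused, so adding a character means editing only the UNWANTED string.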

Example

import requests
from bs4 import BeautifulSoup

url = 'https://www.bbc.co.uk/news'
res = requests.get(url)
html_page = res.content
soup = BeautifulSoup(html_page, 'html.parser')

tags = soup.find_all(class_='gs-c-promo-heading__title')
headlines = []

for i in tags:
    # keep only headlines that contain neither ':' nor '?'
    if i.string is not None and ':' not in i.string and '?' not in i.string:
        headlines.append(i.string)

print(headlines)

Here is a regex-based formatting function that removes the unwanted characters from a string:

def hd_format(text):
   return re.sub(r"\?|\:", "", text)
You can add any other characters you want to exclude: just separate them with | and escape special characters with \.
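
If the list of characters grows, escaping each one by hand gets error-prone. A possible sketch, assuming a hypothetical CHARS_TO_REMOVE list, builds the same kind of | pattern and lets re.escape handle the escaping:

```python
import re

# Hypothetical list of characters to strip; add more as needed.
CHARS_TO_REMOVE = ["?", ":", "!", "."]

# re.escape protects characters that have a special meaning in a regex,
# and "|" joins the escaped characters into alternatives.
PATTERN = re.compile("|".join(re.escape(c) for c in CHARS_TO_REMOVE))

def hd_format(text):
    # Remove every listed character from text.
    return PATTERN.sub("", text)

print(hd_format("Who won? Find out: now!"))
```

Compiling the pattern once avoids rebuilding it for every headline.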

Full code

import re

import requests
from bs4 import BeautifulSoup

#Scrape BBC for Headline text
url = 'https://www.bbc.co.uk/news'
res = requests.get(url)
html_page = res.content
soup = BeautifulSoup(html_page, 'html.parser')

tags = soup.find_all(class_='gs-c-promo-heading__title')
headlines = []

def hd_format(text):
    # remove every '?' and ':' from the text
    return re.sub(r"\?|:", "", text)

for i in tags:
    if i.string is not None:
        headlines.append(hd_format(i.string))

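Unfortunately, this still lets through the punctuation I need to remove. Maybe this works with an older version of BeautifulSoup. On line 4, my version does not recognize the "bs" abbreviation. Still, I really appreciate your help.

If you only want to remove ? and :, my code should work. "bs" is just how I imported it. I will update the code to match your import.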