Get all HTML data except mailto: and tel: links with BS4 decompose() in Python

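I need to strip the phone numbers and email addresses out of the HTML.

I can get the data with

description_source = (soup.select('a[href^="mailto:"]'),
                      soup.select('a[href^="tel:"]'))

but I don't want it; I want everything else. I'm trying to use decompose(), and I get this error:

TypeError: decompose() takes 1 positional argument but 2 were given

I've considered using SoupStrainer, but it looks like I would have to include everything except the mailto and tel links to get the right information.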

The full current code for this part is:

import requests
from bs4 import BeautifulSoup as bs4

item_number = '122124438749' 

ebay_url = "http://vi.vipr.ebaydesc.com/ws/eBayISAPI.dll?ViewItemDescV4&item=" + item_number
r = requests.get(ebay_url)
html_bytes = r.text
soup = bs4(html_bytes, 'html.parser')

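# This line fails: decompose() takes no arguments besides self, so passing a
# selector raises TypeError: decompose() takes 1 positional argument but 2 were given
description_source = soup.decompose('a[href^="mailto:"]')

print(description_source)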
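The TypeError arises because decompose() is a method of an individual Tag that takes no arguments; it does not accept a CSS selector. A minimal sketch, assuming a small inline HTML string (hypothetical) in place of the downloaded eBay page, selects both kinds of link with one selector group and calls decompose() on each match:

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML standing in for the downloaded eBay description
html = """
<div>
  <p>Call <a href="tel:+15551234567">555-123-4567</a></p>
  <p>Mail <a href="mailto:seller@example.com">seller@example.com</a></p>
  <p>See <a href="https://example.com/item">the item page</a></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A comma-separated selector group matches both mailto: and tel: links
for tag in soup.select('a[href^="mailto:"], a[href^="tel:"]'):
    tag.decompose()  # called on each Tag, with no arguments

# Everything except the contact links is left in the tree
print(soup.prettify())
```

Note that the `^=` attribute selector is case-sensitive, so an href written as `MAILTO:` would not match.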

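Try using find_all() to find all the links on the page, then check which ones contain phone numbers and email addresses, and remove those with extract().

Use the lxml parser for faster processing (it has to be installed separately, e.g. pip install lxml); the official documentation also recommends it.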

import requests
from bs4 import BeautifulSoup

item_number = '122124438749' 

ebay_url = "http://vi.vipr.ebaydesc.com/ws/eBayISAPI.dll?ViewItemDescV4&item=" + item_number
r = requests.get(ebay_url)
html_bytes = r.text
soup = BeautifulSoup(html_bytes, 'lxml')

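# href=True skips <a> tags without an href; otherwise link.get('href')
# returns None and calling .find() on it raises AttributeError
for link in soup.find_all('a', href=True):
    href = link['href'].strip().lower()
    # Remove phone and email links from the tree
    if href.startswith(('tel:', 'mailto:')):
        link.extract()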

print(soup.prettify())
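Because extract() returns the tag it removes, the same loop can also record the stripped values, which is presumably what the unused email and phone variables in the answer were meant for. A sketch under that assumption, with hypothetical sample HTML, case-insensitive scheme matching, and links that have no href left alone:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real code would use the downloaded page
html = """
<p>
  <a href="tel:+15551234567">Phone</a>
  <a href="MAILTO:seller@example.com">Email</a>
  <a name="top">anchor without href</a>
  <a href="https://example.com">Shop</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")
emails, phones = [], []

# href=True skips <a> tags that have no href attribute at all
for link in soup.find_all("a", href=True):
    scheme, _, value = link["href"].strip().partition(":")
    if scheme.lower() == "mailto":
        emails.append(value)
        link.extract()  # removes the tag from the tree and returns it
    elif scheme.lower() == "tel":
        phones.append(value)
        link.extract()

print(emails)  # ['seller@example.com']
print(phones)  # ['+15551234567']
```

Modifying the tree inside the loop is safe here because find_all() returns a list built before any tag is removed.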

You can also use decompose() instead of extract().
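The practical difference between the two: extract() detaches the tag and hands it back so it can still be inspected or reused, while decompose() destroys the tag and returns None. A short sketch on hypothetical markup:

```python
from bs4 import BeautifulSoup

html = '<p><a href="tel:123">call</a> <a href="mailto:a@b.c">mail</a></p>'
soup = BeautifulSoup(html, "html.parser")

tel_link = soup.select_one('a[href^="tel:"]')
mail_link = soup.select_one('a[href^="mailto:"]')

removed = tel_link.extract()    # detached from the tree but still usable
result = mail_link.decompose()  # destroyed; decompose() returns None

print(removed["href"])  # tel:123
print(result)           # None
print(soup)             # neither link remains
```

When the removed links are not needed afterwards, decompose() is the simpler choice since nothing keeps a reference to them.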

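Post your full code.

Hey, this is a great little script and I had fun playing with it. However, it returns the data I want removed. I want all of the tel: and mailto: HTML removed.

You have to download the whole HTML, then save it without the tel: links. Also, change the parser to XML!

Hey man. Awesome. :)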
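The workflow described in the comments (take the whole HTML, drop the contact links, save the result) can be sketched as below. The helper name strip_contact_links, the sample page string and the output file name are all hypothetical; the sketch uses the built-in "html.parser" rather than the XML parser mentioned above, since "xml" requires lxml to be installed.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def strip_contact_links(html: str) -> str:
    """Hypothetical helper: return the HTML with every tel: and mailto: link removed."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.select('a[href^="tel:"], a[href^="mailto:"]'):
        tag.decompose()
    return str(soup)


# Hypothetical page content; in the question this comes from requests.get(...).text
page = '<div><a href="tel:555">555</a> <a href="/item">item</a></div>'

# Hypothetical output file, written to a temporary directory for this sketch
out_path = Path(tempfile.mkdtemp()) / "description_clean.html"
out_path.write_text(strip_contact_links(page), encoding="utf-8")

print(out_path.read_text(encoding="utf-8"))
```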
