Splitting HTML in two with Python BeautifulSoup

python web-scraping


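I am trying to scrape a website, and I need to cut the HTML in half. The problem is that the HTML is not well organized, so I can't just use findAll.

Here is my code for parsing the HTML: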

import requests
from bs4 import BeautifulSoup

resultats = requests.get(URL)
bs = BeautifulSoup(resultats.text, 'html.parser')

What I would like to do is split bs at each matching element I find.

The solution is probably very simple, but I can't find it.

Edit: the website.

This will scrape the full text of the page, without the HTML:

import re
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://fr.wikipedia.org/wiki/Liste_de_sondages_sur_l'%C3%A9lection_pr%C3%A9sidentielle_fran%C3%A7aise_de_2017#Avril"
resultats = urlopen(url)
html = resultats.read()

soup = BeautifulSoup(html, 'html5lib')
soup = soup.get_text()  # Extracts the text from the HTML; soup is now a plain string

print(soup)

If you want to strip certain information out of the text, you can add this:

soup = re.sub(re.compile('yourRegex', re.DOTALL), '', soup)\
       .strip()
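
As an illustration of the substitution above, here is a minimal sketch that runs on a hard-coded string instead of the downloaded page. The sample text and the pattern (which removes bracketed markers such as [1]) are assumptions made up for this example; they only stand in for 'yourRegex' and for the real output of get_text().

```python
import re

# Hypothetical sample standing in for the string returned by soup.get_text().
text = "  Sondage A[1] : 24 %\nSondage B[note 2] : 19 %  "

# Hypothetical pattern used in place of 'yourRegex': any bracketed marker.
# re.DOTALL only changes how '.' matches; it is kept to mirror the call above.
pattern = re.compile(r"\[[^\]]*\]", re.DOTALL)

# Remove every match, then trim leading and trailing whitespace.
cleaned = re.sub(pattern, "", text).strip()
print(cleaned)
```

Everything the pattern matches is deleted, so the pattern should describe what to drop rather than what to keep.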


Don't use images. Add sample data as text: some HTML code or a link.

See the website I'm trying to scrape.

Thanks, that seems to work, but I need to keep the BeautifulSoup format rather than plain text. I need to use findAll afterwards.
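
The last comment asks for pieces that remain BeautifulSoup objects, so that findAll still works on them. One possible approach is sketched below, under stated assumptions: the thread does not say which element marks the split points, so the h3 marker, the sample HTML, and the helper name split_at are all hypothetical. The sketch also assumes that each marker and the content belonging to it are siblings under the same parent.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; the real page's markup was not included in the thread.
html = """
<div id="content">
  <h3>Avril</h3>
  <table><tr><td>a1</td></tr></table>
  <p>note a</p>
  <h3>Mars</h3>
  <table><tr><td>m1</td></tr><tr><td>m2</td></tr></table>
</div>
"""


def split_at(soup, marker_name):
    """Return one BeautifulSoup object per marker tag plus the siblings after it."""
    sections = []
    for marker in soup.find_all(marker_name):
        parts = [str(marker)]
        for sibling in marker.next_siblings:
            # Stop as soon as the next marker begins a new section.
            # Text nodes may have no tag name, hence the getattr default.
            if getattr(sibling, "name", None) == marker_name:
                break
            parts.append(str(sibling))
        # Re-parse the collected markup so the section is a soup of its own.
        sections.append(BeautifulSoup("".join(parts), "html.parser"))
    return sections


bs = BeautifulSoup(html, "html.parser")
sections = split_at(bs, "h3")
for section in sections:
    # Each section supports find_all, unlike the string from get_text().
    print(section.h3.get_text(), len(section.find_all("td")))
```

Re-parsing each chunk costs some time on large pages, but it leaves the original bs untouched and gives every section the full search API.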