Python: scraping specific text from a nested web page
from bs4 import BeautifulSoup
from urllib.request import urlopen
import re

# Attempt from the question: prints the whole row once for every matching cell
scraped = urlopen('http://www.example.org/inmates/').read()
soup = BeautifulSoup(scraped, 'html.parser')
for item in soup.find_all('tr', {'id': re.compile('^inmate')}):
    for name in item('td', {'class': "row alt"}):
        print(item)

Find all the tr tags and use the get_text() method to get their text. Then split the text on \n and drop the empty strings with filter(). This gets you all the data you need in a single line.
from bs4 import BeautifulSoup
from urllib.request import urlopen
import re

# Scrape the page with BeautifulSoup
scraped = urlopen('http://www.example.org/inmates/').read()
soup = BeautifulSoup(scraped, 'html.parser')
for item in soup.find_all('tr', {'id': re.compile('^inmate')}):
    # Split the row text on newlines and drop the empty strings
    data = list(filter(None, item.get_text().split('\n')))
    print(data)

Output
['3', 'View', 'LAST Name', 'FIRST Name', 'LAST Name', '08/26/2017', '41', 'M']

If you want to drop the first two elements, just slice the list:
data = list(filter(None, item.get_text().split('\n')))[2:]
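The split, filter and slice steps can be tried without fetching the page. The sketch below is only an illustration: `split_row_text` and the sample `row_text` are made up here, with `row_text` standing in for what `item.get_text()` might return for one row. It also strips each piece, on the assumption that a row may contain whitespace-only lines, which `filter(None, ...)` alone would keep.

```python
def split_row_text(text, skip=0):
    """Split row text on newlines, drop blank pieces, then skip the first `skip` fields.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    # Strip each piece so whitespace-only lines become empty strings
    parts = [piece.strip() for piece in text.split('\n')]
    # Keep only non-empty pieces, then slice off the leading fields
    return [piece for piece in parts if piece][skip:]


# Made-up sample standing in for item.get_text() on one table row
row_text = "\n3\nView\n\nLAST Name\n  \nFIRST Name\n"

print(split_row_text(row_text))          # ['3', 'View', 'LAST Name', 'FIRST Name']
print(split_row_text(row_text, skip=2))  # ['LAST Name', 'FIRST Name']
```

In the scraping loop this would be called as `split_row_text(item.get_text(), skip=2)`.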
I haven't checked it yet, but I'm sure it will work. Thank you very much.