Python: scraping a nested web page for specific text

python python-2.7 python-3.x web-scraping

from bs4 import BeautifulSoup
from urllib.request import urlopen
import re

# Beautiful Soup scrape
scraped = urlopen('http://www.example.org/inmates/').read()
soup = BeautifulSoup(scraped, 'html.parser')

for item in soup.find_all('tr', {'id': re.compile('^inmate')}):
    for name in item('td', {'class': "row alt"}):
        print(item)

Thanks in advance.

Find all the tr tags and use the get_text() method to get their text. Then split the text on \n and remove the empty strings with filter. Here you get all the data you want in a single line.

from bs4 import BeautifulSoup
from urllib.request import urlopen
import re

# Beautiful Soup scrape
scraped = urlopen('http://www.example.org/inmates/').read()
soup = BeautifulSoup(scraped, 'html.parser')

for item in soup.find_all('tr', {'id': re.compile('^inmate')}):
    # get_text() returns the row's text; split it on newlines and drop the empty strings
    data = list(filter(None, item.get_text().split('\n')))
    print(data)

Output:

['3', 'View', 'LAST Name', 'FIRST Name', 'LAST Name', '08/26/2017', '41', 'M']

If you want to drop the first two elements, just slice the list:

data = list(filter(None, item.get_text().split('\n')))[2:]
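One detail about the filter(None, ...) step is worth spelling out: it only drops empty strings, so a cell that contains nothing but spaces would still come through. The sketch below shows this on a made-up string (row_text is an assumption, shaped like what get_text() might return for one row, not real page output) and one possible way to strip such cells as well.

```python
# Made-up text resembling what get_text() could return for a single table row.
row_text = "\n3\nView\n\nLAST Name\n  \n08/26/2017\n41\nM\n"

parts = row_text.split("\n")

# filter(None, ...) keeps only truthy items, so the empty strings are dropped.
non_empty = list(filter(None, parts))

# A whitespace-only string is truthy and survives filter(None, ...);
# stripping each piece first removes those entries too.
cleaned = [p.strip() for p in parts if p.strip()]

# Slicing off the first two elements, as in the answer above.
without_prefix = cleaned[2:]

print(non_empty)
print(without_prefix)
```

Whether the extra stripping is needed depends on the actual markup; if the cells never contain stray whitespace, the one-line filter version is enough.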

I haven't checked it yet, but I'm sure it will work. Thanks a lot.
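For anyone who wants to check the approach without requesting the live page, here is a minimal sketch that runs it against an inline table. The markup, the inmate_1 and inmate_2 ids, the cell values, and the extract_rows helper are all invented for illustration; the real page may be structured differently.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (ids and values are made up).
SAMPLE_HTML = """
<table>
<tr id="header"><th>No.</th><th>Name</th></tr>
<tr id="inmate_1">
<td>3</td>
<td>View</td>
<td>LAST Name</td>
<td>08/26/2017</td>
</tr>
<tr id="inmate_2">
<td>4</td>
<td>View</td>
<td>OTHER Name</td>
<td>08/27/2017</td>
</tr>
</table>
"""


def extract_rows(html):
    """Return the text of each row whose id starts with 'inmate', minus the first two cells."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for item in soup.find_all('tr', {'id': re.compile('^inmate')}):
        # Split the row text on newlines and drop the empty strings.
        cells = list(filter(None, item.get_text().split('\n')))
        # Skip the first two elements (row number and the 'View' link text).
        rows.append(cells[2:])
    return rows


for row in extract_rows(SAMPLE_HTML):
    print(row)
```

The header row is skipped because its id does not match the ^inmate pattern, which is the same selection the original loop relies on.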