Python: scraping a nested web page for specific text


Code from the question:

from bs4 import BeautifulSoup
from urllib.request import urlopen
import re

#beautiful soup scrape
scraped = urlopen('http://www.example.org/inmates/').read()
soup = BeautifulSoup(scraped, 'html.parser')

for item in soup.find_all('tr', {'id': re.compile('^inmate')}):
    for name in item('td', {'class': "row alt"}):
        print(item)

Thanks in advance.

Answer:

Find all the tr tags and use the get_text() method to get their text. Then split the text on \n and use filter() to remove the empty strings. This way you get all the data you need in one line.

from bs4 import BeautifulSoup
from urllib.request import urlopen
import re

# Download the page and parse it with Beautiful Soup
scraped = urlopen('http://www.example.org/inmates/').read()
soup = BeautifulSoup(scraped, 'html.parser')

# Each matching row is a <tr> whose id starts with "inmate"
for item in soup.find_all('tr', {'id': re.compile('^inmate')}):
    # Split the row text on newlines and drop the empty strings
    data = list(filter(None, item.get_text().split('\n')))
    print(data)

Output:

['3', 'View', 'LAST Name', 'FIRST Name', 'LAST Name', '08/26/2017', '41', 'M']

If you want to drop the first 2 elements, just slice the list:

data = list(filter(None, item.get_text().split('\n')))[2:]
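
The split, filter, and slice steps can be tried without fetching the page. The sketch below is only an illustration: `raw_text` is a hard-coded, hypothetical stand-in for what `item.get_text()` might return for one row (built so that it yields the output shown in the answer), and `row_fields` is a helper name made up for this example, not part of Beautiful Soup.

```python
# Hypothetical stand-in for item.get_text() on a single <tr>: cell texts
# separated by newlines, with empty lines where the markup had line breaks.
raw_text = "\n3\n\nView\nLAST Name\nFIRST Name\n\nLAST Name\n08/26/2017\n41\nM\n"


def row_fields(text, skip=0):
    """Split text on newlines, drop empty strings, then skip leading fields."""
    # filter(None, ...) removes falsy items, i.e. the empty strings
    fields = list(filter(None, text.split("\n")))
    # Slicing with [skip:] drops the first `skip` elements
    return fields[skip:]


all_fields = row_fields(raw_text)
trimmed = row_fields(raw_text, skip=2)

print(all_fields)
print(trimmed)
```

One thing to keep in mind: `filter(None, ...)` only removes strings that are completely empty. If the real page produces lines that contain only spaces, those would survive, and stripping each piece before filtering would be one way to handle that.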


I haven't checked it yet, but I'm sure it will work. Thank you very much.