Python: Parse a table from remote HTML using BS4
I'm new to Python 3 and want to parse an HTML page with BS4, for example this page:

I'm only interested in the `<div id="history" style="display:none">`
table and its related `<td>` tags. Here is what I have so far. I don't know how to iterate over all the `<td>` elements in the table:

```python
import urllib.request
from bs4 import BeautifulSoup

url_to_parse = 'http://www.myfxbook.com/members/fxgrowthbot/forex-growth-bot/71611'

print('Requesting URL ' + url_to_parse + '...')
response = urllib.request.urlopen(url_to_parse)
print('Done')

print('Reading URL ' + url_to_parse + '...')
html = response.read()  # bytes in Python 3
print('Done')

# Pass the raw bytes (not str(html)) and name the parser explicitly
soup = BeautifulSoup(html, 'html.parser')

print('*** History ***')
for h in soup.find_all('div', attrs={'id': 'history'}):
    print('Found History <div>!')

history = soup.select('#history')
# How to iterate over history table's td?
```
Any help would be appreciated.

Regards

Here's how you can do it:
```python
import urllib.request
from bs4 import BeautifulSoup

url_to_parse = 'http://www.myfxbook.com/members/fxgrowthbot/forex-growth-bot/71611'
response = urllib.request.urlopen(url_to_parse)
html = response.read()
soup = BeautifulSoup(html, 'html.parser')

# All <td> elements anywhere inside the div with id="history"
a = soup.find(id='history').find_all('td')
print(len(a))  # 300
```
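`find_all('td')` returns one flat list of cells. If you want the table row by row, a minimal sketch is to walk the `<tr>` elements first and collect the `<td>` text inside each one. It assumes the cells sit in ordinary `<tr>` rows under `#history`; `SAMPLE_HTML` is a made-up stand-in (not the real myfxbook markup) so the example runs offline, and `history_rows` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
<div id="history" style="display:none">
  <table>
    <tr><th>Date</th><th>Symbol</th><th>Profit</th></tr>
    <tr><td>2000-01-01</td><td>EURUSD</td><td>1.00</td></tr>
    <tr><td>2000-01-02</td><td>GBPUSD</td><td>-1.00</td></tr>
  </table>
</div>
</body></html>
"""


def history_rows(html):
    """Return the text of every <td> under #history, grouped by <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    history = soup.find(id="history")
    if history is None:  # no such div on the page
        return []
    rows = []
    for tr in history.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip header rows that only contain <th>
            rows.append(cells)
    return rows


for row in history_rows(SAMPLE_HTML):
    print(row)
```

With the live page, you would pass the downloaded `html` bytes instead of `SAMPLE_HTML`; `BeautifulSoup` accepts bytes as well as `str`.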
Here is the full code I'm using. It only returns the `<td>` elements of the first row:
```python
import urllib.request
from bs4 import BeautifulSoup

url_to_parse = 'http://www.myfxbook.com/members/autotrade/wallstreet-forex-robot-real/95290'

print('Requesting URL ' + url_to_parse + '...')
response = urllib.request.urlopen(url_to_parse)
print('Done')

print('Reading URL ' + url_to_parse + '...')
html = response.read()
print('Done')

soup = BeautifulSoup(html, 'html.parser')
history_td = soup.find(id='history').find_all('td')
for td in history_td:
    print(td)
```
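One pitfall worth ruling out in this kind of code: in Python 3, `response.read()` returns `bytes`, and calling `str()` on bytes does not decode them. It produces the repr, e.g. `"b'...'"`, with newlines turned into the two characters `\n` and non-ASCII bytes into `\xNN` escapes, which BS4 then parses as literal text. The sketch below shows the difference on a tiny made-up fragment; `parse_html_bytes` is a hypothetical helper, and its UTF-8 default is an assumption about the page's encoding. This is not a claim about what causes the first-row-only behaviour.

```python
from bs4 import BeautifulSoup


def parse_html_bytes(raw, encoding="utf-8"):
    """Decode raw bytes explicitly before parsing, instead of calling str() on them.

    The utf-8 default is an assumption; use the page's real charset when known.
    """
    return BeautifulSoup(raw.decode(encoding), "html.parser")


# Made-up fragment: "café" encoded as UTF-8, with real newlines
raw = b'<div id="history"><table>\n<tr><td>caf\xc3\xa9</td></tr>\n</table></div>'

broken = BeautifulSoup(str(raw), "html.parser")  # parses the repr "b'...'"
fixed = parse_html_bytes(raw)

print(repr(broken.find("td").get_text()))  # 'caf\\xc3\\xa9'
print(repr(fixed.find("td").get_text()))   # 'café'
```

For a live response, the standard library's `response.headers.get_content_charset()` returns the charset the server declared, or `None` if it sent none, so a fallback such as `or 'utf-8'` is still needed. Alternatively, passing the bytes straight to `BeautifulSoup` lets it guess the encoding itself.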
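What about `history = soup.select('#history').find_all('td')`?

Thanks Benjamin, that didn't work. Instead I had to use `history = soup.select('#history')` and then `history_tds = history[0].find_all('td', recursive=True)`. However, whatever value I give the `recursive` argument, it only gives me access to the first row.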
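Calling `find_all` directly on the result of `soup.select('#history')` fails because `select()` always returns a list of matches, not a single tag. You must index it (as in `history[0]`) or use `select_one()`. Also, `recursive=True` is already the default for `find_all`, while `recursive=False` only searches direct children. The sketch below demonstrates these points on a small made-up fragment with the same `div#history > table > tr > td` nesting. It also shows the single-call CSS alternative `soup.select('#history td')`.

```python
from bs4 import BeautifulSoup

# Small made-up fragment mirroring the nesting div#history > table > tr > td
SAMPLE_HTML = """
<div id="history">
  <table>
    <tr><td>a1</td><td>a2</td></tr>
    <tr><td>b1</td><td>b2</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() returns a list of matches, so the list itself has no find_all()
matches = soup.select("#history")
try:
    matches.find_all("td")
except AttributeError:
    print("select() returned a list; index it or use select_one()")

# select_one() returns the first matching Tag (or None); search inside it
history = soup.select_one("#history")
all_cells = [td.get_text() for td in history.find_all("td")]

# The same cells with a single CSS descendant selector
css_cells = [td.get_text() for td in soup.select("#history td")]

# recursive=False only checks direct children of the div; the cells sit deeper
direct_cells = history.find_all("td", recursive=False)

print(all_cells)          # ['a1', 'a2', 'b1', 'b2']
print(css_cells)          # ['a1', 'a2', 'b1', 'b2']
print(len(direct_cells))  # 0
```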