Python: Parsing a table from remote HTML with BS4

python web-scraping


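I'm new to Python 3 and I want to parse an HTML page. I'm using BS4 and would like to parse, for example, the page whose URL appears in the code below.

I'm only interested in the

<div id="history"  style="display:none" >

table and its related <td> tags.

Here is my code. I don't know how to iterate over all the <td> elements in the table.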

import urllib.request
from html.parser import HTMLParser

url_to_parse = 'http://www.myfxbook.com/members/fxgrowthbot/forex-growth-bot/71611'

from bs4 import BeautifulSoup
print( 'Requesting URL ' + url_to_parse + '...')
response = urllib.request.urlopen( url_to_parse )
print('Done')

print( 'Reading URL ' + url_to_parse + '...')
html = response.read()
print('Done')

soup = BeautifulSoup( str(html) )

print( '*** History ***')
for h in soup.find_all("div", attrs={"id" : "history"}):
    print( 'Found History <div>!')

history = soup.select("#history")
# How to iterate over history table's td?

Any help would be greatly appreciated.

Regards

Here is how you can do it:

import urllib.request
from bs4 import BeautifulSoup

url_to_parse = 'http://www.myfxbook.com/members/fxgrowthbot/forex-growth-bot/71611'
response = urllib.request.urlopen(url_to_parse)
html = response.read()
soup = BeautifulSoup(html)
a = soup.find(id='history').find_all('td')

print(len(a))  # 300
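
If the cells are needed row by row rather than as one flat list, one option is to loop over the <tr> elements first and collect the <td> elements of each. The following is only a sketch under stated assumptions: the sample HTML, its cell values and the helper name rows_from_history are made up for illustration and are not taken from the real page, and the built-in html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the remote page (not the real markup).
SAMPLE_HTML = b"""
<div id="history" style="display:none">
  <table id="tradingHistoryTable">
    <tr><th>Symbol</th><th>Action</th></tr>
    <tr><td>EURUSD</td><td>Buy</td></tr>
    <tr><td>GBPUSD</td><td>Sell</td></tr>
  </table>
</div>
"""


def rows_from_history(html):
    """Return the text of every <td> inside #history, grouped by <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    history = soup.find(id="history")
    if history is None:
        return []  # the page has no element with id="history"
    rows = []
    for tr in history.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip rows that only contain <th> header cells
            rows.append(cells)
    return rows


rows = rows_from_history(SAMPLE_HTML)
for row in rows:
    print(row)
```

Passing the bytes returned by response.read() to rows_from_history should work the same way, since BeautifulSoup accepts bytes directly.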

Here is the complete code I am using; it only returns the <td> elements of the first row:

import urllib.request
from bs4 import BeautifulSoup

url_to_parse = 'http://www.myfxbook.com/members/autotrade/wallstreet-forex-robot-real/95290'

print( 'Requesting URL ' + url_to_parse + '...')
response = urllib.request.urlopen( url_to_parse )
print('Done')

print( 'Reading URL ' + url_to_parse + '...')
html = response.read()
print('Done')

soup = BeautifulSoup( str(html) )

history_td = soup.find(id='history').find_all('td')
for td in history_td:
    print(td)

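What about history = soup.select("#history").find_all("td")?

Thanks Benjamin. That didn't work. Instead, I had to use history = soup.select("#history") and then history_tds = history[0].find_all("td", recursive=True). However, whatever the value of the recursive parameter, it only gives access to the first <tr>, not to the others. Any ideas?

I just tried your new solution and, as I said in my previous comment, it only gives me access to the first <tr>. How can I reach the others?

Benjamin, you are right, I made a mistake. There are several <td> elements, but only those of the first row.

So, did you manage to get all the <td> elements in the table?

The XPath of the first data row, as obtained from Chrome, is //*[@id="tradingHistoryTable"]/tbody/tr[2], and the XPath of the second data row is //*[@id="tradingHistoryTable"]/tbody/tr[4]. Could you confirm that your code only gets the <td> elements of the first row? How is that possible? I'm using Python 3.3.3 and BS4 4.3.2 on 64-bit Windows. And you?

Is it just me, or did you copy my answer?

Benjamin: yes, I copied it. So I don't understand why I only get the <td> elements of the first row while you get 300+...

I have updated my answer with the complete script; let me know.

You don't need to use str(html), just use html. That is the problem.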
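
To see what str(html) actually does, here is a small sketch. In Python 3, response.read() returns bytes, and calling str() on bytes yields their printable representation (a b'...' literal with escaped newlines) rather than the decoded markup. The sample bytes below are made up, and UTF-8 is only an assumed encoding. Whether this fully explains why only the first row was found depends on the page and on the parser, so treat it as a plausible cause rather than a confirmed one.

```python
# Hypothetical stand-in for the bytes returned by response.read().
raw = b'<tr>\n<td class="a">1</td>\n</tr>'

# What str(html) produces in Python 3: the repr of the bytes object,
# including the leading b' and literal backslash-n sequences.
as_repr = str(raw)

# What the parser should see: the decoded text (UTF-8 is an assumption here).
as_text = raw.decode("utf-8")

print(as_repr)
print(as_text)
```

BeautifulSoup also accepts the raw bytes directly and works out the encoding itself, so BeautifulSoup(html) without str() is usually the simplest choice.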