Selenium scrape in Python

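Sorry for asking such a basic question, but I'm still learning. I'm trying to find a smart way to scrape some stock data from the following HTML using Selenium2 and Python (there are several of these on one page):

Thanks in advance.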

Load that string into a variable named html, then:

from bs4 import BeautifulSoup

# html holds the page markup as a string
soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all('td')
for tag in tags:
    print(tag.get_text())
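Since the page holds several tickers, printing every td on its own loses the grouping. A minimal sketch that keeps one list per table row instead, assuming each ticker sits in its own tr inside a table with class tableStyle2 (taken from the XPath in getData). The sample markup and the rows_from_html name are hypothetical, because the question's real HTML was not preserved.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the question's actual HTML is not available.
html = """
<table class="tableStyle2">
  <tr><td>Symbol</td><td>Bid</td><td>Ask</td><td>Last</td></tr>
  <tr><td>ABC</td><td>10.10</td><td>10.20</td><td>10.15</td></tr>
  <tr><td>XYZ</td><td>5.00</td><td>5.05</td><td> </td></tr>
</table>
"""

def rows_from_html(html, table_class="tableStyle2"):
    """Return one list of stripped cell texts per <tr>, skipping the header row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for table in soup.find_all("table", class_=table_class):
        for tr in table.find_all("tr"):
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if cells and cells[0] != "Symbol":  # assumed header marker
                rows.append(cells)
    return rows

for row in rows_from_html(html):
    print(row)
```

Blank cells come back as empty strings, so they can be filtered or kept as placeholders depending on what the storage code expects.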

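BeautifulSoup is one of many ways to parse the data. If you know basic Python, you can also use plain Python string functions to search for the text you need.

Please show us what you have tried so far by adding it to the question, and tell us what problem you are running into. That way people here will be glad to help you.

I would first look at the typical format of the stock data you are trying to scrape and work from there. Then, and only then, can you develop a good way to extract the information you need.

Do you really need Selenium? That is, do you need it to load JavaScript-generated content or something similar? Apart from that: load the page with Selenium, get its source through the page_source attribute, and load that HTML into BeautifulSoup. After that, you can pull out the relevant information with the find_all/find methods (findAll in older BeautifulSoup versions).

Would using BeautifulSoup be faster or more efficient than my current approach using getData?

Not sure about efficiency. You'd have to time it yourself.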
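The "load the page with Selenium, then parse page_source" workflow can also be done with plain Python, as the first comment suggests. A minimal sketch that swaps in the standard library's html.parser.HTMLParser instead of BeautifulSoup, assuming well-formed markup where every td and tr has an explicit closing tag. TableCellParser and rows_from_driver are hypothetical names. Parsing the page source once locally avoids the remote WebDriver call that each td.text makes, which is a plausible reason it could be faster, but as noted, time it on the real page.

```python
from html.parser import HTMLParser

class TableCellParser(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>, using only the stdlib."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the <td> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is still collected for the cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

def rows_from_driver(driver):
    """Parse the markup Selenium has already rendered, via driver.page_source."""
    parser = TableCellParser()
    parser.feed(driver.page_source)
    parser.close()
    return parser.rows
```

With a real browser this would be called after driver.get(url), e.g. rows = rows_from_driver(driver).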

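The asker's current Selenium-only approach:

from selenium.webdriver.common.by import By

def getData(driver, quoteType):
    # insertData(record, index) is the asker's own storage helper, defined elsewhere.
    tickerData = []
    tickerCounter = 0
    # Header labels and blank cells that should not be stored as data.
    ignoreText = ['Symbol', 'T', 'Bid', 'Ask', 'Last', ' ', '', 'Change', 'Volume', 'FSI', 'Buy   Sell  ']
    # Number of values that make up one ticker's record.
    if quoteType == "Summary":
        numDataPoints = 9
    elif quoteType == "Detail":
        numDataPoints = 21
    else:
        raise ValueError("unknown quoteType: %r" % quoteType)

    # Newer Selenium 4 releases removed find_elements_by_*; use find_elements(By..., ...).
    for table in driver.find_elements(By.XPATH, "//table[contains(@class, 'tableStyle2')]"):
        for td in table.find_elements(By.TAG_NAME, 'td'):
            text = td.text  # each .text is a round trip to the browser, so read it once
            if text not in ignoreText:
                # A full record has been collected: store it and start the next ticker.
                if len(tickerData) == numDataPoints:
                    insertData(tickerData, tickerCounter)
                    tickerData = []
                    tickerCounter += 1
                tickerData.append(text)
    # Store the last (possibly partial) record.
    if tickerData:
        insertData(tickerData, tickerCounter)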
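The grouping logic in getData does not actually depend on Selenium: it filters out header and blank cells, then cuts the remaining values into fixed-size records. A minimal sketch of that step as a pure function, which makes it easy to test without a browser. group_cells, IGNORE_TEXT and POINTS_PER_TICKER are hypothetical names; the ignore list and record sizes are copied from getData.

```python
# Values copied from getData; the names themselves are hypothetical.
IGNORE_TEXT = {'Symbol', 'T', 'Bid', 'Ask', 'Last', ' ', '', 'Change',
               'Volume', 'FSI', 'Buy   Sell  '}
POINTS_PER_TICKER = {"Summary": 9, "Detail": 21}

def group_cells(texts, quote_type, ignore=IGNORE_TEXT):
    """Split a flat list of cell texts into fixed-size per-ticker records."""
    size = POINTS_PER_TICKER[quote_type]  # KeyError for an unknown quote type
    values = [t for t in texts if t not in ignore]
    # Like getData, the last record may be shorter if the page is truncated.
    return [values[i:i + size] for i in range(0, len(values), size)]
```

Inside getData, the cell texts could be gathered first (e.g. [td.text for td in tds]) and then passed through group_cells, calling insertData(record, index) for each record via enumerate.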

from bs4 import BeautifulSoup
soup = BeautifulSoup(html)
tags = soup.findAll('td')
for tag in tags:
    print tag.getText()