How to crawl a site that uses JavaScript

javascript python web-crawler

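I am using Python 3.x and practicing web crawling with BeautifulSoup.

I want to learn how to crawl a website that uses JavaScript.

(for example)

So I used the URL, and then I got the PDF file.

However, in the first snippet:

href="javascript:__doPostBack('ct100$ContentPlaceHolder1$btnDown','')"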

href="javascript:fn_FileDownLoad('NewsLetter/Attach/2016/12/KIPF_161111.pdf',
'_KIPF_161111.pdf');">KIPF_161111.pdf</a>

Is that right?

I am confused.

I think this is the method you need:

get_attribute()

It is used like this:

from selenium import webdriver

# Note: webdriver.PhantomJS was removed in Selenium 4, so this needs Selenium 3.x
driver = webdriver.PhantomJS("your phantomjs path")
driver.get("your target url")

# First locate the block you need with a CSS selector,
# then get its inner HTML
html = driver.find_element_by_css_selector('...').get_attribute('innerHTML')

# Or locate the block by its id attribute
html = driver.find_element_by_id('...').get_attribute('innerHTML')

Selenium has a method called get_attribute() that you can use to get the HTML code of a dynamic page.
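Once the HTML is available as a string, the fn_FileDownLoad links from the question can probably be read without running any JavaScript. Here is a minimal sketch using only the standard library. The base URL http://example.com/ is a placeholder (the real site address is not given), extract_downloads is a hypothetical helper name, and it is an assumption that the first argument is a path relative to the site root.

```python
import re
from urllib.parse import urljoin

# Matches fn_FileDownLoad('<path>', '<file name>'), allowing whitespace
# or a line break between the two arguments.
FILE_DOWNLOAD_RE = re.compile(
    r"fn_FileDownLoad\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)"
)


def extract_downloads(html, base_url):
    """Return (absolute_url, file_name) pairs for every fn_FileDownLoad call."""
    return [
        (urljoin(base_url, path), name)
        for path, name in FILE_DOWNLOAD_RE.findall(html)
    ]


# The link fragment quoted in the question, unchanged.
sample = """href="javascript:fn_FileDownLoad('NewsLetter/Attach/2016/12/KIPF_161111.pdf',
'_KIPF_161111.pdf');">KIPF_161111.pdf</a>"""

# Assumption: http://example.com/ stands in for the real site root.
downloads = extract_downloads(sample, "http://example.com/")
print(downloads)
```

Each resulting URL could then be fetched with an ordinary HTTP client, provided the server really serves the file at that path.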

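from selenium import webdriver
driver = webdriver.PhantomJS(r"C:\phantomjs.exe")  # raw string keeps the backslash literal
driver.get("http://blablablablablabla.html")
# submitButton was never defined; assuming it is the link whose href calls __doPostBack for btnDown
submitButton = driver.find_element_by_css_selector('a[href*="btnDown"]')
submitButton.click()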
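As background on why the __doPostBack link has no file path in it: on ASP.NET Web Forms pages, __doPostBack(target, argument) normally writes its two arguments into the hidden form fields __EVENTTARGET and __EVENTARGUMENT and then submits the page's form, so the download is a form POST and not a plain URL. The sketch below uses only the standard library to parse such an href and build the form data a client would send. The helper names and the sample page are hypothetical, the href is the one from the question written with consistent single quotes, and actually sending the request is left out.

```python
import re
from html.parser import HTMLParser

# Matches __doPostBack('<event target>', '<event argument>').
POSTBACK_RE = re.compile(r"__doPostBack\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)")


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_postback_payload(page_html, href):
    """Build the form data that clicking a __doPostBack link would submit."""
    match = POSTBACK_RE.search(href)
    if match is None:
        raise ValueError("href does not contain a __doPostBack call")
    collector = HiddenInputCollector()
    collector.feed(page_html)
    payload = dict(collector.fields)  # keeps fields such as __VIEWSTATE
    payload["__EVENTTARGET"], payload["__EVENTARGUMENT"] = match.groups()
    return payload


# Hypothetical page fragment; a real page would carry a much longer __VIEWSTATE.
sample_page = """
<form method="post" action="./list.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="hidden" name="__EVENTARGUMENT" value="" />
</form>
"""
sample_href = "javascript:__doPostBack('ct100$ContentPlaceHolder1$btnDown','')"

payload = build_postback_payload(sample_page, sample_href)
print(payload)
```

The resulting dictionary could be sent as the body of a POST to the form's action URL, though whether the server accepts it depends on the site (cookies and validation fields may also be required). Driving a real browser with Selenium, as above, avoids having to reproduce those details.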
