JavaScript, Python requests, and the Forbes "Welcome" page redirect

Is it possible to get past the Forbes welcome page with requests? I'm trying to access this article:

http://www.forbes.com/sites/andygreenberg/2012/10/15/how-i-accidentally-helped-compromise-the-secret-keys-of-high-security-handcuffs/
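For most people, this lands on a splash-screen welcome page before redirecting to the actual article. I noticed that in Chrome, once the URL resolves to the actual article, a value is appended to it, although the value is random each time: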

http://www.forbes.com/sites/andygreenberg/2012/10/15/how-i-accidentally-helped-compromise-the-secret-keys-of-high-security-handcuffs/#216cc0922071
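The random value after `#` is a URL fragment. A fragment is never part of the HTTP request line, so the server never sees it; it is presumably set client-side (most likely by the page's JavaScript), and adding it to a `requests` call would not change the response. A quick check with the standard library's `urldefrag`:

```python
from urllib.parse import urldefrag

# URL as Chrome shows it once the welcome page has gone away
browser_url = ('http://www.forbes.com/sites/andygreenberg/2012/10/15/'
               'how-i-accidentally-helped-compromise-the-secret-keys-of-high-security-handcuffs/'
               '#216cc0922071')

# urldefrag splits off the fragment; only the part before '#'
# is ever sent to the server
base_url, fragment = urldefrag(browser_url)
print(base_url)
print(fragment)  # 216cc0922071
```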
I have a feeling this may involve cookies, but so far my code hasn't captured any HTML other than the HTML that makes up the welcome page:

import requests

url = 'http://www.forbes.com/sites/andygreenberg/2012/10/15/how-i-accidentally-helped-compromise-the-secret-keys-of-high-security-handcuffs/'
hdrs = {"User-Agent": 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0'}
session = requests.Session()  # keeps any cookies the server sets
# GET with a browser-like User-Agent, following HTTP redirects
response = session.get(url, headers=hdrs, allow_redirects=True)
print('headers', response.headers)
print('cookies', requests.utils.dict_from_cookiejar(session.cookies))
print('html', response.text)
Output:

headers {'Content-Type': 'text/html;charset=utf-8', 'Backend': 'templates', 'Date': 'Tue, 30 Aug 2016 22:37:15 GMT', 'Connection': 'keep-alive', 'Accept-Ranges': 'bytes', 'Content-Language': 'en-US', 'X-Cnection': 'close', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Length': '1983', 'Server': '', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'gzip'}
cookies {'forbesbeta': 'A'}
html <!DOCTYPE html><html class="no-js" lang=""><head><title>Forbes Welcome</title><meta charset="UTF-8"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=2"><meta name="description" content="Forbes Welcome page -- Forbes is a global media company, focusing on business, investing, technology, entrepreneurship, leadership, and lifestyle."><meta name="keywords" content="business news, market analysis, company profiles, personal finance, management, entrepreneurship, investments, financial advice, economy, technology news"><link rel="stylesheet" href="http://i.forbesimg.com/welcomead/styles/abd4e3d6.main.css"><script type="text/javascript">fbs_settings = {
                mobile: 'false',
                preview: 'false',
                test: 'false',
                classes: 'WyJwYWdlR29vZ2xlQWRTdWJjb250ZW50IiwiYWRoaSIsImFkX2tleXdvcmRzX2JvdF9yIiwiZ29vZ2xlLWFkLWFmYy1oZWFkZXIiLCJhcnRpY2xlX2JvdHRvbV9hZCIsImFkc1lOIiwidG9wQWRXcmFwcGVyIiwicmVnaW9uLW1pZGRsZS1hZCIsImFkc0RpdiIsInNfYWQyIiwiYWR3b3JkLWJveCIsImpzLWFkLWltdSIsImFkLXNwb25zb3JlZC1wb3N0IiwiY2VudGVyQWQiLCJiei1hZCIsImFkLTcyOHg5MCIsImdwdC1hZHMiLCJzcG9uc29yLXRleHQtY29udGFpbmVyIiwiYWRfcmVjdGFuZ3VsYXIiLCJob21lQWRCb3hJbkJpZ25ld3MiLCJwb3NfYWR2ZXJ0IiwiY29udGFpbnMtYWQiLCJ0b3AtYWRzZW5zZS1iYW5uZXIiLCJwYWdlSGVhZGVyQWQiLCJibG9jay1zcG9uc29yZWQtbGlua3MiLCJhZDI1MC1oMSIsImNoYW5nZV9BZENvbnRhaW5lciIsImFkX2dyaWQiLCJzcG9uc29yLXNlcnZpY2VzIiwidmlld19hZHNfYm90dG9tX2JnIl0='
            };</script><script type="text/javascript">try {
                fbs_settings.data = {"channel":"channel_0","section":"section_0","location":"welcomead_default","panel":"welcome_ad","contentPositions":[{"position":1,"title":"Quote of the Day","description":"\"Success is a terrible thing and a wonderful thing... Just do what you love.”","following":false,"byline":"Gene Wilder","hideDescription":false,"sponsored":false,"twitterHandle":"","hashtag":""}],"panelId":"panel4","limit":0,"swimlane":false,"more":false,"enableAds":false,"removeBVPrepend":false,"brandvoiceHeader":false,"profileLink":false,"fullListLink":false,"pagination":false,"filters":false,"year":0};
            } catch (err) {
                fbs_settings.data = null;
            }</script><script type="text/javascript">try {
                fbs_settings.angular_preload = ["//i.forbesimg.com/forbes/scripts/c632bd7f.vendor.js","//i.forbesimg.com/forbes/scripts/99f3b378.scripts.js","//i.forbesimg.com/forbes/styles/860430fd.main.css"];
            } catch (err) {
                fbs_settings.angular_preload = null;
            }</script><script src="http://i.forbesimg.com/welcomead/scripts/vendor/69216742.modernizr.js"></script></head><body><div id="app" class="container clearfix default-template ad-300-by-250"><div id="navigation"></div><div id="content"><div id="adblock-hover" class="hidden"><span class="close-btn preloaded"><span class="close">CLOSE</span> <i class="icon icon-close"></i></span> <img> <a href="//www.forbes.com/adblock/instructions/" target="_blank">More Options</a></div>  <script>(function() {
                        setTimeout(function() {
                            var inviEles = document.getElementsByClassName('invisible');
                            for (var ele in inviEles) {
                                if (!inviEles[0]) {
                                    return;
                                }
                                inviEles[0].className = inviEles[0].className.replace('invisible', '');
                            }
                            if (window.performance && performance.mark) {
                                performance.mark('content_visible');
                            }
                        });
                    })();</script><div class="content-container"><div class="content-inner"><h1 class="title">  <i class="invisible branding icon icon-forbes-logo"></i> <span class="top invisible">Quote of</span> <span class="bottom invisible">the Day</span></h1><div class="body">  <p class="body-content invisible">"Success is a terrible thing and a wonderful thing... Just do what you love.”</p>  <p class="body-byline invisible">Gene Wilder</p>  </div></div></div><div class="circle-wrapper"><div class="circle invisible"></div><img class="circle fallback hidden" src="http://i.forbesimg.com/welcomead/images/circle.png"></div>  </div><div id="ads"></div></div><!--[if lte IE 9]>
        <script src="http://i.forbesimg.com/welcomead/scripts/b9b8347c.legacy.js"></script>
        <![endif]--><script src="http://i.forbesimg.com/welcomead/scripts/1a364ca6.vendor.js"></script><script src="http://i.forbesimg.com/welcomead/scripts/8951c3c8.main.js"></script></body></html>
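Since `requests` only downloads this HTML and never executes the scripts it references (such as `8951c3c8.main.js`), what comes back is the splash page itself. Before parsing, it can help to detect that case. Below is a minimal sketch using only the standard library's `html.parser`; the `PageSummary` class and the `summarize` and `is_welcome_page` helpers are hypothetical names, and keying on the `Forbes Welcome` title is an assumption based on the HTML above.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collect the <title> text and the src of every external <script>."""

    def __init__(self):
        super().__init__()
        self.title = ''
        self.script_srcs = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self._in_title = True
        elif tag == 'script':
            src = dict(attrs).get('src')
            if src:
                self.script_srcs.append(src)

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html):
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return parser


def is_welcome_page(html):
    # Assumption: the interstitial is recognizable by its <title>
    return summarize(html).title.strip() == 'Forbes Welcome'


sample = ('<html><head><title>Forbes Welcome</title></head><body>'
          '<script src="http://i.forbesimg.com/welcomead/scripts/8951c3c8.main.js">'
          '</script></body></html>')
print(is_welcome_page(sample))        # True
print(summarize(sample).script_srcs)  # ['http://i.forbesimg.com/welcomead/scripts/8951c3c8.main.js']
```

The `script_srcs` list shows where the redirect logic lives: in JavaScript that a browser runs and `requests` does not.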

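I figured that since a browser can eventually resolve the article, requests should be able to as well, but since I don't know what Forbes is doing, I don't know how to set up the request parameters properly. Any ideas?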

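I never bothered at the time, but I later used Selenium on another project and a user asked for an answer, so here is the basic approach to getting past the Forbes front page with Selenium.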

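You need to install a driver for Selenium: the Firefox driver, the Chrome driver, or the headless PhantomJS driver. If you're on a Mac, chromedriver is easy to install with Homebrew; alternatively, copy the single PhantomJS driver file to the path given in the code comment.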


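Requests doesn't run JavaScript, so that won't happen. You'll need something like Selenium, and even then you'll have to wait and click buttons. Also, I'm pretty sure you're violating their ToS by scraping.

Thanks, I'll look into Selenium. Waiting isn't a big deal, since I've already built delays into the script to avoid getting blocked, as long as the "button clicking" can be automated without me sitting there clicking buttons!

Yes, it can all be automated easily, but you definitely need something that can run JS.

Hi James, were you able to solve this? I'm trying to get a Forbes article and I'm facing the same issue.

@Cesar I've added a solution. If you need more advice, let me know what specifically you're trying to do and I'll see if I can help.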
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://www.forbes.com/sites/andygreenberg/2012/10/15/how-i-accidentally-helped-compromise-the-secret-keys-of-high-security-handcuffs/'
# or webdriver.PhantomJS('/usr/bin/phantomjs') on Selenium 3 (PhantomJS support was removed in Selenium 4)
browser = webdriver.Chrome()

browser.get(url)
browser.implicitly_wait(5)  # find_element calls now retry for up to 5 seconds
# a very explicit XPath to the "continue" button on the welcome page
browser.find_element(By.XPATH, '/html/body/div/div[1]/div/div[1]').click()

# now grab whatever you want from the resulting page (replace the placeholder selectors):
html_from_css = browser.find_element(By.CSS_SELECTOR, 'css selector info').get_attribute('innerHTML')
html_from_xpath = browser.find_element(By.XPATH, 'xpath info').get_attribute('innerHTML')
# innerHTML returns whatever markup the selected element encloses; other attributes
# can be read the same way, e.g. get_attribute('href') on an <a> tag.

browser.quit()
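Once the click has gone through, `browser.page_source` holds the HTML the browser rendered after running the page's JavaScript, so it can be handed to any parser (BeautifulSoup, given the question's tags, or the standard library as below). A minimal sketch that collects the text of every `<p>` element; `ParagraphCollector` and `extract_paragraphs` are hypothetical names, and the assumption that the article body lives in `<p>` tags is not confirmed by the original.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Gather the text content of every <p> element, tags inside it stripped."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == 'p' and self._in_p:
            self._in_p = False
            text = ''.join(self._buffer).strip()
            if text:  # skip empty or whitespace-only paragraphs
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def extract_paragraphs(html):
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    return collector.paragraphs


# With Selenium this would be: extract_paragraphs(browser.page_source)
sample = '<div><p>First <b>bold</b> line.</p><p> </p><p>Second line.</p></div>'
print(extract_paragraphs(sample))  # ['First bold line.', 'Second line.']
```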