Python: the same urllib2 / BeautifulSoup4 code works on my home computer, but not on an AWS EC2 server

Tags: python, beautifulsoup, urllib2

So, I have a very simple Python BeautifulSoup program that prints the first 1000 characters of a web page:

from bs4 import BeautifulSoup
import urllib2

def soup_maker(url):
    # Treat 302 redirects as terminal responses so we can inspect them
    # instead of following them silently.
    class RedirectHandler(urllib2.HTTPRedirectHandler):
        def http_error_302(self, req, fp, code, msg, headers):
            result = urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)
            result.status = code
            return result

    # Browser-like headers so the request does not look like a bare urllib2 client.
    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
           'Accept-Encoding': 'none',
           'Accept-Language': 'en-US,en;q=0.8',
           'Connection': 'keep-alive'}

    req = urllib2.Request(url, headers=hdr)
    opener = urllib2.build_opener(RedirectHandler())
    webpage = opener.open(req)
    soup = BeautifulSoup(webpage, "html5lib")
    return soup

if __name__ == "__main__":
    url = 'https://offerupnow.com/'
    print str(soup_maker(url))[0:1000]
On my home computer, this outputs:

<!DOCTYPE html>
<html lang="en-US"><head>
    <title>OfferUp - Buy. Sell. Simple.</title>


    <meta charset="utf-8"/>
    <meta content="OfferUp, Offer Up, social shopping, online deals, classifieds, Buy local stuff, Local stuff for sale, Shop local, Local shopping, Local marketplace, Local yard sales, Local garage sales, Gently used baby stuff, Sell locally, Buy locally, Sell stuff online" name="keywords"/>
    <meta content="OfferUp is revolutionizing how we sell by making it a snap! Instantly connect with buyers and sellers near you." name="description"/>
    <meta content="hVckgnfxPSIIYHASW6k-BapqZdaFc19eRe0nI8CneNM" name="google-site-verification"/>
    <meta content="1d7e114ee3af2b13ced8508628f804b9" name="p:domain_verify"/>

    <meta content="NOODP" name="robots"/>
    <meta content="index,follow" name="robots"/>




    <meta content="summary" name="twitter:card"/>
    <meta content="@offerup" name="twitter:site"/>
    <meta content="OfferUp" name="twi
On the AWS EC2 server, however, the same code outputs what looks like a bot-detection challenge page instead:

<html><head>
<meta content="noindex,nofollow" name="robots"/>
<script>
(function() {  function getSessionCookies() {   cookieArray = new Array();   var cName = /^\s?incap_ses_/;   var c = document.cookie.split(";");   for (var i = 0; i < c.length; i++) {    key = c[i].substr(0, c[i].indexOf("="));    value = c[i].substr(c[i].indexOf("=") + 1, c[i].length);    if (cName.test(key)) {     cookieArray[cookieArray.length] = value    }   }   return cookieArray  }  function setIncapCookie(vArray) {   try {    cookies = getSessionCookies();    digests = new Array(cookies.length);    for (var i = 0; i < cookies.length; i++) {     digests[i] = simpleDigest((vArray) + cookies[i])    }    res = vArray + ",digest=" + (digests.join())   } catch (e) {    res = vArray + ",digest=" + (encodeURIComponent(e.toString()))   }   createCookie("___utmvc", res, 20)  }  function simpleDigest(mystr) {   var res = 0;   for (var i = 0; i < mystr.length; i++) {    res += mystr.charCodeAt(i)   }   return res  }  fun
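The block page above is an Incapsula-style JavaScript challenge: the inline script collects the `incap_ses_` session cookies, computes a trivial digest over them, and writes the result into a `___utmvc` cookie that a plain urllib2 client never sets or sends back. The digest itself (`simpleDigest` in the script) is just the sum of the character codes, which can be reproduced in Python as a sketch (function name mine):

```python
def simple_digest(s):
    # Mirrors the page's simpleDigest(): sum of the character codes
    # of the input string.
    return sum(ord(ch) for ch in s)

# "abc" -> 97 + 98 + 99 = 294
print(simple_digest("abc"))
```

Reproducing the digest alone is not enough to pass the check, though: the challenge expects the full cookie round trip that a real JavaScript-executing browser performs.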


snakecharmerb: The site you are scraping is probably blocking requests from AWS IP addresses, because such requests are very likely to come from scraping bots.

OP: @snakecharmerb I think you may be right. Is there a way around this?
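One pragmatic first step, whatever workaround is chosen, is to detect the challenge page so the scraper fails loudly instead of silently parsing the block page as real content. A minimal sketch, with marker strings taken from the block page shown above (the function name and marker list are my own choices):

```python
# Strings that appear in the Incapsula-style block page but not in a
# normal HTML response from the site.
BLOCK_MARKERS = ("incap_ses_", "___utmvc")

def looks_blocked(html):
    # Heuristic: the challenge page's inline JavaScript references
    # incap_ses_ cookies and sets a ___utmvc cookie; real pages do not.
    return any(marker in html for marker in BLOCK_MARKERS)
```

Beyond detection, the usual options are routing requests through a non-datacenter IP (EC2 egress addresses sit in well-known, easily flagged ranges) or driving a real browser (e.g., via Selenium) so the JavaScript challenge actually executes; which of these is acceptable depends on the site's terms of service.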