Python: how to scrape a country-restricted website

Tags: python, proxy, web-scraping, scrapy

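I am trying to scrape a website with Scrapy, but I get redirected to a 404 error page because I am not in that country. If I use a proxy, I get the same result. My code:

Result: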

2017-03-23 12:49:29 [scrapy] DEBUG: Redirecting (302) to <GET https://wc-prod-joomla.s3.amazonaws.com/404/404.html> from <GET https://www.officeworks.com.au/shop/SearchDisplay?searchTerm=acer&storeId=10151&langId=-1&pageSize=24&beginIndex=0&sType=SimpleSearch&resultCatEntryType=2&showResultsPage=true&searchSource=Q&pageView=>
2017-03-23 12:49:34 [scrapy] DEBUG: Crawled (200) <GET https://wc-prod-joomla.s3.amazonaws.com/404/404.html> (referer: None)
<200 https://wc-prod-joomla.s3.amazonaws.com/404/404.html>
2017-03-23 12:49:34 [scrapy] INFO: Closing spider (finished)

What else can I try to make this work?

The product data is available at the following URL:

https://www.officeworks.com.au/webapp/wcs/stores/servlet/OWPriceView?storeId=10151&catalogId=10551&nc=true&productId=90235%2C90237%2C90239%2C504502%2C532522%2C559534%2C450004%2C495002%2C315544%2C582002%2C90229%2C112392%2C450006%2C536530%2C536532%2C536534%2C536536%2C597502%2C605514%2C396502%2C423002%2C536518%2C559532%2C610502
The page uses JavaScript to fetch its data from the URL above.
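The URL above carries the product IDs as a single comma-separated `productId` value, with each comma percent-encoded as `%2C`. As a minimal sketch, a URL of the same shape could be built for an arbitrary list of IDs like this. The helper name `build_price_url` is hypothetical, and the default `storeId` and `catalogId` values are simply copied from the URL above; whether the endpoint accepts other combinations is an assumption.

```python
from urllib.parse import urlencode

# Endpoint copied from the URL above.
PRICE_ENDPOINT = "https://www.officeworks.com.au/webapp/wcs/stores/servlet/OWPriceView"


def build_price_url(product_ids, store_id="10151", catalog_id="10551"):
    """Return a price URL for the given product IDs (hypothetical helper)."""
    params = {
        "storeId": store_id,
        "catalogId": catalog_id,
        "nc": "true",
        # All IDs go into one comma-separated value; urlencode turns "," into "%2C".
        "productId": ",".join(str(pid) for pid in product_ids),
    }
    return PRICE_ENDPOINT + "?" + urlencode(params)


url = build_price_url([90235, 90237, 90239])
print(url)
```

The resulting string can be passed to `fetch(url)` in the Scrapy shell in the same way as the hand-written URL.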

In [1]: url = '''https://www.officeworks.com.au/webapp/wcs/stores/servlet/OWPriceView?storeId=10151&catalogId=10551&nc=true&productId=90235%2C90237%2C90239%2C504502%2C5
   ...: 32522%2C559534%2C450004%2C495002%2C315544%2C582002%2C90229%2C112392%2C450006%2C536530%2C536532%2C536534%2C536536%2C597502%2C605514%2C396502%2C423002%2C536518%2C
   ...: 559532%2C610502'''

In [2]: fetch(url)
2017-03-23 20:01:00 [scrapy.core.engine] INFO: Spider opened
2017-03-23 20:01:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.officeworks.com.au/webapp/wcs/stores/servlet/OWPriceView?storeId=10151&catalogId=10551&nc=true&productId=90235%2C90237%2C90239%2C504502%2C532522%2C559534%2C450004%2C495002%2C315544%2C582002%2C90229%2C112392%2C450006%2C536530%2C536532%2C536534%2C536536%2C597502%2C605514%2C396502%2C423002%2C536518%2C559532%2C610502> (referer: None)

In [3]: import json

In [4]: json.loads(response.text)
Out[4]: 
[{'bulkbuy': True,
  'hasContractPrice': False,
  'partNumber': 'ACC120',
  'price': '$357.00',
  'priceRange': [{'maximumQuantity': '2.0',
    'minimumQuantity': '1',
    'value': {'currency': 'AUD', 'value': 357.0}},
   {'maximumQuantity': '',
    'minimumQuantity': '3',
    'value': {'currency': 'AUD', 'value': 314.0}}],
  'priceRangeExclTax': [],
  'productId': '90229'},
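Once the response is decoded, each record can be reduced to the fields that matter. The following is a rough sketch based only on the one record visible in the output above; the function name `summarize_prices` is hypothetical, and it assumes every record has the same keys and that `price` is always a string with a leading `$`.

```python
import json

# One record copied from the output above, re-serialised as JSON text so the
# example runs without a network request.
sample_text = json.dumps([
    {
        "bulkbuy": True,
        "hasContractPrice": False,
        "partNumber": "ACC120",
        "price": "$357.00",
        "priceRange": [
            {"maximumQuantity": "2.0", "minimumQuantity": "1",
             "value": {"currency": "AUD", "value": 357.0}},
            {"maximumQuantity": "", "minimumQuantity": "3",
             "value": {"currency": "AUD", "value": 314.0}},
        ],
        "priceRangeExclTax": [],
        "productId": "90229",
    }
])


def summarize_prices(text):
    """Parse the JSON text and keep part number, product ID, price and tiers."""
    rows = []
    for item in json.loads(text):
        # Each tier becomes a (minimum quantity, unit price) pair.
        tiers = [
            (tier["minimumQuantity"], tier["value"]["value"])
            for tier in item.get("priceRange", [])
        ]
        rows.append({
            "partNumber": item["partNumber"],
            "productId": item["productId"],
            # Assumes a "$" prefix and optional thousands separators.
            "price": float(item["price"].lstrip("$").replace(",", "")),
            "tiers": tiers,
        })
    return rows


rows = summarize_prices(sample_text)
print(rows[0]["partNumber"], rows[0]["price"], rows[0]["tiers"])
```

In the Scrapy shell, `summarize_prices(response.text)` would take the place of the sample text.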

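Where are you? I can load the original URL from Switzerland. Can you load the URL in a browser?

@MartinBonner I am in Ukraine. It loads fine in a browser with a VPN.

If you use Chrome, open "Menu > More tools > Developer tools" and select the "Network" tab to see all the requests being made. I am sure other browsers have similar functionality. Obviously, you need to make sure the VPN is also used for the curl/Scrapy requests. I would start by fetching www.iplocation.net and seeing what you get.

@MartinBonner Solved. I just found another proxy IP.

I don't think the problem is in the JavaScript. The problem is with connecting to the website: when I scrape it from outside the country, I get redirected.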
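The thread comes down to two checks: route the Scrapy request through a proxy located in the target country, and verify that the response did not end up on the 404 page seen in the log. A minimal sketch of both follows. Scrapy's built-in HttpProxyMiddleware reads the proxy address from `request.meta["proxy"]`; the helper names and the proxy address are placeholders, not values from the original post.

```python
from urllib.parse import urlsplit

# The 404 page that the log in the question shows as the redirect target.
BLOCK_PAGE = "https://wc-prod-joomla.s3.amazonaws.com/404/404.html"


def proxy_meta(proxy_url):
    """Build the meta dict that sends a scrapy.Request through a proxy."""
    return {"proxy": proxy_url}


def landed_on_block_page(final_url):
    """Return True if the final URL (e.g. response.url) is the 404 page."""
    got, want = urlsplit(final_url), urlsplit(BLOCK_PAGE)
    return (got.netloc.lower(), got.path) == (want.netloc, want.path)


# Placeholder address from a documentation-only IP range; substitute a
# working proxy located in the target country.
meta = proxy_meta("http://203.0.113.10:8080")
print(meta)
print(landed_on_block_page(BLOCK_PAGE))
```

Inside a spider, the dict would be passed as `scrapy.Request(url, meta=proxy_meta(...))`, and the callback could call `landed_on_block_page(response.url)` to detect that the proxy is still being treated as out-of-country, which is when trying another proxy IP makes sense.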