Python 3.x: Beginner web scraping with Python. Does this site have anti-scraping protection? - Python 3.x - Fatal编程技术网


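I'm trying to set up an automated daily web scrape, but the results I get are empty lists. I think the site might have some kind of protection against being scraped.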

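I followed a few tutorials and tried scraping the site with both BeautifulSoup4 and XPath, but both approaches left me with empty lists. At one point I did get a 403 Forbidden error, but I found a workaround using “hdr={'User-Agent':'Mozilla/5.0'}” (whatever that means). I'm new to web scraping, so I'm not sure what's going on.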

The BeautifulSoup4 version returns results, but not the actual data I'm looking for:

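from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = "https://www.cmegroup.com/trading/agricultural/dairy/cash-settled-butter_quotes_globex.html"
# A browser-like User-Agent header got past the 403 Forbidden response
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(url, headers=hdr)
page = urlopen(req)
# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(page, "html.parser")
print(soup.prettify())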
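The hdr workaround only sets the HTTP User-Agent request header, which tells the server what kind of client is making the request. By default urllib identifies itself as Python-urllib/3.x, and some servers refuse such clients, which is a plausible (not confirmed) explanation for the 403 here. The sketch below builds two requests without sending them, just to show where the header ends up; it uses only the standard library.

```python
from urllib.request import Request

URL = "https://www.cmegroup.com/trading/agricultural/dairy/cash-settled-butter_quotes_globex.html"

# No custom headers: the default opener used by urlopen() adds its
# "Python-urllib/3.x" User-Agent later, when the request is actually sent.
plain = Request(URL)

# headers= attaches a browser-like User-Agent to the request object itself.
browser_like = Request(URL, headers={"User-Agent": "Mozilla/5.0"})

# urllib stores header names via str.capitalize(), so the key is "User-agent".
print(plain.get_header("User-agent"))         # None
print(browser_like.get_header("User-agent"))  # Mozilla/5.0
print(browser_like.header_items())            # [('User-agent', 'Mozilla/5.0')]
```

Nothing here bypasses real anti-bot measures; it only changes how the client describes itself.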

The XPath version seems to connect, but doesn't return any data:

from lxml import html
import requests

url = "https://www.cmegroup.com/trading/agricultural/dairy/cash-settled-butter_quotes_globex.html"
response = requests.get(url)
tree = html.fromstring(response.content)
# Select the <span> in the header cell of the first row of the quotes table
data = tree.xpath('//*[@id="quotesFuturesProductTable1"]/tbody/tr[1]/th/span')
print(data)  # prints an empty list: []
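I want to extract the name, the month, and the prior settlement, and then eventually figure out how to have it pull the data automatically every day.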


What am I doing wrong?

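The data you see on the web page is loaded dynamically via JavaScript. BeautifulSoup can't help you here, because it doesn't execute JavaScript. The same is true of lxml, so the XPath version has no rows to match either.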
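One way to confirm this yourself is to count the rows of the quotes table in the HTML the server actually sends, before any JavaScript runs. The sketch below does that with Python's standard html.parser module. The table id quotesFuturesProductTable1 is taken from the question's XPath; the two HTML snippets are made up for illustration and are not copied from the real page.

```python
from html.parser import HTMLParser

TABLE_ID = "quotesFuturesProductTable1"  # id taken from the question's XPath


class TableRowCounter(HTMLParser):
    """Count <tr> tags that appear inside the <table> with the given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0  # > 0 while inside the target table (tracks nested <table> tags)
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth:
                self.depth += 1  # a table nested inside the target table
            elif dict(attrs).get("id") == self.table_id:
                self.depth = 1  # entering the target table
        elif tag == "tr" and self.depth:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1


def count_rows(html_text, table_id=TABLE_ID):
    """Return the number of <tr> rows inside the table with id table_id (0 if absent)."""
    parser = TableRowCounter(table_id)
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Hypothetical snippets: an empty table as a server might send it, and the same
# table after a script has filled in two rows.
served = f'<table id="{TABLE_ID}"><tbody></tbody></table>'
rendered = (
    f'<table id="{TABLE_ID}"><tbody>'
    "<tr><td>AUG 2019</td></tr><tr><td>SEP 2019</td></tr>"
    "</tbody></table>"
)

print(count_rows(served))    # 0
print(count_rows(rendered))  # 2
```

Against the live page you would call count_rows(requests.get(url).text). A result of 0 while the browser shows several rows suggests the rows are added by JavaScript; note that 0 is also returned if the table is missing from the HTML entirely.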

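You could, for example, use selenium. Or you can parse the data manually with the re and json modules. This code loads the data in JSON format and prints it to the screen: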

import re
import json
import requests

url = 'https://www.cmegroup.com/trading/agricultural/dairy/cash-settled-butter_quotes_globex.html'

# The page's HTML embeds the relative URL of the JSON quotes feed as: component.url = "..."
data_url = 'https://www.cmegroup.com' + re.findall(r'component\.url = "(.*?)"', requests.get(url).text)[0]

# Fetch the quotes feed and decode it as JSON
json_data = requests.get(data_url).json()

print(json.dumps(json_data, indent=4))
Prints:

{
    "quoteDelayed": true,
    "quoteDelay": "10 minutes",
    "tradeDate": "14 Aug 2019",
    "quotes": [
        {
            "last": "235.850",
            "change": "+0.800",
            "priorSettle": "235.050",
            "open": "235.050",
            "close": "-",
            "high": "235.850",
            "low": "235.050",
            "highLimit": "241.725",
            "lowLimit": "231.725",
            "volume": "2",
            "mdKey": "CBQ9-XCME-G",
            "quoteCode": "CBQ9",
            "escapedQuoteCode": "CBQ9",
            "code": "CBQ9",
            "updated": "11:27:33 CT<br /> 14 Aug 2019",
            "percentageChange": "+0.34%",
            "expirationMonth": "AUG 2019",
            "expirationCode": "Q9",
            "expirationDate": "20190801",
            "productName": "Cash-settled Butter Futures",
            "productCode": "CB",
            "uri": "/trading/agricultural/dairy/cash-settled-butter.html",
            "productId": 26,
            "exchangeCode": "XCME",
            "optionUri": "/trading/agricultural/dairy/cash-settled-butter_quotes_options.html",
            "hasOption": true,
            "lastTradeDate": {
                "timestamp": 1567573200000,
                "dateOnlyLongFormat": "04 Sep 2019",
                "default24": "09/04/2019, 00:00:00 CDT",
                "default12": "09/04/2019, 12:00:00 AM CDT",
                "verbose": "September 04, 2019 12:00:00 AM CDT"
            },
            "priceChart": {
                "enabled": true,
                "code": "CB",
                "monthYear": "Q9",
                "venue": 1,
                "title": "AUG_2019_Cash-settled_Butter_",
                "year": 2019
            },
            "netChangeStatus": "statusOK",
            "highLowLimits": "241.725 / 231.725"
        },

...and so on.
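Once you have json_data, getting the name, month and prior settlement is a matter of reading three keys from each entry in json_data["quotes"]. The sketch below assumes the key names shown in the output above (productName, expirationMonth, priorSettle); the helper functions extract_quotes and append_daily_rows, the CSV column names and the file name are hypothetical, chosen for illustration. For the daily part, one common approach is to save the fetch-and-append code as a script and let the operating system run it once a day (cron on Linux/macOS, Task Scheduler on Windows) rather than keeping a Python process running.

```python
import csv
import datetime
import os


def extract_quotes(json_data):
    """Return (product name, expiration month, prior settle) for each entry in json_data["quotes"]."""
    return [
        (q.get("productName"), q.get("expirationMonth"), q.get("priorSettle"))
        for q in json_data.get("quotes", [])
    ]


def append_daily_rows(path, rows, run_date=None):
    """Append rows to a CSV file, prefixing each with run_date (today by default).

    A header row is written only when the file does not exist yet, so the same
    file can be reused by every daily run.
    """
    run_date = run_date or datetime.date.today().isoformat()
    write_header = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["run_date", "product_name", "month", "prior_settle"])
        for row in rows:
            writer.writerow([run_date, *row])


if __name__ == "__main__":
    # Trimmed copy of the first quote in the JSON printed above (only the fields used here).
    sample = {
        "quotes": [
            {
                "productName": "Cash-settled Butter Futures",
                "expirationMonth": "AUG 2019",
                "priorSettle": "235.050",
            }
        ]
    }
    print(extract_quotes(sample))  # [('Cash-settled Butter Futures', 'AUG 2019', '235.050')]
```

With the answer's code, the live call would be rows = extract_quotes(requests.get(data_url).json()) followed by append_daily_rows("butter_quotes.csv", rows). Keep in mind that the quotes are delayed (quoteDelay is "10 minutes" in the output above), and that the component.url pattern may stop matching if the site changes its page markup.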