Python: list comes back empty even though the screen scraper's path is correct

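I am trying to get all the URLs from the free-games page on the Ubisoft website, but the result always comes back empty. I don't know what I am doing wrong. The screenshot I attached shows the path I am using.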

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0",
}

def get_links():
    result = requests.get("https://free.ubisoft.com/", headers=headers)
    soup = BeautifulSoup(result.content, 'lxml')
    print(result.content)
    links = []
    # This lookup is the part that fails to find anything
    urls = soup.find('div', {'class': 'free-events'}).find_all("a")
    for url in urls:
        link = url.attrs['data-url']
        if "https" in link:
            links.append(link)
    return links

The data is loaded dynamically, so if you print result.content you will see that it contains only some basic HTML and JavaScript.
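One quick way to confirm this before reaching for a browser is to check whether the target element exists in the raw HTML at all. Below is a minimal sketch using only the standard library. The helper name has_class and both sample pages are hypothetical and are not taken from the Ubisoft site.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any tag in the document carries a given CSS class."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted in classes:
            self.found = True


def has_class(html, wanted):
    """Return True if the static HTML contains an element with class `wanted`."""
    finder = ClassFinder(wanted)
    finder.feed(html)
    return finder.found


# Hypothetical markup: one page filled in later by JavaScript, one already rendered
static_page = '<html><body><div id="app"></div><script src="main.js"></script></body></html>'
rendered_page = '<html><body><div class="free-events"><a data-url="https://example.com/x">x</a></div></body></html>'

print(has_class(static_page, "free-events"))
print(has_class(rendered_page, "free-events"))
```

If has_class(result.text, "free-events") is False for the downloaded page, the element is probably created by JavaScript after load, and no selector applied to result.content will find it.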

With Selenium you can load the page and retrieve the links like this:

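from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless")  # run Chrome without opening a window
browser = webdriver.Chrome(options=options)
browser.get("https://free.ubisoft.com/")
# Each free-game button stores its target in the data-url attribute
for link in browser.find_elements(By.CSS_SELECTOR, "div.free-event-button a[data-type='freegame']"):
    print(link.get_attribute("data-url"))
browser.quit()
# https://register.ubisoft.com/aco-discovery-tour
# https://register.ubisoft.com/acod-discovery-tour
# https://register.ubisoft.com/might_and_magic_chess_royale
# https://register.ubisoft.com/rabbids-coding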

The content is loaded dynamically through JavaScript, but you can use the requests module to simulate the JavaScript request.

For example:

import re
import requests

configuration_url = 'https://free.ubisoft.com/configuration.js'
configuration_js = requests.get(configuration_url).text

app_id = re.search(r"appId:\s*'(.*?)'",configuration_js).group(1)
url = re.search(r"prod:\s*'(.*?)'",configuration_js).group(1)

data = requests.get(url, headers={'ubi-appid': app_id,'ubi-localecode': 'en-US'}).json()

# pretty print all data:
import json
print(json.dumps(data, indent=4))
Prints:

{
    "news": [
        {
            "spaceId": "6d0af36b-8226-44b6-a03b-4660073a6349",
            "newsId": "ignt.21387",
            "type": "freegame",
            "placement": "freeevents",
            "priority": 1,
            "displayTime": 0,
            "publicationDate": "2020-05-14T17:01:00",
            "expirationDate": "2020-05-21T18:01:00",
            "title": "Assassin's Creed Origins Discovery Tour",
            "body": "Assassin's Creed Origins Discovery Tour",
            "mediaURL": "https://ubistatic2-a.akamaihd.net/sitegen/assets/img/ac-odyssey/ACO_DiscoveryTour_logo.png",
            "mediaType": null,
            "profileId": null,
            "obj": {},
            "links": [
                {
                    "type": "External",
                    "param": "https://register.ubisoft.com/aco-discovery-tour",
                    "actionName": "goto"
                }
            ],
            "locale": "en-US",
            "tags": null
        },

... and so on.
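The regular expressions in the example above call .group(1) directly, which raises AttributeError if configuration.js ever changes shape. Here is a small defensive sketch of the same extraction. The helper name extract_setting and the contents of SAMPLE_JS are assumptions made for illustration, and the real file may be laid out differently.

```python
import re

# Hypothetical excerpt; the real configuration.js may be laid out differently
SAMPLE_JS = """
var config = {
    appId: 'abc-123',
    prod: 'https://example.com/api/news',
};
"""


def extract_setting(js_text, key):
    """Return the single-quoted value assigned to `key`, or None if absent."""
    match = re.search(r"\b" + re.escape(key) + r":\s*'(.*?)'", js_text)
    return match.group(1) if match else None


app_id = extract_setting(SAMPLE_JS, "appId")
url = extract_setting(SAMPLE_JS, "prod")
if app_id is None or url is None:
    # Fail with a readable message instead of an AttributeError
    raise SystemExit("configuration.js no longer matches the expected pattern")
print(app_id, url)
```

Returning None for a missing key keeps the failure explicit, so a site change shows up as a clear message and not as a traceback from deep inside the regex call.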
Edit: To iterate over this data, you can use the following example:

import re
import requests

configuration_url = 'https://free.ubisoft.com/configuration.js'
configuration_js = requests.get(configuration_url).text

app_id = re.search(r"appId:\s*'(.*?)'",configuration_js).group(1)
url = re.search(r"prod:\s*'(.*?)'",configuration_js).group(1)

data = requests.get(url, headers={'ubi-appid': app_id,'ubi-localecode': 'en-US'}).json()

for no, news in enumerate(data['news'], 1):
    print('{:<5}{:<45}{}'.format(no, news['title'], news['links'][0]['param']))

Prints:

1    Assassin's Creed Origins Discovery Tour      https://register.ubisoft.com/aco-discovery-tour
2    Assassin's Creed Odyssey Discovery Tour      https://register.ubisoft.com/acod-discovery-tour
3    Uno Demo                                     https://register.ubisoft.com/uno-trial
4    The Division 2 Trial                         https://register.ubisoft.com/the-division-2-trial
5    Ghost Recon Breakpoint Trial                 https://register.ubisoft.com/ghost-recon-breakpoint-trial
6    Might and Magic Chess Royale                 https://register.ubisoft.com/might_and_magic_chess_royale
7    Rabbids Coding                               https://register.ubisoft.com/rabbids-coding
8    Trials Rising Demo                           https://register.ubisoft.com/trials-rising-demo
9    The Crew 2 Trial                             https://register.ubisoft.com/tc2-trial
10   Ghost Recon Wildlands Trial                  https://register.ubisoft.com/ghost-recon-wildlands-trial
11   The Division Trial                           https://register.ubisoft.com/the-division-trial

Edit 2: To keep only the free games, you can do the following:

no = 1
for news in data['news']:
    if news['type'] != 'freegame':
        continue
    print('{:<5}{:<45}{}'.format(no, news['title'], news['links'][0]['param']))
    no += 1

Prints:

1    Assassin's Creed Origins Discovery Tour      https://register.ubisoft.com/aco-discovery-tour
2    Assassin's Creed Odyssey Discovery Tour      https://register.ubisoft.com/acod-discovery-tour
3    Might and Magic Chess Royale                 https://register.ubisoft.com/might_and_magic_chess_royale
4    Rabbids Coding                               https://register.ubisoft.com/rabbids-coding
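Since the question asks for a list of URLs and not printed rows, the same filter can be wrapped in a function that returns the links. This is only a sketch. The function name free_game_links, the sample data, and the "trial" type value are hypothetical; only the "freegame" type and the news/links/param keys come from the response shown above.

```python
def free_game_links(data):
    """Collect the first link of every 'freegame' entry, skipping entries without links."""
    links = []
    for news in data.get("news", []):
        if news.get("type") != "freegame":
            continue
        entry_links = news.get("links") or []
        # Guard against entries whose link list is empty or lacks 'param'
        if entry_links and "param" in entry_links[0]:
            links.append(entry_links[0]["param"])
    return links


# Hypothetical sample in the same shape as the response shown above
sample = {
    "news": [
        {"type": "freegame", "title": "Game A", "links": [{"param": "https://example.com/a"}]},
        {"type": "trial", "title": "Game B", "links": [{"param": "https://example.com/b"}]},
        {"type": "freegame", "title": "Game C", "links": []},
    ]
}

print(free_game_links(sample))
```

Because the loop walks whatever data['news'] contains, it picks up a fifth or sixth game without any change to the code.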

How do I stop it from opening Chrome when I run the code?

Use headless when launching the browser. See my updated answer.

How do I loop over the JSON data depending on how many games there are? There are 4 games right now; how do I make sure I still get all of them if there are 5?

@DawidOlejnik I added an example to my answer showing how to iterate over the data.

Is there any way to filter out the trials, so that only the full games show up, i.e. Assassin's Creed Origins Discovery Tour, Assassin's Creed Odyssey Discovery Tour, Might and Magic Chess Royale, and Rabbids Coding?