Python: website does not resolve some hyperlinks

Tags: python, web-scraping, python-requests

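When scraping some web pages, I don't get the same source that I see when inspecting the page in a browser. Hyperlinks that are real URLs in the browser show up as {url} in the source fetched with requests. Here is sample code for an example page: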

import requests
from bs4 import BeautifulSoup as bs
page = requests.get("https://www.mckinsey.com/search?q=iot")
soup = bs(page.content, 'html.parser')
soup.findAll('div', {'class' : 'item title-link'})
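If you inspect the elements matched by the last line in a browser, each link is a full URL. In the requests version the link only says {url}, and the elements retrieved from the soup object are empty.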
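
A quick way to confirm that the links are filled in client-side is to look for unrendered template placeholders in the raw HTML. The following is a minimal sketch using only the standard library. The sample markup and the names `LinkCollector` and `find_placeholder_links` are made up for illustration and are not taken from the real page.

```python
import re
from html.parser import HTMLParser

# Matches template placeholders such as {url} or {title}.
PLACEHOLDER = re.compile(r"\{\w+\}")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def find_placeholder_links(html_text):
    """Return the hrefs that still contain an unrendered placeholder."""
    parser = LinkCollector()
    parser.feed(html_text)
    return [href for href in parser.hrefs if PLACEHOLDER.search(href)]


# Hypothetical markup, only meant to resemble a client-side template.
sample_html = (
    '<div class="item title-link"><a href="{url}">{title}</a></div>'
    '<a href="/about">About</a>'
)
print(find_placeholder_links(sample_html))
```

If this list is non-empty for the HTML returned by requests, the real URLs are most likely inserted later by JavaScript.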

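This site uses JavaScript to fetch data from the server and put it on the page.

Using DevTools in Chrome/Firefox, you can see that the JavaScript sends a POST request with JSON parameters and receives all the data as JSON. If you send the same request yourself, you get all the data as a Python dictionary.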

import requests

params = {
    'q': 'iot',
    'page': '1',
    'app': '',
    'sort': 'default',
    'ignoreSpellSuggestion': 'false',
}

url = 'https://www.mckinsey.com/services/ContentAPI/SearchAPI.svc/search'

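# load the first two result pages
for page in range(1, 3):

    # the page number is sent as a string, like the other parameters
    params['page'] = str(page)

    # send the parameters as a JSON body in a POST request
    r = requests.post(url, json=params)

    # decode the JSON response into a dictionary
    data = r.json()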

    print()
    print("data['data'].keys():\n", data['data'].keys())
    print()      
    print(' currentPage:', data['data']['currentPage'])
    print('  totalPages:', data['data']['totalPages'])
    print('totalResults:', data['data']['totalResults'])
    print()

    print("data['data']['results'][0].keys():\n", data['data']['results'][0].keys())
    print()

    for item in data['data']['results']:
        print(item['title'])
        print(item['url'])
        print('---')
Result:

data['data'].keys():
 dict_keys(['totalResults', 'totalPages', 'currentPage', 'recommendations', 'results'])

 currentPage: 1
  totalPages: 17
totalResults: 166

data['data']['results'][0].keys():
 dict_keys(['title', 'subtitle', 'imageurl', 'dek', 'tag', 'mimetype', 'url'])

Taking the pulse of enterprise <b>IoT</b>
https://www.mckinsey.com/global-themes/internet-of-things/our-insights/taking-the-pulse-of-enterprise-iot
---
An executive&#39;s guide to the Internet of Things
https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/an-executives-guide-to-the-internet-of-things
---
Internet of Things | Internet of Things
https://www.mckinsey.com/global-themes/internet-of-things/how-we-help-clients
---
Unlocking the potential of the Internet of Things
https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/the-internet-of-things-the-value-of-digitizing-the-physical-world
---
Internet of Things
https://www.mckinsey.com/global-themes/internet-of-things/our-insights
---
Six ways CEOs can promote cybersecurity in the <b>IoT</b> age
https://www.mckinsey.com/global-themes/internet-of-things/our-insights/six-ways-ceos-can-promote-cybersecurity-in-the-iot-age
---
What&#39;s new with the Internet of Things?
https://www.mckinsey.com/industries/semiconductors/our-insights/whats-new-with-the-internet-of-things
---
How can we recognize the real power of the Internet of Things?
https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/how-can-we-recognize-the-real-power-of-the-internet-of-things
---
Making sense of Internet of Things platforms
https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/making-sense-of-internet-of-things-platforms
---
Partnerships, scale, and speed: The hallmarks of a successful <b>IoT</b> strategy
https://www.mckinsey.com/industries/financial-services/our-insights/partnerships-scale-and-speed
---

data['data'].keys():
 dict_keys(['totalResults', 'totalPages', 'currentPage', 'recommendations', 'results'])

 currentPage: 2
  totalPages: 17
totalResults: 166

data['data']['results'][0].keys():
 dict_keys(['title', 'subtitle', 'imageurl', 'dek', 'tag', 'mimetype', 'url'])

THE INTERNET OF THINGS: MAPPING THE VALUE BEYOND THE HYPE
https://www.mckinsey.com/~/media/mckinsey/business%20functions/mckinsey%20digital/our%20insights/the%20internet%20of%20things%20the%20value%20of%20digitizing%20the%20physical%20world/unlocking_the_potential_of_the_internet_of_things_executive_summary.ashx
---
The future of connectivity: Enabling the Internet of Things
https://www.mckinsey.com/global-themes/internet-of-things/our-insights/the-future-of-connectivity-enabling-the-internet-of-things
---
THE INTERNET OF THINGS: MAPPING THE VALUE BEYOND THE HYPE
https://www.mckinsey.com/~/media/mckinsey/business%20functions/mckinsey%20digital/our%20insights/the%20internet%20of%20things%20the%20value%20of%20digitizing%20the%20physical%20world/the-internet-of-things-mapping-the-value-beyond-the-hype.ashx
---
Insurers need to plug into the Internet of Things – or risk falling behind
https://www.mckinsey.com/~/media/mckinsey/industries/financial%20services/our%20insights/european%20insurance%20practice%20report%20on%20internet%20of%20things/mckinsey%20-%20insurers%20need%20to%20plug%20into%20the%20internet%20of%20things%20or%20risk%20falling%20behind.ashx
---
Security in the Internet of Things
https://www.mckinsey.com/industries/semiconductors/our-insights/security-in-the-internet-of-things
---
Semiconductors
https://www.mckinsey.com/~/media/mckinsey/industries/semiconductors/our%20insights/mckinsey%20on%20semiconductors%20issue%206%20-%20spring%202017/mck%20on%20semiconductors_issue%206_2017.ashx
---
Internet of Things: Opportunities and challenges for semiconductor companies
https://www.mckinsey.com/industries/semiconductors/our-insights/internet-of-things-opportunities-and-challenges-for-semiconductor-companies
---
THE INTERNET OF THINGS: MAPPING THE VALUE BEYOND THE HYPE
https://www.mckinsey.com/~/media/mckinsey/business%20functions/mckinsey%20digital/our%20insights/the%20internet%20of%20things%20the%20value%20of%20digitizing%20the%20physical%20world/unlocking_the_potential_of_the_internet_of_things_full_report.ashx
---
A new Internet of Things platform and business | Digital McKinsey
https://www.mckinsey.com/business-functions/digital-mckinsey/how-we-help-clients/a-new-internet-of-things-platform-and-business
---
Video meets the Internet of Things
https://www.mckinsey.com/industries/high-tech/our-insights/video-meets-the-internet-of-things
---
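
Some titles in the output above contain HTML markup such as `<b>IoT</b>` and character references such as `&#39;`. A minimal sketch for turning them into plain text, using only the standard library, is shown below. The helper name `clean_title` is an assumption for illustration, and the regular expression only handles simple tags like the ones seen here.

```python
import html
import re

# Matches simple HTML tags such as <b> and </b>.
TAG = re.compile(r"<[^>]+>")


def clean_title(raw):
    """Strip simple HTML tags, then decode character references."""
    # Tags are removed first so that an escaped "&lt;" is not mistaken for markup.
    return html.unescape(TAG.sub("", raw)).strip()


print(clean_title("Taking the pulse of enterprise <b>IoT</b>"))
print(clean_title("An executive&#39;s guide to the Internet of Things"))
```

Inside the loop from the answer, this could be applied as `clean_title(item['title'])`.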

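For this you may have to use ghost.py, because the url is generated by JavaScript.

This is great! However, I'm having trouble finding the SearchAPI url in DevTools.

Load the initial page, go to DevTools -> Network -> XHR and reload the page.

Thanks! The example above only gets the first 10 of the 166 results. How do I get the rest?

See 'page': '1' in the params. If you use 'page': '2' you will probably get the next 10 results. I added a for loop that loads 'page': '1' and 'page': '2'. There is also data['data']['totalPages'], which you can use in a for or while loop.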
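
The paging advice above can be sketched as a generator that keeps requesting pages until `totalPages` is reached. This is only a sketch under assumptions: `iter_results`, `fetch_page_live` and `fake_fetch` are hypothetical names, the response shape is the one shown in the printed output, and the endpoint may have changed since the answer was written. The demo at the end runs against fake data, so it does not touch the network.

```python
def iter_results(fetch_page):
    """Yield every result item, reading totalPages from each response."""
    page = 1
    while True:
        data = fetch_page(page)["data"]
        yield from data["results"]
        if page >= data["totalPages"]:
            break
        page += 1


def fetch_page_live(page, query="iot"):
    """Request one page from the search API used in the answer."""
    import requests  # imported here so the offline demo below needs no network library

    url = "https://www.mckinsey.com/services/ContentAPI/SearchAPI.svc/search"
    params = {
        "q": query,
        "page": str(page),
        "app": "",
        "sort": "default",
        "ignoreSpellSuggestion": "false",
    }
    r = requests.post(url, json=params)
    r.raise_for_status()
    return r.json()


def fake_fetch(page):
    """Stand-in for the API with made-up data spread over two pages."""
    pages = {1: ["a", "b"], 2: ["c"]}
    results = [{"title": t, "url": "https://example.com/" + t} for t in pages[page]]
    return {"data": {"totalPages": 2, "currentPage": page, "results": results}}


titles = [item["title"] for item in iter_results(fake_fetch)]
print(titles)
```

To use it against the real site, pass `fetch_page_live` instead of `fake_fetch`, for example `for item in iter_results(fetch_page_live): print(item['url'])`.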