Python / Requests / ASP.net / No access permission

python web-scraping


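I am still learning this. This is the first time I have seen a website respond that I do not have access when I use the requests module in Python. My code should only fetch data from the site, nothing more.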

import requests
from bs4 import BeautifulSoup

url_siemens_part = "https://mall.industry.siemens.com/mall/en/WW/Catalog/Product/5SY6310-7"

with requests.session() as sr:
    partUrl = sr.get(url_siemens_part)
    soup = BeautifulSoup(partUrl.content,'html.parser')
    print(soup)

The response I get is:

<html><head>
<title>Access Denied</title>
</head><body>
<h1>Access Denied</h1>
 
You don't have permission to access "http://mall.industry.siemens.com/mall/en/WW/Catalog/Product/5SY6310-7" on this server.<p>
Reference #18.36d61202.1596089808.1cc0ef55
</p></body>
</html>



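The website uses ASP.net. The site is visible in the Chrome browser, but not through requests.

Can you point me in the right direction? Is it an authentication problem? Maybe I have to use .ASPXAUTH or ASP.NET_SessionId.

Thanks in advance for your time and all your help.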

Use a custom User-Agent HTTP header to get the correct response:

import requests
from bs4 import BeautifulSoup

url_siemens_part = "https://mall.industry.siemens.com/mall/en/WW/Catalog/Product/5SY6310-7"

with requests.session() as sr:
    sr.headers.update({'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'})
    partUrl = sr.get(url_siemens_part)
    soup = BeautifulSoup(partUrl.content,'html.parser')
    print(soup)

Prints:

<!DOCTYPE html>

<html>
<head>
<meta charset="utf-8"/>
<meta content="IE=10" http-equiv="X-UA-Compatible"/>


... and so on.
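
To see why the header makes a difference: by default requests identifies itself with a User-Agent of the form python-requests/<version>, which is easy for a server to tell apart from a browser. The following is a minimal sketch that makes no network request and only inspects the headers stored on a session; the variable names are illustrative and not part of the original answer.

```python
import requests

# A fresh session carries the default headers that requests would send.
with requests.Session() as sr:
    default_ua = sr.headers["User-Agent"]

    # Override the header for every request sent through this session.
    browser_ua = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"
    sr.headers.update({"User-Agent": browser_ua})
    current_ua = sr.headers["User-Agent"]

print(default_ua)   # has the form python-requests/<version>
print(current_ua)
```

Because the header is set on the session, it applies to every later sr.get() or sr.post() call made through it.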



You can use requests-html for this. If you don't have the library, install it first: pip install requests-html

import requests
from bs4 import BeautifulSoup
from requests_html import HTMLSession
url_siemens_part = "https://mall.industry.siemens.com/mall/en/WW/Catalog/Product/5SY6310-7"
sr = HTMLSession()
partUrl = sr.get(url_siemens_part)
soup = BeautifulSoup(partUrl.content,'html.parser')
print(soup)

After logging in, everything is fine :) I can download all the data, but I have a question like the one below:

price_catalog = soup.find_all("td",class_="priceDetailsListPrice")

After making the soup, I need to find some values, written as find_all("td").

I get the output:

[<td class="priceDetailsListPrice">244,86 EUR
</td>]
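
Once the cell text has been extracted, it can be turned into a number. This is only a sketch, assuming the price always has the form shown above (amount with a decimal comma, then a currency code) and that a dot, if present, is a thousands separator; parse_price is a hypothetical helper name.

```python
from decimal import Decimal


def parse_price(text):
    """Split text such as '244,86 EUR' into (Decimal amount, currency code)."""
    # split() without arguments also discards the trailing newline.
    amount, currency = text.split()
    # Assumption: '.' is a thousands separator and ',' is the decimal mark.
    normalized = amount.replace(".", "").replace(",", ".")
    return Decimal(normalized), currency


price, currency = parse_price("244,86 EUR\n")
print(price, currency)
```

Decimal is used instead of float so that the amount keeps exactly the two decimal places shown on the page.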

I feel that using a "for" loop for a single value is too much :(

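Oh god, it works. But now a question: when I run the script, this header is updated and tells the server that we are a "browser" and not a script? Do I understand that correctly?

@ArturY Yes, it sends the server a User-Agent equivalent to a standard browser. Some sites have protections that do not send responses to bots, etc.

Thanks for the help :) Now I can read more about this and try more :) Have a nice day! Maybe one more question: do you have any e-books or documentation that could help with logging in to a website that uses ASP.net and SSH? Some of the data is behind the login, and I am wondering what information should be sent when "posting" to a site that uses ASP.net. This is a bigger problem than getting a PHP session_id. I see information in the request/response cookies, some authorization, etc. I am reading as much as I can to understand how it works :o :)

@ArturY Getting information from, or logging in to, an ASP.net website is always tricky. You can open the Firefox (Chrome) developer tools -> Network tab and watch what is sent to the server when you log in. But cookies and custom HTTP headers are always a problem...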
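
One thing the Network tab often reveals on ASP.NET Web Forms pages is that the login form posts hidden inputs such as __VIEWSTATE together with the credentials. Whether this particular site does so is not known here, so the following is only a sketch under that assumption: the form markup, the field names "username" and "password", and the helper names are all hypothetical. It uses only the standard library to collect the hidden fields from a login page and merge them with the credentials.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_login_payload(login_page_html, credentials):
    """Merge the hidden form fields with the user's credentials."""
    collector = HiddenInputCollector()
    collector.feed(login_page_html)
    payload = dict(collector.fields)
    payload.update(credentials)
    return payload


# Hypothetical login form, standing in for the page fetched with sr.get().
sample_form = """
<form method="post" action="/login">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="text" name="username" />
  <input type="password" name="password" />
</form>
"""

payload = build_login_payload(sample_form, {"username": "user", "password": "secret"})
print(payload)
```

The resulting dictionary would then be sent with sr.post(login_url, data=payload) through the same session that fetched the login page, so that cookies set by the server are sent back automatically; login_url is a placeholder for whatever address the Network tab shows.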

# Use a distinct loop variable so the result list is not overwritten
for price in price_catalog:
    output = price.text
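
When only one matching cell is expected, the loop can be avoided: find() returns the first matching tag instead of a list. A minimal sketch, using a small HTML string as a stand-in for the downloaded page:

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page, reduced to the cell of interest.
html = '<table><tr><td class="priceDetailsListPrice">244,86 EUR\n</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None when nothing matches.
cell = soup.find("td", class_="priceDetailsListPrice")
output = cell.get_text(strip=True) if cell is not None else None
print(output)
```

The None check matters because the cell may be missing, for example when the price is only shown after login.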