Python 3.x: Making a web request look legitimate so that the server lets it through


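I have been trying to run the following code, but it keeps raising HTTP Error 502. I think the cause is that the website knows a program is trying to retrieve information from it, so it rejects the request. Is there a way to trick the server into treating this as a legitimate web request? I tried adding headers, but it still doesn't work.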

import urllib.request


# Function: Convert information within html document to a text file
# Append information to the file
def html_to_text(source_html, target_file):

    opener = urllib.request.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    print(source_html)
    r = opener.open(source_html)
    response = r.read()
    print(response)
    temp_file = open(target_file, 'w+')
    temp_file.write(response.__str__())


source_address = "https://sg.finance.yahoo.com/lookup/all?s=*&t=A&m=SG&r=&b=0"
target_location = "C:\\Users\\Admin\\PycharmProjects\\TheLastPuff\\Source\\yahoo_ticker_symbols.txt"

html_to_text(source_address, target_location)

I made some changes to the code and got the result I needed:

import urllib.request
import gzip


# Function: Convert information within html document to a text file
# Append information to the file
def html_to_text(source_html, target_file):

    opener = urllib.request.build_opener()
    # These headers are necessary to ensure that the website thinks that a browser is retrieving information
    # not a program.
    opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'),
                         ('Connection', 'keep-alive'),
                         ('Accept-encoding', 'gzip, deflate'),
                         ('Accept-language', 'en-US,en;q=0.5'),
                         ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
                         ('Host', 'sg.finance.yahoo.com'), ]
    r = opener.open(source_html)

    # Check from the "Response Headers" in Firebug whether the content is encoded
    # Since the content is encoded in gzip format, decompression is necessary
    response = gzip.decompress(r.read())

    # The response headers would mention the "charset" from there the encoding type can be obtained
    response = response.decode(encoding='utf-8')
    print(response)
    # Write the text as UTF-8; the with-block closes the file automatically
    with open(target_file, 'w', encoding='utf-8') as temp_file:
        temp_file.write(response)


source_address = "https://sg.finance.yahoo.com/lookup/all?s=*&t=A&m=SG&r=&b=0"
target_location = "C:\\Users\\Admin\\PycharmProjects\\TheLastPuff\\Source\\yahoo_ticker_symbols.txt"

html_to_text(source_address, target_location)
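
The code above always calls gzip.decompress, which fails if the server ever replies without compression. A minimal sketch of a more defensive variant follows; it assumes the caller passes in the value of the Content-Encoding response header. The helper name decode_body and its parameters are hypothetical and not part of the original post.

```python
import gzip
import zlib


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Decompress raw bytes according to Content-Encoding, then decode to str.

    Hypothetical helper: `content_encoding` is the value of the response's
    Content-Encoding header (or None if the header is absent).
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        raw = gzip.decompress(raw)
    elif encoding == "deflate":
        try:
            # Standard case: deflate data wrapped in a zlib header
            raw = zlib.decompress(raw)
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header
            raw = zlib.decompress(raw, -zlib.MAX_WBITS)
    # Any other value (including no header) is treated as uncompressed
    return raw.decode(charset)
```

Inside html_to_text this could be called as decode_body(r.read(), r.headers.get("Content-Encoding"), r.headers.get_content_charset() or "utf-8"), so the charset is also taken from the response instead of being hard-coded.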

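It works for me. How many requests did you send them? They may have detected your experiments as a brute-force or DoS attack and blacklisted some fingerprint of your requests.

Is there a way to make the server think the request comes from a legitimate browser?

Yes. Capture the traffic your browser sends and copy the header values into your Python script.

Thanks! I managed to figure it out!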
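
The suggestion to copy the browser's header values into the script can be sketched roughly as follows. This is only an illustration: the header values are the Firefox ones quoted in the post, the names BROWSER_HEADERS and build_browser_request are hypothetical, and Accept-Encoding is deliberately left out here so that the server is not asked for a compressed body.

```python
import urllib.request

# Header values copied from a browser session (here: the ones quoted in the post).
# BROWSER_HEADERS is a hypothetical name used only for this sketch.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_browser_request(url, headers=None):
    """Return a urllib Request carrying browser-like headers (no network access)."""
    return urllib.request.Request(url, headers=dict(headers or BROWSER_HEADERS))


request = build_browser_request(
    "https://sg.finance.yahoo.com/lookup/all?s=*&t=A&m=SG&r=&b=0"
)
# The request would then be sent with urllib.request.urlopen(request).
```

The Host header does not need to be set by hand; urllib derives it from the URL when the request is sent.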