Python BeautifulSoup cURL no connection

I am using cURL to connect to an XML page through a proxy. For some reason no connection is made to the page. The parsing is not the problem, so I won't include that part in the code:

from bs4 import BeautifulSoup
import time  #added for curl
import subprocess #added for curl
import os #added for curl
file_name = raw_input("Type the name of the new file you will create: ")
g = open(file_name+".txt",'w')
g.write("---XML Parse---\n")
curlURL = r'F:\Downloads\curl-7.31.0-rtmp-ssh2-ssl-sspi-zlib-idn-static-bin-w32\curl.exe'
with open("list.txt") as f: #file from which information will be read and used in link
    for line in f:
        g.write("\nPage ID: "+line.rstrip('\n')+"\n")
        link = "https://somewebsite.com/+line.rstrip('\n')"
        args = (curlURL+ ' -L ' +link+ ' -o c:\\temp.txt --proxy-ntlm -x http://myproxy:80 -k -U:') #using a proxy
        print args
        sp = subprocess.Popen(args) #run curl
        sp.wait() #Wait for it to finish before proceeding
        xml_string = open('C:/temp.txt', 'r').read() #read in the temporary file
        time.sleep(3)
        os.remove('C:/temp.txt') # clean up
        soup = BeautifulSoup(xml_string)
        result = soup.find('bibliographic-data')
        if result is not None:
            status = result['status']
            g.write("\nApplication Status: "+status+"\n")
            g.write("Most Recent Event Information: \n")

#...i go on to parse the document
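
As an aside, one thing that stands out in the subprocess step is that curl's exit status is never checked before C:/temp.txt is read, so a failed download leaves nothing to open. A minimal sketch, assuming the same curl.exe path and proxy flags as above, that captures curl's output straight from stdout (no -o, so no temp file) and checks the return code first:

# Sketch only: reuses curlURL and link from the loop above.
# Without -o, curl writes the page to stdout, so no temp file is involved.
args = [curlURL, '-L', link, '--proxy-ntlm', '-x', 'http://myproxy:80', '-k', '-U', ':']
sp = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
xml_string, err = sp.communicate()   # waits for curl to finish
if sp.returncode != 0:               # curl failed; don't try to parse
    g.write("curl failed: " + err + "\n")
else:
    soup = BeautifulSoup(xml_string)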
I get an error:

curl:(56) Received HTTP code 407 from proxy after CONNECT

Any idea why I am being denied access?
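
For context, an HTTP 407 from the proxy means the CONNECT request was rejected because the proxy itself wants authentication, i.e. curl never reached the target site at all. The bare -U: form relies on an SSPI-enabled curl build picking up the current Windows login for NTLM; if that is not being accepted, one thing to try, purely as a sketch with placeholder credentials, is spelling the proxy user out:

# Sketch: DOMAIN\user and password are placeholders, not values from the question.
proxy_user = 'DOMAIN\\user:password'
args = (curlURL + ' -L ' + link + ' -o c:\\temp.txt --proxy-ntlm'
        ' -x http://myproxy:80 -k -U ' + proxy_user)
sp = subprocess.Popen(args)
sp.wait()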

Have you tried it? I suspect it would handle your firewall just fine.
I'd rather not change that at this point, just fix the current error. Unless you think the location of the temporary file is relevant to the firewall, it shouldn't be any different from how it is on the current machine.
No, the temp file has nothing to do with the firewall. I think I'll offer you a simpler option than running cURL as a command-line tool and working around the problems that brings.
I'll try that as well, but in the meantime, do you know why the temporary file isn't being opened to read the XML?
No; according to the error, there is no temp.txt file.
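
The simpler option hinted at in those comments would presumably be fetching the page from Python directly instead of shelling out to curl. A rough sketch with the requests library, assuming the proxy accepts credentials embedded in the proxy URL (an NTLM-only proxy would need extra handling, e.g. via the requests-ntlm package); user and password here are placeholders, and verify=False mirrors curl's -k:

import requests

proxies = {'http': 'http://user:password@myproxy:80',
           'https': 'http://user:password@myproxy:80'}
resp = requests.get(link, proxies=proxies, verify=False)
soup = BeautifulSoup(resp.text)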