Python timer + urllib2 code error
I'm trying to pull information from a site every 5 seconds, but it doesn't seem to work: I get an error every time I run it. The code is as follows:
import urllib2, threading
def readpage():
    data = urllib2.urlopen('http://forums.zybez.net/runescape-2007-prices').read()
    for line in data:
        if 'forums.zybez.net/runescape-2007-prices/player/' in line:
            a = line.split('/runescape-2007-prices/player/'[1])
            print(a.split('">')[0])
t = threading.Timer(5.0, readpage)
t.start()
I get the following error:
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 808, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 1080, in run
    self.function(*self.args, **self.kwargs)
  File "C:\Users\Jordan\Desktop\username.py", line 3, in readpage
    data = urllib2.urlopen('http://forums.zybez.net/runescape-2007-prices').read()
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden
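Any help is much appreciated, thanks.
Have you tried opening the URL without the threading? The error code is 403: Forbidden, so you may need to authenticate to that page.
This has nothing to do with Python: the server is refusing your request to that URL. I suspect that either the URL is incorrect or you have hit some kind of rate limit and been blocked.
Edit: how to get it working
The site is blocking Python's User-Agent. Try this: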
import urllib2, threading
def readpage():
    headers = { 'User-Agent' : 'Mozilla/5.0' }
    req = urllib2.Request('http://forums.zybez.net/runescape-2007-prices', None, headers)
    data = urllib2.urlopen(req).read()
    for line in data:
        if 'forums.zybez.net/runescape-2007-prices/player/' in line:
            a = line.split('/runescape-2007-prices/player/'[1])
            print(a.split('">')[0])
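To check that the failure really is the server's 403 response rather than anything the timer does, the request can be wrapped so that it reports the status code instead of leaving an unhandled traceback in the timer thread. This is a minimal sketch, not part of the original answer. `fetch_status` and its `opener` parameter are hypothetical names introduced here, and the import shim is an assumption that lets the same snippet run on Python 2 (urllib2) and Python 3 (urllib.request).

```python
try:  # Python 2, as used in the question
    from urllib2 import HTTPError, Request, urlopen
except ImportError:  # Python 3 moved these names
    from urllib.error import HTTPError
    from urllib.request import Request, urlopen


def fetch_status(url, user_agent='Mozilla/5.0', opener=urlopen):
    """Return (status_code, body); body is None when the server refuses.

    `opener` is a hypothetical hook so the function can be tried offline.
    """
    req = Request(url, None, {'User-Agent': user_agent})
    try:
        response = opener(req)
    except HTTPError as err:
        # err.code holds the HTTP status, e.g. 403 for Forbidden
        return err.code, None
    return response.getcode(), response.read()
```

Calling `fetch_status('http://forums.zybez.net/runescape-2007-prices')` returns the status code together with the page body, or with `None` if the server still rejects the request.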
The site rejects the default User-Agent that urllib2 reports. You can change it for every request in the script with install_opener:
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0')]
urllib2.install_opener(opener)
You also need to split the data returned by the site so that you can read it line by line:
urllib2.urlopen('http://forums.zybez.net/runescape-2007-prices').read().splitlines()
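Change
line.split('/runescape-2007-prices/player/'[1])
to
line.split('/runescape-2007-prices/player/')[1]
In the first form the [1] indexes the separator string itself (giving 'r'), so the line is split on 'r' and a list is returned instead of the text after the marker.
Working version: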
import urllib2, threading
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0')]
urllib2.install_opener(opener)
def readpage():
    data = urllib2.urlopen('http://forums.zybez.net/runescape-2007-prices').read().splitlines()
    for line in data:
        if 'forums.zybez.net/runescape-2007-prices/player/' in line:
            a = line.split('/runescape-2007-prices/player/')[1]
            print(a.split('">')[0])
t = threading.Timer(5.0, readpage)
t.start()
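The two string fixes in the working version can be checked offline, without hitting the site. The sketch below is illustrative only: `SAMPLE_HTML` is made-up markup that is merely assumed to resemble the page's player links, and `extract_players` and `MARKER` are hypothetical names, not part of the original answer.

```python
# Made-up sample; the real page's markup may differ.
SAMPLE_HTML = (
    '<div class="offers">\n'
    '<a href="http://forums.zybez.net/runescape-2007-prices/player/Alice">Alice</a>\n'
    '<a href="http://forums.zybez.net/runescape-2007-prices/player/Bob">Bob</a>\n'
    '</div>'
)

MARKER = '/runescape-2007-prices/player/'


def extract_players(html):
    """Return the player names that follow MARKER on each line of html."""
    names = []
    # splitlines() yields whole lines; iterating the string directly
    # would yield single characters, so MARKER could never match.
    for line in html.splitlines():
        if MARKER in line:
            # Index the result of split(), not the separator string:
            # MARKER[1] is just the character 'r'.
            tail = line.split(MARKER)[1]
            names.append(tail.split('">')[0])
    return names


print(extract_players(SAMPLE_HTML))
```

With the sample above this prints `['Alice', 'Bob']`. Passing the string straight to the `for` loop, as in the question, matches nothing.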
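I can open it fine in my browser, if that's what you mean.
That's strange, because I can open it in my browser.
It no longer seems to throw any errors, although it doesn't seem to print any lines at all. I have also tried just printing each line without splitting it or anything, but the result is still the same.
If I print the returned data, it looks like a valid web page to me. Are you sure the content you want to extract is there?
Yes, check the page source.
Loading the data can also be done in a single line: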
data = urllib2.urlopen(urllib2.Request('http://forums.zybez.net/runescape-2007-prices', headers={'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'})).read()
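Thanks, although I got an error when trying to print a.split('">')[0], because a is apparently a list. I have also tried to make it loop every 5 seconds and print the new data, but I get thread.error: "can't start new thread".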
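On the repetition problem: threading.Timer fires only once, so a job that should repeat has to reschedule itself or run in a loop. One possible cause of "can't start new thread" is creating timers in a tight loop until the process runs out of threads, but that is a guess, since the looping code is not shown. The sketch below keeps a single worker thread and waits on a threading.Event between runs. `run_every`, `max_runs`, and the `tick` stand-in are hypothetical names introduced for illustration.

```python
import threading


def run_every(interval, func, stop_event, max_runs=None):
    """Call func repeatedly, waiting `interval` seconds between calls.

    Stops when stop_event is set or after max_runs calls (if given).
    Returns the number of calls made.
    """
    runs = 0
    while not stop_event.is_set():
        func()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        # wait() returns early if stop_event is set from another thread
        stop_event.wait(interval)
    return runs


if __name__ == '__main__':
    def tick():
        print('fetching page...')  # stand-in for readpage()

    stop = threading.Event()
    # Short interval and three runs so the demo finishes quickly.
    worker = threading.Thread(target=run_every, args=(0.1, tick, stop, 3))
    worker.start()
    worker.join()
```

For the scraper itself, the call would presumably be `run_every(5.0, readpage, stop)` in the worker thread, with `stop.set()` called when the polling should end.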