Python: get the source code of a URL
I have the following code:
import urllib2
from itertools import product

with open('urllist.txt') as urllist:
    urls = [line.strip() for line in urllist]

for url in product(urls):
    usock = urllib2.urlopen(url)
    data = usock.read()
    usock.close()
    sourcecode = open('./sourcecode', 'w+')
    sourcecode.write(data)
When I run it, I get:
Traceback (most recent call last):
  File "12.py", line 8, in <module>
    usock = urllib2.urlopen(url)
  File "/opt/python2.7.1/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/opt/python2.7.1/lib/python2.7/urllib2.py", line 383, in open
    req.timeout = timeout
AttributeError: 'tuple' object has no attribute 'timeout'
Any idea how to fix it? Thanks a lot!

itertools.product returns tuples rather than the items themselves:
>>> from itertools import product
>>> lis = ['a','b','c']
>>> for p in product(lis):
...     print p
...
('a',)
('b',)
('c',)
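This also explains the wording of the error. As the traceback shows, urlopen goes on to set req.timeout on whatever it was given, and a tuple cannot take that attribute. The sketch below restates the behaviour in Python 3 syntax (the session above is Python 2); the URLs are hypothetical placeholders and nothing is downloaded.

```python
from itertools import product

# Hypothetical URLs, used only to illustrate the shape of the values.
urls = ["http://example.com/a", "http://example.com/b"]

# With a single iterable, product() yields 1-tuples, not bare strings.
wrapped = list(product(urls))
print(wrapped)

# Unpacking each 1-tuple recovers the original string.
unwrapped = [url for (url,) in product(urls)]
print(unwrapped)

# product() is meant for combining several iterables into tuples.
pairs = list(product(["a", "b"], [1, 2]))
print(pairs)
```

With only one list there is nothing to combine, so product adds a layer of tuples and nothing else.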
Use a simple loop over the URLs:
for url in urls:
    usock = urllib2.urlopen(url)
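For reference, here is a minimal sketch of the whole script with that loop, written for Python 3 on the assumption that urllib.request stands in for urllib2 (which exists only in Python 2). The helper names read_urls, fetch_all and _download are hypothetical, and the fetch parameter is there only so the loop can be exercised without network access. Since opening './sourcecode' in 'w+' mode truncates the file each time, this sketch keeps every page in a dictionary keyed by URL instead.

```python
from urllib.request import urlopen


def read_urls(path):
    """Return the stripped, non-empty lines of a text file."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def _download(url):
    """Fetch one URL and return the response body as bytes."""
    with urlopen(url) as response:
        return response.read()


def fetch_all(urls, fetch=_download):
    """Loop over the URLs directly (no product) and collect each body."""
    pages = {}
    for url in urls:
        pages[url] = fetch(url)
    return pages
```

A call such as fetch_all(read_urls('urllist.txt')) would then return one entry per URL; what to do with each page (for example, writing it to its own file) is left open here.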
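What do you intend to achieve by using product?
I want to get the source code from a list of URLs.
What does urls look like?
Thanks! I figured out another way. Just change "for url in product(urls)" to "for url in urls:".
@Tom I've already mentioned that in my answer.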