
AttributeError in a Python web crawler


When I run the following code:

import urllib
import re
from urllib import request
import webbrowser

#email pattern
r'[\w._(),:;<>]+@[\w._(),:;<>][.]\w+'

# url pattern
r'\w\w\w[.]\w+[.]\w+'

html = urllib.request.urlopen('somelinkthatistoolongforstackoverflow')

#find all websites

websites = re.findall(r'http://www[.]\w+[.]\w+',str(html.read()))
print(websites)

#find all emails

emails = re.findall(r'[\w._(),:;<>]+@[\w._(),:;<>][.]\w+',str(html.read()))
print(emails)

#sort through websites and find other links

for i in websites:
    y = urllib.request.urlopen(i)
    x = re.findall(r'http://www[.]\w+[.]\w+',str(y.read()))
    websites.append(x)
I get an AttributeError. What can I do? I'm using the urllib module and the re (regular expression) module, on Python 3.3.0. Can anyone help me? If you can, please post below. This is meant to be a web crawler that finds as many links and email addresses as it can. Thanks to anyone who can help.

You want to extend websites:

websites.extend(x)

because x is itself a list.

You are currently appending the list of matched websites, so at some point that list is passed as i in the for loop to urllib.request.urlopen(), which then tries to treat it as a Request object, since it is certainly not a string, the other valid option.
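The difference matters: append() stores the whole list x as a single element of websites, while extend() adds its entries one by one. A minimal illustration (the URLs below are made up just for this sketch):

websites = ['http://www.a.com']
x = ['http://www.b.com', 'http://www.c.com']

websites.append(x)
print(websites)   # ['http://www.a.com', ['http://www.b.com', 'http://www.c.com']]

websites.pop()    # drop the nested list again
websites.extend(x)
print(websites)   # ['http://www.a.com', 'http://www.b.com', 'http://www.c.com']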

Please include the full traceback.
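Putting the fix into the original loop, a rough sketch could look like the following (the start URL is a placeholder since the real link was elided, and there is no error handling or duplicate filtering):

import re
import urllib.request

start_url = 'http://www.example.com'   # placeholder for the elided link
url_pattern = r'http://www[.]\w+[.]\w+'

html = urllib.request.urlopen(start_url)
websites = re.findall(url_pattern, str(html.read()))

# visit each discovered site and collect further links;
# extend() adds the new matches as individual strings, so later
# iterations pass plain strings (not lists) to urlopen()
for i in websites:
    y = urllib.request.urlopen(i)
    x = re.findall(url_pattern, str(y.read()))
    websites.extend(x)

print(websites)

Note that extending websites inside the loop means the loop also walks over the newly added links, which suits a crawler that wants as many links as possible, but without a set of already-visited URLs it can revisit pages and run for a long time.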