Python web scraping (error when using Jupyter)
This is my first time using Python and all the related packages and tools. Here is the code:
import pandas as pd
# pass in column names for each CSV
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv(
    'http://files.grouplens.org/datasets/movielens/ml-100k/u.user',
    sep='|', names=u_cols)
users.head()
When I execute the code in Jupyter, I only get this error:
URLErrorTraceback (most recent call last)
<ipython-input-4-cd2489d7386f> in <module>()
6 users = pd.read_csv(
7 'http://files.grouplens.org/datasets/movielens/ml-100k/u.user',
----> 8 sep='|', names=u_cols)
9
10 users.head()
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
560 skip_blank_lines=skip_blank_lines)
561
--> 562 return _read(filepath_or_buffer, kwds)
563
564 parser_f.__name__ = name
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
299 filepath_or_buffer, _, compression = get_filepath_or_buffer(
300 filepath_or_buffer, encoding,
--> 301 compression=kwds.get('compression', None))
302 kwds['compression'] = (inferred_compression if compression == 'infer'
303 else compression)
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/io/common.pyc in get_filepath_or_buffer(filepath_or_buffer, encoding, compression)
306
307 if _is_url(filepath_or_buffer):
--> 308 req = _urlopen(str(filepath_or_buffer))
309 if compression == 'infer':
310 content_encoding = req.headers.get('Content-Encoding', None)
/opt/conda/envs/python2/lib/python2.7/urllib2.pyc in urlopen(url, data, timeout, cafile, capath, cadefault, context)
152 else:
153 opener = _opener
--> 154 return opener.open(url, data, timeout)
155
156 def install_opener(opener):
/opt/conda/envs/python2/lib/python2.7/urllib2.pyc in open(self, fullurl, data, timeout)
427 req = meth(req)
428
--> 429 response = self._open(req, data)
430
431 # post-process response
/opt/conda/envs/python2/lib/python2.7/urllib2.pyc in _open(self, req, data)
445 protocol = req.get_type()
446 result = self._call_chain(self.handle_open, protocol, protocol +
--> 447 '_open', req)
448 if result:
449 return result
/opt/conda/envs/python2/lib/python2.7/urllib2.pyc in _call_chain(self, chain, kind, meth_name, *args)
405 func = getattr(handler, meth_name)
406
--> 407 result = func(*args)
408 if result is not None:
409 return result
/opt/conda/envs/python2/lib/python2.7/urllib2.pyc in http_open(self, req)
1226
1227 def http_open(self, req):
-> 1228 return self.do_open(httplib.HTTPConnection, req)
1229
1230 http_request = AbstractHTTPHandler.do_request_
/opt/conda/envs/python2/lib/python2.7/urllib2.pyc in do_open(self, http_class, req, **http_conn_args)
1196 except socket.error, err: # XXX what error?
1197 h.close()
-> 1198 raise URLError(err)
1199 else:
1200 try:
URLError: <urlopen error [Errno -2] Name or service not known>
According to the lecture, the result should be the users.head() table below.

Looks like a network problem (check your internet connection). The code runs fine for me:
>>> users.head()
user_id age sex occupation zip_code
0 1 24 M technician 85711
1 2 53 F other 94043
2 3 23 M writer 32067
3 4 24 M technician 43537
4 5 33 F other 15213
Try opening the URL in your browser to check that you can load it from your machine.
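If the URL loads in a browser elsewhere but not from the machine running Jupyter, you can confirm it is a network/DNS problem directly from a notebook cell. A minimal sketch using Python 3's urllib (the helper name check_url is my own, not from the question):

```python
import urllib.request
import urllib.error

def check_url(url, timeout=5):
    """Return None if the URL is reachable, otherwise a short diagnosis string."""
    try:
        urllib.request.urlopen(url, timeout=timeout).close()
        return None
    except urllib.error.URLError as exc:
        # Covers HTTPError too (it is a URLError subclass)
        return 'network problem: %s' % exc.reason
    except OSError as exc:
        return 'network problem: %s' % exc

# "Name or service not known" in the diagnosis means DNS resolution failed,
# i.e. the machine running the Jupyter kernel has no working internet access.
print(check_url('http://files.grouplens.org/datasets/movielens/ml-100k/u.user'))
```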
Does your Jupyter notebook have

import pandas as pd

? Did you miss it? (The error suggests that pd has not been defined.)

Your code works for me in a Jupyter notebook. I agree with @smarx that you may be missing pandas (although an Anaconda install should include it). Try running conda install pandas in a terminal to install pandas.

I'm using the online version, and I guess that is the problem. Thank you all for your time and advice.

@Star_89 Anaconda is what I would recommend. It is easy to install, doesn't conflict with other packages, supports multiple environments, and has other benefits.
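To see what the missing-import symptom looks like, run a cell that references pd before any import has been executed. A minimal sketch (it reproduces the NameError without pandas even needing to be installed):

```python
# In a fresh kernel, using pd before `import pandas as pd` raises NameError.
try:
    pd.read_csv  # pd has never been defined in this session
except NameError as exc:
    message = str(exc)

print(message)  # name 'pd' is not defined
```

If this is the error you see, add import pandas as pd at the top of the notebook; if that import itself fails with ImportError, install pandas first (e.g. with conda install pandas).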