Python web scraping (error when using Jupyter)


This is my first time using Python and all the related packages and tools. Here is the code:

    import pandas as pd

    # pass in column names for each CSV
    u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']

    users = pd.read_csv(
        'http://files.grouplens.org/datasets/movielens/ml-100k/u.user',
        sep='|', names=u_cols)

    users.head()
When I execute this code in Jupyter, all I get is the error below:

URLErrorTraceback (most recent call last)
<ipython-input-4-cd2489d7386f> in <module>()
      6 users = pd.read_csv(
      7     'http://files.grouplens.org/datasets/movielens/ml-100k/u.user',
----> 8     sep='|', names=u_cols)
      9 
     10 users.head()

/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    560                     skip_blank_lines=skip_blank_lines)
    561 
--> 562         return _read(filepath_or_buffer, kwds)
    563 
    564     parser_f.__name__ = name

/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    299     filepath_or_buffer, _, compression = get_filepath_or_buffer(
    300         filepath_or_buffer, encoding,
--> 301         compression=kwds.get('compression', None))
    302     kwds['compression'] = (inferred_compression if compression == 'infer'
    303                            else compression)

/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/io/common.pyc in get_filepath_or_buffer(filepath_or_buffer, encoding, compression)
    306 
    307     if _is_url(filepath_or_buffer):
--> 308         req = _urlopen(str(filepath_or_buffer))
    309         if compression == 'infer':
    310             content_encoding = req.headers.get('Content-Encoding', None)

/opt/conda/envs/python2/lib/python2.7/urllib2.pyc in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    152     else:
    153         opener = _opener
--> 154     return opener.open(url, data, timeout)
    155 
    156 def install_opener(opener):

/opt/conda/envs/python2/lib/python2.7/urllib2.pyc in open(self, fullurl, data, timeout)
    427             req = meth(req)
    428 
--> 429         response = self._open(req, data)
    430 
    431         # post-process response

/opt/conda/envs/python2/lib/python2.7/urllib2.pyc in _open(self, req, data)
    445         protocol = req.get_type()
    446         result = self._call_chain(self.handle_open, protocol, protocol +
--> 447                                   '_open', req)
    448         if result:
    449             return result

/opt/conda/envs/python2/lib/python2.7/urllib2.pyc in _call_chain(self, chain, kind, meth_name, *args)
    405             func = getattr(handler, meth_name)
    406 
--> 407             result = func(*args)
    408             if result is not None:
    409                 return result

/opt/conda/envs/python2/lib/python2.7/urllib2.pyc in http_open(self, req)
   1226 
   1227     def http_open(self, req):
-> 1228         return self.do_open(httplib.HTTPConnection, req)
   1229 
   1230     http_request = AbstractHTTPHandler.do_request_

/opt/conda/envs/python2/lib/python2.7/urllib2.pyc in do_open(self, http_class, req, **http_conn_args)
   1196         except socket.error, err: # XXX what error?
   1197             h.close()
-> 1198             raise URLError(err)
   1199         else:
   1200             try:

URLError: <urlopen error [Errno -2] Name or service not known>

According to the lecture, this is what the result should be.

Looks like a network problem (check your internet connection). The code runs fine for me:

>>> users.head()
   user_id  age sex  occupation zip_code
0        1   24   M  technician    85711
1        2   53   F       other    94043
2        3   23   M      writer    32067
3        4   24   M  technician    43537
4        5   33   F       other    15213

Try opening the URL in a browser to check whether it can be loaded from your machine.
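One quick way to test the "network problem" theory from inside the notebook is to check DNS resolution directly, since `[Errno -2] Name or service not known` is specifically a name-lookup failure. A minimal sketch (the host name is taken from the dataset URL above):

```python
import socket

# "[Errno -2] Name or service not known" means DNS lookup failed,
# so try resolving the dataset host on its own.
host = 'files.grouplens.org'
try:
    socket.gethostbyname(host)
    msg = '%s resolves fine; DNS is not the problem' % host
except socket.gaierror as exc:
    msg = 'cannot resolve %s: %s' % (host, exc)
print(msg)
```

If the lookup fails here too, the issue is the environment's network access, not pandas.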

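Even without network access, you can verify that the parsing part of the code is correct by feeding `pd.read_csv` an in-memory sample in the same `'|'`-separated layout as `u.user`. The rows below are copied from the expected output shown above:

```python
import io

import pandas as pd

# Sample rows in the same '|'-separated format as u.user, so the
# parsing step can be checked without any network access.
sample = u"1|24|M|technician|85711\n2|53|F|other|94043\n3|23|M|writer|32067\n"

u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv(io.StringIO(sample), sep='|', names=u_cols)
print(users.head())
```

If this works but the URL version fails, the problem is the download, not the `read_csv` call.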

Does your Jupyter notebook have `import pandas as pd`? Did you miss it? (The error suggests that `pd` has not been defined.)

Your code works for me in a Jupyter notebook. I agree with @smarx that you may be missing pandas (though an Anaconda installation should include it). Try running `conda install pandas` in a terminal to install pandas.

I was using the online version, and I think that was the problem. Thank you all for your time and suggestions.

@Star_89 Anaconda is what I would recommend. It is easy to install, doesn't interfere with other packages, supports multiple environments, and has other benefits.
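Following up on the `conda install pandas` suggestion, a minimal sketch for confirming from inside the notebook that pandas is actually importable in the active environment (and which version it is):

```python
import importlib

# Check that pandas can be imported in this environment and
# report its version if so.
try:
    pd = importlib.import_module('pandas')
    status = 'pandas %s is available' % pd.__version__
except ImportError:
    status = 'pandas is not installed; try: conda install pandas'
print(status)
```

This rules out a missing or broken pandas install as the cause before chasing network issues.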