Accessing GitHub data in Jupyter Books with Python


python pandas csv github


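I get a tokenizing error when trying to read a CSV file in Jupyter Books. I have looked at some existing answers, but none of them seem to help. Any help would be much appreciated. Thanks.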

import pandas as pd

url = "https://github.com/Kallikrates/bde_at2/blob/3875fd9b03b02b2772129acf2d8d83619971b2eb/2016Census_G01_NSW_LGA.csv"
insert_df = pd.read_csv(url, header=0, sep=',', quotechar='"')
insert_df.head()

Error:

---------------------------------------------------------------------------

ParserError                               Traceback (most recent call last)

<ipython-input-21-21c294baaa45> in <module>()
      1 url = "https://github.com/Kallikrates/bde_at2/blob/3875fd9b03b02b2772129acf2d8d83619971b2eb/2016Census_G01_NSW_LGA.csv"
----> 2 insert_df = pd.read_csv(url, header=0, sep=',', quotechar='"')
      3 insert_df.head()

3 frames

/usr/local/lib/python3.7/dist-packages/pandas/io/parsers.py in read(self, nrows)
   2155     def read(self, nrows=None):
   2156         try:
-> 2157             data = self._reader.read(nrows)
   2158         except StopIteration:
   2159             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 79, saw 2
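
One likely reading of this error: a `/blob/` URL on github.com serves the HTML page that displays the file, not the CSV itself, so the parser meets markup where it expects comma-separated rows. The following is a minimal sketch for checking that before parsing, using only the standard library. The helper name `looks_like_html` and the two sample strings are illustrative assumptions, not part of the original post.

```python
def looks_like_html(text: str) -> bool:
    """Return True if the start of `text` looks like an HTML document."""
    # Ignore leading whitespace and compare case-insensitively.
    head = text.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


# Hypothetical samples standing in for a downloaded response body.
html_sample = "\n\n<!DOCTYPE html>\n<html lang=\"en\">\n<head>..."
csv_sample = "col_a,col_b\n1,2\n"

print(looks_like_html(html_sample))  # True
print(looks_like_html(csv_sample))   # False
```

In a notebook, the text to check could come from the first few hundred bytes of the response, for example via `urllib.request.urlopen(url).read(300).decode("utf-8", "replace")`.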


Two options:

1st: read it as HTML

url = "https://github.com/Kallikrates/bde_at2/blob/3875fd9b03b02b2772129acf2d8d83619971b2eb/2016Census_G01_NSW_LGA.csv"
insert_df = pd.read_html(url)
insert_df[0].head(2)

2nd: read the raw file. Note the "raw" in the URL:

url="https://raw.githubusercontent.com/Kallikrates/bde_at2/3875fd9b03b02b2772129acf2d8d83619971b2eb/2016Census_G01_NSW_LGA.csv"
insert_df_raw = pd.read_csv(url, header=0, sep=',', quotechar='"')
insert_df_raw.head(2)

Try

insert_df = pd.read_html(url)

The result will be a list, so your dataset will be

insert_df[0].head()

Great. Thanks @simpleApp!
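
The second option amounts to rewriting the `github.com/.../blob/...` URL into its `raw.githubusercontent.com` form. Here is a small sketch of that rewrite, assuming the usual `/<user>/<repo>/blob/<ref>/<path>` layout. The function name `github_blob_to_raw` is hypothetical and not something pandas or GitHub provides.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Convert a github.com '/<user>/<repo>/blob/<ref>/<path>' URL to its raw form."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expect: user, repo, "blob", ref, then at least one path segment.
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    user, repo, _blob, ref, *file_path = segments
    # The raw host uses the same path with the "blob" segment dropped.
    raw_path = "/" + "/".join([user, repo, ref, *file_path])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


blob_url = "https://github.com/Kallikrates/bde_at2/blob/3875fd9b03b02b2772129acf2d8d83619971b2eb/2016Census_G01_NSW_LGA.csv"
print(github_blob_to_raw(blob_url))
```

The returned string can then be passed to `pd.read_csv` exactly as in the second option.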