Python fails to read the contents of a CSV file

python pandas

I have a CSV file that was generated by exporting a Tableau table to CSV, but I cannot open it in Python.

I tried pd.read_csv, but it did not work:

import pandas as pd

#path to file
path = "tableau_crosstab.csv"

data = pd.read_csv(path, encoding="ISO-8859-1")

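This does read the file, but the result is just a long run of rows with one character per row, and there are some strange characters at the head of the frame: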

ÿþd
o    
m    
a    
i

and so on. When I import the file in Excel I have to select tab as the delimiter, but when I try the same thing here it fails:

import pandas as pd

#path to file
path = "tableau_crosstab.csv"

data = pd.read_csv(path, encoding="ISO-8859-1", sep='\t')

CParserError: Error tokenizing data. C error: Expected 1 fields in line 7, saw 2

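I also tried opening the file with codecs, which reported the encoding as "cp1252", but using that as the encoding failed as well.

I tried reading it as utf-8 too, and that also failed. I have run out of ideas for solving this.

Here is a link to a copy of the file, in case anyone can take a look.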

Your file is UTF-16 with a byte order mark (BOM).

Try reading it with encoding='utf-16' together with sep='\t'.

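The funny characters you are seeing, ÿþ, correspond to the hex bytes FF FE, which is the UTF-16 little-endian byte order mark. The Wikipedia page on the byte order mark lists all the different BOMs.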
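As a rough illustration of the point above, the following sketch checks the first bytes of a file against the BOM constants in the standard codecs module. The helper name sniff_bom and the simulated bytes are assumptions made for this example, not something taken from the asker's file. It also shows why decoding UTF-16 bytes as Latin-1 produces ÿþ followed by NUL-separated characters.

```python
import codecs

# Known byte order marks. The UTF-32 entries come first because
# BOM_UTF32_LE begins with the same two bytes (FF FE) as BOM_UTF16_LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw):
    """Return a codec name if raw starts with a known BOM, else None."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    return None


# Simulated start of the export: FF FE followed by UTF-16 LE text.
head = codecs.BOM_UTF16_LE + "domain\tMonth".encode("utf-16-le")

print(sniff_bom(head))                   # utf-16
print(repr(head.decode("latin-1")[:5]))  # 'ÿþd\x00o'
```

For a real file, the bytes could be obtained with open(path, "rb").read(4) and passed to the same helper. A file without a BOM returns None, in which case the encoding has to be determined some other way.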

Reading your CSV, I get the following:

In[4]:
data = pd.read_csv(r'C:\tableau_crosstab.csv', encoding='utf-16', sep='\t')
data

Out[4]: 
       domain Month of date impressions clicks
0    test1.no        jun.17     725 676    633
1    test1.no        mai.17     422 995    456
2    test1.no        apr.17     241 102    316
3    test1.no        mar.17     295 157    260
4    test1.no        feb.17     122 902    198
5    test1.no        jan.17     137 972    201
6    test1.no        des.16     274 435    361
7   test2.com        jun.17   3 083 373  1 638
8   test2.com        mai.17   3 370 620  2 036
9   test2.com        apr.17   2 388 933  1 483
10  test2.com        mar.17   2 410 675  1 581
11  test2.com        feb.17   2 311 952  1 682
12  test2.com        jan.17   1 184 787    874
13  test2.com        des.16   2 118 594  1 738
14  test3.com        jun.17     411 456     41
15  test3.com        mai.17     342 048     87
16  test3.com        apr.17     197 058    108
17  test3.com        mar.17     288 949    156
18  test3.com        feb.17     230 970    130
19  test3.com        jan.17     388 032    115
20  test3.com        des.16   1 693 442    166
21   test4.no        jun.17     521 790    683
22   test4.no        mai.17     438 037    541
23   test4.no        apr.17     618 282  1 042
24   test4.no        mar.17     576 413    956
25   test4.no        feb.17     451 248    636
26   test4.no        jan.17     293 217    471
27   test4.no        des.16     641 491    978
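In the output above, impressions and clicks contain spaces as thousands separators, so pandas will most likely load them as strings. The sketch below is one possible follow-up step, not part of the original answer: it writes a small hypothetical sample in the same layout (UTF-16, tab-separated), reads it back, and strips the whitespace before converting to integers. Whether the real export uses ordinary or non-breaking spaces is unknown, so the pattern \s+ is used because it matches both.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample mimicking the export: tab-separated text with
# spaces as thousands separators. The file name is made up for the example.
text = (
    "domain\tMonth of date\timpressions\tclicks\n"
    "test1.no\tjun.17\t725 676\t633\n"
    "test2.com\tjun.17\t3 083 373\t1 638\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_crosstab.csv")
    # encoding="utf-16" writes a BOM followed by the encoded text
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(text)
    data = pd.read_csv(path, encoding="utf-16", sep="\t")

# Remove all whitespace inside the numbers, then convert to integers.
for col in ["impressions", "clicks"]:
    data[col] = (
        data[col].astype(str).str.replace(r"\s+", "", regex=True).astype(int)
    )

print(data.dtypes)
```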

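This worked for me too. Thanks! So just by looking at ÿþ you could tell that the encoding was "utf-16"?

Yes. If you look at the Wikipedia page, you will see the hex values alongside the characters they display as. You get used to seeing them and learn to recognize them after a while.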
