Python csv module: reading a comma-separated CSV while ignoring commas inside double or single quotes

I have a .csv file in which some column values contain commas. Here are some examples:

Header: ID     Value           Content                                            Date
        1      34             "market, business"                               12/20/2013
        2      15             "market, business", yesterday, metric            11/21/2014
        3      18             "market," business and yesterday                 10/20/2014
        4      19              yesterday, today,                               11/22/2014
That is how the .csv file is laid out. If the file is opened in Sublime Text, it is displayed in the following format:

1, 34, "market, business", 12/20/2013
2, 15, "market, business", "yesterday, metric, 11/21/2014
3, 18, "market," business and yesterday, 10/20/2014
4, 19, yesterday, today, 11/22/2014
But what I want after running it through the Python csv reader is:

[1, 34, "market, business", 12/20/2013]
[2, 15, "market, business" "yesterday metric, 11/21/2014]
[3, 18, "market," business and yesterday, 10/20/2014]
[4, 19, yesterday today, 11/22/2014]
These are just samples of the data I have. The "Content" column is the biggest headache here, because the csv module uses "," as the delimiter.

reader = csv.reader(f, skipinitialspace=True)
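This works for the first row, where the whole string sits inside one pair of double quotes. It does not work for the second and third rows, where there are commas outside the quotes (single or double).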
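
To make the behaviour described above concrete, here is a minimal sketch that feeds two of the sample lines to csv.reader one at a time. The helper name parse_line is hypothetical and only used for illustration; the lines are copied from the sample shown earlier.

```python
import csv

def parse_line(line):
    """Parse a single line of text with csv.reader and return its fields."""
    # Wrapping the line in a list lets csv.reader treat it as a one-row input.
    return next(csv.reader([line], skipinitialspace=True))

row1 = '1, 34, "market, business", 12/20/2013'
row4 = '4, 19, yesterday, today, 11/22/2014'

print(parse_line(row1))  # the comma inside the quotes stays in one field
print(parse_line(row4))  # the unquoted comma splits Content into two fields
```

The first line comes back as four fields, while the fourth comes back as five, because nothing marks its inner comma as part of the Content value.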

How can I solve this? At the moment I am only using the standard csv module in Python. Is pandas able to handle this?

Thanks

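I have made some updates. I think what I want is a way to tell apart the commas that appear in different positions... Now that I have pasted it here this seems unreasonable, because I cannot find a way in the csv module to distinguish a "," that acts as a delimiter from a "," inside a field. Even Excel cannot.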

Any ideas?

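If we can assume that

  • each line begins with two comma-separated integers
  • each line ends with a date, preceded by a comma
  • everything left over (in the middle) belongs to the third column
then the data can be parsed as follows: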

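data = []
with open('data') as f:
    for line in f:
        # Split off the first two fields (Id, Value); the rest stays in parts[2]
        parts = line.split(',', 2)
        # Split the remainder at its last comma into Content and Date
        parts[2:4] = parts[2].rsplit(',', 1)
        # Convert Id and Value to integers
        parts[:2] = map(int, parts[:2])
        # Strip surrounding whitespace (including the newline) from Content and Date
        parts[2:] = map(str.strip, parts[2:])
        data.append(parts)

for row in data:
    print(row)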
yields

[1, 34, '"market, business"', '12/20/2013']
[2, 15, '"market, business", "yesterday, metric', '11/21/2014']
[3, 18, '"market," business and yesterday', '10/20/2014']
[4, 19, 'yesterday, today', '11/22/2014']

You can then create a DataFrame like this:

import pandas as pd
df = pd.DataFrame(data, columns=['Id','Value','Content','Date'])
print(df)
yields

   Id  Value                                 Content        Date
0   1     34                      "market, business"  12/20/2013
1   2     15  "market, business", "yesterday, metric  11/21/2014
2   3     18        "market," business and yesterday  10/20/2014
3   4     19                        yesterday, today  11/22/2014
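
The three assumptions listed in this answer can also be checked explicitly while parsing. The following is only a sketch of that idea, not part of the original answer: the names ROW_PATTERN and parse_row are hypothetical, and the m/d/yyyy date shape is an assumption inferred from the sample rows. A line that does not fit the assumed layout raises ValueError instead of being split incorrectly.

```python
import re

# Assumed layout: two leading integers, a trailing m/d/yyyy date,
# and everything in between treated as the Content column.
ROW_PATTERN = re.compile(
    r'^\s*(\d+)\s*,'                   # Id
    r'\s*(\d+)\s*,'                    # Value
    r'\s*(.*?)\s*,'                    # Content (may itself contain commas)
    r'\s*(\d{1,2}/\d{1,2}/\d{4})\s*$'  # Date
)

def parse_row(line):
    """Return [Id, Value, Content, Date] or raise ValueError on a bad line."""
    match = ROW_PATTERN.match(line)
    if match is None:
        raise ValueError(f'line does not fit the assumed layout: {line!r}')
    id_, value, content, date = match.groups()
    return [int(id_), int(value), content, date]

print(parse_row('3, 18, "market," business and yesterday, 10/20/2014'))
```

The rows returned by parse_row have the same shape as the lists built above, so they can be passed to pd.DataFrame in the same way.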

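Look at the "Related" questions list on the right. Does any of them answer your question?

Please post your csv sample and the desired DataFrame. The desired Python list would raise a syntax error, because it has mismatched quotes and strings without any quotes. Please fix it.

If all records have exactly 4 fields (fixed), there is a simple way.

@BhargavRao Unfortunately not.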