Organizing a dataset in Python

python csv dictionary

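I have a .csv dataset containing a large number of idioms. Each line holds three comma-separated elements that I want to separate:

1) An index number (0, 1, 2, 3...)

2) The idiom itself

3) Whether the idiom is positive, negative, or neutral

Here is a small sample of the .csv file: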

0,"I did touch them one time you see but of course there was nothing doing, he wanted me.",neutral

1,We find that choice theorists admit that they introduce a style of moral paternalism at odds with liberal values.,neutral

2,"Well, here I am with an olive branch.",positive

3,"Its rudder and fin were both knocked out, and a four-foot-long gash in the shell meant even repairs on the bank were out of the question.",negative

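As you can see, the idiom is sometimes wrapped in quotes and sometimes not. However, I don't think that will be hard to sort out.

I think the best way to organize this in Python is with a dictionary, like this: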

example_dict = {0: ['This is an idiom.', 'neutral']}

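So how do I split each line into three separate strings (based on the commas), then use the first string as the key and the last two as the corresponding list items in the dict?

My initial idea was to try splitting on the commas with the following code: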

for line in file:    
    new_item = ','.join(line.split(',')[1:])

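But all this does is remove everything up to the first comma in a line, and I don't think running a series of iterations over it would be efficient.

I would appreciate some advice on the best way to organize data like this.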

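Python has a module dedicated to working with csv files: the csv module. In this case, you can use it to create a list of lists from the file. For now, let's call your file idioms.csv: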

import csv

with open('idioms.csv', newline='') as idioms_file:
    # Quoted idioms may contain commas; csv.reader keeps each one as a single field.
    reader = csv.reader(idioms_file, delimiter=',', quotechar='"')
    idioms_list = list(reader)

# Now you have a list of lists. Every field is a string, including the index:
# [['0', 'I did touch them...', 'neutral'],
#  ['1', 'We find that choice...', 'neutral'],
#  ...
# ]
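To illustrate why the csv module is a better fit here than splitting on commas by hand, the short sketch below parses two rows copied from the question. It uses io.StringIO as a stand-in for the real file, and the variable names (sample, naive_fields, parsed_rows) are only illustrative.

```python
import csv
import io

# Two rows copied from the sample in the question.
sample = (
    '0,"I did touch them one time you see but of course there was nothing doing, he wanted me.",neutral\n'
    '2,"Well, here I am with an olive branch.",positive\n'
)

first_line = sample.splitlines()[0]
naive_fields = first_line.split(',')  # also splits on the comma inside the quoted idiom
parsed_rows = list(csv.reader(io.StringIO(sample)))  # respects the quotes

print(len(naive_fields))    # 4
print(len(parsed_rows[0]))  # 3
print(parsed_rows[1][1])    # Well, here I am with an olive branch.
```

The plain split produces four pieces for the first row because the idiom itself contains a comma, while csv.reader returns the expected three fields and strips the surrounding quotes.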

Now you can sort or organize the data however you like.
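For example, the dictionary layout from the question could be built along these lines. This is a minimal sketch, assuming every row has exactly three fields and that the first one is an integer index. The helper name rows_to_dict is hypothetical, and io.StringIO stands in for the opened idioms.csv file.

```python
import csv
import io


def rows_to_dict(rows):
    """Map the integer index in column 0 to [idiom, sentiment]."""
    result = {}
    for index, idiom, sentiment in rows:  # raises ValueError if a row is not 3 fields
        result[int(index)] = [idiom, sentiment]
    return result


# Stand-in for open('idioms.csv', newline=''); rows copied from the question.
sample_file = io.StringIO(
    '1,We find that choice theorists admit that they introduce a style of '
    'moral paternalism at odds with liberal values.,neutral\n'
    '2,"Well, here I am with an olive branch.",positive\n'
)

idioms_dict = rows_to_dict(csv.reader(sample_file))
print(idioms_dict[2])  # ['Well, here I am with an olive branch.', 'positive']
```

With the real file, the same helper could be called as rows_to_dict(reader) inside the with block shown earlier. Converting the index with int() matches the example_dict shape in the question, where the keys are numbers and not strings.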