Converting a CSV file from vertical data to horizontal data using Python

python csv

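I'm writing a Python script. Most of my data is recorded in a vertical (long) layout, and I'd like to convert it to a horizontal (wide) layout.

Here is an example of the data I've collected: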

ID,Identifier,Value
1_UK,City,Paris
1_UK,Number of the departments,75
1_UK,Department,Ile de France
1_UK,Habitant,12405426hab
2_UK,City,Ajaccio
2_UK,Number of the departments,2A
2_UK,Department,Corse du Sud

Here is what I want to end up with:

ID, City, Number of the departments, Department, Habitant
1_UK, Paris, 75, Ile de France, 12405426hab
2_UK, Ajaccio, 2A, Corse du sud,''

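Reading the CSV file in Python isn't difficult. Where I get lost is that I have 4 identifiers (City, Number of the departments, Department and Habitant), and ID 2_UK has no value for "Habitant". I don't know how to represent that in my code.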

import csv
csvfile = open ("Exercice1.csv",'r',encoding='utf-8')
IDs=[]
identifiers=[]
uniqueIDs=[]
uniqueidentifiers=[]
reader=csv.reader(csvfile)

for row in reader:    
    IDs.append(ID)
    identifiers.append(identifier)
csvfile.close()

# Remove duplicate values while keeping the original order.
for i in IDs:
    if i not in uniqueIDs:
        uniqueIDs.append(i)

for i in identifiers:
    if i not in uniqueidentifiers:
        uniqueidentifiers.append(i)

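After that I'm lost. The zip function doesn't seem to do what I need, or I'm not using it correctly.

I'd be glad to hear your suggestions.

Thanks, everyone!

You can do the following: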

import csv

cities = {}
with open('Exercice1.csv', 'r') as f:
    reader = csv.DictReader(f)

    for d in reader:
        new_dict = {d['Identifier']: d['Value'], 'ID': d['ID']}
        try:
            cities[d['ID']] = {**cities[d['ID']], **new_dict}
        except KeyError:
            cities[d['ID']] = {**new_dict}

with open('output.csv', 'w') as f:
    field_names = ['ID', 'City', 'Number of the departments', 'Department', 'Habitant']
    writer = csv.DictWriter(f, fieldnames=field_names, lineterminator='\n', restval='')

    writer.writeheader()
    for k, v in cities.items():
        writer.writerow(v)

With your data, I get:

ID,City,Number of the departments,Department,Habitant
1_UK,Paris,75,Ile de France,12405426hab
2_UK,Ajaccio,2A,Corse du Sud,

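The restval parameter of csv.DictWriter is the value inserted into a row when the supplied dict is missing a key from the fieldnames list. I just used an empty string; you can replace it with anything you like.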
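If the set of identifiers is not known in advance, the column list can be collected while reading instead of being hard-coded. The following is only a sketch of that variation, not the answer's original code: the helper name vertical_to_horizontal is made up for illustration, io.StringIO stands in for the real files, and it assumes Python 3.7+ so that dicts keep insertion order.

```python
import csv
import io

# Sample rows copied from the question; io.StringIO stands in for Exercice1.csv.
SAMPLE = """ID,Identifier,Value
1_UK,City,Paris
1_UK,Number of the departments,75
1_UK,Department,Ile de France
1_UK,Habitant,12405426hab
2_UK,City,Ajaccio
2_UK,Number of the departments,2A
2_UK,Department,Corse du Sud
"""


def vertical_to_horizontal(src, dst, restval=""):
    """Hypothetical helper: write one output row per ID, one column per identifier."""
    rows = {}             # ID -> {column: value}, kept in first-seen order
    field_names = ["ID"]  # output columns, discovered while reading
    for record in csv.DictReader(src):
        row = rows.setdefault(record["ID"], {"ID": record["ID"]})
        row[record["Identifier"]] = record["Value"]
        if record["Identifier"] not in field_names:
            field_names.append(record["Identifier"])

    # restval fills every column that a given ID has no value for.
    writer = csv.DictWriter(dst, fieldnames=field_names,
                            lineterminator="\n", restval=restval)
    writer.writeheader()
    writer.writerows(rows.values())


out = io.StringIO()
vertical_to_horizontal(io.StringIO(SAMPLE), out)
result = out.getvalue()
print(result)
```

Passing a different restval (for example "N/A") changes only the cells that have no value, such as Habitant for 2_UK.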
This is easy with pandas. You can import the .csv file into a DataFrame df and then use pivot:
In [10]: d = df.pivot(index='ID', columns='Identifier', values='Value')

In [11]: d
Out[11]: 
Identifier     City     Department     Habitant Number of the departments
ID                                                                       
1_UK          Paris  Ile de France  12405426hab                        75
2_UK        Ajaccio   Corse du Sud         None                        2A
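For completeness, here is a sketch of the whole round trip with pandas, from reading the CSV to producing the horizontal CSV text. It assumes pandas is installed, uses io.StringIO in place of the real file, and the column order passed to reindex is simply the one requested in the question.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for Exercice1.csv.
SAMPLE = """ID,Identifier,Value
1_UK,City,Paris
1_UK,Number of the departments,75
1_UK,Department,Ile de France
1_UK,Habitant,12405426hab
2_UK,City,Ajaccio
2_UK,Number of the departments,2A
2_UK,Department,Corse du Sud
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# One row per ID, one column per distinct Identifier.
wide = df.pivot(index="ID", columns="Identifier", values="Value")

# pivot sorts the new columns alphabetically; restore the order from the question
# and replace missing cells with an empty string.
columns = ["City", "Number of the departments", "Department", "Habitant"]
wide = wide.reindex(columns=columns).fillna("")

# With no path argument, to_csv returns the CSV as a string.
csv_text = wide.to_csv()
print(csv_text)
```

Note that pivot raises a ValueError if the same ID/Identifier pair occurs more than once; in that case pivot_table with an explicit aggfunc (for example "first") is one possible alternative.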

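You need to include a complete working example; your sample code doesn't run. Is the header fixed, i.e. does the output CSV only ever have the defined columns? Also, are the values in the first column ordered and unmixed (i.e. 1_UK will never appear after 2_UK)?

Thanks to both of you for your comments. Using pandas makes the job super efficient! I like it. My real file is more complex: depending on the data type, I have two values, in a Value or Value2 column. I'll look into it and see whether I can put this together.

Thanks for your suggestion. I need to spend more time on my script to reproduce your solution. I'll let you know the result :)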