Restructuring a CSV file with Python


I have a csv file that looks like this:

Date     Name    Wage
5/1/19   Joe     $100
5/1/19   Sam     $120
5/1/19   Kate    $30
5/2/19   Joe     $120
5/2/19   Sam     $134
5/2/19   Kate    $56
5/3/19   Joe     $89
5/3/19   Sam     $90
5/3/19   Kate    $231

I would like to restructure it like this:

Date      Joe    Sam    Kate
5/1/19    $100   $120   $30
5/2/19    $120   $134   $56
5/3/19    $89    $90    $231
I'm not sure how to do this. Here is what I have started writing:

import csv

with open ('myfile.csv', 'rb') as filein, open ('restructured.csv', 'wb') as fileout:
  rows = list(csv.DictReader(filein, skipinitialspace=True))
  names = NOT SURE HOW TO GET THIS
  fieldnames = ['Date'] + ['{}'.format(i) for i in names]
  csvout = csv.DictWriter(fileout, fieldnames=fieldnames, extrasaction='ignore', restval='NA')
  csvout.writeheader()
  for row in rows:
    row['{}'.format(row['Name'].strip())] = row['Wage']
    csvout.writerow(row)

This should get you on the right track.

data.csv

5/1/19,Joe,$100
5/1/19,Sam,$120
5/1/19,Kate,$30
5/2/19,Joe,$120
5/2/19,Sam,$134
5/2/19,Kate,$56
5/3/19,Joe,$89
5/3/19,Sam,$90
5/3/19,Kate,$231
Output:

Date,Sam,Kate,Joe
5/1/19,$120,$30,$100
5/2/19,$134,$56,$120
5/3/19,$90,$231,$89
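
One way to get from a header-less file like data.csv above to this kind of wide output is sketched below. It is only an illustrative sketch, not necessarily the code this answer used: the function name `long_to_wide` is hypothetical, the `"NA"` filler is borrowed from the `restval` in the question, and the columns come out in first-seen order, so they may be ordered differently from the output shown.

```python
import csv
import io

def long_to_wide(lines):
    """Pivot header-less (date, name, wage) rows into one row per date."""
    table = {}  # date -> {name: wage}; dicts preserve insertion order
    names = {}  # used as an ordered set of the names seen so far
    for date, name, wage in csv.reader(lines):
        table.setdefault(date, {})[name] = wage
        names[name] = None
    header = ["Date"] + list(names)
    # Fill "NA" where a name has no wage on a given date (assumed filler).
    rows = [[date] + [wages.get(n, "NA") for n in names]
            for date, wages in table.items()]
    return [header] + rows

# Small in-memory stand-in for data.csv
sample = io.StringIO(
    "5/1/19,Joe,$100\n5/1/19,Sam,$120\n5/2/19,Joe,$120\n5/2/19,Sam,$134\n"
)
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(long_to_wide(sample))
print(out.getvalue())
```

To work on real files, pass a file opened with `newline=''` instead of the `io.StringIO` objects.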

Just use the pandas library:

import pandas as pd

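# Raw string so that \s is passed to the regex engine unchanged
df = pd.read_csv("test.csv", sep=r"\s+")
# aggfunc must reduce each group to one value; 'first' keeps the single
# Wage found for each (Date, Name) pair
p_table = pd.pivot_table(df, values='Wage', columns=['Name'], index='Date',
                         aggfunc='first')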
p_table = p_table.reset_index()
p_table.columns.name = None

print(p_table)
Output:

     Date   Joe  Kate   Sam
0  5/1/19  $100   $30  $120
1  5/2/19  $120   $56  $134
2  5/3/19   $89  $231   $90


This can be done with the csv module. Here is the Python 3 way:

import csv
import collections

with open ('myfile.csv', 'r') as filein, open ('restructured.csv', 'w', newline='') as fileout:
    data = collections.defaultdict(dict)
    names = set()
    for row in csv.DictReader(filein, skipinitialspace=True):
        data[row['Date']][row['Name']] = row['Wage']
        names.add(row['Name'])
    csvout = csv.DictWriter(fileout, fieldnames = ['Date'] + list(names))
    csvout.writeheader()
    for dat in sorted(data.keys()):
        row = data[dat]
        row['Date'] = dat
        csvout.writerow(row)
The resulting csv should look like this:

Date,Kate,Joe,Sam
5/1/19,$30,$100,$120
5/2/19,$56,$120,$134
5/3/19,$231,$89,$90
Python 2 is the same except for the with line, which should be:

with open ('myfile.csv', 'rb') as filein, open ('restructured.csv', 'wb') as fileout:

What you want to do is convert the data from long format to wide format. With pandas you can do it like this:

import pandas as pd

df = pd.read_csv("myfile.csv", sep = ',')

# Restructure the dataframe
tdf = df.pivot(index = 'Date', columns = 'Name', values = 'Wage')

tdf.to_csv("restructured.csv", sep = ',')

print(tdf)
Name     Joe  Kate   Sam
Date                    
5/1/19  $100   $30  $120
5/2/19  $120   $56  $134
5/3/19   $89  $231   $90

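The csv module is just a parser that produces csv rows as tuples or dicts. It does not transform the rows into anything else by itself. Using pandas would be easier in this case.

Thanks. Would you mind giving me a similar example?

@manticora this video may help you:

What is the delimiter?

Does list(csv.DictReader(filein, skipinitialspace=True)) return what you expect?

I think the OP wants to save to a csv file rather than print the output.

I like your aggregate function here. I had never seen or thought of that before.

It did work for me, thank you very much! But the data is not sorted by date :( My first column looks like this: Date 5/1/19 5/2/19 5/19/19 5/29/19 5/24/19 5/27/19 5/21/19 5/9/19. I tried sorting it in Python afterwards but got the following error: ValueError: time data '5' does not match format '%m-%d-%y'.

It can easily be sorted by date. See my edit, at the sorted(data.keys()) line.

I guess it does not recognize it as a date, because this time it sorted like this: 5/1/19, 5/10/19, 5/11/19, and so on.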
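
The ordering in the last comment is what plain string sorting produces: sorted() compares the date strings character by character, so 5/10/19 lands before 5/2/19. A minimal sketch of one possible fix follows, assuming every date is written as month/day/two-digit year as in the sample data; the helper name `date_key` is hypothetical.

```python
from datetime import datetime

def date_key(text):
    """Parse 'm/d/yy' text so that sorting is chronological."""
    # Note the slashes: the format must match the separators in the data,
    # which is why '%m-%d-%y' fails on values such as '5/1/19'.
    return datetime.strptime(text, "%m/%d/%y")

dates = ["5/1/19", "5/10/19", "5/2/19", "5/19/19", "5/9/19"]

print(sorted(dates))                # string order
print(sorted(dates, key=date_key))  # chronological order
```

In the csv-module answer above, this would mean writing the loop as `for dat in sorted(data.keys(), key=date_key):`.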