Restructuring a CSV file with Python
I have a CSV file that looks like this:
Date Name Wage
5/1/19 Joe $100
5/1/19 Sam $120
5/1/19 Kate $30
5/2/19 Joe $120
5/2/19 Sam $134
5/2/19 Kate $56
5/3/19 Joe $89
5/3/19 Sam $90
5/3/19 Kate $231
I want to restructure it like this:
Date Joe Sam Kate
5/1/19 $100 $120 $30
5/2/19 $120 $134 $56
5/3/19 $89 $90 $231
I'm not sure how to go about it.
Here is what I have started writing:
import csv
with open('myfile.csv', 'rb') as filein, open('restructured.csv', 'wb') as fileout:
    rows = list(csv.DictReader(filein, skipinitialspace=True))
    names = NOT SURE HOW TO GET THIS
    fieldnames = ['Date'] + ['{}'.format(i) for i in names]
    csvout = csv.DictWriter(fileout, fieldnames=fieldnames, extrasaction='ignore', restval='NA')
    csvout.writeheader()
    for row in rows:
        row['{}'.format(row['Name'].strip())] = row['Wage']
        csvout.writerow(row)
This should get you on the right track.
data.csv:
5/1/19,Joe,$100
5/1/19,Sam,$120
5/1/19,Kate,$30
5/2/19,Joe,$120
5/2/19,Sam,$134
5/2/19,Kate,$56
5/3/19,Joe,$89
5/3/19,Sam,$90
5/3/19,Kate,$231
Output:
Date,Sam,Kate,Joe
5/1/19,$120,$30,$100
5/2/19,$134,$56,$120
5/3/19,$90,$231,$89
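One way to produce output of this shape from a header-less data.csv is sketched below. It is not necessarily the code this answer used. It assumes the columns are date, name, wage in that order and Python 3.7+ (so that dicts keep insertion order); long_to_wide is a name made up for illustration, and io.StringIO stands in for real files. Columns come out in order of first appearance (Joe, Sam, Kate) rather than in the order shown above.

```python
import csv
import io


def long_to_wide(src, dst):
    """Pivot header-less (date, name, wage) rows from src into one row per date on dst."""
    wages = {}   # date -> {name: wage}; dates stay in first-seen order
    names = {}   # used as an ordered set of names (dict keys keep insertion order)
    for row in csv.reader(src, skipinitialspace=True):
        if not row:          # skip blank lines
            continue
        date, name, wage = row
        wages.setdefault(date, {})[name] = wage
        names[name] = None

    writer = csv.DictWriter(dst, fieldnames=["Date", *names], lineterminator="\n")
    writer.writeheader()
    for date, by_name in wages.items():
        writer.writerow({"Date": date, **by_name})


sample = io.StringIO(
    "5/1/19,Joe,$100\n5/1/19,Sam,$120\n5/1/19,Kate,$30\n"
    "5/2/19,Joe,$120\n5/2/19,Sam,$134\n5/2/19,Kate,$56\n"
)
out = io.StringIO()
long_to_wide(sample, out)
print(out.getvalue())
```

With real files, pass the objects returned by open('data.csv', newline='') and open('restructured.csv', 'w', newline='') instead of the StringIO buffers.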
Just use the pandas library:
import pandas as pd
# Raw string so that \s is passed through as a regular expression, not a string escape
df = pd.read_csv("test.csv", sep=r"\s+")
p_table = pd.pivot_table(df, values='Wage', columns=['Name'], index='Date',
                         aggfunc=lambda x: x)
p_table = p_table.reset_index()
p_table.columns.name = None
print(p_table)
Output:
Date Joe Kate Sam
0 5/1/19 $100 $30 $120
1 5/2/19 $120 $56 $134
2 5/3/19 $89 $231 $90
This can be done with the csv module. Here is a Python 3 approach:
import csv
import collections
with open('myfile.csv', 'r') as filein, open('restructured.csv', 'w', newline='') as fileout:
    # data maps each date to a {name: wage} dict; names collects the distinct names
    data = collections.defaultdict(dict)
    names = set()
    for row in csv.DictReader(filein, skipinitialspace=True):
        data[row['Date']][row['Name']] = row['Wage']
        names.add(row['Name'])
    # One output column per name (a set has no guaranteed order)
    csvout = csv.DictWriter(fileout, fieldnames=['Date'] + list(names))
    csvout.writeheader()
    # sorted() compares the date strings as text
    for dat in sorted(data.keys()):
        row = data[dat]
        row['Date'] = dat
        csvout.writerow(row)
The resulting CSV should look like this:
Date,Kate,Joe,Sam
5/1/19,$30,$100,$120
5/2/19,$56,$120,$134
5/3/19,$231,$89,$90
Python 2 is the same except for the line that opens the files, which should be:
with open ('myfile.csv', 'rb') as filein, open ('restructured.csv', 'wb') as fileout:
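If a person has no entry for some date, DictWriter fills that cell with restval, which is an empty string by default. The sketch below sets restval='NA', as the question's attempt did. The input is made up (Kate is missing on 5/2/19), io.StringIO stands in for the real files, and names are kept in a list so the column order is stable.

```python
import csv
import io

# Made-up input: Kate has no entry for 5/2/19.
source = io.StringIO(
    "Date,Name,Wage\n"
    "5/1/19,Joe,$100\n"
    "5/1/19,Kate,$30\n"
    "5/2/19,Joe,$120\n"
)

data = {}    # date -> {name: wage}
names = []   # names in order of first appearance
for row in csv.DictReader(source, skipinitialspace=True):
    data.setdefault(row["Date"], {})[row["Name"]] = row["Wage"]
    if row["Name"] not in names:
        names.append(row["Name"])

result = io.StringIO()
writer = csv.DictWriter(result, fieldnames=["Date"] + names,
                        restval="NA", lineterminator="\n")
writer.writeheader()
for date, wages in data.items():
    # Keys missing from the row dict are written as restval
    writer.writerow({"Date": date, **wages})

print(result.getvalue())
```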
What you want is to convert the data from long format to wide format. With pandas you can do it like this:
import pandas as pd
df = pd.read_csv("myfile.csv", sep = ',')
# Restructure the dataframe
tdf = df.pivot(index = 'Date', columns = 'Name', values = 'Wage')
tdf.to_csv("restructured.csv", sep = ',')
print(tdf)
Name Joe Kate Sam
Date
5/1/19 $100 $30 $120
5/2/19 $120 $56 $134
5/3/19 $89 $231 $90
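The csv module is only a parser; it yields CSV rows as tuples or dicts. It does not transform rows into anything else by itself. Using pandas would be easier in this case.
Thanks. Would you mind showing me a similar example?
@manticora This video may help you:
What is the delimiter?
Does list(csv.DictReader(filein, skipinitialspace=True)) return what you expect?
I think the OP wants to save to a CSV file rather than print the output.
I like your aggregation function here; I had never seen or thought of that before.
It did work for me, thank you very much! But the data is not sorted by date :( My first column looks like this: Date 5/1/19 5/2/19 5/19/19 5/29/19 5/24/19 5/27/19 5/21/19 5/9/19. I tried to sort it in Python but got the following error: ValueError: time data '5' does not match format '%m-%d-%y'.
It can easily be sorted by date. See my edit, which iterates over sorted(data.keys()).
I guess it does not recognize the values as dates, because this time it sorted like this: 5/1/19, 5/10/19, 5/11/19, and so on.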
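The ordering described in the last comment is what plain string comparison gives. Note also that the format in the quoted ValueError, '%m-%d-%y', uses dashes while the dates in the file use slashes. A minimal sketch of sorting by parsed date, assuming every value is month/day/two-digit-year as in the sample data (date_key is a name made up here):

```python
from datetime import datetime


def date_key(text):
    """Parse a month/day/two-digit-year string such as '5/10/19' for sorting."""
    return datetime.strptime(text, "%m/%d/%y")


dates = ["5/1/19", "5/10/19", "5/11/19", "5/2/19", "5/9/19"]
print(sorted(dates))                 # compared as text
print(sorted(dates, key=date_key))   # compared as calendar dates
```

In the csv-module answer this would mean iterating over sorted(data, key=date_key) instead of sorted(data.keys()).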