Merge CSV files (from a folder) into one file, adding columns with different names, using Python
I need to merge several CSV files from a folder into a single file. My raw data looks like this:
y_1980.csv:
country y_1980
0 afg 196
1 ago 125
2 alb 23
3 . .
. . .
y_1981.csv:
country y_1981
0 afg 192
1 ago 120
2 alb 0
3 . .
. . .
y_20xx.csv:
country y_20xx
0 afg 176
1 ago 170
2 alb 76
3 . .
. . .
I would like to get a result like this:
country y_1980 y_1981 ... y_20xx
0 afg 196 192 ... 176
1 ago 125 120 ... 170
2 alb 23 0 ... 76
3 . . . ... .
. . . . ... .
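So far my code looks like the following, but the result I get is that each data frame is appended below the previous one instead of being added as a new column: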
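import glob

interesting_files = glob.glob("/Users/Desktop/Data/*.csv")
header_saved = False  # start as False so the first file's header is written once
with open('/Users/Desktop/Data/table.csv', 'w') as fout:  # text mode, matching the text reads below
    for filename in interesting_files:
        with open(filename) as fin:
            header = next(fin)
            if not header_saved:
                fout.write(header)
                header_saved = True
            # every data row is written below the rows of the previous file
            for line in fin:
                fout.write(line)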
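The code appears to run in the following order:
- open file #1
- write the header if it has not been saved yet
- write the data rows
- open file #2
- ...and so on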
This is much easier with pandas, because it removes the need for the explicit for loop and keeps the memory footprint low.
import pandas as pd
# read the files first
y_1980 = pd.read_csv('y_1980.csv', sep='\t')
y_1981 = pd.read_csv('y_1981.csv', sep='\t')
If your values are separated by a comma "," or a space " " instead of a tab, change the sep option accordingly.
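As a rough illustration of the sep option, the sketch below reads the same small table with three different separators. The in-memory strings are hypothetical stand-ins for the real files (which are not shown in full here), and `io.StringIO` is used only so that the example runs without touching the disk.

```python
import io

import pandas as pd

# Hypothetical in-memory samples standing in for the real CSV files.
tab_text = "country\ty_1980\nafg\t196\nago\t125\nalb\t23\n"
comma_text = "country,y_1980\nafg,196\nago,125\nalb,23\n"
space_text = "country y_1980\nafg   196\nago 125\nalb  23\n"

tab_df = pd.read_csv(io.StringIO(tab_text), sep="\t")
comma_df = pd.read_csv(io.StringIO(comma_text), sep=",")
# A regex separator such as r"\s+" matches any run of whitespace.
space_df = pd.read_csv(io.StringIO(space_text), sep=r"\s+")

# All three reads should produce the same two-column frame.
same = tab_df.equals(comma_df) and comma_df.equals(space_df)
print(same)
```

If a file comes back as a single wide column, the separator is usually the first thing to check.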
# set 'country' as the index to use this value to merge.
y_1980 = y_1980.set_index('country', append=True)
y_1981 = y_1981.set_index('country', append=True)
print(y_1980)
print(y_1981)
y_1980
country
0 afg 196
1 ago 125
2 alb 23
y_1981
country
0 afg 192
1 ago 120
2 alb 0
# list the frames to merge; you can add as many data frames as you want
frames = [y_1980, y_1981]
# now merge the data frames column-wise on the shared index
merged_df = pd.concat(frames, axis=1).reset_index(level=['country'])
print(merged_df)
country y_1980 y_1981
0 afg 196 192
1 ago 125 120
2 alb 23 0
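Additional note: if you only want to keep the keys that are present in all frames, pass join='inner' to pd.concat (the equivalent for DataFrame.merge is how='inner'), or call dropna() on the result to drop rows with missing values. If you want to keep all available data from all frames, use join='outer', which is the default for pd.concat (how='outer' for merge).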
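To make the inner versus outer distinction concrete, here is a minimal sketch. The two frames are hypothetical: they are built so that 'alb' is missing from the 1981 data and 'and' is missing from the 1980 data, which is not the case in the sample data above.

```python
import pandas as pd

# Hypothetical frames with partially overlapping countries.
y_1980 = pd.DataFrame({"country": ["afg", "ago", "alb"], "y_1980": [196, 125, 23]})
y_1981 = pd.DataFrame({"country": ["afg", "ago", "and"], "y_1981": [192, 120, 5]})

# Use 'country' as the index so concat aligns rows on it.
frames = [df.set_index("country") for df in (y_1980, y_1981)]

# join="inner" keeps only countries present in every frame.
inner = pd.concat(frames, axis=1, join="inner").reset_index()
# join="outer" keeps every country and fills the gaps with NaN.
outer = pd.concat(frames, axis=1, join="outer").reset_index()
# Dropping rows with NaN from the outer result leaves the complete rows only.
complete_rows = outer.dropna()

print(inner)
print(outer)
```

Note that columns containing NaN are converted to floating point, so the outer result will generally not keep integer dtypes.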
For more details, see this link:

Pandas makes this job very simple. By looping over the files and merging, you only need to do the following:
Code:
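import pandas as pd

files = ['file1', 'file2']
dfs = None
for filename in files:
    # a raw-string regex separator splits on any run of whitespace
    df = pd.read_csv(filename, sep=r'\s+')
    if dfs is None:
        dfs = df
    else:
        # with no `on` argument, merge joins on the shared column(s), here 'country'
        dfs = dfs.merge(df, how='outer')
    print(df)
print(dfs)
dfs.to_csv('file3', sep=' ')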
Results:
country y_1980
0 afg 196
1 ago 125
2 alb 23
country y_1981
0 afg 192
1 ago 120
2 alb 0
country y_1980 y_1981
0 afg 196 192
1 ago 125 120
2 alb 23 0
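Since the question is about every CSV file in a folder rather than a fixed list of names, the loop-and-merge idea can be sketched with `glob` as follows. The helper name `merge_csv_folder`, the `key` parameter, and the temporary demo folder are all illustrative assumptions; the sample files are also assumed to be whitespace-separated and to carry no leading row-number column.

```python
import glob
import os
import tempfile
from functools import reduce

import pandas as pd


def merge_csv_folder(folder, key="country"):
    """Outer-merge every *.csv file in `folder` on the `key` column."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not paths:
        raise ValueError(f"no CSV files found in {folder!r}")
    frames = [pd.read_csv(path, sep=r"\s+") for path in paths]
    # Fold the list pairwise: ((f1 merge f2) merge f3) ...
    return reduce(lambda left, right: left.merge(right, on=key, how="outer"), frames)


# Demo in a temporary folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    samples = {
        "y_1980.csv": "country y_1980\nafg 196\nago 125\nalb 23\n",
        "y_1981.csv": "country y_1981\nafg 192\nago 120\nalb 0\n",
    }
    for name, text in samples.items():
        with open(os.path.join(folder, name), "w") as handle:
            handle.write(text)
    merged = merge_csv_folder(folder)

print(merged)
```

When adapting this, it is probably safest to write the merged output outside the input folder (or with a different extension), so that a later run does not pick up the result file as one of its inputs.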
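It would be easier if you used pandas, because it gets rid of the for loop and keeps the memory footprint low. It is also more comprehensive. Let me know if you want a pandas solution.

Yes, I would like the pandas solution.

Please check the answer. It should work elegantly and comprehensively. Let me know whether it works. If it helps, please accept and upvote. Thanks.

I am trying to run this code but I get the following error: --> 13 merged_df= for df in dfs[1:]: IndexError: list index out of range

I modified my code to fix some formatting, but can you check that dfs really contains the list of data frames that were read in? The for loop over dfs[1:] iterates over all data frames except the first one, which is assigned when merged_df is declared.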
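The code that raised this error is not shown in full, so the following is only a guess at its shape: it assumes `merged_df` starts as `dfs[0]`, which raises IndexError when no files were read and `dfs` is an empty list. The helper name `merge_all` is hypothetical. A minimal sketch that fails with a clearer message in that case:

```python
import pandas as pd


def merge_all(dfs, key="country"):
    """Merge a list of data frames on `key`, failing clearly if the list is empty."""
    if not dfs:
        raise ValueError("dfs is empty: no files were read")
    merged_df = dfs[0]
    # dfs[1:] skips the first frame, which already seeds merged_df.
    for df in dfs[1:]:
        merged_df = merged_df.merge(df, on=key, how="outer")
    return merged_df


dfs = [
    pd.DataFrame({"country": ["afg", "ago"], "y_1980": [196, 125]}),
    pd.DataFrame({"country": ["afg", "ago"], "y_1981": [192, 120]}),
]
merged_df = merge_all(dfs)
print(merged_df)

# An empty list now produces an explicit message instead of an IndexError.
try:
    merge_all([])
except ValueError as error:
    empty_message = str(error)
    print(empty_message)
```

If the message appears, the usual culprit is a file pattern or path that matched nothing.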