Merge CSV files (from a folder) into a single file with Python, adding columns with different names


I need to merge several CSV files from a folder into a single file.

My raw data looks like this:

y_1980.csv:

     country   y_1980
0        afg    196
1        ago    125
2        alb     23
3          .      .
.          .      .
y_1981.csv:

     country   y_1981
0        afg    192
1        ago    120
2        alb     0
3          .      .
.          .      .
y_20xx.csv:

     country   y_20xx
0        afg    176
1        ago    170
2        alb     76
3          .      .
.          .      .
I would like to get a result like this:

     country   y_1980   y_1981   ...   y_20xx    
0        afg      196      192   ...      176
1        ago      125      120   ...      170
2        alb       23        0   ...       76
3          .        .        .   ...        .
.          .        .        .   ...        .
So far my code looks like the following, but in the result each dataframe is simply appended after the previous one:

import glob

interesting_files = glob.glob("/Users/Desktop/Data/*.csv")

# Becomes True once the header row has been written
header_saved = False

# Text mode ('w'), since str lines are written
with open('/Users/Desktop/Data/table.csv', 'w') as fout:
    for filename in interesting_files:

        with open(filename) as fin:
            header = next(fin)
            if not header_saved:
                fout.write(header)
                header_saved = True
            for line in fin:
                fout.write(line)

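The code appears to proceed in this order:

  • Open file #1
  • Write the header if it has not been saved yet
  • Write the data rows
  • Open file #2
  • …and so on

This concatenates all the data into one file. It sounds like what you really want is a join on the "country" column.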
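
To make the difference concrete, here is a minimal sketch contrasting row-wise concatenation with a join on `country`. The inline CSV strings are hypothetical stand-ins for `y_1980.csv` and `y_1981.csv`, and comma-separated input is an assumption.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for y_1980.csv and y_1981.csv
csv_1980 = "country,y_1980\nafg,196\nago,125\nalb,23\n"
csv_1981 = "country,y_1981\nafg,192\nago,120\nalb,0\n"

df_1980 = pd.read_csv(io.StringIO(csv_1980))
df_1981 = pd.read_csv(io.StringIO(csv_1981))

# Appending rows: roughly what the file-writing loop produces
stacked = pd.concat([df_1980, df_1981], ignore_index=True)

# Joining on the shared 'country' column: one row per country
joined = df_1980.merge(df_1981, on="country", how="outer")

print(stacked.shape)
print(joined)
```

The stacked frame has one row per input line, while the joined frame has one row per country and one column per year, which is the layout asked for in the question.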


If you use pandas, this becomes much easier, because it removes the explicit for loop and keeps the memory footprint low.

import pandas as pd

# read the files first

y_1980 = pd.read_csv('y_1980.csv', sep='\t')
y_1981 = pd.read_csv('y_1981.csv', sep='\t')
If your values are separated by a comma (",") or by a space (" ") instead of a tab, you can change the sep option accordingly.
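
As a small illustration of the `sep` option, the sketch below parses the same table once as comma-separated and once as whitespace-separated text. The inline strings are made-up sample data, not the asker's files.

```python
import io

import pandas as pd

# Made-up sample data in two different layouts
comma_text = "country,y_1980\nafg,196\nago,125\n"
space_text = "country y_1980\nafg    196\nago    125\n"

# sep="," for comma-separated values
df_comma = pd.read_csv(io.StringIO(comma_text), sep=",")

# sep=r"\s+" treats any run of whitespace as a single delimiter
df_space = pd.read_csv(io.StringIO(space_text), sep=r"\s+")

print(df_comma.equals(df_space))
```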

# set 'country' as the index to use this value to merge.
y_1980 = y_1980.set_index('country', append=True)
y_1981 = y_1981.set_index('country', append=True)

print(y_1980)
print(y_1981)

            y_1980
    country        
  0 afg         196
  1 ago         125
  2 alb          23


             y_1981
    country        
  0 afg         192
  1 ago         120
  2 alb           0

# set the frames to merge. You can add as many dataframe as you want.
frames =[y_1980, y_1981]

# now merge the dataframe
merged_df = pd.concat(frames, axis=1).reset_index(level=['country'])
print(merged_df)

  country  y_1980  y_1981
0     afg     196     192
1     ago     125     120
2     alb      23       0
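Additional note: if you only want to keep the keys that are present in every frame, use an inner join (join='inner' with pd.concat, or how='inner' with merge); any remaining rows with missing values can be removed with dropna(). If you want to keep all available data from every frame, use an outer join (join='outer' with pd.concat, or how='outer' with merge).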
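
The following sketch shows how inner and outer joins differ, assuming the note above refers to those two join types. The frames are hypothetical: `alb` is deliberately left out of the 1981 data so that the two joins give different results.

```python
import pandas as pd

# Hypothetical data: 'alb' is missing from the 1981 frame
y_1980 = pd.DataFrame({"country": ["afg", "ago", "alb"], "y_1980": [196, 125, 23]})
y_1981 = pd.DataFrame({"country": ["afg", "ago"], "y_1981": [192, 120]})

# Inner join: only countries present in both frames
inner = y_1980.merge(y_1981, on="country", how="inner")

# Outer join: every country, with NaN where a year has no value
outer = y_1980.merge(y_1981, on="country", how="outer")

# Dropping rows with missing values from the outer join
complete = outer.dropna()

print(inner)
print(outer)
```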



pandas makes this quite simple. By looping over the files and merging, you only need to do the following:

Code:

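import pandas as pd

# Placeholder file names; list your own input files here
files = ['file1', 'file2']
dfs = None
for filename in files:
    # Raw string so '\s+' (any run of whitespace) reaches pandas unchanged
    df = pd.read_csv(filename, sep=r'\s+')
    if dfs is None:
        # The first frame seeds the merged result
        dfs = df
    else:
        # Merge on the shared 'country' column, keeping every country
        dfs = dfs.merge(df, on='country', how='outer')
    print(df)
print(dfs)
# Write the merged table as a space-separated file
dfs.to_csv('file3', sep=' ')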
Result:

  country  y_1980
0     afg     196
1     ago     125
2     alb      23

  country  y_1981
0     afg     192
1     ago     120
2     alb       0

  country  y_1980  y_1981
0     afg     196     192
1     ago     125     120
2     alb      23       0
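
Since the question is about a whole folder rather than a fixed list of files, the same loop-and-merge idea can be driven by `glob`. This is only a sketch: `merge_folder` is a hypothetical helper name, comma-separated files are assumed, and the demo writes throwaway files to a temporary folder instead of using the asker's data.

```python
import glob
import os
import tempfile
from functools import reduce

import pandas as pd


def merge_folder(folder, key="country"):
    """Outer-merge every CSV in `folder` on `key` (hypothetical helper)."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not paths:
        # An empty list would otherwise fail later with a less clear error
        raise ValueError(f"no CSV files found in {folder!r}")
    frames = [pd.read_csv(path) for path in paths]
    # Fold the list of frames into one, merging pairwise on the key column
    return reduce(lambda left, right: left.merge(right, on=key, how="outer"), frames)


# Demo with throwaway comma-separated files in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    for year, values in {"1980": [196, 125, 23], "1981": [192, 120, 0]}.items():
        frame = pd.DataFrame({"country": ["afg", "ago", "alb"], f"y_{year}": values})
        frame.to_csv(os.path.join(folder, f"y_{year}.csv"), index=False)
    merged = merge_folder(folder)

print(merged)
```

If the merged table is saved as a CSV, writing it outside the source folder keeps it from being picked up by the glob pattern on the next run.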

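Comments:

If you use pandas, this will be easier, because it gets rid of the for loop and keeps the memory footprint low. It is also more comprehensive. Let me know if you want a pandas solution.

Yes, I would like a pandas solution.

Please check the answer. It should work elegantly and comprehensively. Let me know whether it works. If it helps, please accept and upvote. Thanks.

I am trying to run this code, but I get the following error: --> 13 merged_df = for df in dfs[1:]: IndexError: list index out of range

I modified my code to fix some formatting, but can you check that dfs really contains the list of dataframes that were read in? The for loop over dfs[1:] iterates over every dataframe except the first one, which is assigned when merged_df is declared.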