python - Replace the last n columns with the sum across all files

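I'm new to Python.

I have 8 csv files, each with 26 columns and 600 rows. I want to take the last 4 columns of each csv file (columns 22 to 25, counting from 0), add them together across all the files, and replace those 4 columns in every file with the sum. For example (the data shown here is made up):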

new-1.csv:

a   b   c   d   e   f   g   h   i   j   k
1   1   1   1   1   1   1   1   1   1   1
2   2   2   2   2   2   2   2   2   2   2
3   3   3   3   3   3   3   3   3   3   3
4   4   4   4   4   4   4   4   4   4   4
5   5   5   5   5   5   5   5   5   5   5
6   6   6   6   6   6   6   6   6   6   6
7   7   7   7   7   7   7   7   7   7   7
8   8   8   8   8   8   8   8   8   8   8
9   9   9   9   9   9   9   9   9   9   9 
new2.csv:

a   b   c   d   e   f   g   h   i   j   k
11  11  11  11  11  11  11  11  11  11  11
12  12  12  12  12  12  12  12  12  12  12
13  13  13  13  13  13  13  13  13  13  13
14  14  14  14  14  14  14  14  14  14  14
15  15  15  15  15  15  15  15  15  15  15
16  16  16  16  16  16  16  16  16  16  16
17  17  17  17  17  17  17  17  17  17  17
18  18  18  18  18  18  18  18  18  18  18
19  19  19  19  19  19  19  19  19  19  19
Now I want to sum each element of columns h, i, j and k across these two files, and then replace the last 4 columns of each file with that sum.

Modified new-1.csv:

a   b   c   d   e   f   g   h   i   j   k
1   1   1   1   1   1   1   12  12  12  12
2   2   2   2   2   2   2   14  14  14  14
3   3   3   3   3   3   3   16  16  16  16
4   4   4   4   4   4   4   18  18  18  18
5   5   5   5   5   5   5   20  20  20  20
6   6   6   6   6   6   6   22  22  22  22
7   7   7   7   7   7   7   24  24  24  24
8   8   8   8   8   8   8   26  26  26  26
9   9   9   9   9   9   9   28  28  28  28
Modified new-2.csv:

a   b   c   d   e   f   g   h   i   j   k
11  11  11  11  11  11  11  12  12  12  12
12  12  12  12  12  12  12  14  14  14  14
13  13  13  13  13  13  13  16  16  16  16
14  14  14  14  14  14  14  18  18  18  18
15  15  15  15  15  15  15  20  20  20  20
16  16  16  16  16  16  16  22  22  22  22
17  17  17  17  17  17  17  24  24  24  24
18  18  18  18  18  18  18  26  26  26  26
19  19  19  19  19  19  19  28  28  28  28

I think I should use pandas or numpy for this, but I don't know how. Any suggestions or hints would be appreciated.

After loading the csvs with read_csv, you can add the last 4 columns together and then overwrite them:

In [10]:
total = df[df.columns[-4:]].values + df1[df1.columns[-4:]].values
total

Out[10]:
array([[12, 12, 12, 12],
       [14, 14, 14, 14],
       [16, 16, 16, 16],
       [18, 18, 18, 18],
       [20, 20, 20, 20],
       [22, 22, 22, 22],
       [24, 24, 24, 24],
       [26, 26, 26, 26],
       [28, 28, 28, 28]], dtype=int64)

In [12]:    
df[df.columns[-4:]] = total
df1[df1.columns[-4:]] = total
df

Out[12]:
   a  b  c  d  e  f  g   h   i   j   k
0  1  1  1  1  1  1  1  12  12  12  12
1  2  2  2  2  2  2  2  14  14  14  14
2  3  3  3  3  3  3  3  16  16  16  16
3  4  4  4  4  4  4  4  18  18  18  18
4  5  5  5  5  5  5  5  20  20  20  20
5  6  6  6  6  6  6  6  22  22  22  22
6  7  7  7  7  7  7  7  24  24  24  24
7  8  8  8  8  8  8  8  26  26  26  26
8  9  9  9  9  9  9  9  28  28  28  28

In [13]:    
df1

Out[13]:
    a   b   c   d   e   f   g   h   i   j   k
0  11  11  11  11  11  11  11  12  12  12  12
1  12  12  12  12  12  12  12  14  14  14  14
2  13  13  13  13  13  13  13  16  16  16  16
3  14  14  14  14  14  14  14  18  18  18  18
4  15  15  15  15  15  15  15  20  20  20  20
5  16  16  16  16  16  16  16  22  22  22  22
6  17  17  17  17  17  17  17  24  24  24  24
7  18  18  18  18  18  18  18  26  26  26  26
8  19  19  19  19  19  19  19  28  28  28  28
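We need to call the .values attribute here so that we get plain np arrays and the addition is purely positional. Otherwise pandas tries to align on the index, and wherever the labels of the two frames do not match the result is NaN.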
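
To make the alignment point concrete, here is a small sketch. The two frames and their labels are invented for illustration and are not taken from the question. Adding the frames directly matches rows by label, while adding the underlying arrays goes by position.

```python
import pandas as pd

# two hypothetical frames: same columns, but different row labels
left = pd.DataFrame({"h": [1, 2], "i": [3, 4]}, index=[0, 1])
right = pd.DataFrame({"h": [10, 20], "i": [30, 40]}, index=[1, 2])

# label-aligned addition: only row label 1 exists in both frames,
# so the rows labelled 0 and 2 come out as NaN
aligned = left + right

# positional addition on the underlying NumPy arrays
positional = left.values + right.values
```

When every frame is read from a file with the same layout and gets the default 0..n-1 index, both forms should agree, so `.values` is mainly a safeguard against labels that differ.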

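After overwriting, call to_csv to write each frame back to its file. Pass index=False, otherwise the row index is written as an extra first column:
df.to_csv(file_path, index=False)
df1.to_csv(file_path, index=False)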

For your 8 dfs, you can loop over them and aggregate as you go:

# take a copy of the first df's last 4 columns
total = df_list[0]
total = total[total.columns[-4:]].values.copy()
for df in df_list[1:]:
    total += df[df.columns[-4:]].values
Then just loop over the dfs again to overwrite:

for df in df_list:
    df[df.columns[-4:]] = total

Then write them out again with to_csv.
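
Putting the steps of this answer together, a minimal end-to-end sketch might look like the following. The function name `replace_last_columns_with_sum` and its parameters are hypothetical. It assumes every file is comma-separated, has a header row, and has the same number of rows.

```python
import pandas as pd

def replace_last_columns_with_sum(paths, n=4):
    """Sum the last n columns across all CSV files and write the sum into each file."""
    # assumption: comma-separated files with a header row and equal row counts
    frames = [pd.read_csv(path) for path in paths]

    # positional sum of the last n columns; .values avoids label alignment
    total = sum(df[df.columns[-n:]].values for df in frames)

    for path, df in zip(paths, frames):
        df[df.columns[-n:]] = total
        # index=False keeps the row index out of the written file
        df.to_csv(path, index=False)
    return total
```

It would be called as `replace_last_columns_with_sum(['new-1.csv', 'new-2.csv'])`. Note that it overwrites the input files in place, so it is worth trying on copies first.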

You can do this with numpy alone.

import numpy as np

# list of all the files
file_list = ['foo.csv', 'bar.csv', 'baz.csv']  # all 8 files

# header row: extend to all 26 column names, or skip writing it if the files have no header
col_names = ['a', 'b', 'c', 'd', 'e', 'f']

# accumulator for the sum of the last 4 columns (600 rows, 4 columns)
add_cols = np.zeros((600, 4))

# first pass: add up the last 4 columns of every file
for file in file_list:
    # skiprows=1 skips the header row; usecols picks the last 4 of the 26 columns
    temp = np.loadtxt(file, skiprows=1, delimiter=',', usecols=(22, 23, 24, 25))
    add_cols = np.add(temp, add_cols)

# second pass: overwrite every file, replacing the last 4 columns with the sum
for file in file_list:
    # load the full contents of the file
    temp = np.loadtxt(file, skiprows=1, delimiter=',')
    temp[:, [22, 23, 24, 25]] = add_cols

    # write the column names first
    with open(file, 'w') as p:
        p.write(','.join(col_names) + '\n')

    # then append the values in temp to the file as csv
    with open(file, 'a') as p:
        np.savetxt(p, temp, delimiter=",", fmt="%i")

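If the files are space-separated rather than comma-separated, remove the delimiter option from all the function calls, since the default delimiter is whitespace. Join the header row accordingly as well, with a space instead of a comma.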
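
As a variation on the numpy approach, the sketch below reads each file's own header line back instead of relying on a hand-written col_names list. The function name `sum_last_columns_inplace` is hypothetical. It assumes comma-separated files with one header row, purely numeric data, and equal row counts. Like the code above, it writes values with fmt="%i", so any fractional part would be lost.

```python
import numpy as np

def sum_last_columns_inplace(paths, n=4, delimiter=","):
    """NumPy-only variant that preserves each file's existing header line."""
    headers, tables = [], []
    for path in paths:
        # remember the header so it can be written back unchanged
        with open(path) as fh:
            headers.append(fh.readline().rstrip("\n"))
        # ndmin=2 keeps the array two-dimensional even for a one-row file
        tables.append(np.loadtxt(path, skiprows=1, delimiter=delimiter, ndmin=2))

    # element-wise sum of the last n columns over all files
    total = sum(table[:, -n:] for table in tables)

    for path, header, table in zip(paths, headers, tables):
        table[:, -n:] = total
        # comments="" stops savetxt from prefixing the header with "# "
        np.savetxt(path, table, delimiter=delimiter, fmt="%i",
                   header=header, comments="")
    return total
```

Passing the header through `np.savetxt` also avoids opening each file twice for writing.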
