Python: replace the last n columns with the sum over all files


python csv numpy pandas


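I am new to Python.

I have 8 csv files, each with 26 columns and 600 rows. I want to take the last 4 columns of each csv file (columns 22 to 25, counting from 0), read them from all the files, add them together, and replace those 4 columns in every file with the result. For example (the data shown here is made up):

new-1.csv: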

a   b   c   d   e   f   g   h   i   j   k
1   1   1   1   1   1   1   1   1   1   1
2   2   2   2   2   2   2   2   2   2   2
3   3   3   3   3   3   3   3   3   3   3
4   4   4   4   4   4   4   4   4   4   4
5   5   5   5   5   5   5   5   5   5   5
6   6   6   6   6   6   6   6   6   6   6
7   7   7   7   7   7   7   7   7   7   7
8   8   8   8   8   8   8   8   8   8   8
9   9   9   9   9   9   9   9   9   9   9


new-2.csv:

a   b   c   d   e   f   g   h   i   j   k
11  11  11  11  11  11  11  11  11  11  11
12  12  12  12  12  12  12  12  12  12  12
13  13  13  13  13  13  13  13  13  13  13
14  14  14  14  14  14  14  14  14  14  14
15  15  15  15  15  15  15  15  15  15  15
16  16  16  16  16  16  16  16  16  16  16
17  17  17  17  17  17  17  17  17  17  17
18  18  18  18  18  18  18  18  18  18  18
19  19  19  19  19  19  19  19  19  19  19

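Now I want the element-wise sum of columns h, i, j and k over these two files, and then replace the last 4 columns of each file with that sum.

Modified new-1.csv: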


a   b   c   d   e   f   g   h   i   j   k
1   1   1   1   1   1   1   12  12  12  12
2   2   2   2   2   2   2   14  14  14  14
3   3   3   3   3   3   3   16  16  16  16
4   4   4   4   4   4   4   18  18  18  18
5   5   5   5   5   5   5   20  20  20  20
6   6   6   6   6   6   6   22  22  22  22
7   7   7   7   7   7   7   24  24  24  24
8   8   8   8   8   8   8   26  26  26  26
9   9   9   9   9   9   9   28  28  28  28

Modified new-2.csv:

a   b   c   d   e   f   g   h   i   j   k
11  11  11  11  11  11  11  12  12  12  12
12  12  12  12  12  12  12  14  14  14  14
13  13  13  13  13  13  13  16  16  16  16
14  14  14  14  14  14  14  18  18  18  18
15  15  15  15  15  15  15  20  20  20  20
16  16  16  16  16  16  16  22  22  22  22
17  17  17  17  17  17  17  24  24  24  24
18  18  18  18  18  18  18  26  26  26  26
19  19  19  19  19  19  19  28  28  28  28

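I think I should use pandas or numpy for this, but I don't know how. Any suggestions or hints would be much appreciated.

After loading the csv files with read_csv, you can add the last 4 columns together and then overwrite them: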

In [10]:
total = df[df.columns[-4:]].values + df1[df1.columns[-4:]].values
total

Out[10]:
array([[12, 12, 12, 12],
       [14, 14, 14, 14],
       [16, 16, 16, 16],
       [18, 18, 18, 18],
       [20, 20, 20, 20],
       [22, 22, 22, 22],
       [24, 24, 24, 24],
       [26, 26, 26, 26],
       [28, 28, 28, 28]], dtype=int64)

In [12]:    
df[df.columns[-4:]] = total
df1[df1.columns[-4:]] = total
df

Out[12]:
   a  b  c  d  e  f  g   h   i   j   k
0  1  1  1  1  1  1  1  12  12  12  12
1  2  2  2  2  2  2  2  14  14  14  14
2  3  3  3  3  3  3  3  16  16  16  16
3  4  4  4  4  4  4  4  18  18  18  18
4  5  5  5  5  5  5  5  20  20  20  20
5  6  6  6  6  6  6  6  22  22  22  22
6  7  7  7  7  7  7  7  24  24  24  24
7  8  8  8  8  8  8  8  26  26  26  26
8  9  9  9  9  9  9  9  28  28  28  28

In [13]:    
df1

Out[13]:
    a   b   c   d   e   f   g   h   i   j   k
0  11  11  11  11  11  11  11  12  12  12  12
1  12  12  12  12  12  12  12  14  14  14  14
2  13  13  13  13  13  13  13  16  16  16  16
3  14  14  14  14  14  14  14  18  18  18  18
4  15  15  15  15  15  15  15  20  20  20  20
5  16  16  16  16  16  16  16  22  22  22  22
6  17  17  17  17  17  17  17  24  24  24  24
7  18  18  18  18  18  18  18  26  26  26  26
8  19  19  19  19  19  19  19  28  28  28  28

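We need to call the .values attribute here to get a NumPy array back, because otherwise pandas will try to align on the index, and in this case the indexes do not align.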
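To make the alignment point concrete, here is a minimal sketch with two tiny frames whose row labels deliberately differ. The data and the index values are made up for illustration; with identical default indexes the plain `+` would also work, so treat this as the failure case that .values protects against.

```python
import pandas as pd

# Two small frames with the same columns but different row labels (hypothetical data).
df = pd.DataFrame({"h": [1, 2, 3], "i": [1, 2, 3]}, index=[0, 1, 2])
df1 = pd.DataFrame({"h": [11, 12, 13], "i": [11, 12, 13]}, index=[10, 11, 12])

# Label-based addition: no row label appears in both frames, so every cell is NaN
# and the result has one row per label from either frame.
aligned = df + df1

# Positional addition on the underlying NumPy arrays: row 0 + row 0, and so on.
positional = df.values + df1.values
```

Here `aligned` has 6 rows of NaN, while `positional` holds the sums 12, 14 and 16 in each column.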

After overwriting, call df.to_csv(file_path) and df1.to_csv(file_path).
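One detail worth checking when writing the files back: by default to_csv also writes the row index as an extra first column, which would change the file layout. The sketch below uses a tiny made-up frame and in-memory text instead of real files, and shows that passing index=False keeps only the original columns.

```python
import io
import pandas as pd

# Hypothetical two-column frame standing in for one of the csv files.
df = pd.DataFrame({"a": [1, 2], "h": [12, 14]})

# Default: the row index is written as an extra, unnamed first column.
with_index = df.to_csv()

# index=False writes only the real columns, matching the input layout.
without_index = df.to_csv(index=False)

# Reading the second form back gives the original columns again.
round_trip = pd.read_csv(io.StringIO(without_index))
```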

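For your 8 dfs, you can loop over them and aggregate as you go:

# take a copy of the first df's last 4 columns
# (copy() so the in-place += below works on its own writable buffer)
total = df_list[0]
total = total[total.columns[-4:]].values.copy()
for df in df_list[1:]:
    total += df[df.columns[-4:]].values

Then just loop over the dfs again to overwrite: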

for df in df_list:
    df[df.columns[-4:]] = total

Then write them out again with to_csv.
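Putting the steps together, the whole read, sum, overwrite, write cycle could look roughly like the sketch below. The function name replace_last_n_with_sum and the demo file contents are hypothetical, and it assumes every file has a header row, the same shape and the same row order.

```python
import os
import tempfile

import pandas as pd


def replace_last_n_with_sum(paths, n=4):
    """Overwrite the last n columns of every CSV in paths with their element-wise sum."""
    frames = [pd.read_csv(p) for p in paths]
    # Sum positionally; to_numpy() sidesteps index/column alignment.
    total = sum(f.iloc[:, -n:].to_numpy() for f in frames)
    for path, frame in zip(paths, frames):
        frame[frame.columns[-n:]] = total
        frame.to_csv(path, index=False)  # index=False keeps the column layout
    return total


# Small demo on two temporary files (made-up data, n=2 for brevity).
with tempfile.TemporaryDirectory() as d:
    paths = [os.path.join(d, name) for name in ("new-1.csv", "new-2.csv")]
    pd.DataFrame({"a": [1, 2], "h": [1, 2], "k": [1, 2]}).to_csv(paths[0], index=False)
    pd.DataFrame({"a": [11, 12], "h": [11, 12], "k": [11, 12]}).to_csv(paths[1], index=False)
    total = replace_last_n_with_sum(paths, n=2)
    result = pd.read_csv(paths[0])
```

For the 8 real files, the same call would be replace_last_n_with_sum(list_of_paths, n=4).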


You can do this with numpy alone.

import numpy as np

# list of all the files
file_list = ['foo.csv', 'bar.csv', 'baz.csv']  # all 8 files

# one name per column, written back as the header row (26 columns: a to z);
# replace with your real names, or skip this if the files have no header row
col_names = list('abcdefghijklmnopqrstuvwxyz')

# accumulator for the sum of the last 4 columns (600 rows x 4 columns)
add_cols = np.zeros((600, 4))

# first pass: add up the last 4 columns of every file
for file in file_list:
    # skiprows=1 skips the header row; usecols picks the last 4 of 26 columns (0-based)
    temp = np.loadtxt(file, skiprows=1, delimiter=',', usecols=(22, 23, 24, 25))
    add_cols = np.add(temp, add_cols)

# second pass: overwrite each file, replacing its last 4 columns with the sum
for file in file_list:
    # load the whole file
    temp = np.loadtxt(file, skiprows=1, delimiter=',')
    temp[:, [22, 23, 24, 25]] = add_cols

    # write the column names first, then the values as integers
    with open(file, 'w') as p:
        p.write(','.join(col_names) + '\n')
        np.savetxt(p, temp, delimiter=",", fmt="%i")

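Now, if the files are space-separated rather than comma-separated, remove the delimiter option from all the function calls, since whitespace is the default delimiter. Join the column names in the header row with a space accordingly.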
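As a rough illustration of the space-separated case, the sketch below reads and writes in-memory text rather than real files; the three-column sample is made up. It also uses the header and comments arguments of np.savetxt to write the column names in the same call, which is an alternative to writing the header line by hand.

```python
import io
import numpy as np

# Hypothetical space-separated content standing in for a file on disk.
text = "a b c\n1 2 3\n4 5 6\n"

# No delimiter argument: loadtxt splits on any whitespace by default.
data = np.loadtxt(io.StringIO(text), skiprows=1)

# savetxt also defaults to a single space; header= writes the column names,
# and comments="" stops NumPy from prefixing the header with "# ".
out = io.StringIO()
np.savetxt(out, data, fmt="%i", header="a b c", comments="")
```

After this, out.getvalue() matches the original text, header included.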
