Python: replace the last n columns with the sum across all files
I'm new to Python. I have 8 CSV files, each with 26 columns and 600 rows. I want to take the last 4 columns of each CSV file (columns 22 through 25), add them together element-wise across all the files, and write the result back over those 4 columns in every file. For example (the data shown here is made up): new-1.csv:
a b c d e f g h i j k
1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9 9 9
new-2.csv:
a b c d e f g h i j k
11 11 11 11 11 11 11 11 11 11 11
12 12 12 12 12 12 12 12 12 12 12
13 13 13 13 13 13 13 13 13 13 13
14 14 14 14 14 14 14 14 14 14 14
15 15 15 15 15 15 15 15 15 15 15
16 16 16 16 16 16 16 16 16 16 16
17 17 17 17 17 17 17 17 17 17 17
18 18 18 18 18 18 18 18 18 18 18
19 19 19 19 19 19 19 19 19 19 19
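Now I want to sum each element of columns "h, i, j, k" across these two files, and then replace the last 4 columns of each file with the new sums.
Modified new-1.csv: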
a b c d e f g h i j k
1 1 1 1 1 1 1 12 12 12 12
2 2 2 2 2 2 2 14 14 14 14
3 3 3 3 3 3 3 16 16 16 16
4 4 4 4 4 4 4 18 18 18 18
5 5 5 5 5 5 5 20 20 20 20
6 6 6 6 6 6 6 22 22 22 22
7 7 7 7 7 7 7 24 24 24 24
8 8 8 8 8 8 8 26 26 26 26
9 9 9 9 9 9 9 28 28 28 28
Modified new-2.csv:
a b c d e f g h i j k
11 11 11 11 11 11 11 12 12 12 12
12 12 12 12 12 12 12 14 14 14 14
13 13 13 13 13 13 13 16 16 16 16
14 14 14 14 14 14 14 18 18 18 18
15 15 15 15 15 15 15 20 20 20 20
16 16 16 16 16 16 16 22 22 22 22
17 17 17 17 17 17 17 24 24 24 24
18 18 18 18 18 18 18 26 26 26 26
19 19 19 19 19 19 19 28 28 28 28
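I think I should use pandas or numpy for this, but I don't know how. Any suggestions or hints would be much appreciated.
After loading the CSVs with read_csv, you can add the last 4 columns together and then overwrite them: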
In [10]:
total = df[df.columns[-4:]].values + df1[df1.columns[-4:]].values
total
Out[10]:
array([[12, 12, 12, 12],
[14, 14, 14, 14],
[16, 16, 16, 16],
[18, 18, 18, 18],
[20, 20, 20, 20],
[22, 22, 22, 22],
[24, 24, 24, 24],
[26, 26, 26, 26],
[28, 28, 28, 28]], dtype=int64)
In [12]:
df[df.columns[-4:]] = total
df1[df1.columns[-4:]] = total
df
Out[12]:
a b c d e f g h i j k
0 1 1 1 1 1 1 1 12 12 12 12
1 2 2 2 2 2 2 2 14 14 14 14
2 3 3 3 3 3 3 3 16 16 16 16
3 4 4 4 4 4 4 4 18 18 18 18
4 5 5 5 5 5 5 5 20 20 20 20
5 6 6 6 6 6 6 6 22 22 22 22
6 7 7 7 7 7 7 7 24 24 24 24
7 8 8 8 8 8 8 8 26 26 26 26
8 9 9 9 9 9 9 9 28 28 28 28
In [13]:
df1
Out[13]:
a b c d e f g h i j k
0 11 11 11 11 11 11 11 12 12 12 12
1 12 12 12 12 12 12 12 14 14 14 14
2 13 13 13 13 13 13 13 16 16 16 16
3 14 14 14 14 14 14 14 18 18 18 18
4 15 15 15 15 15 15 15 20 20 20 20
5 16 16 16 16 16 16 16 22 22 22 22
6 17 17 17 17 17 17 17 24 24 24 24
7 18 18 18 18 18 18 18 26 26 26 26
8 19 19 19 19 19 19 19 28 28 28 28
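We need to call the .values attribute here to get a NumPy array, because otherwise pandas will try to align the two frames on their index labels instead of adding by position, and those labels are not guaranteed to line up across files.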
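To make the alignment point concrete, here is a minimal sketch. The two small frames and their row labels are made up for illustration and are not the asker's data; they only show what happens when the row labels differ.

```python
import pandas as pd

# Two hypothetical frames with the same columns but different row labels.
left = pd.DataFrame({"h": [1, 2, 3], "i": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"h": [11, 12, 13], "i": [11, 12, 13]}, index=[10, 11, 12])

# Label-based addition: no row label is shared, so every cell becomes NaN
# and the result has the union of both indexes (6 rows).
aligned = left + right

# Position-based addition on the underlying NumPy arrays.
positional = left.values + right.values

print(aligned)
print(positional)
```

When the row labels happen to be identical in every file, both forms give the same numbers; going through .values simply removes the dependency on the labels.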
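After overwriting, call df.to_csv(file_path) and df1.to_csv(file_path). Pass index=False if you do not want the row index written out as an extra column.
For your 8 dfs, you can loop over them and aggregate as you go: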
# take a copy of the first df's last 4 columns
total = df_list[0]
total = total[total.columns[-4:]].values.copy()
# add the last 4 columns of every remaining df, by position
for df in df_list[1:]:
    total += df[df.columns[-4:]].values
Then just loop over the dfs again to overwrite:
for df in df_list:
    df[df.columns[-4:]] = total
Then write them out again with to_csv.
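Putting the steps above together, the whole round trip might look like the sketch below. The function name replace_last_columns_with_sum and its n parameter are hypothetical, and the sketch assumes every file is comma-separated, has a header row, and has the same number of rows.

```python
import pandas as pd


def replace_last_columns_with_sum(paths, n=4):
    """Overwrite the last n columns of every CSV in paths with their element-wise sum."""
    # Assumption: all files share the same shape and have a header row.
    frames = [pd.read_csv(p) for p in paths]
    # Add by position (NumPy arrays) so index labels play no role.
    total = sum(df[df.columns[-n:]].values for df in frames)
    for path, df in zip(paths, frames):
        df[df.columns[-n:]] = total
        # index=False keeps the row index out of the file.
        df.to_csv(path, index=False)
    return total
```

It could be called as replace_last_columns_with_sum(["new-1.csv", "new-2.csv"]). Note that it overwrites the input files in place, so working on copies first is the safer choice.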
You can also do this using numpy alone.
import numpy as np

# list of all the files
file_list = ['foo.csv', 'bar.csv', 'baz.csv']  # all 8 files
# header names for the first row; list all 26 here (only the first few are shown), else skip this
col_names = ['a', 'b', 'c', 'd', 'e', 'f']
# numpy array that accumulates the sum of the last 4 columns
add_cols = np.zeros((600, 4))

# iterate over all .csv files
for file in file_list:
    # skiprows skips the header row; usecols picks the last 4 columns (0-based indices)
    temp = np.loadtxt(file, skiprows=1, delimiter=',', usecols=(22, 23, 24, 25))
    add_cols = np.add(temp, add_cols)

# now overwrite every file, substituting the sum for the last 4 columns
for file in file_list:
    # load the full content of the file into temp
    temp = np.loadtxt(file, skiprows=1, delimiter=',')
    temp[:, [22, 23, 24, 25]] = add_cols
    # write the column names first, then the values in temp as csv
    with open(file, 'w') as p:
        p.write(','.join(col_names) + '\n')
        np.savetxt(p, temp, delimiter=",", fmt="%i")
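Now, if the files are space-separated rather than comma-separated, remove the delimiter option from all the function calls, since whitespace is the default delimiter. Join the header names with a space instead of a comma as well.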
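For the space-separated case, a possible variant is sketched below. The function name replace_last_columns_space_delimited is hypothetical, and the sketch assumes integer data and a single header row. Instead of hardcoding col_names, it reads each file's header row and writes it back unchanged.

```python
import numpy as np


def replace_last_columns_space_delimited(paths, n=4):
    """Sum the last n columns across whitespace-delimited files and write the sum back."""
    headers, tables = [], []
    for path in paths:
        with open(path) as fh:
            # Keep the header names so they can be written back unchanged.
            headers.append(fh.readline().split())
        # No delimiter argument: np.loadtxt splits on whitespace by default.
        tables.append(np.loadtxt(path, skiprows=1, ndmin=2))
    total = sum(t[:, -n:] for t in tables)
    for path, header, table in zip(paths, headers, tables):
        table[:, -n:] = total
        # comments="" stops savetxt from prefixing the header line with "# ".
        # fmt="%i" assumes the data are integers.
        np.savetxt(path, table, fmt="%i", delimiter=" ",
                   header=" ".join(header), comments="")
    return total
```

Like the original, it overwrites the files in place, and fmt="%i" would truncate any fractional values.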