Python: summing the values of two CSV files and writing the result to a new file
I need some help and don't know where to start. Basically, I have two CSV input files, CSV1 and CSV2 (from two different nodes), and I want to sum their values and load the result into Elasticsearch. Any help is much appreciated.

Result --> create an Elasticsearch index:

node     link   rate-in  rate-out
allnode  link1  30       30
allnode  link2  80       120
allnode  link3  120      100

Thanks

After importing the CSVs into Python as DataFrames:
import pandas
df1 = pandas.DataFrame({'node':['node1', 'node1', 'node1'], 'link': ['link1', 'link2', 'link3'], 'rate-in': [10, 30, 40], 'rate-out': [20, 50, 60]})
df2 = pandas.DataFrame({'node':['node2', 'node2', 'node2'], 'link': ['link1', 'link2', 'link3'], 'rate-in': [20, 50, 80], 'rate-out': [10, 70, 40]})
result = pandas.concat([df1, df2], axis=0).groupby('link').agg({'node': lambda x : 'allNodes', 'rate-in': 'sum', 'rate-out': 'sum'}).reset_index(drop=False)
result
Output:
link node rate-in rate-out
0 link1 allNodes 30 30
1 link2 allNodes 80 120
2 link3 allNodes 120 100
Using the built-in csv module.
Example:
import csv

with open("CSV1.csv", newline="") as csvfile_1, open("CSV2.csv", newline="") as csvfile_2:
    reader_1 = csv.DictReader(csvfile_1)
    reader_2 = csv.DictReader(csvfile_2)
    result = []
    # Key on the link only: the node differs between the two files,
    # so a node+link key would never match and the result would be empty.
    data_1 = {row["link"]: (int(row["rate-in"]), int(row["rate-out"])) for row in reader_1}
    for row in reader_2:
        key = row["link"]
        if key in data_1:
            rate_in, rate_out = data_1[key]
            result.append({"node": "allnode", "link": row["link"], "rate-in": rate_in + int(row["rate-in"]), "rate-out": rate_out + int(row["rate-out"])})

# newline="" keeps the csv module from writing blank lines between rows on Windows
with open("outcsv.csv", "w", newline="") as outfile:
    writer = csv.DictWriter(outfile, fieldnames=['node', 'link', 'rate-in', 'rate-out'])
    writer.writeheader()
    writer.writerows(result)
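As a variation on the csv-module answer, here is a minimal sketch that accumulates the totals per link with a defaultdict, so a link that appears in only one of the files is still kept. The helper name sum_rates_by_link and the in-memory io.StringIO inputs are assumptions made for illustration; with real files you would pass open file objects instead.

```python
import csv
import io
from collections import defaultdict


def sum_rates_by_link(*csv_files):
    """Sum rate-in and rate-out per link across any number of CSV file objects."""
    totals = defaultdict(lambda: [0, 0])  # link -> [rate-in total, rate-out total]
    for csv_file in csv_files:
        for row in csv.DictReader(csv_file):
            totals[row["link"]][0] += int(row["rate-in"])
            totals[row["link"]][1] += int(row["rate-out"])
    return [
        {"node": "allnode", "link": link, "rate-in": rate_in, "rate-out": rate_out}
        for link, (rate_in, rate_out) in sorted(totals.items())
    ]


# Sample data from the thread, held in memory instead of on disk
file_1 = io.StringIO(
    "node,link,rate-in,rate-out\nnode1,link1,10,20\nnode1,link2,30,50\nnode1,link3,40,60\n"
)
file_2 = io.StringIO(
    "node,link,rate-in,rate-out\nnode2,link1,20,10\nnode2,link2,50,70\nnode2,link3,80,40\n"
)
rows = sum_rates_by_link(file_1, file_2)

# Write the summed rows as CSV text (an open file works the same way)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["node", "link", "rate-in", "rate-out"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Because the totals are keyed on the link alone, the order of the rows in the two files does not matter.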
Pandas has good support for reading and manipulating CSV files:
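import pandas as pd

df = pd.concat([
    pd.read_csv('csv1.csv'),
    pd.read_csv('csv2.csv')
])
# Sum only the numeric rate columns so the text 'node' column is left out of the aggregation
result = df.groupby('link', as_index=False)[['rate-in', 'rate-out']].sum()
result['node'] = 'allnode'
result.to_csv('result.csv', index=False)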
where csv1.csv is:
node,link,rate-in,rate-out
node1,link1,10,20
node1,link2,30,50
node1,link3,40,60
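To try the pandas approach without creating any files, the sample data can be read from in-memory strings. This is only a sketch: the function name sum_by_link is hypothetical, and the second CSV is assumed to have the same layout as the first, with the node2 values quoted in this thread.

```python
import io

import pandas as pd

CSV1 = """node,link,rate-in,rate-out
node1,link1,10,20
node1,link2,30,50
node1,link3,40,60
"""
CSV2 = """node,link,rate-in,rate-out
node2,link1,20,10
node2,link2,50,70
node2,link3,80,40
"""


def sum_by_link(*csv_texts):
    """Stack the CSVs and sum the two rate columns for each link."""
    frames = [pd.read_csv(io.StringIO(text)) for text in csv_texts]
    combined = pd.concat(frames, ignore_index=True)
    totals = combined.groupby("link", as_index=False)[["rate-in", "rate-out"]].sum()
    totals["node"] = "allnode"  # appended as the last column
    return totals


result = sum_by_link(CSV1, CSV2)
print(result.to_csv(index=False))
```

groupby sorts the group keys by default, so the rows come out ordered by link name.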
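Change file1.csv and file2.csv to the names of your first and second files, and outfile.csv to the desired output file.

If the dataset is large, you can read the values directly from the CSV files and add up the columns needed for the output, rather than hard-coding each value: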
import pandas as pd

df1 = pd.read_csv('csv1.csv', header=0)  # load csv1 into df1
df2 = pd.read_csv('csv2.csv', header=0)  # load csv2 into df2
rate_in_1 = df1.iloc[:, 2].values.tolist()   # 3rd column of csv1 as a list
rate_out_1 = df1.iloc[:, 3].values.tolist()  # 4th column of csv1 as a list
rate_in_2 = df2.iloc[:, 2].values.tolist()   # 3rd column of csv2 as a list
rate_out_2 = df2.iloc[:, 3].values.tolist()  # 4th column of csv2 as a list
rate_in_total = [x + y for x, y in zip(rate_in_1, rate_in_2)]     # row-by-row sum of the two rate-in lists
rate_out_total = [x + y for x, y in zip(rate_out_1, rate_out_2)]  # row-by-row sum of the two rate-out lists
# Now combine everything into one DataFrame:
final_df = pd.DataFrame()
final_df['Node'] = ['allnode' for x in rate_in_total]
final_df['Link'] = df1.iloc[:, 1].values.tolist()
final_df['rate-in'] = rate_in_total
final_df['rate-out'] = rate_out_total
If you want to write final_df to a CSV file, just use the .to_csv() method that pandas provides.
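One caveat with the list-based approach: zip adds the rows by position, so it relies on both files listing the links in the same order. A possible alternative, sketched here under the assumption that every row has a unique link, is to index both frames by link and let pandas align them. The in-memory CSV strings are illustrative, with the second file deliberately in a different row order.

```python
import io

import pandas as pd

csv1 = "node,link,rate-in,rate-out\nnode1,link1,10,20\nnode1,link2,30,50\nnode1,link3,40,60\n"
# Same sample values for node2, but with the rows shuffled
csv2 = "node,link,rate-in,rate-out\nnode2,link3,80,40\nnode2,link1,20,10\nnode2,link2,50,70\n"

cols = ["rate-in", "rate-out"]
left = pd.read_csv(io.StringIO(csv1)).set_index("link")[cols]
right = pd.read_csv(io.StringIO(csv2)).set_index("link")[cols]

# DataFrame.add aligns on the index (the link); fill_value=0 keeps links
# that are present in only one of the two files.
total = left.add(right, fill_value=0).reset_index()
total.insert(0, "node", "allnode")
print(total.to_csv(index=False))
```

With this layout, a positional zip would have added link1 to link3; aligning on the link avoids that.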
Hope this helps :)

Small question: in result.csv the node column ends up last. Is it possible to make it the first column?

csv1.csv:
node,link,rate-in,rate-out
node1,link1,10,20
node1,link2,30,50
node1,link3,40,60

csv2.csv:
node,link,rate-in,rate-out
node2,link1,20,10
node2,link2,50,70
node2,link3,80,40

result.csv:
link,rate-in,rate-out,node
link1,30,30,allnode
link2,80,120,allnode
link3,120,100,allnode
result[[
'node','link','rate-in','rate-out'
]].to_csv('result.csv', index=False)
import csv

nodeList = {}
with open('file1.csv', 'r', newline='') as csv1, open('file2.csv', 'r', newline='') as csv2:
    # create the csv reader objects (tab-delimited input is assumed here)
    csvreader1 = csv.reader(csv1, delimiter='\t')
    next(csvreader1)  # skip the header row of the first file
    csvreader2 = csv.reader(csv2, delimiter='\t')
    header = next(csvreader2)  # keep the header for the output file
    for row in csvreader1:
        nodeList[row[1]] = ['allnode', row[1], int(row[2]), int(row[3])]
    for row in csvreader2:
        if row[1] in nodeList:
            # link already seen in the first file: add the rates
            nodeList[row[1]][2] += int(row[2])
            nodeList[row[1]][3] += int(row[3])
        else:
            nodeList[row[1]] = ['allnode', row[1], int(row[2]), int(row[3])]

# In Python 3 the csv module needs text mode with newline='' ('wb' only worked in Python 2)
with open('outfile.csv', 'w', newline='') as out:
    csvwriter = csv.writer(out, delimiter='\t')
    csvwriter.writerow(header)
    for v in nodeList.values():
        csvwriter.writerow(v)
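None of the answers covers the last step of the question, getting the summed rows into Elasticsearch. As a rough sketch, the rows can be turned into action dictionaries of the shape that the bulk helper in the official Python client (elasticsearch.helpers.bulk) accepts. The index name "link-rates", the choice of the link as document id, and the helper name to_bulk_actions are all assumptions for illustration; no client is created and nothing is sent here.

```python
import json


def to_bulk_actions(rows, index_name="link-rates"):
    """Yield one bulk-style action per summed row (index name is hypothetical)."""
    for row in rows:
        yield {
            "_index": index_name,
            "_id": row["link"],  # assumption: one document per link
            "_source": {
                "node": row["node"],
                "link": row["link"],
                "rate-in": row["rate-in"],
                "rate-out": row["rate-out"],
            },
        }


# Summed rows as produced by any of the approaches in this thread
rows = [
    {"node": "allnode", "link": "link1", "rate-in": 30, "rate-out": 30},
    {"node": "allnode", "link": "link2", "rate-in": 80, "rate-out": 120},
    {"node": "allnode", "link": "link3", "rate-in": 120, "rate-out": 100},
]
actions = list(to_bulk_actions(rows))
print(json.dumps(actions[0], indent=2))
```

With a configured client, the actions would then be passed to the bulk helper; check the client documentation for the connection settings that match your cluster version.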