Python: add the values of two CSV files and write the result to a new file


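I need some help and don't know where to start.

Basically, I have two CSV input files (from two different nodes), and I want to add their values together and send the result to Elasticsearch.

Any help is much appreciated.

CSV1

CSV2

Result --> create an Elasticsearch index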

node    link    rate-in rate-out
allnode link1   30  30
allnode link2   80  120
allnode link3   120 100

Thanks

Once the CSVs have been imported into Python as DataFrames:

import pandas
df1 = pandas.DataFrame({'node':['node1', 'node1', 'node1'], 'link': ['link1', 'link2', 'link3'], 'rate-in': [10, 30, 40], 'rate-out': [20, 50, 60]})
df2 = pandas.DataFrame({'node':['node2', 'node2', 'node2'], 'link': ['link1', 'link2', 'link3'], 'rate-in': [20, 50, 80], 'rate-out': [10, 70, 40]})

result = pandas.concat([df1, df2], axis=0).groupby('link').agg({'node': lambda x : 'allNodes', 'rate-in': 'sum', 'rate-out': 'sum'}).reset_index(drop=False)
result

Output:

link    node    rate-in rate-out
0   link1   allNodes    30  30
1   link2   allNodes    80  120
2   link3   allNodes    120 100

Use the built-in csv module.

Example:

import csv

with open("CSV1.csv", newline="") as csvfile_1, open("CSV2.csv", newline="") as csvfile_2:
    reader_1 = csv.DictReader(csvfile_1)
    reader_2 = csv.DictReader(csvfile_2)
    result = []
    # Key on the link only: the node names differ between the two files,
    # so a combined node_link key would never match.
    data_1 = {row["link"]: (int(row["rate-in"]), int(row["rate-out"])) for row in reader_1}
    for row in reader_2:
        key = row["link"]
        if key in data_1:
            rate_in, rate_out = data_1[key]
            result.append({"node": "allnode", "link": row["link"], "rate-in": rate_in + int(row["rate-in"]), "rate-out": rate_out + int(row["rate-out"])})

# newline="" stops the csv module from writing blank lines between rows on Windows
with open("outcsv.csv", "w", newline="") as outfile:
    writer = csv.DictWriter(outfile, fieldnames=['node', 'link', 'rate-in', 'rate-out'])
    writer.writeheader()
    writer.writerows(result)
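
The example above keeps only links that appear in both files. As a rough sketch of a variant that also keeps links found in just one file, and that accepts any number of inputs, the totals can be accumulated in a `defaultdict`. The helper name `sum_rates`, the in-memory files, and the extra `link4` row are made up for illustration; the column names are the ones from the question.

```python
import csv
import io
from collections import defaultdict


def sum_rates(csv_files, node_label="allnode"):
    """Sum rate-in and rate-out per link across any number of open CSV files."""
    totals = defaultdict(lambda: [0, 0])  # link -> [rate-in total, rate-out total]
    for csv_file in csv_files:
        for row in csv.DictReader(csv_file):
            totals[row["link"]][0] += int(row["rate-in"])
            totals[row["link"]][1] += int(row["rate-out"])
    # One output row per link, sorted by link name for a stable order
    return [
        {"node": node_label, "link": link, "rate-in": rate_in, "rate-out": rate_out}
        for link, (rate_in, rate_out) in sorted(totals.items())
    ]


# In-memory stand-ins for the two input files (link4 is hypothetical and
# appears in the second file only).
csv_1 = io.StringIO(
    "node,link,rate-in,rate-out\n"
    "node1,link1,10,20\nnode1,link2,30,50\nnode1,link3,40,60\n"
)
csv_2 = io.StringIO(
    "node,link,rate-in,rate-out\n"
    "node2,link1,20,10\nnode2,link2,50,70\nnode2,link4,5,5\n"
)

rows = sum_rates([csv_1, csv_2])
for row in rows:
    print(row)
```

With real files, the `io.StringIO` objects would be replaced by files opened with `open(..., newline="")`.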

Pandas has good support for reading and manipulating CSV files.


    import pandas as pd

    df = pd.concat([
        pd.read_csv('csv1.csv'),
        pd.read_csv('csv2.csv')
    ])

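    # Sum only the numeric rate columns; the per-file node names are replaced below
    result = df.groupby('link', as_index=False)[['rate-in', 'rate-out']].sum()
    result['node'] = 'allnode'

    # index=False keeps the DataFrame's row index out of the file
    result.to_csv('result.csv', index=False)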

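where csv1.csv is:

node,link,rate-in,rate-out
node1,link1,10,20
node1,link2,30,50
node1,link3,40,60

In the script that opens file1.csv and file2.csv, change those names to your first and second input files.

Change outfile.csv to the output file you want.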

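If the dataset is large, you can read the values directly from the CSV files and add up the columns needed for the output, rather than hard-coding each value: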

import pandas as pd
df1 = pd.read_csv('csv1.csv', header=0)  # load csv1 into df1
df2 = pd.read_csv('csv2.csv', header=0)  # load csv2 into df2

# NOTE: values are added by row position, so both files must list
# the same links in the same order.
rate_in_1 = df1.iloc[:, 2].values.tolist()   # 3rd column (rate-in) of csv1 as a list
rate_out_1 = df1.iloc[:, 3].values.tolist()  # 4th column (rate-out) of csv1 as a list

rate_in_2 = df2.iloc[:, 2].values.tolist()   # 3rd column (rate-in) of csv2 as a list
rate_out_2 = df2.iloc[:, 3].values.tolist()  # 4th column (rate-out) of csv2 as a list

rate_in_total = [x + y for x, y in zip(rate_in_1, rate_in_2)]     # element-wise sum of the two rate-in lists
rate_out_total = [x + y for x, y in zip(rate_out_1, rate_out_2)]  # element-wise sum of the two rate-out lists


# Now combine everything into one DataFrame:

final_df = pd.DataFrame()
final_df['Node'] = ['allnode' for _ in rate_in_total]
final_df['Link'] = df1.iloc[:, 1].values.tolist()
final_df['rate-in'] = rate_in_total
final_df['rate-out'] = rate_out_total

If you want to write final_df to a CSV file, just use the .to_csv() method that pandas provides.

Hope this helps :)
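
Adding the columns by row position gives correct totals only if both files list the same links in the same order, and `zip` silently stops at the shorter list. A minimal sketch of a guard for that, using only the standard library: the function name `add_by_position` and the in-memory inputs are assumptions for illustration, and the data is the csv1/csv2 sample from this thread.

```python
import csv
import io


def add_by_position(file_1, file_2):
    """Add the rate columns row by row, refusing to run if the link columns differ."""
    rows_1 = list(csv.DictReader(file_1))
    rows_2 = list(csv.DictReader(file_2))
    links_1 = [row["link"] for row in rows_1]
    links_2 = [row["link"] for row in rows_2]
    # Different order or different length would pair up the wrong rows
    if links_1 != links_2:
        raise ValueError("link columns differ: {} vs {}".format(links_1, links_2))
    return [
        {
            "node": "allnode",
            "link": a["link"],
            "rate-in": int(a["rate-in"]) + int(b["rate-in"]),
            "rate-out": int(a["rate-out"]) + int(b["rate-out"]),
        }
        for a, b in zip(rows_1, rows_2)
    ]


CSV_1 = "node,link,rate-in,rate-out\nnode1,link1,10,20\nnode1,link2,30,50\nnode1,link3,40,60\n"
CSV_2 = "node,link,rate-in,rate-out\nnode2,link1,20,10\nnode2,link2,50,70\nnode2,link3,80,40\n"

totals = add_by_position(io.StringIO(CSV_1), io.StringIO(CSV_2))
print(totals)
```

If the files cannot be guaranteed to line up, grouping by the link column is the safer route.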

A small question: in result.csv the node column is placed at the end. Is it possible to make it the first column?

csv1.csv:

node,link,rate-in,rate-out
node1,link1,10,20
node1,link2,30,50
node1,link3,40,60

csv2.csv:

node,link,rate-in,rate-out
node2,link1,20,10
node2,link2,50,70
node2,link3,80,40

result.csv:

link,rate-in,rate-out,node
link1,30,30,allnode
link2,80,120,allnode
link3,120,100,allnode


    result[[
        'node','link','rate-in','rate-out'
    ]].to_csv('result.csv', index=False)
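
For anyone not using pandas, a comparable effect can be sketched with `csv.DictWriter`, whose `fieldnames` argument fixes the column order regardless of the key order in each row. The rows below are the summed values from this thread; writing to an in-memory buffer instead of result.csv is an assumption made to keep the example self-contained.

```python
import csv
import io

# Aggregated rows with "node" as the last key, mirroring the result.csv shown above
rows = [
    {"link": "link1", "rate-in": 30, "rate-out": 30, "node": "allnode"},
    {"link": "link2", "rate-in": 80, "rate-out": 120, "node": "allnode"},
    {"link": "link3", "rate-in": 120, "rate-out": 100, "node": "allnode"},
]

out = io.StringIO()
# fieldnames decides the column order in the output, not the dict key order
writer = csv.DictWriter(
    out, fieldnames=["node", "link", "rate-in", "rate-out"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)

text = out.getvalue()
print(text)
```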

import csv

nodeList = {}  # link -> ['allnode', link, rate-in total, rate-out total]

# The files are read as tab-separated; use delimiter=',' for comma-separated files
with open('file1.csv', 'r', newline='') as csv1, open('file2.csv', 'r', newline='') as csv2:
    # create the csv reader objects and skip the header rows
    csvreader1 = csv.reader(csv1, delimiter='\t')
    next(csvreader1)
    csvreader2 = csv.reader(csv2, delimiter='\t')
    header = next(csvreader2)

    for row in csvreader1:
        nodeList[row[1]] = ['allnode', row[1], int(row[2]), int(row[3])]

    for row in csvreader2:
        if row[1] in nodeList:
            # link already seen in file1: add the rates
            nodeList[row[1]][2] += int(row[2])
            nodeList[row[1]][3] += int(row[3])
        else:
            # link only present in file2: keep it as is
            nodeList[row[1]] = ['allnode', row[1], int(row[2]), int(row[3])]

# In Python 3 the csv module needs a text-mode file ('w', not 'wb')
with open('outfile.csv', 'w', newline='') as out:
    csvwriter = csv.writer(out, delimiter='\t')

    csvwriter.writerow(header)
    for v in nodeList.values():
        csvwriter.writerow(v)
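
None of the answers covers the last step in the question, getting the summed rows into Elasticsearch. As a hedged sketch only, the rows can be turned into the newline-delimited JSON body that the Elasticsearch bulk API expects: one action line followed by one document line per row, with a trailing newline. The index name `link-rates`, the helper `build_bulk_body`, and the choice of the link as document `_id` are all assumptions, and nothing is sent to a cluster here.

```python
import json


def build_bulk_body(rows, index_name):
    """Build an NDJSON bulk body: an action line, then the document, for each row."""
    lines = []
    for row in rows:
        # Using the link as _id (an assumption) makes re-runs overwrite instead of duplicate
        lines.append(json.dumps({"index": {"_index": index_name, "_id": row["link"]}}))
        lines.append(json.dumps(row))
    # The bulk body must end with a newline
    return "\n".join(lines) + "\n"


# Summed rows from this thread
rows = [
    {"node": "allnode", "link": "link1", "rate-in": 30, "rate-out": 30},
    {"node": "allnode", "link": "link2", "rate-in": 80, "rate-out": 120},
    {"node": "allnode", "link": "link3", "rate-in": 120, "rate-out": 100},
]

body = build_bulk_body(rows, "link-rates")  # "link-rates" is a hypothetical index name
print(body)
```

The resulting string would be POSTed to the cluster's `_bulk` endpoint with the `application/x-ndjson` content type; how that request is made depends on the client in use.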
