Python: appending data to a DataFrame

python pandas


I have a directory containing some CSV files named:

results_roll_3_oe_2016-02-04
results_roll_2_oe_2016-01-28

results_roll_3_oe_2016-02-04 looks like this:

date           day_performance
2016-02-02   52.07647107113144
2016-02-03    -1.7503249876724
2016-02-04  -158.1667860104882

results_roll_2_oe_2016-01-28 looks like this:

date           day_performance
2016-01-26   3.714011839374111
2016-01-27  -8.402334555591418
2016-01-28  -41.09889373400086

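(In fact, there are more files like these.) I am trying to loop over the directory, stack the results together and save them in a single data frame, so that the final output contains the rows of every file, one file after another.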

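I have written some code (below) that loops over the files and tries to append the result files together into a new data frame (dfs), but I get the following output: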

date           day_performance
2016-02-02   52.07647107113144
2016-02-03    -1.7503249876724
2016-02-04  -158.1667860104882
date           day_performance
2016-02-02   52.07647107113144
2016-02-03    -1.7503249876724
2016-02-04  -158.1667860104882

Here it looks as if it takes results_roll_2 and appends its data, header included, twice.

My code is as follows:

def main():

    dfs = pd.DataFrame()
    ResultsDataPath = 'C:/Users/stacey/Documents/data/VwapBacktestResults/'
    print(ResultsDataPath)

    allfiles = glob.glob(os.path.join(ResultsDataPath, "*oe*"))
    for fname in allfiles:    
        df = pd.read_csv(fname, header=None, usecols=[1,2], 
                        parse_dates=[0], dayfirst=True,
                        index_col=[0], names=['date', 'day_performance'])
        print(df)

        dfs = df.append(df,ignore_index=False)

My exact CSV (results_roll_2_oe_2016-01-28) looks like this:

    date    day_performance
0   26/01/2016  3.714011839
1   27/01/2016  -8.402334556
2   28/01/2016  -41.09889373

My CSV (results_roll_3_oe_2016-02-04) looks like this:

    date    day_performance
0   02/02/2016  52.07647107
1   03/02/2016  -1.750324988
2   04/02/2016  -158.166786

Both are MS Excel comma-separated values files.

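UPDATE: your CSV files are space-delimited or tab-delimited, so you have to specify that. Apart from that, all your CSV files have a header row and only two columns, so there is no need for the usecols, header and names parameters: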

In [100]: fmask = r'D:\temp\.data\results_roll_*'

In [101]: df = get_merged_csv(glob.glob(fmask),
   .....:                     delim_whitespace=True,
   .....:                     index_col=0)

In [102]: df['date'] = pd.to_datetime(df['date'], dayfirst=True)

In [103]: df
Out[103]:
        date  day_performance
0 2016-01-26         3.714012
1 2016-01-27        -8.402335
2 2016-01-28       -41.098894
3 2016-02-02        52.076471
4 2016-02-03        -1.750325
5 2016-02-04      -158.166786
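The session above can be reproduced end to end with the sketch below. It is only an illustration under stated assumptions: the two files are written to a temporary directory with contents modelled on the samples in the question, and sep=r"\s+" is swapped in for delim_whitespace=True, which newer pandas releases deprecate. get_merged_csv is the helper from the old answer.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical sample files, modelled on the CSVs shown in the question.
SAMPLES = {
    "results_roll_2_oe_2016-01-28": (
        "    date    day_performance\n"
        "0   26/01/2016  3.714011839\n"
        "1   27/01/2016  -8.402334556\n"
        "2   28/01/2016  -41.09889373\n"
    ),
    "results_roll_3_oe_2016-02-04": (
        "    date    day_performance\n"
        "0   02/02/2016  52.07647107\n"
        "1   03/02/2016  -1.750324988\n"
        "2   04/02/2016  -158.166786\n"
    ),
}


def get_merged_csv(flist, **kwargs):
    """Read every file in flist with the same options and stack the rows."""
    return pd.concat([pd.read_csv(f, **kwargs) for f in flist], ignore_index=True)


with tempfile.TemporaryDirectory() as tmpdir:
    for name, text in SAMPLES.items():
        with open(os.path.join(tmpdir, name), "w") as fh:
            fh.write(text)

    # sorted() makes the row order independent of the order glob returns.
    files = sorted(glob.glob(os.path.join(tmpdir, "results_roll_*")))

    # Whitespace-delimited, header on the first line, first column is the index.
    merged = get_merged_csv(files, sep=r"\s+", index_col=0)

# The dates are written day first (26/01/2016), so say so explicitly.
merged["date"] = pd.to_datetime(merged["date"], dayfirst=True)
print(merged)
```

Collecting the frames in a list and concatenating once also sidesteps the problem in the question's loop, where dfs = df.append(df, ...) appends each frame to itself and overwrites dfs on every pass, so only the last file survives, twice. DataFrame.append was also removed in pandas 2.0.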

OLD answer:

Try this:

import glob
import pandas as pd

def get_merged_csv(flist, **kwargs):
    return pd.concat([pd.read_csv(f, **kwargs) for f in flist], ignore_index=True)

fmask = '/path/to/results_roll_*.csv'

df = get_merged_csv(glob.glob(fmask),
                    header=None, usecols=[1,2], 
                    parse_dates=[0], dayfirst=True,
                    index_col=[0], names=['date', 'day_performance'])
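One thing to watch with these options: header=None tells pandas the file has no header line, so if the files do start with one, that line comes back as an ordinary data row. The small sketch below shows the difference on a made-up comma-separated sample (the text is an assumption, not one of the real files).

```python
import io

import pandas as pd

# Hypothetical comma-separated sample that starts with a header line.
text = (
    "date,day_performance\n"
    "26/01/2016,3.714011839\n"
    "27/01/2016,-8.402334556\n"
)

# header=None with names=...: the header line is read as a data row.
as_data = pd.read_csv(io.StringIO(text), header=None,
                      names=["date", "day_performance"])

# Default header handling: the first line supplies the column names.
as_header = pd.read_csv(io.StringIO(text))

print(as_data)    # three rows, the first one holds the header text
print(as_header)  # two rows of actual data
```

When every file carries its own header line, leaving header at its default keeps those lines out of the merged data.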

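Thanks. I get the header twice in the output (at the top of the second file's data, in the middle of the data frame, and again at the top of the output). Any ideas?

@Stacey, do all your CSVs have a header row: [date, day_performance]? Could you post the first 3-5 sample rows of two CSVs in exactly the same format as the real CSVs (including the header row)?

Thanks MaxU, the files are at the bottom of the question.

@Stacey, please see the UPDATE section in my answer.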