Python: retain in pandas, faster than a for loop or apply

python pandas for-loop

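I have a problem similar to this one. When working with time series data, an algorithm often needs to refer back to the most recently computed record.

For example, I have some stock trading records and I want to calculate the "average cost" of the shares I hold. The only solution I can think of is to iterate over the target DataFrame, which does not feel like it plays to the strengths of a pandas DataFrame.

Fake data: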

import numpy as np
import pandas as pd

aapl = pd.read_csv('https://raw.githubusercontent.com/ktc312/pandas-questions/master/AAPL_exmaple.csv', parse_dates=['Date'])
print(aapl)

        Date  Quantity     Price
0 2017-01-10      1000  117.2249
1 2017-02-10      -500  130.5928
2 2017-03-10      1500  137.5316
3 2017-04-10     -2000  141.5150
4 2017-05-10       500  151.4884
5 2017-06-09       500  147.8657
6 2017-07-10       500  143.9750
7 2017-08-10     -1000  154.7636
8 2017-09-11      -200  160.9215
9 2017-10-10      1000  155.3416
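
In case the CSV at that URL is not reachable, the same ten rows can be built inline. This is only a convenience sketch: the values are typed in by hand from the printed table above, and it assumes the file contains nothing beyond these three columns.

```python
import pandas as pd

# Same ten rows as the printed table, entered by hand (assumed to match the CSV).
aapl = pd.DataFrame({
    'Date': pd.to_datetime([
        '2017-01-10', '2017-02-10', '2017-03-10', '2017-04-10', '2017-05-10',
        '2017-06-09', '2017-07-10', '2017-08-10', '2017-09-11', '2017-10-10',
    ]),
    'Quantity': [1000, -500, 1500, -2000, 500, 500, 500, -1000, -200, 1000],
    'Price': [117.2249, 130.5928, 137.5316, 141.5150, 151.4884,
              147.8657, 143.9750, 154.7636, 160.9215, 155.3416],
})
print(aapl)
```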

Some variables that are needed:

aapl['Total Shares'] = aapl['Quantity'].cumsum()
aapl['Cost'] = aapl['Quantity']*aapl['Price']
print(aapl)

        Date  Quantity     Price  Total Shares       Cost
0 2017-01-10      1000  117.2249          1000  117224.90
1 2017-02-10      -500  130.5928           500  -65296.40
2 2017-03-10      1500  137.5316          2000  206297.40
3 2017-04-10     -2000  141.5150             0 -283030.00
4 2017-05-10       500  151.4884           500   75744.20
5 2017-06-09       500  147.8657          1000   73932.85
6 2017-07-10       500  143.9750          1500   71987.50
7 2017-08-10     -1000  154.7636           500 -154763.60
8 2017-09-11      -200  160.9215           300  -32184.30
9 2017-10-10      1000  155.3416          1300  155341.60

Looping over the data to get the average cost:

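def get_ave_cost(df):
    # Each row's average cost depends on the previous row's result,
    # so the rows are visited in order.
    for index, row in df.iterrows():
        if index == 0:
            # First trade: the average cost is the purchase price.
            df.loc[index,'Ave Cost'] = row['Price']
        elif row['Total Shares'] == 0:
            # Position fully closed: reset the average cost.
            df.loc[index,'Ave Cost'] = 0.0
        else:
            if row['Quantity'] > 0:
                # Buy: blend the previous holding's cost with the new purchase.
                df.loc[index,'Ave Cost'] = \
                    ((df.loc[index - 1,'Ave Cost'] * \
                      df.loc[index - 1,'Total Shares']) + \
                      row['Cost'])/row['Total Shares']
            else:
                # Sell: the average cost of the remaining shares is unchanged.
                df.loc[index,'Ave Cost'] =  df.loc[index - 1,'Ave Cost']
    return df

get_ave_cost(aapl)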

Desired result:

        Date  Quantity     Price  Total Shares       Cost    Ave Cost
0 2017-01-10      1000  117.2249          1000  117224.90  117.224900
1 2017-02-10      -500  130.5928           500  -65296.40  117.224900
2 2017-03-10      1500  137.5316          2000  206297.40  132.454925
3 2017-04-10     -2000  141.5150             0 -283030.00    0.000000
4 2017-05-10       500  151.4884           500   75744.20  151.488400
5 2017-06-09       500  147.8657          1000   73932.85  149.677050
6 2017-07-10       500  143.9750          1500   71987.50  147.776367
7 2017-08-10     -1000  154.7636           500 -154763.60  147.776367
8 2017-09-11      -200  160.9215           300  -32184.30  147.776367
9 2017-10-10      1000  155.3416          1300  155341.60  153.595777

Is there a more efficient or simpler way to do this?

Thanks, everyone!

I think you need to fix that second else.

Do you mean the second else in get_ave_cost? This function (the for loop) does return the correct answer; it just doesn't seem like an ideal solution.

For future questions, please provide an easily reproducible, copy-pastable code example with mock data.

@tobsecret It's just fake data, but if you're interested, it is available here.

Try using swifter.
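
One possible direction, offered only as a sketch: keep the same row-by-row recurrence as get_ave_cost, but run it over plain NumPy arrays instead of iterrows() and df.loc. It is still a loop, not a vectorized solution, but it typically avoids most of the per-row pandas overhead. The function name ave_cost_arrays is hypothetical, and the data is typed in by hand from the table in the question.

```python
import numpy as np
import pandas as pd


def ave_cost_arrays(quantity, price):
    """Running average cost, following the same rules as get_ave_cost."""
    quantity = np.asarray(quantity, dtype=float)
    price = np.asarray(price, dtype=float)
    total = np.cumsum(quantity)      # shares held after each trade
    ave = np.zeros(len(quantity))
    prev_ave, prev_total = 0.0, 0.0
    for i in range(len(quantity)):
        if i == 0:
            ave[i] = price[i]        # first trade: cost is the price paid
        elif total[i] == 0:
            ave[i] = 0.0             # position closed: reset
        elif quantity[i] > 0:
            # Buy: blend the previous holding's cost with the new purchase.
            ave[i] = (prev_ave * prev_total + quantity[i] * price[i]) / total[i]
        else:
            ave[i] = prev_ave        # sell: average cost is unchanged
        prev_ave, prev_total = ave[i], total[i]
    return ave


# The ten rows from the question, entered by hand.
aapl = pd.DataFrame({
    'Quantity': [1000, -500, 1500, -2000, 500, 500, 500, -1000, -200, 1000],
    'Price': [117.2249, 130.5928, 137.5316, 141.5150, 151.4884,
              147.8657, 143.9750, 154.7636, 160.9215, 155.3416],
})
aapl['Ave Cost'] = ave_cost_arrays(aapl['Quantity'].to_numpy(),
                                   aapl['Price'].to_numpy())
print(aapl)
```

Because the loop body uses only scalar arithmetic on arrays, it is also the kind of function that a JIT compiler could be tried on, though that has not been tested here.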