Python pandas/NumPy: fastest way to create a "ladder"?


I have a dataframe such as:

    color     cost    temp
0   blue      12.0    80.4   
1    red       8.1    81.2 
2   pink      24.5    83.5
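
I would like to create a "ladder" or "range" for each row, in 50-cent increments, from $0.50 below the current cost to $0.50 above it. My current code is similar to the following: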

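import numpy
import pandas

incremented_prices = []

df['original_idx'] = df.index  # remember each row's original label

for _, row in df.iterrows():  # iterrows() yields (label, row) pairs
    current_price = row['cost']
    # three rungs: cost - 0.5, cost, cost + 0.5
    more_costs = current_price + numpy.arange(-0.5, 0.51, 0.5)

    for cost in more_costs:
        row_c = row.copy()
        row_c['cost'] = cost
        incremented_prices.append(row_c)

# build the frame from the list of row Series and renumber the rows 0..n-1
df_incremented = pandas.DataFrame(incremented_prices).reset_index(drop=True)

This code produces a dataframe such as: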

    color     cost    temp  original_idx
0   blue      11.5    80.4            0
1   blue      12.0    80.4            0 
2   blue      12.5    80.4            0  
3    red       7.6    81.2            1 
4    red       8.1    81.2            1 
5    red       8.6    81.2            1 
6   pink      24.0    83.5            2
7   pink      24.5    83.5            2
8   pink      25.0    83.5            2

In the real problem the range will run from -$50.00 to +$50.00, and I am finding this very slow. Is there a faster, vectorized way?

You can try recreating the dataframe with numpy.repeat:

import numpy as np
import pandas as pd

# pd.np is not available in current pandas, so NumPy is imported directly
cost_steps = np.arange(-0.5, 0.51, 0.5)
repeats = cost_steps.size

pd.DataFrame(dict(
    color = np.repeat(df.color.values, repeats),
    # vectorized: broadcasting adds every step to every cost in one operation
    cost = (df.cost.values[:, None] + cost_steps).ravel(),
    temp = np.repeat(df.temp.values, repeats),
    original_idx = np.repeat(df.index.values, repeats)
    ))
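
To make the broadcasting step in the code above easier to follow, here is a minimal NumPy-only sketch. The array values are copied from the question's sample data; the names `grid`, `ladder` and `labels` are only illustrative.

```python
import numpy as np

cost = np.array([12.0, 8.1, 24.5])        # one entry per original row
cost_steps = np.array([-0.5, 0.0, 0.5])   # offsets applied to every row

# (3, 1) + (3,) broadcasts to a (3, 3) grid: one grid row per original row
grid = cost[:, None] + cost_steps

# ravel() flattens row by row, so each original cost's ladder stays contiguous
ladder = grid.ravel()

# np.repeat yields the matching order for the columns that do not change
labels = np.repeat(np.array(["blue", "red", "pink"]), cost_steps.size)
```

Because `ravel()` and `np.repeat` both keep the rungs of one original row next to each other, the columns line up without any explicit loop.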

Update for more columns:

import numpy as np  # replaces the old pd.np alias

df1 = df.rename_axis("original_idx").reset_index()
cost_steps = np.arange(-0.5, 0.51, 0.5)
repeats = cost_steps.size

# repeat every non-cost column, then append the broadcast cost ladder as the last column
pd.DataFrame(np.hstack((np.repeat(df1.drop(columns="cost").values, repeats, axis=0),
                        (df1.cost.values[:, None] + cost_steps).reshape(-1, 1))),
             columns=df1.columns.drop("cost").tolist()+["cost"])
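
For a frame with many columns, another option (not part of the original answer, just a sketch) is to repeat whole rows by label with `Index.repeat` and then shift only the cost column with `np.tile`. The small `df` below is a stand-in built from the question's sample data, and `out` is an illustrative name.

```python
import numpy as np
import pandas as pd

# stand-in for the real frame, using the sample data from the question
df = pd.DataFrame({"color": ["blue", "red", "pink"],
                   "cost": [12.0, 8.1, 24.5],
                   "temp": [80.4, 81.2, 83.5]})

cost_steps = 0.5 * np.arange(-1, 2)   # -0.5, 0.0, 0.5

# repeat every row once per step; no column has to be named explicitly
out = (df.loc[df.index.repeat(cost_steps.size)]
         .rename_axis("original_idx")
         .reset_index())

# tile the steps so each block of repeated rows gets the full ladder
out["cost"] = out["cost"] + np.tile(cost_steps, len(df))
```

This variant keeps each column's original dtype, whereas stacking `.values` of mixed-type columns produces object arrays.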

Here is an approach based on NumPy initialization:

import numpy as np
import pandas as pd

increments = 0.5*np.arange(-1,2) # Edit the increments here

names = np.append(df.columns, 'original_idx')

M,N = df.shape
vals = df.values

cost_col_idx = (names == 'cost').argmax()

n = len(increments)
shp = (M,n,N+1)
b = np.empty(shp,dtype=object)
b[...,:-1] = vals[:,None]
b[...,-1] = np.arange(M)[:,None]
b[...,cost_col_idx] = vals[:,cost_col_idx].astype(float)[:,None] + increments
b.shape = (-1,N+1)
df_out = pd.DataFrame(b, columns=names)

To make the increments run from -50 to +50 in steps of 0.5, use:

increments = 0.5*np.arange(-100,101)
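
As a quick sanity check on that range, the sketch below confirms how many rungs each original row expands into. Scaling an integer `np.arange` by 0.5 also keeps both endpoints exact, which a float step passed straight to `np.arange` does not always guarantee.

```python
import numpy as np

# integer range scaled by 0.5: -50.0, -49.5, ..., 49.5, 50.0
increments = 0.5 * np.arange(-100, 101)

# every original row becomes this many rows in the output
rows_per_original = increments.size
```

With these settings `rows_per_original` is 201, so a frame with M rows grows to 201 * M rows.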

Sample run:

In [200]: df
Out[200]: 
  color  cost  temp  newcol
0  blue  12.0  80.4   mango
1   red   8.1  81.2  banana
2  pink  24.5  83.5   apple

In [201]: df_out
Out[201]: 
  color  cost  temp  newcol original_idx
0  blue  11.5  80.4   mango            0
1  blue    12  80.4   mango            0
2  blue  12.5  80.4   mango            0
3   red   7.6  81.2  banana            1
4   red   8.1  81.2  banana            1
5   red   8.6  81.2  banana            1
6  pink    24  83.5   apple            2
7  pink  24.5  83.5   apple            2
8  pink    25  83.5   apple            2

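You could also rephrase this question as: how do I create a DF in which each row of my original DF is repeated N times? Then this may be useful.

@Lev That would be part of it, but for each row I need a different price, based on the original price +/- a certain amount.

This is what I want, but I have 500 columns, so I don't want to type out each one. Is there a way to combine your answer with a 500-column dataframe?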