Pandas: optimize and find the maximum value by combining row values

I have the following df:

level   type  price1    price2
5250    A   0.233   0.2865
5250    B   0.004   0.006
5500    A   0.197   0.2545
5500    B   0.0055  0.0075
5750    A   0.1615  0.223
5750    B   0.0075  0.009
6000    A   0.127   0.1925
6000    B   0.0105  0.0125
6250    A   0.1215  0.1635
6250    B   0.0135  0.0165
6500    A   0.099   0.136
6500    B   0.021   0.024
6750    A   0.071   0.085
6750    B   0.03    0.0325
7000    A   0.052   0.0555
7000    B   0.044   0.047
7250    A   0.036   0.0395
7250    B   0.063   0.0675
7500    A   0.024   0.0275
7500    B   0.086   0.091
7750    A   0.0165  0.019
7750    B   0.111   0.161
8000    A   0.0105  0.0135
8000    B   0.118   0.1915
8250    A   0.0085  0.0105
8250    B   0.137   0.224
8500    A   0.0055  0.008
8500    B   0.1835  0.257
8750    A   0.0045  0.0065
8750    B   0.2035  0.291
9000    A   0.0035  0.0055
9000    B   0.002   1.956
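To reproduce the question without quotes.csv, the table can be read straight from a string. This is a minimal sketch: TABLE is simply the table above pasted in, and sep=r"\s+" tells read_csv to split each line on runs of whitespace.

```python
import io

import pandas as pd

# The table from the question, pasted verbatim (whitespace-separated columns).
TABLE = """\
level   type  price1    price2
5250    A   0.233   0.2865
5250    B   0.004   0.006
5500    A   0.197   0.2545
5500    B   0.0055  0.0075
5750    A   0.1615  0.223
5750    B   0.0075  0.009
6000    A   0.127   0.1925
6000    B   0.0105  0.0125
6250    A   0.1215  0.1635
6250    B   0.0135  0.0165
6500    A   0.099   0.136
6500    B   0.021   0.024
6750    A   0.071   0.085
6750    B   0.03    0.0325
7000    A   0.052   0.0555
7000    B   0.044   0.047
7250    A   0.036   0.0395
7250    B   0.063   0.0675
7500    A   0.024   0.0275
7500    B   0.086   0.091
7750    A   0.0165  0.019
7750    B   0.111   0.161
8000    A   0.0105  0.0135
8000    B   0.118   0.1915
8250    A   0.0085  0.0105
8250    B   0.137   0.224
8500    A   0.0055  0.008
8500    B   0.1835  0.257
8750    A   0.0045  0.0065
8750    B   0.2035  0.291
9000    A   0.0035  0.0055
9000    B   0.002   1.956
"""

df = pd.read_csv(io.StringIO(TABLE), sep=r"\s+")
print(df.shape)  # (32, 4)
```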

I split df into df_A and df_B based on the type column.
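The split by type can be sketched like this on a few rows of the table. One detail worth seeing: boolean filtering keeps the original row labels, so df_A and df_B do not share an index, and arithmetic between them aligns on those labels rather than on level. The names misaligned and aligned are only for illustration.

```python
import io

import pandas as pd

# A few rows copied from the question's table, enough to show the split.
TABLE = """\
level type price1 price2
7000 A 0.052 0.0555
7000 B 0.044 0.047
7250 A 0.036 0.0395
7250 B 0.063 0.0675
"""
df = pd.read_csv(io.StringIO(TABLE), sep=r"\s+")

# A boolean mask on the type column gives one frame per type.
df_A = df[df["type"] == "A"]
df_B = df[df["type"] == "B"]

# Both frames keep the original row labels (A: 0, 2; B: 1, 3), so arithmetic
# between them aligns on those labels and every result is NaN.
misaligned = df_A["price1"] + df_B["price1"]

# Indexing both frames by level makes the same arithmetic line up level by level.
aligned = df_A.set_index("level")["price1"] + df_B.set_index("level")["price1"]
print(aligned)
```

Assuming quotes.csv is laid out like the table (A and B rows alternating), the same label alignment is why the element-wise expression inside obj in the script below produces only NaN values.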
Next, I want to find the levels/rows that maximize the following:
sum = buy_A + buy_B - sell_A - sell_B 

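For clarity I used row here, but in my script row should equal a value of the level column. I have one constraint: buy_A and sell_B must share the same level, and sell_A and buy_B must share the same level. With that constraint, the terms are: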
# buy_A and sell_B share level_1; sell_A and buy_B share level_2
buy_A = df_A.loc[df_A['level'] == level_1, 'price2'].item()
buy_B = df_B.loc[df_B['level'] == level_2, 'price2'].item()
sell_A = df_A.loc[df_A['level'] == level_2, 'price1'].item()
sell_B = df_B.loc[df_B['level'] == level_1, 'price1'].item()
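A minimal sketch of the four lookups for one pair of levels, following the constraint as stated in the text (buy_A and sell_B at level_1, sell_A and buy_B at level_2). price and pair_sum are hypothetical helper names, and only two levels from the table are included for brevity. .item() is used instead of float(...) because it raises a ValueError unless exactly one row matches, and calling float on a one-element Series is deprecated in recent pandas versions.

```python
import io

import pandas as pd

# Two levels copied from the question's table.
TABLE = """\
level type price1 price2
7000 A 0.052 0.0555
7000 B 0.044 0.047
7250 A 0.036 0.0395
7250 B 0.063 0.0675
"""
df = pd.read_csv(io.StringIO(TABLE), sep=r"\s+")
df_A = df[df["type"] == "A"]
df_B = df[df["type"] == "B"]


def price(frame: pd.DataFrame, level: int, column: str) -> float:
    """Hypothetical helper: the single price at `level`; .item() raises ValueError otherwise."""
    return frame.loc[frame["level"] == level, column].item()


def pair_sum(level_1: int, level_2: int) -> float:
    """Hypothetical helper: buy_A + buy_B - sell_A - sell_B under the stated constraint."""
    buy_A = price(df_A, level_1, "price2")
    sell_B = price(df_B, level_1, "price1")
    buy_B = price(df_B, level_2, "price2")
    sell_A = price(df_A, level_2, "price1")
    return buy_A + buy_B - sell_A - sell_B


print(round(pair_sum(7000, 7250), 4))  # 0.043
```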

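Basically, for the df above I get the following matrix:

[Image: the resulting matrix of sums]

I want to return the maximum value in the matrix and the corresponding levels.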
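A matrix like the one described can be built directly in pandas. This is a minimal sketch, assuming the constraint as stated (rows: the level shared by buy_A and sell_B; columns: the level shared by sell_A and buy_B) and maximizing buy_A + buy_B - sell_A - sell_B exactly as written. It checks all 16 × 16 = 256 pairs of levels, including equal ones, so no optimizer is needed for a grid this small; matrix, level_1, level_2 and best are illustrative names.

```python
import io

import numpy as np
import pandas as pd

TABLE = """\
level   type  price1    price2
5250    A   0.233   0.2865
5250    B   0.004   0.006
5500    A   0.197   0.2545
5500    B   0.0055  0.0075
5750    A   0.1615  0.223
5750    B   0.0075  0.009
6000    A   0.127   0.1925
6000    B   0.0105  0.0125
6250    A   0.1215  0.1635
6250    B   0.0135  0.0165
6500    A   0.099   0.136
6500    B   0.021   0.024
6750    A   0.071   0.085
6750    B   0.03    0.0325
7000    A   0.052   0.0555
7000    B   0.044   0.047
7250    A   0.036   0.0395
7250    B   0.063   0.0675
7500    A   0.024   0.0275
7500    B   0.086   0.091
7750    A   0.0165  0.019
7750    B   0.111   0.161
8000    A   0.0105  0.0135
8000    B   0.118   0.1915
8250    A   0.0085  0.0105
8250    B   0.137   0.224
8500    A   0.0055  0.008
8500    B   0.1835  0.257
8750    A   0.0045  0.0065
8750    B   0.2035  0.291
9000    A   0.0035  0.0055
9000    B   0.002   1.956
"""
df = pd.read_csv(io.StringIO(TABLE), sep=r"\s+")

# Index each type by level so arithmetic lines up on level, not on row position.
a = df[df["type"] == "A"].set_index("level")
b = df[df["type"] == "B"].set_index("level")

# x only depends on the (buy_A, sell_B) level, y only on the (sell_A, buy_B) level.
x = a["price2"] - b["price1"]
y = b["price2"] - a["price1"]

# matrix.loc[level_1, level_2] == x[level_1] + y[level_2] for every pair of levels.
matrix = pd.DataFrame(
    np.add.outer(x.to_numpy(), y.to_numpy()),
    index=pd.Index(x.index, name="level_1"),
    columns=pd.Index(y.index, name="level_2"),
)

# Column holding the overall maximum, then the row of that maximum within it.
level_2 = matrix.max().idxmax()
level_1 = matrix[level_2].idxmax()
best = matrix.loc[level_1, level_2]
print(int(level_1), int(level_2), round(float(best), 4))  # 5250 9000 2.235
```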
My script:
import pandas as pd
import numpy as np
from scipy.optimize import minimize


def obj(x, df):
    df_A = df.loc[(df['type'] == 'A')]
    df_B = df.loc[(df['type'] == 'B')]
    sum = df_A['price1'] + df_B['price1'] - df_A['price2'] - df_B['price2']
    return -1 * sum


if __name__ == "__main__":
    df = pd.read_csv('quotes.csv')
    guess = 0
    solver = minimize(obj, args=(df,), x0=guess, method='Nelder-Mead', options={'disp': True})

What do I need to change to get the maximum value and the corresponding levels? Thanks a lot.
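sum is a built-in function in Python, so I'll call the quantity z instead. Since your constraint requires (buy_A, sell_B) to share a level and (sell_A, buy_B) to share a level, let's rearrange your equation to make it clearer: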
z = buy_A + buy_B - sell_A - sell_B 
  = (buy_A - sell_B) + (buy_B - sell_A)
  = x + y
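The same rearrangement, written out with explicit levels: ℓ1 and ℓ2 are my own names for the level shared by (buy_A, sell_B) and the level shared by (sell_A, buy_B). Because z splits into a part that depends only on ℓ1 and a part that depends only on ℓ2, the unrestricted maximum is just the sum of two separate maxima. Once an extra restriction ties the two levels together (for example the triangular mask used in this answer), the maxima are no longer independent and the full matrix is needed. The last line is a worked check with the question's data.

```latex
\begin{aligned}
z(\ell_1, \ell_2) &= \mathrm{buy}_A(\ell_1) + \mathrm{buy}_B(\ell_2) - \mathrm{sell}_A(\ell_2) - \mathrm{sell}_B(\ell_1) \\
&= \underbrace{\big(\mathrm{buy}_A(\ell_1) - \mathrm{sell}_B(\ell_1)\big)}_{x(\ell_1)}
 + \underbrace{\big(\mathrm{buy}_B(\ell_2) - \mathrm{sell}_A(\ell_2)\big)}_{y(\ell_2)} \\
\max_{\ell_1, \ell_2} z(\ell_1, \ell_2) &= \max_{\ell_1} x(\ell_1) + \max_{\ell_2} y(\ell_2) \\
z(7000, 7250) &= (0.0555 - 0.044) + (0.0675 - 0.036) = 0.0115 + 0.0315 = 0.043
\end{aligned}
```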


Now let's dig into the problem. The first thing to do is reshape the original data frame so that each level occupies one row:
tmp = df.rename({'price1': 'sell', 'price2': 'buy'}, axis=1) \
        .set_index(['level', 'type']) \
        .unstack()

# tmp:
         sell             buy        
type        A       B       A       B
level                                
5250   0.2330  0.0040  0.2865  0.0060
5500   0.1970  0.0055  0.2545  0.0075
5750   0.1615  0.0075  0.2230  0.0090
6000   0.1270  0.0105  0.1925  0.0125
6250   0.1215  0.0135  0.1635  0.0165

Then compute x and y:
x = tmp[('buy', 'A')] - tmp[('sell', 'B')]
y = tmp[('buy', 'B')] - tmp[('sell', 'A')]
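A small runnable check of the reshape and of x and y, on the first two levels of the table only. It shows what unstack() produces: each row is a level and each column is a (sell|buy, A|B) tuple, which is why the columns are selected with tuples such as ('buy', 'A').

```python
import io

import pandas as pd

# First two levels of the question's table, for brevity.
TABLE = """\
level type price1 price2
5250 A 0.233 0.2865
5250 B 0.004 0.006
5500 A 0.197 0.2545
5500 B 0.0055 0.0075
"""
df = pd.read_csv(io.StringIO(TABLE), sep=r"\s+")

tmp = (
    df.rename({"price1": "sell", "price2": "buy"}, axis=1)
    .set_index(["level", "type"])
    .unstack()
)

# unstack() moves "type" into a second column level, so every column is a
# (sell|buy, A|B) tuple and every row is one level.
print(tmp.columns.tolist())  # [('sell', 'A'), ('sell', 'B'), ('buy', 'A'), ('buy', 'B')]

x = tmp[("buy", "A")] - tmp[("sell", "B")]  # (buy_A, sell_B) at the same level
y = tmp[("buy", "B")] - tmp[("sell", "A")]  # (buy_B, sell_A) at the same level
```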

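Next, we need to compute z. z is not simply x + y: every value in x is added to every value in y, so z is a square matrix. We don't want the whole matrix either, only the triangle below the main diagonal. The numpy.ma module provides masked arrays, which let us mark certain elements as missing: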
import numpy as np
import numpy.ma as ma

# Mask away the upper triangle, including the main diagonal
# len(x) == len(y)
mask = np.triu(np.ones((len(x), len(y)), dtype=bool))

# Use numpy broadcasting to add every value in `x` to every value in `y`
# `x` and `y` are pandas Series. `.values` gets the underlying numpy array
#
# `y.values[:, None]` adds a second axis to `y`. This is what triggers
# numpy's array broadcasting and makes `z` a square matrix, with
# z[i, j] built from y[i] (row) and x[j] (column)
#
# The leading minus sign means z holds -(x + y), i.e.
# sell_A + sell_B - buy_A - buy_B, which is what the objective in the
# question's script is written to maximize; drop it to maximize x + y instead
z = -ma.array(x.values + y.values[:, None], mask=mask)

# If you want to visualize `z`, type this into the debugger
# pd.DataFrame(z, index=tmp.index, columns=tmp.index)
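To see which index is which, here is the same broadcast-and-mask step on toy numbers (not taken from the data). Rows of the result follow y and columns follow x, and masking the upper triangle keeps only the cells whose row index is greater than their column index; argmax then skips the masked cells.

```python
import numpy as np
import numpy.ma as ma

x = np.array([1.0, 2.0, 3.0])     # toy values, one per level
y = np.array([10.0, 20.0, 30.0])

sums = x + y[:, None]             # sums[i, j] == x[j] + y[i]
mask = np.triu(np.ones((3, 3), dtype=bool))  # True on and above the diagonal
z = ma.array(sums, mask=mask)     # only (1, 0), (2, 0) and (2, 1) remain

i, j = np.unravel_index(z.argmax(), z.shape)
print(int(i), int(j))  # 2 1
```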

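The last step is to find the cell with the largest value and the levels it corresponds to. If several cells share the maximum, only the first one is taken: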
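i, j = np.unravel_index(z.argmax(), z.shape)

# The levels with the max sum: level1 (row i) comes from y, i.e. (buy_B, sell_A);
# level2 (column j) comes from x, i.e. (buy_A, sell_B)
level1, level2 = tmp.index[[i, j]]   # 7250, 7000

# The max value of the sums
z[i, j]                              # -0.043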
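For anyone who wants to run everything at once, here is a minimal sketch that wraps the steps above in one function and applies it to the full table from the question. best_levels and its negate flag are hypothetical names of my own; with the default negate=True it reproduces the answer's sign convention, and negate=False maximizes buy_A + buy_B - sell_A - sell_B as written in the question. As in the answer, level1 is the (buy_B, sell_A) level and level2 the (buy_A, sell_B) level.

```python
import io

import numpy as np
import numpy.ma as ma
import pandas as pd

TABLE = """\
level   type  price1    price2
5250    A   0.233   0.2865
5250    B   0.004   0.006
5500    A   0.197   0.2545
5500    B   0.0055  0.0075
5750    A   0.1615  0.223
5750    B   0.0075  0.009
6000    A   0.127   0.1925
6000    B   0.0105  0.0125
6250    A   0.1215  0.1635
6250    B   0.0135  0.0165
6500    A   0.099   0.136
6500    B   0.021   0.024
6750    A   0.071   0.085
6750    B   0.03    0.0325
7000    A   0.052   0.0555
7000    B   0.044   0.047
7250    A   0.036   0.0395
7250    B   0.063   0.0675
7500    A   0.024   0.0275
7500    B   0.086   0.091
7750    A   0.0165  0.019
7750    B   0.111   0.161
8000    A   0.0105  0.0135
8000    B   0.118   0.1915
8250    A   0.0085  0.0105
8250    B   0.137   0.224
8500    A   0.0055  0.008
8500    B   0.1835  0.257
8750    A   0.0045  0.0065
8750    B   0.2035  0.291
9000    A   0.0035  0.0055
9000    B   0.002   1.956
"""


def best_levels(df: pd.DataFrame, negate: bool = True):
    """Hypothetical wrapper around the answer's steps; returns (level1, level2, value)."""
    tmp = (
        df.rename({"price1": "sell", "price2": "buy"}, axis=1)
        .set_index(["level", "type"])
        .unstack()
    )
    x = tmp[("buy", "A")] - tmp[("sell", "B")]
    y = tmp[("buy", "B")] - tmp[("sell", "A")]

    # sums[i, j] = x[j] + y[i]; keep only the strict lower triangle (i > j).
    sums = x.to_numpy() + y.to_numpy()[:, None]
    if negate:
        sums = -sums  # same sign convention as the answer
    mask = np.triu(np.ones((len(y), len(x)), dtype=bool))
    z = ma.array(sums, mask=mask)

    i, j = np.unravel_index(z.argmax(), z.shape)
    # Row i comes from y (buy_B, sell_A); column j comes from x (buy_A, sell_B).
    return int(tmp.index[i]), int(tmp.index[j]), float(z[i, j])


df = pd.read_csv(io.StringIO(TABLE), sep=r"\s+")
level1, level2, value = best_levels(df)
print(level1, level2, round(value, 4))  # 7250 7000 -0.043
```

With negate=False the maximum inside the same triangle is 2.235, at level1 = 9000 and level2 = 5250.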


Why is df_B.loc[row, 'price2'] equal to both buy_B and sell_B? Is that a typo?
Yes, sorry.