Python: run a simulation based on the values in a column


python pandas numpy


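I have written some code that simulates values in a pandas DataFrame based on a set of conditions. I now want to run this code only for specific values in a column called df['Use Type']. This is what I currently have: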

import numpy as np

# df is an existing DataFrame with (at least) the columns 'A' and 'Use Type'
def l_sim():
    n = 100
    for i in range(n):
        # one uniform draw in [0, 1) per row
        df['RAND'] = np.random.uniform(0, 1, size=df.index.size)

        # map each draw to a level L0..L5
        conditions = [df['RAND'] >= (1 - 0.8062),
                      (df['RAND'] < (1 - 0.8062)) & (df['RAND'] >= 0.1),
                      (df['RAND'] < 0.1) & (df['RAND'] >= 0.05),
                      (df['RAND'] < 0.05) & (df['RAND'] >= 0.025),
                      (df['RAND'] < 0.025) & (df['RAND'] >= 0.0125),
                      df['RAND'] < 0.0125]
        choices = ['L0', 'L1', 'L2', 'L3', 'L4', 'L5']
        df['L'] = np.select(conditions, choices, default='')

        # scale column A by a factor that depends on the level
        conditions = [df['L'] == 'L0', df['L'] == 'L1', df['L'] == 'L2', df['L'] == 'L3',
                      df['L'] == 'L4', df['L'] == 'L5']
        choices = [df['A'] * 0.02, df['A'] * 0.15, df['A'] * 0.20, df['A'] * 0.50,
                   df['A'] * 1, df['A'] * 1]
        df['AL'] = np.select(conditions, choices)


l_sim()

How can I run this code only on the rows selected by df.loc[df['Use Type'] == 'Commercial Property']?

Thanks in advance.

I think you need to structure your code differently. In general, though, you can use df.apply with a lambda function, following this pattern:

df['L'] = df.apply(lambda row: l_sim(row), axis=1)

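I would split your code into three functions, one of them for df['L'].

The third one holds the logic that creates a value only when row['Use Type'] == 'Commercial Property'.

To run it: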

df['L'] = df.apply(lambda row: l_sim(row), axis=1)

df['AL'] = df.apply(lambda row: l_sim(row), axis=1)
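The bodies of those three functions are not shown here. As a hypothetical sketch only, a row-wise split could look like the following. The names draw_level, level_amount and simulate_row are illustrative and not from the answer; the thresholds and factors are copied from the question's code.

```python
import numpy as np
import pandas as pd

# (lower bound on RAND, level) pairs from the question, checked from the top down.
LEVELS = [(1 - 0.8062, 'L0'), (0.1, 'L1'), (0.05, 'L2'),
          (0.025, 'L3'), (0.0125, 'L4'), (0.0, 'L5')]
# Multiplier applied to column 'A' for each level, also from the question.
FACTORS = {'L0': 0.02, 'L1': 0.15, 'L2': 0.20, 'L3': 0.50, 'L4': 1.0, 'L5': 1.0}


def draw_level(rand):
    """Hypothetical function 1: map one uniform draw in [0, 1) to a level label."""
    for lower, label in LEVELS:
        if rand >= lower:
            return label
    return 'L5'  # not reached for draws in [0, 1)


def level_amount(row):
    """Hypothetical function 2: scale 'A' by the row's level factor (NaN if the row has no level)."""
    return row['A'] * FACTORS.get(row['L'], np.nan)


def simulate_row(row, use_type='Commercial Property'):
    """Hypothetical function 3: draw a level only for rows of the requested use type."""
    if row['Use Type'] != use_type:
        return np.nan
    return draw_level(np.random.uniform(0, 1))


df = pd.DataFrame({'Use Type': ['Commercial Property'] * 3 + ['other'] * 2, 'A': 1})
df['L'] = df.apply(simulate_row, axis=1)   # NaN for non-commercial rows
df['AL'] = df.apply(level_amount, axis=1)  # NaN wherever L is NaN
print(df)
```

Because df.apply with axis=1 calls Python once per row, this style is typically slower than a vectorized np.select on large DataFrames; its advantage is that each rule is a small, separately testable function.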

Assuming your DataFrame has at least the two columns 'A' and 'Use Type', for example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Use Type':['Commercial Property']*3+['other']*2, 'A':1})

then modify the function as follows:

def l_sim(df, use_type=None):
    # run on the whole DataFrame, or only on the rows of one specific Use Type
    if use_type:
        mask = df['Use Type'] == use_type
    else:
        mask = slice(None)
    # generate the random values (only for the selected rows)
    df.loc[mask, 'RAND'] = np.random.uniform(0, 1, size=df[mask].index.size)
    # create the conditions (the same ones serve both L and AL, by the way);
    # np.select takes the first condition that is True, so lower bounds are enough
    conditions = [df['RAND'] >= (1 - 0.8062), df['RAND'] >= 0.1, df['RAND'] >= 0.05,
                  df['RAND'] >= 0.025, df['RAND'] >= 0.0125, df['RAND'] < 0.0125]
    # choices for column L; fill only the selected rows
    choices_L = ['L0', 'L1', 'L2', 'L3', 'L4', 'L5']
    df.loc[mask, 'L'] = np.select(conditions, choices_L, default='')[mask]
    # choices for column AL; fill only the selected rows
    choices_A = [df['A'] * 0.02, df['A'] * 0.15, df['A'] * 0.20, df['A'] * 0.50,
                 df['A'] * 1, df['A'] * 1]
    df.loc[mask, 'AL'] = np.select(conditions, choices_A)[mask]

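It can then be called for a single use type, e.g. l_sim(df, 'Commercial Property'), or without a use type, l_sim(df), to fill every row.

I removed the for loop because I don't see its purpose, and I simplified your conditions the same way as in your previous question.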
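The simplification is safe because np.select evaluates the conditions in order and, for each element, returns the choice of the first condition that is True; the upper bounds in the question's version are therefore redundant. A quick check on simulated draws (the seed and sample size are arbitrary choices for this sketch):

```python
import numpy as np

rand = np.random.default_rng(0).uniform(0, 1, size=10_000)
choices = ['L0', 'L1', 'L2', 'L3', 'L4', 'L5']

# Question's version: every band has both an upper and a lower bound.
original = [rand >= (1 - 0.8062),
            (rand < (1 - 0.8062)) & (rand >= 0.1),
            (rand < 0.1) & (rand >= 0.05),
            (rand < 0.05) & (rand >= 0.025),
            (rand < 0.025) & (rand >= 0.0125),
            rand < 0.0125]
# Answer's version: lower bounds only, relying on np.select's first-match rule.
simplified = [rand >= (1 - 0.8062), rand >= 0.1, rand >= 0.05,
              rand >= 0.025, rand >= 0.0125, rand < 0.0125]

# An explicit string default keeps the string choices from being combined with the integer default 0.
a = np.select(original, choices, default='')
b = np.select(simplified, choices, default='')
print((a == b).all())  # prints True
```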

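Comment: Why is there a loop in your code? The loop variable never seems to be used.

Comment: @Ben.T For each i in range(100) I get a different set of random numbers, so every row in the DataFrame gets a different 'L' value.

Comment: OK, but if every iteration rewrites the 'L' values in the same column, the values from the previous iteration are erased. The same goes for column AL: your code overwrites it as well, regardless of the previous iterations.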

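Example runs, first for 'Commercial Property' only and then for all rows (RAND is random, so the values differ on every run):

l_sim(df,'Commercial Property')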
print (df)
              Use Type  A      RAND    L    AL
0  Commercial Property  1  0.036593   L3  0.50
1  Commercial Property  1  0.114773   L1  0.15
2  Commercial Property  1  0.651873   L0  0.02
3                other  1       NaN  NaN   NaN
4                other  1       NaN  NaN   NaN

l_sim(df)
print (df)
              Use Type  A      RAND   L    AL
0  Commercial Property  1  0.123265  L1  0.15
1  Commercial Property  1  0.906185  L0  0.02
2  Commercial Property  1  0.107588  L1  0.15
3                other  1  0.434560  L0  0.02
4                other  1  0.304901  L0  0.02
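On the loop discussed in the comments: rewriting df['L'] and df['AL'] on every iteration keeps only the last one. If the intent was to keep all 100 runs (an assumption; the thread does not say so), one option is to draw every run at once as a rows × runs matrix and summarize it afterwards. The helper simulate_runs, its seed parameter and the AL_mean column are hypothetical names for this sketch; the thresholds and factors are the question's.

```python
import numpy as np
import pandas as pd

# Lower bounds on RAND for L0..L4 (anything below the last bound is L5), from the question.
BOUNDS = [1 - 0.8062, 0.1, 0.05, 0.025, 0.0125]
LABELS = np.array(['L0', 'L1', 'L2', 'L3', 'L4', 'L5'])
FACTORS = np.array([0.02, 0.15, 0.20, 0.50, 1.0, 1.0])  # multiplier on 'A' per level


def simulate_runs(df, use_type, n=100, seed=None):
    """Hypothetical helper: n independent runs for the rows of one use type, all kept."""
    rng = np.random.default_rng(seed)
    mask = (df['Use Type'] == use_type).to_numpy()
    # one row per matching record, one column per run
    rand = rng.uniform(0, 1, size=(mask.sum(), n))
    # level index 0..5; np.select takes the first condition that is True
    level = np.select([rand >= b for b in BOUNDS], [0, 1, 2, 3, 4], default=5)
    # AL for every record and every run
    amounts = df['A'].to_numpy()[mask][:, None] * FACTORS[level]
    # level label of every run, indexed like the matching rows of df
    runs = pd.DataFrame(LABELS[level], index=df.index[mask])
    out = df.copy()
    out.loc[mask, 'AL_mean'] = amounts.mean(axis=1)  # average AL over the n runs
    return out, runs


df = pd.DataFrame({'Use Type': ['Commercial Property'] * 3 + ['other'] * 2, 'A': 1})
out, runs = simulate_runs(df, 'Commercial Property', n=100, seed=0)
print(out)
print(runs.loc[0].value_counts())  # how often each level occurred for the first row
```

Keeping the full runs table lets you compute whatever statistic you need afterwards (level frequencies, mean or quantiles of AL) instead of only seeing the last iteration.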