Python: Looping an if statement over each row of a DataFrame

python pandas numpy

Hi, I'm new to pandas, coming from a SAS background, and I'm trying to segment a continuous variable into bands using the following code:

var_range = df['BILL_AMT1'].max() - df['BILL_AMT1'].min()
a= 10
for i in range(1,a):
    inc = var_range/a
    lower_bound = df['BILL_AMT1'].min() + (i-1)*inc
    print('Lower bound is '+str(lower_bound))
    upper_bound = df['BILL_AMT1'].max() + (i)*inc
    print('Upper bound is '+str(upper_bound))
    if (lower_bound <= df['BILL_AMT1'] < upper_bound):
        df['bill_class'] = i
    i+=1

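I think the if condition is being evaluated correctly, and that the error comes from assigning the value of the for loop counter to the new column.

Can anyone explain what is going wrong, or suggest an alternative?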

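To avoid the ValueError, change

if (lower_bound <= df['BILL_AMT1'] < upper_bound):
    df['bill_class'] = i

to

mask = (lower_bound <= df['BILL_AMT1']) & (df['BILL_AMT1'] < upper_bound)

Python evaluates the chained comparison as two comparisons joined by and, and and needs a single True or False, which a Series of many values cannot supply. Comparing element-wise and combining with & gives one boolean per row instead. Then use df.loc to assign to the bill_class column in the rows where mask is True:

df.loc[mask, 'bill_class'] = i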
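
As a rough illustration of why the original condition fails and how the mask version behaves, here is a small self-contained sketch. The DataFrame contents, the bounds, and the class value 1 are made up for the example; only the column names come from the question.

```python
import pandas as pd

# Hypothetical data; only the column name comes from the question.
df = pd.DataFrame({'BILL_AMT1': [0.0, 25.0, 50.0, 75.0, 100.0]})
lower_bound, upper_bound = 20.0, 80.0  # made-up bounds for illustration

# The chained comparison needs one True/False answer, but a Series holds
# one answer per row, so pandas raises ValueError.
try:
    if lower_bound <= df['BILL_AMT1'] < upper_bound:
        pass
    raised = False
except ValueError:
    raised = True
print('ValueError raised:', raised)

# Element-wise version: one boolean per row, combined with & instead of `and`.
mask = (lower_bound <= df['BILL_AMT1']) & (df['BILL_AMT1'] < upper_bound)

# Assign only to the rows where mask is True; the other rows stay NaN.
df.loc[mask, 'bill_class'] = 1
print(df)
```

Rows outside the bounds are left as NaN, which is also why the new column ends up with a float dtype here.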

To bin the data in df['BILL_AMT1'], you can drop the Python for loop entirely and use pd.cut:

df['bill_class'] = pd.cut(df['BILL_AMT1'], bins=10, labels=False)+1
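
To see which bounds pd.cut actually picked, it can help to ask for the bin edges as well. The sketch below uses made-up values in a hypothetical DataFrame; retbins=True is a standard pd.cut parameter that returns the edges alongside the codes.

```python
import pandas as pd

# Hypothetical data; only the column name comes from the question.
df = pd.DataFrame({'BILL_AMT1': [0.0, 5.0, 10.0, 55.0, 99.0, 100.0]})

# labels=False returns integer codes 0..9; adding 1 gives classes 1..10.
# retbins=True additionally returns the 11 bin edges that pd.cut computed.
codes, edges = pd.cut(df['BILL_AMT1'], bins=10, labels=False, retbins=True)
df['bill_class'] = codes + 1

print(df)
print(edges)
```

By default pd.cut uses right-closed intervals, so a value sitting exactly on an interior edge goes to the lower band. That differs from a hand-written lower <= x < upper test; passing right=False should give half-open intervals of that second kind.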

IIUC, this should be a fix for your code:

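mx, mn = df['BILL_AMT1'].max(), df['BILL_AMT1'].min()
rng = mx - mn
a = 10           # number of equal-width bands
inc = rng / a    # width of one band (does not change inside the loop)

for i in range(a):
    lower_bound = mn + i * inc
    print('Lower bound is ' + str(lower_bound))
    # Pin the last upper bound to the exact maximum to avoid float drift.
    upper_bound = mn + (i + 1) * inc if i + 1 < a else mx
    print('Upper bound is ' + str(upper_bound))
    col = df['BILL_AMT1']
    ge = col.ge(lower_bound)
    # Bands are half-open [lower, upper); the last band is closed so the
    # row holding the maximum is not left without a class.
    below = col.lt(upper_bound) if i + 1 < a else col.le(upper_bound)
    df.loc[ge & below, 'bill_class'] = i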

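@DSM: Yes, totally my mistake.

Much better. :-) Though we should probably recommend a vectorized approach (whether pd.cut or np.digitize; I see you already have at least one pd.cut answer to cite...)

Thanks. I ended up using the approach suggested by @DMS, because I didn't fully understand the .loc and mask parts.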
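
The comment above mentions NumPy's digitize function as another vectorized option. A minimal sketch, assuming equal-width bands between the column's minimum and maximum; the data and the names values, n_bands, and edges are illustrative and do not come from the thread:

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column name comes from the question.
df = pd.DataFrame({'BILL_AMT1': [0.0, 5.0, 10.0, 55.0, 99.0, 100.0]})

values = df['BILL_AMT1'].to_numpy()
n_bands = 10

# 11 equally spaced edges from the minimum to the maximum.
edges = np.linspace(values.min(), values.max(), n_bands + 1)

# Passing only the 9 interior edges yields codes 0..9 for half-open bands
# [lower, upper); the maximum itself falls into the last band.
df['bill_class'] = np.digitize(values, edges[1:-1]) + 1

print(df)
```

Leaving the two outer edges out of the call is a design choice for this sketch: it keeps the minimum in the first band and the maximum in the last one without any special casing.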

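Another option is pd.qcut, which bins on quantiles (roughly equal numbers of rows per band) rather than on equal-width intervals:

df['bill_class'] = pd.qcut(df['BILL_AMT1'], 10, list(range(10)))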
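
Whether pd.qcut or pd.cut is the better fit depends on whether the bands should hold equal counts or span equal widths. The sketch below contrasts the two on deliberately skewed, made-up data with just two bands:

```python
import pandas as pd

# Hypothetical skewed data: nine small values and one large outlier.
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000], name='BILL_AMT1')

# Equal-width bands: the outlier stretches the range, so almost every
# row lands in the first band.
equal_width = pd.cut(s, bins=2, labels=False)

# Equal-count bands: the split point is the median, so the rows are
# divided evenly regardless of the outlier.
equal_count = pd.qcut(s, 2, labels=False)

print(equal_width.value_counts().sort_index())
print(equal_count.value_counts().sort_index())
```

One thing to watch with pd.qcut: if many rows share the same value, some quantile edges can coincide and the call raises an error unless duplicates='drop' is passed.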