Is there an efficient/optimal way in Python to assign scores to values in a DataFrame column?

python pandas dataframe numpy

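I have a DataFrame column containing customers' "Relationship Length" values. I want to convert these values to a number from 1 to 4, depending on whether they are below the mean relationship length of terminated customers, above the mean, more than 1 standard deviation above it, or more than 2 standard deviations above it. Is there a simpler/faster way to do this without a for loop?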

Here is my current code:

average = terminatedDf['Relationship Length'].mean()

standardDeviation = terminatedDf['Relationship Length'].std()

lorScores = {np.arange(0, average): 1, np.arange(average, standardDeviation): 2, np.arange(standardDeviation, standardDeviation*2): 3, np.arange(standardDeviation*2, 150): 4}

reportDf['Length of Relationship Score'] = reportDf['Relationship Length'].map(lorScores)

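My problem is that NumPy arrays are not hashable, so they cannot be used as dictionary keys, while the built-in range function only accepts integers.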

I suppose I could loop over the DataFrame, since it only has about 1,500 rows, like this:

for row in reportDf:
    if row[5] < average: 
        row[15] = 1
    else:
     ....

The mean length is about 6.3 and the standard deviation is about 3.4.

I would write a function for this and apply it as follows:

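def get_score(x):
    # Bands are measured from the mean: mean, mean + 1 SD, mean + 2 SD
    if x <= average:
        return 1
    if average < x <= average + standardDeviation:
        return 2
    if average + standardDeviation < x <= average + standardDeviation*2:
        return 3
    if average + standardDeviation*2 < x <= 150:
        return 4
    # Values above 150 fall through and return None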

reportDf['Length of Relationship Score'] = reportDf['Relationship Length'].apply(get_score)
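The same banding can probably be expressed without a per-row Python function. Below is a minimal sketch using np.select, assuming the bands are mean, mean + 1 SD, and mean + 2 SD, and that there is no upper cap at 150. The sample values are hypothetical, since the real reportDf is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real reportDf is not shown in the question.
reportDf = pd.DataFrame({'Relationship Length': [0.78, 0.73, 19.36, 7.36]})

lengths = reportDf['Relationship Length']
average = lengths.mean()
standardDeviation = lengths.std()

# Conditions are evaluated in order; the first one that holds wins.
conditions = [
    lengths <= average,
    lengths <= average + standardDeviation,
    lengths <= average + 2 * standardDeviation,
]
choices = [1, 2, 3]

# Anything more than two standard deviations above the mean gets 4.
# Note: missing values fail every comparison, so they would also get 4.
reportDf['Length of Relationship Score'] = np.select(conditions, choices, default=4)

print(reportDf)
```

Whether this is noticeably faster than apply on roughly 1,500 rows is not something the thread measures; the main gain is that the thresholds sit in one place.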

Output

     x category
0    1      few
1    9      few
2   10     tens
3   17     tens
4   45     tens
5   99     tens
6  100      NaN
7  121      NaN

Idk if you can do this without any loops, tbh.

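But you basically shouldn't use a raw loop like for row in reportDf: (iterating a DataFrame directly yields column labels, not rows). Row-by-row loops end up very slow, and there is nearly always a better option, such as Series.items() (called iteritems() before pandas 2.0) or a list comprehension. Since this needs several if conditions, I don't think it fits in a list comprehension:

scores = []
# Series.items() yields (index, value) pairs; iteritems() was removed in pandas 2.0
for index, duration in terminatedDf['Relationship Length'].items():
    if duration < average:
        scores.append(1)
    elif duration < average + standardDeviation:
        scores.append(2)
    elif duration < average + 2*standardDeviation:
        scores.append(3)
    else:
        # Two or more standard deviations above the mean
        scores.append(4)

terminatedDf["Scores"] = scores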
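For comparison, the loop above can likely be collapsed into a single call to np.digitize, which looks up the band each value falls into. This is only a sketch: the sample values are hypothetical, and the name thresholds is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for terminatedDf.
terminatedDf = pd.DataFrame({'Relationship Length': [0.78, 0.73, 19.36, 7.36]})

durations = terminatedDf['Relationship Length']
average = durations.mean()
standardDeviation = durations.std()

# Ascending cut points separating scores 1 | 2 | 3 | 4.
thresholds = [
    average,
    average + standardDeviation,
    average + 2 * standardDeviation,
]

# np.digitize returns 0 for values below the first threshold, 1 for the
# next band, and so on; adding 1 shifts the result onto the 1-4 scale.
terminatedDf['Scores'] = np.digitize(durations, thresholds) + 1

print(terminatedDf)
```

With the default right=False, a value exactly equal to a threshold lands in the higher band, which matches a strict less-than comparison.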

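Can you add a sample of your dataframe?

@SagunDevkota Yes, give me a minute to anonymize the data.

Also, some values for terminatedDf or average and standardDeviation, as well as the expected output for resultDf, would be very helpful for testing. Additional question: what happens when average is greater than standardDeviation?

@HenryEcker Good point, it should be average + standard deviation. Didn't catch that.

I had never been introduced to cut(). Thank you, this solved the problem, and I can now use it for my other columns!

Glad I could help!

I'm not sure what the best etiquette is for accepting a solution that both you and @HenryEcker suggested, but thank you.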

import pandas as pd

reportDf = pd.DataFrame({
    'Owner': ['Bob', 'Jane', 'Alice', 'Fred'],
    'Name': ['John', 'Johnny', 'Suzie', 'Larry'],
    'Relationship Length': [0.78, 0.73, 19.36, 7.36]
})

average = reportDf['Relationship Length'].mean()

standardDeviation = reportDf['Relationship Length'].std()

bins = [0, average, average + standardDeviation,
        average + (standardDeviation * 2), 150]
labels = [1, 2, 3, 4]

reportDf['Length of Relationship Score'] = pd.cut(
    reportDf['Relationship Length'],
    bins=bins,
    labels=labels,
    # right=False means upperbound non-inclusive
    # In the question you have row[5] < average
    # So this is set to use strictly less than
    # Remove if this is not the desired behaviour
    right=False
)

print(reportDf)

   Owner    Name  Relationship Length Length of Relationship Score
0    Bob    John                 0.78                            1
1   Jane  Johnny                 0.73                            1
2  Alice   Suzie                19.36                            3
3   Fred   Larry                 7.36                            2
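The question computes the mean and standard deviation from terminatedDf but scores reportDf. A variation along those lines might look like the sketch below. Both frames hold made-up values, and replacing the 0 and 150 edges with infinite edges is an assumption made here so that out-of-range values are still scored instead of becoming NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frames; names mirror the question, values are made up.
terminatedDf = pd.DataFrame({'Relationship Length': [2.0, 4.0, 6.0, 8.0]})
reportDf = pd.DataFrame({'Relationship Length': [1.0, 6.5, 9.0, 200.0]})

# Statistics come from the terminated customers only.
average = terminatedDf['Relationship Length'].mean()
standardDeviation = terminatedDf['Relationship Length'].std()

# Open-ended outer edges (an assumption, not part of the original answer).
bins = [-np.inf, average, average + standardDeviation,
        average + 2 * standardDeviation, np.inf]

reportDf['Length of Relationship Score'] = pd.cut(
    reportDf['Relationship Length'],
    bins=bins,
    labels=[1, 2, 3, 4],
    right=False,  # each bin includes its lower edge and excludes its upper edge
)

print(reportDf)
```

The resulting column is categorical; calling .astype(int) on it should give plain integers when no values are missing.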