Python: perform a calculation on multiple DataFrame columns using lists of values, without iteration


python pandas


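I'm new to Python (using version 3.7). I created a dataframe by loading a list from a csv file. I want to update a column in the dataframe ("Score") that will hold the sum of calculations performed on specific column values in the dataframe. Here is the code snippet: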

# load library
import pandas as pd

# get the data (raw string, so the backslash is not read as an escape sequence)
file_name = r"c:\myfile.csv"
df = pd.read_csv(file_name)

# get the variable parameters
sVariableList = ["depth", "rpm", "pressure", "flow_rate", "lag"]
sWeights = [.20, .20, .30, .15, .15]
sMeans = [57.33283924063220, 7159.6003409761900, 20.270635083327700, 55.102824912342000, 90.67]
sSTD = [101.803564244615000, 3124.14373264349000, 32.461940805541400, 93.338695138920900, 61.273]

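The dataframe contains more columns than the items listed in sVariableList. The variable list only names the fields I want to perform the calculation on. What I want to do is compute a score for each row and store the value in the "Score" column. Here is what I am doing now, which gives the correct result: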

# loop through the records and perform the calculation
for row in range(len(df)):
    ind = 0
    nScore = 0.0
    for fieldname in sVariableList:
        # calculate the score (weighted squared z-score of this field)
        nScore = nScore + (sWeights[ind] * (((df.loc[row, fieldname] - sMeans[ind]) / sSTD[ind])**2))
        ind = ind + 1  # move to the next variable/field index

    # set the result to the field value
    df.loc[row, "Score"] = nScore
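
Written as a formula, the loop above computes the following for each row r. The symbols are notation introduced here for illustration and do not appear in the original code:

$$
\mathrm{Score}_r = \sum_{i} w_i \left( \frac{x_{r,i} - \mu_i}{\sigma_i} \right)^2
$$

Here x_{r,i} is the value of row r in the column named by sVariableList[i], and w_i, \mu_i and \sigma_i are the matching entries of sWeights, sMeans and sSTD.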

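But it is slow. I have a dataset with 900,000 records.

I have found articles that discuss list comprehensions as a possible alternative to iteration, but I am not yet familiar enough with the language to implement one. Any suggestions are welcome.

Thanks

Do the calculation on the underlying numpy data and assign only the final result to the dataframe: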

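import numpy as np

x = np.array([sWeights, sMeans, sSTD])  # shape (3, 5): weights, means, standard deviations
y = df[sVariableList].to_numpy()  # shape (n_rows, 5), columns in sVariableList order
df['Score'] = (x[0] * ((y - x[1]) / x[2])**2).sum(axis=1)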

For 900,000 records, this takes about 0.15 seconds on my machine.
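
One way to convince yourself that the vectorized expression matches the loop from the question is to run both on a small frame and compare. The sketch below is only an illustration: the data is random rather than loaded from the CSV, and the `well_id` column is a hypothetical extra column added to show that fields outside sVariableList are ignored.

```python
import numpy as np
import pandas as pd

# Parameters copied from the question
sVariableList = ["depth", "rpm", "pressure", "flow_rate", "lag"]
sWeights = [.20, .20, .30, .15, .15]
sMeans = [57.33283924063220, 7159.6003409761900, 20.270635083327700, 55.102824912342000, 90.67]
sSTD = [101.803564244615000, 3124.14373264349000, 32.461940805541400, 93.338695138920900, 61.273]

# Assumption: synthetic random data stands in for the CSV file
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(50.0, 20.0, size=(100, 5)), columns=sVariableList)
df["well_id"] = np.arange(len(df))  # hypothetical extra column, not part of the score

# Loop version, as in the question
loop_scores = []
for row in range(len(df)):
    nScore = 0.0
    for ind, fieldname in enumerate(sVariableList):
        nScore += sWeights[ind] * ((df.loc[row, fieldname] - sMeans[ind]) / sSTD[ind]) ** 2
    loop_scores.append(nScore)

# Vectorized version, as in the answer
x = np.array([sWeights, sMeans, sSTD])
y = df[sVariableList].to_numpy()
df["Score"] = (x[0] * ((y - x[1]) / x[2]) ** 2).sum(axis=1)

# Compare the two results up to floating-point tolerance
scores_match = np.allclose(df["Score"].to_numpy(), loop_scores)
print(scores_match)
```

The comparison uses `np.allclose` rather than exact equality because the two versions may add the terms in a different order, which can change the last few floating-point digits.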

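Hello, and thanks for the reply. It worked well for me. Could you explain what is happening? I understand that you created a multidimensional array from the individual lists; how does the process know which column each value applies to? I hope the question isn't too basic. Thanks

It uses numpy's vectorization. For a brief explanation, see the SO answer; for more details, see this introduction: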
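
As a rough illustration of the question raised in the comment, the toy example below shows how NumPy lines values up with columns. The numbers and the names `means`, `stds` and `weights` are made up for the sketch. The key point is that `df[sVariableList]` returns the columns in the order of the list, so position j of each parameter list is paired with column j of the array.

```python
import numpy as np

# Hypothetical data: 3 rows, 2 columns
y = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
means = np.array([2.0, 20.0])    # one value per column
stds = np.array([1.0, 10.0])     # one value per column
weights = np.array([0.5, 0.5])   # one value per column

# y has shape (3, 2) and means has shape (2,).
# Broadcasting aligns the trailing axis, so means[j] and stds[j]
# are applied to column j of every row.
z = (y - means) / stds

# Weight each squared term, then sum across columns (axis=1)
# to get one score per row.
score = (weights * z ** 2).sum(axis=1)

print(z)
print(score)
```

No explicit loop over rows or columns is written; the element-wise arithmetic runs inside NumPy's compiled code, which is where the speedup comes from.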