Python: weighted average with multiple weights and groups

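I'm a beginner in Python and I'm trying to improve my code, so I'd appreciate some advice on how to make the following more efficient.

I have the following dataset:

import numpy as np
import pandas as pd

petdata = {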
    'animal' : ['dog', 'cat', 'fish'],
    'male_1' : [0.57, 0.72, 0.62],
    'female_1' : [0.43, 0.28, 0.38],
    'age_01_1': [0.10,0.16,0.15],
    'age_15_1':[0.17,0.29,0.26],
    'age_510_1':[0.15,0.19,0.19],
    'age_1015_1':[0.18,0.16,0.17],
    'age_1520_1':[0.20,0.11,0.12],
    'age_20+_1':[0.20,0.09,0.10],
    'male_2' : [0.57, 0.72, 0.62],
    'female_2' : [0.43, 0.28, 0.38],
    'age_01_2': [0.10,0.16,0.15],
    'age_15_2':[0.17,0.29,0.26],
    'age_510_2':[0.15,0.19,0.19],
    'age_1015_2':[0.18,0.16,0.17],
    'age_1520_2':[0.20,0.11,0.12],
    'age_20+_2':[0.20,0.09,0.10],
    'weight_1': [10,20,30],
    'weight_2':[40,50,60]
}

df = pd.DataFrame(petdata) 

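I want to compute a weighted average across the animals in my dataset, using weight_1 for every variable ending in "_1" and weight_2 for every variable ending in "_2".

This is how I do it at the moment: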

df['male_wav_1']=np.nansum((df['male_1']*df['weight_1'])/df['weight_1'].sum())
df['female_wav_1']=np.nansum((df['female_1']*df['weight_1'])/df['weight_1'].sum())


df['male_wav_2']=np.nansum((df['male_2']*df['weight_2'])/df['weight_2'].sum())
df['female_wav_2']=np.nansum((df['female_2']*df['weight_2'])/df['weight_2'].sum())
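I repeat this for every column in my DataFrame (i.e. age_01_1_wav, age_15_1_wav, ...). I realize this is not very tidy, so can anyone give me some advice on how to improve the process?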

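I have tried:

  • reshaping the data from wide to long
  • defining a function for the weighted average

but neither attempt succeeded. The reshaping itself is not the problem, I can do that, but it is not clear to me how to apply different weights to different groups in my data.

Thank you very much for your help.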

You can use Python's zip() function for a quick calculation:

petdata = {
    'animal' : ['dog', 'cat', 'fish'],
    'male_1' : [0.57, 0.72, 0.62],
    'age_20+_2':[0.20,0.09,0.10],
    'weight_1': [10,20,30],
    'weight_2':[40,50,60]
}
weight_1 = petdata.get('weight_1')
male_1 = petdata.get('male_1')

# Weighted average: sum of weight * value, divided by the sum of the weights
weighted_sum = 0
for weight, value in zip(weight_1, male_1):
    weighted_sum += weight * value
wav_male_1 = weighted_sum / sum(weight_1)
print(f'Weighted average of male_1: {wav_male_1:.3f}')

Weighted average of male_1: 0.645
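For a single column, NumPy's np.average accepts a weights argument and performs the sum(w * v) / sum(w) computation directly. A minimal sketch, reusing the male_1 and weight_1 values from the question (the variable name wav_male_1 is only illustrative):

```python
import numpy as np

male_1 = [0.57, 0.72, 0.62]
weight_1 = [10, 20, 30]

# np.average with weights computes sum(w * v) / sum(w)
wav_male_1 = np.average(male_1, weights=weight_1)
print(f'{wav_male_1:.3f}')  # prints 0.645
```

Note that np.average does not skip NaN values the way np.nansum does, so this only matches the question's formula when the column has no missing data.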

First, I assume the 'animal' column is your index, so to make the data look like a table I set it as the index:

import pandas as pd
import numpy as np
petdata = {
    # All of your data ^ above
}

df = pd.DataFrame(petdata)  # Creates the DF from your dictionary
df.set_index('animal',inplace=True) # Sets the 'animal' column as the index
Next, I split the DataFrame into two parts, df_1 and df_2:

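# Uses list comprehension to create a list of all column names that end with a given
# suffix, and uses this list to get a sub-DataFrame for each. endswith() is used
# because a substring test such as '_1' in name would also match 'age_15_2'.
# .copy() lets us add a row to each sub-DataFrame later without a SettingWithCopyWarning.
df_1 = df[[name for name in df.columns if name.endswith('_1')]].copy()
df_2 = df[[name for name in df.columns if name.endswith('_2')]].copy()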
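When selecting columns by group, it matters whether the test looks at the end of the name or anywhere in it, because some of the age columns contain '_1' or '_2' in the middle. A small check on a few of the question's column names (the list here is a shortened, illustrative subset):

```python
columns = ['male_1', 'age_15_1', 'age_15_2', 'age_20+_1', 'weight_2']

# Substring test: '_1' also occurs inside 'age_15_2' (as part of '_15')
substring_match = [c for c in columns if '_1' in c]
# Suffix test: only names that really end in '_1'
suffix_match = [c for c in columns if c.endswith('_1')]

print(substring_match)  # ['male_1', 'age_15_1', 'age_15_2', 'age_20+_1']
print(suffix_match)     # ['male_1', 'age_15_1', 'age_20+_1']
```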
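Rather than creating a new Series (column) in the DataFrame for each one that already exists, I prefer to create a new row holding the weighted average (wav) of each column. Since the new row is not an animal, this looks slightly odd, but the label 'wav' will sit in the animal index.

Use list comprehensions and the formula you already have to build two lists of weighted averages: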

wav_1 = [np.nansum(df_1[col]*df_1['weight_1'])/np.nansum(df_1['weight_1']) for col in df_1.columns]
wav_2 = [np.nansum(df_2[col]*df_2['weight_2'])/np.nansum(df_2['weight_2']) for col in df_2.columns]
Then append this data to the two DataFrames under the new 'wav' label:

df_1.loc['wav'] = wav_1
df_2.loc['wav'] = wav_2
Note that the cell at row 'wav', column 'weight_x' holds meaningless data: it is the weighted average of your weights.
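One possible way to avoid both the meaningless weight cell and the per-column list comprehension is to multiply a whole block of columns by its weight column at once. This is only a sketch on a shortened subset of the question's data; the helper name weighted_averages is hypothetical, and it assumes every value column is named '<name>_<suffix>' with a matching 'weight_<suffix>' column:

```python
import pandas as pd

# Shortened subset of the question's data, for illustration only
df = pd.DataFrame({
    'animal': ['dog', 'cat', 'fish'],
    'male_1': [0.57, 0.72, 0.62],
    'female_1': [0.43, 0.28, 0.38],
    'male_2': [0.57, 0.72, 0.62],
    'female_2': [0.43, 0.28, 0.38],
    'weight_1': [10, 20, 30],
    'weight_2': [40, 50, 60],
}).set_index('animal')

def weighted_averages(frame, suffix):
    """Weighted average of every '<name>_<suffix>' column, using 'weight_<suffix>'."""
    weights = frame[f'weight_{suffix}']
    value_cols = [c for c in frame.columns
                  if c.endswith(f'_{suffix}') and not c.startswith('weight_')]
    # Multiply each row by its weight, sum down the columns, divide by the total weight.
    # DataFrame.sum skips NaN by default, similar to np.nansum.
    return frame[value_cols].mul(weights, axis=0).sum() / weights.sum()

# One Series holding the weighted average of every value column in both groups
wav = pd.concat([weighted_averages(df, s) for s in ('1', '2')])
print(wav.round(3))
```

The result is a Series indexed by column name, so the weight columns never appear in it and no per-variable line is needed.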


Welcome to Python! Hope this helps.
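The wide-to-long route mentioned in the question can also carry the weights along: after melting, the weight rows can be split off and merged back onto the value rows by animal and group. The following is a sketch under the same naming assumption ('<name>_<group>' columns, with 'weight_<group>' as the weights), on a shortened subset of the data; the intermediate names long_df, weights and values are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'animal': ['dog', 'cat', 'fish'],
    'male_1': [0.57, 0.72, 0.62],
    'female_1': [0.43, 0.28, 0.38],
    'male_2': [0.57, 0.72, 0.62],
    'female_2': [0.43, 0.28, 0.38],
    'weight_1': [10, 20, 30],
    'weight_2': [40, 50, 60],
})

# Wide to long: one row per (animal, column name)
long_df = df.melt(id_vars='animal', var_name='column', value_name='value')
# Split e.g. 'male_1' into variable='male' and group='1' at the last underscore
long_df[['variable', 'group']] = long_df['column'].str.rsplit('_', n=1, expand=True)

# Separate the weight rows and attach them to the value rows by animal and group
weights = (long_df[long_df['variable'] == 'weight']
           .rename(columns={'value': 'weight'})[['animal', 'group', 'weight']])
values = long_df[long_df['variable'] != 'weight'].merge(weights, on=['animal', 'group'])

# Weighted average per (variable, group)
values['weighted'] = values['value'] * values['weight']
sums = values.groupby(['variable', 'group'])[['weighted', 'weight']].sum()
wav = sums['weighted'] / sums['weight']
print(wav.round(3))
```

Here each group picks up its own weight through the merge key, which is the step the question found unclear.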

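But wouldn't I still need to do this for every variable? I.e. zip(weight_1, female_1), then the same for each age column, and so on. So I don't think it improves efficiency much?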
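The repetition can be avoided even in plain Python by looping over the dictionary and picking the weight list from each key's suffix. A rough sketch, assuming every value key ends in '_1' or '_2' and that a matching 'weight_1' or 'weight_2' key exists (the results dictionary and the '_wav' key naming are illustrative):

```python
petdata = {
    'animal': ['dog', 'cat', 'fish'],
    'male_1': [0.57, 0.72, 0.62],
    'female_1': [0.43, 0.28, 0.38],
    'age_20+_2': [0.20, 0.09, 0.10],
    'weight_1': [10, 20, 30],
    'weight_2': [40, 50, 60],
}

results = {}
for name, values in petdata.items():
    if name == 'animal' or name.startswith('weight_'):
        continue  # skip the labels and the weight lists themselves
    suffix = name.rsplit('_', 1)[1]        # '1' or '2'
    weights = petdata[f'weight_{suffix}']  # weight list matching this key
    results[f'{name}_wav'] = sum(w * v for w, v in zip(weights, values)) / sum(weights)

for name, value in results.items():
    print(f'{name}: {value:.3f}')
```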