R: How to compute a weighted average from two related lists of matrices
I have two lists of matrices. Both contain the same categories, with the objects in the same order. The objects are always matrices.
**mylist_1**
$region_1
users 50 20 30
revenue 10000 3500 4000
$region_2
users 20 20 60
revenue 5000 4000 10000
**mylist_2**
$region_1
% female 0.1 0.3 0.8
income 10000 25000 30000
$region_2
% female 0.5 0.4 0.3
income 50000 20000 23000
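I want to use the users in mylist_1 to compute weighted averages of the metrics in mylist_2, i.e. % female and income. The users of region_1 in mylist_1 correspond to the metrics of region_1 in mylist_2, and the same holds for region_2. So for each metric and each column, I need the average across region_1 and region_2, weighted by the users in that column. For example, to get the first number for income: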
(50*10000+20*50000)/(50+20)
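Written generally, the calculation above is a users-weighted mean across regions, taken column by column. The symbols are notation introduced here for illustration, not part of the original post: $u_{r,j}$ is the users value of region $r$ in column $j$, and $m_{r,j}$ is the metric (% female or income) of the same region and column.

```latex
\bar{m}_j = \frac{\sum_{r} u_{r,j}\, m_{r,j}}{\sum_{r} u_{r,j}},
\qquad
\bar{m}^{\text{income}}_1 = \frac{50 \cdot 10000 + 20 \cdot 50000}{50 + 20} \approx 21429
```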
The output should look like this:
% female 0.21 0.35 0.47
income 21429 22500 25333
The following approach may help:
mylist = list(
list1=list(
region_1 = list(
users = c(20,50,100),
revenue = c(10000, 25000, 15000)
),
region_2 = list(
users = c(25,40,85),
revenue = c(15000, 15000, 5000)
)
),
list2= list(
region_1 = list(
pct_females = c(.2,.50,.100),
income = c(10000, 25000, 15000)
),
region_2 = list(
pct_females = c(.25,.40,.85),
income = c(15000, 15000, 5000)
)
)
)
mylist
$list1
$list1$region_1
$list1$region_1$users
[1] 20 50 100
$list1$region_1$revenue
[1] 10000 25000 15000
$list1$region_2
$list1$region_2$users
[1] 25 40 85
$list1$region_2$revenue
[1] 15000 15000 5000
$list2
$list2$region_1
$list2$region_1$pct_females
[1] 0.2 0.5 0.1
$list2$region_1$income
[1] 10000 25000 15000
$list2$region_2
$list2$region_2$pct_females
[1] 0.25 0.40 0.85
$list2$region_2$income
[1] 15000 15000 5000
ddf = data.frame(mylist)
ddf
list1.region_1.users list1.region_1.revenue list1.region_2.users list1.region_2.revenue list2.region_1.pct_females
1 20 10000 25 15000 0.2
2 50 25000 40 15000 0.5
3 100 15000 85 5000 0.1
list2.region_1.income list2.region_2.pct_females list2.region_2.income
1 10000 0.25 15000
2 25000 0.40 15000
3 15000 0.85 5000
# for income:
(ddf[,"list1.region_1.users"] * ddf[,"list2.region_1.income"] +
ddf[,"list1.region_2.users"] * ddf[,"list2.region_2.income"]) /
(ddf[,"list1.region_1.users"]+ ddf[,"list1.region_2.users"])
[1] 12777.78 20555.56 10405.41
# for percent females:
(ddf[,"list1.region_1.users"] * ddf[,"list2.region_1.pct_females"] +
ddf[,"list1.region_2.users"] * ddf[,"list2.region_2.pct_females"]) /
(ddf[,"list1.region_1.users"]+ ddf[,"list1.region_2.users"])
[1] 0.2277778 0.4555556 0.4445946
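The two calculations above spell out every column name by hand. A minimal sketch of the same arithmetic on the nested list itself could look like the following; `weighted_by_users` is a hypothetical helper name, not part of the original answer, and it assumes both sub-lists hold the regions in the same order.

```r
# Same example data as above, repeated so the block runs on its own.
mylist <- list(
  list1 = list(
    region_1 = list(users = c(20, 50, 100), revenue = c(10000, 25000, 15000)),
    region_2 = list(users = c(25, 40, 85),  revenue = c(15000, 15000, 5000))
  ),
  list2 = list(
    region_1 = list(pct_females = c(.2, .5, .1),    income = c(10000, 25000, 15000)),
    region_2 = list(pct_females = c(.25, .40, .85), income = c(15000, 15000, 5000))
  )
)

# Hypothetical helper: users-weighted average of one metric across regions.
# Assumes `weights` and `values` list the regions in the same order.
weighted_by_users <- function(metric, weights = mylist$list1, values = mylist$list2) {
  # Stack the per-region vectors into matrices: one row per region.
  w <- do.call(rbind, lapply(weights, `[[`, "users"))
  v <- do.call(rbind, lapply(values, `[[`, metric))
  # Column-wise: sum(users * metric) / sum(users)
  colSums(w * v) / colSums(w)
}

weighted_by_users("income")
weighted_by_users("pct_females")
```

Because the regions are stacked as rows, the same function should work unchanged if more regions are added to both sub-lists.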
For a data frame holding the same information as in the question:
ddf = structure(list(region = c("region_1", "region_1", "region_1",
"region_2", "region_2", "region_2"), types = c("users", "percent_females",
"income", "users", "percent_females", "income"), val1 = c(50,
0.1, 10000, 20, 0.5, 50000), val2 = c(20, 0.3, 25000, 20, 0.4,
20000), val3 = c(30, 0.8, 30000, 60, 0.3, 23000)), .Names = c("region",
"types", "val1", "val2", "val3"), class = "data.frame", row.names = c(NA,
-6L))
ddf
region types val1 val2 val3
1 region_1 users 5e+01 20.0 30.0
2 region_1 percent_females 1e-01 0.3 0.8
3 region_1 income 1e+04 25000.0 30000.0
4 region_2 users 2e+01 20.0 60.0
5 region_2 percent_females 5e-01 0.4 0.3
6 region_2 income 5e+04 20000.0 23000.0
ddf$newcol = paste(ddf$region, ddf$types, sep="_")
ddf
region types val1 val2 val3 newcol
1 region_1 users 5e+01 20.0 30.0 region_1_users
2 region_1 percent_females 1e-01 0.3 0.8 region_1_percent_females
3 region_1 income 1e+04 25000.0 30000.0 region_1_income
4 region_2 users 2e+01 20.0 60.0 region_2_users
5 region_2 percent_females 5e-01 0.4 0.3 region_2_percent_females
6 region_2 income 5e+04 20000.0 23000.0 region_2_income
# for income:
col=3:5
(ddf[ddf$newcol=='region_1_users',col]* ddf[ddf$newcol=='region_1_income',col]+
ddf[ddf$newcol=='region_2_users',col]* ddf[ddf$newcol=='region_2_income',col]) /
(ddf[ddf$newcol=='region_1_users',col]+ ddf[ddf$newcol=='region_2_users',col])
val1 val2 val3
1 21428.57 22500 25333.33
# for percent females:
(ddf[ddf$newcol=='region_1_users',col]* ddf[ddf$newcol=='region_1_percent_females',col]+
ddf[ddf$newcol=='region_2_users',col]* ddf[ddf$newcol=='region_2_percent_females',col]) /
(ddf[ddf$newcol=='region_1_users',col]+ ddf[ddf$newcol=='region_2_users',col])
val1 val2 val3
1 0.2142857 0.35 0.4666667
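If the data stay in the original form, two lists of matrices, the same result can probably be obtained without reshaping. The sketch below is one possible way, not the answerer's method: it rebuilds the question's matrices with `rbind`, assumes each matrix in mylist_1 has a row named "users", and assumes both lists hold the regions in the same order, as the question states.

```r
# The question's data, rebuilt as two lists of matrices.
mylist_1 <- list(
  region_1 = rbind(users = c(50, 20, 30), revenue = c(10000, 3500, 4000)),
  region_2 = rbind(users = c(20, 20, 60), revenue = c(5000, 4000, 10000))
)
mylist_2 <- list(
  region_1 = rbind(`% female` = c(0.1, 0.3, 0.8), income = c(10000, 25000, 30000)),
  region_2 = rbind(`% female` = c(0.5, 0.4, 0.3), income = c(50000, 20000, 23000))
)

# Per region: scale column j of the metric matrix by that region's users[j].
weighted <- Map(function(m1, m2) sweep(m2, 2, m1["users", ], `*`),
                mylist_1, mylist_2)

# Total users per column, summed across regions.
total_users <- Reduce(`+`, lapply(mylist_1, function(m) m["users", ]))

# Sum the weighted metrics across regions, then divide column-wise.
result <- sweep(Reduce(`+`, weighted), 2, total_users, `/`)
round(result, 2)
```

`Map` pairs the list elements by position, so if the order of the two lists could differ, reordering first with `mylist_2[names(mylist_1)]` would be a safer starting point.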
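Can you explain where the values 50*10000+20*50000/50+20 come from? They don't seem to relate fully to the data.
They are the values from the first column of both lists: 50 users in region 1 * 10000 income in region 1 + 20 users in region 2 * 50000 income in region 2, all divided by the total users of regions 1 and 2 (50 + 20). Each column is treated separately.
OK. Your wording is misleading, or at least can be read that way, when you talk about region 1 and region 2 separately.
I will edit the post. Thanks for the feedback.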