Calculating the sum of certain values across two columns in R
I currently have a data frame like the one below, which holds a set of pairwise correlations.
Data:
structure(list(ID1 = c("A", "A", "A", "B", "B", "C"), ID2 = c("B",
"C", "D", "C", "D", "D"), cor = c(0.6, 0.6, 0.2, 0.1, 0.9, 0.2
), value1 = c(50L, 50L, 50L, 20L, 20L, 30L), value2 = c(20L,
30L, 100L, 30L, 100L, 100L)), class = "data.frame", row.names = c(NA,
-6L))
  ID1 ID2 cor value1 value2
1   A   B 0.6     50     20
2   A   C 0.6     50     30
3   A   D 0.2     50    100
4   B   C 0.1     20     30
5   B   D 0.9     20    100
6   C   D 0.2     30    100
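I am trying to get, for every ID (e.g. B), the sum of the products of cor with either value1 or value2, depending on whether the ID appears in ID1 or ID2.
For example, the sum for B would be (cor x value)
I basically need to do this for about 20,000 unique IDs. I hope this makes sense. I am still not very good at R.
Unless you are looking for a dplyr approach to this question, here is a quick but somewhat inelegant way: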
# Logical masks marking the rows where "B" appears in each ID column
cond1 <- df$ID1 == "B"
sum1 <- sum(df$cor[cond1] * df$value1[cond1])
cond2 <- df$ID2 == "B"
sum2 <- sum(df$cor[cond2] * df$value2[cond2])
finalsum <- sum1 + sum2
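This may or may not be a problem if an ID appears in only one of ID1 or ID2; I am sure you can write a condition to handle that case.
Another way to look at the problem is as follows. Assume your data frame is named a: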
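library(dplyr)  # provides %>%, group_by() and summarise()
# Rows keyed by ID1
a1 <- subset(a, select = c(ID1, cor, value1))
# Rows keyed by ID2; note that value1 (not value2) is selected here as well
a2 <- subset(a, select = c(ID2, cor, value1))
colnames(a2)[colnames(a2) == "ID2"] <- "ID1"
# Stack both sets, multiply, and sum per ID
a3 <- rbind(a1, a2)
a3$MULTIPLY1 <- a3$cor * a3$value1
a4 <- a3 %>% group_by(ID1) %>% summarise(FINALVALUE = sum(MULTIPLY1))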
# A tibble: 4 x 2
  ID1   FINALVALUE
  <chr>      <dbl>
1 A             70
2 B             50
3 C             38
4 D             34
Does this do what you need?
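library(tidyverse)
df2 <- df %>%
  # One row per (pair, ID): "names" says which column the ID came from
  pivot_longer(cols = -c(cor:value2), names_to = "names", values_to = "values") %>%
  # Pair each ID with the value from the opposite column
  mutate(value = if_else(names == "ID1", value2, value1),
         sum = cor * value) %>%
  group_by(values) %>%
  summarise(sum = sum(sum))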
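The pipeline above pairs each ID with the value from the opposite column and then sums per ID. As a rough sketch only, the same calculation can be written in base R without reshaping, which may be convenient with many thousands of IDs. It assumes the data frame is called df, as above; the names ids, products and partner_sums are made up for illustration.

```r
# Example data from the question
df <- data.frame(
  ID1 = c("A", "A", "A", "B", "B", "C"),
  ID2 = c("B", "C", "D", "C", "D", "D"),
  cor = c(0.6, 0.6, 0.2, 0.1, 0.9, 0.2),
  value1 = c(50L, 50L, 50L, 20L, 20L, 30L),
  value2 = c(20L, 30L, 100L, 30L, 100L, 100L),
  stringsAsFactors = FALSE
)

# Stack both ID columns into one vector of length 2 * nrow(df)
ids <- c(df$ID1, df$ID2)

# Matching products: IDs from ID1 use value2, IDs from ID2 use value1
# (the same convention as the if_else() in the pipeline above)
products <- c(df$cor * df$value2, df$cor * df$value1)

# Sum the products within each ID in a single vectorised call
partner_sums <- tapply(products, ids, sum)
print(partner_sums)
```

If each ID should instead be multiplied by the value from its own column, swapping value1 and value2 in the products line would be the only change.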
The problem is that I have several thousand IDs.
Why is cor multiplied by value1 when the ID is in ID2? That is not intuitive.
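# prodsum(df, x) is expected to return the sum for a single ID x;
# its definition does not appear in this thread
IDs <- unique(c(df$ID1, df$ID2))
sapply(IDs, function(x) prodsum(df, x))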
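Since prodsum() is not defined anywhere in the thread, the following is only a guess at what it might look like: a hypothetical helper that wraps the logical-mask approach from the first answer (value1 for rows matched in ID1, value2 for rows matched in ID2). An ID that appears in only one column needs no special handling here, because sum() over an empty selection returns 0.

```r
# Example data from the question
df <- data.frame(
  ID1 = c("A", "A", "A", "B", "B", "C"),
  ID2 = c("B", "C", "D", "C", "D", "D"),
  cor = c(0.6, 0.6, 0.2, 0.1, 0.9, 0.2),
  value1 = c(50L, 50L, 50L, 20L, 20L, 30L),
  value2 = c(20L, 30L, 100L, 30L, 100L, 100L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not the original author's code):
# sum of cor * value1 over rows where id is in ID1,
# plus cor * value2 over rows where id is in ID2
prodsum <- function(data, id) {
  in1 <- data$ID1 == id
  in2 <- data$ID2 == id
  sum(data$cor[in1] * data$value1[in1]) +
    sum(data$cor[in2] * data$value2[in2])
}

# Apply the helper to every unique ID; sapply() names the result by ID
IDs <- unique(c(df$ID1, df$ID2))
result <- sapply(IDs, function(x) prodsum(df, x))
print(result)
```

This loops over the IDs one at a time, so for around 20,000 IDs a grouped approach such as the dplyr answers is likely to be faster.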