R: In a data frame, how can I select specific variables by name alone for further calculations?
Tags: r, dplyr
Using dplyr, is there a way to select the variables whose names end in _p and _ln and multiply each one by its counterpart? For example, I am trying to get three new variables, the first being the result of multiplying A_p by A_ln, the second the result of multiplying B_p by B_ln, and likewise for C. I am finding it hard to pin down exactly the variables I need by name, because I have to keep the three variables A, B and C in the dataset.
dput() output:
structure(list(id = structure(c(2, 4, 6, 8, 10), label = "id", format.spss = "F4.0", display_width = 0L), A = c(13, 9, 14, 14, 13), B = c(12, 0, 9, 3, 10), C = c(13, 8, 14, 13, 11), total = c(38, 17, 37, 30, 34), A_p = c(2, 5, 3, 6, 10), B_p = c(5, 3, 6, 10, 2), C_p = c(3, 6, 10, 2, 5), A_ln = c(10, 2, 5, 3, 6), B_ln = c(10, 2, 5, 1, 2), C_ln = c(2, 8, 10, 2, 5)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
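I'm not quite sure this is what you want, but some data wrangling can get you a summary data.frame, which you can bind back to the original later if needed.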
library(dplyr)
# your data
df <- structure(list(id = structure(c(2, 4, 6, 8, 10), label = "id", format.spss = "F4.0", display_width = 0L), A = c(13, 9, 14, 14, 13), B = c(12, 0, 9, 3, 10), C = c(13, 8, 14, 13, 11), total = c(38, 17, 37, 30, 34), A_p = c(2, 5, 3, 6, 10), B_p = c(5, 3, 6, 10, 2), C_p = c(3, 6, 10, 2, 5), A_ln = c(10, 2, 5, 3, 6), B_ln = c(10, 2, 5, 1, 2), C_ln = c(2, 8, 10, 2, 5)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
# select the columns according to your pattern
df %>% select(id, ends_with("_p"), ends_with("_ln")) %>%
# pivot everything into a long format, keeping the id column
tidyr::pivot_longer(-id) %>%
# separate the variables, we do this because you want the A "_p" to multiply the A of "_ln" and we will group by letter to do this
tidyr::separate(col = "name", into=c("letter", "end"), sep="_") %>%
# don't forget to group by id, so values are multiplied within id
group_by(id, letter) %>%
summarise(prod_value = prod(value))
This yields:
# A tibble: 15 x 3
# Groups:   id [5]
      id letter prod_value
   <dbl> <chr>       <dbl>
 1     2 A              20
 2     2 B              50
 3     2 C               6
 4     4 A              10
 5     4 B               6
 6     4 C              48
 7     6 A              15
 8     6 B              30
 9     6 C             100
10     8 A              18
11     8 B              10
12     8 C               4
13    10 A              60
14    10 B               4
15    10 C              25
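The answer mentions binding the summary back to the original data but does not show that step. A minimal sketch of one way to do it follows, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0; the object names `products` and `result` and the `prod_` column prefix are illustrative choices, not part of the answer, and the data is rebuilt without the SPSS attributes on `id`.

```r
library(dplyr)
library(tidyr)

# Same values as the question's data (attributes on id omitted for brevity)
df <- tibble(
  id    = c(2, 4, 6, 8, 10),
  A     = c(13, 9, 14, 14, 13),
  B     = c(12, 0, 9, 3, 10),
  C     = c(13, 8, 14, 13, 11),
  total = c(38, 17, 37, 30, 34),
  A_p   = c(2, 5, 3, 6, 10),
  B_p   = c(5, 3, 6, 10, 2),
  C_p   = c(3, 6, 10, 2, 5),
  A_ln  = c(10, 2, 5, 3, 6),
  B_ln  = c(10, 2, 5, 1, 2),
  C_ln  = c(2, 8, 10, 2, 5)
)

products <- df %>%
  select(id, ends_with("_p"), ends_with("_ln")) %>%
  pivot_longer(-id) %>%
  separate(col = "name", into = c("letter", "end"), sep = "_") %>%
  group_by(id, letter) %>%
  # drop the grouping so the result is a plain tibble
  summarise(prod_value = prod(value), .groups = "drop") %>%
  # back to one row per id, one column per letter: prod_A, prod_B, prod_C
  pivot_wider(names_from = letter, values_from = prod_value,
              names_prefix = "prod_")

# attach the products to the original columns, matching rows on id
result <- left_join(df, products, by = "id")
```

Joining on `id` rather than binding columns by position keeps the rows aligned even if the summary comes back in a different order.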
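This example subsets the initial data into two matrices that you can multiply together; after that you only need to fix the names.
library(dplyr, warn.conflicts = FALSE)
dat ... %>%
  rename_with(.fn = function(x) gsub("_ln", "", x))
#>    A  B   C
#> 1 20 50   6
#> 2 10  6  48
#> 3 15 30 100
#> 4 18 10   4
#> 5 60  4  25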
Created on 2020-12-13 by the reprex package (v0.3.0)
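One way the multiply-then-rename idea might look in full is sketched below. It is not necessarily the answer's exact code: the intermediate names `ln_part`, `p_part` and `products` are assumptions, `dat` is rebuilt here with only the relevant columns, and the approach relies on the `_p` and `_ln` columns appearing in the same order.

```r
library(dplyr, warn.conflicts = FALSE)

# Relevant columns of the question's data, as a plain data.frame
dat <- data.frame(
  id   = c(2, 4, 6, 8, 10),
  A_p  = c(2, 5, 3, 6, 10),
  B_p  = c(5, 3, 6, 10, 2),
  C_p  = c(3, 6, 10, 2, 5),
  A_ln = c(10, 2, 5, 3, 6),
  B_ln = c(10, 2, 5, 1, 2),
  C_ln = c(2, 8, 10, 2, 5)
)

# Two blocks of identical shape; columns are paired by position
ln_part <- select(dat, ends_with("_ln"))
p_part  <- select(dat, ends_with("_p"))

# The element-wise product keeps the names of the left operand
# (A_ln, B_ln, C_ln), so stripping "_ln" leaves A, B, C
products <- (ln_part * p_part) %>%
  rename_with(.fn = function(x) gsub("_ln", "", x))

products
```

Because the pairing is positional, it is worth checking that `names(ln_part)` and `names(p_part)` list the prefixes in the same order before trusting the result.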
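You can also use dplyr's rowwise() and c_across(), as in the example below, which creates three columns that are the products of the columns starting with "A_", "B_" and "C_" respectively.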
library(tidyverse)
structure(list(id = structure(c(2, 4, 6, 8, 10), label = "id",
format.spss = "F4.0", display_width = 0L),
A = c(13, 9, 14, 14, 13),
B = c(12, 0, 9, 3, 10),
C = c(13, 8, 14, 13, 11),
total = c(38, 17, 37, 30, 34),
A_p = c(2, 5, 3, 6, 10),
B_p = c(5, 3, 6, 10, 2),
C_p = c(3, 6, 10, 2, 5),
A_ln = c(10, 2, 5, 3, 6),
B_ln = c(10, 2, 5, 1, 2),
C_ln = c(2, 8, 10, 2, 5)),
row.names = c(NA, -5L),
class = c("tbl_df", "tbl", "data.frame")) %>%
rowwise() %>%
mutate(
A_product = prod(c_across(starts_with("A_"))),
B_product = prod(c_across(starts_with("B_"))),
C_product = prod(c_across(starts_with("C_"))),
)
If you only need the columns containing the products, just add:
%>% select(A_product, B_product, C_product)
Does this work:
library(dplyr)
library(purrr)
bind_cols(df, map2_dfc(grep('_p$', names(df), value = 1), grep('_ln$', names(df), value = 1), ~{
new_col <- paste0(.x,.y)
df %>%
transmute(!!new_col := .data[[.x]]*.data[[.y]])
}))
# A tibble: 5 x 14
id A B C total A_p B_p C_p A_ln B_ln C_ln A_pA_ln B_pB_ln C_pC_ln
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2 13 12 13 38 2 5 3 10 10 2 20 50 6
2 4 9 0 8 17 5 3 6 2 2 8 10 6 48
3 6 14 9 14 37 3 6 10 5 5 10 15 30 100
4 8 14 3 13 30 6 10 2 3 1 2 18 10 4
5 10 13 10 11 34 10 2 5 6 2 5 60 4 25
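The two grep() calls above pair the `_p` and `_ln` columns by position, so they assume both sets appear in the same order. A base R sketch that pairs them by name instead is shown below; the `stems` variable and the `_product` suffix are illustrative assumptions, and the column order is deliberately shuffled to show that it does not matter.

```r
# Question's data as a plain data.frame, with the _ln columns out of order
df <- data.frame(
  id   = c(2, 4, 6, 8, 10),
  A    = c(13, 9, 14, 14, 13),
  B    = c(12, 0, 9, 3, 10),
  C    = c(13, 8, 14, 13, 11),
  C_ln = c(2, 8, 10, 2, 5),
  A_p  = c(2, 5, 3, 6, 10),
  B_p  = c(5, 3, 6, 10, 2),
  C_p  = c(3, 6, 10, 2, 5),
  B_ln = c(10, 2, 5, 1, 2),
  A_ln = c(10, 2, 5, 3, 6)
)

# Derive the prefixes ("A", "B", "C") from the columns ending in "_p"
stems <- sub("_p$", "", grep("_p$", names(df), value = TRUE))

for (stem in stems) {
  p_col  <- paste0(stem, "_p")
  ln_col <- paste0(stem, "_ln")
  # fail early if a _p column has no matching _ln column
  stopifnot(ln_col %in% names(df))
  df[[paste0(stem, "_product")]] <- df[[p_col]] * df[[ln_col]]
}
```

The original A, B and C columns are left untouched because only names ending in `_p` are used to find the prefixes.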