R: In a data frame, how do I select specific variables by name alone for further calculations?

Tags: r, dplyr

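Using dplyr, is there a way to select the variables whose names end in _p and _ln and multiply each one by its counterpart? For example, I am trying to get three new variables: the first is A_p multiplied by A_ln, the second is B_p multiplied by B_ln. I am finding it hard to pin down exactly the named variables I need, because I have to keep the three variables A, B and C in the dataset.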

dput() output:
structure(list(id = structure(c(2, 4, 6, 8, 10), label = "id", format.spss = "F4.0", display_width = 0L), A = c(13, 9, 14, 14, 13), B = c(12, 0, 9, 3, 10), C = c(13, 8, 14, 13, 11), total = c(38, 17, 37, 30, 34), A_p = c(2, 5, 3, 6, 10), B_p = c(5, 3, 6, 10, 2), C_p = c(3, 6, 10, 2, 5), A_ln = c(10, 2, 5, 3, 6), B_ln = c(10, 2, 5, 1, 2), C_ln = c(2, 8, 10, 2, 5)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

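I'm not entirely sure this is what you want, but a bit of data wrangling gets you a summary data.frame, which you can bind back to the original later if needed.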

library(dplyr)
# your data
df <- structure(list(id = structure(c(2, 4, 6, 8, 10), label = "id", format.spss = "F4.0", display_width = 0L), A = c(13, 9, 14, 14, 13), B = c(12, 0, 9, 3, 10), C = c(13, 8, 14, 13, 11), total = c(38, 17, 37, 30, 34), A_p = c(2, 5, 3, 6, 10), B_p = c(5, 3, 6, 10, 2), C_p = c(3, 6, 10, 2, 5), A_ln = c(10, 2, 5, 3, 6), B_ln = c(10, 2, 5, 1, 2), C_ln = c(2, 8, 10, 2, 5)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

# select the columns according to your pattern
df %>% select(id, ends_with("_p"), ends_with("_ln")) %>% 
# pivot everything into a long format, keeping the id column
  tidyr::pivot_longer(-id) %>% 
# separate the variables, we do this because you want the A "_p" to multiply the A of "_ln" and we will group by letter to do this
  tidyr::separate(col = "name", into=c("letter", "end"), sep="_") %>%
# don't forget to group by id, so values are multiplied within id 
  group_by(id, letter) %>% 
  summarise(prod_value = prod(value))


This gives:

# A tibble: 15 x 3
# Groups:   id [5]
      id letter prod_value
   <dbl> <chr>       <dbl>
 1     2 A              20
 2     2 B              50
 3     2 C               6
 4     4 A              10
 5     4 B               6
 6     4 C              48
 7     6 A              15
 8     6 B              30
 9     6 C             100
10     8 A              18
11     8 B              10
12     8 C               4
13    10 A              60
14    10 B               4
15    10 C              25
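
To bind the summary back onto the original data, as mentioned above, one option is to spread the products into one column per letter and join on id. This is only a sketch under a few assumptions: the data is re-entered as a plain tibble (dropping the SPSS attributes on id), and the names prod_wide, result and the prod_ prefix are my own choices, not part of the answer.

```r
library(dplyr)
library(tidyr)

# Same values as the question's data, without the SPSS attributes (assumption)
df <- tibble(
  id = c(2, 4, 6, 8, 10),
  A = c(13, 9, 14, 14, 13), B = c(12, 0, 9, 3, 10), C = c(13, 8, 14, 13, 11),
  total = c(38, 17, 37, 30, 34),
  A_p = c(2, 5, 3, 6, 10), B_p = c(5, 3, 6, 10, 2), C_p = c(3, 6, 10, 2, 5),
  A_ln = c(10, 2, 5, 3, 6), B_ln = c(10, 2, 5, 1, 2), C_ln = c(2, 8, 10, 2, 5)
)

prod_wide <- df %>%
  select(id, ends_with("_p"), ends_with("_ln")) %>%
  pivot_longer(-id) %>%
  separate(col = "name", into = c("letter", "end"), sep = "_") %>%
  group_by(id, letter) %>%
  summarise(prod_value = prod(value), .groups = "drop") %>%
  # one column per letter: prod_A, prod_B, prod_C (hypothetical names)
  pivot_wider(names_from = letter, values_from = prod_value, names_prefix = "prod_")

# attach the products to the original columns, matching rows on id
result <- left_join(df, prod_wide, by = "id")
```

The original A, B and C columns are left untouched; the products only appear as extra columns at the end.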

This example subsets the initial data into two matrices that you can multiply together; after that you only need to fix the names.

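library(dplyr, warn.conflicts = FALSE)
# [lost in the source: the definition of dat and the step that multiplies the two subsets]
dat <- ... %>%
  rename_with(.fn = function(x) gsub("_ln", "", x))
#>    A  B   C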
#> 1 20 50   6
#> 2 10  6  48
#> 3 15 30 100
#> 4 18 10   4
#> 5 60  4  25

Created on 2020-12-13 by the reprex package (v0.3.0)
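
Since part of the code above did not survive, here is a hedged guess at what the approach might look like, not the answerer's actual code. It assumes the two subsets are taken with select() and ends_with(), and that the "_p" and "_ln" columns appear in the same order; the names p, ln and out are mine, and the data is re-entered as a plain tibble.

```r
library(dplyr, warn.conflicts = FALSE)

# Only the columns needed here, same values as the question's data (assumption)
dat <- tibble(
  id = c(2, 4, 6, 8, 10),
  A_p = c(2, 5, 3, 6, 10), B_p = c(5, 3, 6, 10, 2), C_p = c(3, 6, 10, 2, 5),
  A_ln = c(10, 2, 5, 3, 6), B_ln = c(10, 2, 5, 1, 2), C_ln = c(2, 8, 10, 2, 5)
)

p  <- dat %>% select(ends_with("_p"))   # A_p,  B_p,  C_p
ln <- dat %>% select(ends_with("_ln"))  # A_ln, B_ln, C_ln

# `*` on two equally shaped data frames multiplies element by element
# and keeps the column names of the left operand (A_ln, B_ln, C_ln)
out <- (ln * p) %>%
  rename_with(.fn = function(x) gsub("_ln", "", x))

out
```

The multiplication pairs columns by position, so it is worth checking that both subsets list the letters in the same order before trusting the result.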

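You can also use dplyr's rowwise() and c_across(), as in the example below, which creates three columns that are the products of the columns starting with "A_", "B_" and "C_" respectively.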

library(tidyverse)

structure(list(id = structure(c(2, 4, 6, 8, 10), label = "id", 
                              format.spss = "F4.0", display_width = 0L), 
               A = c(13, 9, 14, 14, 13), 
               B = c(12, 0, 9, 3, 10), 
               C = c(13, 8, 14, 13, 11), 
               total = c(38, 17, 37, 30, 34), 
               A_p = c(2, 5, 3, 6, 10), 
               B_p = c(5, 3, 6, 10, 2), 
               C_p = c(3, 6, 10, 2, 5), 
               A_ln = c(10, 2, 5, 3, 6), 
               B_ln = c(10, 2, 5, 1, 2), 
               C_ln = c(2, 8, 10, 2, 5)), 
          row.names = c(NA, -5L), 
          class = c("tbl_df", "tbl", "data.frame")) %>%
  rowwise() %>%
  mutate(
    A_product = prod(c_across(starts_with("A_"))),
    B_product = prod(c_across(starts_with("B_"))),
    C_product = prod(c_across(starts_with("C_"))),
    )

If you only need the columns containing the products, just add:

%>% select(A_product, B_product, C_product)
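
The call above spells out each prefix by hand. If there are many such pairs, the prefixes can be derived from the column names instead. A minimal sketch in base R, assuming every "_p" column has a matching "_ln" column; the names prefixes and pfx are hypothetical, and the data is re-entered as a plain data frame.

```r
# Same values as the question's data, without the SPSS attributes (assumption)
df <- data.frame(
  id = c(2, 4, 6, 8, 10),
  A = c(13, 9, 14, 14, 13), B = c(12, 0, 9, 3, 10), C = c(13, 8, 14, 13, 11),
  total = c(38, 17, 37, 30, 34),
  A_p = c(2, 5, 3, 6, 10), B_p = c(5, 3, 6, 10, 2), C_p = c(3, 6, 10, 2, 5),
  A_ln = c(10, 2, 5, 3, 6), B_ln = c(10, 2, 5, 1, 2), C_ln = c(2, 8, 10, 2, 5)
)

# "A_p", "B_p", "C_p" -> "A", "B", "C"
prefixes <- sub("_p$", "", grep("_p$", names(df), value = TRUE))

# pair columns by name rather than by position, one vectorised product per prefix
for (pfx in prefixes) {
  df[[paste0(pfx, "_product")]] <- df[[paste0(pfx, "_p")]] * df[[paste0(pfx, "_ln")]]
}
```

Because the multiplication is vectorised over whole columns, no rowwise() step is needed here.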

Does this work:

library(dplyr)
library(purrr)
bind_cols(df, map2_dfc(grep('_p$', names(df), value = TRUE), grep('_ln$', names(df), value = TRUE), ~{
   new_col <- paste0(.x,.y)
   df %>% 
     transmute(!!new_col := .data[[.x]]*.data[[.y]])
 }))
# A tibble: 5 x 14
     id     A     B     C total   A_p   B_p   C_p  A_ln  B_ln  C_ln A_pA_ln B_pB_ln C_pC_ln
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>
1     2    13    12    13    38     2     5     3    10    10     2      20      50       6
2     4     9     0     8    17     5     3     6     2     2     8      10       6      48
3     6    14     9    14    37     3     6    10     5     5    10      15      30     100
4     8    14     3    13    30     6    10     2     3     1     2      18      10       4
5    10    13    10    11    34    10     2     5     6     2     5      60       4      25