R: compute row means without having to provide column names, and selectively drop columns based on each column's sum

Given the following dataset:

library(tidyverse)
# example data
df1 = data.frame(ID = c("daisy", "lily", "rose", "tulip", "poppy", "iris", "orchid", "lotus", "crocus"), 
                 loc1 = c(10, 20, 30, 40, 50, 60, 70, 80, 90),
                 loc2 = c(100, 200, 300, 400, 500, 600, 700, 800, 900), 
                 loc3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0), 
                 loc4 = c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000))

Question 1: for each row, extract the minimum, compute the mean, and append both results to the dataset. The following code works:

df1 %>%  
  rowwise() %>% 
  mutate(Min = min(c(loc1, loc2, loc3, loc4)), Mean = mean(c(loc1, loc2, loc3, loc4)))

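How can I make the code more generic, so that it applies to every column in the dataset that does not contain factors or strings? I have more than 100 columns and want to avoid typing the column names. I tried the following: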

df1 %>%  
  rowwise() %>% 
  mutate(Min =  min(is_double(df1)), Mean = mean(is_double(df1)))

But this does not produce the expected result:

ID      loc1  loc2  loc3  loc4 Median  Mean
  <fct>  <dbl> <dbl> <dbl> <dbl> <lgl>  <dbl>
1 daisy     10   100     0  1000 FALSE      0
2 lily      20   200     0  2000 FALSE      0
3 rose      30   300     0  3000 FALSE      0

Any suggestions?

Many thanks.

If the goal is to select the numeric columns, use select_if:

library(dplyr)
library(matrixStats)
df1 %>%
    mutate(Median = select_if(., is.numeric) %>% 
                        as.matrix %>% 
                        rowMedians, 
           Mean = select_if(., is.numeric) %>% 
                        rowMeans)
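
For readers without matrixStats, the same "pick the numeric columns first" idea can be sketched in base R. This is only an illustration for the Min and Mean asked for in question 1; `is_num`, `num` and `res` are names made up here and are not part of the answer.

```r
# Example data from the question
df1 <- data.frame(
  ID   = c("daisy", "lily", "rose", "tulip", "poppy", "iris", "orchid", "lotus", "crocus"),
  loc1 = c(10, 20, 30, 40, 50, 60, 70, 80, 90),
  loc2 = c(100, 200, 300, 400, 500, 600, 700, 800, 900),
  loc3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0),
  loc4 = c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000)
)

# Logical index of the numeric columns; ID is skipped whether it is
# stored as a factor or as character
is_num <- vapply(df1, is.numeric, logical(1))
num    <- as.matrix(df1[is_num])

# `res` is a hypothetical name for the augmented copy of df1
res <- df1
res$Min  <- apply(num, 1, min)  # row-wise minimum
res$Mean <- rowMeans(num)       # row-wise mean
```

Because the statistics are computed from `num` before anything is added to `res`, the new Min column cannot leak into the Mean calculation.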

Or reshape to "long" format and then group by row
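
The code for this variant is not shown here. A minimal sketch of what the long-format route could look like, assuming tidyr's pivot_longer and assuming that ID uniquely identifies each row (both are assumptions, not taken from the answer); `row_stats` and `res_long` are hypothetical names:

```r
library(dplyr)
library(tidyr)

# Example data from the question
df1 <- data.frame(
  ID   = c("daisy", "lily", "rose", "tulip", "poppy", "iris", "orchid", "lotus", "crocus"),
  loc1 = c(10, 20, 30, 40, 50, 60, 70, 80, 90),
  loc2 = c(100, 200, 300, 400, 500, 600, 700, 800, 900),
  loc3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0),
  loc4 = c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000)
)

# One row per (ID, column) pair, then one summary row per ID.
# ASSUMPTION: ID is unique per row, so grouping by ID means grouping by row.
row_stats <- df1 %>%
  pivot_longer(cols = where(is.numeric), names_to = "loc", values_to = "value") %>%
  group_by(ID) %>%
  summarise(Min = min(value), Mean = mean(value), .groups = "drop")

# Attach the per-row statistics back onto the wide data (df1's row order is kept)
res_long <- left_join(df1, row_stats, by = "ID")
```

If the real data has no unique key, adding a row number column before reshaping would probably be needed.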

To get the sums of the numeric columns, keeping only the columns whose sum is greater than 0:

df1 %>% 
     summarise_if(~is.numeric(.) && sum(.) > 0, sum)
#  loc1 loc2  loc4
#1  450 4500 45000

Or use base R
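
The base R code is likewise missing here. One way it might look, as a sketch (`is_num`, `cs_num` and `pos` are names invented for illustration):

```r
# Example data from the question
df1 <- data.frame(
  ID   = c("daisy", "lily", "rose", "tulip", "poppy", "iris", "orchid", "lotus", "crocus"),
  loc1 = c(10, 20, 30, 40, 50, 60, 70, 80, 90),
  loc2 = c(100, 200, 300, 400, 500, 600, 700, 800, 900),
  loc3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0),
  loc4 = c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000)
)

# Column sums over the numeric columns only
is_num <- vapply(df1, is.numeric, logical(1))
cs_num <- colSums(df1[is_num])

# Keep the sums that are greater than 0
pos <- cs_num[cs_num > 0]
pos
#  loc1  loc2  loc4
#   450  4500 45000
```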

If the goal is to select the columns that are numeric and have sum > 0, use select_if

Or to also include the first column, which is a factor

Or, using the OP's code, add 1 to the indices, because cs was created after dropping the first column:

df1 %>% 
      select(which(cs > 0)+1)

Including the first column:

df1 %>% 
     select(1, which(cs > 0)+1)

Or drop the first column from "df1" and then use the code from the OP's post:

df1 %>%
  select(-1) %>%
  select( which(cs > 0))
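
The snippets above depend on `cs`, which this page never defines. Assuming it holds the column sums of df1 without its first column (an assumption, since the OP's code for it is not shown), the index arithmetic can be checked in base R; `keep_idx` and `kept` are hypothetical names:

```r
# Example data from the question
df1 <- data.frame(
  ID   = c("daisy", "lily", "rose", "tulip", "poppy", "iris", "orchid", "lotus", "crocus"),
  loc1 = c(10, 20, 30, 40, 50, 60, 70, 80, 90),
  loc2 = c(100, 200, 300, 400, 500, 600, 700, 800, 900),
  loc3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0),
  loc4 = c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000)
)

# ASSUMPTION: cs is the column sums of everything except the first (ID) column
cs <- colSums(df1[-1])

# which() returns positions within df1[-1], so shift by 1 to index df1 itself
keep_idx <- unname(which(cs > 0)) + 1

# ID plus the columns with a positive sum
kept <- df1[c(1, keep_idx)]
names(kept)
# [1] "ID"   "loc1" "loc2" "loc4"
```

Without the shift, the positions would point one column too far to the left in df1, which is why the +1 appears in the variants that do not drop the first column.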

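Answer to question 1:

We can use pmap_dbl to apply a function to each row, and select_if to pick the columns that are neither factor nor character.

Answer to question 2:

We can use summarise_if to sum all numeric columns, select the columns whose sum is 0, and save their names in removed_cols.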

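Thank you, @akrun! I confirm that your solution works on my original dataset. Any thoughts on how to solve question 2?

Yes, your updated solutions 3 to 5 do what I had in mind for the example dataset. I will check the original dataset. But how can I get back the ID column with the flower names?

@Dalmuti71 Updated the post.

Thanks! Your updated solution works on my original dataset. Awesome! I guess I'll spend the holidays figuring out your select_if(~ is.factor(.) | (is.numeric(.) && sum(.) > 0)). Cheers!

Thank you, Ronak! This solves the problem of how to save the column names of the removed columns.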

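# Keep the factor column (ID) plus every numeric column whose sum is > 0.
# ID is a factor here; since R 4.0.0 data.frame() keeps strings as character
# by default, in which case the test would need is.character(.) instead.
df1 %>% 
    select_if(~ is.factor(.)|(is.numeric(.) && sum(.) > 0))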
#      ID loc1 loc2 loc4
#1  daisy   10  100 1000
#2   lily   20  200 2000
#3   rose   30  300 3000
#4  tulip   40  400 4000
#5  poppy   50  500 5000
#6   iris   60  600 6000
#7 orchid   70  700 7000
#8  lotus   80  800 8000
#9 crocus   90  900 9000

library(dplyr)
library(purrr)

# Question 1: row-wise Min and Mean over every column that is neither factor nor character
df1 %>%
  mutate(Min = pmap_dbl(select_if(., ~!(is.factor(.) | is.character(.))), min),
         Mean = pmap_dbl(select_if(., ~!(is.factor(.) | is.character(.))), 
                         ~mean(c(...))))

#      ID loc1 loc2 loc3 loc4 Min   Mean
#1  daisy   10  100    0 1000   0  277.5
#2   lily   20  200    0 2000   0  555.0
#3   rose   30  300    0 3000   0  832.5
#4  tulip   40  400    0 4000   0 1110.0
#5  poppy   50  500    0 5000   0 1387.5
#6   iris   60  600    0 6000   0 1665.0
#7 orchid   70  700    0 7000   0 1942.5
#8  lotus   80  800    0 8000   0 2220.0
#9 crocus   90  900    0 9000   0 2497.5

# Question 2: names of the numeric columns whose sum is 0
removed_cols <- df1 %>%
                 summarise_if(is.numeric, sum) %>%
                 select_if(~. == 0) %>%
                 names
removed_cols
#[1] "loc3"

# Column sums for the numeric columns whose sum is not 0
df1 %>%
  summarise_if(is.numeric, sum) %>%
  select_if(~. != 0)

#  loc1 loc2  loc4
#1  450 4500 45000