Replace mean with mode in psych::describe for a data frame (tags: r, psych, tidyselect)


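I like the summary statistics from psych::describe, but I want to replace the mean with the mode, and only for factor variables. How can I program the mode output (setosa here) to replace the mean for Species, or for any other factor variable? I am using iris as a reproducible example, even though it has only one factor variable.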

getMode <- function(df) {
  ux <- na.omit(unique(df))
  ux[which.max(tabulate(match(df, ux)))]
}

library(dplyr)  # provides %>%, select() and where()

Mode <- apply(iris %>% select(where(is.factor)), 2, getMode)

#I only want 5 of psych's descriptive stats plus the mode.
table <- cbind(psych::describe(iris),
                      Mode) [,c(3,4,8,9,2, 14)] 
table
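
A minimal base-R sketch of what the helper returns for the factor column, and of what happens when a single mode is bound to a five-row table. The names species_mode and recycled are illustrative and not part of the question; getMode is repeated only so the snippet runs on its own.

```r
# Same helper as in the question, repeated so this snippet is self-contained.
getMode <- function(x) {
  ux <- na.omit(unique(x))
  ux[which.max(tabulate(match(x, ux)))]
}

# Each species occurs 50 times in iris, so the tie goes to the first value seen.
table(iris$Species)
species_mode <- getMode(iris$Species)
as.character(species_mode)

# A length-one value is recycled by cbind() across every row of a data frame,
# so every variable ends up with the same Mode entry.
recycled <- cbind(data.frame(variable = names(iris)),
                  Mode = as.character(species_mode))
recycled
```

Because of that recycling, the mode appears on the numeric rows as well, which is why the value has to be placed per variable rather than bound as a column.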

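If you are willing to use different functions to produce the same kind of output, you can do this with dplyr and tidyr. With this approach, you can use ifelse() to tell numeric variables from non-numeric ones. The one caveat is that if a function returns a non-numeric value for factors, its output for numeric variables must be character as well. That is why I wrap the mean() function in sprintf().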
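
A small base-R illustration of the type constraint described here: a column can hold only one type, so a number stored next to a string becomes character either way. The names mixed and formatted are illustrative only.

```r
# Combining a number with a string silently coerces the number to character.
mixed <- c(mean(iris$Sepal.Length), "setosa")
class(mixed)

# sprintf() makes the conversion explicit and fixes the number of decimals.
formatted <- sprintf("%.3f", mean(iris$Sepal.Length))
formatted
```

Formatting with sprintf() keeps the numeric rows readable at three decimals instead of leaving the width to implicit coercion.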

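This also lets you make other changes. For example, in the code for this answer I replaced the minimum with the first level of Species and the maximum with the last level of Species. That is not necessarily what you want, but it is easy to change the output values depending on the type of the variable.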
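
As a sketch of how the per-type output could be varied, the hypothetical helper below (min_or_nlevels is not from the answer) reports a formatted minimum for numeric columns and the number of levels for factors. It assumes every column is either numeric or a factor, as in iris, and the na.rm = TRUE argument is an addition.

```r
# Hypothetical helper: formatted minimum for numeric columns,
# number of levels for factor columns. Both branches return character.
min_or_nlevels <- function(x) {
  if (is.numeric(x)) {
    sprintf("%.3f", min(x, na.rm = TRUE))
  } else {
    paste(nlevels(x), "levels")
  }
}

# vapply() checks that each column yields exactly one character value.
result <- vapply(iris, min_or_nlevels, character(1))
result
```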
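Thanks Dave, sprintf is a great revelation and a very cool alternative. If I wanted the SD for each level of Species, how would you suggest coercing the factor so that sd() can compute it for each Species level?
@ibm I am not sure what you mean by wanting the SD for each Species level. The SD of the variable Species is already in the answer's output. What other SD are you looking for?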
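
library(dplyr)  # %>%, summarise_all()
library(tidyr)  # pivot_longer(), everything()

# Most frequent value of a vector; ties go to the value that appears first.
getMode <- function(x) {
  ux <- na.omit(unique(x))
  ux[which.max(tabulate(match(x, ux)))]
}

iris %>% 
  # One summary column per variable and statistic, named <variable>_<statistic>.
  summarise_all(.funs = list(
    # Mean for numeric columns, mode for factors; both returned as character.
    mean = function(x) ifelse(is.numeric(x), sprintf("%.3f", mean(x)), as.character(getMode(x))), 
    # For factors, the SD is taken over the integer level codes.
    sd = function(x) ifelse(is.numeric(x), sd(x), sd(as.numeric(x))), 
    # For factors, min and max are replaced by the first and last level.
    min = function(x) ifelse(is.numeric(x), sprintf("%.3f", min(x)), levels(x)[1]), 
    max = function(x) ifelse(is.numeric(x), sprintf("%.3f", max(x)), levels(x)[length(levels(x))]), 
    n = function(x) sum(!is.na(x))
  )) %>% 
  # Split each name at the underscore: variable goes to "set", statistic becomes a column.
  pivot_longer(everything(),
        names_to = c("set", ".value"),
        names_pattern = "(.+)_(.+)")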
                            
# A tibble: 5 x 6
#            set  mean     sd   min    max         n
#          <chr> <chr>  <dbl> <chr>  <chr>     <int>
# 1 Sepal.Length 5.843  0.828 4.300  7.900       150
# 2 Sepal.Width  3.057  0.436 2.000  4.400       150
# 3 Petal.Length 3.758  1.77  1.000  6.900       150
# 4 Petal.Width  1.199  0.762 0.100  2.500       150
# 5 Species      setosa 0.819 setosa virginica   150    
#
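
The follow-up comment about an SD for each Species level is left unresolved in the thread. One possible reading, assumed here, is the SD of each numeric variable within each level of Species. A base-R sketch of that reading (sd_by_species is an illustrative name):

```r
# SD of every numeric column, computed separately for each level of Species.
# The formula `. ~ Species` means "all other columns, grouped by Species".
sd_by_species <- aggregate(. ~ Species, data = iris, FUN = sd)
sd_by_species
```

This returns one row per species and one SD column per numeric variable, so no coercion of the factor is needed under this interpretation.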