Using mutate_at and max() in R with my own data gives the warning: no non-missing arguments to max; returning -Inf


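I'm currently learning R from a book and trying out the mutate functions in dplyr. In this example, I want to standardize survey items to a scale from 0 to 1. To do that, we can divide each value by the (theoretical) maximum of the scale.

The book example, which uses stats_test from the "pradadata" package, works just fine: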

library(dplyr)  # %>%, mutate_at(), select()
library(tidyr)  # drop_na()

data(stats_test, package = "pradadata")
stats_test %>%
  drop_na() %>%
  mutate_at(.vars = vars(study_time, self_eval, interest),
            .funs = funs(prop = ./max(.))) %>%
  select(contains("_prop"))
Output:

   study_time_prop self_eval_prop interest_prop
             <dbl>          <dbl>         <dbl>
 1             0.6            0.7         0.667
 2             0.8            0.8         0.833
 3             0.6            0.4         0.167
 4             0.8            0.7         0.833
 5             0.4            0.6         0.5  
 6             0.4            0.6         0.667
 7             0.8            0.6         0.5  
 8             0.2            0.7         0.667
 9             0.6            0.8         0.833
10             0.6            0.7         0.833
# ... with 1,617 more rows

With my own data (column RG04 of df_literacy_2), I get this instead:

# A tibble: 0 x 0
Warning messages:
1: Problem with `mutate()` input `prop`.
i no non-missing arguments to max; returning -Inf
i Input `prop` is `RG04/max(RG04)`. 
2: In base::max(x, ..., na.rm = na.rm) :
  no non-missing arguments to max; returning -Inf


str(df_literacy_2$RG04)
int [1:630] 2 4 2 1 2 2 1 3 1 3 ...
Why doesn't it work for my data?

Thanks for your help.

Edit with a sample of df_literacy:

> dput(head(df_literacy,20))
structure(list(CASE = c(40, 41, 44, 45, 48, 49, 54, 55, 56, 57, 
58, 61, 62, 63, 64, 65, 66, 67, 68, 69), SERIAL = c(NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA), REF = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA), QUESTNNR = c("base", "base", 
"base", "base", "base", "base", "base", "base", "base", "base", 
"base", "base", "base", "base", "base", "base", "base", "base", 
"base", "base"), MODE = c("interview", "interview", "interview", 
"interview", "interview", "interview", "interview", "interview", 
"interview", "interview", "interview", "interview", "interview", 
"interview", "interview", "interview", "interview", "interview", 
"interview", "interview"), STARTED = structure(c(1607290462, 
1607290608, 1607291086, 1607291118, 1607291265, 1607291793, 1607294071, 
1607294336, 1607294337, 1607294419, 1607294814, 1607296474, 1607301809, 
1607329348, 1607333933, 1607335996, 1607336207, 1607336378, 1607343194, 
1607343414), tzone = "UTC", class = c("POSIXct", "POSIXt")), 
    EI01 = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L), .Label = c("Ja", 
    "Nein", "Nicht beantwortet"), class = "factor"), EI02 = c(2, 
    2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 2, 3), 
    RF01 = c(4, 2, 4, 3, 4, 4, 1, 3, 2, 3, 4, 3, 2, 3, 2, 2, 
    4, 2, 5, 3), RF02 = c(1, 1, 1, 1, 2, 2, 1, 2, 1, 1, 2, 1, 
    1, 1, 2, 2, 2, 2, 2, 2), RF03 = c(1, 2, 2, 2, 1, 2, 1, 1, 
    1, 1, 2, 1, 1, 2, 2, 2, 1, 2, 1, 2), RG01 = c(2, 2, 2, 2, 
    2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2), RG02 = c(3, 
    3, 3, 3, 4, 3, 4, 2, 4, 2, 3, 4, 4, 2, 4, 3, 4, 3, 4, 4), 
    RG03 = c(3, 2, 2, 3, 3, 3, 1, 3, 1, 2, 3, 1, 2, 2, 1, 3, 
    2, 3, 2, 2), RG04 = c(2, 4, 2, 1, 2, 2, 1, 3, 1, 3, 2, 4, 
    1, 1, 1, 1, 1, 2, 4, 1), RG05 = c(1, 1, 1, 1, 1, 1, 1, 2, 
    1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1), SD01 = structure(c(2L, 
    1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 
    2L, 2L, 1L, 1L), .Label = c("weiblich", "männlich", "divers", 
    "nicht beantwortet"), class = "factor"), SD03 = c(4, 3, 2, 
    2, 1, 2, 4, 4, 1, 4, 3, 1, 2, 3, 2, 4, 2, 3, 1, 3), SD05_01 = c(23, 
    22, 22, 21, 18, 22, 21, 27, 17, 22, 17, 21, 21, 22, 50, 25, 
    23, 20, 23, 23), TIME001 = c(2, 3, 23, 73, 29, 2, 3, 3, 29, 7, 
    50, 55, 3, 2, 10, 2, 1, 5, 7, 35), TIME002 = c(2, 2, 16, 
    34, 12, 14, 2, 2, 21, 2, 30, 24, 21, 3, 3, 2, 3, 2, 3, 22
    ), TIME003 = c(34, 8, 12, 15, 13, 12, 12, 7, 13, 11, 16, 
    10, 11, 16, 8, 8, 7, 8, 11, 14), TIME004 = c(60, 33, 25, 
    31, 45, 25, 14, 13, 38, 35, 50, 50, 37, 32, 32, 25, 72, 55, 
    28, 29), TIME005 = c(84, 21, 29, 41, 54, 33, 30, 22, 32, 
    42, 44, 23, 65, 30, 28, 32, 51, 31, 27, 44), TIME006 = c(14, 
    9, 27, 11, 24, 8, 8, 9, 18, 12, 35, 33, 27, 46, 11, 15, 8, 
    14, 12, 14), TIME007 = c(3, 18, 3, 5, 6, 2, 9, 2, 3, 3, 6, 
    7, 3, 13, 4, 4, 378, 3, 4, 10), TIME_SUM = c(199, 94, 135, 
    142, 183, 96, 78, 58, 154, 112, 186, 152, 167, 142, 96, 88, 
    146, 118, 92, 168), MAILSENT = c(NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    LASTDATA = structure(c(1607290661, 1607290702, 1607291221, 
    1607291328, 1607291448, 1607291889, 1607294149, 1607294394, 
    1607294491, 1607294531, 1607295045, 1607296676, 1607301976, 
    1607329490, 1607334030, 1607336084, 1607336727, 1607336496, 
    1607343286, 1607343582), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), FINISHED = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1), Q_VIEWER = c(0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), LASTPAGE = c(7, 
    7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7), 
    MAXPAGE = c(7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
    7, 7, 7, 7, 7), MISSING = c(7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
    7, 7, 7, 7, 7, 7, 0, 7, 7, 7), MISSREL = c(1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1), TIME_RSI = c("46023", 
    "14246", "0.75", "0.63", "0.54", "12055", "17533", "30682", 
    "0.7", "44197", "0.45", "0.58", "0.83", "44378", "44501", 
    "18629", "46753", "46388", "44197", "0.57"), DEG_TIME = c(27, 
    27, 3, 1, 0, 23, 30, 42, 2, 17, 0, 2, 7, 18, 10, 27, 43, 
    18, 8, 0)), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))
Edit with the counts of NA (TRUE) and non-NA (FALSE) values per column:

> sapply(df_literacy, function(a) table(c(T,F,is.na(a)))-1)
      CASE SERIAL REF QUESTNNR MODE STARTED EI01 EI02 RF01 RF02 RF03 RG01 RG02 RG03 RG04 RG05 SD01 SD03 SD05_01 TE03_01 TIME001 TIME002 TIME003
FALSE  630      0   0      630  630     630  630  630  630  630  630  630  630  630  630  630  629  629     615      99     630     630     630
TRUE     0    630 630        0    0       0    0    0    0    0    0    0    0    0    0    0    1    1      15     531       0       0       0
      TIME004 TIME005 TIME006 TIME007 TIME_SUM MAILSENT LASTDATA FINISHED Q_VIEWER LASTPAGE MAXPAGE MISSING MISSREL TIME_RSI DEG_TIME
FALSE     630     630     629     625      630        0      630      630      630      630     630     630     630      630      630
TRUE        0       0       1       5        0      630        0        0        0        0       0       0       0        0        0

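There are a few things to correct here.

  • drop_na() is removing all of your data.

    drop_na(df_literacy)
    # # A tibble: 0 x 37
    # # ... with 37 variables: CASE, SERIAL, REF, QUESTNNR,
    # #   MODE, STARTED, EI01, EI02, RF01, RF02,
    # #   RF03, RG01, RG02, RG03, RG04, RG05,
    # #   SD01, SD03, SD05_01, TIME001, TIME002,
    # #   TIME003, TIME004, TIME005, TIME006, TIME007,
    # #   TIME_SUM, MAILSENT, LASTDATA, FINISHED,
    # #   Q_VIEWER, LASTPAGE, MAXPAGE, MISSING,
    # #   MISSREL, TIME_RSI, DEG_TIME

    The problem is that you have several columns that are entirely NA, namely SERIAL, REF, and MAILSENT. Because drop_na() without arguments removes every row that has an NA in any column, no rows survive.

    sapply(df_literacy, function(a) table(c(T,F,is.na(a)))-1)
    #       CASE SERIAL REF QUESTNNR MODE STARTED EI01 EI02 RF01 RF02 RF03 RG01 RG02
    # FALSE   20      0   0       20   20      20   20   20   20   20   20   20   20
    # TRUE     0     20  20        0    0       0    0    0    0    0    0    0    0
    #       RG03 RG04 RG05 SD01 SD03 SD05_01 TIME001 TIME002 TIME003 TIME004 TIME005
    # FALSE   20   20   20   20   20      20      20      20      20      20      20
    # TRUE     0    0    0    0    0       0       0       0       0       0       0
    #       TIME006 TIME007 TIME_SUM MAILSENT LASTDATA FINISHED Q_VIEWER LASTPAGE
    # FALSE      20      20       20        0       20       20       20       20
    # TRUE        0       0        0       20        0        0        0        0
    #       MAXPAGE MISSING MISSREL TIME_RSI DEG_TIME
    # FALSE      20      20      20       20       20
    # TRUE        0       0       0        0        0

    Either remove the drop_na() call, or at least exclude those columns from the check with
    drop_na(-SERIAL, -REF, -MAILSENT)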
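
As a minimal sketch of why this produces the warning, the toy data frame below (hypothetical, not your data) has one usable column and one column that is entirely NA. It assumes tidyr is installed; the names toy, score, and serial are made up for illustration.

```r
library(tidyr)

# Toy data (hypothetical): one usable column, one column that is entirely NA
toy <- data.frame(score = c(2, 4, 2), serial = c(NA, NA, NA))

# With no arguments, drop_na() drops every row that has an NA in any column,
# so the all-NA column wipes out every row
emptied <- drop_na(toy)
nrow(emptied)
# 0

# max() of the now-empty column returns -Inf (and, unsuppressed, raises the
# "no non-missing arguments to max" warning)
empty_max <- suppressWarnings(max(emptied$score))
empty_max
# -Inf

# Excluding the all-NA column from the check keeps all rows
kept <- drop_na(toy, -serial)
nrow(kept)
# 3
```

The same thing happens inside mutate_at(): by the time max(.) runs, there are no rows left to take a maximum of.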

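  • Your code uses funs, which has been deprecated since dplyr-0.8.0.

    # Warning: `funs()` is deprecated as of dplyr 0.8.0.
    # Please use a list of either functions or lambdas:
    #
    #   # Simple named list:
    #   list(mean = mean, median = median)
    #
    #   # Auto named with `tibble::lst()`:
    #   tibble::lst(mean, median)
    #
    #   # Using lambdas
    #   list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))

    This does not cause an error, but it does produce a warning (and it may stop working at some point). Change your mutate_at call to:

    mutate_at(.vars = vars(RG04, RF02),
              .funs = list(prop = ~ ./max(.)))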
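
If you are on a recent dplyr (1.0.x or later, which is an assumption about your setup), the scoped mutate_at() can also be written with across(). This is only a sketch on made-up values, not a required change; the tibble toy is hypothetical and merely borrows your column names.

```r
library(dplyr)

# Toy data (hypothetical values) using two of the column names from the question
toy <- tibble(RG04 = c(2, 4, 2), RF02 = c(1, 1, 2))

# across() applies the lambda to each selected column; .names controls the
# names of the new columns, so the "_prop" suffix is explicit
scaled <- toy %>%
  mutate(across(c(RG04, RF02), ~ .x / max(.x), .names = "{.col}_prop"))

names(scaled)
# "RG04" "RF02" "RG04_prop" "RF02_prop"
```

Because the output names come from .names, this form does not depend on how many variables or functions are passed.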
    
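  • You are using one variable in .vars and one function in .funs, so the column name is kept as-is (and you will not see a _prop column). From ?mutate_at:

    The names of the new columns are derived from the names of the
    input variables and the names of the functions.
    • if there is only one unnamed function (i.e. if `.funs` is an
      unnamed list of length one), the names of the input variables
      are used to name the new columns;
    • for _at functions, if there is only one unnamed variable
      (i.e., if `.vars` is of the form `vars(a_single_column)`) and
      `.funs` has length greater than one, the names of the
      functions are used to name the new columns;
    • otherwise, the new names are created by concatenating the
      names of the input variables and the names of the functions,
      separated with an underscore "_".

    If you do not plan to add more variables and functions, you need to name the variable explicitly in the call, as in
    mutate_at(.vars = vars(RG04 = RG04), ...)
    Oddly, this causes it to produce RG04_prop.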

  • If we fix all of these, it works:

    df_literacy %>%
      drop_na(-SERIAL, -REF, -MAILSENT) %>%
      mutate_at(.vars = vars(RG04 = RG04),
                .funs = list(prop = ~ ./max(.))) %>%
      select(contains("_prop")) %>%
      head(3)
    # # A tibble: 3 x 1
    #   RG04_prop
    #       <dbl>
    # 1       0.5
    # 2       1  
    # 3       0.5
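
The fix above names the all-NA columns by hand. If you would rather find them programmatically, one possible sketch is to drop every column that is entirely NA before calling drop_na(). It assumes dplyr 1.0.0 or later (for where() inside select()) and uses a hypothetical toy tibble rather than df_literacy.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical): RG04 has one real NA, SERIAL is entirely NA
toy <- tibble(RG04 = c(2, 4, NA), SERIAL = c(NA, NA, NA))

cleaned <- toy %>%
  select(where(~ !all(is.na(.x)))) %>%  # keep only columns with at least one non-NA value
  drop_na()                              # then drop rows that still contain an NA

names(cleaned)
# "RG04"
nrow(cleaned)
# 2
```

Whether dropping such columns is appropriate depends on your analysis; here it only serves to keep drop_na() from removing every row.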
    
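Is there any chance you could share a sample of df_literacy? It is nearly pointless to try this since we don't know what the data looks like. I suggest the output of dput(head(df_literacy,20)), or however many rows are enough to get the desired effect (but no more). (I'm guessing, but I can reproduce those warnings when filtering with drop_na() returns zero rows. The command tells you # A tibble: 0 x 0, which means zero rows and zero columns. Perhaps the zero-columns part is the bigger problem.)

I have shared a sample from df_literacy as an edit.

As I said before, drop_na(df_literacy) returns zero rows. In that sample, three columns are entirely NA: SERIAL, REF, and MAILSENT. Run sapply(df_literacy, function(a) table(c(T,F,is.na(a)))-1) to see how many NA (TRUE) and non-NA (FALSE) values each column has.

I have run the code and attached the results.

See my answer. The output you added fully supports my statement. Do you understand what drop_na() without arguments does to your data? (It checks all columns, and if any row in a column is NA, then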