
How to perform multiple divisions and store the remainders in new variables in R?


I have a data frame in which each data point has the structure ID, measure, timemark:

ID    measure   timemark   
001   12         15    
003   3          13            
004   365        0                   
003   1          13                  
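ID is a person's unique study ID, measure is the number of days that person used the service at that time, and timemark is a number from 0 to 51 representing the 52 weeks of year x.

Now I want to create a data frame with 52 columns, each containing the number of days the person spent in the service that week (so the maximum per week should be 7 days). A person can have multiple entries at one time point; in that case the total number of days should be the sum of the two rows.

So I want it to look like: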

ID    ... week13 week14 week15 week16   
001   ... 0      0      7      5        
003   ... 4      0      0      0            
004   ... 7      7      7      7                     

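I have been struggling with the logic. I suspect it involves the quotient and remainder of measure, but I could not work it out. Can anyone help?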

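We can first sum the measure values for each ID and timemark, creating one row for each combination. We then create a list that divides measure into steps of 7 along with its remainder. Using unnest_longer we get the data in long format, create the timemark column by appending the week number, and finally spread the data into wide format.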
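The division step can be checked on a single number before running the full pipeline. The following is a minimal base R sketch; split_weeks is a hypothetical helper name that does not appear in the original answer, and it simply wraps the same expression the pipeline uses inside list().

```r
# Hypothetical helper: split a total number of days into weekly chunks of at most 7.
# It uses the same expression as the pipeline: whole weeks first, then the remainder.
split_weeks <- function(measure) {
  c(rep(7, floor(measure / 7)), measure %% 7)
}

split_weeks(12)  # 7 5
split_weeks(4)   # 4
split_weeks(14)  # 7 7 0  (an exact multiple of 7 leaves a trailing 0 week)
```

The dplyr pipeline that follows applies this split per ID and then lays the chunks out over consecutive weeks.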

library(dplyr)
library(tidyr)

df %>%
  group_by(ID, timemark) %>%
  summarise(measure = sum(measure)) %>%
  mutate(measure = list(c(rep(7, floor(measure/7)), measure %% 7))) %>%
  unnest_longer(measure) %>%
  mutate(timemark = paste0('week', first(timemark) + 0:(n() - 1))) %>%
  slice(1:pmin(n(), 52)) %>%
  mutate(timemark = factor(timemark, levels = paste0('week', 0:51))) %>%
  spread(timemark, measure)
  #Or using pivot_wider in new tidyr
  #pivot_wider(names_from = timemark, values_from = measure)


# A tibble: 3 x 53
# Groups:   ID [3]
#     ID week0 week1 week2 week3 week4 week5 week6 week7 week8 week9 week10 week11 week12 week13 week14 week15 week16
#  <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#1     1    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA     NA     NA     NA     NA      7      5
#2     3    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA     NA     NA      4     NA     NA     NA
#3     4     7     7     7     7     7     7     7     7     7     7      7      7      7      7      7      7      7
# … with 35 more variables: week17 <dbl>, week18 <dbl>, week19 <dbl>, week20 <dbl>, week21 <dbl>, week22 <dbl>,
#   week23 <dbl>, week24 <dbl>, week25 <dbl>, week26 <dbl>, week27 <dbl>, week28 <dbl>, week29 <dbl>, week30 <dbl>,
#   week31 <dbl>, week32 <dbl>, week33 <dbl>, week34 <dbl>, week35 <dbl>, week36 <dbl>, week37 <dbl>, week38 <dbl>,
#   week39 <dbl>, week40 <dbl>, week41 <dbl>, week42 <dbl>, week43 <dbl>, week44 <dbl>, week45 <dbl>, week46 <dbl>,
#   week47 <dbl>, week48 <dbl>, week49 <dbl>, week50 <dbl>, week51 <dbl>
Data

df <- structure(list(ID = c(1L, 3L, 4L, 3L), measure = c(12L, 3L, 365L, 
1L), timemark = c(15L, 13L, 0L, 13L)), class = "data.frame", row.names = c(NA, -4L))
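For comparison, the same layout can be sketched in base R with an explicit loop. This is not the answerer's method, only an illustration under stated assumptions: weeks with no service are filled with 0 rather than NA, spells running past week 51 are truncated, and if two spells of the same ID overlap in a week the total is capped at 7. The names agg and out are chosen here for illustration.

```r
df <- data.frame(ID = c(1L, 3L, 4L, 3L),
                 measure = c(12L, 3L, 365L, 1L),
                 timemark = c(15L, 13L, 0L, 13L))

# Sum entries that share the same ID and timemark
agg <- aggregate(measure ~ ID + timemark, data = df, FUN = sum)

# One row per ID, one column per week (week0 ... week51), filled with 0 (assumption)
ids <- sort(unique(agg$ID))
out <- matrix(0, nrow = length(ids), ncol = 52,
              dimnames = list(as.character(ids), paste0("week", 0:51)))

for (i in seq_len(nrow(agg))) {
  m <- agg$measure[i]
  days <- c(rep(7, m %/% 7), m %% 7)   # quotient weeks of 7, then the remainder
  days <- days[days > 0]               # drop a trailing 0 when m is a multiple of 7
  weeks <- agg$timemark[i] + seq_along(days) - 1
  keep <- weeks <= 51                  # truncate at the end of the year
  row <- as.character(agg$ID[i])
  cols <- weeks[keep] + 1              # weeks start at 0, columns at 1
  out[row, cols] <- pmin(out[row, cols] + days[keep], 7)
}

out[, c("week13", "week14", "week15", "week16")]
```

Unlike the grouped mutate() above, the loop handles an ID that has entries at several different timemark values, since each aggregated row is processed on its own.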

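I want to leave what I have done for you. First, I used expand() to create a master data frame containing all combinations of ID and timemark. Then I created result in the following way. I defined groups by ID and timemark and summed measure. Next, in the first mutate(), I worked out how many weeks (rows) are needed to expand each result. I then expanded the data frame with expandRows() from the splitstackshape package, and updated the numbers in timemark in the second mutate() so that each row has the right week number. After that, I did some calculation to assign the days for each week. lag(measure - 7 * row_number(), default = 7) creates a vector containing the days remaining in measure. Some of the numbers need to be replaced using logical conditions. For each group, when the number of rows is 1, assign the value in measure. When res is larger than 7, assign 7 to res. (Any number larger than 7 becomes 7, since each week (row) can hold at most 7 days.) Otherwise, keep the original value in res.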

library(dplyr)
library(tidyr)
library(splitstackshape)

master <- expand(mydf, timemark = 0:51, ID)

group_by(mydf, ID, timemark) %>% 
summarize(measure = sum(measure)) %>% 
ungroup %>% 
group_by(group = 1:n()) %>% 
mutate(nrow = as.integer(measure / 7) + 1) %>% 
expandRows(count = "nrow") %>%
mutate(timemark = first(timemark):(first(timemark) + n() - 1),
       res = lag(measure - 7 * row_number(), default = 7),
       res = case_when(n() == 1 ~ as.numeric(measure),
                       res > 7 ~ 7,
                       TRUE ~ res)) -> result
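To trace the res computation for a single group, here is a minimal base R sketch. It replaces dplyr's lag() with an equivalent shift written by hand, and the variable names n_rows and remaining are chosen here for illustration only.

```r
measure <- 12                           # total days for one ID/timemark group
n_rows <- as.integer(measure / 7) + 1   # weeks needed, as in the first mutate()

# Days left after each week has taken its 7 days
remaining <- measure - 7 * seq_len(n_rows)   # 5 -2

# Shift down by one row with 7 as the first value;
# same result as dplyr::lag(remaining, default = 7)
res <- c(7, head(remaining, -1))
res
# 7 5

# Cap at 7 days per week, mirroring the res > 7 branch of case_when()
pmin(res, 7)
# 7 5
```

With a single row (for example measure = 4) the shifted vector is just the default 7, which is why the case_when() above falls back to measure when n() == 1.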
Data

mydf <- structure(list(ID = c(1L, 3L, 4L, 3L), measure = c(12L, 3L, 365L, 
1L), timemark = c(15L, 13L, 0L, 13L)), class = "data.frame", row.names = c(NA, 
-4L))
