Using dplyr inside a function in R, then running that function with a for loop


r function for-loop


I have a data frame, call it df1, that looks like this:

product_key              month    price     productage

00020e32-8ecd53a64715   201508  65.00000    1
00020e32-8ecd53a64715   201509  65.00000    2
00020e32-8ecd53a64715   201510  65.00000    3
000340b8-60fb50bacac8   201504  55.00000    1
000340b8-60fb50bacac8   201505  55.00000    2
000340b8-60fb50bacac8   201506  53.16667    3
000340b8-60fb50bacac8   201507  27.50000    4
000340b8-60fb50bacac8   201508  27.50000    5
000340b8-60fb50bacac8   201509  27.50000    6
000340b8-60fb50bacac8   201510  27.50000    7
000458f1-9304a2fdb6ae   201506  49.00000    1
000458f1-9304a2fdb6ae   201507  49.00000    2
000458f1-9304a2fdb6ae   201508  49.00000    3
000458f1-9304a2fdb6ae   201509  49.00000    4
000458f1-9304a2fdb6ae   201510  49.00000    5
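
For anyone who wants to run the code in this thread, the sample above can be rebuilt roughly as follows. This is only a sketch: the values are copied from the table, and the column types (integer month and productage, numeric price) are an assumption, since the question does not state them.

```r
# Rebuild the 15-row sample shown above with base R only.
# Column types are assumed; the original data set may store them differently.
df1 <- data.frame(
  product_key = rep(
    c("00020e32-8ecd53a64715", "000340b8-60fb50bacac8", "000458f1-9304a2fdb6ae"),
    times = c(3, 7, 5)
  ),
  month = c(201508:201510, 201504:201510, 201506:201510),
  price = c(65, 65, 65,
            55, 55, 53.16667, 27.5, 27.5, 27.5, 27.5,
            49, 49, 49, 49, 49),
  productage = c(1:3, 1:7, 1:5),
  stringsAsFactors = FALSE
)

str(df1)
```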

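What I want to do is filter the data set down to all products that have been in it for 1 month (i.e. filter(productage == 1)) and then build a unit value index from those items and their prices. I then want to do the same for products that have been in the data set for 2 months, then 3 months, and so on.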

What I have done so far, which is long-winded, is:

Month 1

df1month1 <- df1 %>%
filter(productage == 1)


Average product price for each month

df1_UVIMONTH1<-df1month1%>%
  group_by(month)%>%
  summarise(aveprice=mean(price))


UVI for month 1: compute the UVI price index

df1UVIMONTH1 <- df1_UVIMONTH1 %>%
  mutate(month = as.numeric(month)) %>%
  arrange(month) %>%
  mutate(UVI = aveprice / lag(aveprice)) %>%
  mutate(UVI = case_when(month == min(month) ~ 1,
                         month != min(month) ~ UVI)) %>%
  mutate(chained = cumprod(UVI))


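However, doing this for every product age in the data set (there can be up to 26) and for 10 different data sets is long-winded and tedious. I am trying to make the process more efficient, but I am struggling.

I have tried creating a function: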

product_by_age <- function(df1, age){
  filter_by_month <- df1 %>%
    filter(productage %in% age) %>%
    group_by(month) %>%
    summarise(aveprice=mean(price))

  UVI_index <- filter_by_month %>%
    mutate(month=as.numeric(month))%>%
    arrange(month)%>%
    mutate(UVI=(aveprice/lag(aveprice)))%>%
    mutate(UVI=case_when(month==min(month)~1,
                         month!=min(month)~ UVI))%>%
    mutate(chained=cumprod(UVI))
}



df1productage <- data.frame(age = unique(df1$productage), stringsAsFactors = FALSE)

result <- data.frame()
for (i in df1productage:length(df1productage)) {
  sba <- product_by_age(df1, df1productage[i])
  result <- rbind(result, sba)
}

We need to change the loop slightly. Here we loop over the sequence of rows in 'df1productage', and 'result' is initialised as an empty data.frame:
for(i in seq_len(nrow(df1productage))) {
    result <- rbind(result, product_by_age(df1, df1productage$age[i]))
 }

dim(result)
#[1] 15  4
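
To see why the loop bounds and the indexing matter, it may help to inspect the lookup table directly. The sketch below assumes the ages 1 to 7 from the sample data; it only illustrates how length(), nrow() and the two indexing styles behave on a one-column data frame, and does not reproduce the exact failure of the original loop.

```r
# Lookup table of ages, built like df1productage in the question
# (ages 1..7 are taken from the sample data).
df1productage <- data.frame(age = 1:7, stringsAsFactors = FALSE)

length(df1productage)         # number of columns (1), not number of ages
nrow(df1productage)           # number of rows (7), one per age

class(df1productage[1])       # "data.frame": single brackets keep the column as a data frame
class(df1productage$age[1])   # "integer": a single age value to pass to the function

seq_len(nrow(df1productage))  # 1 2 3 4 5 6 7, a safe set of loop indices
```

seq_len() is also safe when the table is empty: it then returns a zero-length vector, so the loop body is skipped.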

Edit: added an identifier column in the map_df call.

This can also be done with grouping alone, with no need for a new function:
require(dplyr)

df1%>%
  group_by(month, productage)%>%
  summarise(aveprice=mean(price)) %>% arrange(productage, month) %>%
    group_by(productage)%>%
    mutate(UVI=c(1, aveprice[2:length(aveprice)]/aveprice[1:length(aveprice)-1])) %>%
  mutate(chained=cumprod(UVI))

 ### Group and then regroup. and I have modified your mutate code which was using 'lag' 

# A tibble: 15 x 5
# Groups:   productage [7]
    month productage aveprice   UVI chained
    <dbl> <chr>         <dbl> <dbl>   <dbl>
 1 201504 1              55.0 1.00    1.00 
 2 201506 1              49.0 0.891   0.891
 3 201508 1              65.0 1.33    1.18 
 4 201505 2              55.0 1.00    1.00 
 5 201507 2              49.0 0.891   0.891
 6 201509 2              65.0 1.33    1.18 
 7 201506 3              53.2 1.00    1.00 
 8 201508 3              49.0 0.922   0.922
 9 201510 3              65.0 1.33    1.22 
10 201507 4              27.5 1.00    1.00 
11 201509 4              49.0 1.78    1.78 
12 201508 5              27.5 1.00    1.00 
13 201510 5              49.0 1.78    1.78 
14 201509 6              27.5 1.00    1.00 
15 201510 7              27.5 1.00    1.00 
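
The same grouped calculation can also be written with lag(), closer to the code in the question. This is a sketch rather than the answerer's method: it assumes dplyr 1.0.0 or later (for the .groups argument), rebuilds the sample data with assumed column types, and uses coalesce() to set the first ratio in each group to 1.

```r
library(dplyr)

# Sample data from the question (column types are assumed).
df1 <- data.frame(
  product_key = rep(
    c("00020e32-8ecd53a64715", "000340b8-60fb50bacac8", "000458f1-9304a2fdb6ae"),
    times = c(3, 7, 5)
  ),
  month = c(201508:201510, 201504:201510, 201506:201510),
  price = c(65, 65, 65, 55, 55, 53.16667, 27.5, 27.5, 27.5, 27.5,
            49, 49, 49, 49, 49),
  productage = c(1:3, 1:7, 1:5),
  stringsAsFactors = FALSE
)

uvi_by_age <- df1 %>%
  group_by(productage, month) %>%
  summarise(aveprice = mean(price), .groups = "drop") %>%  # average price per age and month
  arrange(productage, month) %>%
  group_by(productage) %>%
  mutate(
    # lag() is NA on the first row of each group; coalesce() turns that ratio into 1
    UVI = coalesce(aveprice / lag(aveprice), 1),
    chained = cumprod(UVI)
  ) %>%
  ungroup()

uvi_by_age
```

Because lag() runs inside each productage group, product ages that appear in only one month (6 and 7 in the sample) simply get UVI = 1, with no index arithmetic to get wrong.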


Now you can simply use split to split the result by the productage column.
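
A minimal sketch of that split step, assuming the sample data from the question and dplyr 1.0.0 or later. The names avg and by_age are made up for illustration; the split is applied to the per-age monthly averages, but it works the same way on the full index table.

```r
library(dplyr)

# Sample data from the question (column types are assumed).
df1 <- data.frame(
  product_key = rep(
    c("00020e32-8ecd53a64715", "000340b8-60fb50bacac8", "000458f1-9304a2fdb6ae"),
    times = c(3, 7, 5)
  ),
  month = c(201508:201510, 201504:201510, 201506:201510),
  price = c(65, 65, 65, 55, 55, 53.16667, 27.5, 27.5, 27.5, 27.5,
            49, 49, 49, 49, 49),
  productage = c(1:3, 1:7, 1:5),
  stringsAsFactors = FALSE
)

# Average price per product age and month.
avg <- df1 %>%
  group_by(productage, month) %>%
  summarise(aveprice = mean(price), .groups = "drop")

# One data frame per product age, returned as a named list.
by_age <- split(avg, avg$productage)

names(by_age)   # list names are the ages, as character strings
by_age[["1"]]   # rows for products in their first month
```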
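group_by(productage, month) ... I think the error is in df1productage:length(df1productage)

@Tjebo It doesn't work because all the product ages are in one data set, so you would also have to add some kind of filter at that stage to get the result you want. That would again mean changing the filter to work on each product age in turn, which is long-winded and tedious.

Also, inside the function, price_currentdaymode is not in the input data set. Numeric columns in the example should be shown without quotes. It is based on this

@akrun I have updated it. That is what the price variable is called in my data set, but I changed it to price in this example for convenience. I created a data frame called df1productage that contains all the unique product age values, so I thought this would work. I don't understand why it doesn't?

Thanks! This works perfectly (I just used the dplyr approach). Is there a way to get a separate data frame for each product age in my results? Or at least to add the product age as another column next to the unit value index, coded somewhere in the function? At the moment I have the index variables, but nothing tells me which product age they correspond to. Thanks again

@JayJ You can use .id in map_df, i.e. map_df(df1productage %>% ..., .id = 'grp')

Thanks. This doesn't directly answer the question (for readers), but it fully meets my needs and makes my code much simpler and more efficient!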

library(tidyverse)
map_df(df1productage %>% 
              pull(age), ~    
                    product_by_age(df1, .x), .id = 'grp')
# A tibble: 15 x 5
#   grp   month aveprice   UVI chained
#   <chr> <dbl>    <dbl> <dbl>   <dbl>
# 1 1         1     55   1       1    
# 2 1         3     49   0.891   0.891
# 3 1         5     65   1.33    1.18 
# 4 2         2     55   1       1    
# 5 2         4     49   0.891   0.891
# 6 2         6     65   1.33    1.18 
# 7 3         3     53.2 1       1    
# 8 3         5     49   0.922   0.922
# 9 3         7     65   1.33    1.22 
#10 4         4     27.5 1       1    
#11 4         6     49   1.78    1.78 
#12 5         5     27.5 1       1    
#13 5         7     49   1.78    1.78 
#14 6         6     27.5 1       1    
#15 7         7     27.5 1       1
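
With an unnamed input, .id records the position of each element rather than the age itself; the two happen to coincide here because the ages are 1 to 7. If the ages were not consecutive, one option is to name the input vector first. The sketch below makes several assumptions: it uses the sample data from the question, a slightly condensed variant of product_by_age that returns its result explicitly, and an id column named productage (a name chosen for illustration).

```r
library(dplyr)
library(purrr)

# Sample data from the question (column types are assumed).
df1 <- data.frame(
  product_key = rep(
    c("00020e32-8ecd53a64715", "000340b8-60fb50bacac8", "000458f1-9304a2fdb6ae"),
    times = c(3, 7, 5)
  ),
  month = c(201508:201510, 201504:201510, 201506:201510),
  price = c(65, 65, 65, 55, 55, 53.16667, 27.5, 27.5, 27.5, 27.5,
            49, 49, 49, 49, 49),
  productage = c(1:3, 1:7, 1:5),
  stringsAsFactors = FALSE
)

# Condensed variant of the question's function (not the original code).
product_by_age <- function(df, age) {
  df %>%
    filter(productage %in% age) %>%
    group_by(month) %>%
    summarise(aveprice = mean(price)) %>%
    arrange(month) %>%
    mutate(UVI = coalesce(aveprice / lag(aveprice), 1),
           chained = cumprod(UVI))
}

ages <- sort(unique(df1$productage))

# Naming the vector makes .id carry the actual age instead of the position.
result <- setNames(ages, ages) %>%
  map_dfr(~ product_by_age(df1, .x), .id = "productage")

result
```

The id column comes back as character, so convert it with as.integer() if it needs to be joined back to numeric ages.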
