How can I subset a data frame by date and perform multiple operations in R?

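I receive CSV reports every day. Each report has the same number of variables but covers a different date. I want to run some simple analyses by date and save the results. I think a for loop could do the job, but I only know the basics. Ideally, I would only need to run the script once a month to get the results. Any guidance or suggestions are welcome.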

Suppose I have two CSV reports in one folder:

#File 1 - 20200624.csv
Date        Market  Salesman    Product Quantity    Price   Cost
6/24/2020   A       MF          Apple   20          1       0.5
6/24/2020   A       RP          Apple   15          1       0.5
6/24/2020   A       RP          Banana  20          2       0.5
6/24/2020   A       FR          Orange  20          3       0.5
6/24/2020   B       MF          Apple   20          1       0.5
6/24/2020   B       RP          Banana  20          2       0.5

#File 2 - 20200625.csv
Date        Market  Salesman    Product Quantity    Price   Cost
6/25/2020   A       MF          Apple   10          1       0.6
6/25/2020   A       MF          Banana  15          1       0.6
6/25/2020   A       RP          Banana  10          2       0.6
6/25/2020   A       FR          Orange  15          3       0.6
6/25/2020   B       MF          Apple   20          1       0.6
6/25/2020   B       RP          Banana  20          2       0.6
I import all the files into R with the following code:

library(readr)
library(dplyr)

#Import files
files <- list.files(path = "~/JuneReports", 
                    pattern = "\\.csv$", full.names = TRUE)
tbl <- sapply(files, read_csv, simplify=FALSE) %>% 
  bind_rows(.id = "id")
#Remove the "id" column
tbl2 <- tbl[,-1]
#Subset the data frame to get only Market A, as Market B is irrelevant.
tbl3 <- subset(tbl2, Market == "A")
head(tbl3)
# A tibble: 6 x 7
  Date      Market Salesman Product Quantity Price  Cost
  <chr>     <chr>  <chr>    <chr>      <dbl> <dbl> <dbl>
1 6/24/2020 A      MF       Apple         20     1   0.5
2 6/24/2020 A      RP       Apple         15     1   0.5
3 6/24/2020 A      RP       Banana        20     2   0.5
4 6/24/2020 A      FR       Orange        20     3   0.5
5 6/25/2020 A      MF       Apple         10     1   0.6
6 6/25/2020 A      MF       Banana        15     1   0.6
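
Note that Date is read in as text (<chr>), so it has to be converted before it can be compared as a date. The following is only a minimal sketch in base R, assuming the month/day/year format shown above; the data frame `sales` and the objects `one_day` and `june` are hypothetical names used for illustration and are not part of the original post.

```r
# Hypothetical miniature of tbl3; Date is stored as text, as in the output above
sales <- data.frame(
  Date     = c("6/24/2020", "6/24/2020", "6/25/2020"),
  Market   = c("A", "A", "A"),
  Quantity = c(20, 15, 10),
  stringsAsFactors = FALSE
)

# Convert month/day/year text into Date objects
sales$Date <- as.Date(sales$Date, format = "%m/%d/%Y")

# Keep only the rows for a single day
one_day <- subset(sales, Date == as.Date("2020-06-24"))

# Keep the rows inside a date range (here, all of June 2020)
june <- subset(sales,
               Date >= as.Date("2020-06-01") & Date <= as.Date("2020-06-30"))
```

Once the column is a real Date, range comparisons work as expected, whereas comparing the raw text would sort "6/3/2020" after "6/24/2020".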

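We group by 'Date' and 'Market', then compute the sum of the products of 'Quantity' with 'Price' and with 'Cost' using %*%. We add those results, together with 'Product', to the grouping, take the sum of 'Quantity', and reshape the result to 'wide' format with pivot_wider.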
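
The sum of products can be checked by hand. The sketch below uses the four Market A rows for 6/24/2020 from the question; the vector names are made up for illustration. It shows that `%*%` on two vectors gives the same number as `sum()` of the element-wise product.

```r
# Market A rows for 6/24/2020, copied from the question
quantity <- c(20, 15, 20, 20)
price    <- c(1, 1, 2, 3)
cost     <- c(0.5, 0.5, 0.5, 0.5)

# %*% on two vectors returns a 1 x 1 matrix; c() drops it to a plain number
revenue_matrix <- c(quantity %*% price)

# The same quantity written as an explicit sum of products
revenue_sum <- sum(quantity * price)
total_cost  <- sum(quantity * cost)
```

Both forms give a revenue of 135 and a total cost of 37.5 for that day, so `sum(Quantity * Price)` could be used instead if it reads more clearly.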

library(dplyr) # 1.0.0
library(tidyr)
df1 %>%
    group_by(Date, Market) %>% 
    group_by(Revenue = c(Quantity %*% Price), 
             TotalCost = c(Quantity %*% Cost),
             Product, .add = TRUE) %>% 
    summarise(Sold = sum(Quantity)) %>% 
    pivot_wider(names_from = Product, values_from = Sold)
# A tibble: 2 x 7
# Groups:   Date, Market, Revenue, TotalCost [2]
#  Date      Market Revenue TotalCost Apple Banana Orange
#  <chr>     <chr>    <dbl>     <dbl> <int>  <int>  <int>
#1 6/24/2020 A          135      37.5    35     20     20
#2 6/25/2020 A           25      15      10     15     NA
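
The question also asks about a for loop that saves the results for each date. The following is a rough sketch of one way that might look in base R, not part of the original answer. The data frame `sales`, the folder `daily_results` and the `summary_` file prefix are hypothetical, and the output is written under `tempdir()` only so that the example is safe to run.

```r
# Hypothetical miniature of the Market A data from the question
sales <- data.frame(
  Date     = c("6/24/2020", "6/24/2020", "6/24/2020", "6/24/2020",
               "6/25/2020", "6/25/2020"),
  Quantity = c(20, 15, 20, 20, 10, 15),
  Price    = c(1, 1, 2, 3, 1, 1),
  Cost     = c(0.5, 0.5, 0.5, 0.5, 0.6, 0.6),
  stringsAsFactors = FALSE
)

# Hypothetical output folder; replace with a real path in practice
out_dir <- file.path(tempdir(), "daily_results")
dir.create(out_dir, showWarnings = FALSE)

for (d in unique(sales$Date)) {
  # Rows belonging to the current date
  day <- sales[sales$Date == d, ]

  # One-row summary for that date
  result <- data.frame(
    Date      = d,
    Revenue   = sum(day$Quantity * day$Price),
    TotalCost = sum(day$Quantity * day$Cost)
  )

  # Build a file name such as summary_20200624.csv
  stamp <- format(as.Date(d, format = "%m/%d/%Y"), "%Y%m%d")
  write.csv(result, file.path(out_dir, paste0("summary_", stamp, ".csv")),
            row.names = FALSE)
}
```

The grouped dplyr pipeline above already produces one row per date, so a loop like this is mainly useful when each date needs its own output file.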

Comments:

You can use %*%
@akrun Can you provide more details?
My solution's output is based on the data you showed.
Awesome! However, I think you want add=TRUE instead of .add=TRUE
@KJM In dplyr 1.0.0, the signature is group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))
My bad! Thanks for the clarification.
@KJM Every new release brings some changes. I agree that this makes the code fail when you are on a different version. Sorry, I forgot to mention the version.
@KJM It may not work because Quantity %*% Price should be computed within each Date and Market. The code you used performs the calculation on the whole column.

Data
df1 <- structure(list(Date = c("6/24/2020", "6/24/2020", "6/24/2020", 
"6/24/2020", "6/25/2020", "6/25/2020"), Market = c("A", "A", 
"A", "A", "A", "A"), Salesman = c("MF", "RP", "RP", "FR", "MF", 
"MF"), Product = c("Apple", "Apple", "Banana", "Orange", "Apple", 
"Banana"), Quantity = c(20L, 15L, 20L, 20L, 10L, 15L), Price = c(1L, 
1L, 2L, 3L, 1L, 1L), Cost = c(0.5, 0.5, 0.5, 0.5, 0.6, 0.6)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))