How to subset a data frame by date and perform multiple operations in R?
Tags: r, for-loop
I receive CSV reports every day. Each report has the same number of variables but covers a different date. I would like to run some simple analyses by date and save the results. I think a for loop could do the job, but I only know the basics. Ideally, I would only need to run the script once a month and get the results. Any guidance or suggestions are welcome.
Suppose I have two CSV reports in a folder:
#File 1 - 20200624.csv
Date Market Salesman Product Quantity Price Cost
6/24/2020 A MF Apple 20 1 0.5
6/24/2020 A RP Apple 15 1 0.5
6/24/2020 A RP Banana 20 2 0.5
6/24/2020 A FR Orange 20 3 0.5
6/24/2020 B MF Apple 20 1 0.5
6/24/2020 B RP Banana 20 2 0.5
#File 2 - 20200625.csv
Date Market Salesman Product Quantity Price Cost
6/25/2020 A MF Apple 10 1 0.6
6/25/2020 A MF Banana 15 1 0.6
6/25/2020 A RP Banana 10 2 0.6
6/25/2020 A FR Orange 15 3 0.6
6/25/2020 B MF Apple 20 1 0.6
6/25/2020 B RP Banana 20 2 0.6
I import all the files into R with the following code:
library(readr)
library(dplyr)
#Import files
files <- list.files(path = "~/JuneReports",
                    pattern = "\\.csv$", full.names = TRUE)
tbl <- sapply(files, read_csv, simplify=FALSE) %>%
bind_rows(.id = "id")
#Remove the "id" column
tbl2 <- tbl[,-1]
#Subset the data frame to keep only Market A, as Market B is irrelevant.
tbl3 <- subset(tbl2, Market == "A")
head(tbl3)
# A tibble: 6 x 7
Date Market Salesman Product Quantity Price Cost
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 6/24/2020 A MF Apple 20 1 0.5
2 6/24/2020 A RP Apple 15 1 0.5
3 6/24/2020 A RP Banana 20 2 0.5
4 6/24/2020 A FR Orange 20 3 0.5
5 6/25/2020 A MF Apple 10 1 0.6
6 6/25/2020 A MF Banana 15 1 0.6
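The for loop idea from the question can be sketched in base R as follows. This is only an illustration under stated assumptions: the small `tbl3` below is a hand-typed stand-in for the `head(tbl3)` output above, the `Date` strings are assumed to be in month/day/year form, and the output file name and the use of `tempdir()` are hypothetical placeholders for a real reports folder.

```r
# Hypothetical stand-in for tbl3, copied from the head(tbl3) output above.
tbl3 <- data.frame(
  Date     = c("6/24/2020", "6/24/2020", "6/24/2020", "6/24/2020",
               "6/25/2020", "6/25/2020"),
  Product  = c("Apple", "Apple", "Banana", "Orange", "Apple", "Banana"),
  Quantity = c(20, 15, 20, 20, 10, 15),
  Price    = c(1, 1, 2, 3, 1, 1),
  Cost     = c(0.5, 0.5, 0.5, 0.5, 0.6, 0.6)
)

# Parse the month/day/year strings into Date objects.
tbl3$Date <- as.Date(tbl3$Date, format = "%m/%d/%Y")

# Loop over each distinct date, subset the rows for that day,
# and compute the day's totals.
dates <- sort(unique(tbl3$Date))
results <- vector("list", length(dates))
for (i in seq_along(dates)) {
  day <- tbl3[tbl3$Date == dates[i], ]
  results[[i]] <- data.frame(
    Date      = dates[i],
    Revenue   = sum(day$Quantity * day$Price),
    TotalCost = sum(day$Quantity * day$Cost)
  )
}
daily <- do.call(rbind, results)
print(daily)

# Save the combined result. tempdir() is a placeholder for a real output folder,
# and the file name pattern is an assumption, not something from the question.
out_file <- file.path(tempdir(),
                      paste0("daily_summary_", format(dates[1], "%Y-%m"), ".csv"))
write.csv(daily, out_file, row.names = FALSE)
```

Converting `Date` to a real Date class is optional for this summary, but it makes sorting and month-based file names more reliable than working with the raw strings.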
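We group by 'Date' and 'Market', calculate the sum of the products of 'Quantity' with 'Price' and with 'Cost', add those along with 'Product' to the grouping, get the sum of 'Quantity', and reshape to 'wide' format with pivot_wider.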
library(dplyr) # 1.0.0
library(tidyr)
df1 %>%
group_by(Date, Market) %>%
group_by(Revenue = c(Quantity %*% Price),
TotalCost = c(Quantity %*% Cost),
Product, .add = TRUE) %>%
summarise(Sold = sum(Quantity)) %>%
pivot_wider(names_from = Product, values_from = Sold)
# A tibble: 2 x 7
# Groups: Date, Market, Revenue, TotalCost [2]
# Date Market Revenue TotalCost Apple Banana Orange
# <chr> <chr> <dbl> <dbl> <int> <int> <int>
#1 6/24/2020 A 135 37.5 35 20 20
#2 6/25/2020 A 25 15 10 15 NA
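The `c(Quantity %*% Price)` idiom in the code above may be unfamiliar, so here is a small base R sketch of what it computes. The two vectors are the Market A values for 6/24/2020 from the question; the variable names are only illustrative.

```r
# Quantity and Price for 6/24/2020, Market A, taken from the question's data.
quantity <- c(20, 15, 20, 20)
price    <- c(1, 1, 2, 3)

# %*% applied to two equal-length vectors returns a 1 x 1 matrix (the dot product).
dot <- quantity %*% price

# c() drops the matrix dimensions, leaving a plain length-one vector.
revenue <- c(dot)

# The same number written as an explicit sum of elementwise products.
revenue_sum <- sum(quantity * price)

print(c(revenue, revenue_sum))
```

Both forms give the same total, so `sum(Quantity * Price)` can be used instead if the matrix product reads as too terse.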
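Comments:
You can use %*%.
@akrun Could you provide more details?
My solution's output is based on the head() data you showed.
Awesome! But I think you want add = TRUE instead of .add = TRUE.
@KJM In dplyr 1.0.0 the usage is group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data)).
My bad! Thanks for the clarification.
@KJM Every release brings some changes. I agree that this is inconvenient when you are on a different version. Sorry, I forgot to mention the version.
@KJM It may not work because Quantity %*% Price should be computed within each Date and Market. The code you used does the calculation over the whole column.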
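The last comment's point, that the product must be computed within each group rather than over the whole column, can be illustrated with a short base R sketch. The data frame `df` is a hypothetical cut-down copy of the Market A rows, and `split()` with `sapply()` is used here as a stand-in for the grouping that `group_by()` performs in the answer.

```r
# Hypothetical cut-down copy of the Market A rows from the question.
df <- data.frame(
  Date     = c("6/24/2020", "6/24/2020", "6/24/2020", "6/24/2020",
               "6/25/2020", "6/25/2020"),
  Quantity = c(20, 15, 20, 20, 10, 15),
  Price    = c(1, 1, 2, 3, 1, 1)
)

# Whole column: a single number that mixes both dates together.
overall <- c(df$Quantity %*% df$Price)

# Per date: split the rows first, then take the dot product inside each piece.
per_date <- sapply(split(df, df$Date),
                   function(d) c(d$Quantity %*% d$Price))

print(overall)
print(per_date)
```

The per-date values match the Revenue column in the answer's output, while the whole-column value is their sum.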
Data used in the answer:
df1 <- structure(list(
  Date = c("6/24/2020", "6/24/2020", "6/24/2020", "6/24/2020",
           "6/25/2020", "6/25/2020"),
  Market = c("A", "A", "A", "A", "A", "A"),
  Salesman = c("MF", "RP", "RP", "FR", "MF", "MF"),
  Product = c("Apple", "Apple", "Banana", "Orange", "Apple", "Banana"),
  Quantity = c(20L, 15L, 20L, 20L, 10L, 15L),
  Price = c(1L, 1L, 2L, 3L, 1L, 1L),
  Cost = c(0.5, 0.5, 0.5, 0.5, 0.6, 0.6)),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6"))