R: Remove reversal transactions
I have transaction-level data that includes reversal transactions. Each reversal is represented by one negative and one matching positive amount:
trnx_df <- data.frame(Date = c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-03", "2018-01-03", "2018-01-05", "2018-02-01",
"2018-02-01", "2018-02-01"),
Product = c("A", "A", "A", "A", "B", "B", "B", "A", "A", "A"),
Amount = c(-1000, 1000, 1000, 1000, -1000, 1000, 500, -2000, 1000, 2000))
trnx_df
Date Product Amount
1 2018-01-01 A -1000
2 2018-01-01 A 1000
3 2018-01-01 A 1000
4 2018-01-01 A 1000
5 2018-01-03 B -1000
6 2018-01-03 B 1000
7 2018-01-05 B 500
8 2018-02-01 A -2000
9 2018-02-01 A 1000
10 2018-02-01 A 2000
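library(dplyr)

trnx_summary <- trnx_df %>%
  group_by(Product) %>%
  summarise(Total_Amount = sum(Amount),
            Max_Amount = max(Amount))
trnx_summary
  Product Total_Amount Max_Amount
1 A 3000 2000
2 B 500 1000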
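The total is not a problem, because the negative entries cancel out the positive ones, but for the maximum spend I get the wrong output. The maximum amount for product A should be 1000 (the 2000 and -2000 cancel each other out).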
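As a quick arithmetic check of the expected result, the sketch below drops the offsetting pairs by hand and summarises what remains using only base R. The pairs (rows 1 and 2, 5 and 6, 8 and 10) were picked by eye from the sample data; this is an illustration of the target numbers, not a general method.

```r
# Sample data from the question
trnx_df <- data.frame(
  Date    = c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-03",
              "2018-01-03", "2018-01-05", "2018-02-01", "2018-02-01", "2018-02-01"),
  Product = c("A", "A", "A", "A", "B", "B", "B", "A", "A", "A"),
  Amount  = c(-1000, 1000, 1000, 1000, -1000, 1000, 500, -2000, 1000, 2000)
)

# Hand-picked offsetting pairs (same Date, Product and absolute Amount):
# rows 1 & 2, rows 5 & 6, rows 8 & 10
kept <- trnx_df[-c(1, 2, 5, 6, 8, 10), ]

# Per-product total and maximum of the remaining transactions
totals <- tapply(kept$Amount, kept$Product, sum)
maxima <- tapply(kept$Amount, kept$Product, max)
cbind(Total_Amount = totals, Max_Amount = maxima)
# A: total 3000, max 1000; B: total 500, max 500
```

The totals do not change, since each removed pair sums to zero; only the maxima move, which is exactly where the plain `summarise()` goes wrong.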
How can I fix this? Also, is there a way to remove these reversal transactions from the data frame itself?
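Using the lead and lag functions:
trnx_df %>%
  # compare rows with the same product and the same absolute amount
  group_by(Product, AmountAbs = abs(Amount)) %>%
  # sort so that, within each group, negative entries come first
  arrange(Product, AmountAbs, Amount) %>%
  mutate(
    remove =
      # the positive row directly after a matching negative row ...
      (sign(lag(Amount, default = 0)) == -1 &
         lag(AmountAbs, default = 0) == Amount) |
      # ... and the negative row itself
      ((sign(Amount)) == -1 &
         lead(AmountAbs) == AmountAbs)) %>%
  ungroup() %>%
  filter(!remove) %>%
  group_by(Product) %>%
  summarise(Total_Amount = sum(Amount), Max_Amount = max(Amount))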
# # A tibble: 2 x 3
# Product Total_Amount Max_Amount
# <fct> <dbl> <dbl>
# 1 A 3000 1000
# 2 B 500 500
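To see how the `remove` flag works, the sketch below evaluates the same expression in base R on one sorted group (product A, absolute amount 1000). `lag_vec()` and `lead_vec()` are hypothetical helpers standing in for `dplyr::lag()` and `dplyr::lead()`; in the answer, `group_by()` makes those shifts happen within each product and absolute-amount group. Note that this grouping does not include `Date`, so a reversal can be paired with a positive entry from another day.

```r
# One sorted group: Product A, |Amount| == 1000, negative entry first
Amount    <- c(-1000, 1000, 1000, 1000, 1000)
AmountAbs <- abs(Amount)

# Hypothetical stand-ins for dplyr::lag() / dplyr::lead()
lag_vec  <- function(x, default = NA) c(default, x[-length(x)])
lead_vec <- function(x, default = NA) c(x[-1], default)

# Same flag as in the answer:
#  - a row is removed if the previous row is negative with the same absolute value
#  - a negative row is removed if the next row has the same absolute value
remove <- (sign(lag_vec(Amount, default = 0)) == -1 &
             lag_vec(AmountAbs, default = 0) == Amount) |
  (sign(Amount) == -1 & lead_vec(AmountAbs) == AmountAbs)

remove           # TRUE TRUE FALSE FALSE FALSE
Amount[!remove]  # 1000 1000 1000
```

Only the negative entry and the first positive entry after it are flagged, so three 1000 rows of product A survive, which gives the total of 3000 and maximum of 1000 shown above.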
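Does "reversal transaction" mean that if there is a 1000 and a -1000, those rows should be ignored? Yes, we should ignore these rows.
Do you know what happens if a transaction is cancelled? The negative number represents the cancelled amount, but as transaction entries, both the positive and the negative amount are captured.
Thanks. Great!!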
trnx_df %>% # filter the negative transactions, save them in dftemp
  filter(Amount < 0) %>%
  mutate(Amount = abs(Amount)) -> dftemp # in dftemp, negative amounts are made positive to ease looking for matches

trnx_df %>% # filter the positive transactions that do not have a negative counterpart
  filter(Amount > 0) %>%
  anti_join(dftemp, by = c("Date", "Product", "Amount")) -> dfuniques

trnx_df %>%
  filter(Amount > 0) %>% # filter positive transactions
  inner_join(dftemp, by = c("Date", "Product", "Amount")) %>% # keep rows that are both in the original data and in dftemp
  group_by(Date, Product, Amount) %>% # group by date, product and amount
  slice(-1) %>% # for each date, product and amount combination, drop one row (the positive entry cancelled by the negative one)
  full_join(dfuniques, by = c("Date", "Product", "Amount")) %>% # add back the unmatched positive transactions; at this point you have the cleaned data frame
  group_by(Product) %>%
  summarise(Total_Amount = sum(Amount), Max_Amount = max(Amount))
Product Total_Amount Max_Amount
<fctr> <dbl> <dbl>
1 A 3000 1000
2 B 500 500
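A base-R sketch of the same date/product/amount matching, assuming that every negative entry reverses one positive entry with the same `Date`, `Product` and absolute `Amount`. Instead of `slice(-1)`, which drops one row per group, it counts the negatives per key and cancels that many positives, so a key with two reversals would lose two positives. The names `key`, `n_neg` and `pos_rank` are illustrative; note that a negative entry with no positive counterpart is simply dropped as well.

```r
# Sample data from the question
trnx_df <- data.frame(
  Date    = c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-03",
              "2018-01-03", "2018-01-05", "2018-02-01", "2018-02-01", "2018-02-01"),
  Product = c("A", "A", "A", "A", "B", "B", "B", "A", "A", "A"),
  Amount  = c(-1000, 1000, 1000, 1000, -1000, 1000, 500, -2000, 1000, 2000)
)

# Rows that can offset one another share Date, Product and absolute Amount
key <- paste(trnx_df$Date, trnx_df$Product, abs(trnx_df$Amount), sep = "|")

is_neg <- trnx_df$Amount < 0
# Number of negative (reversing) entries in each row's key
n_neg <- ave(as.integer(is_neg), key, FUN = sum)
# Running count of positive entries within each key: 1, 2, 3, ...
pos_rank <- ave(as.integer(!is_neg), key, FUN = cumsum)

# Keep positive rows that are not among the first n_neg positives of their key
keep <- !is_neg & pos_rank > n_neg
cleaned <- trnx_df[keep, ]

totals <- tapply(cleaned$Amount, cleaned$Product, sum)
maxima <- tapply(cleaned$Amount, cleaned$Product, max)
cbind(Total_Amount = totals, Max_Amount = maxima)
# A: total 3000, max 1000; B: total 500, max 500
```

`cleaned` is the data frame with the reversal pairs removed, which also answers the second part of the question.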