
R: removing reversed transactions


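I have transaction-level data that contains some reversed transactions. These transactions are represented by one negative and one positive amount.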

trnx_df <- data.frame(Date = c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-03", "2018-01-03", "2018-01-05", "2018-02-01",
                            "2018-02-01", "2018-02-01"),
                   Product = c("A", "A", "A", "A", "B", "B", "B", "A", "A", "A"),
                   Amount = c(-1000, 1000, 1000, 1000, -1000, 1000, 500, -2000, 1000, 2000))

trnx_df

             Date Product Amount
    1  2018-01-01       A  -1000
    2  2018-01-01       A   1000
    3  2018-01-01       A   1000
    4  2018-01-01       A   1000
    5  2018-01-03       B  -1000
    6  2018-01-03       B   1000
    7  2018-01-05       B    500
    8  2018-02-01       A  -2000
    9  2018-02-01       A   1000
    10 2018-02-01       A   2000
library(dplyr)

trnx_df %>%
  group_by(Product) %>%
  summarise(Total_Amount = sum(Amount),
            Max_Amount = max(Amount)) -> trnx_summary
trnx_summary
  Product Total_Amount Max_Amount
1       A         3000       2000
2       B          500       1000
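The total is not a problem, because the negative entries cancel out the positive ones, but for the maximum spend I get the wrong output.

For product A, Max_Amount should be 1000 (2000 and -2000 cancel each other out).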
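A quick check of the arithmetic for product A, in base R: a reversal pair adds zero to the sum, but its positive half still counts for `max()`. The vector `kept_a` is a hand-made assumption of which amounts remain once the two reversal pairs (-1000/1000 and -2000/2000) are dropped, following the question's description.

```r
# Product A amounts from trnx_df, in row order
amounts_a <- c(-1000, 1000, 1000, 1000, -2000, 1000, 2000)

# The same amounts with the two reversal pairs (-1000/1000 and -2000/2000)
# removed by hand (assumed pairing)
kept_a <- c(1000, 1000, 1000)

sum(amounts_a)  # 3000: each reversal pair adds zero
sum(kept_a)     # 3000
max(amounts_a)  # 2000: the reversed +2000 still counts
max(kept_a)     # 1000
```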

How can I fix this? Also, is there a way to remove these reversed transactions from the data frame itself?

Using the lead and lag functions:

trnx_df %>% 
  group_by(Product, AmountAbs = abs(Amount)) %>% 
  arrange(Product, AmountAbs, Amount) %>% 
  mutate(
    remove =
      (sign(lag(Amount, default = 0)) == -1 &
           lag(AmountAbs, default = 0) == Amount) |
      ((sign(Amount)) == -1 &
         lead(AmountAbs) == AmountAbs)) %>% 
  ungroup() %>% 
  filter(!remove) %>%
  group_by(Product) %>% 
  summarise(Total_Amount = sum(Amount), Max_Amount = max(Amount))

# # A tibble: 2 x 3
# Product Total_Amount Max_Amount
#   <fct>          <dbl>      <dbl>
# 1 A               3000       1000
# 2 B                500        500
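To see what the `remove` flag does inside a single `(Product, AmountAbs)` group, here is a sketch of the same rule in base R on a plain vector. `lag_vec()` and `lead_vec()` are hypothetical helpers standing in for dplyr's `lag()`/`lead()`, and `keep_after_lag_lead()` is an illustrative name, not part of the answer.

```r
# Hypothetical helpers mimicking dplyr::lag()/lead() for a plain vector
lag_vec  <- function(x, fill = NA) c(fill, head(x, -1))
lead_vec <- function(x, fill = NA) c(tail(x, -1), fill)

# Apply the answer's `remove` rule to one (Product, AmountAbs) group and
# return the amounts that survive filter(!remove)
keep_after_lag_lead <- function(amount) {
  amount <- sort(amount)  # negatives first, as arrange(..., Amount) does
  amount_abs <- abs(amount)
  remove <- (sign(lag_vec(amount, fill = 0)) == -1 &
               lag_vec(amount_abs, fill = 0) == amount) |
            (sign(amount) == -1 & lead_vec(amount_abs) == amount_abs)
  amount[!remove]
}

# Product A, |Amount| == 1000: one reversal, three purchases remain
keep_after_lag_lead(c(-1000, 1000, 1000, 1000, 1000))  # 1000 1000 1000

# Hypothetical group with two reversals: one +1000 is left over
keep_after_lag_lead(c(-1000, -1000, 1000, 1000))       # 1000
```

The second call suggests that the rule only removes the positive directly after a negative, so a group holding more than one reversal may keep a reversed purchase; whether that matters depends on the data.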

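Does "reversed transaction" mean that if there are entries of 1000 and -1000, those rows are ignored?
Yes, we should ignore these rows. They show that a transaction was cancelled: the negative amount is the amount that was cancelled, but both the positive and the negative entry are captured as transactions.
Thanks. Great!!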
An alternative answer, using joins:

library(dplyr)

trnx_df %>% # filter the negative transactions, save in dftemp
  filter(Amount < 0) %>% 
  mutate(Amount = abs(Amount)) -> dftemp # in dftemp, negative amounts are made positive to ease looking for matches

trnx_df %>% # filter the positive transactions that do not have a negative duplicate
  filter(Amount > 0) %>% 
  anti_join(dftemp, by = c("Date", "Product", "Amount")) -> dfuniques

trnx_df %>% 
  filter(Amount > 0) %>% # filter positive transactions
  inner_join(dftemp, by = c("Date", "Product", "Amount")) %>% # keep obs that are both in the original data and in dftemp
  group_by(Date, Product, Amount) %>% # group by date, product and amount
  slice(-1) %>% # for each date, product and amount combination, delete 1 row (the positive half of a negative/positive pair)
  full_join(dfuniques, by = c("Date", "Product", "Amount")) %>% # add back the unique positive transactions; from here on, reversed pairs are deleted
  group_by(Product) %>% 
  summarise(Total_Amount = sum(Amount), Max_Amount = max(Amount))

  Product Total_Amount Max_Amount
   <fctr>        <dbl>      <dbl>
1       A         3000       1000
2       B          500        500
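If a date/product/amount cell can hold more than one reversal, one way to remove the reversals from the data frame itself is to pair the k-th negative with the k-th positive in the same cell. The sketch below does this in base R, so it needs no extra packages. Like the join answer, it assumes that a reversal shares Date, Product and absolute amount with the transaction it cancels; the names `cell`, `occurrence`, `n_pairs` and `cleaned` are illustrative.

```r
trnx_df <- data.frame(
  Date = c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-03",
           "2018-01-03", "2018-01-05", "2018-02-01", "2018-02-01", "2018-02-01"),
  Product = c("A", "A", "A", "A", "B", "B", "B", "A", "A", "A"),
  Amount = c(-1000, 1000, 1000, 1000, -1000, 1000, 500, -2000, 1000, 2000)
)

# Cell: same Date, Product and absolute amount (assumed reversal criterion)
cell <- paste(trnx_df$Date, trnx_df$Product, abs(trnx_df$Amount))

# Running count of each row among rows with the same cell and sign,
# so the k-th negative pairs with the k-th positive
occurrence <- ave(seq_len(nrow(trnx_df)), cell, sign(trnx_df$Amount),
                  FUN = seq_along)

# Number of complete negative/positive pairs in each cell
n_neg <- ave(as.numeric(trnx_df$Amount < 0), cell, FUN = sum)
n_pos <- ave(as.numeric(trnx_df$Amount > 0), cell, FUN = sum)
n_pairs <- pmin(n_neg, n_pos)

# Rows beyond the paired ones are kept as genuine transactions
cleaned <- trnx_df[occurrence > n_pairs, ]

totals <- tapply(cleaned$Amount, cleaned$Product, sum)  # A 3000, B 500
maxima <- tapply(cleaned$Amount, cleaned$Product, max)  # A 1000, B 500
```

A negative with no positive in its cell is kept, so an unmatched reversal stays visible instead of silently disappearing; add a further filter if that is not wanted.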