Marking an indicator with R data.table
I have a dataset on which I want to do the following, but I can't find the best solution:
Name  Date  Paid  Outstanding  Mark as Follows       Close Indicator
A     2000  100   200          Open                  0
A     2001  224   100          Open                  0
A     2002  348   400          Open                  0
A     2003  472   0            First Time it Closes  1
A     2004  596   196          Reopens               -1
B     2004  720   200          Open                  0
B     2005  844   200          Open                  0
B     2006  968   0            First Time it Closes  1
B     2007  968   0            Closes                0
C     2000  1092  200          Open                  0
C     2001  1216  1200         Open                  0
B     2008  1340  1200         Reopens               -1
B     2010  1464  100          Open                  0
B     2011  1588  0            Closes                1
A     2016  1712  0            Closes                1
D     2009  1836  60           Open                  0
D     2010  1896  0            Closes                1
D     2016  1900  0            Closes                0
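What I want is to be able to replicate the Close Indicator column. The amounts are cumulative transaction amounts. My logic is: for each Name, if the payment is complete and nothing is outstanding, I want to mark it with 1 to indicate that it is closed. However, if the case reopens later, I want to mark it with -1, and then with 1 again when it closes. So A closes in 2003, reopens in 2004, and closes again in 2016.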
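For D, the case closes in 2010, but the payment changes in 2016. In theory that would also get a reopen flag, but since the case is closed again at the same time, I want to handle this situation as well (the indicator stays 0, as in the table above).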
What is the best way to do this in R data.table?

For each Name, the logic is:
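- If Outstanding is zero when it was previously non-zero, Close_Indicator is 1
- If Outstanding is non-zero when it was previously zero, Close_Indicator is -1
- Otherwise nothing has changed, and Close_Indicator is 0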
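In other words, the indicator is (Outstanding == 0) - (lag(Outstanding) == 0), the difference between two logicals that are coerced to 0 or 1.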
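A minimal base-R sketch of the formula on one hypothetical Name whose balances are already sorted by Date. Here c(NA, head(x, -1)) stands in for dplyr's lag(), so it runs without any packages; the example vector is made up to hit all three rules, including the zero-to-zero case from D.

```r
# Hypothetical Outstanding balances for a single Name, already sorted by Date
outstanding <- c(200, 0, 0, 196, 0)

# Base-R stand-in for lag(): shift down by one position, padding with NA
prev <- c(NA, head(outstanding, -1))

# The subtraction coerces TRUE/FALSE to 1/0, giving
#  1 when the balance has just dropped to zero (closing),
# -1 when it has just become non-zero again (reopening),
#  0 when the open/closed state is unchanged (including zero -> zero)
close_indicator <- (outstanding == 0) - (prev == 0)

# The first row has no previous value, so NA becomes 0
close_indicator[is.na(close_indicator)] <- 0
print(close_indicator)  # 0 1 0 -1 1
```

The third element shows the D situation: a zero balance following a zero balance yields 0, so no reopen flag is raised.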
So all we have to do is group by Name, sort by Date, and then apply this formula:
library('tidyverse')
df <- tribble(
~Name, ~Date, ~Outstanding,
"A", 2000L, 200L,
"A", 2001L, 100L,
"A", 2002L, 400L,
"A", 2003L, 0L,
"A", 2004L, 196L,
"B", 2004L, 200L,
"B", 2005L, 200L,
"B", 2006L, 0L,
"B", 2007L, 0L,
"C", 2000L, 200L,
"C", 2001L, 1200L,
"B", 2008L, 1200L,
"B", 2010L, 100L,
"B", 2011L, 0L,
"A", 2016L, 0L,
"D", 2009L, 60L,
"D", 2010L, 0L,
"D", 2016L, 0L
)
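df %>%
  rowid_to_column %>%          # keep the original row order in a rowid column
  group_by(Name) %>%
  arrange(Date) %>%            # within each Name, rows are now in Date order
  mutate(close_indicator = (Outstanding == 0) - (lag(Outstanding) == 0)) %>%
  replace_na(list(close_indicator = 0)) %>%  # first row per Name has no lag (NA)
  arrange(rowid)               # restore the original row order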
# # A tibble: 18 x 5
# # Groups: Name [4]
# rowid Name Date Outstanding close_indicator
# <int> <chr> <int> <int> <dbl>
# 1 1 A 2000 200 0
# 2 2 A 2001 100 0
# 3 3 A 2002 400 0
# 4 4 A 2003 0 1
# 5 5 A 2004 196 -1
# 6 6 B 2004 200 0
# 7 7 B 2005 200 0
# 8 8 B 2006 0 1
# 9 9 B 2007 0 0
# 10 10 C 2000 200 0
# 11 11 C 2001 1200 0
# 12 12 B 2008 1200 -1
# 13 13 B 2010 100 0
# 14 14 B 2011 0 1
# 15 15 A 2016 0 1
# 16 16 D 2009 60 0
# 17 17 D 2010 0 1
# 18 18 D 2016 0 0
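Since the question asks for data.table, here is a sketch of the same formula translated to data.table, assuming the data.table package is installed. shift() plays the role of lag(); putting order(Name, Date) in i together with by = Name computes the lag in Date order within each Name while := updates the table by reference, so the original row order is kept without a rowid column. The table dt is simply the tribble above rebuilt as a data.table.

```r
library(data.table)

# Same data as the tribble above (Paid column omitted, as in that answer)
dt <- data.table(
  Name = c("A", "A", "A", "A", "A", "B", "B", "B", "B",
           "C", "C", "B", "B", "B", "A", "D", "D", "D"),
  Date = c(2000L, 2001L, 2002L, 2003L, 2004L, 2004L, 2005L, 2006L, 2007L,
           2000L, 2001L, 2008L, 2010L, 2011L, 2016L, 2009L, 2010L, 2016L),
  Outstanding = c(200L, 100L, 400L, 0L, 196L, 200L, 200L, 0L, 0L,
                  200L, 1200L, 1200L, 100L, 0L, 0L, 60L, 0L, 0L)
)

# Visit rows in Date order within each Name; := writes the results back to
# the matching rows, so dt itself is not reordered
dt[order(Name, Date),
   close_indicator := (Outstanding == 0) - (shift(Outstanding) == 0),
   by = Name]

# The first row of each Name has no previous balance (NA), so set it to 0
dt[is.na(close_indicator), close_indicator := 0L]
print(dt)
```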
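Using data.table: I'm not sure I fully understand your criteria, but I think the example below should be enough to get you going. The main function that should help you here is shift(). Combined with a grouped operation (using the by = .(Name) clause), it lets you add a column holding the previous balance.
Once that column has been created, you can set the desired flags on the relevant rows by combining logical conditions.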
library(data.table)
DT <- data.table(Name = c("A", "A", "A", "A", "A","A", "A", "A", "B", "B","B", "B", "B"),
Date = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2000, 2001, 2002, 2003),
Outstanding = c(200, 100, 600 ,400, 0, 196, 200, 0, 500, 600, 0, 200, 0))
setkey(DT,Name,Date)
## Add a new column for previous outstanding balance
DT[,Prev_Outstanding := shift(Outstanding, n = 1L, fill = NA, type = "lag"), by = .(Name)]
DT[,CloseIndicator := 0] ## Pre-fill all rows with 0 initially
DT[Prev_Outstanding > 0 & Outstanding == 0, CloseIndicator := 1, by = .(Name)] ## Mark account closings
DT[Prev_Outstanding == 0 & Outstanding > 0, CloseIndicator := -1, by = .(Name)] ## Mark Account re-openings
print(DT)
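As a variant (not the answer's own code), the three assignments above can be collapsed into one grouped pass with fcase(). This sketch assumes a data.table version that provides fcase() (1.13.0 or later) and calls shift() inline instead of keeping the Prev_Outstanding helper column. fcase() returns the value of the first condition that is TRUE, and falls back to the default when every condition is FALSE or NA, which covers the first row of each Name.

```r
library(data.table)

# Same toy data as the answer above
DT <- data.table(Name = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B"),
                 Date = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2000, 2001, 2002, 2003),
                 Outstanding = c(200, 100, 600, 400, 0, 196, 200, 0, 500, 600, 0, 200, 0))
setkey(DT, Name, Date)  # sorts by Name, then Date

# One pass per Name: closing -> 1, reopening -> -1, anything else -> 0
DT[, CloseIndicator := fcase(
  shift(Outstanding) > 0 & Outstanding == 0,   1,
  shift(Outstanding) == 0 & Outstanding > 0,  -1,
  default = 0
), by = Name]
print(DT)
```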
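Please use R's dput() function to generate R code that creates your test data. For example, if your data frame is called df, run dput(df) in the R console, then copy and paste the output at the start of your question. This will make it much easier for people to help you!
Hi Paul, thanks for your reply. I'll try to follow this logic; it makes sense. I'm currently working with data.table, so I'm trying to replicate it there.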
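To illustrate the dput() suggestion from the comment: dput() prints R source code that rebuilds an object, so pasting its output lets others recreate the exact data. The small data frame below is hypothetical (a few rows taken from the question's table); the round trip at the end just shows that evaluating the printed code gives back an identical object.

```r
# Hypothetical excerpt of the question's data
df <- data.frame(
  Name = c("A", "A", "D", "D"),
  Date = c(2003L, 2004L, 2010L, 2016L),
  Outstanding = c(0L, 196L, 0L, 0L)
)

# This printed text is what you would paste into the question
dput(df)

# Round trip: parse and evaluate the dput() output to rebuild the object
code <- paste(capture.output(dput(df)), collapse = "\n")
df_copy <- eval(parse(text = code))
stopifnot(identical(df, df_copy))
```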
Output of print(DT) for the data.table answer's code:
    Name Date Outstanding Prev_Outstanding CloseIndicator
1: A 2000 200 NA 0
2: A 2001 100 200 0
3: A 2002 600 100 0
4: A 2003 400 600 0
5: A 2004 0 400 1
6: A 2005 196 0 -1
7: A 2006 200 196 0
8: A 2007 0 200 1
9: B 2000 600 NA 0
10: B 2001 0 600 1
11: B 2002 200 0 -1
12: B 2003 0 200 1
13: B 2008 500 0 -1