R: Subset rows per group based on a character value in a column and the order of occurrence in the data frame


I have data like this:

B <- data.frame(State = c(rep("Arizona", 8), rep("California", 8), rep("Texas", 8)), 
  Account = rep(c("Balance", "Balance", "In the Bimester", "In the Bimester", "Expenses",  
  "Expenses", "In the Bimester", "In the Bimester"), 3), Value = runif(24))

You can use the dplyr package; see the `library(dplyr)` pipeline further down.

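If you use `max` instead of `min` in that pipeline, you get the last occurrence of "In the Bimester" for each State. You can also exclude the `Account` column by changing the last pipe step to `select(-helper, -Account)`.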
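As a rough sketch of that variant: the data frame `B_demo` below is a hypothetical, deterministic stand-in for `B` (its `Value` column is just the row number, not data from the question), so the selected rows are easy to check by eye.

```r
library(dplyr)

# Hypothetical stand-in for B: same layout, but Value is simply the row number
B_demo <- data.frame(
  State = rep(c("Arizona", "California", "Texas"), each = 8),
  Account = rep(c("Balance", "Balance", "In the Bimester", "In the Bimester",
                  "Expenses", "Expenses", "In the Bimester", "In the Bimester"), 3),
  Value = 1:24,
  stringsAsFactors = FALSE
)

last_block <- B_demo %>%
  mutate(helper = data.table::rleid(Account)) %>%  # run id over the whole frame
  filter(Account == "In the Bimester") %>%
  group_by(State) %>%
  filter(helper == max(helper)) %>%                # max instead of min: last run per State
  ungroup() %>%
  select(-helper, -Account)                        # drop the helper and the Account column

print(last_block)
```

With this stand-in data, the result should hold rows 7-8, 15-16 and 23-24, that is, the "In the Bimester" rows that follow "Expenses" in each State.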

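P.S. If you would rather not use `rleid` from data.table and want to stick to dplyr functions only, the same run id can be built with dplyr alone.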
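One possible dplyr-only replacement for `rleid`, offered as a sketch rather than the original answer's method: compare each `Account` value with the previous one via `lag()` and take a cumulative sum of the changes. The data frame `B2` is a hypothetical stand-in with `Value` equal to the row number. Recent dplyr releases (1.1.0 and later) also provide `consecutive_id()`, which serves the same purpose.

```r
library(dplyr)

# Hypothetical stand-in for B with deterministic values
B2 <- data.frame(
  State = rep(c("Arizona", "California"), each = 8),
  Account = rep(c("Balance", "Balance", "In the Bimester", "In the Bimester",
                  "Expenses", "Expenses", "In the Bimester", "In the Bimester"), 2),
  Value = 1:16,
  stringsAsFactors = FALSE
)

res <- B2 %>%
  group_by(State) %>%
  # Run id within each State: goes up by 1 whenever Account differs from the previous row
  mutate(helper = cumsum(Account != lag(Account, default = first(Account))) + 1L) %>%
  filter(Account == "In the Bimester") %>%
  filter(helper == min(helper)) %>%   # first "In the Bimester" run per State
  ungroup() %>%
  select(-helper)

print(res)
```

Here the run id is computed per State (after `group_by`), whereas `rleid` in the pipeline above runs over the whole data frame; for this selection both give the same rows.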

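# data.table approach: keep the "In the Bimester" rows that come before the first "Expenses" row in each State
library(data.table)
DT <- as.data.table(B)  # work on a copy so B stays available for the dplyr code
# Row position within each State
DT <- DT[, .(Account, Value, index = seq_len(.N)), by = .(State)]
# Position of the first "Expenses" row per State
x <- DT[Account == "Expenses", .(min_ind = min(index)), by = .(State)]
DT <- merge(DT, x, by = "State")
# "In the Bimester" rows that occur before the first "Expenses" row
res <- DT[index < min_ind & Account == "In the Bimester", .(Value), by = .(State)]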

library(dplyr)
B %>%
  mutate(helper = data.table::rleid(Account)) %>%  # run id: increases whenever Account changes
  filter(Account == "In the Bimester") %>%
  group_by(State) %>%
  filter(helper == min(helper)) %>%                # first "In the Bimester" run per State
  select(-helper)

# # A tibble: 6 x 3
# # Groups:   State [3]
#        State         Account      Value
#       <fctr>          <fctr>      <dbl>
# 1    Arizona In the Bimester 0.17730148
# 2    Arizona In the Bimester 0.05695585
# 3 California In the Bimester 0.29089678
# 4 California In the Bimester 0.86952723
# 5      Texas In the Bimester 0.54076144
# 6      Texas In the Bimester 0.59168138
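
To see why filtering on `min(helper)` picks the first block of "In the Bimester" rows, it may help to look at what `rleid` returns for the `Account` pattern of a single State. This is a small illustration; the vector name `accounts` is only used here.

```r
library(data.table)

# Account pattern of one State, as in the example data
accounts <- c("Balance", "Balance", "In the Bimester", "In the Bimester",
              "Expenses", "Expenses", "In the Bimester", "In the Bimester")

# rleid gives every run of identical consecutive values its own id
run_id <- rleid(accounts)
print(run_id)
# [1] 1 1 2 2 3 3 4 4
```

The two "In the Bimester" runs get ids 2 and 4, so `min` keeps the run before "Expenses" and `max` keeps the one after it.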