R data.table: record counts for all combinations of given value lists from two columns
I have a data.table in R as follows:
Col1 Col2
Col1Value1 Col2Value1
Col1Value1 Col2Value2
Col1Value1 Col2Value3
Col1Value2 Col2Value1
Col1Value2 Col2Value3
Col1Value3 Col2Value1
Col1Value3 Col2Value2
Col1Value3 Col2Value3
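I want to get the record count for every combination of the given values in Col1 (Col1Value1, Col1Value2) with the given values in Col2 (Col2Value1, Col2Value2). If a combination has no records, the count returned should be 0.

You can try the following code: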
library(dplyr)

a <- c("Col1Value1", "Col1Value2")
b <- c("Col2Value1", "Col2Value2")
# df is the data from the question; count the requested combinations that occur
df2 <- df %>%
  filter(Col1 %in% a, Col2 %in% b) %>%
  group_by(Col1, Col2) %>%
  summarise(count = n()) %>%
  as.data.frame()
# Build all requested combinations; those with no records get a count of 0
expand.grid(Col1 = a, Col2 = b, stringsAsFactors = FALSE) %>%
  left_join(df2, by = c("Col1", "Col2")) %>%
  mutate(count = ifelse(is.na(count), 0L, count))
In base R, you can do the following:
data.frame(table(dt))
Var1 Var2 Freq
1 Col1Value1 Col2Value1 1
2 Col1Value2 Col2Value1 1
3 Col1Value3 Col2Value1 1
4 Col1Value1 Col2Value2 1
5 Col1Value2 Col2Value2 0
6 Col1Value3 Col2Value2 1
7 Col1Value1 Col2Value3 1
8 Col1Value2 Col2Value3 1
9 Col1Value3 Col2Value3 1
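The table() call above counts every combination present in the data. To restrict the result to the given value lists while still getting 0 for missing combinations, one option is to convert each column to a factor whose levels are the requested values. The following is a minimal sketch under that assumption; the names a, b and res are illustrative, and the data is rebuilt inline as a plain data.frame so the snippet runs on its own.

```r
# Sample data from the question, rebuilt as a plain data.frame
dt <- read.table(text = "Col1 Col2
Col1Value1 Col2Value1
Col1Value1 Col2Value2
Col1Value1 Col2Value3
Col1Value2 Col2Value1
Col1Value2 Col2Value3
Col1Value3 Col2Value1
Col1Value3 Col2Value2
Col1Value3 Col2Value3", header = TRUE, stringsAsFactors = FALSE)

# Requested values (illustrative names)
a <- c("Col1Value1", "Col1Value2")
b <- c("Col2Value1", "Col2Value2")

# Values outside the levels become NA and are dropped by table();
# levels that have no rows are still counted, as 0
res <- as.data.frame(
  table(Col1 = factor(dt$Col1, levels = a),
        Col2 = factor(dt$Col2, levels = b)),
  responseName = "count"
)
res
```

Because the factor levels drive the output, the result always has length(a) * length(b) rows, even when one of the requested values never occurs in the data.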
You can use table() like this:
data.table(with(dt, table(Col1, Col2)))
Col1 Col2 N
1: Col1Value1 Col2Value1 1
2: Col1Value2 Col2Value1 1
3: Col1Value3 Col2Value1 1
4: Col1Value1 Col2Value2 1
5: Col1Value2 Col2Value2 0
6: Col1Value3 Col2Value2 1
7: Col1Value1 Col2Value3 1
8: Col1Value2 Col2Value3 1
9: Col1Value3 Col2Value3 1
Data:
library(data.table)
dt <- setDT(read.table(text="Col1 Col2
Col1Value1 Col2Value1
Col1Value1 Col2Value2
Col1Value1 Col2Value3
Col1Value2 Col2Value1
Col1Value2 Col2Value3
Col1Value3 Col2Value1
Col1Value3 Col2Value2
Col1Value3 Col2Value3", header=TRUE,stringsAsFactors=FALSE) )
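dt

For combinations with 0 records, you may have to first create all rows with a count of zero and then overwrite the rows that have more than 0 records. This makes me wonder whether you really need to keep the two columns separate, or whether you could benefit from creating a new column that joins Col1 and Col2 into a comma-separated string (e.g. "2,3"). That way you could first list all possible combinations and then count the occurrences.

Thanks. I think the requirement is for a limited set of combinations out of all possible combinations, hence this approach.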
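The suggestion in the comment above (create every combination with a zero count first, then overwrite the ones that actually occur) can be sketched in data.table with an update join. This is only one possible reading of that suggestion; the names res, obs and n are illustrative and not from the original thread.

```r
library(data.table)

# Sample data from the question
DT <- data.table(
  Col1 = c("Col1Value1", "Col1Value1", "Col1Value1", "Col1Value2",
           "Col1Value2", "Col1Value3", "Col1Value3", "Col1Value3"),
  Col2 = c("Col2Value1", "Col2Value2", "Col2Value3", "Col2Value1",
           "Col2Value3", "Col2Value1", "Col2Value2", "Col2Value3")
)
a <- c("Col1Value1", "Col1Value2")
b <- c("Col2Value1", "Col2Value2")

# Step 1: every requested combination, initialised to a count of 0
res <- CJ(Col1 = a, Col2 = b)
res[, count := 0L]

# Step 2: observed counts per combination in the data
obs <- DT[, .(n = .N), by = .(Col1, Col2)]

# Step 3: update join; zeros are overwritten only where a combination exists,
# and observed combinations outside a x b are simply ignored
res[obs, on = .(Col1, Col2), count := i.n]
res
```

Keeping the two columns separate means no string pasting or splitting is needed; the join on both columns does the matching.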
DT[CJ(Col1, Col2, unique = TRUE), on = .(Col1, Col2), .(count = .N), by = .EACHI]
# Col1 Col2 count
# 1: Col1Value1 Col2Value1 1
# 2: Col1Value1 Col2Value2 1
# 3: Col1Value1 Col2Value3 1
# 4: Col1Value2 Col2Value1 1
# 5: Col1Value2 Col2Value2 0
# 6: Col1Value2 Col2Value3 1
# 7: Col1Value3 Col2Value1 1
# 8: Col1Value3 Col2Value2 1
# 9: Col1Value3 Col2Value3 1
DT <- fread(
"Col1 Col2
Col1Value1 Col2Value1
Col1Value1 Col2Value2
Col1Value1 Col2Value3
Col1Value2 Col2Value1
Col1Value2 Col2Value3
Col1Value3 Col2Value1
Col1Value3 Col2Value2
Col1Value3 Col2Value3"
)
a <- c("Col1Value1", "Col1Value2")
b <- c("Col2Value1", "Col2Value2")
DT[Col1 %in% a & Col2 %in% b
][CJ(Col1, Col2, unique = TRUE), on = .(Col1, Col2), .(count = .N), by = .EACHI]
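Note that the chained filter above builds the grid from the values that remain after filtering, so a requested value with no rows at all would be absent from the result. A possible variant, sketched here as an assumption rather than as the answer's method, passes the requested vectors to CJ() directly. The value "Col1Value9" is hypothetical and is added only to show the zero rows.

```r
library(data.table)

# Sample data from the question
DT <- fread(
"Col1 Col2
Col1Value1 Col2Value1
Col1Value1 Col2Value2
Col1Value1 Col2Value3
Col1Value2 Col2Value1
Col1Value2 Col2Value3
Col1Value3 Col2Value1
Col1Value3 Col2Value2
Col1Value3 Col2Value3"
)

# "Col1Value9" is a hypothetical value that never occurs in DT
a <- c("Col1Value1", "Col1Value2", "Col1Value9")
b <- c("Col2Value1", "Col2Value2")

# Join DT to the full grid of requested values; with by = .EACHI,
# .N is 0 for grid rows that have no match in DT
res <- DT[CJ(Col1 = a, Col2 = b), on = .(Col1, Col2), .(count = .N), by = .EACHI]
res
```

No separate filtering step is needed here, because only rows matching the grid contribute to the counts.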