R: unique values in a column of a data frame
From a data frame like this:
DF <- read.table(text = "String Found Count
0-025823 0 1
1-042055 1 1
1-018396 1 2
1-018396 1 2
1-002984 1 3
1-002984 1 3
1-002984 1 3", header = TRUE)
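The desired output column flags unique values from top to bottom: the first occurrence of each value is marked 1, and every later duplicate is marked 0.
I used the following formula in Excel to get this output: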
=IF(COUNTIF($A$2:A2,A2)>1,0,1)
where the sequence of columns is the same as above.
I have tried loops, aggregate, and within, but did not get the desired result.
You want to mark the duplicated values as 0:
DF <- read.table(text = "String Found Count
0-025823 0 1
1-042055 1 1
1-018396 1 2
1-018396 1 2
1-002984 1 3
1-002984 1 3
1-002984 1 3", header = TRUE)
DF$unique <- 1 - duplicated(DF$String)
# String Found Count unique
#1 0-025823 0 1 1
#2 1-042055 1 1 1
#3 1-018396 1 2 1
#4 1-018396 1 2 0
#5 1-002984 1 3 1
#6 1-002984 1 3 0
#7 1-002984 1 3 0
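duplicated returns a logical vector; I rely on TRUE/FALSE being coerced to 1/0 when used in arithmetic.
Note that coercing to integer is usually unnecessary. You can use !duplicated(DF$String) instead.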
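As a small sketch of the difference between the logical and the numeric flag (the column names `unique_lgl` and `unique_int` are made up for illustration and are not part of the original answer): `!duplicated()` gives a logical column, and `as.integer()` turns it into 1/0 only when a numeric flag is really needed.

```r
DF <- read.table(text = "String Found Count
0-025823 0 1
1-042055 1 1
1-018396 1 2
1-018396 1 2
1-002984 1 3
1-002984 1 3
1-002984 1 3", header = TRUE)

# Logical flag: TRUE for the first occurrence of each String, FALSE for repeats
DF$unique_lgl <- !duplicated(DF$String)

# Integer flag (illustrative): the same information as 1/0
DF$unique_int <- as.integer(DF$unique_lgl)
```

The logical column can be used directly for subsetting, for example `DF[DF$unique_lgl, ]` keeps only the first row for each String.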
Assuming your data frame is df:
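df[, "Desired output"] <- 0
for (i in seq_len(nrow(df))) {
  # Compare on String (not Count): a row is a first occurrence when its
  # String appears exactly once among rows 1..i
  if (sum(df$String[1:i] == df$String[i]) == 1) {
    df[i, "Desired output"] <- 1
  } else {
    df[i, "Desired output"] <- 0
  }
}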
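The loop checks, row by row, whether a value has already appeared above it. As a sketch of the same first-occurrence test without an explicit loop (assuming the flag should be based on the String column; `first_pos` and `flag` are illustrative names), `match(x, x)` returns the position of the first occurrence of each element, so a row is a first occurrence exactly when that position equals its own row number.

```r
df <- read.table(text = "String Found Count
0-025823 0 1
1-042055 1 1
1-018396 1 2
1-018396 1 2
1-002984 1 3
1-002984 1 3
1-002984 1 3", header = TRUE)

# Position of the first occurrence of each String value
first_pos <- match(df$String, df$String)

# 1 where the row is itself the first occurrence, 0 otherwise
flag <- as.integer(first_pos == seq_along(df$String))
df[, "Desired output"] <- flag
```

This avoids re-scanning rows 1..i on every iteration, which matters mainly for larger data frames.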
Roland's solution is faster than using dplyr, but to show an alternative:
library(dplyr)
DF %>% group_by(String) %>% mutate(unique = ifelse(row_number()==1,1,0))
# # A tibble: 7 x 4
# # Groups: String [4]
# String Found Count unique
# <fctr> <int> <int> <dbl>
# 1 0-025823 0 1 1
# 2 1-042055 1 1 1
# 3 1-018396 1 2 1
# 4 1-018396 1 2 0
# 5 1-002984 1 3 1
# 6 1-002984 1 3 0
# 7 1-002984 1 3 0