R: Find all maximum values within each object group in a data frame
I'm wondering whether there is a simpler way than writing if...else statements for the following situation. I have a data frame and only want the rows where the column "percentage" is >= 95. In addition, if several rows for one object meet this condition, I only want the one with the largest percentage. If several rows are tied for the largest, I want to keep all of them. For example:
object city street percentage
A NY Sun 100
A NY Malino 97
A NY Waterfall 100
B CA Washington 98
B WA Lieber 95
C NA Moon 75
Then I would like the result to be:
object city street percentage
A NY Sun 100
A NY Waterfall 100
B CA Washington 98
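I could do this with if-else statements, but I feel there should be a smarter way to express: 1. percentage >= 95; 2. if several rows qualify, pick the largest; 3. if several are tied for the largest, pick all of them.
The following works: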
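# ddf is the data frame from the question
# keep only the rows with percentage >= 95
ddf2 = ddf[ddf$percentage>=95,]
# empty data frame with the same columns, to collect the results
ddf3 = ddf2[-c(1:nrow(ddf2)),]
for(oo in unique(ddf2$object)){
# rows of the current object that are tied for its maximum
tempdf = ddf2[ddf2$object == oo, ]
maxval = max(tempdf$percentage)
tempdf = tempdf[tempdf$percentage==maxval,]
# append those rows one at a time
for(i in 1:nrow(tempdf)) ddf3[nrow(ddf3)+1,] = tempdf[i,]
}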
ddf3
object city street percentage
1 A NY Sun 100
3 A NY Waterfall 100
4 B CA Washington 98
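For comparison, the same filter-then-max logic can probably be written without the explicit loops. This is only a sketch in base R: the data frame `ddf` is rebuilt here from the example in the question, and the helper name `keep_group_max` is made up for illustration.

```r
# Example data from the question, rebuilt so the block is self-contained.
ddf <- data.frame(
  object     = c("A", "A", "A", "B", "B", "C"),
  city       = c("NY", "NY", "NY", "CA", "WA", "NA"),
  street     = c("Sun", "Malino", "Waterfall", "Washington", "Lieber", "Moon"),
  percentage = c(100, 97, 100, 98, 95, 75),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows of one group that are tied for the maximum.
keep_group_max <- function(g) {
  g[g$percentage == max(g$percentage), , drop = FALSE]
}

# Step 1: keep rows with percentage >= 95.
ddf2 <- ddf[ddf$percentage >= 95, ]

# Step 2: split by object, keep the tied maxima per group, then stack the pieces.
pieces <- lapply(split(ddf2, ddf2$object), keep_group_max)
ddf3 <- do.call(rbind, unname(pieces))
ddf3  # streets kept: Sun and Waterfall (object A), Washington (object B)
```

Note that if no row reaches 95, `split()` returns an empty list and `do.call(rbind, ...)` yields `NULL`, so a caller may want to guard against that case.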
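You can do this by creating a variable that indicates, for each object, which rows have the maximum percentage. You can then use that indicator to subset the data: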
# your data
dat <- read.table(text = "object city street percentage
A NY Sun 100
A NY Malino 97
A NY Waterfall 100
B CA Washington 98
B WA Lieber 95
C NA Moon 75", header=TRUE, na.strings="", stringsAsFactors=FALSE)
# create an indicator to identify the rows that have the maximum
# percentage by object
id <- with(dat, ave(percentage, object, FUN=function(i) i==max(i)) )
# subset your data - keep rows that are at least 95 and have the
# maximum group percentage (given by id equal to one)
dat[dat$percentage >= 95 & id , ]
Or, putting it all together:
with(dat, dat[percentage >= 95 & ave(percentage, object,
FUN=function(i) i==max(i)) , ])
# object city street percentage
# 1 A NY Sun 100
# 3 A NY Waterfall 100
# 4 B CA Washington 98
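One detail that may be easy to miss: because `ave()` writes the results of `FUN` back into a copy of `percentage`, the indicator comes back as 0/1 values rather than `TRUE`/`FALSE`, and `&` then coerces it to logical. The sketch below makes that explicit and also shows a variant that compares each row against its group maximum directly. The variable names `is_group_max` and `is_group_max2` are illustrative only.

```r
dat <- read.table(text = "object city street percentage
A NY Sun 100
A NY Malino 97
A NY Waterfall 100
B CA Washington 98
B WA Lieber 95
C NA Moon 75", header = TRUE, na.strings = "", stringsAsFactors = FALSE)

# Indicator as in the answer: 0/1 values, one per row, not a logical vector.
id <- with(dat, ave(percentage, object, FUN = function(i) i == max(i)))
is_group_max <- as.logical(id)

# Variant: ave() with FUN = max repeats each group's maximum on every row
# of that group, so a plain comparison gives a logical vector directly.
is_group_max2 <- dat$percentage == ave(dat$percentage, dat$object, FUN = max)

# Both indicators select the same rows once combined with the >= 95 condition.
result <- dat[dat$percentage >= 95 & is_group_max2, ]
result
```

Object C is flagged as a group maximum by both indicators (75 is its own maximum); it is the `>= 95` condition that drops it.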
You can also use the same approach as @user20650 with data.table:
library(data.table)
setDT(dat)[dat[,percentage==max(percentage) & percentage >=95, by=object]$V1,]
# object city street percentage
#1: A NY Sun 100
#2: A NY Waterfall 100
#3: B CA Washington 98
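One thing that may be worth knowing about the `$V1` indexing above: the grouped result comes back group by group, so the logical vector lines up with the original rows only when each object's rows are contiguous and the groups appear in order, as they do in the example. A sketch of an order-independent variant, assuming the data.table package is installed, uses the special symbol `.I` (the row numbers of the full table for the current group). The shuffled table `dt` is made up here to illustrate the point.

```r
library(data.table)

# Same example rows, deliberately shuffled so the groups are not contiguous.
dt <- data.table(
  object     = c("B", "A", "C", "A", "B", "A"),
  city       = c("WA", "NY", "NA", "NY", "CA", "NY"),
  street     = c("Lieber", "Sun", "Moon", "Malino", "Washington", "Waterfall"),
  percentage = c(95, 100, 75, 97, 98, 100)
)

# For each object, collect the row numbers that are tied for the group
# maximum and are at least 95; groups with no such row contribute nothing.
idx <- dt[, .I[percentage == max(percentage) & percentage >= 95], by = object]$V1

# Index the table by row number, restoring the original row order.
res <- dt[sort(idx)]
res
```

Because the rows are selected by position rather than by an aligned logical vector, the result should not depend on how the input is sorted.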
Or using dplyr:
library(dplyr)
dat %>%
group_by(object) %>%
filter(percentage==max(percentage) & percentage >=95)
Thanks, akrun. I didn't know about data.table, going to look at it now.
@Helene No problem. For bigger datasets, data.table would be faster.
I originally did something similar to yours, but I was trying to avoid for loops. Thanks for your help :)