Removing rows with duplicated variables in R (r, duplicates)


I have panel data with repeated years, and I want to remove the rows with the smaller job value:

id  name    year    job
1   Jane    1990    100
1   Jane    1992    200
1   Jane    1993    300
1   Jane    1993    1
1   Jane    1997    400
1   Jane    1997    2
2   Tom     1990    400
2   Tom     1992    500
2   Tom     1993    700
2   Tom     1993    1
2   Tom     1997    900
2   Tom     1997    3
I would like:

id  name    year    job
1   Jane    1990    100
1   Jane    1992    200
1   Jane    1993    1
1   Jane    1997    2
2   Tom     1990    400
2   Tom     1992    500
2   Tom     1993    1
2   Tom     1997    3

Is there any way to do this?

You can use base R and the order function, as James suggested:

> tab[order(tab$job),][! duplicated(tab[order(tab$job), c('id', 'year')], fromLast=T), ]
   id name year job
1   1 Jane 1990 100
2   1 Jane 1992 200
3   1 Jane 1993 300
5   1 Jane 1997 400
7   2  Tom 1990 400
8   2  Tom 1992 500
9   2  Tom 1993 700
11  2  Tom 1997 900
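The one-liner above sorts twice. As a rough sketch of the same order + duplicated idea, the version below sorts once and then keeps either the first or the last row of each (id, year) pair. The names sorted, keep_min and keep_max are only illustrative, and the data is copied from the question.

```r
# Sample data from the question
tab <- read.table(text = "id name year job
1 Jane 1990 100
1 Jane 1992 200
1 Jane 1993 300
1 Jane 1993 1
1 Jane 1997 400
1 Jane 1997 2
2 Tom 1990 400
2 Tom 1992 500
2 Tom 1993 700
2 Tom 1993 1
2 Tom 1997 900
2 Tom 1997 3", header = TRUE)

# Sort so that, within each (id, year) pair, job is ascending
sorted <- tab[order(tab$id, tab$year, tab$job), ]
key <- sorted[, c("id", "year")]

# First row of each pair has the smallest job
keep_min <- sorted[!duplicated(key), ]

# Last row of each pair has the largest job
keep_max <- sorted[!duplicated(key, fromLast = TRUE), ]
```

keep_min matches the desired output shown in the question, while keep_max matches the output printed above, so pick whichever one you actually need.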

You can use ddply for this:

x <- read.table(textConnection("id  name    year    job
 1   Jane    1990    100
 1   Jane    1992    200
 1   Jane    1993    300
 1   Jane    1993    1
 1   Jane    1997    400
 1   Jane    1997    2
 2   Tom     1990    400
 2   Tom     1992    500
 2   Tom     1993    700
 2   Tom     1993    1
 2   Tom     1997    900
 2   Tom     1997    3"),header=T)

library(plyr)
ddply(x,c("id","name","year"),summarise, job=max(job))
  id name year job
1  1 Jane 1990 100
2  1 Jane 1992 200
3  1 Jane 1993 300
4  1 Jane 1997 400
5  2  Tom 1990 400
6  2  Tom 1992 500
7  2  Tom 1993 700
8  2  Tom 1997 900

If your data is a data frame df:

library(data.table)

dt <- as.data.table(df)
dt[, .SD[which.min(job)], by = list(id, name, year)]
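If data.table is not available, the same "pick the which.min row in each group" idea can be approximated in base R with split and lapply. This is only a sketch and not equivalent in speed; tab, pieces and result are illustrative names, and the data is copied from the question.

```r
# Sample data from the question
tab <- read.table(text = "id name year job
1 Jane 1990 100
1 Jane 1992 200
1 Jane 1993 300
1 Jane 1993 1
1 Jane 1997 400
1 Jane 1997 2
2 Tom 1990 400
2 Tom 1992 500
2 Tom 1993 700
2 Tom 1993 1
2 Tom 1997 900
2 Tom 1997 3", header = TRUE)

# One data frame per (id, year) combination; drop = TRUE skips empty combinations
pieces <- split(tab, list(tab$id, tab$year), drop = TRUE)

# Keep the single row with the smallest job in each piece, then stack the rows
result <- do.call(rbind, lapply(pieces, function(d) d[which.min(d$job), ]))

# Restore a readable ordering
result <- result[order(result$id, result$year), ]
```

Replacing which.min with which.max keeps the row with the larger job instead.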
You have several options, for example plyr and dplyr:

# plyr
ddply(tab, .(id, name, year), summarise, job=min(job))
# dplyr
tabg <- group_by(tab, id, name, year)
summarise(tabg, job=min(job))
# base R function
aggregate(tab[,"job", drop=FALSE], tab[,3:1], min)
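The summarising calls above return one row per group but rebuild the table from the grouping columns. If you would rather filter the original rows and keep their row names, one possible base R sketch uses ave to compute the per-group minimum alongside each row. The names smallest and largest are illustrative, and the data is copied from the question. Note that tied values within a group would all be kept.

```r
# Sample data from the question
tab <- read.table(text = "id name year job
1 Jane 1990 100
1 Jane 1992 200
1 Jane 1993 300
1 Jane 1993 1
1 Jane 1997 400
1 Jane 1997 2
2 Tom 1990 400
2 Tom 1992 500
2 Tom 1993 700
2 Tom 1993 1
2 Tom 1997 900
2 Tom 1997 3", header = TRUE)

# ave() returns, for every row, the min (or max) of job within its (id, year) group
smallest <- tab[tab$job == ave(tab$job, tab$id, tab$year, FUN = min), ]
largest  <- tab[tab$job == ave(tab$job, tab$id, tab$year, FUN = max), ]
```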
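Your output removes the rows with the larger job values, not the smaller ones.

No bad pun intended, but this is a possible duplicate; in my opinion this question is itself a duplicate that should be removed ;)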