How do I extract only the first row of each group in R?


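I have a list of New York Mets players from the Lahman database, ordered alphabetically by player. For each player, the years he played are in ascending order. I need to extract, for each player, the data for his first year of play and put all of those first rows into a new data frame.

In RStudio on my Mac, I have got as far as grouping and sorting the data the way I need. Here is a sample: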

playerID,yearID,G,AB,R,H
aceveju01,1997,25,6,0,0
acostma01,2010,41,0,0,0
acostma01,2011,44,0,0,0
acostma01,2012,45,0,0,0
adkinjo01,2007,1,0,0,0
agbaybe01,1998,11,15,1,2
agbaybe01,1999,101,276,42,79
agbaybe01,2000,119,350,59,101
agbaybe01,2001,91,296,28,82
ageeto01,1968,132,368,30,80
ageeto01,1969,149,565,97,153
ageeto01,1970,153,636,107,182
ageeto01,1971,113,425,58,121
ageeto01,1972,114,422,52,96
aguilch01,2008,8,12,0,2

For testing purposes I started with the code below rather than with pipes. This is as far as I have got:

library(dplyr)  # provides select(), filter() and arrange()

Lahman_batting18 <- read.csv('Batting-copy.csv', header = TRUE, stringsAsFactors = FALSE)
Lahman_batting18s <- select(Lahman_batting18, playerID:SO)      # keep columns playerID through SO
Lahman_batting18f <- filter(Lahman_batting18s, teamID == 'NYN') # New York Mets rows only
Lahman_batting18fa <- arrange(Lahman_batting18f, playerID, yearID)

Thanks for your help.

You can do this with base R, though I prefer dplyr and the pipe:

Lahman_batting18 %>% group_by(playerID) %>% arrange(playerID, yearID) %>% 
filter(yearID == min(yearID))

This keeps only the rows with the minimum year for each player. I hope this is what you were after. Output for the sample data:

# A tibble: 6 x 6
# Groups:   playerID [6]
  playerID  yearID     G    AB     R     H
  <fct>      <int> <int> <int> <int> <int>
1 aceveju01   1997    25     6     0     0
2 acostma01   2010    41     0     0     0
3 adkinjo01   2007     1     0     0     0
4 agbaybe01   1998    11    15     1     2
5 ageeto01    1968   132   368    30    80
6 aguilch01   2008     8    12     0     2


d[ave(1:NROW(d), d$playerID, FUN = seq_along) == 1, ]

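@d.b. How does your code work?

Hi, if this solved your problem, could you please accept the answer (the tick next to the answer…)? That marks the question as resolved.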
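On how the ave() one-liner works, here is a small sketch. It assumes d is a data frame already sorted by playerID and yearID; the rows below are a subset of the sample in the question. ave() applies seq_along within each playerID group, so every row gets its position inside its group, and keeping the rows where that position equals 1 leaves the first row per player.

```r
# A few rows from the question's sample, already sorted by playerID, yearID
d <- data.frame(
  playerID = c("aceveju01", "acostma01", "acostma01", "acostma01", "adkinjo01"),
  yearID   = c(1997L, 2010L, 2011L, 2012L, 2007L),
  G        = c(25L, 41L, 44L, 45L, 1L),
  stringsAsFactors = FALSE
)

# Position of each row within its playerID group: 1, 1, 2, 3, 1
pos <- ave(1:NROW(d), d$playerID, FUN = seq_along)

# Keep only the rows that come first in their group
first_rows <- d[pos == 1, ]
print(first_rows)
```

Because the result depends on row order, the data has to be sorted by year within each player before this is applied.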

Lahman_batting18 %>% group_by(playerID) %>% slice(1L)

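@RonakShah What is the difference between using slice(1L) and slice(1)?

In your output, playerID is not in alphabetical order, which is what I want. Yours is ordered by yearID.

Oh, sorry, corrected. The output now matches the code I mentioned exactly…

@Ronak Shah Today I checked the total number of first rows using both solutions:
(1) season1_all %>% group_by(playerID) %>% arrange(playerID, yearID) %>% filter(yearID == min(yearID))
(2) season1_all %>% group_by(playerID) %>% slice(1)
I expected both solutions to return the same number of rows, but they do not. Solution 1 has 19999 rows, whereas solution 2 has 19428 rows. Also, when I run "distinct(Lahman_batting18, playerID)" I get 19428 rows as well. Why do I get different numbers? Which solution gives the correct total?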
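One likely explanation for the differing counts, sketched on made-up data: filter(yearID == min(yearID)) keeps every row that ties for a player's earliest year, while slice(1) keeps exactly one row per group. In the Lahman batting table a player who appears for more than one team in the same season has one row per stint, so such a player would contribute several rows under solution 1. The toy data frame and its values below are invented for illustration, and this has not been checked against the actual 19999 and 19428 figures. The sketch also shows that slice(1) and slice(1L) select the same row; 1L is simply the integer literal, whereas 1 is a double.

```r
library(dplyr)

# Invented data: player "b" has two rows (two stints) in his first year
toy <- data.frame(
  playerID = c("a", "a", "b", "b", "b"),
  yearID   = c(2001L, 2002L, 2005L, 2005L, 2006L),
  stint    = c(1L, 1L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

grouped <- toy %>% arrange(playerID, yearID, stint) %>% group_by(playerID)

# Solution 1: keeps all rows tied for the minimum year (both 2005 rows for "b")
by_min_year <- grouped %>% filter(yearID == min(yearID))

# Solution 2: keeps exactly one row per player
by_slice <- grouped %>% slice(1)

nrow(by_min_year)         # 3
nrow(by_slice)            # 2
n_distinct(toy$playerID)  # 2, the same as the slice() count

# Compare slice(1) with slice(1L) on the same grouped data
same <- identical(as.data.frame(by_slice),
                  as.data.frame(grouped %>% slice(1L)))
```

If the goal is one row per player, the count that matches distinct(playerID) is the one to expect, which in this sketch is the slice(1) result on sorted data.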
