我想根据同一数据帧中其他列的条件,从R数据帧中的一列生成8个名称组合

我想根据同一数据帧中其他列的条件,从R数据帧中的一列生成8个名称组合,r,dataframe,combinations,R,Dataframe,Combinations,我有一个数据框,有来自4支不同球队的20名球员(每支球队5名球员),每个人都从一份梦幻选秀中获得了一份薪水。我希望能够创建8名球员的所有组合,他们的工资等于或低于10000英镑&他们的总分大于x,但不包括来自同一球队的4名或更多球员的任何组合 以下是我的数据框的外观: Team Player K D A LH Points Salary PPS 4 ATN ExoticDeer 6.1 3.3 6.4 306.9 22.209

我有一个数据框,有来自4支不同球队的20名球员(每支球队5名球员),每个人都从一份梦幻选秀中获得了一份薪水。我希望能够创建8名球员的所有组合,他们的工资等于或低于10000英镑&他们的总分大于x,但不包括来自同一球队的4名或更多球员的任何组合

以下是我的数据框的外观:

       Team      Player    K   D    A    LH Points Salary    PPS
  4     ATN  ExoticDeer  6.1 3.3  6.4 306.9 22.209   1622 1.3692
  2     ATN     Supreme  6.8 5.3  7.1 229.4 21.954   1578 1.3913
  1     ATN        sasu  3.6 6.4 11.0  95.7 19.357   1244 1.5560
  3     ATN eL lisasH 2  2.6 6.1  7.9  29.7 12.037    998 1.2061
  5     ATN       Nisha  2.7 5.6  7.5  48.2 12.282    955 1.2861
  11     CL Swiftending  6.0 5.8  7.8 360.5 22.285   1606 1.3876
  13     CL     Pajkatt 13.3 7.5  9.3 326.8 37.248   1489 2.5015
  15     CL  SexyBamboe  6.3 8.5  9.3 168.0 20.660   1256 1.6449
  14     CL         EGM  2.8 6.0 13.5  78.8 21.988    989 2.2233
  12     CL       Saksa  2.5 6.5 10.5  59.8 15.898    967 1.6441
  51 DBEARS         Ace  7.0 3.4  6.9 195.6 23.596   1578 1.4953
  31 DBEARS    HesteJoe  5.4 5.4  6.1 176.7 16.927   1512 1.1195
  61 DBEARS      Miggel  2.8 6.8 11.0 141.8 17.818   1212 1.4701
  21 DBEARS        Noia  3.0 6.0  8.0  36.1 13.161    970 1.3568
  41 DBEARS        Ryze  2.7 4.7  6.7  74.6 12.166    937 1.2984
  8      GB Keyser Soze  6.0 5.0  5.6 316.0 19.120   1602 1.1935
  9      GB      Madara  5.4 5.3  6.6 334.5 19.405   1577 1.2305
  10     GB     SkyLark  1.8 5.3  7.0  71.8 10.218   1266 0.8071
  7      GB         MNT  2.3 5.9  6.1  85.6  9.316   1007 0.9251
  6      GB   SKANKS224  1.4 7.6  7.4  52.5  7.565    954 0.7930
我遵循这篇文章中描述的一般概念:

调整代码以满足我的需要。这就是我到目前为止所做的:

## make a list of all combinations of 8 of Player, Points and Salary
xx <- with(FantasyPlayers, lapply(list(as.character(Player), Points, Salary), combn,     8))
## convert the names to a string, 
## find the column sums of the others,
## set the names
yy <- setNames(
lapply(xx, function(x) {
    if(typeof(x) == "character") apply(x, 2, toString) else colSums(x)
}),
names(FantasyPlayers)[c(2, 7, 8)]
)
## coerce to data.frame
newdf <- as.data.frame(yy)
这里有一个方法:

splt.names <- strsplit(as.character(newdf$Player), ", ")
indices <- lapply(splt.names, function(x) match(x, FantasyPlayers$Player))
exclude <- lapply(indices, function(x) any(table(FantasyPlayers$Team[x]) > 3))
newdf2 <- newdf[!unlist(exclude), ]

splt.names我认为,最好以长格式构建它:

组建团队

library(data.table)
setDT(FantasyPlayers)

xx    <- combn(as.character(FantasyPlayers$Player), 8)
mxx   <- setDT(melt(xx, varnames=c("jersey_no", "team_no"), value.name="Player"))

head(mxx,10)
#     jersey_no team_no      Player
#  1:         1       1  ExoticDeer
#  2:         2       1     Supreme
#  3:         3       1        sasu
#  4:         4       1 eL lisasH 2
#  5:         5       1       Nisha
#  6:         6       1 Swiftending
#  7:         7       1     Pajkatt
#  8:         8       1  SexyBamboe
#  9:         1       2  ExoticDeer
# 10:         2       2     Supreme
FantasyTeams <- FantasyPlayers[mxx, on="Player"]

#          Team      Player   K   D    A    LH Points Salary    PPS jersey_no team_no
#       1:  ATN  ExoticDeer 6.1 3.3  6.4 306.9 22.209   1622 1.3692         1       1
#       2:  ATN     Supreme 6.8 5.3  7.1 229.4 21.954   1578 1.3913         2       1
#       3:  ATN        sasu 3.6 6.4 11.0  95.7 19.357   1244 1.5560         3       1
#       4:  ATN eL lisasH 2 2.6 6.1  7.9  29.7 12.037    998 1.2061         4       1
#       5:  ATN       Nisha 2.7 5.6  7.5  48.2 12.282    955 1.2861         5       1
#      ---                                                                           
# 1007756:   GB Keyser Soze 6.0 5.0  5.6 316.0 19.120   1602 1.1935         4  125970
# 1007757:   GB      Madara 5.4 5.3  6.6 334.5 19.405   1577 1.2305         5  125970
# 1007758:   GB     SkyLark 1.8 5.3  7.0  71.8 10.218   1266 0.8071         6  125970
# 1007759:   GB         MNT 2.3 5.9  6.1  85.6  9.316   1007 0.9251         7  125970
# 1007760:   GB   SKANKS224 1.4 7.6  7.4  52.5  7.565    954 0.7930         8  125970
默认情况下,仅打印data.table的第一行和最后几行。要检查整个过程,请尝试查看或查看print.data.table的参数

筛选到一组具有选定功能的团队

library(data.table)
setDT(FantasyPlayers)

xx    <- combn(as.character(FantasyPlayers$Player), 8)
mxx   <- setDT(melt(xx, varnames=c("jersey_no", "team_no"), value.name="Player"))

head(mxx,10)
#     jersey_no team_no      Player
#  1:         1       1  ExoticDeer
#  2:         2       1     Supreme
#  3:         3       1        sasu
#  4:         4       1 eL lisasH 2
#  5:         5       1       Nisha
#  6:         6       1 Swiftending
#  7:         7       1     Pajkatt
#  8:         8       1  SexyBamboe
#  9:         1       2  ExoticDeer
# 10:         2       2     Supreme
FantasyTeams <- FantasyPlayers[mxx, on="Player"]

#          Team      Player   K   D    A    LH Points Salary    PPS jersey_no team_no
#       1:  ATN  ExoticDeer 6.1 3.3  6.4 306.9 22.209   1622 1.3692         1       1
#       2:  ATN     Supreme 6.8 5.3  7.1 229.4 21.954   1578 1.3913         2       1
#       3:  ATN        sasu 3.6 6.4 11.0  95.7 19.357   1244 1.5560         3       1
#       4:  ATN eL lisasH 2 2.6 6.1  7.9  29.7 12.037    998 1.2061         4       1
#       5:  ATN       Nisha 2.7 5.6  7.5  48.2 12.282    955 1.2861         5       1
#      ---                                                                           
# 1007756:   GB Keyser Soze 6.0 5.0  5.6 316.0 19.120   1602 1.1935         4  125970
# 1007757:   GB      Madara 5.4 5.3  6.6 334.5 19.405   1577 1.2305         5  125970
# 1007758:   GB     SkyLark 1.8 5.3  7.0  71.8 10.218   1266 0.8071         6  125970
# 1007759:   GB         MNT 2.3 5.9  6.1  85.6  9.316   1007 0.9251         7  125970
# 1007760:   GB   SKANKS224 1.4 7.6  7.4  52.5  7.565    954 0.7930         8  125970
要筛选到同一
团队中不超过三名玩家的
团队

my_teams <- FantasyTeams[, max(table(Team)) <= 3, by=team_no][V1==TRUE]$team_no
要节省一些按键和微秒,请用
V1==TRUE
替换
(V1)
。这是惯用的方式

从一组团队中恢复名册

library(data.table)
setDT(FantasyPlayers)

xx    <- combn(as.character(FantasyPlayers$Player), 8)
mxx   <- setDT(melt(xx, varnames=c("jersey_no", "team_no"), value.name="Player"))

head(mxx,10)
#     jersey_no team_no      Player
#  1:         1       1  ExoticDeer
#  2:         2       1     Supreme
#  3:         3       1        sasu
#  4:         4       1 eL lisasH 2
#  5:         5       1       Nisha
#  6:         6       1 Swiftending
#  7:         7       1     Pajkatt
#  8:         8       1  SexyBamboe
#  9:         1       2  ExoticDeer
# 10:         2       2     Supreme
FantasyTeams <- FantasyPlayers[mxx, on="Player"]

#          Team      Player   K   D    A    LH Points Salary    PPS jersey_no team_no
#       1:  ATN  ExoticDeer 6.1 3.3  6.4 306.9 22.209   1622 1.3692         1       1
#       2:  ATN     Supreme 6.8 5.3  7.1 229.4 21.954   1578 1.3913         2       1
#       3:  ATN        sasu 3.6 6.4 11.0  95.7 19.357   1244 1.5560         3       1
#       4:  ATN eL lisasH 2 2.6 6.1  7.9  29.7 12.037    998 1.2061         4       1
#       5:  ATN       Nisha 2.7 5.6  7.5  48.2 12.282    955 1.2861         5       1
#      ---                                                                           
# 1007756:   GB Keyser Soze 6.0 5.0  5.6 316.0 19.120   1602 1.1935         4  125970
# 1007757:   GB      Madara 5.4 5.3  6.6 334.5 19.405   1577 1.2305         5  125970
# 1007758:   GB     SkyLark 1.8 5.3  7.0  71.8 10.218   1266 0.8071         6  125970
# 1007759:   GB         MNT 2.3 5.9  6.1  85.6  9.316   1007 0.9251         7  125970
# 1007760:   GB   SKANKS224 1.4 7.6  7.4  52.5  7.565    954 0.7930         8  125970
要获取与每个团队相关联的名册,请加入/合并
mxx

mxx[.(team_no = my_new_teams), on="team_no"]
如果您希望在一行中列出玩家,如OP中所示:

mxx[.(team_no = my_new_teams), .(roster = toString(Player)), on="team_no", by=.EACHI]
如果您想要每个团队的汇总统计数据,则需要加入
FantasyTeams

FantasyTeams[.(team_no = my_new_teams), .(
  roster     = toString(Player),
  tot_salary = sum(Salary),
  tot_points = sum(Points)
), on="team_no", by=.EACHI]

#        team_no                                                              roster tot_salary tot_points
#     1:    3716      ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, Ryze       9913    149.018
#     2:    3720       ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, MNT       9983    146.168
#     3:    3721 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, SKANKS224       9930    144.417
#     4:    3725       ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Ryze, MNT       9950    145.173
#     5:    3726 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Ryze, SKANKS224       9897    143.422
#    ---                                                                                                  
# 40202:  125663         EGM, Saksa, Miggel, Noia, Ryze, Keyser Soze, MNT, SKANKS224       8638    117.032
# 40203:  125664                EGM, Saksa, Miggel, Noia, Ryze, Madara, SkyLark, MNT       8925    119.970
# 40204:  125665          EGM, Saksa, Miggel, Noia, Ryze, Madara, SkyLark, SKANKS224       8872    118.219
# 40205:  125666              EGM, Saksa, Miggel, Noia, Ryze, Madara, MNT, SKANKS224       8613    117.317
# 40206:  125667             EGM, Saksa, Miggel, Noia, Ryze, SkyLark, MNT, SKANKS224       8302    108.130
要理解by=.EACHI
在做什么,需要一点背景知识。这里的合并语法是DT[i,j,on=cols,by=.EACHI]

  • 如果省略了
    j
    by
    ,它只进行合并,就像在
    FantasyTeams
    的构造中一样
  • 如果省略了
    by
    ,但包含了
    j
    ,则在合并后计算
    j
  • 如果
    by=.EACHI
    ,则对
    i
    中的每个值分别计算
    j

请给出dput(您的数据框)的结果。嗨,弗兰克,谢谢。几乎所有的一切都似乎都起了作用——我能够创建FantasyTeams,它反映了您发布的示例,然后能够将它们过滤到适当数量的团队。然而,当我去看我的新球队时,它只显示了一个球队的名单。不。我怎么能让它返回一个球员的名单,这些球员组成了这个球队的总工资和积分呢?@GodlikeRoy我刚刚在最后添加了一个部分,涵盖了这一点。哇,这真的很好!我对newdf2进行了子集筛选,筛选出工资>10000,积分>165,并获得了一个非常有用的样本,其中约20个团队符合我需要的所有标准。非常感谢。再次你好。我想知道你是否能帮我解决另一个问题。我尝试在一个新的数据帧上执行相同的代码,该数据帧有40行,列数相同。当我运行代码时,我收到了几条错误消息,说“无法分配大小为8.0 gb的向量”,现在每当我运行R时,我的计算机都会使用90%以上的ram(16gb)。我想数据框中有40个玩家,组合的数量太多了——那么你知道有什么解决办法吗?干杯。@GodlikeRoy你说得对。仅供参考,您可以使用
选择(40,8)
查看组合的数量。比率
choose(40,8)/choose(20,8)
表明您需要600倍的空间。@GodlikeRoy如果这个答案对您有效,您可以接受它,如帮助中心所述:如果您有一个与原来问的问题大不相同的问题,请随意将其作为新问题发布。