R: when there are tied values, select a specific one of them based on its value in another column
I'm working in RStudio. I have a dataset that looks like this:
Condition TargetWord WordProduced WPcondition realValue
1 Target1 table P .009
1 Target1 word P .025
1 Target1 chair P .005
1 Target1 pole Q .015
1 Target1 skate Q .023
1 Target2 car Q .014
1 Target2 house P .014
1 Target2 shoes P .019
1 Target2 girl Q .011
1 Target2 life Q .020
1 Target3 computer Q .007
1 Target3 ball Q .007
1 Target3 court P .009
1 Target3 plane Q .035
1 Target3 sky O .008
2 Target4 tree P .051
2 Target4 five P .051
2 Target4 help Q .003
2 Target4 shave Q .006
2 Target4 love P .028
2 Target5 three P .056
2 Target5 file Q .056
2 Target5 hemp P .003
2 Target5 share P .006
2 Target5 long Q .028
2 Target6 ten Q .058
2 Target6 friend P .051
2 Target6 hail Q .003
2 Target6 shine P .006
2 Target6 loner P .028
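So each target is repeated five times, and I need to filter the first occurrence. My problem is that if the realValue is the same in the first two positions (.014 and .014), I need the row whose WPcondition is P.
That is, before filtering the first position, if there is a tie in realValue within the first two positions, I need to look at the column to its left (WPcondition) to see whether one of them is P. If one of them is P, that row should be placed in the first position.
For example: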
1position P .05
2position P .05
(keep the one in the first position)
1position Q .05
2position P .05
(use the one in the second position because it has a P)
1position Q .05
2position Q .05
(keep the one in the first position)
1position P .05
2position Q .05
(keep the one in the first position)
1position P .06
2position Q .05
(keep the one in the first position because its realValue is higher)
1position Q .06
2position P .05
(keep the one in the first position because its realValue is higher)
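So I need to pick the higher value, but if the values are the same, we need to consider the P and Q values: if there is a P, pick that one.
Given the data above, I would expect something like this: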
Condition TargetWord WordProduced WPcondition realValue
1 Target1 word P .025
1 Target2 house P .014
1 Target3 computer Q .007
1 Target4 tree P .051
1 Target5 three P .056
1 Target6 ten Q .058
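Any help would be great.
Thanks.

It seems you could sort the data frame on "TargetWord" and "realValue", and then, within those groups, sort on "WPcondition". The order function creates an index vector that gives the sort order. Then pick the first item within each TargetWord.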
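The sort-then-pick-first idea can be sketched in base R roughly as follows. This is only an illustration under assumptions: `toy` is a made-up subset of the question's rows, `sorted` and `best` are illustrative names, and realValue is sorted in descending order on the assumption that the highest value is the one wanted.

```r
# Made-up subset of the question's rows, for illustration only.
toy <- data.frame(
  TargetWord   = c("Target1", "Target1", "Target2", "Target2"),
  WordProduced = c("table", "word", "car", "house"),
  WPcondition  = c("P", "P", "Q", "P"),
  realValue    = c(0.009, 0.025, 0.014, 0.014),
  stringsAsFactors = FALSE
)

# order() returns row indices: by TargetWord, then realValue descending
# (an assumption), then WPcondition so that "P" sorts before "Q" on ties.
ord <- order(toy$TargetWord, -toy$realValue, toy$WPcondition)
sorted <- toy[ord, ]

# Keep the first row of each TargetWord in the sorted data.
best <- sorted[!duplicated(sorted$TargetWord), ]
print(best)
```

Because order() leaves unresolved ties in their original order, two tied rows that are both P (or both Q) keep their original relative position.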
txt <-
"Condition TargetWord WordProduced WPcondition realValue
1 Target1 table P .009
1 Target1 word P .025
1 Target1 chair P .005
1 Target1 pole Q .015
1 Target1 skate Q .023
1 Target2 car Q .014
1 Target2 house P .014
1 Target2 shoes P .019
1 Target2 girl Q .011
1 Target2 life Q .020
1 Target3 computer Q .007
1 Target3 ball Q .007
1 Target3 court P .009
1 Target3 plane Q .035
1 Target3 sky O .008
2 Target4 tree P .051
2 Target4 five P .051
2 Target4 help Q .003
2 Target4 shave Q .006
2 Target4 love P .028
2 Target5 three P .056
2 Target5 file Q .056
2 Target5 hemp P .003
2 Target5 share P .006
2 Target5 long Q .028
2 Target6 ten Q .058
2 Target6 friend P .051
2 Target6 hail Q .003
2 Target6 shine P .006
2 Target6 loner P .028"
dat <- read.table(text = txt, header = TRUE)

Similar to IRTFM's answer, but with the data.table package:
library(data.table)
setDT(dat)
# Sort so that, within each TargetWord, the highest realValue comes first
# and "P" sorts before "Q" on ties; .SD[1] then keeps the first row per group.
dat[order(Condition, TargetWord, -realValue, WPcondition), .SD[1], by = "TargetWord"]
   TargetWord Condition WordProduced WPcondition realValue
1:    Target1         1         word           P     0.025
2:    Target2         1         life           Q     0.020
3:    Target3         1        plane           Q     0.035
4:    Target4         2         tree           P     0.051
5:    Target5         2        three           P     0.056
6:    Target6         2          ten           Q     0.058
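If I understand you correctly, you want to select the row with the highest realValue for each TargetWord, and if realValue is tied, prefer the row whose WPcondition is P over the one with Q.
Using the fact that "P" < "Q", we can do: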
library(dplyr)
df %>%
arrange(Condition, TargetWord, desc(realValue), WPcondition) %>%
group_by(Condition, TargetWord) %>%
slice(1L) %>%
ungroup
# Condition TargetWord WordProduced WPcondition realValue
# <int> <chr> <chr> <chr> <dbl>
#1 1 Target1 word P 0.025
#2 1 Target2 life Q 0.02
#3 1 Target3 plane Q 0.035
#4 2 Target4 tree P 0.051
#5 2 Target5 three P 0.056
#6 2 Target6 ten Q 0.058
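The sorted approaches above rank all five rows of a target. If the rule is instead read literally from the question, comparing only the first two positions of each target, a base R sketch might look like the following. The function name `pick_from_first_two` and the data frame `toy2` are made up for illustration, and the rule encoded here (higher realValue wins; on a tie a P row wins; otherwise the first position stays) is one interpretation of the examples in the question.

```r
# Hypothetical helper: choose one row per TargetWord from its first two rows.
pick_from_first_two <- function(d) {
  # Position of each row within its TargetWord, in the original order.
  pos <- ave(seq_len(nrow(d)), d$TargetWord, FUN = seq_along)
  top2 <- d[pos <= 2, ]
  # Higher realValue first; on a tie, rows with WPcondition "P" first.
  # order() is stable, so any remaining tie keeps the first position.
  ord <- order(top2$TargetWord, -top2$realValue, top2$WPcondition != "P")
  top2 <- top2[ord, ]
  top2[!duplicated(top2$TargetWord), ]
}

# Made-up subset of the question's rows, for illustration only.
toy2 <- data.frame(
  TargetWord   = c("Target1", "Target1", "Target1",
                   "Target2", "Target2", "Target2"),
  WordProduced = c("table", "word", "skate", "car", "house", "life"),
  WPcondition  = c("P", "P", "Q", "Q", "P", "Q"),
  realValue    = c(0.009, 0.025, 0.023, 0.014, 0.014, 0.020),
  stringsAsFactors = FALSE
)

result <- pick_from_first_two(toy2)
print(result)
```

On this subset, Target2 yields house rather than life, because life sits in the third position and is never compared.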
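For new data.table users it might be clearer to add the name of the third argument to [.data.table. Could you also explain what .SD[1,] is doing and how it is evaluated?
For Target1, the highest value is 0.025. Why did you choose .009? Also, the WordProduced for 0.009 is table, not word, as you show.
Thanks for the observation. That was my mistake. The word produced, "word", is the one with the highest value, as you mention (0.025), but I wrote .009 by mistake. I will edit it.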