R: How do I turn values into variables and assign them values based on their rank?

Tags: r, dataframe, dplyr, ranking

Given a data frame like the following:

df <- data.frame(ID = seq(1,8), 
rank1 = c("apple", "NA", "banana", "kiwi", "peach", "kiwi", "kiwi", "grape"), 
rank2 = c('mango', 'NA', 'date', 'grape', 'kiwi', 'apple', 'pear', 'NA'), 
rank3 = c('kiwi', 'NA', 'apple ', 'peach', 'banana', 'NA', 'mango', 'NA'))

  ID  rank1 rank2  rank3
1  1  apple mango   kiwi
2  2     NA    NA     NA
3  3 banana  date apple 
4  4   kiwi grape  peach
5  5  peach  kiwi banana
6  6   kiwi apple     NA

(first six rows shown), I would like to reshape it so that each fruit becomes a column holding its rank:

  ID apple mango  kiwi banana  date grape peach pear
1  1 rank1 rank2 rank3     NA    NA    NA    NA   NA
2  2    NA    NA    NA     NA    NA    NA    NA   NA
3  3 rank3    NA    NA  rank1 rank2    NA    NA   NA
4  4    NA    NA rank1     NA    NA rank2 rank3   NA
5  5    NA    NA rank2  rank3    NA    NA rank1   NA
6  6 rank2    NA rank1     NA    NA    NA    NA   NA

How can I then assign weights to the rank positions using the formula n - r + 1 (n = number of criteria, r = rank position)?

One approach is to reshape the original data frame into a longer format and then back into a wide format, swapping the variables:

library(tidyr)
library(dplyr)

#pivot longer
dfl <- pivot_longer(df, starts_with("rank"), names_to="rank", values_to = "fruit")

#clean up data
dfl$fruit <- trimws(dfl$fruit) 
#dfl <- dfl[dfl$fruit != "NA",]  #optional

#reshape wider
# id_cols is named explicitly so the call does not rely on argument position
pivot_wider(dfl, id_cols = ID, names_from = "fruit", values_from = "rank", values_fn = first)
# # A tibble: 8 x 10
#    ID apple mango kiwi  `NA`  banana date  grape peach pear 
# <int> <chr> <chr> <chr> <chr> <chr>  <chr> <chr> <chr> <chr>
#     1     1 rank1 rank2 rank3 NA    NA     NA    NA    NA    NA   
#     2     2 NA    NA    NA    rank1 NA     NA    NA    NA    NA   
#     3     3 rank3 NA    NA    NA    rank1  rank2 NA    NA    NA   
#     4     4 NA    NA    rank1 NA    NA     NA    rank2 rank3 NA   
#     5     5 NA    NA    rank2 NA    rank3  NA    NA    rank1 NA   
#     6     6 rank2 NA    rank1 rank3 NA     NA    NA    NA    NA   
#     7     7 NA    rank3 rank1 NA    NA     NA    NA    NA    rank2
#     8     8 NA    NA    NA    rank2 NA     NA    rank1 NA    NA    

I believe this dplyr/tidyr pipeline computes the ranks, not the weights mentioned in the question.

library(tidyverse)

df %>%
  pivot_longer(
    cols = starts_with('rank'),
    names_to = 'rank',
    values_to = 'fruit'
  ) %>%
  mutate(rank = as.integer(sub('^rank', '', rank)),
         fruit = trimws(fruit)) %>%
  filter(!is.na(fruit), fruit != 'NA') %>%
  pivot_wider(
    id_cols = ID,
    names_from = fruit,
    values_from = rank
  )
## A tibble: 7 x 9
#     ID apple mango  kiwi banana  date grape peach  pear
#  <int> <int> <int> <int>  <int> <int> <int> <int> <int>
#1     1     1     2     3     NA    NA    NA    NA    NA
#2     3     3    NA    NA      1     2    NA    NA    NA
#3     4    NA    NA     1     NA    NA     2     3    NA
#4     5    NA    NA     2      3    NA    NA     1    NA
#5     6     2    NA     1     NA    NA    NA    NA    NA
#6     7    NA     3     1     NA    NA    NA    NA     2
#7     8    NA    NA    NA     NA    NA     1    NA    NA
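
The pipeline above stops at the rank positions. As a rough sketch of how the weights n - r + 1 could be added, the version below assumes that n is the number of fruits actually ranked for each ID (which is how the asker's comment reads) and that ranks are filled in without gaps. The object name `weights` is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  ID = seq(1, 8),
  rank1 = c("apple", "NA", "banana", "kiwi", "peach", "kiwi", "kiwi", "grape"),
  rank2 = c("mango", "NA", "date", "grape", "kiwi", "apple", "pear", "NA"),
  rank3 = c("kiwi", "NA", "apple ", "peach", "banana", "NA", "mango", "NA")
)

weights <- df %>%
  pivot_longer(starts_with("rank"), names_to = "rank", values_to = "fruit") %>%
  mutate(rank = as.integer(sub("^rank", "", rank)),
         fruit = trimws(fruit)) %>%
  filter(!is.na(fruit), fruit != "NA") %>%
  group_by(ID) %>%
  # n() is the number of ranked fruits for this ID (assumed meaning of n)
  mutate(weight = n() - rank + 1L) %>%
  ungroup() %>%
  select(ID, fruit, weight) %>%
  pivot_wider(names_from = fruit, values_from = weight)

weights
```

Under these assumptions, ID 3 has three ranked fruits, so "apple" in position 3 gets 3 - 3 + 1 = 1, and ID 6 has two, so "apple" in position 2 gets 2 - 2 + 1 = 1, matching the asker's example. If n is instead meant to be a fixed number of criteria, `n()` can be replaced by that constant.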

Here is a base R option using stack + reshape:

dfout <- reshape(
  subset(
    cbind(stack(df[-1]), id = df$ID),
    values != "NA"
  ),
  direction = "wide",
  idvar = "id",
  timevar = "values"
)

dfout <- setNames(dfout,gsub("ind\\.","",names(dfout)))

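dfout
   id apple banana  kiwi peach grape mango  date  pear apple
1  1 rank1   <NA> rank3  <NA>  <NA> rank2  <NA>  <NA>   <NA>
3  3  <NA>  rank1  <NA>  <NA>  <NA>  <NA> rank2  <NA>  rank3
4  4  <NA>   <NA> rank1 rank3 rank2  <NA>  <NA>  <NA>   <NA>
5  5  <NA>  rank3 rank2 rank1  <NA>  <NA>  <NA>  <NA>   <NA>
6  6 rank2   <NA> rank1  <NA>  <NA>  <NA>  <NA>  <NA>   <NA>
7  7  <NA>   <NA> rank1  <NA>  <NA> rank3  <NA> rank2   <NA>
8  8  <NA>   <NA>  <NA>  <NA> rank1  <NA>  <NA>  <NA>   <NA>

I see your input and I see your desired output, and it all looks fine and makes sense. But then you ask "How can I assign weights to the rank positions using the formula n - r + 1 (n = number of criteria, r = rank position)?", and that doesn't make sense to me. Is that already part of your desired output and I'm just not seeing it? Or do you have another desired output that isn't shown?

Is the number of criteria the number of fruit columns in the second data set?

Sorry for the poor description. For example, in row 3 there are 3 fruits ranked, which is the number of criteria. There I would assign "apple" a weight of 3 - 3 + 1 = 1, and in row 6 a weight of 2 - 2 + 1 = 1.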
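
For completeness, here is a base R sketch of the weighting itself, in the same spirit as the stack + reshape answer. It assumes n is the number of fruits ranked in each row and r is the column position of the fruit; the names `ranks`, `long` and `wts` are made up for illustration. Unlike the output above, it trims whitespace first, so "apple " and "apple" end up in one column.

```r
# Same example data as in the question
df <- data.frame(
  ID = seq(1, 8),
  rank1 = c("apple", "NA", "banana", "kiwi", "peach", "kiwi", "kiwi", "grape"),
  rank2 = c("mango", "NA", "date", "grape", "kiwi", "apple", "pear", "NA"),
  rank3 = c("kiwi", "NA", "apple ", "peach", "banana", "NA", "mango", "NA")
)

# Clean the rank columns: trim whitespace, turn "NA" strings into real NAs
ranks <- as.matrix(df[-1])
ranks[] <- trimws(ranks)
ranks[ranks == "NA"] <- NA

# n = number of fruits actually ranked in each row (assumed meaning of n)
n <- rowSums(!is.na(ranks))

# col(ranks) is the rank position r; n is recycled down the rows
w <- n - col(ranks) + 1

# Long table of (id, fruit, weight), dropping the empty rank slots
long <- data.frame(
  id = df$ID[as.vector(row(ranks))],
  fruit = as.vector(ranks),
  weight = as.vector(w)
)
long <- long[!is.na(long$fruit), ]

# One column per fruit, holding its weight
wts <- reshape(long, direction = "wide", idvar = "id", timevar = "fruit")
names(wts) <- sub("^weight\\.", "", names(wts))

wts
```

Rows with no ranked fruit (ID 2 here) drop out, as in the answers above.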