R: How often does any cell of one column appear in any of the other columns?

I have a question about comparing columns in a data frame. Suppose I have some data like this:

Unique <- c("apple", "orange", "melon", "car", "mouse", "headphones", "light")
a1 <- c("apple", "tomato", "banana", "dog", "cat", "headphones", "future")
a2 <- c("apple", "orange", "pear", "monkey", "dog", "cat", "river")
a3 <- c("tomato", "pineapple", "cherry", "car", "space", "mars", "rocket")
df <- data.frame(Unique, a1, a2, a3)
df
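The question I am trying to answer is: how often does each cell of the "Unique" column appear in the rest of the data frame, not counting the Unique column itself?

I would like an output like this: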

apple      2
orange     1
melon      0
car        1
mouse      0
headphones 1
light      0
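because, outside the "Unique" column, apple appears 2 times in the data frame, orange appears once, melon appears 0 times, and so on.

How would you get this?

Also, how can the values be sorted by frequency, for example from highest to lowest?

I have been thinking about this for days and just cannot work it out. Any help would be much appreciated.

P.S. Also, in R it seems that the individual "cells" of a data frame are not actually called cells. Am I right? What are they called, elements?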

We can unlist the columns other than "Unique", convert the result to a factor with the levels specified as df$Unique, and get the table in base R:

table(factor(unlist(df[-1]), levels = df$Unique))
#      apple     orange      melon        car      mouse headphones      light 
#         2          1          0          1          0          1          0 
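
The question also asks how to order the result from highest to lowest frequency, which the snippet above does not do. A minimal sketch in base R, assuming the df from the question; the names counts, sorted_counts and result are illustrative and do not come from the original answer:

```r
# Example data from the question
Unique <- c("apple", "orange", "melon", "car", "mouse", "headphones", "light")
a1 <- c("apple", "tomato", "banana", "dog", "cat", "headphones", "future")
a2 <- c("apple", "orange", "pear", "monkey", "dog", "cat", "river")
a3 <- c("tomato", "pineapple", "cherry", "car", "space", "mars", "rocket")
df <- data.frame(Unique, a1, a2, a3, stringsAsFactors = FALSE)

# Count how often each Unique value occurs in the other columns
counts <- table(factor(unlist(df[-1]), levels = df$Unique))

# Sort from highest to lowest frequency
sorted_counts <- sort(counts, decreasing = TRUE)

# Optionally convert to a two-column data frame (column names are illustrative)
result <- as.data.frame(sorted_counts, stringsAsFactors = FALSE)
names(result) <- c("value", "n")
result
```

Passing decreasing = FALSE (the default) would order the values from lowest to highest instead.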

Or using tidyverse:

library(dplyr)
library(tidyr)
df %>% 
   pivot_longer(cols = -Unique) %>%
   mutate(value = factor(value, levels = unique(Unique))) %>% 
   filter(!is.na(value)) %>%
   count(value, .drop = FALSE)
# A tibble: 7 x 2
#  value          n
#* <fct>      <int>
#1 apple          2
#2 orange         1
#3 melon          0
#4 car            1
#5 mouse          0
#6 headphones     1
#7 light          0

Here is a tidyverse-based solution:

 Unique <- c("apple", "orange", "melon", "car", "mouse", "headphones", "light")
a1 <- c("apple", "tomato", "banana", "dog", "cat", "headphones", "future")
a2 <- c("apple", "orange", "pear", "monkey", "dog", "cat", "river")
a3 <- c("tomato", "pineapple", "cherry", "car", "space", "mars", "rocket")
df <- data.frame(Unique, a1, a2, a3,stringsAsFactors = FALSE)
df

library(tidyr)
library(dplyr)
df[,2:4] %>% pivot_longer(.,cols=c("a1","a2","a3")) %>% 
     group_by(value) %>% summarise(.,count = n()) %>% 
     right_join(.,df[1],by = c('value' = 'Unique')) %>% 
     mutate(count = ifelse(is.na(count),0,count))
...and the output:

# A tibble: 7 x 2
  value      count
  <chr>      <dbl>
1 apple          2
2 orange         1
3 melon          0
4 car            1
5 mouse          0
6 headphones     1
7 light          0
library(data.table)

Convert the data.frame to a data.table:

setDT(df)
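Then you can melt the data.table using id.vars = "Unique". This is very convenient because, for each value of Unique, the values of all the other columns of df end up in a single column: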

##  melt(df,id.vars = "Unique")
##         Unique variable      value
##  1:      apple       a1      apple
##  2:     orange       a1     tomato
##  3:      melon       a1     banana
##  4:        car       a1        dog
##  5:      mouse       a1        cat
##  6: headphones       a1 headphones
##  7:      light       a1     future
##  8:      apple       a2      apple
##  9:     orange       a2     orange
## 10:      melon       a2       pear
## 11:        car       a2     monkey
## 12:      mouse       a2        dog
## 13: headphones       a2        cat
## 14:      light       a2      river
## 15:      apple       a3     tomato
## 16:     orange       a3  pineapple
## 17:      melon       a3     cherry
## 18:        car       a3        car
## 19:      mouse       a3      space
## 20: headphones       a3       mars
## 21:      light       a3     rocket
##         Unique variable      value
Finally, for each value of Unique, we simply count the rows in which value is equal to Unique:

melt(df,id.vars = "Unique")[,sum(Unique==value),Unique]
##        Unique V1
## 1:      apple  2
## 2:     orange  1
## 3:      melon  0
## 4:        car  1
## 5:      mouse  0
## 6: headphones  1
## 7:      light  0
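
One caveat worth checking: sum(Unique == value) grouped by Unique compares each melted value only with the Unique entry from its own row. In the example data every match happens to sit in the same row, so the counts agree with the expected output. The base R sketch below uses a small hypothetical data frame (df2 and the names same_row and anywhere are assumptions for illustration, not from the original post) to show how a row-wise comparison and a whole-frame count can differ:

```r
# Hypothetical variation: "apple" in a1 sits in row 2,
# not in the same row as "apple" in Unique
Unique <- c("apple", "orange", "melon")
a1 <- c("tomato", "apple", "banana")
a2 <- c("apple", "orange", "pear")
df2 <- data.frame(Unique, a1, a2, stringsAsFactors = FALSE)

# Row-wise comparison: counts only matches in the same row as each Unique value
same_row <- rowSums(as.matrix(df2[-1]) == df2$Unique)
names(same_row) <- df2$Unique

# Whole-frame count: matches anywhere in the other columns
anywhere <- table(factor(unlist(df2[-1]), levels = df2$Unique))

same_row   # apple is counted once
anywhere   # apple is counted twice
```

If matches can occur in any row, the whole-frame count is the one that answers the question as asked.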

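Welcome to Stack Overflow, and congratulations on a well-formatted first post. In R, when referring to values in a table, there are rows and columns. I would still like to recommend these guidelines for your future posts.

Thank you so much, DJJ! Thanks for the guidelines; I will give them a thorough read.

Amazing solution! Thank you for your answer! I am really excited about this.

Thank you for your solution. I still have to learn how to use tidyr and dplyr properly, and I will use your solution to study. Thank you very much.

Thanks for your suggestion! I had not thought of melting the data table. Thank you very much.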