Flatten column variables into yes/no in R and dplyr
I have a data frame of CV (résumé) data for various people. Each row is a new person entry, with multiple columns (school, job title, city of birth, etc.). I want to build an adjacency matrix for these people, so I'm looking for a way to "flatten" the column variables into yes/no. For example, a snippet of the data frame looks like this:
Name: City_of_birth: Job Title:
Person1 'New York', 'Librarian'
Person2 'Shanghai', 'Secretary'
Person3 'Tokyo', 'Engineer'
Person4 'Lagos', 'CEO'
Person5 'Atlanta', 'Mayor'
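I want to transform the data frame so that there are new column headers "New York", "Shanghai", "Tokyo", ... with a yes/no value for each row (person).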
I'm quite new to R, so I'm open to using any tool to do this. Thanks a lot in advance!

In base R, you could do the following:
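# Stack the non-Name columns into one vector, pair each value with its person
# (df[1] is recycled to match), then cross-tabulate person vs. value
a <- table(cbind(df[1], unlist(df[-1])))
# Replace the counts with "no" (zero) / "yes" (non-zero)
a[] <- ifelse(!a, "no", "yes")
a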
Atlanta Lagos New York Shanghai Tokyo CEO Engineer Librarian Mayor Secretary
Person1 no no yes no no no no yes no no
Person2 no no no yes no no no no no yes
Person3 no no no no yes no yes no no no
Person4 no yes no no no yes no no no no
Person5 yes no no no no no no no yes no
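Since the stated goal is an adjacency matrix, a 0/1 version of the same person-by-attribute table may be more convenient than "yes"/"no" strings. Below is a minimal sketch, assuming data shaped like the example in the question (the helper names long, inc and adj are just illustrative): it builds the incidence matrix with table(), then tcrossprod() multiplies it by its own transpose so that entry [i, j] counts how many attribute values person i and person j share.

```r
# Sample data mirroring the question (character columns rather than factors)
df <- data.frame(
  Name          = paste0("Person", 1:5),
  City_of_birth = c("NewYork", "Shanghai", "Tokyo", "Lagos", "Atlanta"),
  Job_Title     = c("Librarian", "Secretary", "Engineer", "CEO", "Mayor")
)

# Stack every attribute column into one vector, repeating Name to match
long <- data.frame(
  Name      = rep(df$Name, times = ncol(df) - 1),
  Attribute = unlist(df[-1], use.names = FALSE)
)

# Person x attribute incidence matrix: 1 if the person has that value
inc <- unclass(table(long$Name, long$Attribute))
inc[] <- as.integer(inc > 0)

# Person x person adjacency: number of attribute values each pair shares
adj <- tcrossprod(inc)
diag(adj) <- 0   # drop self-links
adj
```

With this small sample no two people share a city or job title, so every entry of adj is 0; with real CV data, a shared school or city of birth would show up as a positive count between the two people.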
You might want to combine the city of birth and job title columns. We can create a column containing "Yes", then use pivot_wider to convert the data to wide format:
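library(dplyr)
library(tidyr)
df %>%
  # Every existing (person, city, job) combination gets "Yes"
  mutate(value = "Yes") %>%
  pivot_wider(names_from = c('City_of_birth', 'Job_Title'),
              values_from = value,
              # Combinations a person does not have are filled with "No"
              values_fill = list(value = "No"))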
# A tibble: 5 x 6
# Name NewYork_Librarian Shanghai_Secretary Tokyo_Engineer Lagos_CEO Atlanta_Mayor
# <fct> <chr> <chr> <chr> <chr> <chr>
#1 Person1 Yes No No No No
#2 Person2 No Yes No No No
#3 Person3 No No Yes No No
#4 Person4 No No No Yes No
#5 Person5 No No No No Yes
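The combined names above (NewYork_Librarian, ...) come from passing both columns to names_from. If you instead want one column per city and one per job title, as described in the question, a sketch that first reshapes to long format with pivot_longer() could look like the following; the intermediate column names variable and attribute are arbitrary choices, not anything tidyr requires.

```r
library(dplyr)
library(tidyr)

# Sample data mirroring the question (character columns rather than factors)
df <- data.frame(
  Name          = paste0("Person", 1:5),
  City_of_birth = c("NewYork", "Shanghai", "Tokyo", "Lagos", "Atlanta"),
  Job_Title     = c("Librarian", "Secretary", "Engineer", "CEO", "Mayor")
)

wide <- df %>%
  # One row per (person, attribute value)
  pivot_longer(-Name, names_to = "variable", values_to = "attribute") %>%
  mutate(value = "Yes") %>%
  select(-variable) %>%
  # One column per distinct city or job title; missing combinations become "No"
  pivot_wider(names_from = attribute, values_from = value,
              values_fill = list(value = "No"))

wide
```

If a city and a job title could ever have the same spelling, they would land in the same column; dropping the select() step and using names_from = c(variable, attribute) keeps them apart, at the cost of longer names such as City_of_birth_NewYork.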
Data
df <- structure(list(Name = structure(1:5, .Label = c("Person1", "Person2",
"Person3", "Person4", "Person5"), class = "factor"), City_of_birth =
structure(c(3L, 4L, 5L, 2L, 1L), .Label = c("Atlanta", "Lagos", "NewYork",
"Shanghai", "Tokyo"), class = "factor"), Job_Title = structure(c(3L, 5L,
2L, 1L, 4L), .Label = c("CEO", "Engineer", "Librarian", "Mayor",
"Secretary"), class = "factor")), class = "data.frame", row.names = c(NA, -5L))