R - Convert a column into headers and fill in the presence of each header as true/false for every record
I have a data frame that looks like this:
+-----------+------------+-----------+-----+----------------+
| Unique ID | First Name | Last Name | Age | Characteristic |
+-----------+------------+-----------+-----+----------------+
| 1 | Bob | Smith | 25 | Intelligent |
| 1 | Bob | Smith | 25 | Funny |
| 1 | Bob | Smith | 25 | Short |
| 2 | Jim | Murphy | 62 | Tall |
| 2 | Jim | Murphy | 62 | Funny |
| 3 | Kelly | Green | 33 | Tall |
+-----------+------------+-----------+-----+----------------+
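I want to turn the values of the "Characteristic" column into column headers and, for each record, fill in 1 if that characteristic is present and 0 if it is not, so that each record has only one row and my output looks like this: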
+-----------+------------+-----------+-----+-------------+-------+-------+------+
| Unique ID | First Name | Last Name | Age | Intelligent | Funny | Short | Tall |
+-----------+------------+-----------+-----+-------------+-------+-------+------+
| 1 | Bob | Smith | 25 | 1 | 1 | 1 | 0 |
| 2 | Jim | Murphy | 62 | 0 | 1 | 0 | 1 |
| 3 | Kelly | Green | 33 | 0 | 0 | 0 | 1 |
+-----------+------------+-----------+-----+-------------+-------+-------+------+
More consumable data, and a solution using dplyr and tidyr:
library(dplyr)
library(tidyr)
read.table(header=TRUE, stringsAsFactors=FALSE, text="
Unique_ID First_Name Last_Name Age Characteristic
1 Bob Smith 25 Intelligent
1 Bob Smith 25 Funny
1 Bob Smith 25 Short
2 Jim Murphy 62 Tall
2 Jim Murphy 62 Funny
3 Kelly Green 33 Tall") %>%
mutate(v = 1L) %>%
tidyr::spread(Characteristic, v, fill=0L)
# Unique_ID First_Name Last_Name Age Funny Intelligent Short Tall
# 1 1 Bob Smith 25 1 1 1 0
# 2 2 Jim Murphy 62 1 0 0 1
# 3 3 Kelly Green 33 0 0 0 1
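Most of the work is done by spread. By default it leaves NA rather than 0 in every empty spot; if you can accept that, you're good, and otherwise fill=0L (as used above) takes care of it. (Edited per @www's suggestion.)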
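In current tidyr releases spread is marked as superseded in favour of pivot_wider. The sketch below redoes the same reshaping with pivot_wider, assuming tidyr 1.1.0 or later so that values_fill accepts a single scalar. The names people and wide are made up for this illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same sample data as above, stored in a variable (the name is illustrative).
people <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Unique_ID First_Name Last_Name Age Characteristic
1 Bob Smith 25 Intelligent
1 Bob Smith 25 Funny
1 Bob Smith 25 Short
2 Jim Murphy 62 Tall
2 Jim Murphy 62 Funny
3 Kelly Green 33 Tall")

wide <- people %>%
  mutate(v = 1L) %>%                        # marker: this characteristic is present
  pivot_wider(names_from = Characteristic,  # one new column per characteristic
              values_from = v,
              values_fill = 0L)             # absent combinations become 0, not NA
wide
```

The result has one row per person, as with spread; the new columns may come out in a different order.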
Here is another tidyverse solution:
library(dplyr)
library(tidyr)
# df is built in the data section at the end of this answer
df %>%
  mutate(ind = 1L) %>%
  spread(Characteristic, ind, fill = 0L)
# Unique.ID First.Name Last.Name Age Funny Intelligent Short Tall
# 1 1 Bob Smith 25 1 1 1 0
# 2 2 Jim Murphy 62 1 0 0 1
# 3 3 Kelly Green 33 0 0 0 1
You can also use reshape2, which accounts for the case where a record has more than one instance of the same characteristic:
library(reshape2)
dcast(df, ...~Characteristic, fun.aggregate = length)
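To make the duplicate case concrete, here is a small sketch on a made-up data frame (dup, counts and flags are illustrative names, not from the original post). With fun.aggregate = length, dcast returns counts, so a repeated characteristic shows up as 2 rather than 1. If strict 0/1 flags are wanted, one option is to drop duplicate rows with distinct() before spreading.

```r
library(dplyr)
library(tidyr)
library(reshape2)

# Hypothetical data: ID 1 lists "Funny" twice.
dup <- data.frame(
  Unique_ID = c(1, 1, 1, 2),
  Characteristic = c("Funny", "Funny", "Short", "Tall"),
  stringsAsFactors = FALSE
)

# dcast counts occurrences, so the duplicated pair yields 2.
counts <- dcast(dup, Unique_ID ~ Characteristic,
                fun.aggregate = length, value.var = "Characteristic")

# Removing duplicate rows first keeps the result to 0/1 flags.
flags <- dup %>%
  distinct() %>%
  mutate(ind = 1L) %>%
  spread(Characteristic, ind, fill = 0L)

counts
flags
```

Which of the two is preferable depends on whether the number of occurrences matters or only presence.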
Data
df <- read.table(text = "Unique ID | First Name | Last Name | Age | Characteristic
1 | Bob | Smith | 25 | Intelligent
1 | Bob | Smith | 25 | Funny
1 | Bob | Smith | 25 | Short
2 | Jim | Murphy | 62 | Tall
2 | Jim | Murphy | 62 | Funny
3 | Kelly | Green | 33 | Tall ", sep = "|", header = T, strip.white = T, stringsAsFactors = F)
df
Thanks @www, I remember seeing that before, it's great. I was writing my unedited answer before I saw @r2evans's answer and your comment, @www. Anyway, I added a reshape2 alternative.
For consistency, I suggest mutate(ind=1L) or fill=0.
@r2evans Done. :)
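The consistency remark above is presumably about keeping the marker value and the fill value the same type. As a small base R illustration (not taken from the thread; the variable names are made up), combining an integer with a double silently promotes the result to double:

```r
# 1L and 0L are both integers, so the combined vector stays integer.
both_integer <- typeof(c(1L, 0L))

# 0 without the L suffix is a double, so the combined vector becomes double.
mixed <- typeof(c(1L, 0))

both_integer
mixed
```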