How to transform data in R
I have data like this:
Id Name gid
GO:0019992 diacylglycerol binding 23025
GO:0019992 diacylglycerol binding 10497
GO:0045703 ketoreductase activity 8644
GO:0016519 gastric inhibitory peptide receptor activity 2696
GO:0035174 histone serine kinase activity 5562
GO:0035174 histone serine kinase activity 5563
GO:0035174 histone serine kinase activity 6795
GO:0030298 receptor signaling protein tyrosine kinase activator activity 6352
GO:0030292 protein tyrosine kinase inhibitor activity 11116
GO:0030292 protein tyrosine kinase inhibitor activity 10399
I need to transform it into this:
GO:0019992 diacylglycerol binding 23025 10497
GO:0045703 ketoreductase activity 8644
GO:0016519 gastric inhibitory peptide receptor activity 2696
GO:0035174 histone serine kinase activity 5562 5563 472 6790 9212 6795
GO:0035175 histone kinase activity (H3-S10 specific) 7443
GO:0030298 receptor signaling protein tyrosine kinase activator activity 6352
GO:0030292 protein tyrosine kinase inhibitor activity 11116 10399
How can I do this in R?

Assuming your data.frame is named df, in base R:
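# One space-separated string of gids per Name; split() sorts the groups,
# so names(new_var) are the Names in alphabetical order
new_var <- unlist(
  lapply(
    split(df, f = df$Name),
    function(x) paste0(x$gid, collapse = " ")
  )
)
# Keep one row per Id/Name pair
df <- unique(df[, 1:2])
# Match each remaining row to its string by Name, not by position
df$new_var <- unname(new_var[as.character(df$Name)])
df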
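Another base R option, offered as a sketch rather than part of the original answer: aggregate() with a formula groups by both Id and Name in one call and passes collapse = " " through to paste(). The three-row df here is a hypothetical subset of the question's data.

```r
# Hypothetical subset of the question's data
df <- data.frame(
  Id   = c("GO:0019992", "GO:0019992", "GO:0045703"),
  Name = c("diacylglycerol binding", "diacylglycerol binding",
           "ketoreductase activity"),
  gid  = c(23025L, 10497L, 8644L),
  stringsAsFactors = FALSE
)

# gid ~ Id + Name: apply FUN to gid within each Id/Name group.
# Extra arguments (collapse = " ") are forwarded to paste().
res <- aggregate(gid ~ Id + Name, data = df, FUN = paste, collapse = " ")
res
```

Because the grouping happens on both columns at once, there is no separate de-duplication step that could fall out of order with the pasted strings.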
You can get there using data.table:
library(data.table)
dt <- as.data.table(df) # where df is your table of GO terms
# Group by Id and Name, joining each group's gid values with spaces
dt <- dt[, list(gids = paste(gid, collapse = " ")), by = list(Id, Name)]
dt
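A self-contained sketch of the same data.table query, using a hypothetical four-row table with the groups deliberately interleaved. It uses .() (data.table's alias for list()) and shows that by keeps groups in order of first appearance, whereas keyby would sort them.

```r
library(data.table)

# Hypothetical rows from the question, with the two groups interleaved
dt <- data.table(
  Id   = c("GO:0035174", "GO:0019992", "GO:0035174", "GO:0019992"),
  Name = c("histone serine kinase activity", "diacylglycerol binding",
           "histone serine kinase activity", "diacylglycerol binding"),
  gid  = c(5562L, 23025L, 5563L, 10497L)
)

# One row per (Id, Name); `by` preserves first-appearance order of groups
res <- dt[, .(gids = paste(gid, collapse = " ")), by = .(Id, Name)]
res
```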
In addition to the other answers, here is another approach using dplyr:
library(dplyr)
df = df %>%
group_by(Id, Name) %>%
summarise(gids = paste(gid, collapse = " "))
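A runnable version of the dplyr approach on a hypothetical four-row df. It assumes dplyr 1.0.0 or later for the .groups argument; .groups = "drop" returns an ungrouped result instead of one still grouped by Id. Note that summarise() returns groups sorted by the grouping keys.

```r
library(dplyr)

# Hypothetical rows from the question
df <- data.frame(
  Id   = c("GO:0035174", "GO:0019992", "GO:0035174", "GO:0019992"),
  Name = c("histone serine kinase activity", "diacylglycerol binding",
           "histone serine kinase activity", "diacylglycerol binding"),
  gid  = c(5562L, 23025L, 5563L, 10497L),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(Id, Name) %>%
  # .groups = "drop" (dplyr >= 1.0.0) removes the remaining grouping
  summarise(gids = paste(gid, collapse = " "), .groups = "drop")
res
```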
You don't need the full dt. Thanks! Scott... Thanks @Jaap, I didn't know about setDT.
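The comment about not needing the full dt most likely refers to setDT(), which converts a data.frame to a data.table by reference, so the separate as.data.table() copy can be skipped. A minimal sketch, assuming a hypothetical three-row df with the question's columns:

```r
library(data.table)

# Hypothetical subset of the question's data
df <- data.frame(
  Id   = c("GO:0030292", "GO:0030292", "GO:0016519"),
  Name = c("protein tyrosine kinase inhibitor activity",
           "protein tyrosine kinase inhibitor activity",
           "gastric inhibitory peptide receptor activity"),
  gid  = c(11116L, 10399L, 2696L),
  stringsAsFactors = FALSE
)

# setDT() turns df into a data.table in place (no copy), so it can be
# chained straight into the grouping query
res <- setDT(df)[, .(gids = paste(gid, collapse = " ")), by = .(Id, Name)]
res
```

After this runs, df itself is a data.table, which matters if later code expects a plain data.frame.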
@Downvoter: Downvoting without a comment is poor form. That is especially true for a new user's first question.