R: create two new columns and delete the source column
I have the following sample data:
df <- data.frame(ID=c("A1","A2","A3","A4","A1","A2","A3","A4"),
                 NUM=c(469,586,394,595,398,203,604,809))
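I want to extract the first value of the NUM column for each ID and put it in a new column NUM1; when a second NUM value appears for the same ID, that value should go into a new column NUM2. Finally, I want to delete the original column. My real dataset has more variables and columns besides ID and NUM. The result should look like this: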
df1 <- data.frame(ID=c("A1","A2","A3","A4"),NUM1=c(469,586,394,595),NUM2=c(398,203,604,809))
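Here is one approach. You need to create a column COL that holds the names of the new columns, so in this example we use group_by and str_c to build it. pivot_wider is the updated version of the spread function. All of these functions come from the tidyverse packages.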
library(tidyverse)
df1 <- df %>%
  group_by(ID) %>%
  mutate(COL = str_c("NUM", row_number())) %>%
  pivot_wider(names_from = COL, values_from = NUM) %>%
  ungroup()
df1
# # A tibble: 4 x 3
# ID NUM1 NUM2
# <fct> <dbl> <dbl>
# 1 A1 469 398
# 2 A2 586 203
# 3 A3 394 604
# 4 A4 595 809
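As a variation on the answer above, here is a minimal sketch that skips str_c and lets pivot_wider build the names itself through its names_prefix argument. It assumes dplyr and tidyr are installed; the sample data is repeated so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

df <- data.frame(ID = c("A1", "A2", "A3", "A4", "A1", "A2", "A3", "A4"),
                 NUM = c(469, 586, 394, 595, 398, 203, 604, 809))

df1 <- df %>%
  group_by(ID) %>%
  mutate(COL = row_number()) %>%  # running counter 1, 2, ... within each ID
  ungroup() %>%
  # names_prefix puts "NUM" in front of each counter value: NUM1, NUM2
  pivot_wider(names_from = COL, values_from = NUM, names_prefix = "NUM")

df1
```

Any further columns that are constant within an ID are treated as identifier columns by pivot_wider and are carried through unchanged.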
You can get the first and second value for each ID by subsetting:
library(dplyr)
df %>%
  group_by(ID) %>%
  summarise(NUM1 = NUM[1L],
            NUM2 = NUM[2L])
# A tibble: 4 x 3
# ID NUM1 NUM2
# <fct> <dbl> <dbl>
#1 A1 469 398
#2 A2 586 203
#3 A3 394 604
#4 A4 595 809
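It may be worth checking what the subsetting approach does when an ID occurs only once. The sketch below uses a small hypothetical data frame, df_short (not part of the question), in which A3 has a single row, and uses dplyr's first() and nth() in place of NUM[1L] and NUM[2L]. Both spellings should give NA for the missing second value.

```r
library(dplyr)

# Hypothetical data: A3 has only one NUM value
df_short <- data.frame(ID = c("A1", "A2", "A3", "A1", "A2"),
                       NUM = c(469, 586, 394, 398, 203))

res <- df_short %>%
  group_by(ID) %>%
  summarise(NUM1 = first(NUM),   # same as NUM[1L]
            NUM2 = nth(NUM, 2))  # same as NUM[2L]; NA when there is no second value

res
```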
With base R, you can do the following:
reshape(transform(df,time=cumsum(grepl("1",ID))),idvar = "ID",dir="wide",sep="")
ID NUM1 NUM2
1 A1 469 398
2 A2 586 203
3 A3 394 604
4 A4 595 809
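The time variable in the call above is derived from the character "1" appearing in the ID, which happens to work for this sample because every block starts with A1. A sketch of a variant that counts the rows within each ID instead, so it should not depend on how the IDs are spelled or ordered:

```r
df <- data.frame(ID = c("A1", "A2", "A3", "A4", "A1", "A2", "A3", "A4"),
                 NUM = c(469, 586, 394, 595, 398, 203, 604, 809))

# Running counter per ID: 1 for the first occurrence, 2 for the second, ...
df_time <- transform(df, time = ave(NUM, ID, FUN = seq_along))

# Wide reshape: one row per ID, one NUM column per counter value
df1 <- reshape(df_time, idvar = "ID", timevar = "time",
               direction = "wide", sep = "")

df1
```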
Or you could try:
`colnames<-`(t(unstack(df,NUM~ID)),c("NUM1","NUM2"))
NUM1 NUM2
A1 469 398
A2 586 203
A3 394 604
A4 595 809
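The unstack one-liner returns a matrix with the IDs as row names rather than a data frame with an ID column. If a data frame like the requested df1 is needed, one possible follow-up step is sketched below. Note that unstack only returns a rectangular result when every ID has the same number of values.

```r
df <- data.frame(ID = c("A1", "A2", "A3", "A4", "A1", "A2", "A3", "A4"),
                 NUM = c(469, 586, 394, 595, 398, 203, 604, 809))

# unstack gives one column per ID; t() turns that into one row per ID
wide <- t(unstack(df, NUM ~ ID))

# Move the row names into a proper ID column
df1 <- data.frame(ID = rownames(wide),
                  NUM1 = wide[, 1],
                  NUM2 = wide[, 2],
                  row.names = NULL)

df1
```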
@akrun's elegant base R solution:
df1 <- aggregate(NUM ~ ID, df, I)
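One thing to be aware of with this one-liner: the result has only two columns, ID and NUM, where NUM is a two-column matrix stored inside the data frame. A sketch of how that matrix could be unpacked into separate NUM1 and NUM2 columns, assuming every ID has exactly two values:

```r
df <- data.frame(ID = c("A1", "A2", "A3", "A4", "A1", "A2", "A3", "A4"),
                 NUM = c(469, 586, 394, 595, 398, 203, 604, 809))

agg <- aggregate(NUM ~ ID, df, I)

# agg$NUM is a matrix column: one row per ID, one column per occurrence
nums <- agg$NUM

df1 <- data.frame(ID = agg$ID,
                  NUM1 = nums[, 1],
                  NUM2 = nums[, 2])

df1
```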
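My base R solution:
# Transform the data frame: number the rows within each ID, then split NUM on that counter
df1 <- within(df, {
  # seq_along numbers rows safely; seq.int(x) on a length-one group would expand to 1:x
  count_num_by_id <- ave(NUM, ID, FUN = seq_along)
  NUM2 <- ifelse(count_num_by_id == 2, NUM, 0)
  NUM <- ifelse(count_num_by_id == 1, NUM, 0)
  rm(count_num_by_id)})
# Aggregate the data frame: one row per ID (the zeros are placeholders that sum away)
df1 <- aggregate(. ~ ID, df1, sum)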
A data.table solution:
require(data.table)
# Set as a data.table and create a unique row.
setDT(df)[, rid := paste0('NUM', rowid(ID))]
# Cast the data by ID and rid.
df <- dcast(df, ID ~ rid, value.var = 'NUM')
df
# ID NUM1 NUM2
# 1: A1 469 398
# 2: A2 586 203
# 3: A3 394 604
# 4: A4 595 809
Here is a dcast approach that calls rowid directly in the formula and also handles the other columns in df:
Note the prefix = "NUM" argument in the call to rowid.
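The code for this answer did not survive in the text, so the following is only a sketch of what such a call could look like, not necessarily the answerer's exact code. It assumes data.table is installed and uses as.data.table() so that df itself is not modified by reference.

```r
library(data.table)

df <- data.frame(ID = c("A1", "A2", "A3", "A4", "A1", "A2", "A3", "A4"),
                 NUM = c(469, 586, 394, 595, 398, 203, 604, 809))

# rowid(ID, prefix = "NUM") yields "NUM1", "NUM2", ... per ID,
# which dcast uses directly as the new column names
df1 <- dcast(as.data.table(df),
             ID ~ rowid(ID, prefix = "NUM"),
             value.var = "NUM")

df1
```

Compared with the two-step data.table version, this avoids creating and later dropping a helper column.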
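Additional columns in df
The OP pointed out that his dataset […] has more variables and columns besides ID and NUM.
If the values of the additional columns are the same for each ID, then + ... adds them to the output: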
df2 <- data.frame(
  ID = c("A1", "A2", "A3", "A4", "A1", "A2", "A3", "A4"),
  NUM = c(469, 586, 394, 595, 398, 203, 604, 809),
  other1 = rep(4:1, 2),
  other2 = rep(letters[1:4], 2)
)
df2
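The dcast call that goes with df2 is missing from the text. A sketch of how the + ... form might be applied, assuming data.table is installed: in a dcast formula, ... stands for all columns not otherwise named in the formula or in value.var, so ID + ... keeps ID together with other1 and other2 on the left-hand side.

```r
library(data.table)

df2 <- data.frame(
  ID = c("A1", "A2", "A3", "A4", "A1", "A2", "A3", "A4"),
  NUM = c(469, 586, 394, 595, 398, 203, 604, 809),
  other1 = rep(4:1, 2),
  other2 = rep(letters[1:4], 2)
)

# ID + ... expands to ID + other1 + other2 for this data
res2 <- dcast(as.data.table(df2),
              ID + ... ~ rowid(ID, prefix = "NUM"),
              value.var = "NUM")

res2
```

If an additional column takes different values within the same ID, each combination becomes its own row, so the result would no longer have one row per ID.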
@akrun That is an excellent solution. I have revised my own solution above.