R: Merging rows and columns in a data frame
Admittedly I am new to R, but I have managed to take a large dataset, extract the data I want, and put it into a data frame using plyr. I have been trying to merge (and count) duplicate rows and columns. As an example, I have:
> df
   X x.APPLES x.BANANAS x.PEARS x.ORANGES x.GRAPES x.KIWIS x.APPLES.1 x.ORANGES.1
1  A APPLES
2  B APPLES
3  C APPLES
4  D          BANANAS
5  E          BANANAS
6  F          BANANAS
7  G          BANANAS
8  H                    PEARS   ORANGES   GRAPES
9  I                    PEARS   ORANGES   GRAPES
10 C                    PEARS   ORANGES   GRAPES
11 C                    PEARS   ORANGES   GRAPES
12 R                    PEARS   ORANGES   GRAPES
13 A                                               KIWIS
14 B                                                       APPLES
15 Y                                                       APPLES
16 A                                                                  ORANGES
17 J                                                                  ORANGES
What I want is:
   X     x.APPLES   x.BANANAS   x.PEARS   x.ORANGES   x.GRAPES   x.KIWIS   COUNT
1  A     APPLES (1)                       ORANGES (1)            KIWIS (1) 3
2  B     APPLES (2)                                                        2
3  C     APPLES (1)             PEARS (1) ORANGES (2) GRAPES (2)           3
4  D                BANANAS (1)                                            1
5  E                BANANAS (1)                                            1
6  F                BANANAS (1)                                            1
7  G                BANANAS (1)                                            1
8  H                            PEARS (1) ORANGES (1) GRAPES (1)           1
9  I                            PEARS (1) ORANGES (1) GRAPES (1)           1
10 R                            PEARS (1) ORANGES (1) GRAPES (1)           1
11 Y     APPLES (1)                                                        1
12 J                                      ORANGES (1)                      1
13 COUNT 5          4           4         7           5          1         NA
Here is my actual code:
library("jsonlite")
library("plyr")
anom <- fromJSON("https://api.fda.gov/drug/event.json?search=_exists_:seriousnesscongenitalanomali&limit=25")
reactions <- anom$results$patient$reaction
drugs <- llply(anom$results$patient$drug, function(x) x$medicinalproduct)
l <- mapply(c, reactions, drugs, SIMPLIFY=FALSE)
df <- ldply (l, data.frame)
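Edit, using the OP's data:
I downloaded your actual data and converted it into a two-column data.frame, which you can turn into the desired output using the example below.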
require(jsonlite)
anom <- fromJSON("https://api.fda.gov/drug/event.json?search=_exists_:seriousnesscongenitalanomali&limit=5")
## Extract the reactions and drugs as character vectors
reactions <- lapply(anom$results$patient$reaction,
                    function(x) as.character(unlist(x)))
drugs <- lapply(anom$results$patient$drug,
                function(x) as.character(unlist(x$medicinalproduct)))
## Use expand.grid to make subset data.frames with all drug/reaction
## combinations for every patient
l <- mapply(expand.grid, reactions, drugs, SIMPLIFY = FALSE)
## Collapse all the subset data.frames into one
two_col <- do.call(rbind, l)
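To see what the expand.grid and rbind steps produce without calling the API, here is a small sketch on made-up inputs. The reaction and drug values, and the column names reaction and drug, are assumptions for illustration and are not taken from the FDA data.

```r
## Hypothetical stand-ins for the per-patient lists built above
reactions <- list(c("NAUSEA", "RASH"), "HEADACHE")
drugs     <- list("DRUG A", c("DRUG B", "DRUG C"))

## One data.frame per patient holding every reaction/drug combination
l <- mapply(expand.grid, reactions, drugs,
            SIMPLIFY = FALSE,
            MoreArgs = list(stringsAsFactors = FALSE))

## Stack the per-patient pieces and give the columns readable names
## (expand.grid names them Var1 and Var2 by default)
two_col <- do.call(rbind, l)
names(two_col) <- c("reaction", "drug")
print(two_col)
```

Each patient contributes one row per reaction/drug pair, so a patient with two reactions and one drug yields two rows.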
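This looks like a very odd and complicated data format. What did the original large dataset look like? A tall, two-column (X and fruit) data.frame would be much easier to work with.
Can you provide the code you used to get to this point? What have you tried? If you make your question reproducible, it will be easier to help you.
Thanks for the feedback, guys. I have updated the question to show how I manipulated my actual data to get here. The tall, two-column approach seems like a good idea.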
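The tall, two-column layout suggested in the comments can be reached from a wide data frame with base R alone. The sketch below uses a made-up miniature of the question's data (the wide data frame and its values are hypothetical) and assumes that empty cells are stored as NA.

```r
## Hypothetical miniature of the wide data frame from the question;
## empty cells are assumed to be NA
wide <- data.frame(
  X          = c("A", "B", "C", "B", "A"),
  x.APPLES   = c("APPLES", "APPLES", NA, NA, NA),
  x.PEARS    = c(NA, NA, "PEARS", NA, NA),
  x.APPLES.1 = c(NA, NA, NA, "APPLES", NA),
  x.KIWIS    = c(NA, NA, NA, NA, "KIWIS"),
  stringsAsFactors = FALSE
)

## Pair every cell with the X value of its row, column by column
fruit_cols <- setdiff(names(wide), "X")
tall <- data.frame(
  X     = rep(wide$X, times = length(fruit_cols)),
  fruit = unlist(wide[fruit_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

## Keep only the cells that actually hold a fruit
tall <- tall[!is.na(tall$fruit), ]
print(tall)
```

Because the cell value itself names the fruit, duplicated columns such as x.APPLES.1 fold into the same fruit value without any extra handling.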
require(reshape2)
fruits <- c("Bannana", "Apple", "Orange", "Grape", "Kiwi")
example <- data.frame(ID = sample(LETTERS[1:6], 25, replace = TRUE),
                      Fruit = sample(fruits, 25, replace = TRUE))
# > example
# ID Fruit
# 1 F Kiwi
# 2 A Apple
# 3 F Kiwi
# ...
dcast(example, ID~Fruit, length, value.var = "Fruit")
## Label each cell with the fruit name and how often it occurs
more_complex <- function(x) {
  x_len <- length(x)
  x <- paste0(unique(x), " (", x_len, ")")
  x
}
dcast(example, ID~Fruit, more_complex, value.var = "Fruit")
# > dcast(example, ID~Fruit, more_complex, value.var = "Fruit")
#   ID     Apple     Bannana     Grape     Kiwi     Orange
# 1  A Apple (2) Bannana (2) Grape (2)      (0) Orange (2)
# 2  B Apple (1)         (0)       (0) Kiwi (1) Orange (2)
# 3  C       (0) Bannana (2)       (0) Kiwi (1) Orange (1)
# 4  D       (0) Bannana (1)       (0)      (0) Orange (1)
# 5  E       (0)         (0) Grape (1) Kiwi (1)        (0)
# 6  F       (0) Bannana (1) Grape (1) Kiwi (2) Orange (1)
## Same labelling, but return NA instead of " (0)" for empty cells
another_option <- function(x) {
  x_len <- length(x)
  if (x_len == 0) return(NA_character_)
  x <- paste0(unique(x), " (", x_len, ")")
  x
}
dcast(example, ID~Fruit, another_option, value.var = "Fruit")
# > dcast(example, ID~Fruit, another_option, value.var = "Fruit")
#   ID     Apple     Bannana     Grape     Kiwi     Orange
# 1  A Apple (2) Bannana (2) Grape (2)     <NA> Orange (2)
# 2  B Apple (1)        <NA>      <NA> Kiwi (1) Orange (2)
# 3  C      <NA> Bannana (2)      <NA> Kiwi (1) Orange (1)
# 4  D      <NA> Bannana (1)      <NA>     <NA> Orange (1)
# 5  E      <NA>        <NA> Grape (1) Kiwi (1)       <NA>
# 6  F      <NA> Bannana (1) Grape (1) Kiwi (2) Orange (1)
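The question also asks for COUNT totals, which the dcast calls above do not add. One possible sketch uses base R's table and addmargins on tall data. The tall data frame here is made up, and treating COUNT as a plain row or column sum is an assumption, since the totals in the question's example do not follow one obvious rule.

```r
## Hypothetical tall data, in the same shape as `example` above
tall <- data.frame(
  ID    = c("A", "A", "B", "B", "C"),
  Fruit = c("Apple", "Kiwi", "Apple", "Apple", "Grape"),
  stringsAsFactors = FALSE
)

## Cross-tabulate ID by Fruit
counts <- table(tall$ID, tall$Fruit)

## Append a "Sum" row and a "Sum" column holding the totals
with_totals <- addmargins(counts)
print(with_totals)
```

The margins are labelled Sum by default; they can be renamed through dimnames if a COUNT label is preferred.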