Rearranging data with R/SQL without loops
I would like to know how to rearrange the source data table below into the desired output table, using R or SQL.
Loops are very slow in R and my dataset is very large, so I don't want many loops in the script. Efficiency matters.
Source data table:
Date      Country  ID  Fruit   Favorite  Money
20120101  US       1   Apple   Book      100
20120101  US       2   Orange  Knife     150
20120101  US       3   Banana  Watch     80
20120101  US       4   Melon   Water     90
20120102  US       1   Apple   Phone     120
20120102  US       2   Apple   Knife     130
20120102  US       3   Banana  Watch     100
.....     ......   ..  .....   ......    ......
Output table:
Date      Country  Field     ID 1   ID 2    ID 3    ID 4
20120101  US       Fruit     Apple  Orange  Banana  Melon
20120101  US       Favorite  Book   Knife   Watch   Water
20120101  US       Money     100    150     80      90
20120102  US       Fruit     Apple  Apple   Banana  N.A.
....      ....     ....      ....   ....    ....    ....
Here is one approach in R, using your sample data:
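# Stack the three value columns into one column ("values") plus a label column ("ind");
# the key columns are recycled to match the stacked rows
x <- cbind(mydf[, c("Date", "Country", "ID")],
           stack(mydf[, c("Fruit", "Favorite", "Money")]))
# Spread the IDs into columns: one row per Date/Country/field
reshape(x, direction = "wide", idvar = c("Date", "Country", "ind"), timevar = "ID")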
# Date Country ind values.1 values.2 values.3 values.4
# 1 20120101 US Fruit Apple Orange Banana Melon
# 5 20120102 US Fruit Apple Apple Banana <NA>
# 8 20120101 US Favorite Book Knife Watch Water
# 12 20120102 US Favorite Phone Knife Watch <NA>
# 15 20120101 US Money 100 150 80 90
# 19 20120102 US Money 120 130 100 <NA>
In this answer, mydf is defined as follows:
mydf <- structure(
list(Date = c(20120101L, 20120101L, 20120101L,
20120101L, 20120102L, 20120102L, 20120102L),
Country = c("US", "US", "US", "US", "US", "US", "US"),
ID = c(1L, 2L, 3L, 4L, 1L, 2L, 3L),
Fruit = c("Apple", "Orange", "Banana", "Melon",
"Apple", "Apple", "Banana"),
Favorite = c("Book", "Knife", "Watch", "Water",
"Phone", "Knife", "Watch"),
Money = c(100L, 150L, 80L, 90L, 120L, 130L, 100L)),
.Names = c("Date", "Country", "ID",
"Fruit", "Favorite", "Money"),
class = "data.frame", row.names = c(NA, -7L))
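The result above still has columns named values.1 to values.4 and is ordered by field rather than by date. As a rough sketch (not part of the original answer), the same stack/reshape steps can be followed by a rename and a sort to get closer to the requested layout. The names fields and wide are illustrative only, and note that stack() coerces Money to character because it shares a column with the text fields.

```r
# Same sample data as above, rebuilt with data.frame() so the block is self-contained
mydf <- data.frame(
  Date = c(20120101L, 20120101L, 20120101L, 20120101L,
           20120102L, 20120102L, 20120102L),
  Country = "US",
  ID = c(1L, 2L, 3L, 4L, 1L, 2L, 3L),
  Fruit = c("Apple", "Orange", "Banana", "Melon", "Apple", "Apple", "Banana"),
  Favorite = c("Book", "Knife", "Watch", "Water", "Phone", "Knife", "Watch"),
  Money = c(100L, 150L, 80L, 90L, 120L, 130L, 100L),
  stringsAsFactors = FALSE
)

# Illustrative name: the columns to turn into rows
fields <- c("Fruit", "Favorite", "Money")

# Stack the value columns, then spread IDs into columns (as in the answer)
x <- cbind(mydf[, c("Date", "Country", "ID")], stack(mydf[, fields]))
wide <- reshape(x, direction = "wide",
                idvar = c("Date", "Country", "ind"), timevar = "ID")

# Rename values.N to "ID N" and ind to Field, to match the requested header
names(wide) <- sub("^values\\.", "ID ", names(wide))
names(wide)[names(wide) == "ind"] <- "Field"

# Sort by Date and Country; Field is a factor, so it keeps the order given in fields
wide <- wide[order(wide$Date, wide$Country, wide$Field), ]
rownames(wide) <- NULL
print(wide)
```

Missing combinations (such as ID 4 on 20120102) come out as NA rather than the literal "N.A." shown in the question.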
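What have you tried so far? Please show your code. See the tips on how to make your code reproducible.
The code I tried is posted here. I used two loops to produce the result.
It looks exactly like what I wanted, thanks! Will it also work if the Country field has multiple values, e.g. US, KR, HK, etc.?
@C.T., it should. Why don't you try it on a small subset of your data first?
Yes, it works with multiple values of Country. For a huge data frame, the process takes time to complete.
@C.T., FYI, I find the target structure you are trying to achieve hard to work with. A long format is usually more user-friendly.
Would you elaborate on the long format? I agree this structure is hard to achieve even with a concise script.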
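To illustrate the long format mentioned in the comments, here is a minimal sketch, assuming it means one row per Date/Country/ID/field combination. The names long, Field, Value, money and total_by_date are illustrative and do not come from the thread. Value is character because text and numeric fields share one column.

```r
# Same sample data as in the answer, rebuilt so the block is self-contained
mydf <- data.frame(
  Date = c(20120101L, 20120101L, 20120101L, 20120101L,
           20120102L, 20120102L, 20120102L),
  Country = "US",
  ID = c(1L, 2L, 3L, 4L, 1L, 2L, 3L),
  Fruit = c("Apple", "Orange", "Banana", "Melon", "Apple", "Apple", "Banana"),
  Favorite = c("Book", "Knife", "Watch", "Water", "Phone", "Knife", "Watch"),
  Money = c(100L, 150L, 80L, 90L, 120L, 130L, 100L),
  stringsAsFactors = FALSE
)

# Long format: stack the value columns and keep ID as an ordinary column
long <- cbind(mydf[, c("Date", "Country", "ID")],
              stack(mydf[, c("Fruit", "Favorite", "Money")]))
names(long)[names(long) == "values"] <- "Value"
names(long)[names(long) == "ind"] <- "Field"
long <- long[order(long$Date, long$Country, long$ID, long$Field), ]
rownames(long) <- NULL

# Filtering and aggregating need no loop in this shape,
# e.g. total Money per Date
money <- long[long$Field == "Money", ]
total_by_date <- tapply(as.integer(money$Value), money$Date, sum)
print(total_by_date)
```

In this shape a new ID simply adds rows instead of a new column, which is one reason long data tends to be easier to filter, group, and join.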