Convert hash table/dictionary/array-formatted data into a regular column-based data.frame
I am a beginner in R and have never worked with these types of data before. I have the following two sample datasets (df1 and df2), shown below:
df1 <- c("{\"\"Wednesday\"\":4,\"\"Monday\"\":5,\"\"Saturday\"\":4,\"\"Thursday\"\":4,\"\"Tuesday\"\":5,\"\"Friday\"\":1,\"\"Sunday\"\":5,\"\"Missing day\"\":2}",
"{\"\"Wednesday\"\":6,\"\"Monday\"\":5,\"\"Saturday\"\":2,\"\"Thursday\"\":6,\"\"Tuesday\"\":0,\"\"Friday\"\":2,\"\"Sunday\"\":4,\"\"Missing day\"\":1}",
"{\"\"Wednesday\"\":5,\"\"Monday\"\":5,\"\"Saturday\"\":3,\"\"Thursday\"\":8,\"\"Tuesday\"\":4,\"\"Friday\"\":3,\"\"Sunday\"\":6,\"\"Missing day\"\":4}",
"{\"\"Wednesday\"\":3,\"\"Monday\"\":5,\"\"Saturday\"\":4,\"\"Thursday\"\":1,\"\"Tuesday\"\":5,\"\"Friday\"\":4,\"\"Sunday\"\":4,\"\"Missing day\"\":6}")
df2 <- c("[373,357,382,411,310,315,330,385,367,396,402,348,354,343,392,395,392,401,376,448,341,373,369,304,298,332,366,287,334,222]",
"[319,347,284,313,300,292,228,322,291,275,278,289,323,342,272,242,295,347,290,343,337,309,268,251,256,266,346,260,232,160]",
"[165,154,161,152,164,152,156,150,137,170,147,210,235,190,176,175,191,186,209,157,210,199,162,149,162,165,174,171,178,126]",
"[253,274,240,258,264,231,296,233,230,252,210,233,233,295,235,229,270,275,278,297,255,253,250,252,299,305,310,308,263,141]")
You can use reticulate or jsonlite. I will use jsonlite, as follows:
For df1:
df1_f <- jsonlite::fromJSON(gsub('"+','"',sprintf("[%s]", paste0(df1, collapse = ","))))
data.frame(Day = names(df1_f), `colnames<-`(t(df1_f), paste0("count",1:4)), row.names = NULL)
          Day count1 count2 count3 count4
1   Wednesday      4      6      5      3
2      Monday      5      5      5      5
3    Saturday      4      2      3      4
4    Thursday      4      6      8      1
5     Tuesday      5      0      4      5
6      Friday      1      2      3      4
7      Sunday      5      4      6      4
8 Missing day      2      1      4      6
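The one-liner for df1 packs several steps together. As a rough sketch of what happens at each stage, here it is unrolled on two shortened records; the names raw, clean, json, parsed and result are made up for illustration and are not part of the original answer:

```r
library(jsonlite)

# Two shortened records in the same doubled-quote format as df1 (illustrative data)
raw <- c("{\"\"Monday\"\":5,\"\"Missing day\"\":2}",
         "{\"\"Monday\"\":7,\"\"Missing day\"\":1}")

# Step 1: collapse every run of double quotes into a single quote
clean <- gsub('"+', '"', raw)

# Step 2: join the records with commas and wrap them in [ ] to form one JSON array
json <- sprintf("[%s]", paste0(clean, collapse = ","))

# Step 3: an array of objects is parsed into a data.frame, one row per record
parsed <- fromJSON(json)

# Step 4: transpose so each key becomes a row and each record becomes a column
result <- data.frame(Day = names(parsed),
                     `colnames<-`(t(parsed), paste0("count", seq_len(nrow(parsed)))),
                     row.names = NULL)
print(result)
```

Using seq_len(nrow(parsed)) instead of a hard-coded 1:4 is a small design choice that should let the same code handle any number of records.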
Thank you very much. This worked for me. Could you suggest other packages (like jsonlite) for parsing these kinds of data structures, or share links for learning about them?
@smk You first have to identify the structure of the data, i.e. the first looks like a list of dictionaries in Python, while the second looks like a list of lists. That is why you can use the reticulate package. Also note that both look like JavaScript objects, i.e. JSON, so we can use any JSON library in R, e.g. rjson, jsonlite, etc.
For df2:
df2_fin <- jsonlite::fromJSON(sprintf("[%s]",paste0(df2, collapse = ",")))
(df2_final <- setNames(data.frame(t(df2_fin)), paste0("group",1:4)))
   group1 group2 group3 group4
1     373    319    165    253
2     357    347    154    274
3     382    284    161    240
4     411    313    152    258
5     310    300    164    264
6     315    292    152    231
7     330    228    156    296
8     385    322    150    233
9     367    291    137    230
10    396    275    170    252
11    402    278    147    210
12    348    289    210    233
13    354    323    235    233
:
:
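The comments mention rjson as another JSON library. A minimal sketch of how the df2 case might look with it, assuming the rjson package is installed; the names raw, cols and result and the shortened data are illustrative only:

```r
library(rjson)

# Two shortened records in the same format as df2 (illustrative data)
raw <- c("[373,357,382]", "[319,347,284]")

# Parse each string on its own; unlist() guards against a list being returned
cols <- lapply(raw, function(s) unlist(fromJSON(s)))

# Bind the parsed vectors side by side, one column per record
result <- setNames(as.data.frame(do.call(cbind, cols)),
                   paste0("group", seq_along(cols)))
print(result)
```

Parsing each element separately avoids building one large JSON string, at the cost of one fromJSON call per record.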