Editing and filtering a JSON list of lists in R
I am trying to display this dataset -> However, I would like to flatten the data so that it is not nested as a set of JSON data in a list within a list within a list.
More specifically, I am trying to display the data as a data frame ordered by $releaseDate (one of the variables).
Here is what I have tried so far:
library(jsonlite)
library(tidyjson)
mtgdata <- fromJSON("~/path/to/file.json")
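Each of these lists contains the variables I am interested in, and I would like to filter and sort the data by those variables as if it were a flat data frame.
When we inspect the variables in one of the lists (taking "mtgdata$UST" as an example), we get this set of variables: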
names(mtgdata$UST)
[1] "name"        "code"        "releaseDate" "border"      "type"        "booster"     "mkm_name"
[8] "mkm_id"      "cards"
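Running the same query on another list in mtgdata ("mtgdata$SOI") gives a different set of variables, although they are mostly the same.
As I mentioned above, what I mainly want is to flatten this dataset and rank it by releaseDate. As it stands, though, "$releaseDate" is nested inside the first level of lists ("$UST" and so on).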
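Any help is much appreciated, as is any advice on how I could phrase this question better.

You could try something on the command line that converts the JSON array of objects into a file of ndjson records, and then use something like
ndjson::stream_in("filename_of_the_thing_you_just_converted")
but you would end up with a 14,000+ column, pretty useless "flat" data frame.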
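To see why naive flattening explodes into so many columns, here is a small sketch. It uses base R's unlist() as a stand-in for the flattening step (it is not what ndjson does internally), and the record below is made up for illustration:

```r
# Hypothetical nested record, loosely shaped like one set with its cards
record <- list(
  name  = "Set A",
  cards = list(
    list(name = "Card 1", colors = list("W")),
    list(name = "Card 2", colors = list("U", "B"))
  )
)

# Full flattening yields one entry per leaf value
flat <- unlist(record)
length(flat)
names(flat)
```

Two tiny cards already give six entries; with hundreds of cards per set, the column count grows accordingly.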
Instead, do some spelunking:
library(tidyverse)
as1 <- jsonlite::read_json("~/Downloads/AllSets.json")
str(as1, 1)
## List of 221
## $ UST :List of 9
## $ UNH :List of 10
## $ UGL :List of 11
## $ pWOS :List of 8
## $ pWOR :List of 8
## $ pWCQ :List of 8
## $ pSUS :List of 8
## $ pSUM :List of 10
## $ pREL :List of 8
## $ pPRO :List of 8
## $ pPRE :List of 8
## $ pPOD :List of 7
## $ pMPR :List of 8
## $ pMGD :List of 8
## $ pMEI :List of 8
## $ pLPA :List of 8
## $ pLGM :List of 8
## $ pJGP :List of 10
## $ pHHO :List of 11
## ...
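You really do not want to flatten booster, translations or cards. They should be kept as list columns and unnested as needed.
However, since each record has different fields, we cannot simply use data.table::rbindlist() or dplyr::bind_rows(), because they will complain about some of the columns.
We have to go record by record and turn each one into a data frame, handling the missing fields and wrapping the list fields in list(). We will use a helper function to simplify the idiom of testing for missing values: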
`%l0%` <- function(x, y) if (length(x) > 0) x else y
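The code that builds all_sets is not shown here, so the following is only a minimal sketch of what the record-by-record conversion could look like. It assumes purrr, tibble and dplyr are installed; the two toy records and their values are made up, and the real data has more fields than the ones listed:

```r
library(purrr)
library(tibble)
library(dplyr)

# Helper from above: fall back to y when x is NULL or zero-length
`%l0%` <- function(x, y) if (length(x) > 0) x else y

# Toy stand-in for the parsed JSON: two sets with different fields
sets <- list(
  AAA = list(name = "Set A", code = "AAA", releaseDate = "2017-12-08",
             booster = list("rare", "common")),
  BBB = list(name = "Set B", code = "BBB", releaseDate = "1999-09-04",
             mkm_id = 76L)
)

# One single-row tibble per record, then bind the rows together
all_sets_demo <- bind_rows(map(sets, ~tibble(
  name        = .x$name %l0% NA_character_,
  code        = .x$code %l0% NA_character_,
  releaseDate = .x$releaseDate %l0% NA_character_,
  mkm_id      = .x$mkm_id %l0% NA_integer_,
  booster     = list(.x$booster)  # list column; holds NULL when the field is absent
)))
```

Because every row is built with the same columns and types, bind_rows() has nothing to complain about, and the nested fields survive as list columns.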
You can see the result:
all_sets
## # A tibble: 221 x 14
## name code gathererCode magicCardsInfoC… oldCode releaseDate border type block booster
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <list>
## 1 Unstable UST NA NA NA 2017-12-08 silver un NA <list […
## 2 Unhinged UNH NA uh NA 2004-11-20 silver un NA <list […
## 3 Unglued UGL UG ug NA 1998-08-11 silver un NA <list […
## 4 Wizards of th… pWOS NA wotc NA 1999-09-04 black promo NA <NULL>
## 5 Worlds pWOR NA wrl NA 1999-08-04 black promo NA <NULL>
## 6 World Magic C… pWCQ NA wmcq NA 2013-04-06 black promo NA <NULL>
## 7 Super Series pSUS NA sus NA 1999-12-01 black promo NA <NULL>
## 8 Summer of Mag… pSUM NA sum NA 2007-07-21 black promo NA <NULL>
## 9 Release Events pREL NA rep NA 2003-07-26 black promo NA <NULL>
## 10 Pro Tour pPRO NA pro NA 2007-02-09 black promo NA <NULL>
## # ... with 211 more rows, and 4 more variables: translations <list>, mkm_name <chr>, mkm_id <int>,
## # cards <list>
glimpse(all_sets)
## Observations: 221
## Variables: 14
## $ name <chr> "Unstable", "Unhinged", "Unglued", "Wizards of the Coast Online Store"...
## $ code <chr> "UST", "UNH", "UGL", "pWOS", "pWOR", "pWCQ", "pSUS", "pSUM", "pREL", "...
## $ gathererCode <chr> NA, NA, "UG", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ magicCardsInfoCode <chr> NA, "uh", "ug", "wotc", "wrl", "wmcq", "sus", "sum", "rep", "pro", "pt...
## $ oldCode <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ releaseDate <chr> "2017-12-08", "2004-11-20", "1998-08-11", "1999-09-04", "1999-08-04", ...
## $ border <chr> "silver", "silver", "silver", "black", "black", "black", "black", "bla...
## $ type <chr> "un", "un", "un", "promo", "promo", "promo", "promo", "promo", "promo"...
## $ block <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ booster <list> [["rare", "uncommon", "uncommon", "uncommon", "common", "common", "co...
## $ translations <list> [NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NU...
## $ mkm_name <chr> "Unstable", "Unhinged", "Unglued", NA, NA, NA, NA, "Summer Magic", NA,...
## $ mkm_id <int> 1821, 59, 22, NA, NA, NA, NA, 76, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ cards <list> [[["Andrea Radeck", 1, ["W"], ["White"], "95ebdf85f4ea74d584dfdfb72e3...
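When you do need the contents of a list column such as cards, you can unnest just that column. A minimal sketch, assuming tidyr 1.0 or later and a made-up two-set table whose cards entries are lists of records (the card names and rarities are hypothetical):

```r
library(tibble)
library(tidyr)

# Toy table: one row per set, with the cards kept as a list column
demo <- tibble(
  code  = c("AAA", "BBB"),
  cards = list(
    list(list(name = "Card 1", rarity = "rare"),
         list(name = "Card 2", rarity = "common")),
    list(list(name = "Card 3", rarity = "uncommon"))
  )
)

# First one row per card, then one column per card field
per_card   <- unnest_longer(demo, cards)
cards_long <- unnest_wider(per_card, cards)
cards_long
```

The set-level columns (here only code) are repeated for each card, so the result can be filtered and sorted like any other flat data frame.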
Please follow this to make a reproducible example, otherwise no one can help.
Thank you very much for your reply, this is a really helpful eye-opener! One thing I would like to check, though: my main concern is being able to rank each list by $releaseDate. In your code I can see the bit releaseDate = .x$name, so the release date is dropped and replaced with the set name. I imagine that is an easy fix, but can I order all the sets from top to bottom by release date? Thanks again for your help with this!
My bad. I copied and pasted that bit of code too fast. Let me fix it. If this works, ticking the "answered" checkbox helps others know there is a verified working answer.
Thanks again, I really appreciate your help with this.
To order all the sets by release date, newest first:
mutate(all_sets, releaseDate = lubridate::ymd(releaseDate)) %>%
  arrange(desc(releaseDate))
## # A tibble: 221 x 14
## name code gathererCode magicCardsInfoCo… oldCode releaseDate border type block booster
## <chr> <chr> <chr> <chr> <chr> <date> <chr> <chr> <chr> <list>
## 1 Masters 25 A25 NA a25 NA 2018-03-16 black reprint NA <NULL>
## 2 Rivals of … RIX NA rix NA 2018-01-19 black expansi… Ixal… <list …
## 3 Unstable UST NA NA NA 2017-12-08 silver un NA <list …
## 4 Explorers … E02 NA e02 NA 2017-11-24 black board g… NA <NULL>
## 5 From the V… V17 NA v17 NA 2017-11-24 black from th… NA <NULL>
## 6 Iconic Mas… IMA NA ima NA 2017-11-17 black reprint NA <list …
## 7 Duel Decks… DDT NA ddt NA 2017-11-10 black duel de… NA <NULL>
## 8 Ixalan XLN NA xln NA 2017-09-29 black expansi… Ixal… <list …
## 9 Commander … C17 NA NA NA 2017-08-25 black command… NA <NULL>
## 10 Hour of De… HOU NA hou NA 2017-07-14 black expansi… Amon… <list …
## # ... with 211 more rows, and 4 more variables: translations <list>, mkm_name <chr>, mkm_id <int>,
## # cards <list>
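Once releaseDate is a real date, filtering works the same way. A small sketch, assuming dplyr is installed; the table, the cutoff date and the excluded type are made up for illustration, and base as.Date() is used in place of lubridate::ymd() since the dates are already in yyyy-mm-dd form:

```r
library(dplyr)

# Toy stand-in for all_sets with only the columns needed here
sets_demo <- tibble(
  name        = c("Set A", "Set B", "Set C"),
  releaseDate = c("2017-12-08", "1999-09-04", "2018-01-19"),
  type        = c("un", "promo", "expansion")
)

recent <- sets_demo %>%
  mutate(releaseDate = as.Date(releaseDate)) %>%   # character -> Date
  filter(releaseDate >= as.Date("2017-01-01"),     # keep sets from 2017 onward
         type != "promo") %>%                      # drop promo sets
  arrange(desc(releaseDate))                       # newest first
recent
```

The same pipeline should apply to all_sets unchanged, since the list columns are simply carried along with each row.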