Editing and filtering a JSON list of lists in R


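I am trying to display this dataset ->

However, I would like to flatten the data so that it is no longer a set of JSON data nested as lists within lists within lists.

More specifically, I am trying to display the data as a data frame, ordered by $releaseDate (one of the variables).

Here is what I have tried so far: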

library(jsonlite)
library(tidyjson)
mtgdata <- fromJSON("~/path/to/file.json")
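Each of these lists contains the variables I am interested in, and I would like to filter and sort the data by those variables as if it were a flat data frame.

When we inspect the variables in one of the lists (taking "mtgdata$UST" as an example), we get this set of variables: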

names(mtgdata$UST)
[1] "name"        "code"        "releaseDate" "border"      "type"        "booster"     "mkm_name"   
[8] "mkm_id"      "cards"
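Running the same query on another list in mtgdata ("mtgdata$SOI") returns a different set of variables, although they are largely the same.

As I mentioned above, I am mainly interested in flattening this dataset and ranking it by releaseDate. As it stands, though, "$releaseDate" is nested inside the first level of lists ("$UST" and so on).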


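Any help would be much appreciated, as would suggestions on how to phrase this question better.

You could try something on the command line to convert the JSON array of objects into a file of ndjson records and then use something like
ndjson::stream_in("the_name_of_the_file_you_just_converted")
but you would end up with a "flat" data frame of more than 14,000 columns, which is of very little use.

Instead, do some spelunking: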

library(tidyverse)

as1 <- jsonlite::read_json("~/Downloads/AllSets.json")

str(as1, 1) 
## List of 221
##  $ UST     :List of 9
##  $ UNH     :List of 10
##  $ UGL     :List of 11
##  $ pWOS    :List of 8
##  $ pWOR    :List of 8
##  $ pWCQ    :List of 8
##  $ pSUS    :List of 8
##  $ pSUM    :List of 10
##  $ pREL    :List of 8
##  $ pPRO    :List of 8
##  $ pPRE    :List of 8
##  $ pPOD    :List of 7
##  $ pMPR    :List of 8
##  $ pMGD    :List of 8
##  $ pMEI    :List of 8
##  $ pLPA    :List of 8
##  $ pLGM    :List of 8
##  $ pJGP    :List of 10
##  $ pHHO    :List of 11
## ...
You really do not want to flatten booster, translations, or cards. Keep them as list columns and unnest them as needed.
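As a rough illustration of "unnest as needed", the sketch below expands a `cards` list column into one row per card and pulls out a single field. The `toy` tibble, the card names, and the `rarity` field are made-up stand-ins for the real data; `unnest_longer()` and `hoist()` come from tidyr.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of the real table: one row per set,
# with the cards kept as a list column
toy <- tibble::tibble(
  code = c("UST", "pWOS"),
  cards = list(
    list(list(name = "Card A", rarity = "common"),
         list(name = "Card B", rarity = "rare")),
    list(list(name = "Card C", rarity = "special"))
  )
)

# One row per card, then lift the card's name into its own column;
# the remaining card fields stay behind in the `cards` list column
cards_long <- toy %>%
  unnest_longer(cards) %>%
  hoist(cards, card_name = "name")

cards_long
```

The set-level table stays small this way, and the expensive expansion only happens for the list column you actually need.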

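However, since each record has different fields, we cannot simply use `data.table::rbindlist()` or `dplyr::bind_rows()`, because they will complain about some of those columns.

We have to go record by record and turn each one into a data frame, handling the missing fields and wrapping the fields that should be lists in list(). We will use a helper function to shorten the idiom for testing for missing values: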

`%l0%` <- function(x, y) if (length(x) > 0) x else y
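
A minimal sketch of that record-by-record step is shown here on a small hand-made list. The `sets` object, its card names, and the handful of columns chosen are assumptions for illustration only; the real file has more fields per set, and the result is named `toy_sets` to keep it apart from the full table.

```r
library(purrr)
library(tibble)
library(dplyr)

# Same helper as above: fall back to y when x is NULL or empty
`%l0%` <- function(x, y) if (length(x) > 0) x else y

# Hypothetical stand-in for the parsed JSON: two sets with different fields
sets <- list(
  UST = list(
    name = "Unstable", code = "UST", releaseDate = "2017-12-08",
    mkm_id = 1821L,
    booster = list("rare", "uncommon", "common"),
    cards = list(list(name = "Card A"), list(name = "Card B"))
  ),
  pWOS = list(
    name = "Wizards of the Coast Online Store", code = "pWOS",
    releaseDate = "1999-09-04",
    cards = list(list(name = "Card C"))
  )
)

# One single-row tibble per set: scalar fields get a typed NA when absent,
# nested fields are wrapped in list() so they become list columns
toy_sets <- sets %>%
  map(~ tibble(
    name        = .x$name %l0% NA_character_,
    code        = .x$code %l0% NA_character_,
    releaseDate = .x$releaseDate %l0% NA_character_,
    mkm_id      = .x$mkm_id %l0% NA_integer_,
    booster     = list(.x$booster),
    cards       = list(.x$cards)
  )) %>%
  bind_rows()

toy_sets
```

Because every per-set tibble has the same columns with the same types, `bind_rows()` no longer has anything to complain about.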
You can see the result:

all_sets
## # A tibble: 221 x 14
##    name           code  gathererCode magicCardsInfoC… oldCode releaseDate border type  block booster 
##    <chr>          <chr> <chr>        <chr>            <chr>   <chr>       <chr>  <chr> <chr> <list>  
##  1 Unstable       UST   NA           NA               NA      2017-12-08  silver un    NA    <list […
##  2 Unhinged       UNH   NA           uh               NA      2004-11-20  silver un    NA    <list […
##  3 Unglued        UGL   UG           ug               NA      1998-08-11  silver un    NA    <list […
##  4 Wizards of th… pWOS  NA           wotc             NA      1999-09-04  black  promo NA    <NULL>  
##  5 Worlds         pWOR  NA           wrl              NA      1999-08-04  black  promo NA    <NULL>  
##  6 World Magic C… pWCQ  NA           wmcq             NA      2013-04-06  black  promo NA    <NULL>  
##  7 Super Series   pSUS  NA           sus              NA      1999-12-01  black  promo NA    <NULL>  
##  8 Summer of Mag… pSUM  NA           sum              NA      2007-07-21  black  promo NA    <NULL>  
##  9 Release Events pREL  NA           rep              NA      2003-07-26  black  promo NA    <NULL>  
## 10 Pro Tour       pPRO  NA           pro              NA      2007-02-09  black  promo NA    <NULL>  
## # ... with 211 more rows, and 4 more variables: translations <list>, mkm_name <chr>, mkm_id <int>,
## #   cards <list>

glimpse(all_sets)
## Observations: 221
## Variables: 14
## $ name               <chr> "Unstable", "Unhinged", "Unglued", "Wizards of the Coast Online Store"...
## $ code               <chr> "UST", "UNH", "UGL", "pWOS", "pWOR", "pWCQ", "pSUS", "pSUM", "pREL", "...
## $ gathererCode       <chr> NA, NA, "UG", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ magicCardsInfoCode <chr> NA, "uh", "ug", "wotc", "wrl", "wmcq", "sus", "sum", "rep", "pro", "pt...
## $ oldCode            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ releaseDate        <chr> "2017-12-08", "2004-11-20", "1998-08-11", "1999-09-04", "1999-08-04", ...
## $ border             <chr> "silver", "silver", "silver", "black", "black", "black", "black", "bla...
## $ type               <chr> "un", "un", "un", "promo", "promo", "promo", "promo", "promo", "promo"...
## $ block              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ booster            <list> [["rare", "uncommon", "uncommon", "uncommon", "common", "common", "co...
## $ translations       <list> [NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NU...
## $ mkm_name           <chr> "Unstable", "Unhinged", "Unglued", NA, NA, NA, NA, "Summer Magic", NA,...
## $ mkm_id             <int> 1821, 59, 22, NA, NA, NA, NA, 76, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ cards              <list> [[["Andrea Radeck", 1, ["W"], ["White"], "95ebdf85f4ea74d584dfdfb72e3...

Please follow this to make a reproducible example; otherwise no one can help.

Thank you very much for the reply. This is a really helpful eye-opener! One thing I wanted to check, though: my main concern is being able to rank each list by $releaseDate. In your code I can see a bit that reads releaseDate = .x$name, so the release date is left out and replaced with the set name. I imagine that is a simple fix, but would I then be able to rank all the sets by release date from top to bottom? Thanks again for your help with this!

My bad. I copy-pasted that bit of code too quickly. Let me fix it.

If this works, ticking the "answered" checkbox helps others know there is a verified working answer.

Thanks again, I really appreciate your help with this.

mutate(all_sets, releaseDate = lubridate::ymd(releaseDate)) %>% 
  arrange(desc(releaseDate))
## # A tibble: 221 x 14
##    name        code  gathererCode magicCardsInfoCo… oldCode releaseDate border type     block booster
##    <chr>       <chr> <chr>        <chr>             <chr>   <date>      <chr>  <chr>    <chr> <list> 
##  1 Masters 25  A25   NA           a25               NA      2018-03-16  black  reprint  NA    <NULL> 
##  2 Rivals of … RIX   NA           rix               NA      2018-01-19  black  expansi… Ixal… <list …
##  3 Unstable    UST   NA           NA                NA      2017-12-08  silver un       NA    <list …
##  4 Explorers … E02   NA           e02               NA      2017-11-24  black  board g… NA    <NULL> 
##  5 From the V… V17   NA           v17               NA      2017-11-24  black  from th… NA    <NULL> 
##  6 Iconic Mas… IMA   NA           ima               NA      2017-11-17  black  reprint  NA    <list …
##  7 Duel Decks… DDT   NA           ddt               NA      2017-11-10  black  duel de… NA    <NULL> 
##  8 Ixalan      XLN   NA           xln               NA      2017-09-29  black  expansi… Ixal… <list …
##  9 Commander … C17   NA           NA                NA      2017-08-25  black  command… NA    <NULL> 
## 10 Hour of De… HOU   NA           hou               NA      2017-07-14  black  expansi… Amon… <list …
## # ... with 211 more rows, and 4 more variables: translations <list>, mkm_name <chr>, mkm_id <int>,
## #   cards <list>
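
Once the release date is a real date column, filtering works like it does on any other data frame. The sketch below uses a three-row stand-in for the table above (the `toy` tibble and the cutoff date are illustrative assumptions) and base R's `as.Date()` in place of `lubridate::ymd()`.

```r
library(dplyr)

# Hypothetical miniature of the set-level table
toy <- tibble::tibble(
  name = c("Unstable", "Unglued", "Ixalan"),
  type = c("un", "un", "expansion"),
  releaseDate = c("2017-12-08", "1998-08-11", "2017-09-29")
)

# Keep sets released in 2017 or later, newest first
recent <- toy %>%
  mutate(releaseDate = as.Date(releaseDate)) %>%
  filter(releaseDate >= as.Date("2017-01-01")) %>%
  arrange(desc(releaseDate))

recent
```

Further conditions, for example on `type`, can be added inside the same `filter()` call.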