Parse JSON with embedded lists into a flat data.frame, ignoring unwanted keys

A colleague sent me an Elasticsearch query result (100,000 records, hundreds of attributes) that looks like this:

pets_json <- paste0('[{"animal":"cat","attributes":{"intelligence":"medium","noises":[{"noise":"meow","code":4},{"noise":"hiss","code":2}]}},',
                     '{"animal":"dog","attributes":{"intelligence":"high","noises":{"noise":"bark","code":1}}},',
                     '{"animal":"snake","attributes":{"intelligence":"low","noises":{"noise":"hiss","code":2}}}]')

I can read the JSON in, but flatten=TRUE doesn't fully flatten it:

library(jsonlite)
str(df <- fromJSON(txt=pets_json, flatten=TRUE))
# 'data.frame': 3 obs. of  3 variables:
#  $ animal                 : chr  "cat" "dog" "snake"
#  $ attributes.intelligence: chr  "medium" "high" "low"
#  $ attributes.noises      :List of 3
#   ..$ :'data.frame':  2 obs. of  2 variables:   \
#   .. ..$ noise: chr  "meow" "hiss"               \
#   .. ..$ code : int  4 2                          |
#   ..$ :List of 2                                  |
#   .. ..$ noise: chr "bark"                        |- need to remove code and flatten
#   .. ..$ code : int 1                             |
#   ..$ :List of 2                                  |
#   .. ..$ noise: chr "hiss"                        /
#   .. ..$ code : int 2                            /
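Why flatten=TRUE stops short: noises is a JSON array of objects for the cat but a single object for the dog and the snake, so jsonlite cannot simplify it into one ordinary column. A small diagnostic sketch (the names raw and noises_is_array are illustrative only):

```r
library(jsonlite)

pets_json <- paste0('[{"animal":"cat","attributes":{"intelligence":"medium","noises":[{"noise":"meow","code":4},{"noise":"hiss","code":2}]}},',
                    '{"animal":"dog","attributes":{"intelligence":"high","noises":{"noise":"bark","code":1}}},',
                    '{"animal":"snake","attributes":{"intelligence":"low","noises":{"noise":"hiss","code":2}}}]')

# Parse with no simplification at all, so the raw JSON shapes stay visible
raw <- fromJSON(pets_json, simplifyVector = FALSE)

# An array of objects arrives as an unnamed list, a single object as a named list.
# TRUE means "noises was an array" for that record.
noises_is_array <- vapply(raw,
                          function(rec) is.null(names(rec$attributes$noises)),
                          logical(1))
print(noises_is_array)
```

With the sample data this is TRUE for the cat and FALSE for the dog and the snake, which is exactly the mix of a data.frame and plain lists seen in the str() output.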

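And after that...? I know that with tidyr::separate I could probably hack together a way to spread the noise values into columns and set flags. But that only works for one attribute at a time, and I may have hundreds of them. I also don't know all the possible attribute values in advance.
How can I efficiently produce the desired flat data.frame (one row per animal, with a flag column for each noise)? Thanks for your time.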
I don't think there's a super-simple way to get this into the right format, but here's an attempt:
out <- fromJSON(pets_json)

# drop the "code" data and do some initial cleaning
out$noises <- lapply(
  out$attributes$noises, 
  function(x) unlist(x[-match("code",names(x))]) 
)

# extract the key part of the intelligence variable
out$intelligence <- out$attributes$intelligence

# set up a vector of all possible noises
unq_noises <- unique(unlist(out$noises)) 

# make the new separate noise variables
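# make the new separate noise variables: one TRUE/FALSE per possible noise,
# so FUN.VALUE must have length(unq_noises), not the number of records
out[unq_noises] <- t(vapply(
  out$noises, 
  function(x) unq_noises %in% x,
  FUN.VALUE=logical(length(unq_noises)))
)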

# clean up no longer needed variables
out[c("attributes","noises")] <- list(NULL)

out

#  animal intelligence  meow  hiss  bark
#1    cat       medium  TRUE  TRUE FALSE
#2    dog         high FALSE FALSE  TRUE
#3  snake          low FALSE  TRUE FALSE
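The same idea can be extended to many attributes without hard-coding any values. Below is a base-R sketch, assuming each attribute is either a single object or an array of objects and that you can name the field holding its label; attr_keys and as_object_list are hypothetical names, not part of jsonlite:

```r
library(jsonlite)

pets_json <- paste0('[{"animal":"cat","attributes":{"intelligence":"medium","noises":[{"noise":"meow","code":4},{"noise":"hiss","code":2}]}},',
                    '{"animal":"dog","attributes":{"intelligence":"high","noises":{"noise":"bark","code":1}}},',
                    '{"animal":"snake","attributes":{"intelligence":"low","noises":{"noise":"hiss","code":2}}}]')

# Keep JSON objects as plain R lists
raw <- fromJSON(pets_json, simplifyVector = FALSE)

# Hypothetical mapping: attribute name -> field that holds its label.
# Extend it (e.g. colors = "color") to cover more attributes.
attr_keys <- c(noises = "noise")

# Hypothetical helper: a single JSON object arrives as a named list, an array of
# objects as an unnamed list; wrap the former so both are iterated the same way
as_object_list <- function(v) if (!is.null(names(v))) list(v) else v

out <- data.frame(
  animal       = vapply(raw, function(r) r$animal, character(1)),
  intelligence = vapply(raw, function(r) r$attributes$intelligence, character(1)),
  stringsAsFactors = FALSE
)

for (a in names(attr_keys)) {
  # Labels present in each record for attribute a (the code field is ignored)
  labs <- lapply(raw, function(r)
    vapply(as_object_list(r$attributes[[a]]),
           function(o) as.character(o[[attr_keys[[a]]]]),
           character(1)))
  # One logical column per distinct label, named like "noises.bark"
  for (l in sort(unique(unlist(labs)))) {
    out[[paste0(a, ".", l)]] <- vapply(labs, function(x) l %in% x, logical(1))
  }
}
print(out)
```

For data that also has a colors attribute, adding colors = "color" to attr_keys would add colors.* columns in the same way, and no attribute values need to be known in advance.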

Base case with magrittr and data.table
Here is another proposal, combining magrittr and data.table for extra zeitgeist brownie points:
library(magrittr)    # %>% and %<>%
library(data.table)  # data.table(), rbindlist(), dcast()

# Do not simplify to data.frame
str(df <- fromJSON(txt = pets_json, simplifyDataFrame = FALSE))

# The %<>% operator creates a pipe and assigns the result back to the variable
df %<>% 
  lapply(function(x) {
    # Create a data.table; unlist() also returns the numeric "code" values
    dt <- data.table(animal       = x$animal,
                     intelligence = x$attributes$intelligence,
                     noises       = unlist(x$attributes$noises))
    dt[!noises %in% as.character(0:9)]  # Remove numeric (single-digit code) values
  }) %>%
  rbindlist %>% # Combine into a single data.table
  dcast(animal + intelligence ~ paste0("noises.", noises), # Cast the noises variable
        value.var = "noises", 
        fill = 0, # Put 0 instead of NA
        fun.aggregate = function(x) 1) # Put 1 instead of the noise name
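The as.character(0:9) filter drops the code values only because every code in the sample happens to be a single digit; a code of 12 would survive as a bogus noise. A sketch that selects entries by name instead (unlist() keeps the field names, so the labels are the elements named "noise") and counts with fun.aggregate = length; the long/wide/feature names are illustrative:

```r
library(jsonlite)
library(data.table)

pets_json <- paste0('[{"animal":"cat","attributes":{"intelligence":"medium","noises":[{"noise":"meow","code":4},{"noise":"hiss","code":2}]}},',
                    '{"animal":"dog","attributes":{"intelligence":"high","noises":{"noise":"bark","code":1}}},',
                    '{"animal":"snake","attributes":{"intelligence":"low","noises":{"noise":"hiss","code":2}}}]')

raw <- fromJSON(pets_json, simplifyDataFrame = FALSE)

long <- rbindlist(lapply(raw, function(x) {
  v <- unlist(x$attributes$noises)  # e.g. c(noise = "meow", code = "4", noise = "hiss", code = "2")
  keep <- v[names(v) == "noise"]    # select by field name, whatever the code values are
  if (length(keep) == 0) return(NULL)  # no noises for this record
  data.table(animal       = x$animal,
             intelligence = x$attributes$intelligence,
             feature      = paste0("noises.", keep))
}))

# length() counts occurrences; missing combinations get length of an empty vector, i.e. 0
wide <- dcast(long, animal + intelligence ~ feature,
              fun.aggregate = length, value.var = "feature")
print(wide)
```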

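For multiple attributes
Now, it seems you want to generalize this to several attributes. Suppose your data also has a colors attribute, for example: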
pets_json <- paste0('[{"animal":"cat","attributes":{"intelligence":"medium","noises":[{"noise":"meow","code":4},{"noise":"hiss","code":2}],"colors":[{"color":"brown","code":4},{"color":"white","code":2}]}},',
                    '{"animal":"dog","attributes":{"intelligence":"high","noises":{"noise":"bark","code":1},"colors":{"color":"brown","code":4}}},',
                    '{"animal":"snake","attributes":{"intelligence":"low","noises":{"noise":"hiss","code":2},"colors":[{"color":"green","code":4},{"color":"brown","code":4}]}}]')

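Thank you very much! Let me see whether I can generalize this to a large number of attributes (noises is just one of them). If it works I'll accept it, but I may not get to it until Monday. This is really great; you're clearly a rising star on this tag. I notice that ?dcast can now cast multiple value.var columns. I'm looking into whether there's a way to keep the variable names as noises.bark, colors.brown, and so on.
@C8H10N4O2 I fixed the second part, which I had overlooked, sorry. As for using several value.var columns in dcast, that was my first idea too, but I don't think it works for this kind of problem. You can try it yourself: dcast(df, as.formula(paste0("animal + intelligence ~ paste0(attr.names, ", ", attr.names, "))), value.var = attr.names, fill = 0, fun.aggregate = function(x) 1) gives very strange results.
That helps a lot. Thanks.
@C8H10N4O2 I just thought of a better solution, again using melt and dcast. Much more elegant!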

Printing df after the base-case pipeline gives:

df
#    animal intelligence noises.bark noises.hiss noises.meow
# 1:    cat       medium           0           1           1
# 2:    dog         high           1           0           0
# 3:  snake          low           0           1           0


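# Multiple attributes: parse the colors-augmented pets_json without simplifying to data.frame
str(df <- fromJSON(txt = pets_json, simplifyDataFrame = FALSE))

# Set up the attribute names
attr.names <- c("noises", "colors")

# The %<>% operator creates a pipe and assigns the result back to the variable.
# Each record becomes a data.table with one column per attribute (built as text and
# evaluated). This relies on data.table() recycling the shorter attribute vector
# (the snake has one noise but two colors), and the rows line up only because
# every object is a (label, code) pair.
df %<>% 
  lapply(function(.)
    eval(parse(text = paste0(
      "data.table(animal = .$animal, ",
      "intelligence = .$attributes$intelligence, ", 
      paste0(attr.names, " = unlist(.$attributes$", attr.names, ")", collapse = ", "), 
      ")"))) %>%
      # Drop rows holding the (single-digit) code values
      .[eval(parse(text = paste("!", attr.names, "%in% as.character(0:9)", collapse = " & ")))]) %>%
  rbindlist 

# Melt the attribute columns into long form, then cast each variable/value pair
# into a 0/1 indicator column named like noises.bark. Without fun.aggregate,
# dcast would count duplicated rows (e.g. the recycled snake noise) instead.
df <- dcast(melt(df, measure.vars = attr.names), 
            animal + intelligence ~ variable + value, sep = ".",
            fill = 0, fun.aggregate = function(x) 1)
df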

#    animal intelligence noises.bark noises.hiss noises.meow colors.brown
# 1:    cat       medium           0           1           1            1
# 2:    dog         high           1           0           0            1
# 3:  snake          low           0           1           0            1
#    colors.green colors.white
# 1:            0            1
# 2:            0            0
# 3:            1            0
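A variant of the multi-attribute version that avoids eval(parse()) and does not recycle attribute vectors of different lengths against each other: build a long table with one row per (animal, attribute, label) and cast it once. This is a sketch; the as_object_list helper is hypothetical, and the label is assumed to be the first field of each object other than code:

```r
library(jsonlite)
library(data.table)

pets_json <- paste0('[{"animal":"cat","attributes":{"intelligence":"medium","noises":[{"noise":"meow","code":4},{"noise":"hiss","code":2}],"colors":[{"color":"brown","code":4},{"color":"white","code":2}]}},',
                    '{"animal":"dog","attributes":{"intelligence":"high","noises":{"noise":"bark","code":1},"colors":{"color":"brown","code":4}}},',
                    '{"animal":"snake","attributes":{"intelligence":"low","noises":{"noise":"hiss","code":2},"colors":[{"color":"green","code":4},{"color":"brown","code":4}]}}]')

raw <- fromJSON(pets_json, simplifyDataFrame = FALSE)
attr.names <- c("noises", "colors")

# Hypothetical helper: a single JSON object arrives as a named list and an array
# of objects as an unnamed list; wrap the former so both look like arrays
as_object_list <- function(v) if (!is.null(names(v))) list(v) else v

# One row per (animal, attribute, label).
# Assumption: the label is the first field of each object that is not "code".
long <- rbindlist(lapply(raw, function(x) {
  rbindlist(lapply(attr.names, function(a) {
    objs <- as_object_list(x$attributes[[a]])
    if (length(objs) == 0) return(NULL)  # attribute missing for this record
    labels <- vapply(objs,
                     function(o) as.character(o[[setdiff(names(o), "code")[1]]]),
                     character(1))
    data.table(animal       = x$animal,
               intelligence = x$attributes$intelligence,
               feature      = paste0(a, ".", labels))
  }))
}))

# length() gives 1 where a label is present; absent cells get length of an empty vector, 0
wide <- dcast(long, animal + intelligence ~ feature,
              fun.aggregate = length, value.var = "feature")
print(wide)
```

With this data it reproduces the presence/absence pattern shown above, with the indicator columns sorted alphabetically (colors.* before noises.*). Adding an attribute only means adding its name to attr.names.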