R data frame column with JSON strings - need to create columns from the JSON objects


So the data frame has a column (category) that holds JSON. Here is a sample:

{"id":254,"name":"Performances","slug":"dance/performances","position":1,"parent_id":6,"parent_name":"Dance","color":10917369,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/dance/performances"}}}
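I am having trouble converting some of the JSON fields into columns of the data frame.

Example: I would like df['parent_cat'] to contain the JSON value of "parent_name" from df['category'].

Below is my attempt with apply, but as you can see, one of the resulting columns comes back as a list: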

json <- function(r){
  return(data.frame(jsonlite::fromJSON(txt=r['category']),stringsAsFactors=F)$name)
}

json2 <- function(df){
  data.frame(jsonlite::fromJSON(df$category),stringsAsFactors=F)$parent_name
}

df$child_cat <- apply(df, 1,json)

df$parent_cat <- apply(df,1,json2)

head(df[c("child_cat","parent_cat","category")])

  child_cat    parent_cat
  <chr>        <list>
1 Performances <chr [1]>
2 Hardware     <chr [1]>
3 Software     <chr [1]>
4 Anthologies  <chr [1]>
5 Experimental <chr [1]>
6 Software     <chr [1]>


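I was able to get the mutate operation working with the rowwise() function. I also ran into problems reading the JSON. Sometimes the JSON is incomplete, which makes some columns return the wrong number of rows and breaks the mutate call. So I wrapped the parsing in a tryCatch block and return "undefined" on any error or warning. I also check that the requested JSON field exists and is actually a string before returning it; otherwise the function returns "undefined". This code should be useful to anyone who wants to read JSON character data from a data.frame column.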

library(dplyr)
library(jsonlite)

# Return the string field `str` from the JSON text `r`, or "undefined"
# when parsing fails or the field is missing or not a character value.
json <- function(r, str){
    df <- tryCatch({
            data.frame(fromJSON(r), stringsAsFactors = FALSE)
          }, warning = function(war) {
            return("undefined")
          }, error = function(err) {
            return("undefined")
          })
    if(!is.data.frame(df) || !str %in% colnames(df) || !is.character(df[[str]])){
        return("undefined")
    }
    return(df[[str]])
}

# rowwise() makes mutate() call json() once per row
df1 <- df1 %>%
       rowwise() %>%
       mutate(cat_parent = json(category,"parent_name"),
              cat_child = json(category,"name"),
              city = json(location,"name"))
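
If you would rather avoid rowwise(), the same idea can probably be expressed with vapply() in base R. The following is only a minimal sketch under a few assumptions: extract_field is a hypothetical helper name, the sample data is made up for illustration, and it returns NA instead of the "undefined" string used above so that the result stays a plain character column.

```r
library(jsonlite)

# Hypothetical helper (not part of the answer above): pull one top-level
# string field out of a single JSON text, or NA if that is not possible.
extract_field <- function(txt, field) {
  parsed <- tryCatch(fromJSON(txt), error = function(e) NULL)
  value <- if (is.list(parsed)) parsed[[field]] else NULL
  if (is.character(value) && length(value) == 1) value else NA_character_
}

# Made-up sample data: one complete record, one truncated, one not JSON.
df <- data.frame(
  category = c(
    '{"id":254,"name":"Performances","parent_id":6,"parent_name":"Dance"}',
    '{"id":1,"name":"Broken"',
    'not_json'
  ),
  stringsAsFactors = FALSE
)

# vapply() guarantees one character value per row, so no list column appears.
df$parent_cat <- vapply(df$category, extract_field, character(1),
                        field = "parent_name", USE.NAMES = FALSE)
df$child_cat  <- vapply(df$category, extract_field, character(1),
                        field = "name", USE.NAMES = FALSE)
```

Because vapply() checks that every call returns exactly one string, a malformed row ends up as NA rather than changing the shape of the column.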


The tidyjson package may be what you are looking for:

library(dplyr)
library(tidyjson)

df <- tibble(category = '{"id":254,"name":"Performances","slug":"dance/performances","position":1,"parent_id":6,"parent_name":"Dance","color":10917369,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/dance/performances"}}}')

df <- df %>% 
  mutate(cat_parent = category %>% 
           spread_all() %>%
           pull(parent_name),
         cat_child = category %>% 
           spread_all() %>%
           pull(name))

It may also help to check whether a value in the column is not valid JSON input:

library(dplyr)
library(tidyjson)
library(jsonlite)

df <- tibble(category = c('{"id":254,"name":"Performances","slug":"dance/performances","position":1,"parent_id":6,"parent_name":"Dance","color":10917369,"urls":{"web":{"discover":"http://www.kickstarter.com/discover/categories/dance/performances"}}}', 
                          'not_json'))

df <- df %>% 
  rowwise() %>%
  mutate(check_json = validate(category),
         cat_parent = ifelse(check_json, 
                              category %>% 
                                spread_all() %>%
                                pull(parent_name), 
                             NA))
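
As far as I can tell, rowwise() is needed above because jsonlite::validate() treats its argument as a single JSON text rather than checking each element of a vector. A rough sketch of the same check without dplyr or tidyjson could look like this; the sample data is invented, and it assumes every valid row has a string "parent_name" field.

```r
library(jsonlite)

# Invented sample data: one valid JSON string and one invalid value.
df <- data.frame(
  category = c('{"name":"Performances","parent_name":"Dance"}', 'not_json'),
  stringsAsFactors = FALSE
)

# Call validate() once per element; isTRUE() drops the error attributes
# that validate() attaches to a FALSE result.
df$check_json <- vapply(df$category,
                        function(x) isTRUE(validate(x)),
                        logical(1), USE.NAMES = FALSE)

# Parse only the rows flagged as valid; the others stay NA.
# Assumption: each valid row contains a string "parent_name" field.
df$cat_parent <- NA_character_
df$cat_parent[df$check_json] <- vapply(
  df$category[df$check_json],
  function(x) fromJSON(x)[["parent_name"]],
  character(1), USE.NAMES = FALSE
)
```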

Could you add the data using dput so that we can reproduce the same thing here?

Hi Chris, for one column, each row contains a JSON string. But some columns are not JSON, which is why I used rowwise() in the answer I provided. tidyjson looks good too, and I am sure it would work with rowwise().