Reading messy JSON in R
I have a csv file with the following structure:
Input
{"eid":"START","ver":"3.0","ets":1514764800238}}
{"eid":"INTERACT","ver":"3.0","ets":1514764820546}}
{"eid":"IMPRESSION","ver":"3.0","ets":895732}}
{"eid":"IMPRESSION","ver":"3.0","ets":245636}}
{"eid":"INTERACT","ver":"3.0","ets":535235423525}}
As you can see, this is not valid JSON. To make it valid, the structure should look like this:
Expected output
[{"eid":"START","ver":"3.0","ets":1514764800238},
{"eid":"INTERACT","ver":"3.0","ets":1514764820546},
{"eid":"IMPRESSION","ver":"3.0","ets":895732},
{"eid":"IMPRESSION","ver":"3.0","ets":245636},
{"eid":"INTERACT","ver":"3.0","ets":535235423525}]
Question:
Ideally, I would like to read the file, fix it, and then save it as JSON.
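Thanks in advance.
Manual find/replace is a terrible, terrible, terrible suggestion for a reproducible workflow.
One option, assuming every line really does have an extra } at the end and the file is located at /tmp/badlines: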
library(magrittr)
library(ndjson)
readLines("/tmp/badlines") %>%
sub("\\}$", "", .) %>%
ndjson::flatten(cls = "tbl")
## # A tibble: 5 x 3
## eid ets ver
## <chr> <dbl> <chr>
## 1 START 1.51e12 3.0
## 2 INTERACT 1.51e12 3.0
## 3 IMPRESSION 8.96e 5 3.0
## 4 IMPRESSION 2.46e 5 3.0
## 5 INTERACT 5.35e11 3.0
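Because this fix hinges on every line ending in a doubled brace, it may be worth checking that assumption before stripping anything. The following is a minimal base R sketch; the temp file and its two sample lines are stand-ins for /tmp/badlines, and the variable names are illustrative only:

```r
# Hypothetical input: a temp file stands in for /tmp/badlines
path <- tempfile(fileext = ".json")
writeLines(c('{"eid":"START","ver":"3.0","ets":1514764800238}}',
             '{"eid":"IMPRESSION","ver":"3.0","ets":895732}}'),
           path)

lines <- readLines(path)

# Indices of lines that do NOT end in a doubled closing brace
bad <- which(!grepl("\\}\\}$", lines))
if (length(bad) > 0) {
  stop("lines without a trailing '}}': ", paste(bad, collapse = ", "))
}

# The assumption holds, so strip exactly one trailing brace per line
fixed <- sub("\\}$", "", lines)
```

If the check fails, the reported line numbers show where the file deviates from the assumed pattern, which is easier to act on than a parse error further down the pipeline.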
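Note that this question is almost a duplicate.
Apart from reading in the JSON and running fromJSON (from the jsonlite package), one line of base R code can convert it to valid JSON (stored in the variable json):
- use sub on each input line to replace "}}" with "}"
- use toString to insert commas between the lines
- wrap the result in "[" and "]" using c
Code: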
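As a minimal sketch of those three steps (reconstructed from the description, so the exact form is an assumption), the following writes a two-line stand-in for test.json and then builds the json variable. It passes fixed = TRUE so that "}}" is matched literally rather than as a regular expression:

```r
library(jsonlite)

# Small stand-in for the test file; the real one has five lines
writeLines(c('{"eid":"START","ver":"3.0","ets":1514764800238}}',
             '{"eid":"IMPRESSION","ver":"3.0","ets":895732}}'),
           "test.json")

json <- c("[",                              # open the array
          toString(sub("}}", "}",           # drop the extra brace on each line
                       readLines("test.json"),
                       fixed = TRUE)),      # toString joins the lines with ", "
          "]")                              # close the array

# fromJSON accepts the multi-element character vector and parses it as one document
df <- fromJSON(json)
```

Here df is a data frame with one row per input line and the columns eid, ver and ets.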
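Variation
This can also be expressed as a pipeline that gives the same output:
library(jsonlite)
library(magrittr)
"test.json" %>%
  readLines %>%            # read the file; without this step sub() would act on the file name
  sub("}}", "}", .) %>%    # drop the extra closing brace on each line
  toString %>%             # join the lines with commas
  c("[", ., "]") %>%       # wrap in array brackets
  fromJSON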
Note
The test input was generated with the following code:
Lines <- c('{"eid":"START","ver":"3.0","ets":1514764800238}}',
'{"eid":"INTERACT","ver":"3.0","ets":1514764820546}}',
'{"eid":"IMPRESSION","ver":"3.0","ets":895732}}',
'{"eid":"IMPRESSION","ver":"3.0","ets":245636}}',
'{"eid":"INTERACT","ver":"3.0","ets":535235423525}}')
writeLines(Lines, "test.json")
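To cover the "save as JSON" part of the question, one possibility is to write the repaired text back out in the layout shown under Expected output. This is a minimal base R sketch; the output name fixed.json is hypothetical, and the two-line input is a stand-in for the full test file:

```r
# Hypothetical file names; the input mimics the test file above
infile  <- "test.json"
outfile <- "fixed.json"
writeLines(c('{"eid":"START","ver":"3.0","ets":1514764800238}}',
             '{"eid":"IMPRESSION","ver":"3.0","ets":895732}}'),
           infile)

# Drop the extra closing brace on each line (literal match, not a regex)
fixed <- sub("}}", "}", readLines(infile), fixed = TRUE)

# One object per line, separated by commas, wrapped in [ ]
out <- paste0("[", paste(fixed, collapse = ",\n"), "]")
writeLines(out, outfile)
```

Keeping one object per line makes the saved file easy to diff against the original, and it can still be read back with fromJSON as a regular JSON array.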
Use vscode find and replace.
How would that handle every line except the last? I could use sublime for this, but the json is very large.
Are you sure every line ends with }?
@hrbrmstr Yes, but at this point I can't even figure out how to read this csv.