Loading a .log file in R


I have a bigdata.log file; a few sample lines are shown below. I want to convert it into a data frame for EDA.

{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"}
{"date":"2018-03-29T12:49:35.518+0000","level":"INFO","message":"User changed password with recovery (Web)","action":"recovery_password_changed","requestSource":"WEB","username":"test123@test.com"}
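I tried to load the JSON with the jsonlite library, but I get the error "parse error: trailing garbage". I have checked the working directory and everything is fine there.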

mydata <- fromJSON("data.log")

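mydata

You don't have valid JSON here: each line is a JSON object of its own, so the file as a whole does not parse as a single document. You need to preprocess it into something like this: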

x <- '[{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"},
{"date":"2018-03-29T12:49:35.518+0000","level":"INFO","message":"User changed password with recovery (Web)","action":"recovery_password_changed","requestSource":"WEB","username":"test123@test.com"}
]'

library(jsonlite)

fromJSON(x)

                          date level                                   message
1 2018-03-29T12:49:25.308+0000  INFO                        User authenticated
2 2018-03-29T12:49:35.518+0000  INFO User changed password with recovery (Web)
                     action         username requestSource
1        user_authenticated    test@test.com          <NA>
2 recovery_password_changed test123@test.com           WEB
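
The preprocessing step does not have to be done by hand. Below is a minimal sketch, assuming every line of the log holds exactly one complete JSON object: read the lines, join them with commas, and wrap the result in square brackets so that `fromJSON()` sees a single JSON array. The temporary file only stands in for the real log, and the names `log_file`, `json_lines` and `json_array` are hypothetical.

```r
library(jsonlite)

# Stand-in for the real log file: one JSON object per line (assumption).
log_file <- tempfile(fileext = ".log")
writeLines(c(
  '{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"}',
  '{"date":"2018-03-29T12:49:35.518+0000","level":"INFO","message":"User changed password with recovery (Web)","action":"recovery_password_changed","requestSource":"WEB","username":"test123@test.com"}'
), log_file)

json_lines <- readLines(log_file)
json_lines <- json_lines[nzchar(trimws(json_lines))]  # drop blank lines, if any

# Join the objects with commas and wrap them in [ ] to form one JSON array.
json_array <- paste0("[", paste(json_lines, collapse = ","), "]")

mydata <- fromJSON(json_array)
str(mydata)
```

Building one large string is simple but keeps the whole file in memory twice, so for very large logs a streaming reader is probably the better fit.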
Or you can coerce each parsed line directly to a data.frame:

xy <- readLines("mylog.txt")  # one JSON object per line
sapply(xy, FUN = function(x) {
  out <- fromJSON(x)
  as.data.frame(out)
}, USE.NAMES = FALSE)

[[1]]
                          date level            message             action
1 2018-03-29T12:49:25.308+0000  INFO User authenticated user_authenticated
       username
1 test@test.com

[[2]]
                          date level                                   message
1 2018-03-29T12:49:35.518+0000  INFO User changed password with recovery (Web)
                     action requestSource         username
1 recovery_password_changed           WEB test123@test.com
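
The result above is a list of one-row data frames with differing columns, not a single data frame. One way to combine them, sketched here under the assumption that jsonlite is already in use, is `jsonlite::rbind_pages()`, which fills columns missing from a record with `NA`. The vector `xy` below is a stand-in for the output of `readLines()`, and `dfs` and `mydf` are hypothetical names.

```r
library(jsonlite)

# Stand-in for xy <- readLines("mylog.txt"): one JSON object per element.
xy <- c(
  '{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"}',
  '{"date":"2018-03-29T12:49:35.518+0000","level":"INFO","message":"User changed password with recovery (Web)","action":"recovery_password_changed","requestSource":"WEB","username":"test123@test.com"}'
)

# Parse each line into a one-row data frame; lapply always returns a list.
dfs <- lapply(xy, function(x) {
  as.data.frame(fromJSON(x), stringsAsFactors = FALSE)
})

# Stack the rows; columns absent from a record become NA.
mydf <- rbind_pages(dfs)
mydf
```

Using `lapply()` instead of `sapply()` avoids any attempt at simplification, so the input to `rbind_pages()` is always a plain list.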
You can use ndjson::stream_in() or jsonlite::stream_in(). What you have is newline-delimited JSON, which is very common these days.
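
As a minimal sketch of the jsonlite variant, assuming the log really contains one JSON object per line: `stream_in()` takes a connection, parses it line by line, and returns a single data frame. The temporary file stands in for the real log, and `log_file` is a hypothetical name.

```r
library(jsonlite)

# Stand-in for the real log file: newline-delimited JSON (assumption).
log_file <- tempfile(fileext = ".log")
writeLines(c(
  '{"date":"2018-03-29T12:49:25.308+0000","level":"INFO","message":"User authenticated","action":"user_authenticated","username":"test@test.com"}',
  '{"date":"2018-03-29T12:49:35.518+0000","level":"INFO","message":"User changed password with recovery (Web)","action":"recovery_password_changed","requestSource":"WEB","username":"test123@test.com"}'
), log_file)

# stream_in() expects a connection, not a file name.
# verbose = FALSE suppresses the progress messages.
mydata <- stream_in(file(log_file), verbose = FALSE)
str(mydata)
```

Note that passing the path as a plain string, as in `fromJSON("data.log")`, is not the same thing: `fromJSON()` expects the whole file to be one JSON document.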

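If my .log file has more than 900,000 lines, each containing a {} object, is jsonlite the simplest approach or are there other options?

OK. I tried it with sapply; it created a list, which I then tried to convert to a data frame with myDf.

Please edit your original question or post this as a separate question. I need to see the full code you used, including the data.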
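
For a file of that size, one option worth trying is to let `jsonlite::stream_in()` process the log page by page through a `handler` function, so that only one page of records is held as a data frame at a time. The sketch below is an illustration, not a benchmark: the five records are made-up stand-ins, the names `log_file`, `level_counts` and `count_levels` are hypothetical, and `pagesize = 2` is chosen only to force several pages on a tiny input (the default is 500).

```r
library(jsonlite)

# Made-up stand-in data: five records, one JSON object per line.
log_file <- tempfile(fileext = ".log")
levels_seen <- c("INFO", "INFO", "WARN", "INFO", "ERROR")
writeLines(
  sprintf('{"level":"%s","message":"event %d"}', levels_seen, seq_along(levels_seen)),
  log_file
)

level_counts <- integer(0)  # running totals per level, updated by the handler

# Called once per page with a data frame holding that page's records.
count_levels <- function(df) {
  page <- table(df$level)
  for (lv in names(page)) {
    old <- if (lv %in% names(level_counts)) level_counts[[lv]] else 0L
    level_counts[[lv]] <<- old + as.integer(page[[lv]])
  }
}

# With a handler, the pages are passed to it instead of being collected.
stream_in(file(log_file), handler = count_levels, pagesize = 2, verbose = FALSE)
level_counts
```

If the full data frame is needed anyway, calling `stream_in()` without a handler is simpler; the handler form mainly helps when the data can be summarised or filtered as it is read.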
# Alternatively, read the file line by line and parse each line on its own
library(jsonlite)
xy <- readLines("mylog.txt")
sapply(xy, fromJSON, USE.NAMES = FALSE)

[[1]]
[[1]]$`date`
[1] "2018-03-29T12:49:25.308+0000"

[[1]]$level
[1] "INFO"

[[1]]$message
[1] "User authenticated"

[[1]]$action
[1] "user_authenticated"

[[1]]$username
[1] "test@test.com"


[[2]]
[[2]]$`date`
[1] "2018-03-29T12:49:35.518+0000"

[[2]]$level
[1] "INFO"

[[2]]$message
[1] "User changed password with recovery (Web)"

[[2]]$action
[1] "recovery_password_changed"

[[2]]$requestSource
[1] "WEB"

[[2]]$username
[1] "test123@test.com"