How to read record-format JSON in Julia?

I am able to read a JSON file and convert it into a DataFrame with the code below:

using JSONTables, DataFrames
df = open(jsontable, "normal.json") |> DataFrame

normal.json looks like this:

{"col1":["thasin","hello","world"],"col2":[1,2,3],"col3":["abc","def","ghi"]}

So the resulting df is:

3×3 DataFrame
│ Row │ col1   │ col2  │ col3   │
│     │ String │ Int64 │ String │
├─────┼────────┼───────┼────────┤
│ 1   │ thasin │ 1     │ abc    │
│ 2   │ hello  │ 2     │ def    │
│ 3   │ world  │ 3     │ ghi    │

However, the same code does not work for a JSON file in record format.

That format is a list of records like {column -> value}, {column -> value}; in my file there is one record per line.

My sample JSON (sample.json):

{"billing_account_id":"0139A","credits":[],"invoice":{"month":"202003"},"cost_type":"regular"}
{"billing_account_id":"0139A","credits":[1.45],"invoice":{"month":"202003"},"cost_type":"regular"}
{"billing_account_id":"0139A","credits":[2.00, 3.56],"invoice":{"month":"202003"},"cost_type":"regular"}
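The two layouts differ only by a transposition: normal.json stores one array per column, while sample.json stores one object per row. The small Python sketch below (standard library only; records_to_columns is a hypothetical helper, not part of any library) turns the records from sample.json into the one-array-per-column layout of normal.json. It assumes every record has the same keys.

```python
import json

# The three lines of sample.json, inlined so the sketch is self-contained.
SAMPLE = """\
{"billing_account_id":"0139A","credits":[],"invoice":{"month":"202003"},"cost_type":"regular"}
{"billing_account_id":"0139A","credits":[1.45],"invoice":{"month":"202003"},"cost_type":"regular"}
{"billing_account_id":"0139A","credits":[2.00, 3.56],"invoice":{"month":"202003"},"cost_type":"regular"}
"""


def records_to_columns(records):
    """Transpose a list of {column: value} records into one {column: [values]} dict.

    Assumes every record has the same keys; otherwise the columns would
    end up with different lengths.
    """
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


# Each non-empty line is parsed as one record.
records = [json.loads(line) for line in SAMPLE.splitlines() if line.strip()]
columns = records_to_columns(records)
print(list(columns))       # ['billing_account_id', 'credits', 'invoice', 'cost_type']
print(columns["credits"])  # [[], [1.45], [2.0, 3.56]]
```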

Expected output:

  billing_account_id cost_type      credits              invoice
0             0139A   regular           []  {'month': '202003'}
1             0139A   regular       [1.45]  {'month': '202003'}
2             0139A   regular  [2.0, 3.56]  {'month': '202003'}

This can be done in Python as follows:

import json
import pandas as pd

data = []
with open("sample.json", "r") as f:
    for line in f:
        data.append(json.loads(line))  # each line is one JSON record
print(data)
df = pd.DataFrame(data)

How can I do this in Julia?

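Note that your file is not valid JSON: each of its lines is valid JSON, but the file as a whole is not (this layout is often called JSON Lines or newline-delimited JSON).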
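To make the distinction concrete, here is a minimal Python sketch using only the standard json module on a toy two-line input (hypothetical data, not the question's file): parsing the whole text as one JSON document fails, while parsing it line by line succeeds.

```python
import json

# Toy input: two JSON objects on separate lines, like sample.json.
TEXT = '{"a": 1}\n{"a": 2}\n'

# Parsing the whole text as a single JSON document fails: after the first
# object the parser finds more data instead of the end of input.
try:
    json.loads(TEXT)
    whole_file_ok = True
except json.JSONDecodeError as err:
    whole_file_ok = False
    print("whole file:", err.msg)  # whole file: Extra data

# Parsing line by line works, because every non-empty line is a complete JSON value.
rows = [json.loads(line) for line in TEXT.splitlines() if line.strip()]
print(rows)  # [{'a': 1}, {'a': 2}]
```

The Julia answer applies the same idea: JSON3.read is broadcast over eachline, so each line is parsed on its own.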

You can do it like this:

julia> using DataFrames, JSON3

julia> df = JSON3.read.(eachline("sample.json")) |> DataFrame;

julia> df.credits = Vector{Float64}.(df.credits);

julia> df.invoice = Dict{Symbol,String}.(df.invoice);

julia> df
3×4 DataFrame
│ Row │ billing_account_id │ credits                    │ invoice                │ cost_type │
│     │ String             │ Array{Float64,1}           │ Dict{Symbol,String}    │ String    │
├─────┼────────────────────┼────────────────────────────┼────────────────────────┼───────────┤
│ 1   │ 0139A              │ 0-element Array{Float64,1} │ Dict(:month=>"202003") │ regular   │
│ 2   │ 0139A              │ [1.45]                     │ Dict(:month=>"202003") │ regular   │
│ 3   │ 0139A              │ [2.0, 3.56]                │ Dict(:month=>"202003") │ regular   │

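The conversions on the :credits and :invoice columns give them types that are easy to work with (otherwise they would hold JSON3.Array and JSON3.Object values, types defined internally by JSON3.jl).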

A more advanced option is to do everything in one step by specifying the row schema as a NamedTuple type, for example:

julia> df = JSON3.read.(eachline("sample.json"),
                        NamedTuple{(:billing_account_id, :credits, :invoice, :cost_type),Tuple{String,Vector{Float64},Dict{String,String},String}}) |>
            DataFrame
3×4 DataFrame
│ Row │ billing_account_id │ credits                    │ invoice                 │ cost_type │
│     │ String             │ Array{Float64,1}           │ Dict{String,String}     │ String    │
├─────┼────────────────────┼────────────────────────────┼─────────────────────────┼───────────┤
│ 1   │ 0139A              │ 0-element Array{Float64,1} │ Dict("month"=>"202003") │ regular   │
│ 2   │ 0139A              │ [1.45]                     │ Dict("month"=>"202003") │ regular   │
│ 3   │ 0139A              │ [2.0, 3.56]                │ Dict("month"=>"202003") │ regular   │
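For comparison, here is a rough Python analogue of the schema idea: a typing.NamedTuple declares the row type once, and each line is coerced into it. This only illustrates the concept and is not how JSON3.jl reads into a NamedTuple. BillingRow and parse_row are hypothetical names, and because Python does not enforce type annotations, the conversions are written out explicitly.

```python
import json
from typing import Dict, List, NamedTuple


# Hypothetical row schema mirroring the Julia NamedTuple type above.
class BillingRow(NamedTuple):
    billing_account_id: str
    credits: List[float]
    invoice: Dict[str, str]
    cost_type: str


def parse_row(line: str) -> BillingRow:
    """Parse one JSON line and coerce each field to the schema's type.

    Raises KeyError if a field is missing from the line.
    """
    obj = json.loads(line)
    return BillingRow(
        billing_account_id=str(obj["billing_account_id"]),
        credits=[float(x) for x in obj["credits"]],
        invoice={str(k): str(v) for k, v in obj["invoice"].items()},
        cost_type=str(obj["cost_type"]),
    )


lines = [
    '{"billing_account_id":"0139A","credits":[],"invoice":{"month":"202003"},"cost_type":"regular"}',
    '{"billing_account_id":"0139A","credits":[2.00, 3.56],"invoice":{"month":"202003"},"cost_type":"regular"}',
]
rows = [parse_row(line) for line in lines]
print(rows[1].credits)  # [2.0, 3.56]
print(rows[0].invoice)  # {'month': '202003'}
```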

Unrelated to the Julia answer, but in Python you can simply write:

pd.read_json("sample.json", lines=True)
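A self-contained sketch of that pandas call, assuming pandas is installed. The three records are inlined and wrapped in io.StringIO so the example runs without a file on disk; in practice you would pass the path "sample.json" directly.

```python
import io

import pandas as pd

# The three records from sample.json, inlined so the sketch runs on its own.
SAMPLE = """\
{"billing_account_id":"0139A","credits":[],"invoice":{"month":"202003"},"cost_type":"regular"}
{"billing_account_id":"0139A","credits":[1.45],"invoice":{"month":"202003"},"cost_type":"regular"}
{"billing_account_id":"0139A","credits":[2.00, 3.56],"invoice":{"month":"202003"},"cost_type":"regular"}
"""

# lines=True makes read_json treat each line as one JSON record.
# StringIO stands in for the file; nested lists and dicts stay as Python objects.
df = pd.read_json(io.StringIO(SAMPLE), lines=True)
print(df)
```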