CSV to a stream of maps in Elixir


csv elixir


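I need to parse a large amount of CSV data where the first line of each file is the header. The csv library already gives me a stream of lists. I need to infer the structure from the first line but skip that line, and then produce a stream of maps with that structure.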

I would like it to work like this:

data.csv

CSV.stream_map(filename)

Output

I was looking at Stream.transform, but I couldn't figure out how to skip the first element. The structure could be stored in the accumulator.

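Although I later found that the csv library already does this, I also found a way to do it myself. It turns out that if the Stream.transform callback returns an empty list [], no element is pushed into the stream: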

# Module name is illustrative; the answer only showed the two functions.
defmodule CsvMaps do
  def map_stream(enum) do
    enum
    |> Stream.transform(:first, &structure_from_header/2)
  end

  # The accumulator starts as :first; after the first line it holds
  # the CSV structure, i.e. the header row.
  def structure_from_header(line, :first),
    do: {[], line} # <== Here is the trick: [] emits nothing for the header

  # Every later line is zipped with the header into a map.
  def structure_from_header(line, structure) do
    map =
      structure
      |> Enum.zip(line)
      |> Enum.into(%{})

    {[map], structure}
  end
end
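If the source can be enumerated more than once (a list, or a File.stream! pipeline, which reopens the file on each pass), a different technique avoids the accumulator altogether: read the header eagerly with Enum.take/2, then skip it with Stream.drop/1. This is a minimal sketch, assuming the rows are already lists of strings (like the stream of lists in the question); the module name HeaderMaps is hypothetical. It does not suit one-shot sources such as a socket, where the Stream.transform/3 version above is the safer choice.

```elixir
# Hypothetical module name, used only for illustration.
defmodule HeaderMaps do
  # Turns an enumerable of rows (lists of strings) into a lazy stream of
  # maps keyed by the first row. Assumes `rows` can be enumerated twice,
  # e.g. a list or a File.stream!/1 pipeline.
  def from_rows(rows) do
    case Enum.take(rows, 1) do
      # Empty input: no header, so no data rows either.
      [] ->
        []

      [header] ->
        rows
        # Second pass over the source: skip the header row.
        |> Stream.drop(1)
        # Enum.zip/2 stops at the shorter list, so ragged rows lose fields.
        |> Stream.map(fn row -> header |> Enum.zip(row) |> Map.new() end)
    end
  end
end

[["a", "b"], ["1", "2"], ["3", "4"]]
|> HeaderMaps.from_rows()
|> Enum.to_list()
# => [%{"a" => "1", "b" => "2"}, %{"a" => "3", "b" => "4"}]
```

The trade-off is that the first row is read twice, which is negligible for a file but not acceptable for a stream that can only be consumed once.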


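If you pass headers: true as the second argument to CSV.decode/2, as described in the csv documentation, it automatically uses the first row as the keys and returns a map for each of the following rows. (The output below is from csv 1.x; in csv 2.x and later, CSV.decode/2 wraps each row in an {:ok, map} tuple, while CSV.decode!/2 returns the maps directly.)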

iex(1)> CSV.decode(File.stream!("data.csv"), headers: true) |> Enum.to_list
[%{"a" => "1", "b" => "2"}, %{"a" => "3", "b" => "4"}]

where data.csv contains:

a,b
1,2
3,4


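I think there are two options: a plain Stream or Flow. In both you can set a chunk size, so the whole file is never loaded into memory and you can process it in batches. If the parsing itself is expensive, prefer the Flow version over the plain Stream one, since Flow spreads the work across several processes. Both examples show how to skip the header. As for creating the map structure, you could use a struct to define the structure of the entries in the MapSet. If you have a lot of columns, I'd suggest using a MapSet rather than a map.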

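# Lazily reads the file, skips the header and yields chunks of
# `chunk_size` rows; each chunk becomes a MapSet of rows.
def stream_parse(file_path, chunk_size) do
  file_path
  |> File.stream!()
  |> Stream.drop(1)                                  # skip the header line
  # Strip the trailing newline before splitting on commas.
  |> Stream.map(&(&1 |> String.trim_trailing() |> String.split(",")))
  |> Stream.chunk_every(chunk_size, chunk_size, [])  # last chunk may be shorter
  |> Stream.map(&MapSet.new/1)
end

# Same idea with Flow (requires the :flow dependency); each row
# becomes a MapSet of its fields.
def flow_parse(file_path, chunk_size) do
  file_path
  |> File.stream!(read_ahead: chunk_size)            # read-ahead buffer, in bytes
  |> Stream.drop(1)
  |> Flow.from_enumerable()
  |> Flow.map(&(&1 |> String.trim_trailing() |> String.split(",")))
  |> Flow.partition()
  |> Flow.map(&MapSet.new/1)
end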
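The suggestion to give the rows a fixed structure with a struct could look roughly like this. It is a minimal sketch, assuming the two columns a and b from the data.csv example, a plain comma split with no quoted fields, and hypothetical Row and RowParser modules. It keeps the Stream.drop(1) and chunking of stream_parse/2 but builds one struct per line instead of a MapSet. The field names are fixed in code, so the header never creates new atoms.

```elixir
defmodule Row do
  # Hypothetical struct matching the columns of the example data.csv.
  defstruct [:a, :b]
end

defmodule RowParser do
  # Column order assumed to match the header "a,b".
  @fields [:a, :b]

  # Lazily turns lines into %Row{} structs, emitted in lists of `chunk_size`.
  def parse_lines(lines, chunk_size) do
    lines
    # Skip the header line.
    |> Stream.drop(1)
    # File.stream! keeps the line ending; strip it (and other trailing whitespace).
    |> Stream.map(&String.trim_trailing/1)
    # Naive split: assumes no quoted fields containing commas.
    |> Stream.map(&String.split(&1, ","))
    # Pair the fixed field names with the values; struct/2 ignores unknown keys.
    |> Stream.map(fn values -> struct(Row, Enum.zip(@fields, values)) end)
    # Group into batches; the last batch may be shorter.
    |> Stream.chunk_every(chunk_size)
  end
end

# A list of lines stands in for File.stream!("data.csv").
["a,b\n", "1,2\n", "3,4\n", "5,6\n"]
|> RowParser.parse_lines(2)
|> Enum.to_list()
# => [[%Row{a: "1", b: "2"}, %Row{a: "3", b: "4"}], [%Row{a: "5", b: "6"}]]
```

With a real file the call would be File.stream!("data.csv") |> RowParser.parse_lines(1000), and nothing is read until the resulting stream is consumed.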
