NimbleCSV is a great option for parsing CSV files in Elixir. By design, it only gives you the data for each row as a list; there’s no option for it to generate a map (using the headers as keys) for each row. When processing data from CSV files, I much prefer maps over lists, because then the order of the columns in the file doesn’t matter. Luckily, we can use Stream.transform/3 to generate a map for each row of data:
alias NimbleCSV.RFC4180, as: CSV
"path/to/file.csv"
|> File.stream!()
|> CSV.parse_stream(skip_headers: false)
|> Stream.transform(nil, fn
  headers, nil -> {[], headers}
  row, headers -> {[Enum.zip(headers, row) |> Map.new()], headers}
end)
|> Enum.to_list()
And that’s it! Quite a bit is happening in a few lines of code. For the
initial accumulator, we need something that will never match a row of the CSV.
Every parsed row is a list, so nil or some other atom is a great choice here.
The first clause of the function passed to Stream.transform/3 matches on that
initial accumulator, which tells us we’re on the headers row. It emits nothing
and uses the headers as the accumulator from then on. For the rest of the rows,
since we know the order of the headers corresponds to the order of the values
in each row, we zip the headers and row values together into a list of
{key, value} tuples and then turn that list into a map. After the final
Enum.to_list/1 we have a list of maps that we can use for processing the data
(without that call, it stays a lazy stream of maps).
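The zip-then-map step is easier to see on a single row in isolation. This is just a minimal sketch with hand-written values; the `headers`, `row`, `pairs`, and `row_map` names are only for illustration:

```elixir
# Illustrative values standing in for one parsed header row and one data row.
headers = ["name", "age"]
row = ["Alex", "21"]

# Pair each header with the value in the same position.
pairs = Enum.zip(headers, row)
# [{"name", "Alex"}, {"age", "21"}]

# Build a map from the {key, value} tuples.
row_map = Map.new(pairs)
# %{"age" => "21", "name" => "Alex"}
```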
To see it in action, I’ll use a String instead of a File:
alias NimbleCSV.RFC4180, as: CSV
"""
name,age
Alex,21
Billie,8
Charlie,32
"""
|> CSV.parse_string(skip_headers: false)
|> Stream.transform(nil, fn
  headers, nil -> {[], headers}
  row, headers -> {[Enum.zip(headers, row) |> Map.new()], headers}
end)
|> Enum.to_list()
# [
# %{"age" => "21", "name" => "Alex"},
# %{"age" => "8", "name" => "Billie"},
# %{"age" => "32", "name" => "Charlie"}
# ]
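As a rough check of the claim that column order stops mattering once rows are maps, here is a small sketch. It assumes the rows have already been parsed into lists of strings (the shape the parser output has above), so it needs no CSV library. The `to_maps` function is a hypothetical helper for this example, not part of NimbleCSV:

```elixir
# Hypothetical helper: turns already-parsed rows (header row first) into maps.
to_maps = fn rows ->
  rows
  |> Stream.transform(nil, fn
    headers, nil -> {[], headers}
    row, headers -> {[Map.new(Enum.zip(headers, row))], headers}
  end)
  |> Enum.to_list()
end

# The same data with the columns in a different order.
name_first = [["name", "age"], ["Alex", "21"], ["Billie", "8"]]
age_first = [["age", "name"], ["21", "Alex"], ["8", "Billie"]]

to_maps.(name_first) == to_maps.(age_first)
# true
```

Because each value is looked up by its header rather than its position, both inputs produce the same list of maps.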
A similar approach can be taken with Enum.flat_map_reduce/3, if you really want to. Unlike Stream.transform/3, it is eager and returns a {list, accumulator} tuple:
alias NimbleCSV.RFC4180, as: CSV
"""
name,age
Alex,21
Billie,8
Charlie,32
"""
|> CSV.parse_string(skip_headers: false)
|> Enum.flat_map_reduce(nil, fn
  headers, nil -> {[], headers}
  row, headers -> {[Enum.zip(headers, row) |> Map.new()], headers}
end)
|> elem(0)
# [
# %{"age" => "21", "name" => "Alex"},
# %{"age" => "8", "name" => "Billie"},
# %{"age" => "32", "name" => "Charlie"}
# ]
The call to Kernel.elem/2 at the end is there to grab the resulting list of
maps from the returned tuple and ignore the accumulator.
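If you would rather keep the accumulator around, for example to know which headers were seen, pattern matching on the tuple is one alternative to elem/2. This is only a sketch, again assuming rows that are already parsed into lists; the `rows`, `maps`, and `seen_headers` names are illustrative:

```elixir
# Already-parsed rows, header row first (illustrative data).
rows = [["name", "age"], ["Alex", "21"]]

# Enum.flat_map_reduce/3 returns {flat_mapped_list, final_accumulator}.
{maps, seen_headers} =
  Enum.flat_map_reduce(rows, nil, fn
    headers, nil -> {[], headers}
    row, headers -> {[Map.new(Enum.zip(headers, row))], headers}
  end)

maps
# [%{"age" => "21", "name" => "Alex"}]

seen_headers
# ["name", "age"]
```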