It seems unusual at first to claim that JSON can replace CSV, because the two file types look entirely different. CSV looks relatively plain and unstructured and is often viewed inside a spreadsheet app. JSON looks highly structured and well organized, and is often viewed inside a web app.
CSV files contain multiple lines formed from values separated by commas. It’s rigid. CSV expects the nth entry on every line to represent the same thing. Think of it as a table where each row has an entry for every defined column.
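
To make the "nth entry" idea concrete, here is a minimal sketch using Python's standard `csv` module. The sample data and column names are made up for illustration.

```python
import csv
import io

# Hypothetical sample data: a header row followed by two records.
csv_text = "name,language,stars\nrequests,Python,50000\nexpress,JavaScript,60000\n"

rows = list(csv.reader(io.StringIO(csv_text)))
header, records = rows[0], rows[1:]

# Meaning comes from position: index 1 is "language" on every line.
languages = [record[1] for record in records]
print(languages)  # ['Python', 'JavaScript']

# Every row is expected to carry one entry per defined column.
row_lengths_match = all(len(record) == len(header) for record in records)
```

If a row drops or adds a value, everything after it shifts into the wrong column, which is where the rigidity shows up.
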
JSON files contain one or more data structures, each a collection of named properties and values. In other words, well-structured objects of arbitrary complexity.
JSONL takes the best ideas of both file formats. Imagine a data file that arranges JSON into lines, one object per line. It looks like a file that’s rows and rows of JavaScript objects.
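
As a rough sketch of what "JSON into lines" means in practice, the Python below writes two records as JSONL and reads them back with the standard `json` module. The records and their field names are hypothetical.

```python
import json

# Hypothetical records; the field names are made up for illustration.
records = [
    {"name": "requests", "language": "Python"},
    {"name": "express", "language": "JavaScript"},
]

# Writing: serialize each record as single-line JSON and end it with a newline.
jsonl_text = "".join(json.dumps(record) + "\n" for record in records)
print(jsonl_text, end="")

# Reading: every non-empty line parses independently as its own JSON document.
parsed = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
```

Because each line stands alone, appending a record is just appending a line. There is no enclosing array to reopen and close.
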
JSONL allows a line to have more complicated properties like nested arrays and objects. You won't see the equivalent casually done in a CSV file.
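
For example, a single JSONL line can hold a nested array and a nested object. This sketch parses one such line; the record itself is invented for illustration.

```python
import json

# Hypothetical record with a nested array ("tags") and a nested object ("maintainer").
line = (
    '{"name": "requests", "tags": ["http", "client"], '
    '"maintainer": {"name": "Ada", "active": true}}'
)

record = json.loads(line)
print(record["tags"][1])             # client
print(record["maintainer"]["name"])  # Ada

# Round trip: the nested structure still serializes to a single line.
round_tripped = json.dumps(record)
```
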
JSONL allows for flexibility, resilience, and extensibility while retaining a convenient text-based format. It’s easy for humans to read and write by hand in popular programming editors. You can imagine it’s incredibly useful for tools that need to process streams of records.
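
Here is one way stream processing might look in Python, assuming you want to skip a malformed line rather than abort. The helper name `iter_jsonl` and the sample data are my own inventions, not part of any library.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[Any]:
    """Yield one parsed record per line, skipping blank or malformed lines."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # A bad line affects only itself; the rest of the stream survives.
            print(f"skipping malformed line {line_number}")


# Hypothetical stream: the second line is deliberately broken, and the third
# record carries an extra field that the first one lacks.
sample = io.StringIO(
    '{"id": 1}\n'
    '{"id": 2,\n'
    '{"id": 3, "note": "extra field"}\n'
)
records = list(iter_jsonl(sample))
```

The generator reads one line at a time, so memory use stays flat no matter how large the file grows. The same function works on an open file or on `sys.stdin`.
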
You’ll find JSONL to be a simple, line-separated format that’s easily readable and writable by command line tools and scripting languages.
JSONL is new to me. I got lucky and learned about it while doing R&D with OpenAI APIs. They use JSONL-formatted training data when fine-tuning the ChatGPT LLM.
I'll actively look for opportunities to use this file format in future projects. Reach out to me on Twitter and let me know of your success. Let’s do something awesome!