← Back to Blog
2025-06-05
JSON to CSV: Practical Tips for Data Pipelines
Flattening nested JSON to CSV seems trivial until you hit nested arrays, nulls, and special characters. Here's how to do it right.
Why JSON-to-CSV Is Tricky
Flat CSV and hierarchical JSON are fundamentally different shapes. Problems arise when:
- Nested objects:
{ address: { city: "NY" } }— do you flatten toaddress.cityor serialize as JSON? - Arrays:
{ tags: ["a","b"] }— impossible to map 1:1 to a cell - Special characters: Commas and quotes in values break naive serializers
The RFC 4180 Approach
The correct way to handle special characters is RFC 4180 escaping:
- Wrap any value containing a comma, double-quote, or newline in double-quotes
- Escape internal double-quotes by doubling them:
"say ""hello"""
Our converter implements this correctly, unlike many naive alternatives.
Flattening Strategy
For nested objects, we serialize as a JSON string within the cell. This preserves data without data loss:
id,name,address
1,John,"{""city"":""NY"",""zip"":""10001""}"
For data pipelines feeding into pandas or BigQuery, this is often the safest approach — the downstream system can further parse if needed.
When to Use CSV
CSV is ideal for:
- Importing into Excel/Google Sheets
- Feeding simple ML pipelines
- Data exports for non-technical users
For anything that needs to preserve the full structure, keep it as JSON. CSV is a lossy format by design.