Introduction
Have you ever struggled with handling JSON data in SQL queries? If so, you’re not alone. In this post, we’ll explore a real-world scenario where we tackled this challenge in Snowflake, transforming complex JSON structures into structured, queryable data.
Understanding the Challenge
In one of our ETL projects, we process raw data ingested from a third-party REST API into a Snowflake database. Some parent-child relationships within this data are stored as JSON strings inside VARCHAR columns.
Here is a simplified example for the purposes of this article:

Consider a table storing Person records where the Countries column contains country details in JSON format. This field can hold different types of values:
- A single JSON object:
{“id”:11,”name”:”France”} - A JSON array with multiple records:
[{“id”:22,”name”:”Germany”}, {“id”:33,”name”:”Italy”}] - Blank value:
NULL
We need to extract country names associated with each person as a simple comma-separated string for further processing. For instance, given PersonID = 3, we aim to extract “Germany, Italy”. Let’s walk through how to achieve this efficiently in Snowflake.
Extracting String Items from JSON Arrays
Step 1: Create a Sample Table
First, let’s set up a temporary table with sample data:
CREATE OR REPLACE TEMPORARY TABLE tmpSampleData (PersonID int, Countries VARCHAR(16777216));
INSERT INTO tmpSampleData VALUES (1, '{"id":11,"name":"France"}');
INSERT INTO tmpSampleData VALUES (2, '{"id":22,"name":"Germany"}');
INSERT INTO tmpSampleData VALUES (3, '[{"id":22,"name":"Germany"}, {"id":33,"name":"Italy"}]');
INSERT INTO tmpSampleData VALUES (4, '[{"id":44,"name":"USA"}, {"id":33,"name":"Italy"}, {"id":55,"name":"Spain"}]');
INSERT INTO tmpSampleData VALUES (5, NULL);
Step 2: Extract Country Names
Now, let’s write a SQL query to extract country names and convert them into a comma-separated string:
SELECT a.*, b.*
FROM tmpSampleData a
LEFT OUTER JOIN (
SELECT PersonID,
LISTAGG(CountryName, ', ') WITHIN GROUP(ORDER BY CountryName) AS CountryName
FROM (SELECT PersonID, items.value:name as CountryName
FROM tmpSampleData, LATERAL FLATTEN(PARSE_JSON(Countries)) AS items
WHERE ARRAY_SIZE(PARSE_JSON(Countries)) IS NOT NULL
UNION ALL
SELECT PersonID, PARSE_JSON(Countries):name
FROM tmpSampleData
WHERE ARRAY_SIZE(PARSE_JSON(Countries)) IS NULL)
GROUP BY PersonID) b ON a.PersonID = b.PersonID
ORDER BY a.PersonID;
Here is the output of the query where you can see the desired output in the rightest column:

Query Breakdown
1) Parsing JSON Data
Since Countries is stored as a VARCHAR, we use PARSE_JSON(Countries) to convert it into a VARIANT data type in Snowflake which can store either OBJECT or ARRAY.
2) Handling Different JSON Structures
Parsing our JSON would return either an array of objects or a single object. We need to handle each case differently. We do this in this sub-query where we combine the two cases with a UNION ALL clause:
SELECT PersonID, items.value:name as CountryName
FROM tmpSampleData, LATERAL FLATTEN(PARSE_JSON(Countries)) AS items
WHERE ARRAY_SIZE(PARSE_JSON(Countries)) IS NOT NULL
UNION ALL
SELECT PersonID, PARSE_JSON(Countries):name
FROM tmpSampleData
WHERE ARRAY_SIZE(PARSE_JSON(Countries)) IS NULL
The top part handles the records where the JSON contain an array of items. We recognize these records by checking the ARRAY_SIZE() function looking for non-null values: ARRAY_SIZE(PARSE_JSON(Countries)) IS NOT NULL.
The bottom part handles the records where the JSON contain a single object (or NULL). We recognize these records by checking the ARRAY_SIZE() function looking for null value: ARRAY_SIZE(PARSE_JSON(Countries)) IS NULL.
3) Flatten the Items
The top part flattens the array of countries linked to each person. We do this by combining these two functions:
a) FLATTEN – This flattens the compound array into multiple rows
b) LATERAL – When this function is used in a FROM clause it behaves like joining the main table to a correlated subquery producing a combination for each record from the main query and an associated record from the subquery.
Here is an example how the countries linked to each person are flattened out:

The bottom part directly uses the object notation accessing the “name” property using the semi-colon syntax: PARSE_JSON(Countries):name
4) Aggregating Country Names
Once we have flattened the country names for all persons from the source table, we are using the LISTAGG() function to combine the records back to a comma-separated string containing all countries per person:
LISTAGG(CountryName, ‘, ‘) WITHIN GROUP(ORDER BY CountryName) AS CountryName
…
GROUP BY PersonID
The LISTAGG() returns the concatenated input values, separated by the delimiter string which is comma in this case. The input values are coming from the items contained into each PersonID group.
5) Merging Data with the Original Table
Finally, we LEFT JOIN our results back to the original table to retain other fields from the main table.
Real-World Use Cases & Lessons Learned
In our ETL pipeline, we handle multiple JSON fields using a similar approach—creating specific LEFT JOIN subqueries for each JSON attribute. If we only need the flattened data (without aggregation), we can skip the LISTAGG() step.
Mastering these Snowflake functions makes working with JSON much more manageable. Stay tuned for more insights into Snowflake and data analytics challenges!
Happy querying!