## Constructing JSON From Postgres (Part 1)

From: andrew cooke <andrew@...>

Date: Sat, 24 Jan 2015 11:13:09 -0300

Recent releases of Postgres have included support for JSON.  You can store
JSON objects, access values within the objects, and construct new objects.

The most relevant docs are

* http://www.postgresql.org/docs/9.4/static/datatype-json.html

* http://www.postgresql.org/docs/9.4/static/functions-json.html

* http://www.postgresql.org/docs/9.4/static/functions-aggregate.html

However, there don't seem to be many good examples on the web for how to use
these.  So work asked me to look at how we can construct nested JSON objects
from complex queries.  These posts are a summary of what I have found.

Disclaimer: I don't claim that the techniques here are optimal, or even a good
idea.  They're purely to show what can be done.  There may be better ways to
do these things, and it may be best not to do them in Postgres at all...

Example Data
------------

These are the data I will be using.  They describe a relatively simple 1-many
parent/child/grandchild relationship.

drop table grandchild;
drop table child;
drop table parent;

create table parent (
id serial primary key,
name text
);

create table child (
id serial primary key,
parent int references parent(id),
name text
);

create table grandchild (
id serial primary key,
parent int references child(id),
name text
);

insert into parent (name) values
('albert'),
('barbara'),
('claude');

insert into child (name, parent) values
('delilah', 1),
('ebeneezer', 1),
('francoise', 3);

insert into grandchild (name, parent) values
('george', 1),
('henrietta', 2),
('ibrahim', 3),
('jenny', 3);

Our eventual aim (which won't be complete until part 2) is to construct
something like (I'm writing this by hand, so forgive the lack of quotes and any
other errors):

[{name: albert,
children: [{name: delilah,
children: [george]},
{name: ebeneezer,
children: [henrietta]}]},
{name: barbara,
children: []},
{name: claude,
children: [{name: francoise,
children: [ibrahim, jenny]}]}]

But this is quite ambitious - the output doesn't include IDs, and the
grandchildren are treated as simple strings.  In this first post I will aim
for something simpler, with objects at each level, and including IDs.

With Queries
------------

We will build the JSON structure in multiple steps, following a few simple
"recipes".  This can get quite messy in SQL - one way to simplify things is to
use "with" to isolate and name each step.

Here's a simple example (all the examples here assume that the data above are
present in the database):

with with_example
as (select parent,
array_agg(name) as children
from child
group by parent)
select *
from with_example as e
inner join parent as p
on p.id = e.parent;

This works like a temporary definition of a table, called "with_example",
which has the "child" table grouped by the "parent" attribute.  That table is
then joined with the "parent" table, giving:

parent |      children       | id |  name
--------+---------------------+----+--------
1 | {delilah,ebeneezer} |  1 | albert
3 | {francoise}         |  3 | claude
(2 rows)

Note that "array_agg()" is a function that constructs native Postgres arrays
(in a moment we will use "json_agg()" which is similar, but constructs JSON
arrays).

More generally, you can have a series of "with" definitions, that work like
you're defining a series of tables.  This lets you work through the process of
creating the JSON (or doing any other query) in simple steps, one at a time.

If you're confused about what's happening, you can always put a simple select
at the end of a series of "with" queries to display what you have so far.

For example:

with with_example
as (select parent,
array_agg(name) as children
from child
group by parent)
select *
from with_example;

parent |      children
--------+---------------------
1 | {delilah,ebeneezer}
3 | {francoise}
(2 rows)

Plan of Attack
--------------

In the next few sections I am going to create a nested JSON object, using
"with" queries (see above).

The general idea is that we start from the "innermost" objects and work our
way up.  Each step will be a new "with" query.

Creating JSON Objects
---------------------

The simplest way to create a JSON object is from a row in a select statement.
This creates an object whose names are taken from the columns, and whose
values are taken from the rows.

with json_grandchild
as (select row_to_json(grandchild) as grandchild_obj
from grandchild)
select *
from json_grandchild;

grandchild_obj
----------------------------------------
{"id":1,"parent":1,"name":"george"}
{"id":2,"parent":2,"name":"henrietta"}
{"id":3,"parent":3,"name":"ibrahim"}
{"id":4,"parent":3,"name":"jenny"}
(4 rows)

If this seems complicated, break it down:

* The "select * from json_grandchild" is just there so we can see the
results.

* The "as grandchild_obj" gives a name to the column in the final
result.

* The "select row_to_json(grandchild) ... from grandchild" is just
applying the "row_to_json()" function to each row of the table,
creating a JSON object.

* The "with json_grandchild as" is saving what we've done as something like a
temporary table, called "json_grandchild" (with one column, called
"grandchild_obj", which contains a JSON object for each line of the
"grandchild" table).

Inspecting JSON Values
----------------------

Now that we have JSON objects, we need to be able to look "inside" them.
There is a small problem here - JSON is "untyped".  That is, the database does
not "know" that "id" in the objects above is an integer.  So the only way to
access it is as a string.  If we (as programmers) know that it is an integer,
then we can cast that string to an "int" as a separate step.

So here's an intermediate step that shows how we access the "id":

with json_grandchild
as (select row_to_json(grandchild) as grandchild_obj
from grandchild)
select (grandchild_obj->>'id')::int as id
from json_grandchild;

id
----
1
2
3
4
(4 rows)

Note the clumsy syntax - we need quotes around "id" and parens around the
value we are casting.

Creating JSON Arrays
--------------------

Given that we have "grandchild" objects, we now need to group them by parent
(this is the next step in constructing the output, building things up).

with json_grandchild
as (select row_to_json(grandchild) as grandchild_obj
from grandchild),
json_grandchild_arr
as (select json_agg(json_grandchild) as grandchild_arr
from json_grandchild
group by (grandchild_obj->>'parent')::int)
select *
from json_grandchild_arr;

grandchild_arr
-------------------------------------------------------------
[{"grandchild_obj":{"id":1,"parent":1,"name":"george"}}]
[{"grandchild_obj":{"id":3,"parent":3,"name":"ibrahim"}},  +
{"grandchild_obj":{"id":4,"parent":3,"name":"jenny"}}]
[{"grandchild_obj":{"id":2,"parent":2,"name":"henrietta"}}]
(3 rows)

Again, if this looks complex, break it down.  All we've done is add an extra
step to what we had before, where we're using "group by" and "json_agg" to
place all the objects with the same "parent" value in an array.

Join With Child
---------------

Next we need to join what we have with the "child" table.  This is your normal
SQL join.  The only complication is that we need to extract the "parent" value
from the arrays above (every entry in the array has the same parent value, so
we use the first).

with json_grandchild
as (select row_to_json(grandchild) as grandchild_obj
from grandchild),
json_grandchild_arr
as (select json_agg(json_grandchild) as grandchild_arr
from json_grandchild
group by (grandchild_obj->>'parent')::int),
child_grandchild_join
as (select *
from json_grandchild_arr as g
inner join child as c
on c.id = (g.grandchild_arr->0->'grandchild_obj'->>'parent')::int)
select *
from child_grandchild_join;

grandchild_arr                        | id | parent |
name
-------------------------------------------------------------+----+--------+-----------
[{"grandchild_obj":{"id":1,"parent":1,"name":"george"}}]    |  1 |      1 |
delilah
[{"grandchild_obj":{"id":3,"parent":3,"name":"ibrahim"}},  +|  3 |      3 |
francoise
{"grandchild_obj":{"id":4,"parent":3,"name":"jenny"}}]     |    |        |
[{"grandchild_obj":{"id":2,"parent":2,"name":"henrietta"}}] |  2 |      1 |
ebeneezer
(3 rows)

This is getting a little tricky to read, sorry (it's going to get worse before
it gets better).

The most important thing to understand here is that we used "->" to dive
inside the JSON data.  This is NOT THE SAME as the "->>" we used earlier:

* ->> selects a string (which we have typically then cast to an integer)

* -> selects a JSON value of some type (object, array, int, string, etc).

So "g.grandchild_arr->0->'grandchild_obj'->>'parent'" is taking the first (0
index) array entry, then the "grandchild_obj" attribute, then converting the
"parent" attribute of that to a (Postgres) string.

Create A JSON Object (Again)
----------------------------

From here on we're basically repeating ourselves for the next level.  Things
look worse and worse, but it's the same logic as before.

with json_grandchild
as (select row_to_json(grandchild) as grandchild_obj
from grandchild),
json_grandchild_arr
as (select json_agg(json_grandchild) as grandchild_arr
from json_grandchild
group by (grandchild_obj->>'parent')::int),
child_grandchild_join
as (select *
from json_grandchild_arr as g
inner join child as c
on c.id = (g.grandchild_arr->0->'grandchild_obj'->>'parent')::int),
json_child
as (select row_to_json(child_grandchild_join) as child_obj
from child_grandchild_join)
select *
from json_child;

child_obj
---------------------------------------------------------------------------------------------------------------------
{"grandchild_arr":[{"grandchild_obj":{"id":1,"parent":1,"name":"george"}}],"id":1,"parent":1,"name":"delilah"}
{"grandchild_arr":[{"grandchild_obj":{"id":3,"parent":3,"name":"ibrahim"}},   +
{"grandchild_obj":{"id":4,"parent":3,"name":"jenny"}}],"id":3,"parent":3,"name":"francoise"}
{"grandchild_arr":[{"grandchild_obj":{"id":2,"parent":2,"name":"henrietta"}}],"id":2,"parent":1,"name":"ebeneezer"}
(3 rows)

Create JSON Array (Again)
-------------------------

As before, group by "parent".

with json_grandchild
as (select row_to_json(grandchild) as grandchild_obj
from grandchild),
json_grandchild_arr
as (select json_agg(json_grandchild) as grandchild_arr
from json_grandchild
group by (grandchild_obj->>'parent')::int),
child_grandchild_join
as (select *
from json_grandchild_arr as g
inner join child as c
on c.id = (g.grandchild_arr->0->'grandchild_obj'->>'parent')::int),
json_child
as (select row_to_json(child_grandchild_join) as child_obj
from child_grandchild_join),
json_child_arr
as (select json_agg(json_child) as child_arr
from json_child
group by (child_obj->>'parent')::int)
select *
from json_child_arr;

child_arr
-------------------------------------------------------------------------------------------------------------------------------------
[{"child_obj":{"grandchild_arr":[{"grandchild_obj":{"id":1,"parent":1,"name":"george"}}],"id":1,"parent":1,"name":"delilah"}},   +
{"child_obj":{"grandchild_arr":[{"grandchild_obj":{"id":2,"parent":2,"name":"henrietta"}}],"id":2,"parent":1,"name":"ebeneezer"}}]
[{"child_obj":{"grandchild_arr":[{"grandchild_obj":{"id":3,"parent":3,"name":"ibrahim"}},   +
{"grandchild_obj":{"id":4,"parent":3,"name":"jenny"}}],"id":3,"parent":3,"name":"francoise"}}]
(2 rows)

Join With Parent
----------------

As before.  And we'll create the final object too.  And we'll wrap it all in a
list.

with json_grandchild
as (select row_to_json(grandchild) as grandchild_obj
from grandchild),
json_grandchild_arr
as (select json_agg(json_grandchild) as grandchild_arr
from json_grandchild
group by (grandchild_obj->>'parent')::int),
child_grandchild_join
as (select *
from json_grandchild_arr as g
inner join child as c
on c.id = (g.grandchild_arr->0->'grandchild_obj'->>'parent')::int),
json_child
as (select row_to_json(child_grandchild_join) as child_obj
from child_grandchild_join),
json_child_arr
as (select json_agg(json_child) as child_arr
from json_child
group by (child_obj->>'parent')::int),
parent_child_join
as (select *
from json_child_arr as c
inner join parent as p
on p.id = (c.child_arr->0->'child_obj'->>'parent')::int),
json_parent
as (select row_to_json(parent_child_join) as parent_obj
from parent_child_join),
json_parent_arr
as (select json_agg(json_parent) as parent_arr
from json_parent)
select *
from json_parent_arr;

parent_arr
----------------------------------------------------------------------------------------------------------------------------------------------------------------
[{"parent_obj":{"child_arr":[{"child_obj":{"grandchild_arr":[{"grandchild_obj":{"id":1,"parent":1,"name":"george"}}],"id":1,"parent":1,"name":"delilah"}}, +
{"child_obj":{"grandchild_arr":[{"grandchild_obj":{"id":2,"parent":2,"name":"henrietta"}}],"id":2,"parent":1,"name":"ebeneezer"}}],"id":1,"name":"albert"}},  +
{"parent_obj":{"child_arr":[{"child_obj":{"grandchild_arr":[{"grandchild_obj":{"id":3,"parent":3,"name":"ibrahim"}},  +
{"grandchild_obj":{"id":4,"parent":3,"name":"jenny"}}],"id":3,"parent":3,"name":"francoise"}}],"id":3,"name":"claude"}}]
(1 row)

And that's it!  It's ugly as sin, and it contains way too much structure, but
it's a nested JSON object that contains all the data that was in our database.

Conclusions
-----------

I've shown how a few simple JSON-specific steps (creating objects and arrays)
can be combined with normal SQL operations like joins to create nested JSON
structures.

We can also "dive into" those structures to select values, extract the values
as strings, and then cast them to the correct type.

Finally, we can piece together multiple steps using "with" queries.  That
allows us to construct arbitrarily complex objects as a series of simple
steps.

Next post I'll shows how to make the output nicer - it's basically one
Andrew