Andrew Cooke | Contents | Latest | RSS | Previous | Next

C[omp]ute

Welcome to my blog, which was once a mailing list of the same name and is still generated by mail. Please reply via the "comment" links.

Always interested in offers/projects/new ideas. Eclectic experience in fields like: numerical computing; Python web; Java enterprise; functional languages; GPGPU; SQL databases; etc. Based in Santiago, Chile; telecommute worldwide. CV; email.

Personal Projects

Choochoo Training Diary

Last 100 entries

Surprise Paradox; [Books] Good Author List; [Computing] Efficient queries with grouping in Postgres; [Computing] Automatic Wake (Linux); [Computing] AWS CDK Aspects in Go; [Bike] Adidas Gravel Shoes; [Computing, Horror] Biological Chips; [Books] Weird Lit Recs; [Covid] Extended SIR Models; [Art] York-based Printmaker; [Physics] Quantum Transitions are not Instantaneous; [Computing] AI and Drum Machines; [Computing] Probabilities, Stopping Times, Martingales; bpftrace Intro Article; [Computing] Starlab Systems - Linux Laptops; [Computing] Extended Berkeley Packet Filter; [Green] Mainspring Linear Generator; Better Approach; Rummikub Solver; Chilean Poetry; Felicitations - Empowerment Grant; [Bike] Fixing Spyre Brakes (That Need Constant Adjustment); [Computing, Music] Raspberry Pi Media (Audio) Streamer; [Computing] Amazing Hack To Embed DSL In Python; [Bike] Ruta Del Condor (El Alfalfal); [Bike] Estimating Power On Climbs; [Computing] Applying Azure B2C Authentication To Function Apps; [Bike] Gearing On The Back Of An Envelope; [Computing] Okular and Postscript in OpenSuse; There's a fix!; [Computing] Fail2Ban on OpenSuse Leap 15.3 (NFTables); [Cycling, Computing] Power Calculation and Brakes; [Hardware, Computing] Amazing Pockit Computer; Bullying; How I Am - 3 Years Post Accident, 8+ Years With MS; [USA Politics] In America's Uncivil War Republicans Are The Aggressors; [Programming] Selenium and Python; Better Walking Data; [Bike] How Fast Before Walking More Efficient Than Cycling?; [COVID] Coronavirus And Cycling; [Programming] Docker on OpenSuse; Cadence v Speed; [Bike] Gearing For Real Cyclists; [Programming] React plotting - visx; [Programming] React Leaflet; AliExpress Independent Sellers; Applebaum - Twilight of Democracy; [Politics] Back + US Elections; [Programming,Exercise] Simple Timer Script; [News] 2019: The year revolt went global; [Politics] The world's most-surveilled cities; [Bike] Hope Freehub; [Restaurant] Mama Chau's (Chinese, Providencia); [Politics] Brexit Podcast; [Diary] Pneumonia; [Politics] Britain's Reichstag Fire moment; install cairo; [Programming] GCC Sanitizer Flags; [GPU, Programming] Per-Thread Program Counters; My Bike Accident - Looking Back One Year; [Python] Geographic heights are incredibly easy!; [Cooking] Cookie Recipe; Efficient, Simple, Directed Maximisation of Noisy Function; And for argparse; Bash Completion in Python; [Computing] Configuring Github Jekyll Locally; [Maths, Link] The Napkin Project; You can Masquerade in Firewalld; [Bike] Servicing Budget (Spring) Forks; [Crypto] CIA Internet Comms Failure; [Python] Cute Rate Limiting API; [Causality] Judea Pearl Lecture; [Security, Computing] Chinese Hardware Hack Of Supermicro Boards; SQLAlchemy Joined Table Inheritance and Delete Cascade; [Translation] The Club; [Computing] Super Potato Bruh; [Computing] Extending Jupyter; Further HRM Details; [Computing, Bike] Activities in ch2; [Books, Link] Modern Japanese Lit; What ended up there; [Link, Book] Logic Book; Update - Garmin Express / Connect; Garmin Forerunner 35 v 230; [Link, Politics, Internet] Government Trolls; [Link, Politics] Why identity politics benefits the right more than the left; SSH Forwarding; A Specification For Repeating Events; A Fight for the Soul of Science; [Science, Book, Link] Lost In Math; OpenSuse Leap 15 Network Fixes; Update; [Book] Galileo's Middle Finger; [Bike] Chinese Carbon Rims; [Bike] Servicing Shimano XT Front Hub HB-M8010; [Bike] Aliexpress Cycling Tops; [Computing] Change to ssh handling of multiple identities?; [Bike] Endura Hummvee Lite II; [Computing] Marble Based Logic; [Link, Politics] Sanity Check For Nuclear Launch; [Link, Science] Entropy and Life

© 2006-2017 Andrew Cooke (site) / post authors (content).

Constructing JSON From Postgres (Part 1)

From: andrew cooke <andrew@...>

Date: Sat, 24 Jan 2015 11:13:09 -0300

Recent releases of Postgres have included support for JSON.  You can store
JSON objects, access values within the objects, and construct new objects.

The most relevant docs are 

 * http://www.postgresql.org/docs/9.4/static/datatype-json.html

 * http://www.postgresql.org/docs/9.4/static/functions-json.html

 * http://www.postgresql.org/docs/9.4/static/functions-aggregate.html


However, there don't seem to be many good examples on the web for how to use
these.  So work asked me to look at how we can construct nested JSON objects
from complex queries.  These posts are a summary of what I have found.

Disclaimer: I don't claim that the techniques here are optimal, or even a good
idea.  They're purely to show what can be done.  There may be better ways to
do these things, and it may be best not to do them in Postgres at all...


Example Data
------------

These are the data I will be using.  They describe a relatively simple 1-many
parent/child/grandchild relationship.

  drop table grandchild;
  drop table child;
  drop table parent;

  create table parent (
    id serial primary key,
    name text
  );

  create table child (
    id serial primary key,
    parent int references parent(id),
    name text
  );

  create table grandchild (
    id serial primary key,
    parent int references child(id),
    name text
  );

  insert into parent (name) values
    ('albert'),
    ('barbara'),
    ('claude');

  insert into child (name, parent) values
    ('delilah', 1),
    ('ebeneezer', 1),
    ('francoise', 3);

  insert into grandchild (name, parent) values
    ('george', 1),
    ('henrietta', 2),
    ('ibrahim', 3),
    ('jenny', 3);

Our eventual aim (which won't be complete until part 2) is to construct
something like (I'm writing this by hand, so forgive the lack of quotes and any
other errors):

[{name: albert,
  children: [{name: delilah,
              children: [george]},
             {name: ebeneezer,
              children: [henrietta]}]},
 {name: barbara,
  children: []},
 {name: claude,
  children: [{name: francoise,
              children: [ibrahim, jenny]}]}]

But this is quite ambitious - the output doesn't include IDs, and the
grandchildren are treated as simple strings.  In this first post I will aim
for something simpler, with objects at each level, and including IDs.


With Queries
------------

We will build the JSON structure in multiple steps, following a few simple
"recipes".  This can get quite messy in SQL - one way to simplify things is to
use "with" to isolate and name each step.

Here's a simple example (all the examples here assume that the data above are
present in the database):

    with with_example
      as (select parent,
                 array_agg(name) as children
            from child
           group by parent)
  select *
    from with_example as e
   inner join parent as p
      on p.id = e.parent;

This works like a temporary definition of a table, called "with_example",
which has the "child" table grouped by the "parent" attribute.  That table is
then joined with the "parent" table, giving:

   parent |      children       | id |  name  
  --------+---------------------+----+--------
        1 | {delilah,ebeneezer} |  1 | albert
        3 | {francoise}         |  3 | claude
  (2 rows)

Note that "array_agg()" is a function that constructs native Postgres arrays
(in a moment we will use "json_agg()" which is similar, but constructs JSON
arrays).

More generally, you can have a series of "with" definitions, that work like
you're defining a series of tables.  This lets you work through the process of
creating the JSON (or doing any other query) in simple steps, one at a time.

If you're confused about what's happening, you can always put a simple select
at the end of a series of "with" queries to display what you have so far.

For example:

    with with_example
      as (select parent,
                 array_agg(name) as children
            from child
           group by parent)
  select *
    from with_example;

   parent |      children       
  --------+---------------------
        1 | {delilah,ebeneezer}
        3 | {francoise}
  (2 rows)


Plan of Attack
--------------

In the next few sections I am going to create a nested JSON object, using
"with" queries (see above).

The general idea is that we start from the "innermost" objects and work our
way up.  Each step will be a new "with" query.


Creating JSON Objects
---------------------

The simplest way to create a JSON object is from a row in a select statement.
This creates an object whose names are taken from the columns, and whose
values are taken from the rows.

    with json_grandchild
      as (select row_to_json(grandchild) as grandchild_obj
            from grandchild)
  select *
    from json_grandchild;

               grandchild_obj             
  ----------------------------------------
   {"id":1,"parent":1,"name":"george"}
   {"id":2,"parent":2,"name":"henrietta"}
   {"id":3,"parent":3,"name":"ibrahim"}
   {"id":4,"parent":3,"name":"jenny"}
  (4 rows)

If this seems complicated, break it down:

 * The "select * from json_grandchild" is just there so we can see the
   results.

 * The "as grandchild_obj" gives a name to the column in the final 
   result.

 * The "select row_to_json(grandchild) ... from grandchild" is just 
   applying the "row_to_json()" function to each row of the table, 
   creating a JSON object.

 * The "with json_grandchild as" is saving what we've done as something like a
   temporary table, called "json_grandchild" (with one column, called
   "grandchild_obj", which contains a JSON object for each line of the
   "grandchild" table).


Inspecting JSON Values
----------------------

Now that we have JSON objects, we need to be able to look "inside" them.
There is a small problem here - JSON is "untyped".  That is, the database does
not "know" that "id" in the objects above is an integer.  So the only way to
access it is as a string.  If we (as programmers) know that it is an integer,
then we can cast that string to an "int" as a separate step.

So here's an intermediate step that shows how we access the "id":

    with json_grandchild
      as (select row_to_json(grandchild) as grandchild_obj
            from grandchild)
  select (grandchild_obj->>'id')::int as id
    from json_grandchild;

   id 
  ----
    1
    2
    3
    4
  (4 rows)

Note the clumsy syntax - we need quotes around "id" and parens around the
value we are casting.


Creating JSON Arrays
--------------------

Given that we have "grandchild" objects, we now need to group them by parent
(this is the next step in constructing the output, building things up).

    with json_grandchild
      as (select row_to_json(grandchild) as grandchild_obj
            from grandchild),
         json_grandchild_arr
      as (select json_agg(json_grandchild) as grandchild_arr
            from json_grandchild
           group by (grandchild_obj->>'parent')::int)
  select *
    from json_grandchild_arr;

                         grandchild_arr                        
  -------------------------------------------------------------
   [{"grandchild_obj":{"id":1,"parent":1,"name":"george"}}]
   [{"grandchild_obj":{"id":3,"parent":3,"name":"ibrahim"}},  +
    {"grandchild_obj":{"id":4,"parent":3,"name":"jenny"}}]
   [{"grandchild_obj":{"id":2,"parent":2,"name":"henrietta"}}]
  (3 rows)

Again, if this looks complex, break it down.  All we've done is add an extra
step to what we had before, where we're using "group by" and "json_agg" to
place all the objects with the same "parent" value in an array.


Join With Child
---------------

Next we need to join what we have with the "child" table.  This is your normal
SQL join.  The only complication is that we need to extract the "parent" value
from the arrays above (every entry in the array has the same parent value, so
we use the first).

    with json_grandchild
      as (select row_to_json(grandchild) as grandchild_obj
            from grandchild),
         json_grandchild_arr
      as (select json_agg(json_grandchild) as grandchild_arr
            from json_grandchild
           group by (grandchild_obj->>'parent')::int),
         child_grandchild_join
      as (select *
            from json_grandchild_arr as g
           inner join child as c
              on c.id = (g.grandchild_arr->0->'grandchild_obj'->>'parent')::int)
  select *
    from child_grandchild_join;

                         grandchild_arr                        | id | parent |
                         name    
  -------------------------------------------------------------+----+--------+-----------
   [{"grandchild_obj":{"id":1,"parent":1,"name":"george"}}]    |  1 |      1 |
                         delilah
   [{"grandchild_obj":{"id":3,"parent":3,"name":"ibrahim"}},  +|  3 |      3 |
                         francoise
    {"grandchild_obj":{"id":4,"parent":3,"name":"jenny"}}]     |    |        | 
   [{"grandchild_obj":{"id":2,"parent":2,"name":"henrietta"}}] |  2 |      1 |
                         ebeneezer
  (3 rows)

This is getting a little tricky to read, sorry (it's going to get worse before
it gets better).

The most important thing to understand here is that we used "->" to dive
inside the JSON data.  This is NOT THE SAME as the "->>" we used earlier:

 * ->> selects a string (which we have typically then cast to an integer)

 * -> selects a JSON value of some type (object, array, int, string, etc).

So "g.grandchild_arr->0->'grandchild_obj'->>'parent'" is taking the first (0
index) array entry, then the "grandchild_obj" attribute, then converting the
"parent" attribute of that to a (Postgres) string.


Create A JSON Object (Again)
----------------------------

From here on we're basically repeating ourselves for the next level.  Things
look worse and worse, but it's the same logic as before.

    with json_grandchild
      as (select row_to_json(grandchild) as grandchild_obj
            from grandchild),
         json_grandchild_arr
      as (select json_agg(json_grandchild) as grandchild_arr
            from json_grandchild
           group by (grandchild_obj->>'parent')::int),
         child_grandchild_join
      as (select *
            from json_grandchild_arr as g
           inner join child as c
              on c.id = (g.grandchild_arr->0->'grandchild_obj'->>'parent')::int),
         json_child
      as (select row_to_json(child_grandchild_join) as child_obj
            from child_grandchild_join)
  select *
    from json_child;

                                                        child_obj                                                      
  ---------------------------------------------------------------------------------------------------------------------
   {"grandchild_arr":[{"grandchild_obj":{"id":1,"parent":1,"name":"george"}}],"id":1,"parent":1,"name":"delilah"}
   {"grandchild_arr":[{"grandchild_obj":{"id":3,"parent":3,"name":"ibrahim"}},   +
    {"grandchild_obj":{"id":4,"parent":3,"name":"jenny"}}],"id":3,"parent":3,"name":"francoise"}
   {"grandchild_arr":[{"grandchild_obj":{"id":2,"parent":2,"name":"henrietta"}}],"id":2,"parent":1,"name":"ebeneezer"}
  (3 rows)


Create JSON Array (Again)
-------------------------

As before, group by "parent".

    with json_grandchild
      as (select row_to_json(grandchild) as grandchild_obj
            from grandchild),
         json_grandchild_arr
      as (select json_agg(json_grandchild) as grandchild_arr
            from json_grandchild
           group by (grandchild_obj->>'parent')::int),
         child_grandchild_join
      as (select *
            from json_grandchild_arr as g
           inner join child as c
              on c.id = (g.grandchild_arr->0->'grandchild_obj'->>'parent')::int),
         json_child
      as (select row_to_json(child_grandchild_join) as child_obj
            from child_grandchild_join),
         json_child_arr
      as (select json_agg(json_child) as child_arr
            from json_child
           group by (child_obj->>'parent')::int)
  select *
    from json_child_arr;

                                                                child_arr                                                              
  -------------------------------------------------------------------------------------------------------------------------------------
   [{"child_obj":{"grandchild_arr":[{"grandchild_obj":{"id":1,"parent":1,"name":"george"}}],"id":1,"parent":1,"name":"delilah"}},   +
    {"child_obj":{"grandchild_arr":[{"grandchild_obj":{"id":2,"parent":2,"name":"henrietta"}}],"id":2,"parent":1,"name":"ebeneezer"}}]
   [{"child_obj":{"grandchild_arr":[{"grandchild_obj":{"id":3,"parent":3,"name":"ibrahim"}},   +
    {"grandchild_obj":{"id":4,"parent":3,"name":"jenny"}}],"id":3,"parent":3,"name":"francoise"}}]
  (2 rows)


Join With Parent
----------------

As before.  And we'll create the final object too.  And we'll wrap it all in a
list.


    with json_grandchild
      as (select row_to_json(grandchild) as grandchild_obj
            from grandchild),
         json_grandchild_arr
      as (select json_agg(json_grandchild) as grandchild_arr
            from json_grandchild
           group by (grandchild_obj->>'parent')::int),
         child_grandchild_join
      as (select *
            from json_grandchild_arr as g
           inner join child as c
              on c.id = (g.grandchild_arr->0->'grandchild_obj'->>'parent')::int),
         json_child
      as (select row_to_json(child_grandchild_join) as child_obj
            from child_grandchild_join),
         json_child_arr
      as (select json_agg(json_child) as child_arr
            from json_child
           group by (child_obj->>'parent')::int),
         parent_child_join
      as (select *
            from json_child_arr as c
           inner join parent as p
              on p.id = (c.child_arr->0->'child_obj'->>'parent')::int),
         json_parent
      as (select row_to_json(parent_child_join) as parent_obj
            from parent_child_join),
         json_parent_arr
      as (select json_agg(json_parent) as parent_arr
            from json_parent)
  select *
    from json_parent_arr;

                                                                             parent_arr                                                                           
  ----------------------------------------------------------------------------------------------------------------------------------------------------------------
   [{"parent_obj":{"child_arr":[{"child_obj":{"grandchild_arr":[{"grandchild_obj":{"id":1,"parent":1,"name":"george"}}],"id":1,"parent":1,"name":"delilah"}}, +
    {"child_obj":{"grandchild_arr":[{"grandchild_obj":{"id":2,"parent":2,"name":"henrietta"}}],"id":2,"parent":1,"name":"ebeneezer"}}],"id":1,"name":"albert"}},  +
    {"parent_obj":{"child_arr":[{"child_obj":{"grandchild_arr":[{"grandchild_obj":{"id":3,"parent":3,"name":"ibrahim"}},  +
    {"grandchild_obj":{"id":4,"parent":3,"name":"jenny"}}],"id":3,"parent":3,"name":"francoise"}}],"id":3,"name":"claude"}}]
  (1 row)

And that's it!  It's ugly as sin, and it contains way too much structure, but
it's a nested JSON object that contains all the data that was in our database.


Conclusions
-----------

I've shown how a few simple JSON-specific steps (creating objects and arrays)
can be combined with normal SQL operations like joins to create nested JSON
structures.

We can also "dive into" those structures to select values, extract the values
as strings, and then cast them to the correct type.

Finally, we can piece together multiple steps using "with" queries.  That
allows us to construct arbitrarily complex objects as a series of simple
steps.


Next post I'll shows how to make the output nicer - it's basically one
additional trick.

Andrew

Comment on this post