A common problem when pulling data out of databases is flattening rows into columns, or even into a single field. Sometimes the Right Thing to do is a pivot table, but this isn’t always true.
I hit one of those cases when writing some code to migrate from serendipity into pelican. Serendipity has a fairly sane relational datamodel, which means there are many-to-many mappings for tag attributes, thusly:
id | title | author | date | text |
---|---|---|---|---|
1 | Article 1 | Author A | Yesterday | Text |
2 | Article 2 | Author A | Yesterday | Text |
articleid | tag |
---|---|
1 | tag1 |
1 | tag2 |
1 | tag3 |
2 | tag1 |
2 | tag4 |
2 | tag5 |
For my result set my ideal output would look something like:
Title | Author | Date | Tags | Text |
---|---|---|---|---|
Article 1 | Author A | Yesterday | tag1,tag2,tag3 | Text |
Article 2 | Author A | Today | tag1,tag4,tag5 | Text |
…but this is fairly annoying to achieve with multi-way join between the tables containing the relevant information. This tends to lead into various awful ways of achieving this.
Except it tuns out I’ve been doing it wrong.
Postgresql offers the very handy function array_agg(), which collapses rows into an array. Wrapping this in array_to_string() and you have output that looks like exactly what I want, thus:
::::PostgresLexer
select title,
to_timestamp(timestamp) as date,
(
select array_to_string(array_agg(category_name), ',')
from serendipity_category c,
serendipity_entrycat ec
where c.categoryid = ec.categoryid and
e.id = ec.entryid
) as category,
(
select array_to_string(array_agg(tag), ',')
from serendipity_entrytags t1
where t1.entryid = e.id
) as tags,
'archives/' || e.id || '-' || replace(e.title, ' ', '-') as slug,
a.realname as Author,
body,
from serendipity_authors a,
serendipity_entries e
where e.authorid = a.authorid
While it’s not necessary the most performant solution, it’s perfect for one-shot migrations like this.