Do you start your tests by writing setup code that inserts data into the database, which you end up repeating over and over again? Is it a pain to set up a development environment because you need some example data in your database? Let’s fix all of that with simple (PostgreSQL) fixtures.

Prior art

At least Django (and probably other big frameworks) comes with built-in support for database fixtures, used for setting up initial state and/or tests. You can read more about Django’s system here and here.

We’re going to make a much simpler, albeit database-dependent system. We won’t use YAML or JSON, only SQL, which keeps the fixtures easily readable. There is a downside though: tracking relations isn’t that easy, since foreign keys are ‘just’ numbers. You can easily adapt this approach to your own project to make fixture loading one or two lines of code per test, as sketched below.
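To give an idea of the end result, here is a minimal sketch of what such a helper could look like in Go. The fixtures package and the LoadFixture name are made up for this post, and the sequence-fixing half of the story comes later:

package fixtures

import (
	"context"
	"os"

	"github.com/jmoiron/sqlx"
)

// LoadFixture executes a dumped .sql fixture file against the database.
func LoadFixture(ctx context.Context, db *sqlx.DB, path string) error {
	sql, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	_, err = db.ExecContext(ctx, string(sql))
	return err
}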

The setup

We’ll be working with the following database schema and data:

-- schema_version is used to track the database's version.
CREATE TABLE schema_version (
    version INTEGER NOT NULL
);

CREATE TABLE company (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE employee (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    company_id INTEGER NOT NULL REFERENCES company(id)
);

INSERT INTO schema_version VALUES (1);

INSERT INTO company (name) VALUES
    ('Company 1'),
    ('Company 2');

INSERT INTO employee (name, company_id) VALUES
    ('Jennifer', 1),
    ('John', 2),
    ('Agatha', 2);

A few things to note:

  • There is a schema_version table that our application uses to track the database schema version
  • There are columns that use sequences (e.g. the SERIAL type)

Dump the data

Before dumping the data out, we need to ask ourselves: what is it that we need from the dump? Well, that’s simple: we want the data, and only the data.

Luckily this is easy with pg_dump:

$ pg_dump -h localhost -U postgres --data-only mydb > my-fixture.sql
Password:
$ cat my-fixture.sql
--
-- PostgreSQL database dump
--

-- Dumped from database version 12.4 (Debian 12.4-1.pgdg100+1)
-- Dumped by pg_dump version 13.6 (Ubuntu 13.6-0ubuntu0.21.10.1)

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

--
-- Data for Name: company; Type: TABLE DATA; Schema: public; Owner: postgres
--

COPY public.company (id, name) FROM stdin;
1       Company 1
2       Company 2
\.


--
-- Data for Name: employee; Type: TABLE DATA; Schema: public; Owner: postgres
--

COPY public.employee (id, name, company_id) FROM stdin;
1       Jennifer        1
2       John    2
3       Agatha  2
\.


--
-- Data for Name: schema_version; Type: TABLE DATA; Schema: public; Owner: postgres
--

COPY public.schema_version (version) FROM stdin;
1
\.


--
-- Name: company_id_seq; Type: SEQUENCE SET; Schema: public; Owner: postgres
--

SELECT pg_catalog.setval('public.company_id_seq', 2, true);


--
-- Name: employee_id_seq; Type: SEQUENCE SET; Schema: public; Owner: postgres
--

SELECT pg_catalog.setval('public.employee_id_seq', 3, true);


--
-- PostgreSQL database dump complete
--

But as you can see, there is a lot here that we don’t want, and the COPY format is not that readable.

What we need to do is:

  • Use INSERT instead of COPY
  • Remove SET rows
  • Remove pg_catalog statements
  • Remove SQL comments
  • Remove the public. schema qualifier
  • Exclude schema_version (in this particular example)

Some of these are doable with pg_dump directly, but for the rest we need something like sed:

$ pg_dump -h localhost -U postgres --data-only --rows-per-insert=1000 \
  --exclude-table=schema_version mydb | sed \
-e '/^SET /d' \
-e '/^--/d' \
-e '/^SELECT pg_catalog\./d' \
-e 's/^INSERT INTO public\./INSERT INTO /g' \
-e '/^$/d' > my-fixture.sql
Password:
$ cat my-fixture.sql
INSERT INTO company VALUES
        (1, 'Company 1'),
        (2, 'Company 2');
INSERT INTO employee VALUES
        (1, 'Jennifer', 1),
        (2, 'John', 2),
        (3, 'Agatha', 2);

There we go!

Importing the data

Now, let’s try to import this data into an empty database (with the correct schema, of course):

$ psql -h localhost -U postgres mydb < my-fixture.sql
Password for user postgres:
INSERT 0 2
INSERT 0 3
$ psql -h localhost -U postgres mydb
Password for user postgres:
psql (13.6 (Ubuntu 13.6-0ubuntu0.21.10.1), server 12.4 (Debian 12.4-1.pgdg100+1))
Type "help" for help.

mydb=# INSERT INTO company (name) VALUES ('Company 3');
ERROR:  duplicate key value violates unique constraint "company_pkey"
DETAIL:  Key (id)=(1) already exists.

Oh no! Right, we need to fix our sequences, because we removed those pg_catalog.setval rows. We can find a solution for this in the PostgreSQL wiki:

SELECT 'SELECT SETVAL(' ||
       quote_literal(quote_ident(PGT.schemaname) || '.' || quote_ident(S.relname)) ||
       ', COALESCE(MAX(' ||quote_ident(C.attname)|| '), 1) ) FROM ' ||
       quote_ident(PGT.schemaname)|| '.'||quote_ident(T.relname)|| ';'
FROM pg_class AS S,
     pg_depend AS D,
     pg_class AS T,
     pg_attribute AS C,
     pg_tables AS PGT
WHERE S.relkind = 'S'
    AND S.oid = D.objid
    AND D.refobjid = T.oid
    AND D.refobjid = C.attrelid
    AND D.refobjsubid = C.attnum
    AND T.relname = PGT.tablename
ORDER BY S.relname;

The above query outputs, for each sequence, a query that sets it to the maximum value currently in its column, like so:

                                       ?column?
--------------------------------------------------------------------------------------
 SELECT SETVAL('public.company_id_seq', COALESCE(MAX(id), 1) ) FROM public.company;
 SELECT SETVAL('public.employee_id_seq', COALESCE(MAX(id), 1) ) FROM public.employee;

Now we need to run these queries. The PostgreSQL wiki has an example of how to run them through a temp file, but for a more integrated version you’d want to do something like this:

package fixtures

import (
	"context"
	"strings"

	"github.com/jmoiron/sqlx"
)

const sequencesQuery = "...the above query..."

// FixSequences resets each sequence to its column's current maximum.
func FixSequences(ctx context.Context, exec sqlx.ExtContext) error {
	var stmts []string
	if err := sqlx.SelectContext(ctx, exec, &stmts, sequencesQuery); err != nil {
		return err
	}

	// The generated statements each end in ';', so run them as one batch.
	_, err := exec.ExecContext(ctx, strings.Join(stmts, ""))
	return err
}
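In a test, this pairs naturally with loading the fixture itself, so each test really does need only a couple of lines. A hypothetical sketch, assuming the LoadFixture helper from earlier and a package-level db handle created in the test setup:

// db is assumed to be a *sqlx.DB initialized in TestMain.
func TestEmployees(t *testing.T) {
	ctx := context.Background()
	// Load the dumped fixture, then repair the sequences it leaves behind.
	if err := LoadFixture(ctx, db, "testdata/my-fixture.sql"); err != nil {
		t.Fatal(err)
	}
	if err := FixSequences(ctx, db); err != nil {
		t.Fatal(err)
	}
	// ...the actual test against the fixture data...
}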

For demonstration purposes, I’ll run those queries manually:

mydb=# SELECT SETVAL('public.company_id_seq', COALESCE(MAX(id), 1) ) FROM public.company;
 setval
--------
      2
(1 row)

mydb=# SELECT SETVAL('public.employee_id_seq', COALESCE(MAX(id), 1) ) FROM public.employee;
 setval
--------
      3
(1 row)

mydb=# INSERT INTO company (name) VALUES ('Company 3');
INSERT 0 1
mydb=# select * from company;
 id |   name
----+-----------
  1 | Company 1
  2 | Company 2
  3 | Company 3
(3 rows)

And it works!

Conclusion

You don’t need fancy YAML or JSON files in order to have nice database fixtures; you can get by with just pg_dump and sed. In reality, for a proper project you’d invoke pg_dump from your language of choice and use any available regex library instead of sed, as sketched below. You also need to remember to update the fixtures each time your schema changes. This is as easy as loading the fixture into a database with the schema version it was created with, running the migrations, and dumping out the migrated data.
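For illustration, a rough Go equivalent of the shell pipeline could look like this. DumpFixture is a made-up name, and the connection details are assumed to come from the usual PG* environment variables:

package fixtures

import (
	"os/exec"
	"regexp"
)

// dropLines mirrors the sed delete rules: SET lines, SQL comments,
// pg_catalog calls and empty lines.
var dropLines = regexp.MustCompile(`(?m)^(SET |--|SELECT pg_catalog\.).*\n|^\n`)
var publicPrefix = regexp.MustCompile(`(?m)^INSERT INTO public\.`)

// DumpFixture runs pg_dump and applies the same cleanups as the sed pipeline.
func DumpFixture(dbName string) ([]byte, error) {
	out, err := exec.Command("pg_dump",
		"--data-only", "--rows-per-insert=1000",
		"--exclude-table=schema_version", dbName).Output()
	if err != nil {
		return nil, err
	}
	out = dropLines.ReplaceAll(out, nil)
	return publicPrefix.ReplaceAll(out, []byte("INSERT INTO ")), nil
}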