Formatting Standards

These guidelines should be followed when writing SQL.

Write keywords in ALL CAPS

SELECT      id,
            name,
            email
FROM        app_users
WHERE       is_active = TRUE;

Create vertical alignment using tabs

There should be a minimum of one tab after the longest command. Other items may need to be tabbed more than that to maintain vertical alignment.

SELECT      u.age,
            COUNT(u.id) AS num_of_users
FROM        app_users AS u
WHERE       u.is_active = TRUE
GROUP BY    u.age
HAVING      COUNT(u.id) > 50
ORDER BY    u.age;

Nest JOIN conditions under the table name

SELECT      u.age,
            COUNT(u.id) AS num_of_users
FROM        app_users AS u
JOIN        address AS a
            ON  u.id = a.user_id AND
                a.is_shipping = TRUE AND
                a.city = 'Chicago'
WHERE       u.is_active = TRUE AND
            u.age > 25
GROUP BY    u.age;

Explicitly list fields

When selecting fields it's important to only list the fields you need, especially when joining tables. Always reference the full table name (app_users.id) or use an alias (u.id) as show in the example below. It's possible for a field with the same name to exist in multiple tables. If your query only uses one table, there is no need to use an alias.

SELECT      u.id,
            u.name,
            u.email,
            u.age,
            a.city,
            a.zip_code
FROM        app_users AS u
JOIN        address AS a
            ON  u.id = a.user_id AND
                a.is_shipping = TRUE
WHERE       u.is_active = TRUE;

There is an exception to this rule when using a CTE. Since you would have already listed the fields in the CTE definition, using the * is ok.

WITH active_users AS (
    SELECT      u.id,
                u.name,
                u.email,
                u.age,
                a.city,
                a.zip_code
    FROM        app_users AS u
    JOIN        address AS a
                ON  u.id = a.user_id AND
                    a.is_shipping = TRUE
    WHERE       u.is_active = TRUE
)

SELECT      *
FROM        active_users;

Using Stored Procedures

When your project favors raw SQL over an ORM, your code can quickly become littered with queries in string variables. A better approach is to organize your queries as stored procedures in the database. This results in a clean separation between application code and database operations, as well as an increase in performance because the database server can store pre-compiled functions.

The syntax for creating stored procedures is very similar across all flavors of SQL, however each will have their own specific differences. Here are links to documentation for the three most common databases we encounter:

When favoring this approach over an ORM, you will still track database migrations. This process will be a little more manual, but will ultimately grant you fine-grained control over all database schema changes during a project. FlywayDB is our preferred migration tool for this approach.

With FlywayDB, you will maintain a top level directory in your project repo: /sql. Each time you need to create or alter a database entity, you will add a new SQL file containing those instructions in this directory. The FlywayDB CLI will execute each migration file in order based on the the tool’s file naming requirements, as well as maintain migration history in a schema_version database table.

Examples

The following is an example of creating a simple stored procedure in PostgreSQL:

CREATE OR REPLACE FUNCTION sp_get_employees_for_city(city VARCHAR)
RETURNS TABLE(id INTEGER, name VARCHAR) AS $$
    BEGIN
        RETURN      QUERY
        SELECT      e.id,
                    e.name
        FROM        employees AS e
        WHERE       lower(e.city) = lower(city)
        ORDER BY    e.name;
    END; $$
LANGUAGE PLPGSQL;

The following is an example of calling the stored procedure from Python:

import psycopg2


conn = psycopg2.connect(database="test", user="postgres", password="secret")

cur = conn.cursor()
cur.callproc('sp_get_employees_for_city', ['Chicago'])

rows = cur.fetchall()
for r in rows:
    print(r)