Skip to content

Latest commit

 

History

History
634 lines (482 loc) · 21.4 KB

sql-indent.org

File metadata and controls

634 lines (482 loc) · 21.4 KB

sql-indent.el – syntax based indentation for SQL files for GNU Emacs

NOTE This file is formatted as an Emacs Org file. If you open it in GNU Emacs, you will be able to open-close sections of the file, which will make navigation easier.

Overview

sql-indent.el is a GNU Emacs minor mode which adds support for syntax-based indentation when editing SQL code: TAB indents the current line based on the syntax of the SQL code on previous lines. This works like the indentation for C and C++ code.

The package also defines align rules so that the align function works for SQL statements, see sqlind-align-rules for which rules are defined. This can be used to align multiple lines around equal signs or “as” statements. Here is an example of alignment rules:

update my_table
   set col1_has_a_long_name = value1,
       col2_is_short        = value2,
       col3_med             = v2,
       c4                   = v5
 where cond1 is not null;

select long_colum as lcol,
       scol       as short_column,
       mcolu      as mcol,
  from my_table;

To use this feature, select the region you want to align and type:

M-x align RET

sqlind-minor-mode together with the align rules can assist in writing tidy SQL code or formatting existing SQL code. The indentation rules are customizable, so you can adapt it to match your own indentation preferences.

Installation

To install this package, open the file sql-indent.el in Emacs and type

M-x package-install-from-buffer RET

The syntax-based indentation of SQL code can be turned ON/OFF at any time by enabling or disabling sqlind-minor-mode:

M-x sqlind-minor-mode RET

To enable syntax-based indentation for every SQL buffer, you can add sqlind-minor-mode to sql-mode-hook. First, bring up the customization buffer using the command:

M-x customize-variable RET sql-mode-hook RET

Than, click on the “INS” button to add a new entry and put “sqlind-minor-mode” in the text field.

Customization

The sql-indent.el package allows customizing the indentation rules to suit your personal preferences or site specific coding styles. To create a set of customized rules, you will need to have basic familiarity with how indentation works. See also the “A Simple Example” section below and the sql-indent-left.el file, which demonstrate how to set up custom indentation rules. The sections below contain the rest of the details.

The indentation process happens in two separate phases: first syntactic information is determined about the line, than the line is indented based on that syntactic information. The syntactic parse is not expected to change often, since it deals with the structure of the SQL code, however, indentation is a personal preference, and can be easily customized.

Two variables control the way code is indented: sqlind-basic-offset and sqlind-indentation-offsets-alist. To customize these variables, you need to create a function that sets custom values and add it to sql-mode-hook.

A Simple Example

The default indentation rules will align to the right all the keywords in a SELECT statement, like this:

select c1, c2
  from t1
 where c3 = 2

If you prefer to have them aligned to the left, like this:

select c1, c2
from t1
where c3 = 2

You can add the following code to your init file:

(require 'sql-indent)

;; Update indentation rules, select, insert, delete and update keywords
;; are aligned with the clause start

(defvar my-sql-indentation-offsets-alist
  `((select-clause 0)
    (insert-clause 0)
    (delete-clause 0)
    (update-clause 0)
    ,@sqlind-default-indentation-offsets-alist))

(add-hook 'sqlind-minor-mode-hook
    (lambda ()
       (setq sqlind-indentation-offsets-alist
             my-sql-indentation-offsets-alist)))

Customization Basics

To customize indentation, you will need to provide a new value for the sqlind-indentation-offsets-alist variable. The variable is made buffer local each time it is set, so you need to set it inside the sql-mode-hook. The variable specifies how each syntactic symbol should be indented. Since only a few symbols need to be updated, the usual way to update it is to “extend” the value of sqlind-default-indentation-offsets-alist, like so:

(defvar my-sql-indentation-offsets-alist
  `( ;; put new syntactic symbols here, and add the default ones at the end.
     ;; If there is no value specified for a syntactic symbol, the default
     ;; will be picked up.
    ,@sqlind-default-indentation-offsets-alist))

;; Arrange for the new indentation offset to be set up for each SQL buffer.
(add-hook 'sqlind-minor-mode-hook
          (lambda ()
            (setq sqlind-indentation-offsets-alist
                  my-sql-indentation-offsets-alist)))

The simplest way to adjust the indentation is to explore the syntactic information using sqlind-show-syntax-of-line. To use it, move the cursor to the line you would like to indent and type:

M-x sqlind-show-syntax-of-line RET

A message like the one below will be shown in the messages buffer:

((select-clause . 743) (statement-continuation . 743))

The first symbol displayed is the syntactic symbol used for indentation, in this case select-clause. The syntactic symbols are described in a section below, however, for now, this is the symbol that will need to be updated in sqlind-indentation-offsets-alist. The number next to it represents the anchor, or reference position in the buffer where the current statement starts. The anchor and is useful if you need to write your own indentation functions.

To customize indentation for this type of statement, add an entry in the sqlind-indentation-offsets-alist, for the syntactic symbol shown, with information about how it should be indented. This information is a list containing indentation control items (these are described below).

For example, to indent keyword in SELECT clauses at the same level as the keyword itself, we use a number which is added to the indentation level of the anchor, in this case, 0:

(select-clause 0)

To indent it at sqlind-basic-offset plus one more space, use:

(select-clause + 1)

To right-justify the keyword w.r.t the SELECT keyword, use:

(select-clause sqlind-right-justify-clause)

The default value for sqlind-indentation-offsets-alist contains many examples for indentation setup rules.

Indentation control items

sqlind-calculate-indentation is the function that calculates the indentation offset to use, based on the contents of sqlind-indentation-offsets-alist. The indentation offset starts with the indentation column of the ANCHOR point and it is adjusted based on the following items:

  • a NUMBER – the NUMBER will be added to the indentation offset.
  • + – the current indentation offset is incremented by sqlind-basic-offset
  • ++ – the current indentation offset is indentation by 2 * sqlind-basic-offset
  • - – the current indentation offset is decremented by sqlind-basic-offset
  • -- – the current indentation offset is decremented by 2 * sqlind-basic-offset
  • a FUNCTION – the syntax and current indentation offset is passed to the function and its result is used as the new indentation offset. This can be used to further customize indentation.

Indentation Helper Functions

The following helper functions are available as part of the package and can be used as the FUNCTION part in the sqlind-indentation-offsets-alist

sqlind-use-anchor-indentation

discard the current offset and returns the indentation column of the ANCHOR

sqlind-lineup-to-anchor

discard the current offset and returns the column of the anchor point, which may be different than the indentation column of the anchor point.

sqlind-use-previous-line-indentation

discard the current offset and returns the indentation column of the previous line

sqlind-lineup-open-paren-to-anchor

if the line starts with an open parenthesis, discard the current offset and return the column of the anchor point.

sqlind-lone-semicolon

if the line contains a single semicolon ‘;’, use the value of sqlind-use-anchor-indentation

sqlind-adjust-operator

if the line starts with an arithmetic operator (like + , -, or ||), line it up so that the right hand operand lines up with the left hand operand of the previous line. For example, it will indent the || operator like this:

select col1, col2
          || col3 as composed_column, -- align col3 with col2
       col4
    || col5 as composed_column2
from   my_table
where  cond1 = 1
and    cond2 = 2;

sqlind-left-justify-logical-operator

If the line starts with a logic operator (AND, OR NOT), line the operator with the start of the WHERE clause. This rule should be added to the in-select-clause syntax after the sqlind-lineup-to-clause-end rule.

sqlind-right-justify-logical-operator

If the line starts with a logic operator (AND, OR NOT), line the operator with the end of the WHERE clause. This rule should be added to the in-select-clause syntax.

select *
  from table
 where a = b
   and c = d; -- AND clause sits under the where clause

sqlind-adjust-comma

if the line starts with a comma, adjust the current offset so that the line is indented to the first word character. For example, if added to a select-column syntax indentation rule, it will indent as follows:

select col1
   ,   col2 -- align "col2" with "col1"
from my_table;

sqlind-lineup-into-nested-statement

discard the current offset and return the column of the first word inside a nested statement. This rule should be added to nested-statement-continuation syntax indentation rule, and will indent as follows:

(    a,
     b  -- b is aligned with a
)

More Indentation Helper Functions

The following function contain indentation code specific to various SQL statements. Have a look at their doc strings for what they do:

  • sqlind-indent-comment-start, sqlind-indent-comment-continuation
  • sqlind-indent-select-column
  • sqlind-indent-select-table
  • sqlind-lineup-to-clause-end
  • sqlind-right-justify-clause
  • sqlind-lineup-joins-to-anchor

Syntactic Symbols

The SQL parsing code returns a syntax definition (either a symbol or a list) and an anchor point, which is a buffer position. The syntax symbols can be used to define how to indent each line. The following syntax symbols are defined for SQL code:

  • (syntax-error MESSAGE START END) – this is returned when the parse failed. MESSAGE is an informative message, START and END are buffer locations denoting the problematic region. ANCHOR is undefined for this syntax info
  • in-comment – line is inside a multi line comment, ANCHOR is the start of the comment.
  • comment-start – line starts with a comment. ANCHOR is the start of the enclosing block.
  • in-string – line is inside a string, ANCHOR denotes the start of the string.
  • toplevel – line is at toplevel (not inside any programming construct). ANCHOR is usually (point-min).
  • (in-block BLOCK-KIND LABEL) – line is inside a block construct. BLOCK-KIND (a symbol) is the actual block type and can be one of “if”, “case”, “exception”, “loop” etc. If the block is labeled, LABEL contains the label. ANCHOR is the start of the block.
  • (in-begin-block KIND LABEL) – line is inside a block started by a begin statement. KIND (a symbol) is “toplevel-block” for a begin at toplevel, “defun” for a begin that starts the body of a procedure or function, "package" for a begin that starts the body of a package, nil for a begin that is none of the previous. For a “defun” or “package”, LABEL is the name of the procedure, function or package, for the other block types LABEL contains the block label, or the empty string if the block has no label. ANCHOR is the start of the block.
  • (block-start KIND) – line begins with a statement that starts a block. KIND (a symbol) can be one of “then”, “else” or “loop”. ANCHOR is the reference point for the block start (the corresponding if, case, etc).
  • (block-end KIND LABEL) – the line contains an end statement. KIND (a symbol) is the type of block we are closing, LABEL (a string) is the block label (or procedure name for an end defun).
  • declare-statement – line is after a declare keyword, but before the begin. ANCHOR is the start of the declare statement.
  • (package NAME) – line is inside a package definition. NAME is the name of the package, ANCHOR is the start of the package.
  • (package-body NAME) – line is inside a package body. NAME is the name of the package, ANCHOR is the start of the package body.
  • (create-statement WHAT NAME) – line is inside a CREATE statement (other than create procedure or function). WHAT is the thing being created, NAME is its name. ANCHOR is the start of the create statement.
  • (defun-start NAME) – line is inside a procedure of function definition but before the begin block that starts the body. NAME is the name of the procedure/function, ANCHOR is the start of the procedure/function definition.

The following SYNTAX-es are for SQL statements. For all of them ANCHOR points to the start of a statement itself.

  • labeled-statement-start – line is just after a label.
  • statement-continuation – line is inside a statement which starts on a previous line.
  • nested-statement-open – line is just inside an opening bracket, but the actual bracket is on a previous line.
  • nested-statement-continuation – line is inside an opening bracket, but not the first element after the bracket.
  • nested-statement-close line is inside an opening bracket and the line contains the closing bracket as the first character.

The following SYNTAX-es are for statements which are SQL code (DML statements). They are specializations on the previous statement syntaxes and for all of them a previous generic statement syntax is present earlier in the SYNTAX list. Unless otherwise specified, ANCHOR points to the start of the clause (select, from, where, etc) in which the current point is.

  • with-clause – line is inside a WITH clause, but before the main SELECT clause.
  • with-clause-cte – line is inside a with clause before a CTE (common table expression) declaration
  • with-clause-cte-cont – line is inside a with clause before a CTE definition
  • case-clause – line is on a CASE expression (WHEN or END clauses). ANCHOR is the start of the CASE expression.
  • case-clause-item – line is on a CASE expression (THEN and ELSE clauses). ANCHOR is the position of the case clause.
  • case-clause-item-cont – line is on a CASE expression but not on one of the CASE sub-keywords. ANCHOR points to the case keyword that this line is a continuation of.
  • select-clause – line is inside a select statement, right before one of its clauses (from, where, order by, etc).
  • select-column – line is inside the select column section, after a full column was defined (and a new column definition is about to start).
  • select-column-continuation – line is inside the select column section, but in the middle of a column definition. The defined column starts on a previous like. Note that ANCHOR still points to the start of the select statement itself.
  • select-join-condition – line is right before or just after the ON clause for an INNER, LEFT or RIGHT join. ANCHOR points to the join statement for which the ON is defined.
  • select-table – line is inside the from clause, just after a table was defined and a new one is about to start.
  • select-table-continuation – line is inside the from clause, inside a table definition which starts on a previous line. ANCHOR still points to the start of the table definition.
  • (in-select-clause CLAUSE) – line is inside the select CLAUSE, which can be “where”, “order by”, “group by” or “having”. Note that CLAUSE can never be “select” and “from”, because we have special syntaxes inside those clauses.
  • insert-clause – line is inside an insert statement, right before one of its clauses (values, select).
  • (in-insert-clause CLAUSE) – line is inside the insert CLAUSE, which can be “insert into” or “values”.
  • delete-clause – line is inside a delete statement right before one of its clauses.
  • (in-delete-clause CLAUSE) – line is inside a delete CLAUSE, which can be “delete from” or “where”.
  • update-clause – line is inside an update statement right before one of its clauses.
  • (in-update-clause CLAUSE) – line is inside an update CLAUSE, which can be “update”, “set” or “where”

Limitations

The sql-indent package does not contain a full SQL parser, instead it relies on various heuristics to determine the context of each line in a SQL program. Relying on heuristics means that sometimes valid SQL code is not detected correctly, and therefore the indentation will be wrong.

This section contains some of the known cases that are incorrectly detected, and provides some workarounds for them.

Parsing expressions

There is no support for parsing SQL expressions, so if an expression is broken over several lines, sql-indent.el will consider all lines to be statement-continuation lines. The exception is that bracketed expressions are identified correctly so they can be used for indentation.

The examples below summarize what is supported and what is not, as well as the workarounds:

-- SUPPORTED: case expression immediately after assignment
var := case ind
       when 1 then 'Guy'
       when 2 then 'Abc'
       when 3 then 'Def'
       else 'World'
       end case;

-- NOT SUPPORTED: any complex expression involving a case expression.  entire
-- expression is a 'statement-continuation
var := 'abc'
  || case ind
  when 1 then 'Guy'
  when 2 then 'Abc'
  when 3 then 'Def'
  else 'World'
  end case;

-- WORKAROUND: use brackets instead
var := 'abc'
  || (case ind
      when 1 then 'Guy'
      when 2 then 'Abc'
      when 3 then 'Def'
      else 'World'
      end case);

-- SUPPORTED: case expression as select column
select col1,
       case ind
       when 1 then 'Guy'
       when 2 then 'Abc'
       when 3 then 'Def'
       else 'World'
       end case,
       col2,
  from some_table;

-- NOT SUPPORTED: any complex expression involving a case expression in a
-- select column.  Entire column is a 'select-column-continuation
select col1,
       'abc' || case ind
         when 1 then 'Guy'
         when 2 then 'Abc'
         when 3 then 'Def'
         else 'World'
         end case,
       col2,
  from some_table;

-- WORKAROUND: use brackets instead
select col1,
       'abc' || (case ind
                 when 1 then 'Guy'
                 when 2 then 'Abc'
                 when 3 then 'Def'
                 else 'World'
                 end case),
       col2,
  from some_table;

DECLARE statements in Postgres

The DECLARE statement in Postgres can start a block of declarations or declare a single item. Unfortunately, the sql-indent package is not always able to differentiate between the two. Curently, DECLARE blocks are only recognized at the start of another enclosing block, such as $$, BEGIN, THEN or ELSE, but they can occur on other situations. The workaround is to enclose the DECLARE block inside a BEGIN/END block or to use individual DECLARE statements.

For more info see #92

DECLARE blocks are not recognized when the follow normal statements, sql-indent considers them single statements instead. In the example below, DECLARE is considered a statement, and the local_b declaration is anchored off the previous BEGIN statement:

-- NOT SUPPORTED: `local_b` is not recognized as a declaration
create function less_than(a text, b text) returns boolean as $$
  begin
    raise debug 'less_than(%, %)', a, b;
    declare
      local_a text := a;
    local_b text := b;
    begin
      return local_a < local_b;
    end;
  end;
end;
$$ language plpgsql;

The workaround is to either surround the DECLARE block with a BEGIN/END statement, or to prefix each variable declaration with DECLARE, as in the two examples below:

-- WORKAROUND 1: surround the DECLARE/BEGIN/END with another BEGIN/END block
create function less_than(a text, b text) returns boolean as $$
  begin
    raise debug 'less_than(%, %)', a, b;
    begin
      declare
        local_a text := a;
        local_b text := b;
      begin
        return local_a < local_b;
      end;
    end;
  end;
end;
$$ language plpgsql;

-- WORKAROUND 2: declare each variable individually
create function less_than(a text, b text) returns boolean as $$
  begin
    raise debug 'less_than(%, %)', a, b;
    declare
      local_a text := a;
    declare
      local_b text := b;
    begin
      return local_a < local_b;
    end;
  end;
end;
$$ language plpgsql;

.