Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option for specifying case for identifiers #663

Merged
58 changes: 58 additions & 0 deletions docs/identifierCase.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
# identifierCase

Converts identifiers to upper or lowercase.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a good place to specify exactly what classifies as an identifier and what doesn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a note in commit afe5b48 - is it okay?


Note: An identifier is a name of a SQL object.
There are two types of SQL identifiers: ordinary identifiers and quoted identifiers.
Only ordinary identifiers are subject to be converted.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better, but according to this description I would expect the case of variables and parameters to also be converted, because they too are names of SQL objects.

In general I see all these as different kinds of identifiers:

  • schema, table and column names
  • variable names
  • function names
  • parameter names

Variables are a particularly tricky case. Some dialects like MySQL have variables with a prefix like @foo, and therefore we can easily distinguish them. Other dialects like PostgreSQL also have variables, but there is no special prefix, an idientifier foo could refer to a variable, or it could refer to a table or column.

Copy link
Contributor Author

@chrjorgensen chrjorgensen Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I mixed up variables and parameters... by variables, do you mean SQL variables defined by CREATE VARIABLE (in DB2)? If yes, then I agree that they should also be considered an identifier and converted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, variables like that.


## Options

- `"preserve"` (default) preserves the original case.
- `"upper"` converts to uppercase.
- `"lower"` converts to lowercase.

### preserve

```
select
count(a.Column1),
max(a.Column2 + a.Column3),
a.Column4 AS myCol
from
Table1 as a
where
Column6
and Column7
group by Column4
```

### upper

```
select
count(A.COLUMN1),
max(A.COLUMN2 + A.COLUMN3),
A.COLUMN4 AS MYCOL
from
TABLE1 as A
where
COLUMN6
and COLUMN7
group by COLUMN4
```

### lower

```
select
count(a.column1),
max(a.column2 + a.column3),
a.column4 AS mycol
from
table1 as a
where
column6
and column7
group by column4
```
3 changes: 3 additions & 0 deletions src/FormatOptions.ts
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@ export type IndentStyle = 'standard' | 'tabularLeft' | 'tabularRight';

export type KeywordCase = 'preserve' | 'upper' | 'lower';

export type IdentifierCase = 'preserve' | 'upper' | 'lower';

export type CommaPosition = 'before' | 'after' | 'tabular';

export type LogicalOperatorNewline = 'before' | 'after';
Expand All @@ -14,6 +16,7 @@ export interface FormatOptions {
tabWidth: number;
useTabs: boolean;
keywordCase: KeywordCase;
identifierCase: IdentifierCase;
indentStyle: IndentStyle;
logicalOperatorNewline: LogicalOperatorNewline;
tabulateAlias: boolean;
Expand Down
21 changes: 19 additions & 2 deletions src/formatter/ExpressionFormatter.ts
Original file line number Diff line number Diff line change
Expand Up @@ -145,7 +145,9 @@ export default class ExpressionFormatter {
private formatArraySubscript(node: ArraySubscriptNode) {
this.withComments(node.array, () => {
this.layout.add(
node.array.type === NodeType.keyword ? this.showKw(node.array) : node.array.text
node.array.type === NodeType.keyword
? this.showKw(node.array)
: this.showIdentifier(node.array)
);
});
this.formatNode(node.parenthesis);
Expand Down Expand Up @@ -286,7 +288,7 @@ export default class ExpressionFormatter {
}

private formatIdentifier(node: IdentifierNode) {
this.layout.add(node.text, WS.SPACE);
this.layout.add(this.showIdentifier(node), WS.SPACE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to plain identifiers we also have several identifier-like things:

  • variables
  • parameters

these usually behave just like identifiers and the only thing distinguishing them from normal identifiers is some prefix like @myvar or :myparam.

As the parser currently treats variables as identifiers, these too end up upper/lowercased. But parameters are kepts separate by the parser, so the case of these doesn't change. Should parameters also change together with identifiers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My change is only for unquoted identifiers - I think variables and parameters are outside the scope and should be handled in another PR...

}

private formatParameter(node: ParameterNode) {
Expand Down Expand Up @@ -506,4 +508,19 @@ export default class ExpressionFormatter {
return node.text.toLowerCase();
}
}

private showIdentifier(node: IdentifierNode): string {
if (node.tokenType === TokenType.IDENTIFIER || node.tokenType === TokenType.ARRAY_IDENTIFIER) {
switch (this.cfg.identifierCase) {
case 'preserve':
return node.text;
case 'upper':
return node.text.toUpperCase();
case 'lower':
return node.text.toLowerCase();
}
} else {
return node.text;
}
}
}
1 change: 1 addition & 0 deletions src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ export type {
export type {
IndentStyle,
KeywordCase,
IdentifierCase,
CommaPosition,
LogicalOperatorNewline,
FormatOptions,
Expand Down
1 change: 1 addition & 0 deletions src/parser/ast.ts
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,7 @@ export interface PropertyAccessNode extends BaseNode {

export interface IdentifierNode extends BaseNode {
type: NodeType.identifier;
tokenType: TokenType;
text: string;
}

Expand Down
12 changes: 9 additions & 3 deletions src/parser/grammar.ne
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
@preprocessor typescript
@{%
import LexerAdapter from './LexerAdapter.js';
import { NodeType, AstNode, CommentNode, KeywordNode } from './ast.js';
import { NodeType, AstNode, CommentNode, KeywordNode, IdentifierNode } from './ast.js';
import { Token, TokenType } from '../lexer/token.js';

// The lexer here is only to provide the has() method,
Expand All @@ -16,6 +16,12 @@ const lexer = new LexerAdapter(chunk => []);
// which otherwise produce single element nested inside two arrays
const unwrap = <T>([[el]]: T[][]): T => el;

const toIdentifierNode = (token: Token): IdentifierNode => ({
type: NodeType.identifier,
tokenType: token.type,
text: token.text,
});

const toKeywordNode = (token: Token): KeywordNode => ({
type: NodeType.keyword,
tokenType: token.type,
Expand Down Expand Up @@ -202,7 +208,7 @@ atomic_expression ->
array_subscript -> %ARRAY_IDENTIFIER _ square_brackets {%
([arrayToken, _, brackets]) => ({
type: NodeType.array_subscript,
array: addComments({ type: NodeType.identifier, text: arrayToken.text}, { trailing: _ }),
array: addComments({ type: NodeType.identifier, tokenType: TokenType.ARRAY_IDENTIFIER, text: arrayToken.text}, { trailing: _ }),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, currently ARRAY_IDENTIFIER tokens are treated differently from normal identifiers. This results in the following SQL:

select foo, foo[1] from tbl

being formatted as:

select
  FOO,
  foo[1]
from
  TBL

That is, when the foo column is used normally, it gets uppercased, but when used together with array-accessor operator it is not uppercased.

This difference between normal identifiers and array-identifiers is really just an internal quirk of SQL Formatter implementation. We should treat both as identifiers and change the case of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array identifiers are now being converted together with ordinary identifiers - see commit 0bb7d21

parenthesis: brackets,
})
%}
Expand Down Expand Up @@ -309,7 +315,7 @@ operator -> ( %OPERATOR ) {% ([[token]]) => ({ type: NodeType.operator, text: to
identifier ->
( %IDENTIFIER
| %QUOTED_IDENTIFIER
| %VARIABLE ) {% ([[token]]) => ({ type: NodeType.identifier, text: token.text }) %}
| %VARIABLE ) {% ([[token]]) => toIdentifierNode(token) %}

parameter ->
( %NAMED_PARAMETER
Expand Down
1 change: 1 addition & 0 deletions src/sqlFormatter.ts
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ const defaultOptions: FormatOptions = {
tabWidth: 2,
useTabs: false,
keywordCase: 'preserve',
identifierCase: 'preserve',
indentStyle: 'standard',
logicalOperatorNewline: 'before',
tabulateAlias: false,
Expand Down
2 changes: 2 additions & 0 deletions test/behavesLikeSqlFormatter.ts
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ import supportsTabWidth from './options/tabWidth.js';
import supportsUseTabs from './options/useTabs.js';
import supportsExpressionWidth from './options/expressionWidth.js';
import supportsKeywordCase from './options/keywordCase.js';
import supportsIdentifierCase from './options/identifierCase.js';
import supportsIndentStyle from './options/indentStyle.js';
import supportsCommaPosition from './options/commaPosition.js';
import supportsLinesBetweenQueries from './options/linesBetweenQueries.js';
Expand All @@ -30,6 +31,7 @@ export default function behavesLikeSqlFormatter(format: FormatFn) {
supportsTabWidth(format);
supportsUseTabs(format);
supportsKeywordCase(format);
supportsIdentifierCase(format);
supportsIndentStyle(format);
supportsLinesBetweenQueries(format);
supportsExpressionWidth(format);
Expand Down
44 changes: 44 additions & 0 deletions test/features/arrayAndMapAccessors.ts
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,18 @@ export default function supportsArrayAndMapAccessors(format: FormatFn) {
`);
});

it('supports square brackets for map lookup - uppercase', () => {
const result = format(`SELECT Alpha['a'], Beta['gamma'].zeTa, yotA['foo.bar-baz'];`, {
identifierCase: 'upper',
});
expect(result).toBe(dedent`
SELECT
ALPHA['a'],
BETA['gamma'].ZETA,
YOTA['foo.bar-baz'];
`);
});

it('supports namespaced array identifiers', () => {
const result = format(`SELECT foo.coalesce['blah'];`);
expect(result).toBe(dedent`
Expand All @@ -46,4 +58,36 @@ export default function supportsArrayAndMapAccessors(format: FormatFn) {
foo./* comment */ arr[1];
`);
});

it('supports namespaced array identifiers in uppercase', () => {
const result = format(`SELECT Foo.Coalesce['Blah'];`, { identifierCase: 'upper' });
expect(result).toBe(dedent`
SELECT
FOO.COALESCE['Blah'];
`);
});

it('supports namespaced array identifiers in lowercase', () => {
const result = format(`SELECT Foo.Coalesce['Blah'];`, { identifierCase: 'lower' });
expect(result).toBe(dedent`
SELECT
foo.coalesce['Blah'];
`);
});

it('formats namespaced array accessor with comment in-between in uppercase', () => {
const result = format(`SELECT foo./* comment */arr[1];`, { identifierCase: 'upper' });
expect(result).toBe(dedent`
SELECT
FOO./* comment */ ARR[1];
`);
});

it('formats namespaced array accessor with comment in-between in lowercase', () => {
const result = format(`SELECT Foo./* comment */Arr[1];`, { identifierCase: 'lower' });
expect(result).toBe(dedent`
SELECT
foo./* comment */ arr[1];
`);
});
}
81 changes: 81 additions & 0 deletions test/options/identifierCase.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
import dedent from 'dedent-js';

import { FormatFn } from '../../src/sqlFormatter.js';

export default function supportsIdentifierCase(format: FormatFn) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are likely more tests needed here to cover:

  • that all identifiers in expression like foo.bar.baz get uppercased,
  • variables
  • parameters
  • array names (treated differently by lexer)
  • various types of identifier quoting styles

As most of these things depend on the dialect, it might be better to just add additional tests to places like supportsIdentifiers().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test for multi-part identifiers in commit 558efa3

it('preserves identifier case by default', () => {
const result = format(
dedent`
select Abc, 'mytext' as MyText from tBl1 left join Tbl2 where colA > 1 and colB = 3`
);
expect(result).toBe(dedent`
select
Abc,
'mytext' as MyText
from
tBl1
left join Tbl2
where
colA > 1
and colB = 3
`);
});

it('converts identifiers to uppercase', () => {
const result = format(
dedent`
select Abc, 'mytext' as MyText from tBl1 left join Tbl2 where colA > 1 and colB = 3`,
{ identifierCase: 'upper' }
);
expect(result).toBe(dedent`
select
ABC,
'mytext' as MYTEXT
from
TBL1
left join TBL2
where
COLA > 1
and COLB = 3
`);
});

it('converts identifiers to lowercase', () => {
const result = format(
dedent`
select Abc, 'mytext' as MyText from tBl1 left join Tbl2 where colA > 1 and colB = 3`,
{ identifierCase: 'lower' }
);
expect(result).toBe(dedent`
select
abc,
'mytext' as mytext
from
tbl1
left join tbl2
where
cola > 1
and colb = 3
`);
});

it('does not uppercase quoted identifiers', () => {
const result = format(`select "abc" as foo`, {
identifierCase: 'upper',
});
expect(result).toBe(dedent`
select
"abc" as FOO
`);
});

it('converts multi-part identifiers to uppercase', () => {
const result = format('select Abc from Part1.Part2.Part3', { identifierCase: 'upper' });
expect(result).toBe(dedent`
select
ABC
from
PART1.PART2.PART3
`);
});
}
Loading
Loading