Export language-specific format functions #484

inferrinizzard · 2022-10-01T20:40:49Z

moved Format config validation to separate file
curried language-specific format functions
update index to export only specific members

nene · 2022-10-02T06:49:12Z

I would prefer not to have a separate function for formatting each language, but instead export the *Formatter classes and the equivalent of languageFormat() function, which takes them as language parameter, as I originally proposed.

This multi-function approach has a downside that it won't work with a custom Formatter class parameter - for that one would still need to use the old format() function, which will pull in all languages.

I'm even thinking whether it would be good to break backwards-compatibility and no more allow language:string parameter to be used with format(), instead only allowing language:Formatter. But probably the gains aren't really worth it to make such a sweeping breaking change. Better maybe show a deprecation notice.

I propose the following API:

import { formatSql, postgresql } from 'sql-formatter';

formatSql("SELECT *", { language: postgresql });

That includes:

exporting all *Formatter classes using their shorthand names (bigquery, db2, hive, sql, transactsql, ...). I'd like to avoid exporting the -Formatter suffixed class names, as I'd like to be able to change how these dialects are implemented, and I don't really like the current solution of them all extending the base Formatter class. Basically, hiding the implementation details here as much as possible.
exporting a new formatSql() function that takes language: Formatter parameter. That could then internally be also called inside the old format() function.
dropping support for calling format() with a custom Formatter class - formatSql() can be used instead. This functionality is labeled experimental, so we're free to break it.
OPTIONAL: deprecating the format() function. Printing a warning message when it's called, which suggest using formatSql() instead. (Though only printing it once, so when somebody calls format() 100 times, we don't pollute their logs).

nene · 2022-10-02T07:14:03Z

There's one issue though with my proposed API. How to handle the default sql dialect?

I see 2 possible approaches:

`formatSql()` with optional language parameter - defaulting to sql

That would mean always pulling in the default implementation, which would mean we still pull in two dialects instead of just one. It's still better than pulling in all dialects, but not ideal.

`formatSql()` with mandatory language parameter

This means one is forced to explicitly make a choice of a dialect.
This might be a good thing, as over the years we've received many bug reports where the problem has been that the user didn't specify a concrete dialect and then had a problem when their chosen dialect wasn't formatted well.

I like the second option better.

inferrinizzard · 2022-10-02T15:40:46Z

I'm not sure if I love exporting the XFormatter classes since the user could easily just do new XFormatter(...options).format(query) and render the formatX functions useless + they would skip the config validation steps as well

I left the current classic format as is for the rare support for custom Formatter classes (but we could just as easily export languageFormat for that) and it should be used as a last resort , most users should use one of the language-specific exports.

nene · 2022-10-02T16:38:31Z

I'm not sure if I love exporting the XFormatter classes since the user could easily just do new XFormatter(...options).format(query) and render the formatX functions useless + they would skip the config validation steps as well

I wouldn't really worry about that. There's nearly always a possibility to shoot yourself in the foot by using some private API. Like, we currently expose the base Formatter class. If one doesn't use TypeScript, one can easily call any one of the many private methods on that. But why would one really do this?

However, on second thought, there might be further merits on this multiple-functions approach... let me think...

nene · 2022-10-02T17:31:01Z

Thinking out loud...

There's currently a design problem with these Formatter classes. Each one of them defines a different Tokenizer configuration, and the tokenizer is kinda expensive to instantiate. So we want to ensure we only ever instantiate it once for each dialect. Currently this is achieved by caching the tokenizer instance to the each XFormatter class constructor function. That's a pretty awkward solution. Because of this design, the XFormatter classes have to extend from base Formatter class. Though they really just contain configuration settings for each language, they shouldn't need to pull in all the logic in Formatter class. I wonder whether this separate-functions approach would help us untangle this mess...

For start it would be nice to store the configuration of each dialect as just data. For example:

export const postgresql: DialectConfig = {
  tokenizerConfig: {
    reservedClauses,
    reservedKeywords,
    ...
  },
  formatConfig: {
    alwaysDenseOperators: ['::']
  },
};

We could then use this to perform an initialization inside a higher order function (like your languageFormat):

import { postgresql } from "src/languages/postgresql/postgresql.formatter";
import { sqlite } from "src/languages/postgresql/sqlite.formatter";

export function createFormatter(dialect: DialectConfig): FormatFn {
  const tokenizer = new Tokenizer(dialect.tokenizerConfig);

  return (sql: string, options: FormatOptions) => {
    return new Formatter(tokenizer, dialect.formatConfig).format(sql, options);
  };
}

export const formatPostgresql = createFormatter(postgresql);
export const formatSqlite = createFormatter(sqlite);

This code is much nicer then our current tokenizer caching logic. However, it still has a problem - these expensive tokenizer instance are created even when the formatPostgresql is never called. So it's really a no-go.

It would work if we would force user to always call createFormatter on their own:

import { createFormatter, postgresql } from "sql-formatter";

const format = createFormatter(postgresql);

format("SELECT *");

But with that solution we're simply pushing the burden on the user of the library. He has the option to use it well, but has even more opportunities to shoot himself in the foot. Like he could call in a loop createFormatter(sqlite)("Some sql"); which would result in pretty horrible performance. So, really not a solution.

So we really have to perform this caching of tokenizer creation inside the library itself. We could do this by caching the tokenizers inside a static Map object:

class TokenizerFactory {
  private map = new Map<DialectConfig, Tokenizer>();
  create(dialect: DialectConfig) {
    if (!this.map.has(dialect)) {
      this.map.set(dialect, new Tokenizer(dialect));
    }
    return this.map.get(dialect);
  }
}

const tokenizerFactory = new TokenizerFactory();

export const createFormatter = (dialect: DialectConfig) => (sql: string, options: FormatOptions) => {
  return new Formatter(tokenizerFactory.create(dialect)).format(sql, options, dialect.formatConfig);
}

But here we're pretty much back at the start with regards to the public API. We can implement it either as separate functions or as one function that takes DialectConfig object as one parameter.

nene · 2022-10-02T18:00:07Z

Thinking further over which kind of API would be better to expose...

I'm still leaning towards my initial proposal. It seems like a more flexible option to me. Especially when we also decouple these dialect configs from Formatter class (as I described above). Then one can use the same function to accomplish several things:

One can call one of the builtin formatters (though one needs to import two items instead of one).
One can pass in a custom dialect config.
One can import one of the builtin dialect configs, modify it slightly (e.g. to patch a bug) and pass that to the formatter.
One could even publish a separate npm library with a custom dialect config, which wouldn't need to have sql-formatter library as it's dependency.

This added flexibility has of course downsides too:

it provides even more opportunities for one to shoot himself in the foot. That's acceptable for me, as it's a completely opt-in feature. Nobody's forced to use this extra flexibility.
it can bring us extra support burden. Though it can also facilitate easier contribution. So that's 1:1.
There's more opportunities to rely on this custom dialect API and accordingly request this API to be stable. This was the reason I was originally reluctant to expose such custom dialect API at all. I want to be able to freely evolve this part of the code without concerning myself too much with backwards compatibility. I see no reason why we can't continue the same line - treating this API as experimental and giving no guarantees about it not breaking between minor or even patch releases.

nene · 2022-10-05T09:46:26Z

Another concern with the separate function based approach is that it's more inconvenient when trying to parameterize the specific dialect.

With language-parameter you simply need to list the dialects inside a mapping object:

import { formatSql, sqlite, postgresql, transactsql } from "sql-formatter";

const languageMap = {
  sqlite,
  postgresql,
  transactsql,
};

export function format(code: string, lang: keyof typeof languageMap) {
  return format(code, { language: languageMap[lang] });
}

With separate functions, you'll have to list name-function pairs, which is a bit more tedious:

import { formatSqlite, formatPostgresql, formatTransactsql } from "sql-formatter";

const formatterMap = {
  sqlite: formatSqlite,
  postgresql: formatPostgresql,
  transactsql: formatTransactsql,
};

export function format(code: string, lang: keyof typeof formatterMap) {
  return formatterMap[lang](code);
}

It's a minor difference though, and me preferring the first version might just be my personal stylistic choice.

nene · 2022-11-04T17:02:40Z

I'll close this, as it's no more relevant after merging of #511

inferrinizzard added 5 commits October 1, 2022 12:14

move language validation to general format function

8c2b99c

move config validation to validateConfigs.ts

2d6c0a7

add lang specific format for standard sql

23a33b5

add language specific format fns for all dialects

7d40170

update exports

50cd027

inferrinizzard added the enhancement label Oct 1, 2022

inferrinizzard requested a review from nene October 1, 2022 20:40

nene mentioned this pull request Oct 5, 2022

Replace extending of Formatter with DialectOptions object #493

Merged

nene mentioned this pull request Nov 2, 2022

Single dialect import #511

Merged

nene closed this Nov 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Export language-specific format functions #484

Export language-specific format functions #484

inferrinizzard commented Oct 1, 2022

nene commented Oct 2, 2022

nene commented Oct 2, 2022

inferrinizzard commented Oct 2, 2022

nene commented Oct 2, 2022

nene commented Oct 2, 2022

nene commented Oct 2, 2022

nene commented Oct 5, 2022

nene commented Nov 4, 2022

Export language-specific format functions #484

Export language-specific format functions #484

Conversation

inferrinizzard commented Oct 1, 2022

nene commented Oct 2, 2022

nene commented Oct 2, 2022

formatSql() with optional language parameter - defaulting to sql

formatSql() with mandatory language parameter

inferrinizzard commented Oct 2, 2022

nene commented Oct 2, 2022

nene commented Oct 2, 2022

nene commented Oct 2, 2022

nene commented Oct 5, 2022

nene commented Nov 4, 2022

`formatSql()` with optional language parameter - defaulting to sql

`formatSql()` with mandatory language parameter