Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export language-specific format functions #484

Closed
wants to merge 5 commits into from

Conversation

inferrinizzard
Copy link
Collaborator

  • moved Format config validation to separate file
  • curried language-specific format functions
  • update index to export only specific members

@nene
Copy link
Collaborator

nene commented Oct 2, 2022

I would prefer not to have a separate function for formatting each language, but instead export the *Formatter classes and the equivalent of languageFormat() function, which takes them as language parameter, as I originally proposed.

This multi-function approach has a downside that it won't work with a custom Formatter class parameter - for that one would still need to use the old format() function, which will pull in all languages.

I'm even thinking whether it would be good to break backwards-compatibility and no more allow language:string parameter to be used with format(), instead only allowing language:Formatter. But probably the gains aren't really worth it to make such a sweeping breaking change. Better maybe show a deprecation notice.

I propose the following API:

import { formatSql, postgresql } from 'sql-formatter';

formatSql("SELECT *", { language: postgresql });

That includes:

  • exporting all *Formatter classes using their shorthand names (bigquery, db2, hive, sql, transactsql, ...). I'd like to avoid exporting the -Formatter suffixed class names, as I'd like to be able to change how these dialects are implemented, and I don't really like the current solution of them all extending the base Formatter class. Basically, hiding the implementation details here as much as possible.
  • exporting a new formatSql() function that takes language: Formatter parameter. That could then internally be also called inside the old format() function.
  • dropping support for calling format() with a custom Formatter class - formatSql() can be used instead. This functionality is labeled experimental, so we're free to break it.
  • OPTIONAL: deprecating the format() function. Printing a warning message when it's called, which suggest using formatSql() instead. (Though only printing it once, so when somebody calls format() 100 times, we don't pollute their logs).

@nene
Copy link
Collaborator

nene commented Oct 2, 2022

There's one issue though with my proposed API. How to handle the default sql dialect?

I see 2 possible approaches:

formatSql() with optional language parameter - defaulting to sql

  • That would mean always pulling in the default implementation, which would mean we still pull in two dialects instead of just one. It's still better than pulling in all dialects, but not ideal.

formatSql() with mandatory language parameter

  • This means one is forced to explicitly make a choice of a dialect.
  • This might be a good thing, as over the years we've received many bug reports where the problem has been that the user didn't specify a concrete dialect and then had a problem when their chosen dialect wasn't formatted well.

I like the second option better.

@inferrinizzard
Copy link
Collaborator Author

I'm not sure if I love exporting the XFormatter classes since the user could easily just do new XFormatter(...options).format(query) and render the formatX functions useless + they would skip the config validation steps as well

I left the current classic format as is for the rare support for custom Formatter classes (but we could just as easily export languageFormat for that) and it should be used as a last resort , most users should use one of the language-specific exports.

@nene
Copy link
Collaborator

nene commented Oct 2, 2022

I'm not sure if I love exporting the XFormatter classes since the user could easily just do new XFormatter(...options).format(query) and render the formatX functions useless + they would skip the config validation steps as well

I wouldn't really worry about that. There's nearly always a possibility to shoot yourself in the foot by using some private API. Like, we currently expose the base Formatter class. If one doesn't use TypeScript, one can easily call any one of the many private methods on that. But why would one really do this?

However, on second thought, there might be further merits on this multiple-functions approach... let me think...

@nene
Copy link
Collaborator

nene commented Oct 2, 2022

Thinking out loud...

There's currently a design problem with these Formatter classes. Each one of them defines a different Tokenizer configuration, and the tokenizer is kinda expensive to instantiate. So we want to ensure we only ever instantiate it once for each dialect. Currently this is achieved by caching the tokenizer instance to the each XFormatter class constructor function. That's a pretty awkward solution. Because of this design, the XFormatter classes have to extend from base Formatter class. Though they really just contain configuration settings for each language, they shouldn't need to pull in all the logic in Formatter class. I wonder whether this separate-functions approach would help us untangle this mess...

For start it would be nice to store the configuration of each dialect as just data. For example:

export const postgresql: DialectConfig = {
  tokenizerConfig: {
    reservedClauses,
    reservedKeywords,
    ...
  },
  formatConfig: {
    alwaysDenseOperators: ['::']
  },
};

We could then use this to perform an initialization inside a higher order function (like your languageFormat):

import { postgresql } from "src/languages/postgresql/postgresql.formatter";
import { sqlite } from "src/languages/postgresql/sqlite.formatter";

export function createFormatter(dialect: DialectConfig): FormatFn {
  const tokenizer = new Tokenizer(dialect.tokenizerConfig);

  return (sql: string, options: FormatOptions) => {
    return new Formatter(tokenizer, dialect.formatConfig).format(sql, options);
  };
}

export const formatPostgresql = createFormatter(postgresql);
export const formatSqlite = createFormatter(sqlite);

This code is much nicer then our current tokenizer caching logic. However, it still has a problem - these expensive tokenizer instance are created even when the formatPostgresql is never called. So it's really a no-go.

It would work if we would force user to always call createFormatter on their own:

import { createFormatter, postgresql } from "sql-formatter";

const format = createFormatter(postgresql);

format("SELECT *"); 

But with that solution we're simply pushing the burden on the user of the library. He has the option to use it well, but has even more opportunities to shoot himself in the foot. Like he could call in a loop createFormatter(sqlite)("Some sql"); which would result in pretty horrible performance. So, really not a solution.

So we really have to perform this caching of tokenizer creation inside the library itself. We could do this by caching the tokenizers inside a static Map object:

class TokenizerFactory {
  private map = new Map<DialectConfig, Tokenizer>();
  create(dialect: DialectConfig) {
    if (!this.map.has(dialect)) {
      this.map.set(dialect, new Tokenizer(dialect));
    }
    return this.map.get(dialect);
  }
}

const tokenizerFactory = new TokenizerFactory();

export const createFormatter = (dialect: DialectConfig) => (sql: string, options: FormatOptions) => {
  return new Formatter(tokenizerFactory.create(dialect)).format(sql, options, dialect.formatConfig);
}

But here we're pretty much back at the start with regards to the public API. We can implement it either as separate functions or as one function that takes DialectConfig object as one parameter.

@nene
Copy link
Collaborator

nene commented Oct 2, 2022

Thinking further over which kind of API would be better to expose...

I'm still leaning towards my initial proposal. It seems like a more flexible option to me. Especially when we also decouple these dialect configs from Formatter class (as I described above). Then one can use the same function to accomplish several things:

  • One can call one of the builtin formatters (though one needs to import two items instead of one).
  • One can pass in a custom dialect config.
  • One can import one of the builtin dialect configs, modify it slightly (e.g. to patch a bug) and pass that to the formatter.
  • One could even publish a separate npm library with a custom dialect config, which wouldn't need to have sql-formatter library as it's dependency.

This added flexibility has of course downsides too:

  • it provides even more opportunities for one to shoot himself in the foot. That's acceptable for me, as it's a completely opt-in feature. Nobody's forced to use this extra flexibility.
  • it can bring us extra support burden. Though it can also facilitate easier contribution. So that's 1:1.
  • There's more opportunities to rely on this custom dialect API and accordingly request this API to be stable. This was the reason I was originally reluctant to expose such custom dialect API at all. I want to be able to freely evolve this part of the code without concerning myself too much with backwards compatibility. I see no reason why we can't continue the same line - treating this API as experimental and giving no guarantees about it not breaking between minor or even patch releases.

@nene
Copy link
Collaborator

nene commented Oct 5, 2022

Another concern with the separate function based approach is that it's more inconvenient when trying to parameterize the specific dialect.

With language-parameter you simply need to list the dialects inside a mapping object:

import { formatSql, sqlite, postgresql, transactsql } from "sql-formatter";

const languageMap = {
  sqlite,
  postgresql,
  transactsql,
};

export function format(code: string, lang: keyof typeof languageMap) {
  return format(code, { language: languageMap[lang] });
}

With separate functions, you'll have to list name-function pairs, which is a bit more tedious:

import { formatSqlite, formatPostgresql, formatTransactsql } from "sql-formatter";

const formatterMap = {
  sqlite: formatSqlite,
  postgresql: formatPostgresql,
  transactsql: formatTransactsql,
};

export function format(code: string, lang: keyof typeof formatterMap) {
  return formatterMap[lang](code);
}

It's a minor difference though, and me preferring the first version might just be my personal stylistic choice.

@nene
Copy link
Collaborator

nene commented Nov 4, 2022

I'll close this, as it's no more relevant after merging of #511

@nene nene closed this Nov 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants