Use proptest in codegen tests #80

Closed

nicoabie

What kind of change does this PR introduce?

POC of using proptest to help finish the parser #51
What I want to show with this little PR is how easy it is to generate families of tests.

What is the current behavior?

Static unit tests

What is the new behavior?

Generated unit tests

Additional context

There are two crates, quickcheck and proptest; I found the proptest documentation easier to understand for newcomers. I've used PBT in different languages (Prolog, Scheme, JavaScript and Python) but never in Rust, so this is my first time.

https://altsysrq.github.io/proptest-book/intro.html

Collaborator

@psteinroe psteinroe left a comment

thanks for the contribution!

unfortunately, there is not much value in these tests, because the tokens we receive from the lexer are exactly the same. For our implementation, whether we parse select 1 from contact or select 48 from apple makes no difference; it only makes a difference when the semantic meaning changes. I think proptests can still be useful, but not at this level. Do you have an idea how to apply them in a meaningful manner?

@nicoabie
Author

The value is that you don't need to write all the possible scenarios that will produce the same output from the lexer.

What are all the possible statements that produce vec![TokenProperty::from(SyntaxKind::Select)]?

select *;
select *
select 1;
select 1
select 'a';
select 'a'
select 1 as alias;
select 1 as alias
select 'a' as alias;
select 'a' as alias

And then there are the combinations of the above that select more than one field.

Maybe more? I didn't get into the details of how it works.

And now:

  • 1 can be any number with 1, 2, 3, ..., N digits
  • 'a' can be any valid sequence of chars
  • alias can be any valid sequence of chars
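
To make the size of that family concrete, here is a minimal, dependency-free sketch that builds every single-column variant from small pools of literals and aliases. The function name select_variants and its parameters are hypothetical, not part of the PR; in a real proptest suite the fixed pools would presumably be replaced by strategies that draw arbitrary numbers, strings and aliases.

```rust
/// Builds every single-column `select` statement that can be formed from the
/// given pools of number literals, string literals and aliases.
/// (Hypothetical helper for illustration; not taken from the PR.)
fn select_variants(numbers: &[u64], strings: &[&str], aliases: &[&str]) -> Vec<String> {
    // Expressions that may carry an alias: number and string literals.
    let mut aliasable: Vec<String> = Vec::new();
    for n in numbers {
        aliasable.push(n.to_string());
    }
    for s in strings {
        aliasable.push(format!("'{}'", s));
    }

    // `*` is kept without an alias, matching the list above.
    let mut columns: Vec<String> = vec!["*".to_string()];
    for expr in &aliasable {
        columns.push(expr.clone());
        for alias in aliases {
            columns.push(format!("{} as {}", expr, alias));
        }
    }

    // Every column expression appears with and without a trailing semicolon.
    let mut statements = Vec::new();
    for column in &columns {
        for terminator in ["", ";"].iter() {
            statements.push(format!("select {}{}", column, terminator));
        }
    }
    statements
}
```

Calling select_variants(&[1], &["a"], &["alias"]) yields exactly the ten statements listed above; every extra literal or alias in a pool multiplies the count, which is the part nobody wants to write by hand.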

Do 'contact' or 'apple' represent all the possible table names? Not really. Therefore you can have a custom arbitrary that generates valid names respecting the Postgres constraints:

  • Length: Up to 63 characters.
  • Characters: Start with a letter or an underscore, followed by letters, numbers, or underscores.
  • Case: Case-insensitive, but it's a good practice to use lowercase to avoid confusion.
  • Reserved Words: Avoid using reserved words like "select," "insert," "update," etc.
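
The four constraints above can be sketched as a validator plus a small generator. This is only an illustration under stated assumptions: the names (is_valid_table_name, gen_table_name, Lcg) are hypothetical, RESERVED is a tiny sample rather than the real Postgres keyword list, and the hand-rolled random number generator stands in for what proptest would provide. With proptest, the generator half could likely be written as a string-regex strategy combined with a filter on reserved words.

```rust
/// Maximum identifier length, per the 63-character limit quoted above.
const MAX_IDENT_LEN: usize = 63;

/// A deliberately tiny sample of reserved words (assumption: the real list is much longer).
const RESERVED: &[&str] = &["select", "insert", "update", "delete", "from", "where"];

/// Checks the constraints listed above, restricted to lowercase ASCII names.
fn is_valid_table_name(name: &str) -> bool {
    let mut chars = name.chars();
    let first_ok = match chars.next() {
        Some(c) => c.is_ascii_lowercase() || c == '_',
        None => return false, // the empty string is not a name
    };
    first_ok
        && name.len() <= MAX_IDENT_LEN
        && chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
        && !RESERVED.contains(&name)
}

/// Minimal linear congruential generator, so the sketch needs no external crate.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }

    /// Returns a value in `0..n` (n must be non-zero).
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Generates a name of 1 to 63 characters that satisfies `is_valid_table_name`.
fn gen_table_name(rng: &mut Lcg) -> String {
    const FIRST: &[u8] = b"abcdefghijklmnopqrstuvwxyz_";
    const REST: &[u8] = b"abcdefghijklmnopqrstuvwxyz0123456789_";
    loop {
        let len = 1 + rng.below(MAX_IDENT_LEN);
        let mut name = String::with_capacity(len);
        name.push(FIRST[rng.below(FIRST.len())] as char);
        for _ in 1..len {
            name.push(REST[rng.below(REST.len())] as char);
        }
        // Retry in the unlikely case that a reserved word was produced.
        if !RESERVED.contains(&name.as_str()) {
            return name;
        }
    }
}
```

Seeding the generator, for example with Lcg(42), and feeding each generated name into a parser test covers far more table names than 'contact' and 'apple' ever could.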

How many unit tests would you need to really make sure test_select_with_where works?
Let's look at all the possible combinations.

  • all the combinations of columns that go into test_simple_select
  • all the combinations of different table names that are valid in postgres described in the previous section
  • all the different operators to compare (you can have a custom arbitrary)
  • right operands can be numbers or strings (you can have a custom arbitrary)
  • the where clause there is a very simple one; it could have ANDs, ORs, etc. (you can have a custom arbitrary)
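
The combinations above can be sketched as a recursive generator for where clauses. Everything here is hypothetical and only meant to show the shape of such a custom arbitrary: the pools (TABLES, COLUMNS, OPERATORS, STRINGS), the function names, and the hand-rolled Lcg generator are assumptions, and parens_balanced is just a stand-in property, since the real test would compare the parser output for each generated statement.

```rust
/// Minimal linear congruential generator, so the sketch needs no external crate.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }

    /// Picks one element of a non-empty slice.
    fn pick<'a>(&mut self, items: &[&'a str]) -> &'a str {
        items[(self.next_u64() % items.len() as u64) as usize]
    }
}

// Small illustrative pools; a real suite would generate these as well.
const TABLES: &[&str] = &["contact", "apple"];
const COLUMNS: &[&str] = &["id", "name", "created_at"];
const OPERATORS: &[&str] = &["=", "<>", "<", "<=", ">", ">="];
const STRINGS: &[&str] = &["a", "apple", ""];

/// Right operand: either a number or a quoted string.
fn gen_operand(rng: &mut Lcg) -> String {
    if rng.next_u64() % 2 == 0 {
        (rng.next_u64() % 1_000_000).to_string()
    } else {
        format!("'{}'", rng.pick(STRINGS))
    }
}

/// Either a simple comparison or two sub-conditions joined by `and` / `or`.
/// `depth` bounds the nesting so generation always terminates.
fn gen_condition(rng: &mut Lcg, depth: u32) -> String {
    if depth == 0 || rng.next_u64() % 2 == 0 {
        let column = rng.pick(COLUMNS);
        let operator = rng.pick(OPERATORS);
        let operand = gen_operand(rng);
        format!("{} {} {}", column, operator, operand)
    } else {
        let left = gen_condition(rng, depth - 1);
        let joiner = rng.pick(&["and", "or"]);
        let right = gen_condition(rng, depth - 1);
        format!("({} {} {})", left, joiner, right)
    }
}

/// Full statement: select <column> from <table> where <condition>.
fn gen_select_with_where(rng: &mut Lcg, max_depth: u32) -> String {
    let column = rng.pick(COLUMNS);
    let table = rng.pick(TABLES);
    let condition = gen_condition(rng, max_depth);
    format!("select {} from {} where {}", column, table, condition)
}

/// Stand-in property: parentheses in the generated SQL are balanced.
fn parens_balanced(sql: &str) -> bool {
    let mut open = 0i32;
    for c in sql.chars() {
        match c {
            '(' => open += 1,
            ')' => {
                open -= 1;
                if open < 0 {
                    return false;
                }
            }
            _ => {}
        }
    }
    open == 0
}
```

A loop that calls gen_select_with_where(&mut rng, 3) a few hundred times and checks a property on each result exercises operators, operand types and AND/OR nesting that a handful of static tests would miss.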

That is the value of proptesting: there is no way one can write all the combinations by hand.
I guess it depends on the confidence you want or need.

The question is, how do you get the LLM to have that coverage of the problem domain?

@psteinroe psteinroe closed this Jul 21, 2024