Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a SEP to extend the spec syntax #5

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

alalazo
Copy link
Member

@alalazo alalazo commented Nov 24, 2021

Proposal to extend the spec syntax:

  • To allow specifying arbitrary key=value attributes on the edges
  • To allow parsing non-flat DAGs

This is needed to deal with use cases introduced by separate concretization of build dependencies and compiler as dependencies. Details in the SEP.

tgamblin pushed a commit to spack/spack that referenced this pull request Dec 7, 2022
## Motivation

Our parser grew to be quite complex, with a 2-state lexer and logic in the parser
that has up to 5 levels of nested conditionals. In the future, to turn compilers into
proper dependencies, we'll have to increase the complexity further as we foresee
the need to add:
1. Edge attributes
2. Spec nesting

to the spec syntax (see spack/seps#5 for an initial discussion of
those changes).  The main attempt here is thus to _simplify the existing code_ before
we start extending it later. We try to do that by adopting a different token granularity,
and by using more complex regexes for tokenization. This allow us to a have a "flatter"
encoding for the parser. i.e., it has fewer nested conditionals and a near-trivial lexer.

There are places, namely in `VERSION`, where we have to use negative lookahead judiciously
to avoid ambiguity.  Specifically, this parse is ambiguous without `(?!\s*=)` in `VERSION_RANGE`
and an extra final `\b` in `VERSION`:

```
@ 1.2.3     :        develop  # This is a version range 1.2.3:develop
@ 1.2.3     :        develop=foo  # This is a version range 1.2.3: followed by a key-value pair
```

## Differences with the previous parser

~There are currently 2 known differences with the previous parser, which have been added on purpose:~

- ~No spaces allowed after a sigil (e.g. `foo @ 1.2.3` is invalid while `foo @1.2.3` is valid)~
- ~`/<hash> @1.2.3` can be parsed as a concrete spec followed by an anonymous spec (before was invalid)~

~We can recover the previous behavior on both ones but, especially for the second one, it seems the current behavior in the PR is more consistent.~

The parser is currently 100% backward compatible.

## Error handling

Being based on more complex regexes, we can possibly improve error
handling by adding regexes for common issues and hint users on that.
I'll leave that for a following PR, but there's a stub for this approach in the PR.

## Performance

To be sure we don't add any performance penalty with this new encoding, I measured:
```console
$ spack python -m timeit -s "import spack.spec" -c "spack.spec.Spec(<spec>)"
```
for different specs on my machine:

* **Spack:** 0.20.0.dev0 (c9db4e5)
* **Python:** 3.8.10
* **Platform:** linux-ubuntu20.04-icelake
* **Concretizer:** clingo

results are:

| Spec          | develop       | this PR |
| ------------- | ------------- | ------- |
| `trilinos`  |  28.9 usec | 13.1 usec |
| `trilinos @1.2.10:1.4.20,2.0.1`  | 131 usec  | 120 usec |
| `trilinos %gcc`  | 44.9 usec  | 20.9 usec |
| `trilinos +foo`  | 44.1 usec  | 21.3 usec |
| `trilinos foo=bar`  | 59.5 usec  | 25.6 usec |
| `trilinos foo=bar ^ mpich foo=baz`  | 120 usec  | 82.1 usec |

so this new parser seems to be consistently faster than the previous one.

## Modifications

In this PR we just substituted the Spec parser, which means:
- [x] Deleted in `spec.py` the `SpecParser` and `SpecLexer` classes. deleted `spack/parse.py`
- [x] Added a new parser in `spack/parser.py`
- [x] Hooked the new parser in all the places the previous one was used
- [x] Adapted unit tests in `test/spec_syntax.py`


## Possible future improvements

Random thoughts while working on the PR:
- Currently we transform hashes and files into specs during parsing. I think
we might want to introduce an additional step and parse special objects like
a `FileSpec` etc. in-between parsing and concretization.
amd-toolchain-support pushed a commit to amd-toolchain-support/spack that referenced this pull request Feb 16, 2023
## Motivation

Our parser grew to be quite complex, with a 2-state lexer and logic in the parser
that has up to 5 levels of nested conditionals. In the future, to turn compilers into
proper dependencies, we'll have to increase the complexity further as we foresee
the need to add:
1. Edge attributes
2. Spec nesting

to the spec syntax (see spack/seps#5 for an initial discussion of
those changes).  The main attempt here is thus to _simplify the existing code_ before
we start extending it later. We try to do that by adopting a different token granularity,
and by using more complex regexes for tokenization. This allow us to a have a "flatter"
encoding for the parser. i.e., it has fewer nested conditionals and a near-trivial lexer.

There are places, namely in `VERSION`, where we have to use negative lookahead judiciously
to avoid ambiguity.  Specifically, this parse is ambiguous without `(?!\s*=)` in `VERSION_RANGE`
and an extra final `\b` in `VERSION`:

```
@ 1.2.3     :        develop  # This is a version range 1.2.3:develop
@ 1.2.3     :        develop=foo  # This is a version range 1.2.3: followed by a key-value pair
```

## Differences with the previous parser

~There are currently 2 known differences with the previous parser, which have been added on purpose:~

- ~No spaces allowed after a sigil (e.g. `foo @ 1.2.3` is invalid while `foo @1.2.3` is valid)~
- ~`/<hash> @1.2.3` can be parsed as a concrete spec followed by an anonymous spec (before was invalid)~

~We can recover the previous behavior on both ones but, especially for the second one, it seems the current behavior in the PR is more consistent.~

The parser is currently 100% backward compatible.

## Error handling

Being based on more complex regexes, we can possibly improve error
handling by adding regexes for common issues and hint users on that.
I'll leave that for a following PR, but there's a stub for this approach in the PR.

## Performance

To be sure we don't add any performance penalty with this new encoding, I measured:
```console
$ spack python -m timeit -s "import spack.spec" -c "spack.spec.Spec(<spec>)"
```
for different specs on my machine:

* **Spack:** 0.20.0.dev0 (c9db4e5)
* **Python:** 3.8.10
* **Platform:** linux-ubuntu20.04-icelake
* **Concretizer:** clingo

results are:

| Spec          | develop       | this PR |
| ------------- | ------------- | ------- |
| `trilinos`  |  28.9 usec | 13.1 usec |
| `trilinos @1.2.10:1.4.20,2.0.1`  | 131 usec  | 120 usec |
| `trilinos %gcc`  | 44.9 usec  | 20.9 usec |
| `trilinos +foo`  | 44.1 usec  | 21.3 usec |
| `trilinos foo=bar`  | 59.5 usec  | 25.6 usec |
| `trilinos foo=bar ^ mpich foo=baz`  | 120 usec  | 82.1 usec |

so this new parser seems to be consistently faster than the previous one.

## Modifications

In this PR we just substituted the Spec parser, which means:
- [x] Deleted in `spec.py` the `SpecParser` and `SpecLexer` classes. deleted `spack/parse.py`
- [x] Added a new parser in `spack/parser.py`
- [x] Hooked the new parser in all the places the previous one was used
- [x] Adapted unit tests in `test/spec_syntax.py`


## Possible future improvements

Random thoughts while working on the PR:
- Currently we transform hashes and files into specs during parsing. I think
we might want to introduce an additional step and parse special objects like
a `FileSpec` etc. in-between parsing and concretization.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant