Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Stdlib] Add stol function #3178

Open
wants to merge 10 commits into
base: nightly
Choose a base branch
from

Conversation

jjvraw
Copy link
Contributor

@jjvraw jjvraw commented Jul 5, 2024

Introduces stol (string to long) function for more flexible string-to-integer parsing (issue #2639).

Implementation and logic closely follows that of atol. The function parses integer literals with base handling (2-36), supports base prefixes, and stops at the first invalid character. Unlike atol, it returns a tuple (parsed_int, remaining_string), allowing partial parsing without raising errors. This can make stol more flexible and suitable for mixed-content strings.

@jjvraw jjvraw requested a review from a team as a code owner July 5, 2024 07:43
@martinvuyk
Copy link
Contributor

Hey @jjvraw this looks good, just a couple of comments on some edge cases where python accepts leading underscores and zeroes for non base 10 numbers (remember to always check against python since we are aiming for maximum compatibility).

Other Changes

  • Added _handle_base_prefix function for improved reusability and readability.
    • Modularises previously inline code.
  • Added _trim_and_handle_sign function for improved reusability and readability.
    • Modularises previously inline code.
  • Renamed _atol_error to _str_to_base_error for reusability and clarity.
  • Revised the docstring of atol for improved clarity.

Could you split this into another PR ? It will be easier to review those changes separately and then incorporate this PR on top after merging that

@jjvraw
Copy link
Contributor Author

jjvraw commented Jul 5, 2024

Hi @martinvuyk, thank you!

python accepts leading underscores and zeroes for non base 10 numbers

Thanks for catching that, as well as the overflow! I misread the lexical syntax for Python's Integer literals, and intentionally tried to catch the leading underscore.

Could you split this into another PR ? It will be easier to review those changes separately and then incorporate this PR on top after merging that

Sure, I had a feeling these should be separate. Regarding StringRef to StringSlice, shall I apply the same change to _atol's signature in the mentioned PR?

@martinvuyk
Copy link
Contributor

lexical syntax for Python's Integer literals

I actually hadn't read the docs just tried it out. That table they have there seems pretty nice for documentation we could either copy it or put a more visible link to it (I hadn't actually seen the link at the bottom of your docstrings).

Regarding StringRef to StringSlice, shall I apply the same change to _atol's signature in the mentioned PR?

hmm I didn't realize atol uses StringRef as well, maybe we should leave changing both to StringSlice for another PR. Someone from the Mojo team would have to weigh in here.

jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 5, 2024
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.
jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 5, 2024
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.
jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 5, 2024
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.
jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 5, 2024
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.

Signed-off-by: Joshua James Venter <[email protected]>
jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 5, 2024
- Support leading underscores
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.

Signed-off-by: Joshua James Venter <[email protected]>
@jjvraw
Copy link
Contributor Author

jjvraw commented Jul 5, 2024

With "Other Changes" moved to #3180, lets leave StringRef for now, on the basis that atol uses it? We can get this merged, then have the StringRef -> StringSlice as a seperate issue or discussion? Thoughts @martinvuyk ?

jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 5, 2024
- Support leading underscores
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.

Signed-off-by: Joshua James Venter <[email protected]>
jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 5, 2024
- Support leading underscores
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.

Signed-off-by: Joshua James Venter <[email protected]>
jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 5, 2024
- Support leading underscores
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.

Signed-off-by: Joshua James Venter <[email protected]>
jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 5, 2024
- Support leading underscores
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.

Signed-off-by: Joshua James Venter <[email protected]>
jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 5, 2024
- Support leading underscores
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.

Signed-off-by: Joshua James Venter <[email protected]>
jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 6, 2024
- Support leading underscores (bug fix)
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.

Co-authored-by: martinvuyk <[email protected]>
Co-authored-by: soraros <[email protected]>

Signed-off-by: Joshua James Venter <[email protected]>
jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 6, 2024
- Support leading underscores (bug fix)
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.

Co-authored-by: martinvuyk <[email protected]>
Co-authored-by: soraros <[email protected]>

Signed-off-by: Joshua James Venter <[email protected]>
jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 6, 2024
- Support leading underscores (bug fix)
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.

Co-authored-by: martinvuyk <[email protected]>
Co-authored-by: soraros <[email protected]>

Signed-off-by: Joshua James Venter <[email protected]>
jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 6, 2024
- Support leading underscores (bug fix)
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.

Signed-off-by: Joshua James Venter <[email protected]>
jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 6, 2024
- Support leading underscores (bug fix)
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.

Signed-off-by: Joshua James Venter <[email protected]>
@jjvraw jjvraw marked this pull request as draft July 7, 2024 20:27
jjvraw and others added 5 commits July 24, 2024 10:38
- StringRef to StringSlice
- Introduce _handle_base_prefix & _trim_and_handle_sign
- Changed var to alias

Signed-off-by: Joshua James Venter <[email protected]>
This commit introduces the `stol` (string to long) function, which closely
follows the implementation of `atol` while providing extended functionality:

- Parses integer literals following `atol`'s logic and base handling (2-36)
- Maintains consistency with `atol` in handling base prefixes (0b, 0o, 0x)
- Extends `atol` by returning both the parsed integer and remaining string
- Stops at the first invalid character instead of raising an error

Key differences from `atol`:
- Returns a tuple (parsed_int, remaining_string) instead of just an int
- Does not raise an error for partially valid inputs

This function provides a more flexible parsing option while maintaining
consistency with existing string-to-integer conversion in the standard library.

Signed-off-by: Joshua James Venter <[email protected]>
Co-authored-by: martinvuyk <[email protected]>
Signed-off-by: Joshua James Venter <[email protected]>
Signed-off-by: Joshua James Venter <[email protected]>
@JoeLoser
Copy link
Collaborator

#3207 just got merged, so feel free to rebase when the new nightly comes out and we can take a look at landing this. Thank you!

@JoeLoser JoeLoser added the waiting for response Needs action/response from contributor before a PR can proceed label Nov 22, 2024
modularbot pushed a commit that referenced this pull request Nov 23, 2024
… reusability (#51354)

[External] [stdlib] Refactor `atol`: `StringRef` to `StringSlice` and
reusability

This pull request modularises previously inline code, assisting in
addressing issue #2639 to allow
`stol` function to reuse some of the functionality. These changes,
initially part of #3178, have been
separated for easier review.

The refactoring extracts `_handle_base_prefix` &
`_trim_and_handle_sign`. Additionally, the docstring has been revised
for clarity.

Alongside these changes, the respective function signatures have been
updated to move away from `StringRef` to `StringSlice`.

There was also a small `FIXME` to convert from `var` to `alias`, which
is now seems possible.

Co-authored-by: Joshua James Venter <[email protected]>
Closes #3207
MODULAR_ORIG_COMMIT_REV_ID: e733ed473f086abe6fc5acccae711e0114c1093d
Signed-off-by: Joshua James Venter <[email protected]>
Signed-off-by: Joshua James Venter <[email protected]>
Signed-off-by: Joshua James Venter <[email protected]>
@jjvraw jjvraw marked this pull request as ready for review December 1, 2024 07:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting for response Needs action/response from contributor before a PR can proceed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants