Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert.py: add python logging instead of print() #6511

Merged
merged 36 commits into from
May 3, 2024

Conversation

mofosyne
Copy link
Collaborator

@mofosyne mofosyne commented Apr 6, 2024

Motivation of this PR is to switch from print() to python logging (battery included package for python), this is so that when I add the --get-outfile command to get the calculated conventional gguf filename it doesn't get polluted by erroneous print() messages.

@mofosyne mofosyne force-pushed the convert-py-logging branch from 2a2b801 to e4e4df5 Compare April 6, 2024 13:53
@CISC
Copy link
Contributor

CISC commented Apr 7, 2024

Good suggestion, however you are using the logging module wrong. You must at the very least create and use a named instance with logging.getLogger(), see the documentation.

@deiteris
Copy link

deiteris commented Apr 7, 2024

Good suggestion, however you are using the logging module wrong. You must at the very least create and use a named instance with logging.getLogger(), see the documentation.

The provided usage is not exactly wrong. You're not restricted from using library methods directly.

IMO, for a single python module (which is a standalone script here), it's fine to use default logger configuration and library methods if you don't need complex logging. But I'm not aware of conventions here, so maybe other maintainers can say if it's ok or not.

@CISC
Copy link
Contributor

CISC commented Apr 7, 2024

The provided usage is not exactly wrong. You're not restricted from using library methods directly.

While true it's a good idea not to, it will make things a lot easier in the long run. Just because you can doesn't mean you should. :)

IMO, for a single python module (which is a standalone script here), it's fine to use default logger configuration and library methods if you don't need complex logging. But I'm not aware of conventions here, so maybe other maintainers can say if it's ok or not.

Ultimately it is up to whomever reviews this PR of course.

@mofosyne
Copy link
Collaborator Author

mofosyne commented Apr 8, 2024

No probs. Made the adjustment. Hopefully it matches what you are thinking of

@CISC
Copy link
Contributor

CISC commented Apr 8, 2024

Almost. :)

You missed a couple of logging uses though, and you shouldn't use __name__ as the logger name in a stand-alone script (it'll just be __main__), just calling it convert f.ex. will do fine.

@mofosyne
Copy link
Collaborator Author

mofosyne commented Apr 8, 2024

Almost. :)

You missed a couple of logging uses though, and you shouldn't use __name__ as the logger name in a stand-alone script (it'll just be __main__), just calling it convert f.ex. will do fine.

You mean do_dump_model()? well in that case, its explicit a dump so I suppress all logging (unless --verbose is active) and only print whats on do_dump_model() happy to adjust behavior if that's unexpected.

@CISC
Copy link
Contributor

CISC commented Apr 8, 2024

You missed a couple of logging uses though, [...]
You mean do_dump_model()? well in that case, its explicit a dump so I suppress all logging (unless --verbose is active) and only print whats on do_dump_model() happy to adjust behavior if that's unexpected.

No, I mean from here and onwards.

@mofosyne
Copy link
Collaborator Author

mofosyne commented Apr 8, 2024

You missed a couple of logging uses though, [...]
You mean do_dump_model()? well in that case, its explicit a dump so I suppress all logging (unless --verbose is active) and only print whats on do_dump_model() happy to adjust behavior if that's unexpected.

No, I mean from here and onwards.

Ah dang. Yeah I see what you mean. Fixed thanks

@CISC
Copy link
Contributor

CISC commented Apr 8, 2024

Ah dang. Yeah I see what you mean. Fixed thanks

Great, but this was probably not intended though. :)

@mofosyne
Copy link
Collaborator Author

mofosyne commented Apr 8, 2024

Fixed.

@rgryta
Copy link
Contributor

rgryta commented Apr 9, 2024

Since logging is being introduced, we could also change sys.stderr.write call in the default_outfile method to logger.error.

I'm also wondering what'd be the process of ensuring that later PRs don't introduce new print statements to the code. Up for reviewers to detect, or add pipeline checks/tests - e.g. add flake8-no-print to the python-lint pipeline and replace print with sys.stdout.write.

@rgryta
Copy link
Contributor

rgryta commented Apr 9, 2024

Nit: \n is now unnecessary in the logger.error 😅

@mofosyne mofosyne force-pushed the convert-py-logging branch from baddc3f to b9c984b Compare April 10, 2024 05:48
@mofosyne
Copy link
Collaborator Author

mofosyne commented Apr 10, 2024

Since logging is being introduced, we could also change sys.stderr.write call in the default_outfile method to logger.error.

I'm also wondering what'd be the process of ensuring that later PRs don't introduce new print statements to the code. Up for reviewers to detect, or add pipeline checks/tests - e.g. add flake8-no-print to the python-lint pipeline and replace print with sys.stdout.write.

Alright then, I've figured out how to run flake8 directly (after failing to get pre-commit to work) and start using the flake8 no print allowed plugin. It then threw a whole bunch of errors on all the other pythons scripts in this repo. So decided to give correcting it all a stab.

Also updated the github actions and pre-commit config to use this feature so hopefully it now works.

Since this is changing quite a few scripts, I had to spend a bit more time on other .py scripts that had an older approach to using print() especially if ending with end= or not using fstrings for variables substitution. So you may want to pay attention to this when reviewing.

Unfortunately this has now make this PR bit more daunting to review compared to the past, but with the inclusion of the no-print check this should hopefully be a one time pain for everyone here. :)


Regarding my issue getting pre-commit working

This is what I got. I've looked around but haven't found a solution yet. Hopefully its not an issue for others working on this.

$ pre-commit install
pre-commit installed at /home/mofosyne/huggingface/TinyLLama-v0-llamafile/.git/modules/llama.cpp/hooks/pre-commit

$ pre-commit
[INFO] Initializing environment for https://github.com/PyCQA/flake8:flake8-no-print.
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
An unexpected error has occurred: AssertionError: BUG: expected environment for python to be healthy immediately after install, please open an issue describing your environment

more info:

virtualenv python version did not match created version:
- actual version: <<error retrieving version from /home/mofosyne/.cache/pre-commit/reposmwb8fu3/py_env-python3.10/bin/python>>
- expected version: 3.10.12.final.0

Check the log at /home/mofosyne/.cache/pre-commit/pre-commit.log

.github/workflows/python-lint.yml Outdated Show resolved Hide resolved
convert-hf-to-gguf.py Outdated Show resolved Hide resolved
convert-llama-ggml-to-gguf.py Outdated Show resolved Hide resolved
gguf-py/gguf/constants.py Outdated Show resolved Hide resolved
gguf-py/scripts/gguf-convert-endian.py Outdated Show resolved Hide resolved
scripts/verify-checksum-models.py Outdated Show resolved Hide resolved
scripts/compare-llama-bench.py Outdated Show resolved Hide resolved
gguf-py/examples/reader.py Outdated Show resolved Hide resolved
convert.py Outdated Show resolved Hide resolved
gguf-py/scripts/gguf-dump.py Outdated Show resolved Hide resolved
@mofosyne mofosyne force-pushed the convert-py-logging branch from 3e6ff06 to 9e4cf37 Compare April 18, 2024 04:26
@mofosyne
Copy link
Collaborator Author

rebased on top of latest master due to extra breaking changes on convert-hf-to-gguf.py (more print() was added in the main repo)

Copy link
Collaborator Author

@mofosyne mofosyne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay just finished making all the changes I noted from you (besides the flake8 enquiry which I'm taking as a comment for now).

I also took a bit of time to look over it again and it looks fine on my end so far.

@CISC
Copy link
Contributor

CISC commented Apr 18, 2024

Great work. :)

A final note however, you should use __name__ as logger name for everything in gguf-py/gguf as they will be imported and not run directly.

@mofosyne
Copy link
Collaborator Author

Great work. :)

A final note however, you should use __name__ as logger name for everything in gguf-py/gguf as they will be imported and not run directly.

One of your previous statement appears to state the opposite.

You missed a couple of logging uses though, and you shouldn't use name as the logger name in a stand-alone script (it'll just be main), just calling it convert f.ex. will do fine.

So you are saying just specifically for gguf-py/gguf because it's used within other scripts?

@mofosyne
Copy link
Collaborator Author

Anyway added your suggestion in @CISC hopefully it should all check out!

@CISC
Copy link
Contributor

CISC commented Apr 18, 2024

So you are saying just specifically for gguf-py/gguf because it's used within other scripts?

Yes, because __name__ will contain the module path in this case (and thus all be related to each other), it's mainly stand-alone scripts that benefit from explicitly naming the logger (unless you wish to create a certain logging hierarchy).

Copy link
Collaborator Author

@mofosyne mofosyne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manually double checked and clicked viewed on all pages. Found one unused logger import and fixed it. Aside from that, looks ready.

@mofosyne mofosyne force-pushed the convert-py-logging branch from 89a75d3 to a6f54de Compare April 21, 2024 01:33
@mofosyne
Copy link
Collaborator Author

Rebased to add one additional [convert-hf-to-gguf.py: add additional logging](https://github.com/ggerganov/llama.cpp/pull/6511/commits/a6f54dee3ca65ae1dbeab3f8c26c1d75a9609715) commit to fix addition of print() convert-hf-to-gguf.py in the main branch.

@teleprint-me
Copy link
Contributor

teleprint-me commented Apr 21, 2024

I have a pattern we can use to centralize the logger instantiation with.

The implementation is really simple and flexible.

LOGGER_FORMAT = "%(asctime)s - %(filename)s:%(lineno)d - %(levelname)s - %(message)s"

def get_default_logger(name=None, level=logging.DEBUG):
    logger = logging.getLogger(name)
    if not logger.hasHandlers():
        handler = logging.StreamHandler(stream=sys.stdout)
        formatter = logging.Formatter(LOGGER_FORMAT)
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        logger.setLevel(level)
    return logger

I left the docstring out of the snippet shown here, but this is basically it. You're free to use it to simplify the implementation and reduce code duplication. I've been able to use this in classes, functions, and scripts and regularly use it. It's under the MIT License, so it's compatible with llama.cpp.

This pattern can be extended over time and included in the gguf-py package as a logger.py module that can be improved over time.

I never liked the use of the global LOGGER_FORMAT, but it simplified the initial implementation which I extracted from a few other programs I wrote. The implementation details can be ironed out over time as needed though.

@mofosyne
Copy link
Collaborator Author

Happy to add if @cebtenzzre thinks it's a good idea.

What's the advantage of a centralized logger instance? Would this be better as a separate PR (Which would be good if it's only a small change to each file).

The main objective of this PR is simply to convert from print() to logger in a preferably simple and extendable way, so at least for this one I would prefer to minimize adding dependencies.

Copy link
Contributor

@CISC CISC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hate to do this to you, but you got the wrong one, the correct one is at L1189, please revert 6d42f3d :)

@mofosyne
Copy link
Collaborator Author

mofosyne commented May 2, 2024

@mofosyne Great work here, unfortunately I think you were a bit unlucky and fell right between some urgent events, leaving everyone very busy, which has postponed the merge of this PR.

Also I do not have any reviewer powers, just being a busy-body. :) You'll have to wait for @cebtenzzre and/or @ggerganov to get some spare time.

Ah that's not a problem. Ultimately, this is mostly just a quality of life fix to make developer life smoother. But I do hope to eventually get this in, as better debugging will ultimately make everyone's life easier.

Copy link
Collaborator

@compilade compilade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overall quite good. Feel free to ignore/"resolve" any of the comments I made if they are too nit-picky.

convert-llama-ggml-to-gguf.py Outdated Show resolved Hide resolved
gguf-py/scripts/gguf-convert-endian.py Show resolved Hide resolved
gguf-py/scripts/gguf-convert-endian.py Outdated Show resolved Hide resolved
scripts/run-with-preset.py Outdated Show resolved Hide resolved
tests/test-tokenizer-0-spm.py Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
convert-hf-to-gguf.py Outdated Show resolved Hide resolved
convert-hf-to-gguf.py Outdated Show resolved Hide resolved
@mofosyne
Copy link
Collaborator Author

mofosyne commented May 3, 2024

Okay implemented most of your suggestions @compilade but excluded some style observations as it doesn't appear to be an issue.

edit: Accodentally typed cebtenzzre. Intended to invoke compilade

@mofosyne mofosyne requested a review from compilade May 3, 2024 02:57
Copy link
Collaborator

@compilade compilade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Merging this will cause conflicts in at least #7041, #7036, #7033, #7031, #7027, #7018, #6942, #6866 and soft conflicts with at least #7020, #6986, #6919, #6826. (soft conflicts in this case mean new linter errors, but no merge conflict)

Those involved in these PRs will probably want to take at least a cursory look at this before then (hopefully soon).

@teleprint-me
Copy link
Contributor

Maybe this one should be prioritized for now?

@ggerganov ggerganov merged commit a2ac89d into ggerganov:master May 3, 2024
22 checks passed
@mofosyne mofosyne deleted the convert-py-logging branch May 4, 2024 04:41
@mofosyne
Copy link
Collaborator Author

mofosyne commented May 4, 2024

Thanks for merging! Hopefully all is well for other PRs!

@ggerganov
Copy link
Owner

On my Macbook I don't see the print's anymore. For example:

python3 convert-hf-to-gguf-update.py

should print usage, but after this PR it does not print anything. Any ideas?

@mofosyne
Copy link
Collaborator Author

mofosyne commented May 4, 2024

I see what's going on. By default in https://docs.python.org/3/howto/logging.html logging level is only warning and up.

The default level is WARNING, which means that only events of this level and above will be tracked, unless the logging package is configured to do otherwise.

To get it printing again you need to add logging.basicConfig(level=logging.DEBUG) just under logger = logging.getLogger("convert-hf-to-gguf-update")

This was missed because it was not using the typical parser.add_argument that I was searching for.

We may want to convert it to ArgumentParser() so we can add the --verbose flag. But if unlikely to be complicated, then just throwing the basicConfig() under logger = logging.getLogger("convert-hf-to-gguf-update") will be good enough.


On a side note you may want to also chmod +x convert-hf-to-gguf-update.py after adding #!/usr/bin/env python3 to the top of the script.

nopperl pushed a commit to nopperl/llama.cpp that referenced this pull request May 5, 2024
* convert.py: add python logging instead of print()

* convert.py: verbose flag takes priority over dump flag log suppression

* convert.py: named instance logging

* convert.py: use explicit logger id string

* convert.py: convert extra print() to named logger

* convert.py: sys.stderr.write --> logger.error

* *.py: Convert all python scripts to use logging module

* requirements.txt: remove extra line

* flake8: update flake8 ignore and exclude to match ci settings

* gh-actions: add flake8-no-print to flake8 lint step

* pre-commit: add flake8-no-print to flake8 and also update pre-commit version

* convert-hf-to-gguf.py: print() to logger conversion

* *.py: logging basiconfig refactor to use conditional expression

* *.py: removed commented out logging

* fixup! *.py: logging basiconfig refactor to use conditional expression

* constant.py: logger.error then exit should be a raise exception instead

* *.py: Convert logger error and sys.exit() into a raise exception (for atypical error)

* gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar

* verify-checksum-model.py: This is the result of the program, it should be printed to stdout.

* compare-llama-bench.py: add blank line for readability during missing repo response

* reader.py: read_gguf_file() use print() over logging

* convert.py: warning goes to stderr and won't hurt the dump output

* gguf-dump.py: dump_metadata() should print to stdout

* convert-hf-to-gguf.py: print --> logger.debug or ValueError()

* verify-checksum-models.py: use print() for printing table

* *.py: refactor logging.basicConfig()

* gguf-py/gguf/*.py: use __name__ as logger name

Since they will be imported and not run directly.

* python-lint.yml: use .flake8 file instead

* constants.py: logger no longer required

* convert-hf-to-gguf.py: add additional logging

* convert-hf-to-gguf.py: print() --> logger

* *.py: fix flake8 warnings

* revert changes to convert-hf-to-gguf.py for get_name()

* convert-hf-to-gguf-update.py: use triple quoted f-string instead

* *.py: accidentally corrected the wrong line

* *.py: add compilade warning suggestions and style fixes
@reneleonhardt
Copy link
Contributor

Hmm I wonder why not all f-strings have been converted to %s parameters like all logging libraries suggest for a bit lazier evaluation 😅
https://docs.python.org/3/howto/logging-cookbook.html#formatting-styles

@CISC
Copy link
Contributor

CISC commented May 6, 2024

Hmm I wonder why not all f-strings have been converted to %s parameters like all logging libraries suggest for a bit lazier evaluation 😅

If it would impact performance in any meaningful way, sure, otherwise no. IMO f-strings are much better and less error prone.

teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request May 7, 2024
* convert.py: add python logging instead of print()

* convert.py: verbose flag takes priority over dump flag log suppression

* convert.py: named instance logging

* convert.py: use explicit logger id string

* convert.py: convert extra print() to named logger

* convert.py: sys.stderr.write --> logger.error

* *.py: Convert all python scripts to use logging module

* requirements.txt: remove extra line

* flake8: update flake8 ignore and exclude to match ci settings

* gh-actions: add flake8-no-print to flake8 lint step

* pre-commit: add flake8-no-print to flake8 and also update pre-commit version

* convert-hf-to-gguf.py: print() to logger conversion

* *.py: logging basiconfig refactor to use conditional expression

* *.py: removed commented out logging

* fixup! *.py: logging basiconfig refactor to use conditional expression

* constant.py: logger.error then exit should be a raise exception instead

* *.py: Convert logger error and sys.exit() into a raise exception (for atypical error)

* gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar

* verify-checksum-model.py: This is the result of the program, it should be printed to stdout.

* compare-llama-bench.py: add blank line for readability during missing repo response

* reader.py: read_gguf_file() use print() over logging

* convert.py: warning goes to stderr and won't hurt the dump output

* gguf-dump.py: dump_metadata() should print to stdout

* convert-hf-to-gguf.py: print --> logger.debug or ValueError()

* verify-checksum-models.py: use print() for printing table

* *.py: refactor logging.basicConfig()

* gguf-py/gguf/*.py: use __name__ as logger name

Since they will be imported and not run directly.

* python-lint.yml: use .flake8 file instead

* constants.py: logger no longer required

* convert-hf-to-gguf.py: add additional logging

* convert-hf-to-gguf.py: print() --> logger

* *.py: fix flake8 warnings

* revert changes to convert-hf-to-gguf.py for get_name()

* convert-hf-to-gguf-update.py: use triple quoted f-string instead

* *.py: accidentally corrected the wrong line

* *.py: add compilade warning suggestions and style fixes
teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request May 7, 2024
* convert.py: add python logging instead of print()

* convert.py: verbose flag takes priority over dump flag log suppression

* convert.py: named instance logging

* convert.py: use explicit logger id string

* convert.py: convert extra print() to named logger

* convert.py: sys.stderr.write --> logger.error

* *.py: Convert all python scripts to use logging module

* requirements.txt: remove extra line

* flake8: update flake8 ignore and exclude to match ci settings

* gh-actions: add flake8-no-print to flake8 lint step

* pre-commit: add flake8-no-print to flake8 and also update pre-commit version

* convert-hf-to-gguf.py: print() to logger conversion

* *.py: logging basiconfig refactor to use conditional expression

* *.py: removed commented out logging

* fixup! *.py: logging basiconfig refactor to use conditional expression

* constant.py: logger.error then exit should be a raise exception instead

* *.py: Convert logger error and sys.exit() into a raise exception (for atypical error)

* gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar

* verify-checksum-model.py: This is the result of the program, it should be printed to stdout.

* compare-llama-bench.py: add blank line for readability during missing repo response

* reader.py: read_gguf_file() use print() over logging

* convert.py: warning goes to stderr and won't hurt the dump output

* gguf-dump.py: dump_metadata() should print to stdout

* convert-hf-to-gguf.py: print --> logger.debug or ValueError()

* verify-checksum-models.py: use print() for printing table

* *.py: refactor logging.basicConfig()

* gguf-py/gguf/*.py: use __name__ as logger name

Since they will be imported and not run directly.

* python-lint.yml: use .flake8 file instead

* constants.py: logger no longer required

* convert-hf-to-gguf.py: add additional logging

* convert-hf-to-gguf.py: print() --> logger

* *.py: fix flake8 warnings

* revert changes to convert-hf-to-gguf.py for get_name()

* convert-hf-to-gguf-update.py: use triple quoted f-string instead

* *.py: accidentally corrected the wrong line

* *.py: add compilade warning suggestions and style fixes
teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request May 7, 2024
* convert.py: add python logging instead of print()

* convert.py: verbose flag takes priority over dump flag log suppression

* convert.py: named instance logging

* convert.py: use explicit logger id string

* convert.py: convert extra print() to named logger

* convert.py: sys.stderr.write --> logger.error

* *.py: Convert all python scripts to use logging module

* requirements.txt: remove extra line

* flake8: update flake8 ignore and exclude to match ci settings

* gh-actions: add flake8-no-print to flake8 lint step

* pre-commit: add flake8-no-print to flake8 and also update pre-commit version

* convert-hf-to-gguf.py: print() to logger conversion

* *.py: logging basiconfig refactor to use conditional expression

* *.py: removed commented out logging

* fixup! *.py: logging basiconfig refactor to use conditional expression

* constant.py: logger.error then exit should be a raise exception instead

* *.py: Convert logger error and sys.exit() into a raise exception (for atypical error)

* gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar

* verify-checksum-model.py: This is the result of the program, it should be printed to stdout.

* compare-llama-bench.py: add blank line for readability during missing repo response

* reader.py: read_gguf_file() use print() over logging

* convert.py: warning goes to stderr and won't hurt the dump output

* gguf-dump.py: dump_metadata() should print to stdout

* convert-hf-to-gguf.py: print --> logger.debug or ValueError()

* verify-checksum-models.py: use print() for printing table

* *.py: refactor logging.basicConfig()

* gguf-py/gguf/*.py: use __name__ as logger name

Since they will be imported and not run directly.

* python-lint.yml: use .flake8 file instead

* constants.py: logger no longer required

* convert-hf-to-gguf.py: add additional logging

* convert-hf-to-gguf.py: print() --> logger

* *.py: fix flake8 warnings

* revert changes to convert-hf-to-gguf.py for get_name()

* convert-hf-to-gguf-update.py: use triple quoted f-string instead

* *.py: accidentally corrected the wrong line

* *.py: add compilade warning suggestions and style fixes
@slaren
Copy link
Collaborator

slaren commented May 7, 2024

This has broken compare-llama-bench.py.

@ggerganov
Copy link
Owner

This has broken compare-llama-bench.py.

@mofosyne PTAL - usage instructions are here:

DESCRIPTION = """Creates tables from llama-bench data written to an SQLite database. Example usage (Linux):
$ git checkout master
$ make clean && make llama-bench
$ ./llama-bench -o sql | sqlite3 llama-bench.sqlite
$ git checkout some_branch
$ make clean && make llama-bench
$ ./llama-bench -o sql | sqlite3 llama-bench.sqlite
$ ./scripts/compare-llama-bench.py
Performance numbers from multiple runs per commit are averaged WITHOUT being weighted by the --repetitions parameter of llama-bench.
"""

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants