convert.py: add python logging instead of print() #6511

mofosyne · 2024-04-06T13:50:45Z

Motivation of this PR is to switch from print() to python logging (battery included package for python), this is so that when I add the --get-outfile command to get the calculated conventional gguf filename it doesn't get polluted by erroneous print() messages.

CISC · 2024-04-07T10:47:56Z

Good suggestion, however you are using the logging module wrong. You must at the very least create and use a named instance with logging.getLogger(), see the documentation.

deiteris · 2024-04-07T15:41:22Z

Good suggestion, however you are using the logging module wrong. You must at the very least create and use a named instance with logging.getLogger(), see the documentation.

The provided usage is not exactly wrong. You're not restricted from using library methods directly.

IMO, for a single python module (which is a standalone script here), it's fine to use default logger configuration and library methods if you don't need complex logging. But I'm not aware of conventions here, so maybe other maintainers can say if it's ok or not.

CISC · 2024-04-07T17:03:11Z

The provided usage is not exactly wrong. You're not restricted from using library methods directly.

While true it's a good idea not to, it will make things a lot easier in the long run. Just because you can doesn't mean you should. :)

IMO, for a single python module (which is a standalone script here), it's fine to use default logger configuration and library methods if you don't need complex logging. But I'm not aware of conventions here, so maybe other maintainers can say if it's ok or not.

Ultimately it is up to whomever reviews this PR of course.

mofosyne · 2024-04-08T02:36:42Z

No probs. Made the adjustment. Hopefully it matches what you are thinking of

CISC · 2024-04-08T10:03:05Z

Almost. :)

You missed a couple of logging uses though, and you shouldn't use __name__ as the logger name in a stand-alone script (it'll just be __main__), just calling it convert f.ex. will do fine.

mofosyne · 2024-04-08T11:02:01Z

Almost. :)

You missed a couple of logging uses though, and you shouldn't use __name__ as the logger name in a stand-alone script (it'll just be __main__), just calling it convert f.ex. will do fine.

You mean do_dump_model()? well in that case, its explicit a dump so I suppress all logging (unless --verbose is active) and only print whats on do_dump_model() happy to adjust behavior if that's unexpected.

CISC · 2024-04-08T11:15:59Z

You missed a couple of logging uses though, [...]
You mean do_dump_model()? well in that case, its explicit a dump so I suppress all logging (unless --verbose is active) and only print whats on do_dump_model() happy to adjust behavior if that's unexpected.

No, I mean from here and onwards.

mofosyne · 2024-04-08T11:28:58Z

You missed a couple of logging uses though, [...]
You mean do_dump_model()? well in that case, its explicit a dump so I suppress all logging (unless --verbose is active) and only print whats on do_dump_model() happy to adjust behavior if that's unexpected.

No, I mean from here and onwards.

Ah dang. Yeah I see what you mean. Fixed thanks

CISC · 2024-04-08T11:35:15Z

Ah dang. Yeah I see what you mean. Fixed thanks

Great, but this was probably not intended though. :)

mofosyne · 2024-04-08T12:34:31Z

Fixed.

rgryta · 2024-04-09T09:49:12Z

Since logging is being introduced, we could also change sys.stderr.write call in the default_outfile method to logger.error.

I'm also wondering what'd be the process of ensuring that later PRs don't introduce new print statements to the code. Up for reviewers to detect, or add pipeline checks/tests - e.g. add flake8-no-print to the python-lint pipeline and replace print with sys.stdout.write.

rgryta · 2024-04-09T15:27:14Z

Nit: \n is now unnecessary in the logger.error 😅

mofosyne · 2024-04-10T05:53:38Z

Since logging is being introduced, we could also change sys.stderr.write call in the default_outfile method to logger.error.

I'm also wondering what'd be the process of ensuring that later PRs don't introduce new print statements to the code. Up for reviewers to detect, or add pipeline checks/tests - e.g. add flake8-no-print to the python-lint pipeline and replace print with sys.stdout.write.

Alright then, I've figured out how to run flake8 directly (after failing to get pre-commit to work) and start using the flake8 no print allowed plugin. It then threw a whole bunch of errors on all the other pythons scripts in this repo. So decided to give correcting it all a stab.

Also updated the github actions and pre-commit config to use this feature so hopefully it now works.

Since this is changing quite a few scripts, I had to spend a bit more time on other .py scripts that had an older approach to using print() especially if ending with end= or not using fstrings for variables substitution. So you may want to pay attention to this when reviewing.

Unfortunately this has now make this PR bit more daunting to review compared to the past, but with the inclusion of the no-print check this should hopefully be a one time pain for everyone here. :)

Regarding my issue getting pre-commit working

This is what I got. I've looked around but haven't found a solution yet. Hopefully its not an issue for others working on this.

$ pre-commit install
pre-commit installed at /home/mofosyne/huggingface/TinyLLama-v0-llamafile/.git/modules/llama.cpp/hooks/pre-commit

$ pre-commit
[INFO] Initializing environment for https://github.com/PyCQA/flake8:flake8-no-print.
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
An unexpected error has occurred: AssertionError: BUG: expected environment for python to be healthy immediately after install, please open an issue describing your environment

more info:

virtualenv python version did not match created version:
- actual version: <<error retrieving version from /home/mofosyne/.cache/pre-commit/reposmwb8fu3/py_env-python3.10/bin/python>>
- expected version: 3.10.12.final.0

Check the log at /home/mofosyne/.cache/pre-commit/pre-commit.log

.github/workflows/python-lint.yml

convert-hf-to-gguf.py

convert-llama-ggml-to-gguf.py

gguf-py/gguf/constants.py

gguf-py/scripts/gguf-convert-endian.py

scripts/verify-checksum-models.py

scripts/compare-llama-bench.py

gguf-py/examples/reader.py

convert.py

gguf-py/scripts/gguf-dump.py

mofosyne · 2024-04-18T04:27:44Z

rebased on top of latest master due to extra breaking changes on convert-hf-to-gguf.py (more print() was added in the main repo)

mofosyne

Okay just finished making all the changes I noted from you (besides the flake8 enquiry which I'm taking as a comment for now).

I also took a bit of time to look over it again and it looks fine on my end so far.

CISC · 2024-04-18T05:47:57Z

Great work. :)

A final note however, you should use __name__ as logger name for everything in gguf-py/gguf as they will be imported and not run directly.

mofosyne · 2024-04-18T05:58:23Z

Great work. :)

A final note however, you should use __name__ as logger name for everything in gguf-py/gguf as they will be imported and not run directly.

One of your previous statement appears to state the opposite.

You missed a couple of logging uses though, and you shouldn't use name as the logger name in a stand-alone script (it'll just be main), just calling it convert f.ex. will do fine.

So you are saying just specifically for gguf-py/gguf because it's used within other scripts?

mofosyne · 2024-04-18T06:03:42Z

Anyway added your suggestion in @CISC hopefully it should all check out!

CISC · 2024-04-18T07:25:58Z

So you are saying just specifically for gguf-py/gguf because it's used within other scripts?

Yes, because __name__ will contain the module path in this case (and thus all be related to each other), it's mainly stand-alone scripts that benefit from explicitly naming the logger (unless you wish to create a certain logging hierarchy).

mofosyne

Manually double checked and clicked viewed on all pages. Found one unused logger import and fixed it. Aside from that, looks ready.

mofosyne · 2024-04-21T01:34:49Z

Rebased to add one additional [convert-hf-to-gguf.py: add additional logging](https://github.com/ggerganov/llama.cpp/pull/6511/commits/a6f54dee3ca65ae1dbeab3f8c26c1d75a9609715) commit to fix addition of print() convert-hf-to-gguf.py in the main branch.

teleprint-me · 2024-04-21T19:43:22Z

I have a pattern we can use to centralize the logger instantiation with.

The implementation is really simple and flexible.

LOGGER_FORMAT = "%(asctime)s - %(filename)s:%(lineno)d - %(levelname)s - %(message)s"

def get_default_logger(name=None, level=logging.DEBUG):
    logger = logging.getLogger(name)
    if not logger.hasHandlers():
        handler = logging.StreamHandler(stream=sys.stdout)
        formatter = logging.Formatter(LOGGER_FORMAT)
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        logger.setLevel(level)
    return logger

I left the docstring out of the snippet shown here, but this is basically it. You're free to use it to simplify the implementation and reduce code duplication. I've been able to use this in classes, functions, and scripts and regularly use it. It's under the MIT License, so it's compatible with llama.cpp.

This pattern can be extended over time and included in the gguf-py package as a logger.py module that can be improved over time.

I never liked the use of the global LOGGER_FORMAT, but it simplified the initial implementation which I extracted from a few other programs I wrote. The implementation details can be ironed out over time as needed though.

mofosyne · 2024-04-21T23:36:05Z

Happy to add if @cebtenzzre thinks it's a good idea.

What's the advantage of a centralized logger instance? Would this be better as a separate PR (Which would be good if it's only a small change to each file).

The main objective of this PR is simply to convert from print() to logger in a preferably simple and extendable way, so at least for this one I would prefer to minimize adding dependencies.

CISC

Hate to do this to you, but you got the wrong one, the correct one is at L1189, please revert 6d42f3d :)

mofosyne · 2024-05-02T06:25:16Z

@mofosyne Great work here, unfortunately I think you were a bit unlucky and fell right between some urgent events, leaving everyone very busy, which has postponed the merge of this PR.

Also I do not have any reviewer powers, just being a busy-body. :) You'll have to wait for @cebtenzzre and/or @ggerganov to get some spare time.

Ah that's not a problem. Ultimately, this is mostly just a quality of life fix to make developer life smoother. But I do hope to eventually get this in, as better debugging will ultimately make everyone's life easier.

compilade

This is overall quite good. Feel free to ignore/"resolve" any of the comments I made if they are too nit-picky.

convert-llama-ggml-to-gguf.py

gguf-py/scripts/gguf-convert-endian.py

scripts/run-with-preset.py

tests/test-tokenizer-0-spm.py

requirements.txt

convert-hf-to-gguf.py

mofosyne · 2024-05-03T02:37:09Z

Okay implemented most of your suggestions @compilade but excluded some style observations as it doesn't appear to be an issue.

edit: Accodentally typed cebtenzzre. Intended to invoke compilade

compilade

Looks good to me. Merging this will cause conflicts in at least #7041, #7036, #7033, #7031, #7027, #7018, #6942, #6866 and soft conflicts with at least #7020, #6986, #6919, #6826. (soft conflicts in this case mean new linter errors, but no merge conflict)

Those involved in these PRs will probably want to take at least a cursory look at this before then (hopefully soon).

teleprint-me · 2024-05-03T05:06:07Z

Maybe this one should be prioritized for now?

mofosyne · 2024-05-04T04:49:31Z

Thanks for merging! Hopefully all is well for other PRs!

ggerganov · 2024-05-04T04:52:43Z

On my Macbook I don't see the print's anymore. For example:

python3 convert-hf-to-gguf-update.py

should print usage, but after this PR it does not print anything. Any ideas?

mofosyne · 2024-05-04T05:02:37Z

I see what's going on. By default in https://docs.python.org/3/howto/logging.html logging level is only warning and up.

The default level is WARNING, which means that only events of this level and above will be tracked, unless the logging package is configured to do otherwise.

To get it printing again you need to add logging.basicConfig(level=logging.DEBUG) just under logger = logging.getLogger("convert-hf-to-gguf-update")

This was missed because it was not using the typical parser.add_argument that I was searching for.

We may want to convert it to ArgumentParser() so we can add the --verbose flag. But if unlikely to be complicated, then just throwing the basicConfig() under logger = logging.getLogger("convert-hf-to-gguf-update") will be good enough.

On a side note you may want to also chmod +x convert-hf-to-gguf-update.py after adding #!/usr/bin/env python3 to the top of the script.

* convert.py: add python logging instead of print() * convert.py: verbose flag takes priority over dump flag log suppression * convert.py: named instance logging * convert.py: use explicit logger id string * convert.py: convert extra print() to named logger * convert.py: sys.stderr.write --> logger.error * *.py: Convert all python scripts to use logging module * requirements.txt: remove extra line * flake8: update flake8 ignore and exclude to match ci settings * gh-actions: add flake8-no-print to flake8 lint step * pre-commit: add flake8-no-print to flake8 and also update pre-commit version * convert-hf-to-gguf.py: print() to logger conversion * *.py: logging basiconfig refactor to use conditional expression * *.py: removed commented out logging * fixup! *.py: logging basiconfig refactor to use conditional expression * constant.py: logger.error then exit should be a raise exception instead * *.py: Convert logger error and sys.exit() into a raise exception (for atypical error) * gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar * verify-checksum-model.py: This is the result of the program, it should be printed to stdout. * compare-llama-bench.py: add blank line for readability during missing repo response * reader.py: read_gguf_file() use print() over logging * convert.py: warning goes to stderr and won't hurt the dump output * gguf-dump.py: dump_metadata() should print to stdout * convert-hf-to-gguf.py: print --> logger.debug or ValueError() * verify-checksum-models.py: use print() for printing table * *.py: refactor logging.basicConfig() * gguf-py/gguf/*.py: use __name__ as logger name Since they will be imported and not run directly. * python-lint.yml: use .flake8 file instead * constants.py: logger no longer required * convert-hf-to-gguf.py: add additional logging * convert-hf-to-gguf.py: print() --> logger * *.py: fix flake8 warnings * revert changes to convert-hf-to-gguf.py for get_name() * convert-hf-to-gguf-update.py: use triple quoted f-string instead * *.py: accidentally corrected the wrong line * *.py: add compilade warning suggestions and style fixes

reneleonhardt · 2024-05-06T07:17:42Z

Hmm I wonder why not all f-strings have been converted to %s parameters like all logging libraries suggest for a bit lazier evaluation 😅
https://docs.python.org/3/howto/logging-cookbook.html#formatting-styles

CISC · 2024-05-06T14:17:27Z

Hmm I wonder why not all f-strings have been converted to %s parameters like all logging libraries suggest for a bit lazier evaluation 😅

If it would impact performance in any meaningful way, sure, otherwise no. IMO f-strings are much better and less error prone.

* convert.py: add python logging instead of print() * convert.py: verbose flag takes priority over dump flag log suppression * convert.py: named instance logging * convert.py: use explicit logger id string * convert.py: convert extra print() to named logger * convert.py: sys.stderr.write --> logger.error * *.py: Convert all python scripts to use logging module * requirements.txt: remove extra line * flake8: update flake8 ignore and exclude to match ci settings * gh-actions: add flake8-no-print to flake8 lint step * pre-commit: add flake8-no-print to flake8 and also update pre-commit version * convert-hf-to-gguf.py: print() to logger conversion * *.py: logging basiconfig refactor to use conditional expression * *.py: removed commented out logging * fixup! *.py: logging basiconfig refactor to use conditional expression * constant.py: logger.error then exit should be a raise exception instead * *.py: Convert logger error and sys.exit() into a raise exception (for atypical error) * gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar * verify-checksum-model.py: This is the result of the program, it should be printed to stdout. * compare-llama-bench.py: add blank line for readability during missing repo response * reader.py: read_gguf_file() use print() over logging * convert.py: warning goes to stderr and won't hurt the dump output * gguf-dump.py: dump_metadata() should print to stdout * convert-hf-to-gguf.py: print --> logger.debug or ValueError() * verify-checksum-models.py: use print() for printing table * *.py: refactor logging.basicConfig() * gguf-py/gguf/*.py: use __name__ as logger name Since they will be imported and not run directly. * python-lint.yml: use .flake8 file instead * constants.py: logger no longer required * convert-hf-to-gguf.py: add additional logging * convert-hf-to-gguf.py: print() --> logger * *.py: fix flake8 warnings * revert changes to convert-hf-to-gguf.py for get_name() * convert-hf-to-gguf-update.py: use triple quoted f-string instead * *.py: accidentally corrected the wrong line * *.py: add compilade warning suggestions and style fixes

slaren · 2024-05-07T23:56:46Z

This has broken compare-llama-bench.py.

ggerganov · 2024-05-08T03:42:29Z

This has broken compare-llama-bench.py.

@mofosyne PTAL - usage instructions are here:

llama.cpp/scripts/compare-llama-bench.py

Lines 45 to 57 in c780e75

    
           DESCRIPTION = """Creates tables from llama-bench data written to an SQLite database. Example usage (Linux): 
        
           $ git checkout master 
        
           $ make clean && make llama-bench 
        
           $ ./llama-bench -o sql | sqlite3 llama-bench.sqlite 
        
           $ git checkout some_branch 
        
           $ make clean && make llama-bench 
        
           $ ./llama-bench -o sql | sqlite3 llama-bench.sqlite 
        
           $ ./scripts/compare-llama-bench.py 
        
           Performance numbers from multiple runs per commit are averaged WITHOUT being weighted by the --repetitions parameter of llama-bench. 
        
           """

mofosyne force-pushed the convert-py-logging branch from 2a2b801 to e4e4df5 Compare April 6, 2024 13:53

mofosyne force-pushed the convert-py-logging branch from baddc3f to b9c984b Compare April 10, 2024 05:48

mofosyne mentioned this pull request Apr 12, 2024

convert.py: Outfile default name change and additional metadata support #4858

Merged

ggerganov requested a review from cebtenzzre April 12, 2024 12:18

cebtenzzre reviewed Apr 17, 2024

View reviewed changes

mofosyne force-pushed the convert-py-logging branch from 3e6ff06 to 9e4cf37 Compare April 18, 2024 04:26

mofosyne commented Apr 18, 2024

View reviewed changes

mofosyne commented Apr 21, 2024

View reviewed changes

mofosyne force-pushed the convert-py-logging branch from 89a75d3 to a6f54de Compare April 21, 2024 01:33

convert-hf-to-gguf-update.py: use triple quoted f-string instead

154ad12

CISC reviewed May 1, 2024

View reviewed changes

compilade mentioned this pull request May 1, 2024

convert-hf : reduce repeated boilerplate from write_tensors #7031

Closed

18 tasks

*.py: accidentally corrected the wrong line

08e2b77

compilade reviewed May 3, 2024

View reviewed changes

*.py: add compilade warning suggestions and style fixes

52d0567

mofosyne requested a review from compilade May 3, 2024 02:57

compilade approved these changes May 3, 2024

View reviewed changes

ggerganov merged commit a2ac89d into ggerganov:master May 3, 2024
22 checks passed

dranger003 mentioned this pull request May 3, 2024

Add BPE pre-tokenization for Command-R/R+. #7063

Merged

mofosyne deleted the convert-py-logging branch May 4, 2024 04:41

mofosyne mentioned this pull request May 4, 2024

two converter py files needs basicConfig() added #7070

Closed

compilade mentioned this pull request May 4, 2024

convert-hf : save memory with lazy evaluation #7075

Merged

7 tasks

christianazinn mentioned this pull request May 5, 2024

Option to split during conversion #6942

Merged

mofosyne mentioned this pull request May 8, 2024

compare-llama-bench.py: add missing basicConfig #7138

Merged

fairydreaming mentioned this pull request May 9, 2024

Added support for the ArcticForCausalLM. #7020

Merged

convert.py: add python logging instead of print() #6511

convert.py: add python logging instead of print() #6511

Conversation

mofosyne commented Apr 6, 2024

CISC commented Apr 7, 2024

deiteris commented Apr 7, 2024 • edited Loading

CISC commented Apr 7, 2024

mofosyne commented Apr 8, 2024

CISC commented Apr 8, 2024

mofosyne commented Apr 8, 2024

CISC commented Apr 8, 2024

mofosyne commented Apr 8, 2024

CISC commented Apr 8, 2024

mofosyne commented Apr 8, 2024

rgryta commented Apr 9, 2024 • edited Loading

rgryta commented Apr 9, 2024

mofosyne commented Apr 10, 2024 • edited Loading

Regarding my issue getting pre-commit working

mofosyne commented Apr 18, 2024

mofosyne left a comment

Choose a reason for hiding this comment

CISC commented Apr 18, 2024

mofosyne commented Apr 18, 2024

mofosyne commented Apr 18, 2024

CISC commented Apr 18, 2024

mofosyne left a comment

Choose a reason for hiding this comment

mofosyne commented Apr 21, 2024

teleprint-me commented Apr 21, 2024 • edited Loading

mofosyne commented Apr 21, 2024

CISC left a comment • edited Loading

Choose a reason for hiding this comment

mofosyne commented May 2, 2024

compilade left a comment

Choose a reason for hiding this comment

mofosyne commented May 3, 2024 • edited Loading

compilade left a comment

Choose a reason for hiding this comment

teleprint-me commented May 3, 2024

mofosyne commented May 4, 2024 • edited Loading

ggerganov commented May 4, 2024

mofosyne commented May 4, 2024 • edited Loading

reneleonhardt commented May 6, 2024

CISC commented May 6, 2024

slaren commented May 7, 2024

ggerganov commented May 8, 2024

deiteris commented Apr 7, 2024 •

edited

Loading

rgryta commented Apr 9, 2024 •

edited

Loading

mofosyne commented Apr 10, 2024 •

edited

Loading

teleprint-me commented Apr 21, 2024 •

edited

Loading

CISC left a comment •

edited

Loading

mofosyne commented May 3, 2024 •

edited

Loading

mofosyne commented May 4, 2024 •

edited

Loading

mofosyne commented May 4, 2024 •

edited

Loading