Better readability for numbers in output #89

YYYasin19 · 2022-11-20T15:22:56Z

Closes #88

Adds a method for cutting decimal numbers at their first differing place (cf. the issue for a better explanation of the problem).

I also added a method that rounds numbers to their order of magnitude so the difference in large numbers can be seen more easily. Otherwise, you sometimes get output like 12389239823 rows instead of 12259385943.
I just realized that the current method would round both to 120,000,..., but an even better solution would be to round them to their first differing order of magnitude as well, i.e., 12,300,000,000 rows instead of 12,200,000,000.
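A minimal sketch of "cutting at the first differing place" that covers both the decimal and the large-number case. It is not the code from this PR: the name cut_at_first_difference is hypothetical, it truncates rather than rounds (which matches the 12,300,000,000 vs. 12,200,000,000 example), and it uses decimal.Decimal to avoid float representation noise.

```python
from decimal import ROUND_DOWN, Decimal
from typing import Tuple, Union

Number = Union[int, float, str, Decimal]


def _truncate(value: Decimal, exponent: int) -> Decimal:
    """Drop every digit below the 10**exponent place (truncating toward zero)."""
    step = Decimal(10) ** exponent
    return (value / step).to_integral_value(rounding=ROUND_DOWN) * step


def cut_at_first_difference(a: Number, b: Number) -> Tuple[Decimal, Decimal]:
    """Hypothetical helper: cut a and b right after their first differing digit."""
    da, db = Decimal(str(a)), Decimal(str(b))
    if da == db:
        return da, db
    # Start at the leading digit of the larger magnitude and move to smaller
    # places until the truncated values differ. This terminates because
    # da != db and both have finitely many digits.
    exponent = max(da.adjusted(), db.adjusted())
    while _truncate(da, exponent) == _truncate(db, exponent):
        exponent -= 1
    # Never cut above either number's own leading digit (e.g. 5 vs. 50),
    # otherwise the smaller number would collapse to 0.
    exponent = min(exponent, da.adjusted(), db.adjusted())
    return _truncate(da, exponent), _truncate(db, exponent)
```

The returned Decimal values can then be printed with thousands separators, e.g. via f"{value:,f}".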

I'll have to integrate them more though and add tests 👍

codecov · 2022-11-20T15:25:48Z

Codecov Report

Merging #89 (b3e0ee7) into main (84eaeaa) will increase coverage by 1.91%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #89      +/-   ##
==========================================
+ Coverage   91.85%   93.76%   +1.91%     
==========================================
  Files          17       18       +1     
  Lines        1878     1894      +16     
==========================================
+ Hits         1725     1776      +51     
+ Misses        153      118      -35

Files Changed	Coverage Δ
src/datajudge/constraints/nrows.py	`94.93% <100.00%> (+0.19%)`	⬆️
src/datajudge/formatter.py	`100.00% <100.00%> (ø)`
src/datajudge/utils.py	`100.00% <100.00%> (ø)`

... and 2 files with indirect coverage changes

YYYasin19 · 2022-12-11T16:57:10Z

Not really ready for review, but I'd like to hear some thoughts.
The problem seems a little more complicated than I first thought. The issue regarding UX I have is the following:
How would you communicate to the user that the numbers are not perfectly accurate but rounded for better readability?
A global setting? A method arg? Could be an arg to the Requirement or Constraint itself, e.g.

req.add_numeric_mean_constraint(..., accurate_numbers=True)

or maybe an argument for the pytest_integration that then sets a global variable?

YYYasin19 · 2022-12-11T17:05:14Z

As you can see from my usage of the two functions, I'm also not really sure what the best approach is from a developer-experience point of view.
The smart approach would be to apply them at a 'higher level' point in datajudge's abstraction tree, e.g., apply the formatting once in get_factual_value.

kklein · 2022-12-12T08:14:16Z

> The issue regarding UX I have is the following:
> How would you communicate to the user that the numbers are not perfectly accurate but rounded for better readability?
> A global setting? A method arg? Could be an arg to the Requirement or Constraint itself

I'm not sure it's truly necessary to communicate to the end user what kind of rounding has taken place, as long as the rounding works in the interest of the prototypical user. One could potentially add a short phrase to assertion texts indicating what kind of rounding has taken place, though I'm not sure how bloated the texts might become then.

> A global setting? A method arg? Could be an arg to the Requirement or Constraint itself, e.g.

If you truly want to make this parametrizable, the test method of a Constraint seems the most natural place to me. It's not clear to me why this should be part of the state of a Requirement or Constraint; I'd rather think of it as a parameter to the testing itself. If end users are not using the pytest_integration module, they could pass it every time they call test. If they are using the pytest_integration module, they only need to pass it once.

> As you can see from my usage of the two functions, I'm also not really sure what the best approach is from a developer-experience point of view.
> The smart approach would be to apply them at a 'higher level' point in datajudge's abstraction tree, e.g., apply the formatting once in get_factual_value.

I'm not really sure I know what you mean. Your exemplary use in NRowsMaxLoss doesn't look too bad to me. :)

YYYasin19 · 2022-12-12T22:32:54Z

Nice! I think you're right about everything you said.
Thinking it through again, I don't think any user should need more precision than that, since we only round at the first differing place, i.e., there is enough precision to see the difference.

What do you think about percentages instead of floats?
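On percentages: a minimal sketch using Python's built-in % presentation type, which multiplies by 100 and appends a percent sign. The helper name format_ratio is hypothetical; the relative-gain formula mirrors the one in src/datajudge/constraints/nrows.py.

```python
def format_ratio(value: float, digits: int = 2) -> str:
    """Render a ratio (e.g. a relative row gain) as a percentage string."""
    # The "%" presentation type multiplies by 100 and appends "%".
    return f"{value:.{digits}%}"


n_rows_factual, n_rows_target = 1_052, 1_000
relative_gain = (n_rows_factual - n_rows_target) / n_rows_target
print(format_ratio(relative_gain))  # 5.20%
```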

YYYasin19 · 2022-12-14T12:18:13Z

Ignoring the messy code above: we could also color-code the output for better readability without trading off the user's precision. Very nice idea from @jonashaag!

ivergara · 2022-12-14T12:37:39Z

Great idea with the colorization! Just be cautious of cases where the place it's being printed on doesn't support that and then you end up (eventually) with garbage in the string.

YYYasin19 · 2022-12-14T12:43:29Z

> Great idea with the colorization! Just be cautious of cases where the place it's being printed on doesn't support that and then you end up (eventually) with garbage in the string.

Yep, redirecting that output to a file gives you a poorly formatted string.

YYYasin19 · 2022-12-14T12:48:20Z

We can of course hack around it like this:

import sys

# console.colorize stands in for whatever colorizing helper is available.
if sys.stdout.isatty():
    print(console.colorize("red", "This text is red."))
else:
    print("This text is red.")

but that may not be worth the hassle either.

jonashaag · 2022-12-14T13:14:26Z

It's not a hack, actually. You might want to have a look at how other libraries detect whether the output supports color; I expect similar code.
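A sketch of a color-capability check along these lines. It follows widely used conventions (the NO_COLOR environment variable, a FORCE_COLOR override, a TTY check, and TERM=dumb) rather than reproducing any particular library's exact rules; the function name supports_color is hypothetical.

```python
import os
import sys
from typing import Optional, TextIO


def supports_color(stream: Optional[TextIO] = None) -> bool:
    """Heuristically decide whether ANSI colors should be written to `stream`."""
    # Look up sys.stdout lazily, since test runners may replace it at runtime.
    if stream is None:
        stream = sys.stdout
    # NO_COLOR convention: any non-empty value disables color.
    if os.environ.get("NO_COLOR"):
        return False
    # Several tools honor FORCE_COLOR to enable color even without a TTY.
    if os.environ.get("FORCE_COLOR"):
        return True
    # Otherwise, only color interactive terminals that are not "dumb".
    is_tty = hasattr(stream, "isatty") and stream.isatty()
    return bool(is_tty) and os.environ.get("TERM") != "dumb"
```

Note that tools disagree on details such as whether FORCE_COLOR=0 means "off", so this is one reasonable policy, not a standard.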

YYYasin19 · 2022-12-15T20:44:18Z

What do you think about the following implementation? It uses colorama, which pytest already pulls in as a dependency (on Windows, at least), so it should come at little or no additional cost for us.

When output is piped to a file, colorama strips all formatting, so the problem from the first example does not occur. It also works in f-strings, which we use in our assert statements.
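To illustrate what "stripping all formatting" amounts to, a minimal stdlib-only sketch that removes ANSI color sequences from a string. colorama does roughly comparable filtering when set up via colorama.init() and the wrapped stream is not a terminal, but this is not colorama's code, and strip_ansi is a hypothetical name.

```python
import re

# Matches ANSI "CSI" sequences such as "\x1b[36m" (cyan) or "\x1b[0m" (reset).
ANSI_CSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def strip_ansi(text: str) -> str:
    """Remove ANSI color/style sequences, leaving only the visible text."""
    return ANSI_CSI.sub("", text)


colored = "expected \x1b[36m12,389,239,823\x1b[0m rows"
print(strip_ansi(colored))  # expected 12,389,239,823 rows
```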

kklein · 2023-01-17T15:40:20Z

I still really like the idea of the coloring, but it doesn't seem to work perfectly on my local machine just yet. When I use the given functionality in an assertion string, I get the following:

The default pytest coloring and the coloring introduced in this PR clash with each other. :/

@YYYasin19 would you be open to making this PR about rounding and tackling the coloring in a follow-up PR? That might help separate concerns and speed up progress.

YYYasin19 · 2023-01-17T16:08:48Z

> @YYYasin19 would you be open to making this PR about rounding and tackling the coloring in a follow-up PR?

I thought the idea was that coloring would be the alternative to rounding. By adding colors, we can still show full precision while making the numbers very easy to read.

YYYasin19 · 2023-01-17T16:11:03Z

> The default pytest coloring and the coloring introduced in this PR clash with each other. :/

:(
I'll look for a workaround. Detecting pytest and disabling coloring is not an option, though.
Do you know whether pytest prints the red part to stderr and the terminal colors it because of that? Or does pytest color it itself?

kklein · 2023-01-17T16:22:28Z

> I thought the idea was that coloring would be the alternative to rounding. By adding colors, we can still show full precision while making the numbers very easy to read.

I see - sorry about the misunderstanding.

> Do you know whether pytest prints the red part to stderr and the terminal colors it because of that? Or does pytest color it itself?

Sadly I don't know.

YYYasin19 · 2023-01-17T18:47:50Z

Hmm, I have no clue so far.
pytest prints different parts separately (e.g., the short summary goes to regular stdout, which most terminals print in white).
We also can't disable the feature under pytest, since that is the main use case.

YYYasin19 · 2023-01-21T16:03:14Z

Okay, I did some digging, and it turns out that the coloring of that section is currently hard-coded in pytest.
There is a --color=no option in pytest that disables coloring entirely, and we could make changes upstream to make the coloring customizable.
But since our tests are run by pytest (and not the other way around), we can't really modify the pytest call anyway.

The output looks fine in most UIs I have checked (e.g., PyCharm, VSCode, GitHub output) but breaks in low-contrast environments.

YYYasin19 · 2023-06-12T13:19:49Z

New try: after talking with @0xbe7a about #153, we had a look at this PR as well, since it's pretty close to the topic.

TL;DR:

Rounding (for all use cases) is hard, not always obvious, and loses precision where it might be needed.
This approach uses coloring and thousands separators (cf. image).
The chosen color is cyan on purpose, since it's not likely to collide with pytest's colors.
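A minimal sketch of the idea behind diff_color: add thousands separators, then color everything from the first differing character onward in cyan. Only the call shape factual_fmt, target_fmt = diff_color(n_rows_factual, n_rows_target) comes from this PR; the body below is an assumption, uses raw ANSI codes instead of colorama, and targets integers such as row counts.

```python
from typing import Tuple

CYAN = "\x1b[36m"  # ANSI foreground cyan
RESET = "\x1b[0m"  # ANSI reset of all attributes


def diff_color(a: int, b: int) -> Tuple[str, str]:
    """Hypothetical sketch: highlight where two integers start to differ."""
    a_str, b_str = f"{a:,}", f"{b:,}"
    if a_str == b_str:
        return a_str, b_str
    # Right-align both strings so that digits of the same magnitude line up,
    # then find the first column in which they differ.
    width = max(len(a_str), len(b_str))
    a_pad, b_pad = a_str.rjust(width), b_str.rjust(width)
    first_diff = next(i for i, (x, y) in enumerate(zip(a_pad, b_pad)) if x != y)

    def highlight(s: str) -> str:
        # Map the column back into the unpadded string (clamped at 0).
        idx = max(first_diff - (width - len(s)), 0)
        return s[:idx] + CYAN + s[idx:] + RESET

    return highlight(a_str), highlight(b_str)


factual_fmt, target_fmt = diff_color(12389239823, 12259385943)
print(f"{factual_fmt} rows instead of {target_fmt}")
```

Right-aligning keeps numbers of different lengths comparable digit by digit; floats with different numbers of decimals would need a different alignment rule.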

ivergara · 2023-06-12T13:22:35Z

Awesome!

kklein

Looks great! Should we add colorama to environment.yaml and pyproject.toml?

src/datajudge/utils.py

YYYasin19 · 2023-06-16T08:44:46Z

> Looks great! Should we add colorama to environment.yaml and pyproject.toml?

Yes, I'll do that!

Should we also start rewriting some (all?) of the assertion messages to use this, or keep these few test cases, refine the feature further, and refactor the others over time?

kklein · 2023-06-16T08:58:11Z

> Should we also start rewriting some (all?) of the assertion messages to use this, or keep these few test cases, refine the feature further, and refactor the others over time?

Both are fine with me. In the latter case we should probably just make sure to create some shared overview of what has already been done and what remains to be done.

YYYasin19 · 2023-06-17T13:30:00Z

>> Should we also start rewriting some (all?) of the assertion messages to use this, or keep these few test cases, refine the feature further, and refactor the others over time?

> Both are fine with me. In the latter case we should probably just make sure to create some shared overview of what has already been done and what remains to be done.

I've created an issue (#164) to track this task going forward; I hope that's a good middle ground!

kklein · 2023-06-17T16:46:40Z

> I've created an issue (#164) to track this task going forward; I hope that's a good middle ground!

Thanks! I'm not sure I understand 100%. What do you think about creating a concrete list of constraints with a binary indicator expressing completion?

kklein

Looking great! Just to be sure: has anyone tested this change with the HTML reports yet?

kklein · 2023-06-17T16:49:01Z

src/datajudge/constraints/nrows.py

@@ -58,9 +61,10 @@ def compare(self, n_rows_factual: int, n_rows_target: int) -> Tuple[bool, str]:
 class NRowsEquality(NRows):
    def compare(self, n_rows_factual: int, n_rows_target: int) -> Tuple[bool, str]:
        result = n_rows_factual == n_rows_target
+        n1, n2 = diff_color(n_rows_factual, n_rows_target)


I would suggest either

n_rows_factual_fmt, n_rows_target_fmt = diff_color(n_rows_factual, n_rows_target)

(as before) or

factual_fmt, target_fmt = diff_color(n_rows_factual, n_rows_target)

(as in other places in this PR).

kklein · 2023-06-17T16:50:13Z

src/datajudge/constraints/nrows.py

@@ -145,10 +153,13 @@ def compare(self, n_rows_factual: int, n_rows_target: int) -> Tuple[bool, str]:
        if n_rows_factual < n_rows_target:
            return False, "Row loss."
        relative_gain = (n_rows_factual - n_rows_target) / n_rows_target
+        relative_gain_fmt, min_gain_fmt = diff_color(


Suggested change

relative_gain_fmt, min_gain_fmt = diff_color(

relative_gain_fmt, min_relative_gain_fmt = diff_color(

kklein · 2023-06-21T10:23:42Z

I tested it myself, unfortunately this doesn't work with reports. :/

ivergara · 2023-06-21T12:33:48Z

@YYYasin19 it should be possible to ask pytest whether the HTML report option is active and, if so, deactivate the coloring.
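A sketch of this suggestion, assuming the report comes from the pytest-html plugin, which registers an --html option. It relies on pytest's Config.getoption, which accepts the literal --OPT name and returns the given default when no such option is registered (e.g. when pytest-html is not installed). The function name should_color is hypothetical; it could be called from a conftest.py hook with pytest's config object.

```python
from typing import Any


def should_color(config: Any) -> bool:
    """Decide whether failure messages should contain ANSI color codes.

    `config` is meant to be a pytest Config object; only getoption() is used.
    """
    # pytest-html (assumed to register --html) renders messages into an HTML
    # report, where raw escape codes would show up as garbage.
    if config.getoption("--html", default=None):
        return False
    # Otherwise follow pytest's own --color=yes/no/auto switch (dest "color").
    return config.getoption("color", default="auto") != "no"
```

For --color=auto this simply allows color; a fuller version would probably also check whether the output is a TTY.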

YYYasin19 · 2023-06-21T15:13:28Z

Alternatively, what do you think about a global option COLORING: bool that is set during the initial call of our pytest_integration?

kklein · 2023-06-21T20:16:36Z

> Alternatively, what do you think about a global option COLORING: bool that is set during the initial call of our pytest_integration?

Do I understand correctly that this wouldn't allow for coloring if people don't use pytest_integration but collect constraints themselves? I think that some people do the latter.

0xbe7a · 2023-06-21T20:46:38Z

What do you guys think about supporting colored tags in the failure message, like Normal Text <blue>Hello from Blue Text</blue>? Then we could extend TestResult.formatted_message(self) -> str to TestResult.formatted_message(self, formatter: Formatter = NoColor) -> str and provide, for example, the formatters AnsiColorFormatter, HTMLFormatter, NoColorFormatter, etc., which replace the color tags with platform-specific escape sequences or markup. Downstream users like the pytest_helper function could then choose which formatter to use based on their own settings, and we would avoid global state and side effects. More importantly, all existing tests can still output plain ASCII text as failure messages and can remain unaware of any formatters.
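A minimal sketch of the tag-based approach, assuming a single level of non-nested <color>...</color> tags. The class names AnsiColorFormatter, HTMLFormatter and NoColorFormatter come from the proposal above, but the method names fmt_str, fmt_plain and fmt_colored and all bodies are hypothetical; the version that was actually merged in #170 may look different. A TestResult.formatted_message(formatter) could simply delegate to formatter.fmt_str(message).

```python
import html
import re
from abc import ABC, abstractmethod

# One non-nested color tag, e.g. "<cyan>12,389</cyan>".
TAG = re.compile(r"<(?P<color>[a-z]+)>(?P<text>.*?)</(?P=color)>", re.DOTALL)


class Formatter(ABC):
    def fmt_plain(self, text: str) -> str:
        """Format text outside of any color tag (unchanged by default)."""
        return text

    @abstractmethod
    def fmt_colored(self, text: str, color: str) -> str:
        """Format text that was wrapped in a <color> tag."""

    def fmt_str(self, message: str) -> str:
        """Replace every color tag in `message` using this formatter."""
        parts, pos = [], 0
        for match in TAG.finditer(message):
            parts.append(self.fmt_plain(message[pos : match.start()]))
            parts.append(self.fmt_colored(match.group("text"), match.group("color")))
            pos = match.end()
        parts.append(self.fmt_plain(message[pos:]))
        return "".join(parts)


class NoColorFormatter(Formatter):
    def fmt_colored(self, text: str, color: str) -> str:
        return text  # drop the tag, keep the content


class AnsiColorFormatter(Formatter):
    CODES = {"red": 31, "green": 32, "yellow": 33, "blue": 34, "cyan": 36}

    def fmt_colored(self, text: str, color: str) -> str:
        code = self.CODES.get(color)
        return text if code is None else f"\x1b[{code}m{text}\x1b[0m"


class HTMLFormatter(Formatter):
    def fmt_plain(self, text: str) -> str:
        return html.escape(text)

    def fmt_colored(self, text: str, color: str) -> str:
        # TAG restricts color names to [a-z]+, so they cannot break the attribute.
        return f'<span style="color: {color}">{html.escape(text)}</span>'
```

Keeping the tags in the message means constraint code only ever emits one string, and the choice of output target stays with the caller.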

ivergara · 2023-06-22T06:55:38Z

> What do you guys think about supporting colored tags in the failure message, like Normal Text <blue>Hello from Blue Text</blue>? Then we could extend TestResult.formatted_message(self) -> str to TestResult.formatted_message(self, formatter: Formatter = NoColor) -> str and provide, for example, the formatters AnsiColorFormatter, HTMLFormatter, NoColorFormatter, etc., which replace the color tags with platform-specific escape sequences or markup. Downstream users like the pytest_helper function could then choose which formatter to use based on their own settings, and we would avoid global state and side effects. More importantly, all existing tests can still output plain ASCII text as failure messages and can remain unaware of any formatters.

That's certainly a better solution. If you can pull it off, it'd be great!

YYYasin19 · 2023-06-22T07:30:38Z

> What do you guys think about supporting colored tags in the failure message

Nice idea!
How would we do the number highlighting (which started this whole journey) in this system? Have diff_color output our tags, which then get formatted per platform?

YYYasin19 · 2023-06-22T07:31:54Z

I am in favor (of at least trying it out) only if we have other use cases as well. Having more readable output is important, and it doesn't depend only on better formatting for "long" numbers.

0xbe7a · 2023-06-22T07:36:21Z

> Nice idea! How would we do the number highlighting (which started this whole journey) in this system? Have diff_color output our tags, which then get formatted per platform?

Yes, exactly. We can even do proper HTML coloring using this approach.

0xbe7a · 2023-06-22T09:36:14Z

I implemented my concept in 567d6cd

ivergara · 2023-06-22T10:26:04Z

@0xbe7a looks good!

ivergara · 2023-07-12T10:50:09Z

@YYYasin19 How do we move forward with this PR after #170? Turn it into an example use in one constraint?

YYYasin19 · 2023-07-26T07:46:46Z

> @YYYasin19 How do we move forward with this PR after #170? Turn it into an example use in one constraint?

I'll rebase, refactor some code and then add some examples to the more popular constraints we use!

YYYasin19 · 2023-08-23T11:32:07Z

I think the PR is mostly ready. It has some usage examples (which we can extend further) and some tests.
The only thing "missing" is codecov coverage; any ideas on how to improve that? The new lines that aren't covered are the ones that apply the formatting in the test messages.

src/datajudge/utils.py

ivergara · 2023-08-28T08:03:08Z

Thank you @YYYasin19 and @0xbe7a for this undertaking!

YYYasin19 marked this pull request as ready for review December 11, 2022 16:25

0xbe7a mentioned this pull request Jun 12, 2023

Improve number reporting in add_categorical_bound_constraint #153

Open

YYYasin19 closed this Jun 12, 2023

YYYasin19 force-pushed the fmt-float-increase branch from 6821d41 to c1095c9 on June 12, 2023 12:45

YYYasin19 reopened this Jun 12, 2023

kklein reviewed Jun 16, 2023

src/datajudge/utils.py (outdated, resolved)

YYYasin19 force-pushed the fmt-float-increase branch from 9d69a30 to c4b2102 on June 16, 2023 08:50

YYYasin19 mentioned this pull request Jun 17, 2023

Refactor some assertion message to make use of colored output #164

Open

kklein reviewed Jun 17, 2023

0xbe7a mentioned this pull request Jun 22, 2023

Add formatting for assertion messages #170

Merged

update: use newly added formatter for diff_color

c9c2398

YYYasin19 force-pushed the fmt-float-increase branch from 06c05f6 to c9c2398 on July 27, 2023 14:35

YYYasin19 and others added 2 commits July 27, 2023 17:22

add: tests

c4640fa

Merge branch 'main' into fmt-float-increase

7ab3d39

YYYasin19 requested review from kklein and ivergara August 23, 2023 11:32

0xbe7a added the ready label Aug 23, 2023

0xbe7a reviewed Aug 25, 2023

src/datajudge/utils.py (outdated, resolved)

remove just_fix_windows_console duplicate

b3e0ee7

0xbe7a approved these changes Aug 28, 2023

ivergara merged commit 6df9980 into main Aug 28, 2023

ivergara deleted the fmt-float-increase branch August 28, 2023 08:03
