Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More treelist utility functions #5106

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

AlexKnauth
Copy link
Member

@AlexKnauth AlexKnauth commented Nov 21, 2024

Checklist
  • Bugfix
  • Feature
  • tests included
  • documentation

Description of change

Adds 5 new treelist functions:

  • treelist-filter: like list filter
  • treelist-index-of: like list index-of
  • treelist-flatten: like list flatten
  • treelist-append*: like list append* on one argument
  • sequence->treelist: like list sequence->list

@jackfirth
Copy link
Contributor

Can you also add sequence->treelist?

@AlexKnauth
Copy link
Member Author

sequence->treelist added

@jackfirth
Copy link
Contributor

Two performance notes on sequence->treelist:

  • If the input is already a treelist, return it unchanged without iterating at all. That way functions can accept any sequence and convert it to a treelist immediately without any cost for users who already have a treelist.
  • If the input is a list, call the list->treelist function. That way there's no performance difference between sequence->treelist and list->treelist for users.

@AlexKnauth
Copy link
Member Author

Added shortcuts for conversions to treelist from treelist, vector, and list.

@mflatt
Copy link
Member

mflatt commented Nov 22, 2024

What's the rationale for adding these particular functions? Some of these seem arbitrary to me, liketreelist-keep-members, but maybe there's some precedent or rationale that I'm missing.

@AlexKnauth
Copy link
Member Author

AlexKnauth commented Nov 22, 2024

These are the list functions that I needed to re-implement for some projects I've been writing in Rhombus. Both keep-members and skip-members came out of wanting to use set intersect and subtract operations on lists in Rhombus, but not wanting to call the subtract operation subtract to avoid potential confusion on duplicate elements.

(edit: and of course, not being able to call the subtract operation remove* due to Rhombus identifier constraints)

@mflatt
Copy link
Member

mflatt commented Nov 22, 2024

These are the list functions that I needed to re-implement for some projects I've been writing in Rhombus.

I'm having trouble seeing this as a rationale for adding to a core library. It seems like we should generally avoid adding things to this library, since everyone who loads racket/treeelist will pay for them.

I can see an argument for adding a few more functions that are known from experience to be very broadly useful. The treelist-flatten and sequence->treelist functions are probably in that category, and maybe treelist-index-of.

As you may guess, I'm not a fan of the way racket/list evolved to have such a large set of functions. A lot of them should have been organized by task, and not in racket/list just because they work on lists.

@jackfirth
Copy link
Contributor

Perhaps sequence-index-of and sequence-index-where would be more valuable additions?

@AlexKnauth
Copy link
Member Author

AlexKnauth commented Nov 22, 2024

I don't think including the functionality of remove* in racket/list was a mistake. It's a very common thing to do with lists, and I think Rhombus should have an operation with the same behavior. I considered other names like remove_all and subtract, but... I didn't want to let people think it was the same as repeatedly calling remove for 1 element at a time.

(edit: remove* is included in racket/base, not racket/list, but my point about it being a very common operation still stands)

@AlexKnauth
Copy link
Member Author

Perhaps if a filter method is implemented with ~keep and ~skip keywords, use cases for a.keep_members(b) and a.skip_members(b) could be replaced with a.filter(~keep: b.has_element(_)) and a.filter(~skip: b.has_element(_)).

@mflatt
Copy link
Member

mflatt commented Nov 22, 2024

Switching to treelist-filter is a great improvement! I support adding treelist-filter, treelist-flatten, sequence->treelist and (less strongly) treelist-index-of. I don't think we should add treelist-index-where, treelist-splitf, or treelist-flatten-once.

@jackfirth
Copy link
Contributor

I'm not a fan of making treelist-filter work differently from filter, vector-filter, hash-filter, sequence-filter, etc. It should take a single predicate argument rather than separate #:keep and #:skip keyword arguments.

@mflatt
Copy link
Member

mflatt commented Nov 22, 2024

I'm not a fan of making treelist-filter work differently from filter, ...

Good point. I think the Rhombus filter convention should use ~keep and ~skip, but maybe that doesn't belong at the racket/treelist level.

@AlexKnauth
Copy link
Member Author

AlexKnauth commented Nov 22, 2024

I think treelist-index-where deserves to be here more than treelist-index-of. Use cases for l.index_of(v) can easily be expressed with l.index_where((_ == v)), but it's not quite as easy for use cases of index-where to be translated to index-of.

I have needed to use splitf operations far more often than other related functions like split, take, drop, takef, and dropf. Though if we keep index-where, at least implementing splitf in terms of index-where isn't hard.

As for treelist-flatten-once, list append* on a single argument is an operation I need far more often than flatten, and it's more "type-safe" in that it works to convert list-of-lists of any type A to a list of A consistently, even when values of type A might be lists. flatten-once is a bit of an awkward name. Would treelist-append* fit better? I didn't want to imply that (treelist-append* tlotl) is just equivalent to (apply treelist-append tlotl) though, since apply doesn't work on treelists. Or treelist-concat? treelist-join?

I like the #:keep and #:skip arguments to treelist-filter for the same reason expressed in racket/rhombus#131 (comment): If someone forgets to use (or has not yet learned to use) #:keep, the error message will make it clear. It also allows the arguments in either order, avoiding potential order inconsistency like hash-filter vs filter. It replaces 2 operations with just 1, and (treelist-filter #:skip pred tl) is a lot nicer than (treelist-filter (lambda (x) (not (pred x))) tl), especially when pred is just a function name.

@mflatt
Copy link
Member

mflatt commented Nov 22, 2024

To get a sense of what should be in a Racket library, I tried grepping all packages from a January snapshot, where each grep was of the form ''[ ()]map[ ()]' (but with map replaced). Checking things like map and append provides scale.

map            21930
append          8262
list-ref        4684
filter          3675
member          3075
memq            2464
for-each        2047
take            2197
drop            1120
flatten          816
apply append     735  ; literally, so on the same line
partition        497
sequence->list   476
append*          374
shuffle          316
add-between      280
drop-right       291
memv             272
index-of         227
filter-map       211
filter-not       202
group-by         187
argmax           154
argmin           140
splitf-at        101
check-duplicates  97
take-right        96
index-where       50
remf              15
indexes-of         6
takef-right        6
splitf-at-right    5
indexes-where      3

These are just rough counts, of course, given the grep approach. A number like 6 means basically unused, except that there are things that track or copy "list.rkt".

I read this as confirming that index-of is borderline, and splitf-at and index-where are significantly more rare. It's true that index-of can be written with index-where, but if the point is to provide useful things directly, index-of seems like the one that might be worthwhile.

The append* function and apply append combination do show up a lot. This is is an operation where treelists are different from lists, though, so I'm not sure it's right to conclude that it will be as useful to treelists. If we just go by these numbers, then I'd concede that treelist-append* belongs.

The partition function shows up more than I expected (just because I never use it). I'd say it's borderline, but more worthwhile than other things we're considering.

The shuffle function is useful, but I'd say it belongs in a different module.

The relative rareness of filter-not is a little bit of an argument against keywords for treelist-filter. I still think the stronger argument is to stay like filter to keep Racket libraries more consistent.

My current conclusions on what to include:

  • treelist-filter - yes (not with keywords)
  • treelist-append* - yes
  • treelist-flatten - yes
  • treelist-partition - probably
  • sequence->treelist - probably
  • treelist-index-of - maybe
  • treelist-index-where - no
  • treelist-splitf - no

Further Rhombus discussion probably belongs in the other repo, but I would be inclined to omit treelist-append* in favor of writing List.append(& lists) (and, of course, have keyword arguments for List.filter for consistency with other forms).

@AlexKnauth
Copy link
Member Author

Would you be happier if functions that split the list into 2 values, such as partition, split, and splitf, were placed in a separate module like racket/treelist/split? Just thinking of that split because not everyone likes dealing with multiple values.

The index operations like index-of and index-where have greater potential for treelists than they did for pairlists because random access by index is more efficient for treelists: should that bump them into the main module? Or should that be separated into a racket/treelist/index module?

@mflatt
Copy link
Member

mflatt commented Nov 23, 2024

As much as I like the treelist datatype, I don't see so many Racket programs moving to them that we need racket/treelist/X libraries, so far. (It looks like 3 non-Rhombus packages use them, based on a freshened package snapshot.) That may change in the long run, but Rhombus seems like a better place to experiment with list-library organizations. Adding to Racket means forever, but Rhombus is free to change, at least for for a little while.

With that in mind, maybe it's best to add only treelist-filter, treelist-append*, and treelist-flatten, because we agree that they're useful, and because list-function uses support that impression (with the caveat that append might be used differently with treelist). Or maybe it's best to not add anything to racket/treelist, since that isn't necessary to add functions to Rhombus — or even particularly useful in the short run, since we wouldn't want to bump the required Racket version for Rhombus.

(treelist-filter odd? (treelist 1 2 3 2 4 5 2))
(treelist-filter (λ (x) (not (even? x))) (treelist 1 2 3 2 4 5 2))
(treelist-filter (λ (x) (not (odd? x))) (treelist 1 2 3 2 4 5 2))
]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a history note (and the same for the other additions)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added history notes for 8.15.0.6, the current version. Is that okay, or should this include a version bump to 8.15.0.7?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "8.15.0.6" is good.

@@ -56,6 +56,9 @@
(regexp-quote (symbol->string 'op))
":"))))

(define-syntax-rule (test-values expected actual)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is not used, anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants