Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert select value type support #277

Merged
merged 9 commits into from
Jan 20, 2025

Conversation

eyelidlessness
Copy link
Member

@eyelidlessness eyelidlessness commented Jan 16, 2025

I have verified this PR works in these browsers (latest versions):

  • Chrome
  • Firefox
  • Safari (macOS)
  • Safari (iOS)

What else has been done to verify that this works as intended?

Since this rolls back functionality in #273, the tests added for that functionality have been removed. I think everything else is covered by existing tests and/or type checks.

Why is this the best possible solution? Were any other approaches considered?

This leaves most of the other substantive improvements from #273 intact. Any considerations there apply here.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

This is technically a regression! As of #273, Web Forms behavior for <select1> with a non-string <bind type> was consistent with Collect/JavaRosa (for the current set of supported bind/value types). But we've decided that is an unsupported use case for now, as it isn't possible to produce an affected form with XLSForm/pyxform.

Do we need any specific form for testing your changes? If so, please attach one.

The fixture added in #273 now produces an error on load (which is now expected). The fixture added in this commit demonstrates correct support for spaces in <select1> values (which could have been added in that PR as well).

What's changed

The primary purpose of this change is to revert support for non-string <bind type>s associated with a <select> or <select1> control (introduced in #273). It also establishes the same precedent for <odk:rank>. The bulk of this aspect of the change involves:

  • Eliminating any ValueType type parameterization associated with SelectNode and its implementation
  • Eliminating the vast majority of implementation code supporting that type parameterization

Some of the breadcrumbs for that functionality are left intact, in ValueArrayCodec. There are other use cases where we'll benefit from being able to derive encoding/decoding of array items (e.g. "select from map", which was part of the motivation for typed select support in the first place).

I also took this PR as an opportunity to address a gap in the preparation for <odk:rank>, which I had neglected to address in #273. Specifically, the PR's first commit makes some structural changes to prepare for parsing <odk:rank><item> or <odk:rank><itemset>.

Lastly, since I was already looking in that area, I went ahead and took care of these maintenance chores:

  • The SelectDefinition class associated with the body element is now named SelectControlDefinition, consistent with other parsed representations of similar controls. This eliminates a naming conflict where we also have an interface named SelectDefinition representing the NodeDefinition associated with that control. (We really should take a more holistic look at naming, especially in these earlier areas of the engine package. For now, this is a piecemeal change aiming for consistency with current thinking.)
  • ItemsetNodesetContext, which was unused, is removed.

This is an engine-internal reorganization to improve some less-than-ideal naming and code structure around parsing selects, and helping to prepare for support of `<odk:rank>`.

- Addresses a naming conflict between `SelectDefinition` (as a subtype of `NodeDefinition`) and `SelectDefinition` (as a parsed representation of `<select>` and `<select1>` **body elements**). The parsed body element is now named `SelectControlDefinition`. This naming is still not ideal! But it is consistent with the conceptual equivalents for other controls.
- Anticipates similar parsing for `<odk:rank>` body elements. This may be easy to miss in review without calling it out, as it’s a change in file system structure. Specifically, each of the following are no longer structured as select-specific:
    - `ItemDefinition`
    - `ItemsetDefinition`

    This helps to prepare for `<odk:rank>` because it’s expected that both will be associated with a `RankControlDefinition`, providing the same parsed information about the control’s items to an implementation of `RankNode`/`RankControl`.
- Removes `ItemsetNodesetContext`, which is evidently not in use at all!
Support for non-string bind/value types was introduced in #273. That support exceeds the current ODK XForms spec, which specifies both controls as supporting only a string type.

We initially decided to merge that expansion on these bases:

- XLSForms/pyxform do not produce forms with non-string select types, so it would only occur for forms authored as XML XForms, and would presumably be an intentional form design decision

- The behavior (tested with `<select1>` + `<bind type=“int”>`) is consistent with Collect/JavaRosa

We have since decided to revert that for now, because it would violate assumptions about _multiple values_ (from a `<select>`, and soon from `<odk:rank>`) associated with a non-string `<bind type>`. For example, this would break downstream projects which derive database schema types (e.g. SQL column types) from a form definition’s `<bind type>`.
Copy link

changeset-bot bot commented Jan 16, 2025

🦋 Changeset detected

Latest commit: 0a6010e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages
Name Type
@getodk/common Patch
@getodk/scenario Patch
@getodk/web-forms Minor
@getodk/xpath Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@eyelidlessness eyelidlessness marked this pull request as ready for review January 16, 2025 21:17
Copy link
Member

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like what I expect from a pretty quick pass through!

@@ -48,7 +48,7 @@ const BodyElementDefinitionConstructors = [
PresentationGroupDefinition,
StructuralGroupDefinition,
InputControlDefinition,
SelectDefinition,
SelectControlDefinition,
Copy link
Member

@lognaturel lognaturel Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this will expand to something like ItemBasedControlDefinition with the addition of rank?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that becoming an abstract base class for select/select1/rank. Both concrete classes would still be enumerated here though, as it's used for instantiation during parsing. And the base class would make even more sense when we split select/select1.

Copy link
Collaborator

@latin-panda latin-panda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sending this so fast!

I made a couple of suggestions, but it's mainly questions.

Let me know what you think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions:
The documentation of the scenario package says:

This package implements a client of @getodk/xforms-engine, and a suite of tests and benchmarks against that engine.

  1. When it says ... implements a client..., is it a client to perform tests based on JavaRosa? Asking because when looking at the xforms-engine, there is already a client directory, and I was wondering about the difference.

  2. ... and benchmarks against that engine., is it referring to performance tests? And, is there a baseline performance for what "good" means?
    If I recall correctly, there was a delay when having groups and repeats simultaneously, which we'd like not to make it slower in Web Forms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. When it says ... implements a client..., is it a client to perform tests based on JavaRosa?

Yep! We decided to make this boundary explicit when I started porting JavaRosa tests, as an additional exercise of the engine/client boundary (alongside the @getodk/web-forms UI client).

  1. [...] there is already a client directory, and I was wondering about the difference.

Yeah, that directory represents the engine's "client interface". Effectively the full interface of the package boundary. It's not quite as isolated as it could be, since in some cases it references types from other parts of the package. We could improve that. (On the other hand, we may reconsider that structure entirely!)

  1. ... and benchmarks against that engine., is it referring to performance tests?

I think we should probably remove that text. I wasn't sure whether we were going to port benchmarks from JavaRosa, and at least in that pass we ultimately decided not to. There's currently just a placeholder that can be run with the benchmark command, but it serves no purpose.

  1. [...] And, is there a baseline performance for what "good" means?

That's a great question! I don't think we have a consistent answer to this. In general, all of these feel like good candidates for performance goals (not all of which we currently meet, at least not across the board):

  • Form loads in 100ms or less on a representative device12
    • Computations from answering any form question complete in 100ms or less12
  • At least as fast as Collect/JavaRosa3
  • Faster than Enketo4
  1. [...] If I recall correctly, there was a delay when having groups and repeats simultaneously, which we'd like not to make it slower in Web Forms.

Can you clarify where you've seen that delay? Do you mean in Enketo (as in, "we'd like not to make it slower in Web Forms [than in Enketo]")? Or is this already a performance issue in Web Forms (as in, "we'd like to make it slower [than it already is]")?

In either case, if you know of any specific forms that perform poorly I would love to have access to them (and bring them into the repo, with appropriate permission from form authors)! I frequently check performance of forms with known performance issues (either in WF or Enketo), at least to be sure we're not introducing significant performance regressions.

And if you want to do the same, these are the ones I check most frequently these days:

  • bench9-alt.xml
  • nigeria_wards_external.xml

Footnotes

  1. There are a few slow forms that I test against which don't meet this goal, but it's generally what I have in mind when I'm focused on performance. 2

  2. At least with our current feature set, there is one very specific area of poor performance which stands out above all else: itemsets, particularly with an expression like instance('some-id')/root/item[some = /data/predicate]. 2 3 4

  3. I haven't done explicit comparisons here. I suspect (hunch based on areas of perf concern in JR, some of which are fast in WF) that we do pretty well with specific exceptions2.

  4. WF is faster than Enketo on almost all of the perf-focused fixtures I test against, with specific exceptions2.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify where you've seen that delay? Do you mean in Enketo (as in, "we'd like not to make it slower in Web Forms [than in Enketo]")? Or is this already a performance issue in Web Forms (as in, "we'd like to make it slower [than it already is]")?

Absolutely! Much of this comes from my previous experience with Enketo. We received reports from technical partners building forms, such as family registration forms. These forms often included a combination of groups and repeats. For example, in remote communities in Africa, families typically have many members, often 5 to 8 children. When users answered a question like "How many children?" with a number such as 7, the form would dynamically repeat 7 times a section, which can have groups and many questions. We received feedback about performance issues, such as the form hanging while filling it out for several minutes.

At the time, we recommended making the forms "flatter" by reducing the use of groups and repeats. Where possible, we also suggested splitting larger forms into multiple smaller forms to improve performance.

It's very common for health workers to use low-end devices with just 2GB of RAM.
I don't think I can access those specific forms anymore due to the technical partner's policies, but we can come up with some complex lengthy ones, if we dont have it already.

I'm going to use these for testing too, thanks!

  • bench9-alt.xml
  • nigeria_wards_external.xml

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:
I understand the intent behind prioritizing reusability, but I'm considering ways to reduce the spread of what defines "select" to make it more self-contained as a component. Additionally, I suggest inverting the responsibility for the ID format.

Assuming the ID format is tied to the select functionality in the engine, what about moving the selectOptionId(...) logic into the Select1Component? Then, you could pass the formatter as an Input to the RadioButton and CheckboxWidget components.

This approach shifts the responsibility for defining the ID to the component that uses RadioButton and CheckboxWidget, which would provide greater flexibility. For example, if these components are later used in contexts requiring a different ID format, the adjustment would be simpler and more modular.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this suggestion! I wasn't really thrilled with breaking out a module for this function in the first place.

For context: I originally made this change because the logic for it changed in #273 (where SelectItem.value was no longer always a string, and the string value was assigned to SelectItem.asString). I broke the association between label and control by applying the change inconsistently, which was the motivation for generalizing it then.

That said, I'm a little hesitant about the specific suggestion, for a couple of reasons:

  1. I'm concerned about introducing "prop drilling". Currently there are three paths to usage:

    • Select1Control > RadioButton
    • SelectNControl > CheckboxWidget
    • Select1Control > LikertWidget > RadioButton

    If we did hoist it up to the common branching point now, it would have to go up to SelectControl, and then be passed down through 2-3 layers of props. This would already be a bit awkward, and may get more so as we put more intentional thought into the component hierarchy.

  2. The thing being identified (SelectItem) comes from the engine. We already have precedent for assigning unique ids to the nodes ([Any]Node.nodeId), which was intentionally designed to address a similar use case. It might be worth considering the same for SelectItem (or whatever common name we come up with when the same concept is, presumably, used for <odk:rank>). If we did that, we could compute it once upfront and access it like any other property, with no need to decide how such a dependency gets passed through.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of computing it upfront because it will eliminate the direct dependency between RadioButton, CheckboxWidget on SelectXControl. It also addresses the concern of prop drilling by centralizing the logic, simplifying the structure of the webform package, and preventing the need for passing it through multiple layers as the component hierarchy evolves 🤩 Do you think we should adopt this approach in this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Capturing here for the record: we agreed to defer this so we can consider approach in context of <odk:rank>, where it'll also apply.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question:
Is the hierarchy defined based on the XML output?

For instance, are group and repeat considered higher-order components, while select functions as a child or leaf in the tree? Additionally, should all new components be declared in this file first to ensure they are correctly mapped in the typing?

It seems like a good place to start for a new contributor to understanding the engine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hierarchy is defined based on:

  • The kinds of nodes which exist
  • The kinds of nodes which semantically can be parents and/or children of one another

It's also worth noting that there are two hierarchy.ts modules (I'm sorry!): one for the client node interfaces, and one for the internal implementation of those. They're similar, but slightly different for ... reasons. Open to reconsidering naming on this, but for now we do need to maintain both in tandem.

These are the (client interface) hierarchy semantics:

  • RootNode, which is the topmost node, with no parent
    • As such: all nodes except RootNode are children of another node
  • RepeatRangeNode is a parent, but it only parents RepeatInstanceNodes
    • Correspondingly: RepeatInstanceNodes are only ever children of a RepeatRangeNode
  • RootNode, GroupNode, SubtreeNode and RepeatInstanceNode are "general parents": they can be parents of any node except RootNode and RepeatInstanceNode
    • Correspondingly: all nodes except RootNode and RepeatInstanceNode are "general children", i.e. they can have any "general parent"

These reflect the combined semantics of ODK XForms (per spec) and the engine's client interface representing those spec concepts.

Note: the concept of a "repeat range"—a parent node containing a contiguous series of "repeat instances"—is an invention of the WF engine. As with any abstraction, it's a little leaky, but it's worked out pretty well for reasoning about repeats (which gets kind of bonkers without some notion of set/items).

const { valueType } = definition;

super(
`Selects of type ${valueType} are not currently supported. If this functionality would be useful for your form, your feedback is welcome!`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:

This seems like a common type of error for other components. What do you think about using generics in the declaration? Then, in the select definition (SelectNode? or SelectControl?), we could define the error type, extend it, and pass the types through the generic.

By centralizing the possible errors for select within select definition (SelectNode? or SelectControl?), it would make them easier to discover and help provide a clearer picture of the entire select functionality.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great suggestion. How would you feel about deferring it for now? We can pair on design for it as part of the work on <odk:rank>, since it'll have the same type limitations associated with a different control/node type?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is a good idea!

@@ -0,0 +1 @@
export abstract class XFormsSpecViolationError extends Error {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question:

I like reusable errors in this folder!
What is this class for? What will it do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's currently just being used as a base class for SelectValueTypeError.

So... some background is warranted here. One of the big weak points in the current design (throughout the Web Forms project) is errors. How they're defined and organized, how they're invoked, how they're documented, and how they're produced to call sites. We want to improve on all of this. We originally planned on a big push in #202, but decided to defer that because the feature already had a huge scope.

In general, I've been trying to have a light touch on errors since I know we'll be focusing on them more holistically. My intent here was to capture these ideas:

  1. there are many XForms spec violations we might want to report in a consistent way
  2. we don't know what they all are, or what common things we want to report about them

So this base class is basically just a placeholder for that with a name.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question:

  1. When codec run? I'm looking to know how to verify the output is what we expect
  2. I wonder if the Rank codec will be the same as this one and if we can reuse or duplicate it.
    2.1 To reduce dependency between components, I would make 1 independent codec for Rank, but I want to know based on your experience with xforms

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll answer (2) first, because it may make the rest academic (for how it relates to <odk:rank>):

  1. I wonder if the Rank codec will be the same as this one and if we can reuse or duplicate it.

Great catch! Yes, that's exactly what I expect. In fact, here's what I wrote in the commit:

In fact, I wouldn’t be surprised if the codec logic for a hypothetical RankControl implementation would be identical [to MultipleValueSelectCodec], and we may want to discuss a more general naming scheme.

I think it's pretty likely naming is the only real work for a rank codec.


  1. When codec run? I'm looking to know how to verify the output is what we expect

This is from memory, so it may not be exhaustive:

  • The codec's decodeValue method is run whenever a serialized XML string is used as a (runtime/client facing) node value. Currently that includes:
    • Form init (reading default values from the form definition model); I also anticipate doing the same with values from submissions for edit
    • Any time a calculate expression is evaluated (i.e. when the node is relevant, on form init, or whenever anything referenced in the expression is updated)
  • The codec's encodeValue method is run whenever a (runtime/client facing) node value is:
    • accessed as a node value in any XPath expression
    • serialized as XML (e.g. for submission)

For node types that support non-string value types, I think the best frame of reference for how they're validated is the scenario test module bind-types.test.ts.

But those won't be pertinent for <odk:rank>, which like selects will only support a string type. I'm not sure offhand how much @getodk/scenario coverage already exists for the rank control. If it isn't much (which might not surprise me), existing select tests are probably the best frame of reference.

I think for the most part, testing the rank functionality will implicitly test the codec, as it's effectively just splitting (exercised e.g. by testing default and/or calculated values) and joining (exercised e.g. by testing submission values).

@@ -3,7 +3,7 @@ import type {
KnownAttributeLocalNamedElement,
LocalNamedElement,
} from '@getodk/common/types/dom.ts';
import type { SelectElement } from '../../parse/body/control/select/SelectDefinition';
import type { SelectElement } from '../../parse/body/control/SelectControlDefinition.ts';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question:

Why do we have one for Select, while not for Range, Note, Inputs?
Looking to understand the reason so I can see if I need it for Rank or Geopoint

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this question?

If you're asking why we have a SelectControlDefinition but not an equivalent definition for the others, we do. They're in the same directory this was moved up to.

If you're asking about the SelectElement type, I think it probably exists because the lookup for selects is a little bit more involved than for other control types (and that would be because of the historical design mistake that we handle <select> and <select1> with the same definitions/node type/implementation).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions:

I didn't find Range in the reactivity directory; maybe I missed it.

Why don't we have it but have it for Select?

What is the responsibility of reactivity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functionality is (currently) select-specific. The range control doesn't (currently) have a concept of "items" (or valueOptions, as named in the client interface). We may actually add support for that in the future, but I doubt it'll be here because the set of values a RangeNode might have will be static based on the bounds defined in the form.

I'll go into (much) more depth about the engine's use of reactivity in the docs I'm writing up. Briefly here for now so I don't leave it completely unanswered: the engine internally uses (SolidJS, at least for now) reactivity to implement the ODK XForms computation model. The computation model is a DAG, which is effectively what the reactivity model implements as well. So we get a lot of the recomputation logic "for free".

@@ -13,7 +14,7 @@ export class ItemDefinition extends BodyElementDefinition<'item'> {

constructor(
form: XFormDefinition,
override readonly parent: AnySelectDefinition,
override readonly parent: AnySelectControlDefinition,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question:

I'm supposing that rank will use this, too. Does it make sense?

override readonly parent: AnySelectControlDefinition | RankControlDefinition,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good catch! Yeah, this module is moving up out of the select directory because I expect it to be used for rank too. And yeah, at a glance that annotation looks right. I'm not sure if we want to introduce some kind of common base class between them.

For what it's worth, I'm not sure the parent property really gets much use. So this type is really only guarding which controls can construct it (which is still valuable IMO).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question:

Does body parsing happen after the engine's client? I mean, the entry point is client then body parsing runs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully you've had a chance to read the WIP architecture docs I pushed up EOW, which I think addresses this much better than I can here.

Briefly, the current parsing order is roughly:

  1. Parse XML (currently into a browser DOM tree, subject to change)
  2. Parse jr: URLs to e.g. external secondary instances
  3. Fetch and parse external secondary instances
  4. Parse body
  5. Parse model (binds, primary instance tree into node definitions, translations)

A client calls the engine's initializeForm to initiate this. Once all of that parsing is complete, it's used to build the primary instance state which is then returned to the client (well, resolved in a Promise, because step 3 does IO). We don't (currently) expose any parsing interfaces or logic to a client, it's all encapsulated in that initializeForm call.

Copy link
Collaborator

@latin-panda latin-panda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for answering all my question with great detail 🤩

I'm approving now to unblock, if you implement this approach here, I can review again 🙂

@eyelidlessness eyelidlessness merged commit 3ba7579 into main Jan 20, 2025
74 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants