Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guideline rules on name and type conventions #9

Merged
merged 6 commits into from
Mar 11, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
87 changes: 82 additions & 5 deletions docs/user-guide/reduction-workflow-guidelines.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,9 +14,6 @@ The guidelines are intended to ensure that the workflows are consistent (both fo

We plan to include the following in future versions of the guidelines:

- Naming conventions
- Type conventions
- Example: Filenames
- Package and module structure:
- Where to place types?
What goes where?
Expand All @@ -43,6 +40,88 @@ We plan to include the following in future versions of the guidelines:

- *Provider*: A callable step in a workflow writing with Sciline.

## C: Convention
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please either remove the empty Naming below if you prefer Convention, or move this.


### C.1: Use common names and types

**Reason**
Helps with sticking to established practices and working across packages.

**Table**
Names use glob syntax, i.e., '*Filename' is any string that ends in 'Filename'.

| Name | Type | Description |
|-----------------------------|-------------|---------------------------------------------------------------------------|
| --- **Files** --- | | |
| Filename \| *Filename | str | Simple name of a file, must be processed into FilePath |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Filename even exist? If the file is obtained from SciCat, we only have FilePath, I suppose. Otherwise, can we replace Filename by, e.g., and instrument identifier and a run number?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SciCat datasets can contain any number of files. So we need a pair or (dataset id, file id). And the file id can only really be a file name.

| FilePath \| *FilePath | Path | Concrete path to a file on the host filesystem, ideally absolute |
| --- **Flags** --- | | |
| UncertaintyBroadcastMode | enum | E.g., `Enum('UncertaintyBroadcastMode', ['drop', 'upper_bound', 'fail'])` |
| ReturnEvents | bool | Select whether to return events or histograms from the workflow |
| CorrectForGravity | bool | Toggle gravity correction |
| --- **Misc** --- | | |
| NeXus* | Any | Spelling of all NeXus-related keys |
| WavelengthBins \| *Bins | sc.Variable | Bin-edges |
| RunTitle | str | Extracted from NeXus or provided by user, can be used to find files |

### C.2: Use common names for generics

**Reason**
Helps with sticking to established practices and working across packages.

**Note**
If a workflow uses generics to parametrize its types, e.g., `Filename`,
it should define new types used as tags and type vars constrained to those tags.

**Table**

| Name | Type | Description |
|-------------------------------------------|---------|-----------------------------------------------------------------|
| --- **Run IDs** --- | | |
| SampleRun, BackgroundRun, ... | Any | Identifier for a run, only used as a type tag |
| RunType | TypeVar | Constrained to the run types used by the package, see above |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is the rule to end typevars with Type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know in general. I've only seen those two in practice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SimonHeybrock Do we need to do anything about this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make this a rule? I think it is quite confusing to know what is a typevar and what is and ID otherwise?

| --- **Monitors** --- | | |
| IncidentMonitor, TransmissionMonitor, ... | Any | Identifier for a monitor, only used as a type tag |
| MonitorType | TypeVar | Constrained to the monitor types used by the package, see above |

**Example**
The choice of using `int` is arbitrary.
```python
SampleRun = NewType('SampleRun', int)
BackgroundRun = NewType('BackgroundRun', int)
RunType = TypeVar('RunType', SampleRun, BackgroundRun)
class Filename(sciline.Scope[RunType, str], str): ...
```

### C.3: Use the suffix 'Type' for type vars

**Reason**
This makes it easier to distinguish type vars from concrete domain types.

**Example**
See 'RunType' and 'MonitorType' in the table of C.2.

### C.4: Use flexible types

**Reason**
Users should not have to worry about the concrete type of parameters.

**Example**

- Numbers should use the appropriate abstract type from {mod}`numbers`.
E.g.,
```python
P = NewType('P', numbers.Real)
pipeline[P] = 3.0 # works
pipeline[P] = 3 # works, too, but not if `P` were `float`
```
- Use {class}`collections.abc.Sequence` instead of `list` or `tuple`.
- But do *not* use {class}`typing.Iterable`!
Parameters may be consumed multiple times and iterables are not guaranteed to support that.
- Gracefully promote dtypes for small parameters.
E.g., `sc.scalar(2, unit='m')` and `sc.scalar(2.0, unit='m')` should be usable interchangeably.
Comment on lines +121 to +122
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Should we require the same, e.g., for params like bin-edges such that, e.g., passing arange works just like passing linspace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't it work the same way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Operations may raise when ints are passed instead of floats as coords (bin edges, for example). I am suggesting extending your suggestion for graceful handling.

This can also apply to arrays, for instance, `sc.linspace` and `sc.arange` should be interchangeable but the latter may result in integers while the former typically produces floats.

## D: Documentation

### D.1: Document math and references in docstrings
Expand All @@ -55,8 +134,6 @@ This includes mathematical formulas and references to literature.
We have previously documented math and references in Jupyter notebooks.
This is not sufficient, as the documentation is not close to the code.

## N: Naming

## P: Performance

### P.1: Runtime and memory use of workflows shall be tested with large data
Expand Down
Loading