-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/#232 redesign sql stage input output #237
Conversation
""" | ||
|
||
data_partitions: Dict[DataPartitionName, DataPartition] | ||
table_like: TableLike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically,
data_partition
was removed andsample_columns
andtarget_columns
have been merged into a simplecolumns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This made some tests obsolete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docsting was also removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I re-added the docstring and updated its content to match the redesign.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My take from it:
The table_like
is the data container. As its type name suggests, the data comes in the form of multiple columns. identifier_columns
and columns
are the descriptions of these columns. The meaning of the dependencies
is unclear, but hopefully the docsting of the Dependencies
class gives an explanation. If I am wrong please rewrite the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admit, that to me the meaning of dependencies
is also a bit unclear.
The plural is only defined as a type:
Dependencies = Dict[Enum, Dependency]
The Docstring of class Dependency
has some issues in wording etc., so I would prefer to update it in the scope of the current PR.
The docstring to me sounds more like a structured comment targeted for human readers rather than to be used by an algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See last push.
|
||
|
||
@dataclasses.dataclass(frozen=True, eq=True) | ||
@dataclass(frozen=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, class SQLStageInputOutput
has been simplified by leaving most implementation details up to the user.
Again this made some tests obsolete and also required tests to instantiate MultiDatasetSQLStageInputOutput
instead of the abstract parent class.
check_dataclass_types(self) | ||
@dataclass(frozen=True) | ||
class MultiDatasetSQLStageInputOutput(SQLStageInputOutput): | ||
datasets: Dict[object, Dataset] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of a single dataset (potentially containing multiple partitions) we now have a dict of potentially multiple datasets indexed by arbitrary objects.
This simplifies the tests but required some updates to the tests.
and updated planned release date
""" | ||
|
||
data_partitions: Dict[DataPartitionName, DataPartition] | ||
table_like: TableLike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My take from it:
The table_like
is the data container. As its type name suggests, the data comes in the form of multiple columns. identifier_columns
and columns
are the descriptions of these columns. The meaning of the dependencies
is unclear, but hopefully the docsting of the Dependencies
class gives an explanation. If I am wrong please rewrite the docstring.
An instance of this class expresses that a object depends on something | ||
which in fact can depend on something else. The exact meaning of a | ||
dependency is user-defined. For example, a dependency could express that | ||
a view depends on a certain table. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely a better English, but the phrase "object depends on something which in fact can depend on something else" still sounds a bit mystical. An instance of this class represents a node in the dependency graph, right? That's how I would describe it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that is probably a better wording. Had first to think, if it is really a graph or if it is only a tree, but yes it can be a graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds a bit mystical
I definitely agree 🙂
Please see updated PR.
Closes #232