Normalize IndexSet.data
DB storage
#122
```diff
@@ -1,6 +1,4 @@
-from typing import ClassVar
-
-from sqlalchemy.orm import validates
+from typing import ClassVar, Literal

 from ixmp4 import db
 from ixmp4.core.exceptions import OptimizationDataValidationError
@@ -10,26 +8,50 @@
 from .. import base


+# TODO Feels like there ought to be this kind of functionality already
+def cast_data_as_type(
+    data: "IndexSetData", type: Literal["float", "int", "str"] | None
+) -> float | int | str:
+    if type == "str":
+        return data.value
+    elif type == "int":
+        return int(data.value)
+    elif type == "float":
+        return float(data.value)
+    else:  # type is None
+        return 0
+
+
 class IndexSet(base.BaseModel):
     NotFound: ClassVar = abstract.IndexSet.NotFound
     NotUnique: ClassVar = abstract.IndexSet.NotUnique
     DataInvalid: ClassVar = OptimizationDataValidationError
     DeletionPrevented: ClassVar = abstract.IndexSet.DeletionPrevented

-    elements: types.JsonList = db.Column(db.JsonType, nullable=False, default=[])
+    data_type: types.OptimizationDataType
+
+    _data: types.Mapped[list["IndexSetData"]] = db.relationship(
+        back_populates="indexset"
+    )

-    @validates("elements")
-    def validate_elements(self, key, value: list[float | int | str]):
-        unique = set()
-        for element in value:
-            if element in unique:
-                raise self.DataInvalid(
-                    f"{element} already defined for IndexSet {self.name}!"
-                )
-            else:
-                unique.add(element)
-        return value
+    @db.hybrid_property
+    def data(self) -> list[float | int | str]:
+        return [cast_data_as_type(data, self.data_type) for data in self._data]
```
Review thread on the `data` hybrid property:

> This also looks quite slow, but is probably unavoidable?

> Maybe @danielhuppmann can clarify, but I thought we want the

> According to this guide, the

> Well of course they are both O(n), yes; that doesn't really say much about the actual cost, though.

> I always thought it did, assuming the

> Alright, a quick one-liner would be: Big O notation only describes the relationship between runtime (or memory use, if we are talking space complexity) and some characteristic "n" of the input data. To get a real runtime estimate, one would have to insert scaling factors into the notation, which would have to come from actual measurements...
```diff
+
+    # NOTE For the core layer (setting and retrieving) to work, the property needs a
+    # setter method
+    @data.inplace.setter
+    def _data_setter(self, value: list[float | int | str]) -> None:
+        return None

     run__id: types.RunId

     __table_args__ = (db.UniqueConstraint("name", "run__id"),)


+class IndexSetData(base.RootBaseModel):
+    indexset: types.Mapped["IndexSet"] = db.relationship(back_populates="_data")
+    indexset__id: types.IndexSetId
+    value: types.String = db.Column(db.String, nullable=False)
+
+    __table_args__ = (db.UniqueConstraint("indexset__id", "value"),)
```
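The O(n) discussion in the review thread can be made concrete with a quick measurement. The sketch below (illustrative only, not code from this PR; `cast_each` and `cast_via_map` are hypothetical names) shows two casting routines with the same asymptotic complexity but different constant factors — the scaling information that Big O notation omits:

```python
import timeit

# Two ways to cast a list of stored strings back to int. Both are O(n):
# runtime grows linearly with len(values) in either case.
def cast_each(values: list[str]) -> list[int]:
    return [int(v) for v in values]

def cast_via_map(values: list[str]) -> list[int]:
    return list(map(int, values))

values = [str(i) for i in range(10_000)]
assert cast_each(values) == cast_via_map(values)

# The measured times differ by a constant factor even though the
# asymptotic class is identical; that factor only comes from measurement.
t_each = timeit.timeit(lambda: cast_each(values), number=50)
t_map = timeit.timeit(lambda: cast_via_map(values), number=50)
print(f"list comp: {t_each:.4f}s, map: {t_map:.4f}s")
```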
```diff
@@ -65,7 +65,7 @@ def _add_column(
         self.columns.create(
             name=column_name,
             constrained_to_indexset=indexset.id,
-            dtype=pd.Series(indexset.elements).dtype.name,
+            dtype=pd.Series(indexset.data).dtype.name,
             variable_id=variable_id,
             unique=True,
             **kwargs,
```

Review comments on the `dtype=` line:

> Also this looks quite expensive for what it does, no idea how to avoid it right now though...

> For all items like
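The `pd.Series(...).dtype.name` expression builds a throwaway Series purely to get a dtype string for the new column, which is why it reads as expensive. A small illustration of what it returns (illustrative values, not from the PR):

```python
import pandas as pd

# Building a Series from the already-casted values lets pandas infer a
# dtype name that matches the IndexSet's data_type.
print(pd.Series([1, 2, 3]).dtype.name)   # "int64"
print(pd.Series([1.0, 2.5]).dtype.name)  # "float64"
print(pd.Series(["a", "b"]).dtype.name)  # "object"
```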
> I discussed how to do this best with an SQLAlchemy maintainer here. Since we are not going to use `.data` in SQL queries, we should be better served with a normal Python property.