-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
disable execution of last query expression by default #407
Conversation
output: str | ||
if buffer: # Handle any remaining data in the buffer | ||
line = buffer.decode("utf-8") | ||
callback(line) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll no longer print to the stdout when capture_output=True
.
@@ -1805,14 +1772,15 @@ def apply_udf( | |||
def query( | |||
self, | |||
query_script: str, | |||
envs: Optional[Mapping[str, str]] = None, | |||
python_executable: Optional[str] = None, | |||
env: Optional[Mapping[str, str]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to env
. I don't think it should be plural.
envs: Optional[Mapping[str, str]] = None, | ||
python_executable: Optional[str] = None, | ||
env: Optional[Mapping[str, str]] = None, | ||
python_executable: str = sys.executable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed python_executable
to default to sys.executable
and not take a None
value.
save: bool = False, | ||
capture_output: bool = True, | ||
output_hook: Callable[[str], None] = noop, | ||
params: Optional[dict[str, str]] = None, | ||
job_id: Optional[str] = None, | ||
) -> QueryResult: | ||
_execute_last_expression: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to get rid of this when we update Studio. (Well, I plan to release Studio with _execute_last_expression=True
set first, then break compatibility in the next release).
@@ -42,10 +42,6 @@ def __init__(self, message: str, return_code: int = 0, output: str = ""): | |||
super().__init__(self.message) | |||
|
|||
|
|||
class QueryScriptDatasetNotFound(QueryScriptRunError): # noqa: N818 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer used in query
. I will create a similar exception in Studio
side.
dr = self.update_dataset( | ||
dr, | ||
script_output=output, | ||
query_script=query_script, | ||
) | ||
self.update_dataset_version_with_warehouse_info( | ||
dr, | ||
dv.version, | ||
script_output=output, | ||
query_script=query_script, | ||
job_id=job_id, | ||
is_job_result=True, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was not being used anywhere. Job
replaces this.
def _get_dataset_versions_by_job_id(): | ||
for dr, dv, job in self.list_datasets_versions(): | ||
if job and str(job.id) == job_id: | ||
yield dr, dv | ||
|
||
try: | ||
dr, dv = max( | ||
_get_dataset_versions_by_job_id(), key=lambda x: x[1].created_at | ||
) | ||
except ValueError as e: | ||
if not save: | ||
return QueryResult(dataset=None, version=None, output=output) | ||
|
||
raise QueryScriptDatasetNotFound( | ||
"No dataset found after running Query script", | ||
output=output, | ||
) from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will have to be done on the caller side. And eventually removed when we drop _execute_last_expression
support.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #407 +/- ##
==========================================
- Coverage 87.32% 87.07% -0.25%
==========================================
Files 92 92
Lines 9986 9952 -34
Branches 2041 2037 -4
==========================================
- Hits 8720 8666 -54
- Misses 911 931 +20
Partials 355 355
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Deploying datachain-documentation with Cloudflare Pages
|
) from exc | ||
else: | ||
query_script_compiled = query_script | ||
assert not save |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder to remove this flag as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good to me, thank you for removing all this outdated code 🙏
) from exc | ||
envs = dict(envs or os.environ) | ||
envs.update( | ||
env = dict(env or os.environ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, not sure why or
? May be something like this?
env = dict(env or os.environ) | |
env = {**os.environ, **(env or {})} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you provide a way to override envvars of the current process?
This is how subprocess.Popen
works, and given this is a thin wrapper around it, I think it's better to mimic it's API.
Also, Studio already provides copy of all envvars.
def _process_stream(stream: "IO[bytes]", callback: Callable[[str], None]) -> None: | ||
buffer = b"" | ||
while byt := stream.read(1): # Read one byte at a time | ||
buffer += byt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing some errors coming through the test suite of the form: TypeError: can't concat str to bytes
.
Examples are here: https://github.com/iterative/datachain/actions/runs/10821775005/job/30024466321?pr=427#step:7:177
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed tests in #431.
def loop() -> None: | ||
buffer = b"" | ||
while byt := stream.read(1): # Read one byte at a time | ||
buffer += byt.encode("utf-8") if isinstance(byt, str) else byt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may still need this to fix the issue @mattseddon mentioned above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #431.
This PR hides the support for execution of last query expression behind a flag, which I plan to remove when Studio is updated.
Also, few other changes have been made to the
query
API:envs=
keyword argument has been renamed toenv
, similar tosubprocess.Popen(env=)
.python_executable
default has been changed fromNone
tosys.executable
. And it no longer acceptsNone
.query
API no longer returnsQueryResult
API. The responsibility is now with the caller, to find out latest dataset version. They are in much better place to do that, since they are the one responsible for creating ajob
.capture_output=True
,query
API no longer prints to thestdout
. Theoutput_hook
is responsible to do so now.query
no longer haveoutput
set.QueryScriptDatasetNotFound
has been removed.script_output
andquery_script
to theDatasetVersion
. This was not used anyway.Closes #360.