Released on June 6, 2023.
- Enables users to selectively or completely disable Aqueduct's artifact snapshotting functionality. When defining a workflow, users can now disable snapshotting for an individual artifact or for all artifacts in the workflow; disabled artifacts are only generated at computation time and are then garbage collected. No data is persisted beyond data that is explicitly saved. See our documentation for more details.
- Enables on-demand Kubernetes clusters on Google Cloud to use GPU nodes in an autoscaling capacity.
- Fixes bug where resources could not be deleted even when unused.
- Fixes bug where workflows that succeeded with a warning check were shown in a regular success state instead of a warning state.
- Fixes page load jitter when a large number of workflows are present.
Released on May 31, 2023.
- Adds support for on-demand Kubernetes clusters on Google Cloud; you can now select the Google Cloud option when asking Aqueduct to create a Kubernetes cluster, and Aqueduct will automatically use the Google Kubernetes Engine. See our documentation here.
- Adds support for Google Artifact Registry (GAR) as a container registry in Aqueduct. You can now use custom Docker images that are pushed to GAR. See our documentation here.
- Fixes bug where editing certain resource configurations from the UI would cause the resource edit dialog to error out.
- Fixes bug where updating a workflow that has been previously registered but not run would cause the workflow to be duplicated.
- Fixes bug where certain resources could be registered without specifying a name for the resource.
Released on May 24, 2023.
- Improves consistency of icon sizing on resources page.
- Merges `flow_id` and `name` into a single argument when retrieving workflows from the SDK.
- Adds ability to parametrize save operators when using AWS S3.
- Fixes bug where a race condition could occur when syncing Airflow workflows to Aqueduct.
- Fixes bug where listing Snowflake data objects would return an error.
Released on May 17, 2023.
- Adds ability to use a parameter in save operators to dynamically name saves.
- Updates the artifact storage card on the resources page to be consistent with other resource cards.
- Shows how many workflows are using notification resources.
- Improves form validation with `react-hook-forms`.
- Replaces the icon that signifies a workflow that has been registered but not executed.
- On the resources page, displays the Conda integration as modifying the Aqueduct server execution layer rather than as a separate execution layer.
- Improves BigQuery error checking to ensure that a dataset exists before accessing it.
- Fixes bug where documentation links on the home page were broken.
- Fixes bug where the SDK's generated docstrings were broken.
Released on May 10, 2023.
- Improves error handling when the user attempts to take an action without connecting to the Aqueduct server.
- When connecting an S3 resource to Aqueduct, if the specified subdirectory does not exist, it is now automatically created.
- Replaces outdated integration terminology on the UI.
- Fixes bug where previously created resource names were not editable.
Released on May 4, 2023.
- Introduces the `aqueduct.llm_op` API and the `aqueduct-llm` package. Aqueduct now has support for invoking LLMs with a single API call and comes with pre-built Docker images optimized for executing LLMs on Kubernetes. The `llm_op` API supports both ad hoc execution, as shown below, and batch execution over a list of inputs or a Pandas `Series`. See our documentation for more details.

```python
from aqueduct import Client, llm_op

client = Client()  # initialize Aqueduct client so we can check if the engine name below is valid
vicuna = llm_op('vicuna_7b', engine='my_k8s_engine')
vicuna('What is the best LLM?')
```
- Reorganizes integrations around the concept of resources. Resources are any external tool, system, or API that Aqueduct can connect to; existing data and compute integrations are automatically converted into resources. A container registry resource is added in this release, and future releases will introduce new resource types. The recommended SDK API for accessing resources is now `client.resource`, with `client.integration` slated to be deprecated in a future release.
- Allows users to specify a custom Docker image when running an Aqueduct operator on Kubernetes. The Docker image is required to have the Aqueduct executor scaffolding installed; for more details, please see our documentation here.
- Improves logging and error handling when an operator fails before it is able to generate a result, typically during the setup phase.
- Enables connecting a Databricks cluster to Aqueduct via the Python SDK.
- Fixes bug where installing pre-requisites for using Aqueduct-managed Kubernetes clusters would fail on an M1 Mac with certain configurations.
Note: Due to a technical issue, v0.3.0 was immediately replaced by v0.3.1.
Released on April 25, 2023.
- Improves the `describe` method on integration objects in the SDK; if there was an error connecting or validating the integration, the error will be shown.
- Eagerly validates integration connections, so users will be notified immediately if they are using a misconfigured integration when, for example, attempting to access data.
- Replaces placeholder values in integration connection forms with more realistic values.
- Adds support for the most recent version of `numpy`.
Released on April 18, 2023.
- Extends the Aqueduct-Databricks integration to optionally support long-standing instance pools. If specified, Aqueduct will automatically use nodes in the instance pool, and otherwise, Aqueduct will create a cluster with new instances.
- Improves form validation during integration connection to ensure that valid and invalid form fields are correctly detected and visualized.
- Fixes bug where MySQL dependency installation could fail on an M-series Mac.
- Fixes bug where integration connection dialogs could re-render repeatedly, causing slight performance degradation.
- Fixes bug where loading a Tensorflow object from a file could fail due to library version mismatches.
- Fixes bug where multiple processes could attempt to simultaneously create the same Kubernetes secret, causing mismatches in the name.
- Fixes bug where the release notification banner introduced in the last release was not picking up new Aqueduct versions from PyPI.
- Fixes bug where CUDA versions specified via the SDK may not have been faithfully installed at execution time.
Released on April 11, 2023.
- Enables subdirectory support when configuring the AWS S3 integration. When a subdirectory is specified, everything outside of that directory will be ignored by Aqueduct.
- Extends support for creating artifacts from the local file system to all types supported by Aqueduct.
- If executing code on the Aqueduct server without Conda, the server will now proactively ensure that the Python version from your environment matches the server's Python environment. If there's a mismatch, an error will be thrown.
- Adds a dismissible banner to the Aqueduct UI notifying users of new releases.
- Fixes bug where errors occurring during the execution of a metric were not being properly surfaced.
- Fixes bug where checks with a severity level of warning were being shown as errors in check history.
- Fixes bug where creating a schedule with helper functions on the SDK would fail for monthly schedules.
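To make the S3 subdirectory behavior above concrete, the sketch below filters a bucket listing down to the keys under a configured prefix. The helper `keys_in_subdirectory` is hypothetical and is not part of the Aqueduct SDK; it only assumes that S3 object keys are plain `/`-delimited strings.

```python
def keys_in_subdirectory(keys: list[str], subdirectory: str) -> list[str]:
    """Return only the keys that live under `subdirectory` (hypothetical helper).

    An empty subdirectory means the whole bucket is visible.
    """
    # Normalize to exactly one trailing slash so "data" does not match "database/...".
    prefix = subdirectory.strip("/")
    if not prefix:
        return list(keys)
    prefix += "/"
    return [key for key in keys if key.startswith(prefix)]


bucket_keys = ["data/a.csv", "data/nested/b.parquet", "database/c.csv", "other.json"]
print(keys_in_subdirectory(bucket_keys, "data"))
# ['data/a.csv', 'data/nested/b.parquet']
```

Adding the trailing slash before matching is the design choice that matters here: a bare `startswith("data")` would also pick up `database/c.csv`.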
Released on April 5, 2023.
- [Beta] Aqueduct now has support for loading DataFrames and images from the local filesystem as parameters:

```python
import aqueduct as aq

client = aq.Client()

data = client.create_param(
    "df",
    "/path/to/my/df.csv",
    use_local=True,
    as_type=aq.TableArtifact,
    format='csv',
)
```
### Enhancements
* Improves the artifact storage management process. The integrations page now
shows which system is being used for artifact storage, and if a migration
between artifact stores fails, the error will be surfaced on the
integration details page until the user triggers a new artifact migration.
* Adds metadata to read and write operators' sidesheets. Both sidesheets now
show which system is being used for the IO operation, and the save operator
now shows to what location and (if relevant) in what format the data is
being saved.
### Bugfixes
* Fixes typos in MongoDB connection dialog.
* Fixes bug where workflow status at the top of the workflow details page would
not update in sync with other parts of the page.
* Fixes a bug where stopping the Aqueduct server immediately after starting it
could potentially trigger and then kill a workflow run that would then
stay in a pending state permanently.
## 0.2.8
Released on March 29, 2023.
### Enhancements
* Extends Aqueduct's parameter support to allow parameters to be `None` *if*
there is a default value for the parameter in the function definition.
* Tweaks the design of the workflow details page to reduce the number of shades
of blue.
* Updates SQL query parametrization scheme to use `$1`, `$2`, etc. as parameter
  placeholders rather than the previous `{ param }` syntax. This allows a
  parameter to be reused multiple times within the same query and is similar
  to how traditional RDBMS interfaces work.
* Adds context about the execution configuration of an operator to the workflow
  details page. Now, each operator will show the logo of the engine on which
  it is running.
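The practical gain from numbered placeholders is that one value can appear several times in a query. The sketch below is a hypothetical illustration of the numbering scheme only: `render_query` is not an Aqueduct API, and these notes do not describe Aqueduct's actual substitution logic. It inlines values so the result is visible; production code should rely on driver-side parameter binding instead.

```python
import re


def render_query(query: str, params: list) -> str:
    """Substitute $1, $2, ... with the 1-indexed entries of `params` (illustrative only)."""

    def to_literal(value) -> str:
        # Minimal literal rendering for the example; real code should let the
        # database driver bind parameters instead of building SQL strings.
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"
        return str(value)

    def replace(match) -> str:
        index = int(match.group(1)) - 1  # $1 refers to params[0]
        if not 0 <= index < len(params):
            raise IndexError(f"no parameter supplied for ${index + 1}")
        return to_literal(params[index])

    return re.sub(r"\$(\d+)", replace, query)


query = "SELECT * FROM sales WHERE region = $1 OR backup_region = $1 LIMIT $2"
print(render_query(query, ["west", 10]))
# SELECT * FROM sales WHERE region = 'west' OR backup_region = 'west' LIMIT 10
```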
### Bugfixes
* Fixes bug where the metric history graph's X-axis would not reflect
timescales; all data points would be evenly spaced out rather than
reflecting the amount of time between each point.
* Fixes bug where cross-environment Python requirements gathering would lead to
erroneous installation. Currently, Aqueduct only supports explicit
requirements specification.
* Fixes bug where certain workflow failures would leave unfinished operators in
an infinitely-pending state; now, nodes are correctly marked as canceled.
* Fixes bug where retrieving a non-existent filepath from S3 would return
  successfully but with empty data; now, a detailed error message is shown
  instead.
* Fixes bug where attempting to capture a system metric would throw an error.
## 0.2.7
Released on March 22, 2023.
### Key Features
* [Beta] Aqueduct now has support for on-demand Kubernetes cluster creation and
management on AWS. From the Aqueduct UI, you can connect Aqueduct to your
AWS account via the cloud integration feature. Once connected, you can use
this cloud integration to ask Aqueduct to automatically create a Kubernetes
cluster for you. See the documentation
[here](https://docs.aqueducthq.com/integrations/on-demand-aws-eks-clusters)
for how to create an operator that uses on-demand Kubernetes.
### Enhancements
* Improves error handling to return more detailed error messages from errors
occurring during execution.
* Improves error handling by surfacing errors that occur outside of the
execution of an individual function as workflow-level errors; these errors
could occur for example if a compute system was misconfigured.
* Improves handling of artifact name conflicts in the Python SDK. Explicitly
named artifacts (using either the `outputs` argument to the `@op` decorator
or the `.set_name()` function) will immediately flag and prevent conflicts
in artifact names. Automatically named artifacts will error if multiple
artifacts with the same name are included in a single `publish_flow` call.
* Displays all compute engines associated with a workflow on the workflows list
page.
* Improves efficiency when previewing large objects on the UI by retrieving a
sample of the data instead of the full data object and noting that the
displayed data is a sample.
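The two naming rules above differ in *when* a conflict is caught: explicit names are checked immediately, while automatic names are only compared within a single `publish_flow` call. The class below is a hypothetical model of those rules for illustration; `ArtifactNameRegistry` and its methods are not part of the Aqueduct SDK.

```python
from collections import Counter


class ArtifactNameRegistry:
    """Hypothetical model of the two naming rules above; not Aqueduct's implementation."""

    def __init__(self) -> None:
        self.explicit_names: set[str] = set()

    def set_name(self, name: str) -> None:
        # Explicit names (`outputs=` or `.set_name()`) are checked as soon as they are set.
        if name in self.explicit_names:
            raise ValueError(f"Artifact name {name!r} is already in use.")
        self.explicit_names.add(name)

    @staticmethod
    def check_publish(names: list[str]) -> None:
        # Automatically generated names are only compared within one publish call.
        duplicates = sorted(name for name, count in Counter(names).items() if count > 1)
        if duplicates:
            raise ValueError(f"Duplicate artifact names in a single publish: {duplicates}")


registry = ArtifactNameRegistry()
registry.set_name("churn model")                                 # accepted
ArtifactNameRegistry.check_publish(["features", "predictions"])  # accepted
```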
### Bugfixes
* Fixes bug where an S3 or GCS bucket being used as the Aqueduct artifact store
could possibly be deleted from the UI.
* Fixes bug that caused navigation buttons to be misaligned with other buttons
on the action bar on the workflow details page.
* Fixes bug where navigating to the next most recent run on the workflow
details page would not work correctly.
* Fixes bug where pending or errored metrics would show as "Unknown" on the UI
  instead of with the correct status.
* Fixes bug where warning-level checks were being shown as failures on the
workflows list page.
* Fixes bug where certain DAG layouts would continue to show a layout with
overlapping and crossing edges.
### Note
* The parameterization of SQL queries may have unexpected behavior if you accidentally define a
parameter with the same name twice. The parameter value will be chosen at random in such a case.
This bug will be fixed in the next release.
## 0.2.6
Released on March 14, 2023.
### Key Features
* Enables registering a workflow without immediately triggering a run of that
  workflow. When calling `publish_flow`, the `run_now` parameter can be set to
`False`, which will tell the Aqueduct server to wait until the next
scheduled (or triggered) run. In the interim, only the DAG's structure will
be shown without any execution metadata.
### Enhancements
* Adds error checking to ensure that a remote compute engine is paired with a
remote artifact store.
* Adds error checking to ensure that a valid `kubeconfig` is provided when
connecting Aqueduct to Kubernetes.
* Enables using `~` to refer to the home directory when specifying a path to
AWS credentials or a `kubeconfig`.
* Adds icons to integration details views to indicate when an object store (AWS
  S3, GCS) is being used for metadata storage.
* Allows specifying the CUDA library version when requesting GPU resources;
  v11.4.1 is the current default because that is what EKS clusters use by
  default.
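Expanding a leading `~` is usually delegated to the standard library. The sketch below shows the idea with Python's `os.path.expanduser`; the `resolve_config_path` helper is hypothetical and not taken from Aqueduct's source.

```python
import os


def resolve_config_path(path: str) -> str:
    """Expand a leading `~` to the current user's home directory (illustrative helper)."""
    # os.path.expanduser only rewrites a leading "~" or "~user"; other paths pass through.
    return os.path.abspath(os.path.expanduser(path))


print(resolve_config_path("~/.kube/config"))
```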
### Bugfixes
* Fixes bug where certain errors occurring during task launch weren't clearly
surfaced in the Aqueduct stack.
* Fixes bug where, within a Python process, executing an operator in lazy mode
precluded users from later executing it in eager mode (or vice versa).
* Fixes bug where notification count wasn't being shown.
* Fixes bug where the notice for an ongoing metadata migration would run off
the end of the screen.
* Removes duplicate "History" header on metric details page.
## 0.2.5
Released on March 7, 2023.
### Key Features
* Users can now run Aqueduct workflows on Spark clusters on AWS EMR. With Apache Livy as an interface, Aqueduct can submit your code to your Spark cluster reliably and seamlessly. See our documentation [here](https://docs.aqueducthq.com/integrations/adding-an-integration/connecting-to-spark-emr.md).
* Redesigns node layout on DAG view to improve information presentation and
better distinguish between different node types.
### Enhancements
* Enables specification of Snowflake role when connecting the Snowflake
integration.
* Updates workflow details page header to be more compact and reduce
information overload.
* Adds support for specifying compute engine and resource requirements when
creating metrics & checks.
### Bugfixes
* Fixes issue where size of large rows on Snowflake was artificially limited.
* Resolves requirement mismatches that would occur on the latest versions of
Ubuntu 22.
* Fixes layout issue where dates on metric history graph could have been cut
off.
* Fixes bug where the integration details page for compute integrations would
not list all workflows using that integration.
* Fixes bug where metrics plot failed to render when upstream operator was
cancelled.
## 0.2.4
Released on February 28, 2023.
### Enhancements
* Opens links to docs and feedback in new tabs rather than in the existing tab
on the Aqueduct UI.
* When authoring a pipeline, allows reusing the same Python function multiple
times in the same DAG.
* Improves the layout of the card displaying metadata storage information on
the settings page.
### Bugfixes
* Fixes bug where changing the Aqueduct metadata storage layer when there was a
previously-failed workflow would cause the data migration process to pause.
* Fixes bug where certain DAGs would render in a confusing fashion on the
  workflow details page. The algorithm for DAG layouts is now significantly
  more reliable.
## 0.2.3
Released on February 22, 2023.
### Enhancements
* Updates workflow and data table views to show overview of all executed checks
rather than just one.
* Garbage collects Lambda-specific Docker images from the Aqueduct server's
  machine after the Lambda integration connection is finished.
* Improves performance of the Aqueduct serialization library by looking into
collection types (lists, tuples) and using data-type specific serialization
methods for each entry.
* On the Aqueduct settings page, adds details about what storage engine is
being used for metadata and version snapshot storage.
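The serialization change above amounts to per-entry type dispatch inside a collection. The sketch below illustrates that idea under stated assumptions: it uses JSON for simple primitives and `pickle` as a stand-in for every other type, whereas Aqueduct's real library has its own type-specific formats that these notes do not spell out. All function names here are hypothetical.

```python
import json
import pickle
from typing import Any


def serialize_entry(value: Any) -> dict:
    # Cheap, readable format for JSON-friendly primitives; pickle for everything else.
    if value is None or isinstance(value, (str, int, float, bool)):
        return {"format": "json", "payload": json.dumps(value)}
    return {"format": "pickle", "payload": pickle.dumps(value)}


def deserialize_entry(entry: dict) -> Any:
    if entry["format"] == "json":
        return json.loads(entry["payload"])
    return pickle.loads(entry["payload"])


def serialize_collection(values) -> dict:
    """Serialize each element of a list or tuple with its own type-specific method."""
    return {
        "container": type(values).__name__,  # "list" or "tuple"
        "entries": [serialize_entry(v) for v in values],
    }


def deserialize_collection(blob: dict):
    items = [deserialize_entry(e) for e in blob["entries"]]
    return tuple(items) if blob["container"] == "tuple" else items


blob = serialize_collection((1, "a", {1, 2}))
print([e["format"] for e in blob["entries"]])  # ['json', 'json', 'pickle']
```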
### Bugfixes
* Fixes detail header alignment on artifact and operator details pages.
* Fixes bug where the latest MariaDB and MySQL drivers were not being installed
  correctly on M1 Macs.
* Fixes bug where running the same function with multiple unnamed parameters
more than once would fail.
* Fixes bug where Aqueduct Docker images running for save operators were missing
dependencies for certain data types.
## 0.2.2
Released on February 14, 2023.
### Key Features
* Adds support for receiving Aqueduct notifications via email or in Slack
workspaces. You can configure notification settings for your Aqueduct
installation at large, and you can also customize notification settings
per-workflow. Notifications can be configured to be sent for all workflow
executions, only on warnings, or only on errors.
* *Email*: You can connect Aqueduct to your email account and specify a
list of email addresses as recipients. Each notification will trigger a
separate email.
* *Slack*: You can connect Aqueduct to your Slack workspace and specify a
channel that Aqueduct should send notifications on. Each notification
will send a separate message.
### Enhancements
* Adds support for specifying Snowflake schema when creating integration from
UI.
* Adds support for executing an operator that has one or more parameters and
multiple outputs interactively. You can call the same function, and
Aqueduct will automatically override previous implicitly created
parameters. See [our
documentation](https://docs.aqueducthq.com/parameters) for more details.
```python
from aqueduct import Client, op

client = Client()  # connect to the Aqueduct server

@op
def fn(param):
    return param

res = fn(1).get()
# -> 1. Creates a parameter named `param` for you automatically, with a default value of 1.
res = fn(2).get()
# -> 2. Updates `param` to have a default value of 2.
```
### Bugfixes
* Fixes two bugs where Aqueduct server was retrieving full data objects from
the Aqueduct metadata store to check for their existence. When working
with non-trivial data, this could cause serious performance issues.
* Fixes bug where object does not exist errors from S3 were mishandled, causing
Aqueduct to surface incorrect errors.
* Fixes bug where pods that are marked as pending on Kubernetes were being
treated as failed operators.
* Fixes bug where log and stack traces blocks didn't have proper formatting and
backgrounds on the UI.
* Fixes bug that was causing full data objects to be retrieved repeatedly when
loading metadata on the UI.
* Fixes bug where UI was previously treating not-yet-executed operators (for an
in-progress workflow) as failed operators.
* Fixes bug where the SDK's `global_config` could not be changed to set
Aqueduct as the compute engine.
## 0.2.1
Released on February 7, 2023.
### Key Features
* Allows customizing artifact names from the SDK in one of two ways.
```python
from aqueduct import op

# Method 1: Use the decorator
@op(outputs=['sklearn model', 'churn predictions'])
def train_and_predict_churn(features):
    # ... train `model` and compute `predictions` ...
    return model, predictions

# Method 2: Use .set_name()
@op
def train_model(features):
    # ... train `model` ...
    return model

# ... obtain a `features` artifact ...
model = train_model(features)
model.set_name('churn model')
```
### Enhancements
* Allows providing filepath to ServiceAccount key file when connecting to
BigQuery from Aqueduct SDK.
* Improves form validation when connecting Databricks integration.
* Throughout the SDK, enables references to workflows using workflow name in
addition to workflow ID.
* Puts upper bounds on Python package dependencies to prevent unexpected
regressions (e.g., recent issues caused by SQLAlchemy 2.0).
### Bugfixes
* Fixes bug where errors were not being properly handled when an operator had
multiple outputs. This was occurring because the return value didn't have
the expected length.
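One way to surface this class of error clearly is to check the returned value against the declared number of outputs before splitting it into artifacts. The sketch below is hypothetical (`validate_outputs` is not an Aqueduct API) and assumes multiple outputs are returned as a tuple, as in the `return model, predictions` example above.

```python
def validate_outputs(result, num_outputs: int) -> tuple:
    """Normalize an operator's return value into `num_outputs` outputs (hypothetical helper).

    Assumes multiple outputs are returned as a tuple.
    """
    if num_outputs == 1:
        return (result,)
    actual = len(result) if isinstance(result, tuple) else 1
    if actual != num_outputs:
        raise ValueError(
            f"Expected the operator to return {num_outputs} outputs, but it returned {actual}."
        )
    return result


model, predictions = validate_outputs(("model", [0, 1]), num_outputs=2)
```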
## 0.2.0
Released on January 31, 2023
### Key Features
* [Beta] Aqueduct now supports running workflows on Databricks Spark clusters!
As of this release, you can now connect Aqueduct to a Databricks cluster
from the UI and use the Aqueduct decorator API to deploy workflows onto
those clusters.
  * Databricks workflows can read data from Snowflake and AWS S3. Future
    releases will support other data systems, including Delta Lake.
* Currently, you cannot run a subset of a workflow on a Databricks cluster;
the whole workflow must be run on Databricks.
* We plan to add support for non-Databricks Spark clusters in the coming
releases.
### Enhancements
* Allows workflows running on Airflow to be triggered upon the completion of
other workflows. Note that the completion of an Airflow workflow cannot
trigger the execution of another workflow because completion state is not
synchronously tracked on Airflow.
* Unifies color and size of status indicators throughout the UI.
### Bugfixes
* Fixes bug where internal server error was uncaught when retrieving operator
results.
* Fixes bug where workflow status bar had unnecessary backticks around objects.
* Fixes bug where access checks for AWS S3 buckets would fail with certain
permissions that were in fact valid.
* Fixes bug where saving tables to relational databases with long column names
(\> 255 characters) would fail.
* Fixes bug where SQLAlchemy version 2 introduced access issues with Pandas
DataFrames. Our current solution is to require SQLAlchemy version 1.
* Fixes bug where listing tables in BigQuery required complex, brittle SQL
queries.
* Fixes bug where data listing page might crash on UI after the execution of a
failed workflow.
* Fixes bug where status indicator on check and metric details was not being
properly displayed.
* Fixes bug where checks and metrics of failed workflow executions show no values.
* Fixes bug where after switching to cloud storage as the metadata store, new integration
credentials weren't properly saved to cloud storage.
* Fixes bug where preview fails after switching to cloud storage as the metadata store.
* Fixes bug where failing metrics show as NaN in metric preview list on UI.
### Deprecations
* The `.save()` method on artifacts has been removed. As of
  [v0.1.6](https://github.com/aqueducthq/aqueduct/releases/tag/v0.1.6), the
  recommended method is to use the `.save()` API on integration objects.
## 0.1.11
Released on January 23, 2023
### Enhancements
* Upgrades workflow layout rendering tool to use the elkjs library.
* Shows the name of the options on the UI's menu sidesheet to improve clarity.
* Removes the Aqueduct logo on the UI's home page to reduce redundancy.
### Bugfixes
* Fixes bug where operator execution fails when running on Kubernetes. This was due to a time gap
between launching a Kubernetes job and spinning up a pod, and our system wasn't accounting
for this.
* Fixes bug where the workflow details page keeps re-rendering.
* Fixes bug where the Kubernetes logo doesn't show up on the UI.
* Fixes bug where the UI keeps hitting the notification route, which led to unnecessary overhead.
This was caused by omitting an empty dependency array in one of our useEffect hooks.
## 0.1.10
Released on January 17, 2023
### Enhancements
* For workflows that are triggered at the end of other workflows, we now allow changing the
triggering workflow from the UI's workflow settings dialog.
* Differentiates keys and values more clearly on the UI by using different
  colored text, making it clear which is the key and which is the value when
  showing, for example, metric and check values.
* Improves presentation of non-success states of metrics and checks on workflow
DAG. Rather than leaving the nodes empty as before, they now include icons
that demonstrate the execution state (failed, pending, canceled).
* Adds redesigned search interface to workflow and data list pages. The search
bar itself has been reduced in size, and a sort functionality has been
added that allows users to select a column by which to sort the view.
* Orders integrations alphabetically on the integrations page to make them
easier to find.
### Bugfixes
* Fixes bug where operator & artifact statuses were missing from details pages.
* Fixes bug where the header breadcrumbs did not show the title of the workflow
on the metric details page.
## 0.1.9
Released on January 10, 2023.
### Key Features
* As of this release, Aqueduct has usage tracking. Usage tracking is fully
anonymized and captures API routes, performance data, and error rates
without revealing any specifics of your machine or deployment. For more
details, check out our [documentation](https://docs.aqueducthq.com/usage).
* We now support cascading workflow triggers, which means a workflow can trigger another one at the end of its execution. You can specify this in our [Python SDK](https://github.com/aqueducthq/aqueduct/blob/main/sdk/aqueduct/client.py#L373).
### Enhancements
* Makes the artifact, check, metric, and operator details pages full width.
* Shows the Aqueduct version number on the UI navigation bar.
* Hides previews when artifacts are canceled.
* Hides parameters in status bar.
### Bugfixes
* Fixes a number of UI bugs:
* Resets workflow settings dialog content after close.
* Aligns margins on right side of workflow details page.
* Removes vestigial popover to access settings page.
* Addresses regression where a workflow's saved objects were not being shown
prior to workflow deletion.
* Aligns the width of metric and check history items.
* Updates the metadata views (workflows & data list pages) to differentiate
table headers from metadata rows.
* Persists the number of rows shown per-page on metadata views between page refreshes.
## 0.1.8
Released on December 20, 2022.
### Enhancements
* Allows user to set compute engine in the operator's decorator.
* Reduces the number of significant figures for metrics on data list page and workflows list page to
improve readability.
### Bugfixes
* Fixes a bug where the UI shows data section for compute integrations.
* Fixes a bug where previewing Mongo collection crashes.
## 0.1.7
Released on December 14, 2022.
### Bugfixes
* Fixes a bug where the Aqueduct installation script fails if the user doesn't have conda installed.
## 0.1.6
Released on December 13, 2022.
### Key Features
* Introduces new table views on the workflows and data pages that show rich
metadata at a glance, including workflow and artifact status, data types,
and associated metrics and checks!
* Adds support for integrating with conda. Once the user registers conda integration through the UI,
Aqueduct will create conda environments to run any newly created workflows to provide better
Python version and dependency management.
### Enhancements
* Introduces a new `save` API; now, to save an artifact, users can write the
following. The original `table.save()` syntax still works but will be
deprecated in a future release.
```python
# `db` is an integration object and `table` is an artifact from your workflow.
db.save(table, 'my_new_table', update_mode='replace')
```
- Disallows creating multiple integrations with the same name.
- Fixes a bug where unused integration couldn't be deleted if historical workflow runs were associated with it.
- Fixes a bug where logs weren't being displayed on operator details page.
- Fixes a bug where saving multiple pieces of data to the same database would cause the workflow UI to crash.
- Fixes a bug where calling a metric or check with no inputs didn't raise a client-side error.
- Fixes a bug where metric history & graph was not sorted by time.
- Fixes a bug where every click into a workflow DAG node reset the DAG visualization.
- Fixes a number of bugs that caused no notifications to be displayed on the UI.
Released on November 29, 2022.
- Enables operators running on Kubernetes to access GPUs and set RAM and CPU requirements. Note that using a GPU requires your Kubernetes cluster to already have GPU machines attached. See our documentation for more details.
```python
from aqueduct import op

@op(resources={'num_cpus': 2, 'memory': '5Gb', 'gpu_resource_name': 'nvidia.com/gpu'})
def my_operator_with_many_resources():
    return 1
```
- Similarly, memory requirements for functions running on AWS Lambda can be set using the syntax above; AWS Lambda supports neither CPU requirements nor GPUs.
- Enables operator previews to execute using different integrations, including using the resource constraints described above.
- Allows the execution engine to be set globally for a client instance, for example:

```python
import aqueduct as aq

aq.global_config({'engine': 'my_k8s_integration'})
```
- Fixes bug where a Kubernetes pod that ran out of memory would fail silently.
Released on November 14, 2022.
- Extends internal integration test framework to support automated testing against third-party compute engines.
- Significantly refactors internal data model implementations to improve readability and maintainability.
- Fixes bug where certain dividers on the navigation sidebar were too wide.
- Fixes bug where opening sidesheets would change page name.
- Fixes bug where function executor Dockerfiles had incorrect start script.
- Fixes bug that caused built-in metric and check functions to have different Python environments than regular operators.
Released on November 7, 2022.
- Surfaces errors with parameter validation in workflow status summary.
- Catches errors generated during `requirements.txt` installation and surfaces them eagerly; previously, these errors were ignored.
- Improves operator execution time by only importing `great_expectations` when it's being used; the library import is quite slow, so doing it on every operator was wasteful.
- Adjusts various font sizes in the UI to improve presentation.
- Adds MongoDB integration.
- Adds `engine` parameter to `global_config`, allowing users to specify a default compute engine; `engine` is also now an optional parameter to `publish_flow`:
  - If the `engine` argument to `publish_flow` is specified, it will override the `global_config`. Otherwise, the engine set in `global_config` will be used.
  - If neither the `engine` argument to `publish_flow` nor `global_config` is set, the workflow will be executed on the default Aqueduct execution engine.
- Fixes bug where operator details button text overflowed.
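The engine precedence described above can be summarized as a small resolution function. This is an illustrative sketch only: `resolve_engine` and the `DEFAULT_ENGINE` label are hypothetical and do not correspond to names in the Aqueduct SDK.

```python
from typing import Optional

# Hypothetical placeholder for "the default Aqueduct execution engine".
DEFAULT_ENGINE = "aqueduct"


def resolve_engine(publish_engine: Optional[str], global_engine: Optional[str]) -> str:
    """Choose a compute engine following the precedence rules above (illustrative only)."""
    if publish_engine is not None:
        return publish_engine  # `engine=` passed to publish_flow takes priority
    if global_engine is not None:
        return global_engine   # otherwise use the engine from global_config
    return DEFAULT_ENGINE      # neither set: default Aqueduct engine


print(resolve_engine(None, "my_k8s_integration"))  # my_k8s_integration
```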
Released on October 31, 2022.
- Hides search bar on data viewing page when there are no artifacts.
- Adds support for variable length arguments (`*args`) in Aqueduct functions.
- Fixes a bug where updating the metadata of a paused workflow would fail.
- Fixes a bug where parameters were shown as having an upstream function that wasn't accessible from the UI.
Released on October 25, 2022.
- Adds support for Tensorflow Keras models to type system.
- Allows users to chain multiple SQL queries in the extract operator.
- Automatically migrates all metadata and artifact snapshots when the user changes the storage layer.
- Re-enables downloading operator code.
- Fixes bug where artifact details view was not scrollable in drawer view.
- Fixes bugs where parameter nodes were rendered incorrectly.
- Fixes bug where search functionality was broken on the data page.
Released on October 18, 2022.
- Updates the UI to provide a simplified, more responsive layout and surface
more information about workflow execution.
- Adds details pages for operators, artifacts, checks, and metrics which show the history and metadata (e.g., code preview, historical values) for the relevant object.
- Replaces old sidesheets with preview of the details pages when clicking on a node in the workflow DAG.
- Adds narrower, simplified navigation sidebar as well as breadcrumbs to simplify navigation.
- Makes page layout more responsive to narrow screens.
- Adds Helm chart to deploy Aqueduct on Kubernetes servers; when running in
Kubernetes, there's a new integration mechanism to connect Aqueduct to the
current Kubernetes server that uses an in-cluster client rather than a
`kubeconfig` file.
- When switching Aqueduct metadata stores from local to cloud-hosted, automigrates all data to cloud storage.
- Allows operators to have multiple output artifacts. You can specify the
number of outputs by using the `num_outputs` argument to the `@op`
decorator.
```python
import aqueduct as aq

# Declare that this operator produces three separate output artifacts.
@aq.op(num_outputs=3)
def multi_output():
    return 1, 2, 3

a, b, c = multi_output()
```
- Enables modifying version history retention policy from the settings pane of the workflow page.
- Adds documentation link to menu sidebar.
- Detects when the SDK and server versions are mismatched and surfaces an error when creating an SDK client.
- Allows `publish_flow` to accept either a single artifact or a list of artifacts in the `artifacts` parameter.
- Moves the retention policy parameter from `publish_flow` to the `FlowConfig` object.
- Fixes bug where tuple return types in operators were not returned correctly.
- Sets minimum version requirements on `pydantic` and `typing-extensions`; older versions caused inexplicable and confusing bugs.
- Fixes bug where CSV upload dialog didn't show all rows in data upload preview.
- Fixes bug where parameters and checks were marked as canceled when there were invalid inputs.
- Fixes bug where Aqueduct logo was cut off on the welcome page on small screens.
- Fixes bug where long `stdout` or `stderr` logs were truncated on the UI.
- Fixes bug where SQLite inserts would fail because of an argument limit for older versions of SQLite.
- Fixes bug where running Aqueduct operators in temporary environments (e.g., IPython interpreter, VSCode notebooks) would fail because the operator source file would not be detectable.
Released on September 26, 2022.
- Improves the readability of the operator logs printed from the SDK by omitting empty logs and making formatting uniform.
- Throws a more informative error message when a table artifact's column name is not of type string. Aqueduct currently does not support DataFrames with non-string column names.
- Fixes bug where authentication errors caused by incorrect integration credentials were treated as system errors, which led to a confusing error message.
- Fixes bug introduced in a previous release where the settings gear was hidden on the UI.
- Fixes a number of minor formatting and spacing issues on the UI.
Released on September 20, 2022.
- Adds support for running new workflows on AWS Lambda and Apache Airflow. Users can define workflows using the Aqueduct API but delegate the execution of those workflows onto these compute systems.
- Allows Aqueduct parameters to hold any Python object; parameters are also now implicitly created when a Python object is passed into a decorated function.
- Updates UI to describe database write operators as `save` operators instead of `load` operators to avoid confusion.
- Adds `describe` methods to all non-tabular artifact types.
- Fixes bug where stack traces and other messages in workflow status bar would overflow past edge of screen.
- Fixes bug where some workflows that should have been triggered on server start were being ignored due to inconsistent metadata.
- Fixes bug where newest workflow run wasn't shown after a run was manually triggered.
Released on September 12, 2022.
- Enables searching through workflows list.
- Workflows are now displayed on the workflows page even before any runs have been created.
- Adds canceled state to operator lifecycle; when upstream operators fail, downstream operators and artifacts are now marked as canceled rather than being left permanently in progress.
- Adds ability to connect new SQLite DB from UI.
- Redesigns integration viewing page to explicitly show DB tables rather than the previous select menu.
- Fixes bug where browser console throws error when there is no write operator in workflow DAG.
- Fixes bug where operators previously could not return `None`.
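The canceled state amounts to propagating a failure to everything reachable downstream in the workflow DAG. The sketch below is a generic breadth-first traversal, not Aqueduct's actual implementation; it assumes the DAG is given as a hypothetical adjacency dictionary in which operators and artifacts are both plain named nodes.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def canceled_nodes(downstream: Dict[str, List[str]], failed: Iterable[str]) -> Set[str]:
    """Return every node reachable from a failed node, excluding the failed nodes themselves."""
    failed_set = set(failed)
    canceled: Set[str] = set()
    queue = deque(failed_set)
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            # Mark each downstream node once instead of leaving it "in progress" forever.
            if child not in canceled and child not in failed_set:
                canceled.add(child)
                queue.append(child)
    return canceled


# Hypothetical DAG: operator -> artifact -> operator -> ...
dag = {
    "extract": ["raw_table"],
    "raw_table": ["transform"],
    "transform": ["clean_table"],
    "clean_table": ["row_count_check", "save"],
}
print(sorted(canceled_nodes(dag, ["transform"])))  # ['clean_table', 'row_count_check', 'save']
```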
Released on September 6, 2022.
- Adds AWS Athena integration. You can now execute SQL queries against AWS Athena using the Aqueduct integration API. (Since Athena is a query service, we do not support saving data to Athena.)
- Removes team and workflow notification categories and simplifies the presentation of the notifications pane to be a single box containing all notifications.
- Improves workflow metadata persistence: A newly created workflow will now show on the UI even before any runs are finished and persisted.
- Adds support for optionally lazily executing functions during workflow definition. You can also set
the global configuration for all functions to be lazy by using
`aqueduct.global_config({"lazy": True})`.
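```python
from aqueduct import op

@op
def my_op(data):
    # ... modify your data ...
    output = data
    return output

# `input_artifact` stands in for an existing Aqueduct artifact.
result = my_op.lazy(input_artifact)  # This will not execute immediately.
result.get()  # This will force execution of `my_op`.
```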
- Enforces typing for saved data; only tabular data is now saveable to relational DBs.
- Makes exported function code human-readable. When you download the code for a function, it will include a file with the name of the operator, which will have the function's Python code.
None! 🎉
Released on August 25, 2022.
- Adds support for running workflows on Kubernetes. You can now register a Kubernetes integration
from the UI by providing the cluster's kubeconfig file and publish workflows
to run on Kubernetes by modifying the `config` argument in the SDK's `publish_flow` API.
- Enables using Google Cloud Storage (GCS) as Aqueduct's metadata store. You can register GCS as a storage integration from the UI and store Aqueduct metadata in GCS.
- Adds support for editing the authentication credentials of existing integrations from the UI.
- Adds support for deleting integrations from the UI.
- Adds support for deleting data created by Aqueduct when deleting a workflow; when deleting a workflow, you will now see an option to select the objects created by this workflow.
None! 🎉
Released on August 23, 2022.
- If you did a fresh installation of Aqueduct v0.0.10, you may have run into a bug that says our
schema migrator did not run successfully. To fix this, run `aqueduct clear`
and `pip3 install --upgrade aqueduct-ml`. You can then start the server via `aqueduct start`
and everything should work again.
- Fixes a bug where a fresh installation of Aqueduct fails due to a bug in the schema migration process.
Released on August 22, 2022.
- Adds support for non-tabular data types; operators can now return any Python-serializable object. Under the hood, Aqueduct has special optimization for JSON blobs, images, and tables, in addition to supporting regular Python objects.
- Enables eager execution when defining workflow artifacts; artifacts are now
immediately computed at definition time, before calling the `get`
API, which surfaces potential errors earlier during workflow construction.
- Caches previously computed function results to avoid repetitive recomputation.
- Enables using AWS S3 as Aqueduct's metadata store; when connecting an S3 integration, you can now optionally choose to store all Aqueduct metadata in AWS S3.
- Fixes a bug where the DAG view would ignore the selected version when refreshing the page.
Released on August 15, 2022.
- Removes the system name prefix from integration connection form; users found this confusing because it was unclear you had to provide a name in addition to the prefix.
- Removes deprecated CLI commands, `aqueduct server` and `aqueduct ui`.
- Adds `__str__` method to SDK `TableArtifact` class to support pretty-printing.
- Adds support for authenticating with AWS S3 via pre-defined credentials files, including when authentication was done via AWS SSO.
- Fixes bug where Python requirements weren't properly installed when the client and the server ran on different machines.
- Fixes bug where Python stack traces were truncated when running imported Python functions.
- Fixes bug where errors generated when uploading a CSV to the Aqueduct demo database were formatted poorly and unreadable.
- Fixes bug where SDK client would indefinitely cache the list of connected integrations; if a user connected an integration after creating an SDK client, that integration would not have been accessible from the SDK client.
Released on August 8, 2022.
- Uses `pip freeze` to detect and capture local Python requirements when an explicit set of requirements is not specified during function creation.
- Adds download progress bars to the CLI when downloading files from S3.
- When running the Aqueduct server locally, the CLI now automatically opens a
browser tab with the Aqueduct UI on `aqueduct start` and passes the local API key as a query parameter to automatically log in.
- When running on EC2 with `--expose`, detects and populates the public IP address of the current machine in CLI output on `aqueduct start`.
- Makes the file format parameter in the S3 integration a string, so users can
specify the file format by passing in `"csv"`, `"json"`, etc.
- Improves the layout and readability of the integrations UI page by adding
explicit cards for each integration and labeling each one with its
name.
- Allows users to create operators from existing functions without redefining
them with a decorator; using `aqueduct.to_operator`, an existing function can be converted into an Aqueduct operator.
- Reduces CLI log output by redirecting info and debug logs to a log file; adds
a corresponding `--verbose` flag to the CLI so users can see log output in the terminal if desired.
- Reorganizes integration management behind a dropdown menu, adding an option to
test whether the integration connection still works.
- Adds "Workflows" section in the integration management page to show all workflows and operators associated with the integration.
- Fixes bug where interacting with the UI when the Aqueduct server was off resulted in an unhelpful error message ("Failed to fetch."). The fix explicitly detects whether the server is unreachable.
- Fixes bug where missing dependencies for integrations (e.g., requiring a Python package to access Postgres) were not explicitly surfaced to the user -- a cryptic import error message has been replaced with an explicit notification that a dependency needs to be installed.
- Fixes bug where metric nodes were misformatted.
- Fixes bug where loading large tables caused UI to significantly slow down because React was blindly rendering all cells -- using virtualized tables, the UI now only renders the data that is being shown on screen.
Released on August 1, 2022.
- Upgrades to go-chi v5.
- Removes need to provide API key and server address when running client and server on same machine.
- Adds support for operators with no input data.
- Fixes bug where imported functions were not executed correctly.
- Improves CSV upload UI to make data preview accurate and more legible.
- Fixes bug where requirements.txt was not consistently used.
- Fixes bug where bottom sidesheet and DAG viewer were misaligned and improperly sized.
Released on July 25, 2022.
- Prints error message as part of preview execution stack trace, not above it.
- Fixes bug where parameters argument to `head` function was unused.
- Fixes bug where menu sidebar didn't link to home page.
- Fixes bug where operator zipfiles weren't cleaned up after workflow creation.
- Fixes bug where S3 connection listed all objects in bucket, causing connection to be extremely slow.
- Fixes bug where error and warning checks weren't properly distinguished.
Released on July 14, 2022.
- Makes password optional when creating a Postgres connection.
- Adds `describe` method to every relational integration.
- Improves log capture when executing user functions.
- Enables configuration of S3 storage backend for version snapshots and operator metadata.
- Displays workflow ID on workflow settings modal.
- Adds ability to fetch an individual artifact from a workflow run using the SDK.
- Supports reading multiple S3 files into a single Pandas DataFrame.
- Deprecates showing `pyplot` image in notebook on workflow creation; instead, provides link to UI.
None! 🎉
Released on July 7, 2022.
- Workflows can now have custom parameters! A workflow can have any number of parameters, which can be used in Python operators or SQL queries. See here for more details.
- Add SDK support for fetching and pretty-printing workflow and workflow run metadata.
- Hide success notifications by default to avoid repetitive notifications.
- Allow for custom port selection for the Postgres integration.
- Allow requirements.txt to be set on an operator-by-operator basis.
- Add ability to copy SDK initialization snippet from account page.
- Allows metrics to be integers in addition to floats.
- Adds syntactic sugar for `head` on `TableArtifact`s.
- Fix bug that showed undefined in search bar when data search returned empty results.
- Fix bug where integration passwords were shown in plaintext on request headers.
- Fix bug where schema metadata was improperly persisted.
- Fix bug that disallowed non-CSV file uploads.
- Fix bug that caused unnecessary repetitive calls to the DAG render API.
- Fix a number of minor UI bugs -- margins, button placement, etc.
- Deprecates use of ipynbname in the SDK, which prevented the SDK from running in some notebook environments.
Released on June 21, 2022.
- View what tables are present in an integration by clicking on the integration in the UI.
- View all data artifacts created by Aqueduct on the `/data` page on the UI.
- Add support for pre-defined metrics and checks, including lower & upper bounds and equality checks.
- Implement support for capturing low-level metrics, such as compute time, CPU usage, and memory usage, on a per-operator basis.
- API keys can now be retrieved from the SDK with `aqueduct.get_apikey()` if running on the same machine as the Aqueduct server.
- Add feature to automatically search for next available port when port 8080 is occupied.
- Users can upload custom data to the Aqueduct demo DB — navigate to the integrations page, click on the Aqueduct Demo database, and hit “Add CSV”.
- Allow users to optionally specify HTTP/S prefix when creating Aqueduct API client.
- Implements support for creating checks via Great Expectations.
- Simplifies notifications interface by reducing redundant text.
- Fix bug where logs directory didn’t exist on upgraded installations.
- Fix bug where account page wasn’t previously being displayed.
- Kenneth Xu
- Vikram Sreekanti
- Chenggang Wu
- Fanjia Yan
- Haris Choudhary
- Andre Giron
- Hari Subbaraj
- Eunice Chan
- Saurav Chhatrapati
- Boyuan Deng
Released on June 9, 2022.
- Allows users to start both the backend server and UI with `aqueduct start`
- Removes NextJS from UI project, reverting to a vanilla React app packaged with Parcel
- Removes need for users to have `npm` installed by serving the UI from the same server as the backend
- Automatically ships common library as a transpiled module, removing need for explicit transpilation
- Allows users to retrieve package version by running `aqueduct version`
- Only binds server to `localhost` by default, removing requirement for firewall permissions
- Improves the thread safety of the job manager
- Allow users to execute annotated functions by calling `fn.local(args)`
- Fixes incorrect use of `typing` library for Python 3.7
- Fixes inconsistency in DAG rendering which would previously cause page load jitter
- Fixes bug where bounds on metrics were mislabeled
- Joey Gonzalez
- Kenneth Xu
- Vikram Sreekanti
- Chenggang Wu
- Boyuan Deng
- Fanjia Yan
- Haris Choudhary
- Andre Giron
- Will Crosier
- Wei Chen
Released on May 26, 2022.
Initial release of the Aqueduct open-source project.