Merge main02 #13

Open
wants to merge 78 commits into
base: snowflake-source-scala-update

Conversation

aabbasi-hbo
Owner

Description

Resolves #XXX

How was this PR tested?

Does this PR introduce any user-facing changes?

  • No. You can skip the rest of this section.
  • Yes. Make sure to clarify your proposed changes.

loomlike and others added 30 commits November 12, 2022 00:39
* Meaningful error

* Copy/paste typo
* Enhance error messages of synapse jobs
…ai#808)

* Support timePartitionPattern in paths of data sources.
Enhance sample notebook to solve some issues in synapse
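As a rough illustration of the timePartitionPattern support mentioned above, a date-partitioned source could be declared as sketched below. The class and parameter names (`HdfsSource`, `time_partition_pattern`) are assumptions based on the Feathr Python API of this era, and the storage path is a placeholder, not code from this PR.

```python
from feathr import HdfsSource

# Hypothetical sketch: files are laid out per day, e.g.
# abfss://container@storage.dfs.core.windows.net/data/2022/11/12/part-*.avro
# The time_partition_pattern tells Feathr how to expand `path` into daily partitions.
batch_source = HdfsSource(
    name="nycTaxiBatchSource",
    path="abfss://container@storage.dfs.core.windows.net/data/",
    time_partition_pattern="yyyy/MM/dd",  # appended to `path` when resolving partitions
    event_timestamp_column="lpep_dropoff_datetime",
    timestamp_format="yyyy-MM-dd HH:mm:ss",
)
```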
Bumps [loader-utils](https://github.com/webpack/loader-utils) from 2.0.3 to 2.0.4.
- [Release notes](https://github.com/webpack/loader-utils/releases)
- [Changelog](https://github.com/webpack/loader-utils/blob/v2.0.4/CHANGELOG.md)
- [Commits](webpack/loader-utils@v2.0.3...v2.0.4)

---
updated-dependencies:
- dependency-name: loader-utils
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Fix unexpected 500

* Handle AtlasException
* Include noop-1.0.jar into the wheel

* No need for this
* Fix unexpected 500

* Handle AtlasException

* Missing typeName

* Specify purview client version

* Remove debug code
* Bump version to 0.9.0

* Include a python client setup fix from Jay

* Add a fallback
…athr-ai#862)

* Insert test coverage check for python client into github pipeline
…ass credentials (feathr-ai#818)

* Create feathr-registry-client-update.md

* add docs on additional user

* Update local-spark-provider.md

* Create feathr-advanced-topic.md

* Create feathr-credential-passthru.md

* Update feathr-credential-passthru.md

* update docs for feathr upgrade

* update docs

* fix content
* Fix local spark output file-format bug

Signed-off-by: Jun Ki Min <[email protected]>

* Add dev dependencies. Add unit-test for local spark job launcher

Signed-off-by: Jun Ki Min <[email protected]>

* Fix local spark submission unused param error

Signed-off-by: Jun Ki Min <[email protected]>

* Refactor nyc_taxi example. TODO: update refs to the notebook

Signed-off-by: Jun Ki Min <[email protected]>

* Add dataset utilities and notebook path refactor. TODO: update reference links

Signed-off-by: Jun Ki Min <[email protected]>

* Add init.py to datasets module. Modify maybe_download to accept dir as dst_path

Signed-off-by: Jun Ki Min <[email protected]>

* Add notebook test

Signed-off-by: Jun Ki Min <[email protected]>

* change notebook to use scrap flag and is_databricks

Signed-off-by: Jun Ki Min <[email protected]>

* Fix databricks path

Signed-off-by: Jun Ki Min <[email protected]>

* Fix unittest

Signed-off-by: Jun Ki Min <[email protected]>

* Modify databricks notebook. Fix dbfs path errors in utils.

Signed-off-by: Jun Ki Min <[email protected]>

* Address review comments

Signed-off-by: Jun Ki Min <[email protected]>

* put the user_workspace feature python files back

Signed-off-by: Jun Ki Min <[email protected]>

* Revive feathr_config.yaml

Signed-off-by: Jun Ki Min <[email protected]>

* Add custom marker to pyproject.toml

Signed-off-by: Jun Ki Min <[email protected]>

Signed-off-by: Jun Ki Min <[email protected]>
* Add more secret manager support

* Add abstract class

* Update feathr-configuration-and-env.md

* Update _envvariableutil.py

* add tests for aws secrets manager

* Update test_secrets_read.py

* fix tests

* Update test_secrets_read.py

* fix test

* Update pull_request_push_test.yml

* get_secrets_update

* move import statement

* update spelling

* update raise exception

* revert

* feature registry hack

* query for uppercase

* add snowflake source

* remove snowflake type

* enableDebugLogger

* add logging

* simple path snowflake fix

* snowflake-update

* fix bugs/log

* get_snowflake_path

* update get_snowflake_path

* remove log

* log

* add logs

* test with path

* update snowflake registry handling

* update source

* remove logs

* update error handling and test

* make lowercase

* remove logging

* Revert "Merge pull request #5 from aabbasi-hbo/secrets-key-test"

This reverts commit 41554b4, reversing
changes made to 6b401de.

* Revert "remove logging"

This reverts commit e01635d.

* Revert "update error handling and test"

This reverts commit e5c200f.

* Revert "query for uppercase"

This reverts commit 0531788.

* Revert "revert"

This reverts commit 87cd083.

* Revert "update raise exception"

This reverts commit 44a3ce0.

* Revert "update spelling"

This reverts commit 07a8cf0.

* Revert "move import statement"

This reverts commit 218123f.

* Revert "get_secrets_update"

This reverts commit 9cb332c.

* Revert "Update pull_request_push_test.yml"

This reverts commit e617b99.

* Revert "fix test"

This reverts commit 8be6a42.

* Revert "Update test_secrets_read.py"

This reverts commit 997a2b1.

* Revert "fix tests"

This reverts commit a6870d9.

* Revert "Update test_secrets_read.py"

This reverts commit aa5fdda.

* Revert "add tests for aws secrets manager"

This reverts commit cdcd612.

* Revert "Update _envvariableutil.py"

This reverts commit f616522.

* Revert "Update feathr-configuration-and-env.md"

This reverts commit 2d6c135.

* Revert "Add abstract class"

This reverts commit e96459a.

* Revert "Add more secret manager support"

This reverts commit c31906c.

* remove extra line

* fix formatting

* Update setup.py

* update python tests

* update scala test

* update tests

* update test

* add test

* update docs

* fix test

* add snowflake guide

* add to NonTimeBasedDataSourceAccessor

* remove registry fixes

* Update source.py

* Update source.py

* Update source.py

* remove print

* Update feathr-snowflake-guide.md

Co-authored-by: Xiaoyong Zhu <[email protected]>
* Restructure code to encapsulate the `save_to_feature_config_from_context`

* Update client.py

* fix merge issues

* Update config_helper.py

* fix comments

* Fix xdist test error. Also make a small cleanup some codes

Signed-off-by: Jun Ki Min <[email protected]>

* Revert "Revert 756 (feathr-ai#798)"

This reverts commit ff438f5.

* revert 798 (revert756 - example notebook refactor). Also add job_utils unit tests

Signed-off-by: Jun Ki Min <[email protected]>

* Update test_azure_spark_e2e.py

* Fix doc dead links (feathr-ai#805)

This PR fixes dead links detected in the latest CI run. The doc scan CI action has been updated to run on main only, since running it on PRs frequently reports false alarms due to CI changes that are not yet deployed.

* Improve UI experience and clean up ui code warnings (feathr-ai#801)

* Add DataSourcesSelect and FlowGraph and ResizeTable components. Fix all warning and lint issues.

Signed-off-by: Boli Guan <[email protected]>

* Add CardDescriptions component and fix ESlint warning.

Signed-off-by: Boli Guan <[email protected]>

* Update FeatureDetails page title.

Signed-off-by: Boli Guan <[email protected]>

* Rename ProjectSelect

Signed-off-by: Boli Guan <[email protected]>

Signed-off-by: Boli Guan <[email protected]>

* Add release instructions for Release Candidate (feathr-ai#809)

* Add release instructions for Release Candidate

* Add a section for release versioning

* Add a section for overall process triggered by the release manager

* Bump version to 0.9.0-rc1 (feathr-ai#810)

* Fix tests to use mocks and fix get_result_df's databricks behavior

Signed-off-by: Jun Ki Min <[email protected]>

* fix temp file to dir

Signed-off-by: Jun Ki Min <[email protected]>

* checkout the feature_derivations.py from main (it was temporarily changed to go around previous issues)

Signed-off-by: Jun Ki Min <[email protected]>

* Remove old databricks sample notebook. Change pip install feathr from the github main branch to always pick up the latest changes

Signed-off-by: Jun Ki Min <[email protected]>

* Fix config and get_result_df for synapse

* Fix generate_config to accept all the feathr env var config name

Signed-off-by: Jun Ki Min <[email protected]>

* Add more pytests

Signed-off-by: Jun Ki Min <[email protected]>

* Use None as default dataformat in the job_utils. Instead, set 'avro' as a default output format to the job tags from the client

Signed-off-by: Jun Ki Min <[email protected]>

* Change feathr client to mocked object

Signed-off-by: Jun Ki Min <[email protected]>

* Change timeout to 1000s in the notebook

Signed-off-by: Jun Ki Min <[email protected]>

Signed-off-by: Jun Ki Min <[email protected]>
Signed-off-by: Boli Guan <[email protected]>
Co-authored-by: Blair Chen <[email protected]>
Co-authored-by: Blair Chen <[email protected]>
Co-authored-by: Boli Guan <[email protected]>
* Add 'format' arg to get_result_df

Signed-off-by: Jun Ki Min <[email protected]>

* Add unittest for arg alias of get_result_df

Signed-off-by: Jun Ki Min <[email protected]>

* Update explicit functions to use kwargs and update unit-tests accordingly

Signed-off-by: Jun Ki Min <[email protected]>

Signed-off-by: Jun Ki Min <[email protected]>
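A hedged usage sketch of the `get_result_df` changes listed above (the new `format` argument and its keyword-alias handling); the module path and exact signature are assumptions based on the Feathr Python client and may differ from the actual diff.

```python
from feathr import FeathrClient
from feathr.utils.job_utils import get_result_df  # assumed module path

client = FeathrClient(config_path="feathr_config.yaml")

# After this change, the output format can be passed explicitly as a keyword
# argument instead of being inferred from the submitted job's tags.
df_avro = get_result_df(client, format="avro")
df_parquet = get_result_df(client, format="parquet")
```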
* Add working gradle build

* Set up pdl support

* Working PDL java code gen

* With pdl files from metadata models

* With pdl files from compute model

* Fix compile for all pdl files

* Add working gradle build

* Migrate frame-config module into feathr

* Migrate fcm graph module to feathr

* Add FCM offline execution code, includes FDS metadata code

* Add needed jars for feathr-config tests

* Switch client to FeathrClient2 for local tests and fix config errors

* Fix SWA test

* Add gradle wrapper jar

* Change name of git PR test from sbt to gradle

* Switch python client to use FCM client

* Exclude json from dependency

* Add hacky solution to handle json dependency conflict in cloud

* Add json to local dependency

* Add log to debug cloud jar

* Add json as dependency

* Another attempt at resolving json dependency

* Resolve json via shading

* Fix json shading

* Remove log

* Shade typesafe config for cloud jar

* Add maven publish code to build.gradle

* Add working local maven build and rename frame-config to feathr-config to avoid namespace conflict

* Modify sonatype creds

* Change so no need to sign if releasing snapshot version

* Update build.gradle to allow publishing of all modules

* Removed FDS handling from Feathr

* All tests working

* Deleted FR stuff

* Remove dimension and other tensor related stuff

* Remove mlfeatureversionurn from defaultvalueresolver

* Remove mlfeatureversionurn and featureref

* Remove featuredefinition files

* Remove featureRef and typedRef

* final cleanup

* Fix merge conflict bugs

* Fix guava error

* udf plugin for swa features

* row-transformations optimization

* fix bug

* fix another bug

* always execute agg nodes first

* Add SWA log

* reverse order of execution

* group by datasource

* Fix bug

* Merge main into fcm branch

* Remove insecure URLs

* Add back removed files

* Add back removed files

* Add back removed files

* Change PR build system to gradle

* Change sbt job to gradle job

* Change sbt workflow

* Update maven github workflow to use gradle

* fix failing test

* remove sbt project module

* Remove sbt related files

* Change docs to reflect gradle

* Remove keywords

* Create a single jar

* 1. Fix jar not getting populated
  2. Fix documentation bugs

* publishToMavenLocal working

* With FFE integrated

* maven upload working

* Update docs and code clean up

* add gradle-wrapper file

* Push all dependency jars

* Update docs

* Docs cleanup

* Update github workflow commands

* Update github workflow

* Update workflow syntax

* Update version

* Add gradle version to github workflow

* Update gradle version w/o quotes

* Remove github gradle version

* Github workflow fix

* Github workflow fix-2

* Github workflow fix-4

Co-authored-by: Bozhong Hu <[email protected]>
Co-authored-by: rkashyap <[email protected]>
… arg order to make it work with old codes (feathr-ai#890)

Signed-off-by: Jun Ki Min <[email protected]>

Signed-off-by: Jun Ki Min <[email protected]>
Bumps [minimatch](https://github.com/isaacs/minimatch) and [recursive-readdir](https://github.com/jergason/recursive-readdir). These dependencies needed to be updated together.

Updates `minimatch` from 3.0.4 to 3.1.2
- [Release notes](https://github.com/isaacs/minimatch/releases)
- [Changelog](https://github.com/isaacs/minimatch/blob/main/changelog.md)
- [Commits](isaacs/minimatch@v3.0.4...v3.1.2)

Updates `recursive-readdir` from 2.2.2 to 2.2.3
- [Release notes](https://github.com/jergason/recursive-readdir/releases)
- [Changelog](https://github.com/jergason/recursive-readdir/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jergason/recursive-readdir/commits/v2.2.3)

---
updated-dependencies:
- dependency-name: minimatch
  dependency-type: indirect
- dependency-name: recursive-readdir
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add docs for checking/improving test coverage
bozhonghu and others added 30 commits December 15, 2022 12:26
…om registry (feathr-ai#886)

Support printing features and returning keys dict when getting features from registry
Introduce an option to select between environment variables and a config yaml file.
The Feathr client will use the explicitly configured yaml file instead of environment variables when the use_env_var flag is set to False.

The new flag is added as the last argument of the existing functions and defaults to True (use environment variables) so that
existing code doesn't break.

Resolves feathr-ai#922
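A minimal sketch of how the backward-compatible flag described above might look on the caller side; the keyword name follows the PR description (`use_env_var`) and the constructor placement is an assumption, not the literal diff.

```python
from feathr import FeathrClient

# Default behavior is unchanged: environment variables take precedence.
client = FeathrClient(config_path="feathr_config.yaml")

# The new flag is the last argument and defaults to True, so existing call
# sites keep working. With use_env_var=False, the client reads settings only
# from the explicitly configured yaml file.
client_yaml_only = FeathrClient(config_path="feathr_config.yaml", use_env_var=False)
```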
* Update registry-access-control.md

* Update README.md

* add logo

* Update README.md
* Add is_synapse

Signed-off-by: Jun Ki Min <[email protected]>

* fix is_synapse and add tests

Signed-off-by: Jun Ki Min <[email protected]>

* Describe the reason we pin aiohttp package

Signed-off-by: Jun Ki Min <[email protected]>

* Change is_databricks to use os environ

Signed-off-by: Jun Ki Min <[email protected]>

Signed-off-by: Jun Ki Min <[email protected]>
Signed-off-by: Yuqing Wei <[email protected]>
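The platform-detection change above ("Change is_databricks to use os environ") can be sketched roughly as below; the environment variable is one Databricks is known to set, but the exact check and module layout in the PR are assumptions. `is_synapse` presumably follows the same pattern with a Synapse-specific marker.

```python
import os


def is_databricks() -> bool:
    """Detect a Databricks runtime from the environment (sketch).

    Databricks clusters export DATABRICKS_RUNTIME_VERSION, so checking
    os.environ avoids importing Databricks-only modules just to detect
    the platform.
    """
    return "DATABRICKS_RUNTIME_VERSION" in os.environ
```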
* Support get online features by composite keys
* React best practice implementations in ui code

* Fix CI code defects

* Update package-lock.json

* Update package-lock.json
Latest numpy deprecated np.bool, which conflicts with pyspark and pandas.
This PR pins numpy to an older version to resolve that. We can unpin later once pyspark resolves the issue.
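For context, np.bool was deprecated in numpy 1.20 and removed in 1.24, which is what breaks older pyspark/pandas code paths. The kind of pin this commit describes would look like the setup.py excerpt below; the exact bound used in the PR is not shown here, so treat this range as illustrative.

```python
# setup.py (illustrative excerpt)
from setuptools import find_packages, setup

setup(
    name="feathr",
    packages=find_packages(),
    install_requires=[
        # np.bool is gone in numpy 1.24+, so stay below it until pyspark
        # handles the removal. Illustrative bound, not the PR's literal pin.
        "numpy<1.24.0",
        "pandas",
        "pyspark>=3.1.2",
    ],
)
```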
* Create Feathr – An Enterprise-Grade High Performance Feature Store.pdf

* change name

* Update README.md
… to use latest feathr api (feathr-ai#921)

* Update notebooks to use latest codes with extra notebook dependencies

Signed-off-by: Jun Ki Min <[email protected]>

* wip. Remove azure cli package from extra dependencies

Signed-off-by: Jun Ki Min <[email protected]>

* Update fraud detection demo notebook and add test

Signed-off-by: Jun Ki Min <[email protected]>

* WIP debugging

Signed-off-by: Jun Ki Min <[email protected]>

* Update notebooks

Signed-off-by: Jun Ki Min <[email protected]>

* modify notebook test to go around the materialization issue

Signed-off-by: Jun Ki Min <[email protected]>

* Change notebook parameter name to align with client argument

Signed-off-by: Jun Ki Min <[email protected]>

* Update recommendation notebook

Signed-off-by: Jun Ki Min <[email protected]>

* Update synapse example. Add azure-cli dependency to notebook dependencies

Signed-off-by: Jun Ki Min <[email protected]>

* Update data url constants to point the source github repo's raw files

Signed-off-by: Jun Ki Min <[email protected]>

* add dataset url constants to init.py

Signed-off-by: Jun Ki Min <[email protected]>

* Update feature embedding notebook to use the original dataset from azure example github

Signed-off-by: Jun Ki Min <[email protected]>

* Add recommendation sample notebook test

Signed-off-by: Jun Ki Min <[email protected]>

* Fix numpy.bool deprecation error

Signed-off-by: Jun Ki Min <[email protected]>

* Change databricks cluster node size from Dv2 to DSv2

Signed-off-by: Jun Ki Min <[email protected]>

* Use Dv4 for databricks notebook test due to the limit of Dv2 quota at US East 2

Signed-off-by: Jun Ki Min <[email protected]>

* Fix to use the supported vm size

Signed-off-by: Jun Ki Min <[email protected]>

* pin numpy to resolve conflict with pyspark

Signed-off-by: Jun Ki Min <[email protected]>

* Add document intelligence sample notebook

Signed-off-by: Jun Ki Min <[email protected]>

* Update fraud detection sample

Signed-off-by: Jun Ki Min <[email protected]>

Signed-off-by: Jun Ki Min <[email protected]>
* Spark SQL source

* Add unit test

* Clean up after test

* Typo

* Fix conflict

* Add doc comments

* add spark sql source test cases

Signed-off-by: Yuqing Wei <[email protected]>

* add spark sql source test cases

Signed-off-by: Yuqing Wei <[email protected]>

* Ignore sql if it is None

Signed-off-by: Yuqing Wei <[email protected]>
Co-authored-by: Yuqing Wei <[email protected]>
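A sketch of what the Spark SQL source added in the commits above might look like from the Python client; `SparkSqlSource` and its `sql`/`table` parameters are inferred from the commit titles and Feathr docs, so treat the exact names as assumptions.

```python
from feathr import SparkSqlSource

# A source backed by a Spark SQL query...
sql_source = SparkSqlSource(
    name="greenTaxiSqlSource",
    sql="SELECT * FROM green_tripdata WHERE trip_distance > 0",
    event_timestamp_column="lpep_dropoff_datetime",
    timestamp_format="yyyy-MM-dd HH:mm:ss",
)

# ...or backed by an existing Spark table. Per the last commit above,
# the sql field is ignored when it is None.
table_source = SparkSqlSource(
    name="greenTaxiTableSource",
    table="green_tripdata",
)
```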
This PR locks all python dependency versions for the registry projects
feathr-ai#937)

* Add pytest cases and check test coverage for sql-registry and purview-registry
* modify type annotations
* Remove hadoop dependency

* Bump RC version

* Experiment test failures

* Exclude hadoop file

Co-authored-by: rkashyap <[email protected]>
* set purview name environment variable in workflow
I found a bug in the model result graph of the Fraud Detection sample notebook.
Fixes are:

- use the ML model's output probability instead of the predicted class label for the precision/recall graph
- change the chart label to the correct ML model name
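The fix boils down to feeding predicted probabilities rather than hard class labels into the precision/recall curve. A generic scikit-learn sketch follows; the synthetic data, model choice, and names are placeholders, not the notebook's actual code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import PrecisionRecallDisplay
from sklearn.model_selection import train_test_split

# Placeholder imbalanced dataset standing in for the fraud data.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

# Wrong: hard 0/1 predictions collapse the precision/recall curve to one point.
# y_score = model.predict(X_test)

# Right: use the positive-class probability so the full curve can be traced,
# and label the chart with the actual model name.
y_score = model.predict_proba(X_test)[:, 1]
PrecisionRecallDisplay.from_predictions(y_test, y_score, name="GradientBoostingClassifier")
```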