Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing Dependencies in DbtRunner for Uninstalled Packages #1050

Conversation

Arun-kc
Copy link
Contributor

@Arun-kc Arun-kc commented Aug 2, 2023

Description:

This PR addresses issue #1039 regarding DbtRunner's behavior when required packages are not installed in the target dbt project. Currently, DbtRunner is able to run against a project without its required packages installed, leading to potential issues during execution.

@elongl elongl linked an issue Sep 3, 2023 that may be closed by this pull request
@@ -241,3 +242,28 @@ def ls(self, select: Optional[str] = None) -> list:

def source_freshness(self):
self._run_command(command_args=["source", "freshness"])

def get_installed_packages(self):
dbt_packages_folder = os.path.join(os.getcwd(), 'dbt_packages')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dbt_packages directory will not be created relative to the working directory but rather to the dbt project directory. You can use the following in order to get its location.

return False

def _run_deps_if_needed(self):
packages_yaml_file = os.path.abspath('packages.yml')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as comment above, join packages.yml with the dbt project's path.

packages_yaml_file = os.path.abspath('packages.yml')
with open(packages_yaml_file, 'r') as file:
packages_data = yaml.safe_load(file)
package_names = [package_entry.get('package', '').split('/')[-1] for package_entry in packages_data['packages']]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you maybe add the if to the first list comprehension and remove the second?

Suggested change
package_names = [package_entry.get('package', '').split('/')[-1] for package_entry in packages_data['packages']]
package_names = [package_entry['package'].split('/')[-1] for package_entry in packages_data['packages'] if 'package' in package_entry]

@@ -241,3 +242,28 @@ def ls(self, select: Optional[str] = None) -> list:

def source_freshness(self):
self._run_command(command_args=["source", "freshness"])

def get_installed_packages(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Care to rename to get_installed_packages_names and make it a private method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I have made the necessary change in the latest commit

package_names = [package_entry.get('package', '').split('/')[-1] for package_entry in packages_data['packages']]
package_names = [package for package in package_names if package != '']
installed_packages = self.get_installed_packages()
if not installed_packages:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about:

        if set(required_package_names) != set(installed_package_names):
            ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, this looks much clean than the previous method.

@Arun-kc
Copy link
Contributor Author

Arun-kc commented Sep 3, 2023

Hi @elongl , I have made the changes you have mentioned in the review. Please do let me know if you further require any changes. Thanks!

@Arun-kc Arun-kc marked this pull request as ready for review September 3, 2023 14:11

_PACKAGES_FILENAME = "packages.yml"

PATH = os.path.join(_MONITOR_DIR, "dbt_project")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this does not make sense because it couples DbtRunner to the internal dbt project while it should remain generic. Instead, we should use self.project_dir and join with it everywhere.
Sorry for misleading you 🙏

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, I have made the necessary changes in the latest commit. Just to be clear we will be doing a check on dbt_packages and packages.yml inside the self.project_dir to determine whether to use self.deps() or not, right?

Copy link
Member

@elongl elongl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this contribution!

@elongl elongl merged commit 8426fa5 into elementary-data:master Sep 5, 2023
@Arun-kc
Copy link
Contributor Author

Arun-kc commented Sep 5, 2023

Thanks a lot for this contribution!

Thank you for the opportunity!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ELE-1434] Make DbtRunner install the dbt packages if needed.
2 participants