Add support for accessing internal repositories with tokens #519

Open
ghost opened this issue Oct 8, 2020 · 8 comments
Labels
feature-request New feature request

Comments

ghost commented Oct 8, 2020

Motivation

In our current setup, we use SaaS GitLab, where all projects are private, so huskyCI always accesses internal repositories. The current approach, which uses an SSH key, is not our preferred option.
Many different teams would have to add their SSH keys, and we don't have service users. Managing personal keys without service users across more than 500 repositories is hard, so we prefer access through tokens with a set expiration date.

It would be great if

We add support for accessing internal repositories through tokens like:

Then we could automate this through the API and create tokens as needed.

What we expect

We suggest adding a new environment variable that can be defined inside the CI/CD configuration:

huskyci-report:
  image: docker-registry/huskyci-client:<tag>
  variables:
    HUSKYCI_CLIENT_REPO_URL: <repo_url>
    HUSKYCI_CLIENT_REPO_BRANCH: <repo_branch>
    HUSKYCI_CLIENT_API_ADDR: <huskyci_url>
    HUSKYCI_CLIENT_API_USE_HTTPS: "true"
    HUSKYCI_API_GIT_ACCESS_TOKEN: <git_access_token>
  script:
    - scan

Tips

rafaveira3 added the hacktoberfest2022 (https://opensource.globo.com/hacktoberfest) label on Oct 8, 2020
Krlier added the feature-request (New feature request) label and removed the hacktoberfest2022 (https://opensource.globo.com/hacktoberfest) label on Nov 9, 2020
@thepabloaguilar

Is this issue still relevant to the project? Also, could you give more details about the automation part?

@ghost
Author

ghost commented Oct 4, 2021

thought: Not sure if I'm the one that should answer these questions, but I will try. For me, this issue is still relevant.
Below I will assume that we are adding support for deploy tokens, since they have the fewest privileges. The "Overview of GitLab tokens" page describes more options.
I did not check how this works for GitHub or Bitbucket. From a brief look, there are minor differences, so we may need an additional variable identifying the repository provider to support them properly. Below I will assume that we are talking only about GitLab.

util_test.go describes the current flow for checking out the given repository in order to scan it. There are a few steps:

  1. We replace the %GIT_REPO% and %GIT_BRANCH% placeholders in cmd with the repository URL and branch. In the test this looks like:
inputCMD := "git clone -b %GIT_BRANCH% --single-branch %GIT_REPO% code --quiet 2> /tmp/errorGitClone -- "

expected := "git clone -b myBranch --single-branch https://github.com/globocom/secDevLabs.git code --quiet 2> /tmp/errorGitClone -- "
  2. Then we replace GIT_SSH_URL and GIT_URL_TO_SUBSTITUTE in cmd so that git rewrites the HTTPS URL to its SSH equivalent. In the test this looks like:
expectedNotEmpty := "git config --global url.\"git@gitlab.example.com:\".insteadOf \"https://gitlab.example.com/\""
  3. Finally, it assumes that a private SSH key (PrivateSSHKey) is set and writes it to ~/.ssh/huskyci_id_rsa:
rawString := "echo '%GIT_PRIVATE_SSH_KEY%' > ~/.ssh/huskyci_id_rsa &&"
expectedNotEmpty := "echo 'PRIVKEYTEST' > ~/.ssh/huskyci_id_rsa &&"
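For readers unfamiliar with util.go, the three substitutions above can be mimicked in a few lines of Go. This is a minimal sketch, not huskyCI's code: the function names are hypothetical, the %GIT_SSH_URL% and %GIT_URL_TO_SUBSTITUTE% placeholders and the git config template are assumptions reverse-engineered from the expected strings in the test, and the key is passed as a parameter instead of being read from the environment.

```go
package main

import (
	"fmt"
	"strings"
)

// fillCloneCmd is a hypothetical stand-in for util.HandleCmd (step 1):
// it fills the %GIT_REPO% and %GIT_BRANCH% placeholders.
func fillCloneCmd(repoURL, branch, cmd string) string {
	return strings.NewReplacer("%GIT_BRANCH%", branch, "%GIT_REPO%", repoURL).Replace(cmd)
}

// fillGitURLSubstitution is a hypothetical stand-in for
// util.HandleGitURLSubstitution (step 2). The placeholder names are assumed.
func fillGitURLSubstitution(cmd, sshURL, toSubstitute string) string {
	return strings.NewReplacer("%GIT_SSH_URL%", sshURL, "%GIT_URL_TO_SUBSTITUTE%", toSubstitute).Replace(cmd)
}

// fillPrivateSSHKey is a hypothetical stand-in for util.HandlePrivateSSHKey
// (step 3); the key is passed in rather than read from the environment.
func fillPrivateSSHKey(cmd, key string) string {
	return strings.ReplaceAll(cmd, "%GIT_PRIVATE_SSH_KEY%", key)
}

func main() {
	clone := fillCloneCmd("https://github.com/globocom/secDevLabs.git", "myBranch",
		"git clone -b %GIT_BRANCH% --single-branch %GIT_REPO% code --quiet 2> /tmp/errorGitClone -- ")
	cfg := fillGitURLSubstitution(`git config --global url."%GIT_SSH_URL%:".insteadOf "%GIT_URL_TO_SUBSTITUTE%"`,
		"git@gitlab.example.com", "https://gitlab.example.com/")
	key := fillPrivateSSHKey("echo '%GIT_PRIVATE_SSH_KEY%' > ~/.ssh/huskyci_id_rsa &&", "PRIVKEYTEST")
	fmt.Println(clone)
	fmt.Println(cfg)
	fmt.Println(key)
}
```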

In case of support for GitLab tokens

Instead of HUSKYCI_API_GIT_PRIVATE_SSH_KEY, we will add HUSKYCI_API_GIT_ACCESS_TOKEN plus a username variable, HUSKYCI_API_GIT_ACCESS_TOKEN_USERNAME. This has a few implications:

  1. We check whether the SSH key and token variables are set, and decide which flow to follow when both are non-empty. I lean towards giving the token priority.

For the SSH key flow, we have:

cmd := util.HandleCmd(scanInfo.URL, scanInfo.Branch, scanInfo.Container.SecurityTest.Cmd)
cmd = util.HandleGitURLSubstitution(cmd)
finalCMD := util.HandlePrivateSSHKey(cmd)

For the token flow, we could have:

cmd := util.HandleCmd(scanInfo.URL, scanInfo.Branch, scanInfo.Container.SecurityTest.Cmd)
cmd = util.HandleGitURLSubstitution(cmd)
finalCMD := util.HandlePrivateToken(cmd)
  2. We can extend HandleGitURLSubstitution to support token-based URLs: in addition to the current transformation from https://gitlab.example.com/ to git@gitlab.example.com:, add a token-specific one from https://gitlab.example.com/ to https://<username>:<deploy_token>@gitlab.example.com/ (this URL is simplified).

  3. In HandlePrivateToken(cmd), we will have to inject both environment variables, HUSKYCI_API_GIT_ACCESS_TOKEN and HUSKYCI_API_GIT_ACCESS_TOKEN_USERNAME, so we can clone the repository.
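The token flow proposed above could look roughly like this in Go. It is a sketch under assumptions: chooseFlow, tokenURL and tokenGitConfig are hypothetical names, and the token and username are passed as arguments rather than read from HUSKYCI_API_GIT_ACCESS_TOKEN and HUSKYCI_API_GIT_ACCESS_TOKEN_USERNAME. Building the URL with net/url instead of string concatenation means a token containing characters such as '@' is percent-escaped instead of corrupting the URL.

```go
package main

import (
	"fmt"
	"net/url"
)

// chooseFlow picks the clone flow from the (hypothetical) token and SSH key
// settings. The token wins when both are set, as suggested in the proposal.
func chooseFlow(token, sshKey string) string {
	switch {
	case token != "":
		return "token"
	case sshKey != "":
		return "ssh"
	default:
		return "none"
	}
}

// tokenURL turns https://gitlab.example.com/ into
// https://<username>:<token>@gitlab.example.com/. url.UserPassword
// percent-escapes characters such as '@' that would otherwise break the URL.
func tokenURL(instanceURL, username, token string) (string, error) {
	u, err := url.Parse(instanceURL)
	if err != nil {
		return "", err
	}
	u.User = url.UserPassword(username, token)
	return u.String(), nil
}

// tokenGitConfig is a hypothetical shape for what HandlePrivateToken could
// emit: an insteadOf rule without the trailing ':' used by the SSH form.
// Values are not shell-escaped in this sketch.
func tokenGitConfig(instanceURL, username, token string) (string, error) {
	withCreds, err := tokenURL(instanceURL, username, token)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("git config --global url.\"%s\".insteadOf \"%s\"", withCreds, instanceURL), nil
}

func main() {
	fmt.Println(chooseFlow("TOKENTEST", "PRIVKEYTEST"))
	cfg, err := tokenGitConfig("https://gitlab.example.com/", "deployer", "s3cr3t")
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg)
}
```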

The variable names are only examples; I'm not sure what the project's naming convention is. I have also not included a test specification in this answer. Please correct me if I got anything wrong.

@thepabloaguilar

@szymonwyrwiak after a deep dive into the source code, I just realized that you can do what you want without any modification!
You can just put the deploy token in HUSKYCI_CLIENT_REPO_URL:

huskyci-report:
  image: docker-registry/huskyci-client:<tag>
  variables:
    HUSKYCI_CLIENT_REPO_URL: https://<username>:<deploy_token>@gitlab.example.com/your_repository.git
    HUSKYCI_CLIENT_REPO_BRANCH: main
    HUSKYCI_CLIENT_API_ADDR: <huskyci_url>
    HUSKYCI_CLIENT_API_USE_HTTPS: "true"
  script:
    - scan

Or you can build the URL from GitLab's predefined variables:

huskyci-report:
  image: docker-registry/huskyci-client:<tag>
  variables:
    HUSKYCI_CLIENT_REPO_BRANCH: main
    HUSKYCI_CLIENT_API_ADDR: <huskyci_url>
    HUSKYCI_CLIENT_API_USE_HTTPS: "true"
  script:
    - export HUSKYCI_CLIENT_REPO_URL="$CI_SERVER_PROTOCOL://$CI_DEPLOY_USER:$CI_DEPLOY_PASSWORD@$CI_SERVER_HOST/$CI_PROJECT_PATH.git"
    - scan

Of course, if you know which protocol to use and what your instance URL is, you can make it shorter:

https://$CI_DEPLOY_USER:$CI_DEPLOY_PASSWORD@gitlab.example.com/your_repository.git

$CI_DEPLOY_USER and $CI_DEPLOY_PASSWORD are available if you create a project deploy token named gitlab-deploy-token (see the Predefined variables reference).
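If the client (or any Go tooling around it) needs to assemble the same URL, a minimal sketch using GitLab's predefined variables could look like this. It assumes the job exposes CI_SERVER_URL (which already includes the protocol and any port), CI_PROJECT_PATH, CI_DEPLOY_USER and CI_DEPLOY_PASSWORD; repoURLFromCI is a hypothetical helper, not part of huskyCI. Only the redacted form is printed, so the deploy token does not end up in the job log.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"os"
	"strings"
)

// repoURLFromCI builds the authenticated clone URL from GitLab CI/CD
// variables. getenv is injected so the function can be tested without
// touching the real environment.
func repoURLFromCI(getenv func(string) string) (*url.URL, error) {
	user, pass := getenv("CI_DEPLOY_USER"), getenv("CI_DEPLOY_PASSWORD")
	if user == "" || pass == "" {
		return nil, errors.New("CI_DEPLOY_USER/CI_DEPLOY_PASSWORD are empty: is there a gitlab-deploy-token?")
	}
	u, err := url.Parse(getenv("CI_SERVER_URL"))
	if err != nil {
		return nil, err
	}
	if u.Scheme == "" || u.Host == "" {
		return nil, errors.New("CI_SERVER_URL must be an absolute URL")
	}
	u.User = url.UserPassword(user, pass)
	// Keep any relative URL root the instance may be served under.
	u.Path = strings.TrimSuffix(u.Path, "/") + "/" + getenv("CI_PROJECT_PATH") + ".git"
	return u, nil
}

func main() {
	u, err := repoURLFromCI(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// u.String() is what would be handed to git; print only the redacted form.
	fmt.Println(u.Redacted())
}
```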

This works because huskyCI injects your repository URL directly into the command (example here). If you do not set HUSKYCI_API_GIT_SSH_URL and HUSKYCI_API_GIT_URL_TO_SUBSTITUTE, no URL substitution happens, as you can see here. Likewise, if you do not set GIT_PRIVATE_SSH_KEY, nothing happens either, see here!

Does that work for you?

@thepabloaguilar

Oh, now I see the problem! Sorry, the problem is fetching the dependencies of a Go project!

But the same approach from above should work:

HUSKYCI_API_GIT_SSH_URL=https://$USER:[email protected]/
HUSKYCI_API_GIT_URL_TO_SUBSTITUTE=https://gitlab.example.com/

Maybe we just need to change the env variable name!

@thepabloaguilar

OK, for formatting reasons the idea above is not possible (the substitution appends a ':' at the end), but that's the way to go!
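To see why the ':' matters, here is a small Go sketch. It assumes the substitution template appends ':' to HUSKYCI_API_GIT_SSH_URL (which is what the expected string in util_test.go suggests) and mimics git's insteadOf prefix rewrite for a single rule. Both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// substitutionCmd assumes the template appends ':' to the configured URL,
// which suits scp-like SSH remotes such as git@gitlab.example.com:group/repo.git.
func substitutionCmd(gitURL, toSubstitute string) string {
	return fmt.Sprintf("git config --global url.\"%s:\".insteadOf \"%s\"", gitURL, toSubstitute)
}

// rewrite mimics what one url.<base>.insteadOf rule does: replace the
// matching prefix of the remote URL with base.
func rewrite(remote, base, insteadOf string) string {
	if strings.HasPrefix(remote, insteadOf) {
		return base + strings.TrimPrefix(remote, insteadOf)
	}
	return remote
}

func main() {
	const remote = "https://gitlab.example.com/group/repo.git"
	const prefix = "https://gitlab.example.com/"
	// SSH URL: the appended ':' yields a valid scp-like remote.
	fmt.Println(substitutionCmd("git@gitlab.example.com", prefix))
	fmt.Println(rewrite(remote, "git@gitlab.example.com:", prefix))
	// Token URL: the same ':' lands after the '/' and corrupts the path.
	fmt.Println(substitutionCmd("https://user:token@gitlab.example.com/", prefix))
	fmt.Println(rewrite(remote, "https://user:token@gitlab.example.com/:", prefix))
}
```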

Now I'm thinking that maybe an environment variable like that isn't worth it for you!

I have an idea (maybe it's your original idea 😆): we modify both the client and the server to send/receive another parameter, something like HUSKY_AUTH_GIT (???), following this format: https://$USER:[email protected]/. If it's present in the request, we don't inject GIT_SSH_URL and GIT_URL_TO_SUBSTITUTE, since they would conflict with the URL to substitute in gitconfig; instead, we inject the one passed through the API!
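A rough Go sketch of this precedence rule on the server side could look like the following. ScanRequest, its AuthGit field (standing in for the still-unnamed HUSKY_AUTH_GIT parameter) and gitConfigCmd are all hypothetical; deriving the prefix to rewrite by stripping the credentials from the passed URL is also an assumption of this sketch.

```go
package main

import (
	"fmt"
	"net/url"
)

// ScanRequest is a hypothetical request body; AuthGit stands in for the
// proposed HUSKY_AUTH_GIT parameter, e.g. https://user:token@gitlab.example.com/.
type ScanRequest struct {
	RepoURL string
	Branch  string
	AuthGit string
}

// gitConfigCmd returns the insteadOf rule to inject into the scan command.
// If AuthGit is present it wins and the SSH substitution from the server
// settings is skipped, so gitconfig never holds two rules for the same prefix.
// Values are not shell-escaped in this sketch.
func gitConfigCmd(req ScanRequest, sshURL, toSubstitute string) (string, error) {
	if req.AuthGit != "" {
		withCreds, err := url.Parse(req.AuthGit)
		if err != nil {
			return "", err
		}
		plain := *withCreds
		plain.User = nil // prefix git should rewrite: same URL without credentials
		return fmt.Sprintf("git config --global url.\"%s\".insteadOf \"%s\"", withCreds.String(), plain.String()), nil
	}
	if sshURL == "" || toSubstitute == "" {
		return "", nil // nothing configured: no rewrite at all
	}
	return fmt.Sprintf("git config --global url.\"%s:\".insteadOf \"%s\"", sshURL, toSubstitute), nil
}

func main() {
	req := ScanRequest{
		RepoURL: "https://gitlab.example.com/group/repo.git",
		Branch:  "main",
		AuthGit: "https://user:token@gitlab.example.com/",
	}
	cmd, err := gitConfigCmd(req, "git@gitlab.example.com", "https://gitlab.example.com/")
	if err != nil {
		panic(err)
	}
	fmt.Println(cmd)
}
```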

WDYT @szymonwyrwiak? I'd like to invite @rafaveira3 to this discussion too, since he has the most commits and therefore the most context!

@ghost
Author

ghost commented Oct 6, 2021

The idea with the HUSKY_AUTH_GIT sounds good to me, but we have to consider our support model. In the GitLab case, the sole URL is enough, but other cases require setting up a proper header like in GitHub.

I want to avoid a situation where we add support for only one case without considering the bigger picture. That can introduce technical debt down the road.

Given the current development landscape, it would be great to have something that supports cloud-based deployments. These models usually inject a token, and that token is short-lived.

This may broaden our evaluation: we would add the possibility of constructing a URL and a header at the same time. The URL is a required parameter, and the header is optional. We would then have to determine whether the current case is:

  • SSH-based access (SSH)
  • Token in URL (URL)
  • URL + Token in header (URL+header)
  • Token in the header (header)

Based on that, we act accordingly. With all this in mind, we cannot make any decision without inviting @rafaveira3.
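One way to make that evaluation explicit is a small classifier. The sketch below is hypothetical throughout: the GitAuth type, the mode names and, in particular, the mapping (SSH for git@/ssh:// URLs; URL when credentials are embedded and no header is set; URL+header when both are present; header when only the header carries the token) are one interpretation of the list above. For the header cases it passes the header through git's http.extraHeader option; which header each provider expects is not covered.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// AuthMode names the four access cases from the discussion (plus "none").
type AuthMode string

const (
	ModeSSH       AuthMode = "SSH"
	ModeURL       AuthMode = "URL"
	ModeURLHeader AuthMode = "URL+header"
	ModeHeader    AuthMode = "header"
	ModeNone      AuthMode = "none" // plain HTTPS URL without any credentials
)

// GitAuth is a hypothetical request shape: URL is required, Header optional.
type GitAuth struct {
	URL    string
	Header string // e.g. "Authorization: Bearer <token>"
}

// classify maps a request to one of the modes above.
func classify(a GitAuth) (AuthMode, error) {
	if a.URL == "" {
		return "", errors.New("URL is required")
	}
	// url.Parse rejects scp-like remotes (git@host:path), so detect SSH first.
	if strings.HasPrefix(a.URL, "git@") || strings.HasPrefix(a.URL, "ssh://") {
		return ModeSSH, nil
	}
	u, err := url.Parse(a.URL)
	if err != nil {
		return "", err
	}
	hasCreds, hasHeader := u.User != nil, a.Header != ""
	switch {
	case hasCreds && hasHeader:
		return ModeURLHeader, nil
	case hasCreds:
		return ModeURL, nil
	case hasHeader:
		return ModeHeader, nil
	default:
		return ModeNone, nil
	}
}

// cloneArgs builds git arguments for a mode; header modes pass the header
// through git's http.extraHeader setting.
func cloneArgs(a GitAuth, mode AuthMode) []string {
	var args []string
	if mode == ModeHeader || mode == ModeURLHeader {
		args = append(args, "-c", "http.extraHeader="+a.Header)
	}
	return append(args, "clone", a.URL, "code")
}

func main() {
	examples := []GitAuth{
		{URL: "git@gitlab.example.com:group/repo.git"},
		{URL: "https://user:token@gitlab.example.com/group/repo.git"},
		{URL: "https://github.com/org/repo.git", Header: "Authorization: Bearer TOKENTEST"},
	}
	for _, a := range examples {
		mode, err := classify(a)
		if err != nil {
			panic(err)
		}
		fmt.Println(mode, cloneArgs(a, mode))
	}
}
```

Keeping the URL mandatory and the header optional mirrors the constraint stated above, so adding a provider later means adding a mapping rather than a new request field.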

As for business cases, you are probably aware that people already wrap huskyCI and charge for it as a service. I remember @edersonbrilhante showing me an example once.

If we support all the cases mentioned above, we will enable cloud services to use huskyCI as the base for the scanning they offer. That will be their decision, but changes in this area can lead to that outcome.

In this answer, I avoided building a business case that specifies how access will work, for example in AWS with STS, IAM, etc.

For now, it would be easier to focus on the different authentication schemes we can support, keeping standards in mind.

@thepabloaguilar

> The idea with the HUSKY_AUTH_GIT sounds good to me, but we have to consider our support model. In the GitLab case, the sole URL is enough, but other cases require setting up a proper header like in GitHub.

AFAIK GitHub supports this pattern too: https://$USER:[email protected]/

@thepabloaguilar

@szymonwyrwiak I've just tested both https://$USER:[email protected]/ and https://[email protected]/, and I was able to clone a private repo from my GitHub account with each!

Development

No branches or pull requests

3 participants