Make deployment on cloud.gov easy #1

Draft
wants to merge 36 commits into
base: main

Conversation

@mogul mogul commented Mar 29, 2024

In order to justify more spike stories (and win over skeptics on diagram-as-config among government Python coders), a portfolio architect (me) wants to demonstrate a fast, compliant path to production for spiff-arena by deploying and demoing it on the cloud.gov PaaS.

Acceptance Criteria

  • GIVEN I have logged into cloud.gov from the CLI
    AND I have cloned this repository
    WHEN I copy vars.yml-template to vars.yml
    AND I replace slug with a custom value
    AND I run cf push --vars-file vars.yml
    AND I open the URL https://spiffworkflow[[slug]].app.cloud.gov
    THEN I see the SpiffWorkflow login screen
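
The acceptance steps above can be sketched in Python. This is a hedged illustration only: it assumes vars.yml-template contains a single top-level `slug:` line (the real template is not shown in this conversation), and the helper names `render_vars` and `expected_url` are hypothetical. The URL pattern follows the criteria, with the slug appended directly to `spiffworkflow`.

```python
import re


def render_vars(template_text: str, slug: str) -> str:
    """Return vars.yml content with the slug value replaced.

    Assumes the template has exactly one top-level line `slug: <value>`.
    """
    # Assumption: the slug becomes part of a hostname, so keep it conservative.
    if not re.fullmatch(r"[a-z0-9-]+", slug):
        raise ValueError("slug must be lowercase letters, digits, or hyphens")
    rendered, count = re.subn(r"(?m)^slug:.*$", f"slug: {slug}", template_text)
    if count != 1:
        raise ValueError("expected exactly one 'slug:' line in the template")
    return rendered


def expected_url(slug: str) -> str:
    """URL to open once `cf push --vars-file vars.yml` has finished."""
    return f"https://spiffworkflow{slug}.app.cloud.gov"


if __name__ == "__main__":
    template = "slug: CHANGEME\n"  # made-up template content
    print(render_vars(template, "-groundhog-heartbeat"), end="")
    print(expected_url("-groundhog-heartbeat"))
```

The rendered text would be written to vars.yml before running the push; the sketch leaves file handling out so it stays independent of the repository layout.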

For demo purposes:

  • Show it populated with the examples from the upstream repository
  • Show creating a ServiceAccount associated with a user, getting the api_key, and then reproducing the Web API example live
  • Show how to interact with dmn tables

For production-readiness:

  • Show it using Postgres database instead of sqlite
  • Show it using the cloud.gov IdP service

mogul commented Mar 29, 2024

Where we're stuck: As of now we think we have set all the requisite variables for the frontend and variables for the backend correctly, and the backend status is reporting 200 (live link). However, when we try to visit the frontend (live link), we are redirected to https://spiffworkflow-groundhog-heartbeat.app.cloud.gov/api/v1.0/login?redirect_url=https://spiffworkflow-groundhog-heartbeat.app.cloud.gov/&authentication_identifier=default and we see

{
  "error_code": "internal_server_error",
  "message": "InvalidRedirectUrlError: Invalid redirect url was given: 'https://spiffworkflow-groundhog-heartbeat.app.cloud.gov/'. It must match the domain the frontend is running on.",
  "status_code": 500
}

🤔🤔🤔

@mogul mogul marked this pull request as draft March 29, 2024 15:41
@mogul mogul force-pushed the deploy-to-cloud-gov branch from 5cff7d9 to 41cce00 Compare March 29, 2024 15:49
@mogul mogul force-pushed the deploy-to-cloud-gov branch from 41cce00 to 2a04d3d Compare March 29, 2024 17:06

mogul commented Mar 29, 2024

Just redeployed after rebasing to main, and we got a better error message:

{
  "error_code": "internal_server_error",
  "message": "InvalidRedirectUrlError: Invalid redirect url was given: 'https://spiffworkflow-groundhog-heartbeat.app.cloud.gov/'. It must start with the frontend url: 'https://spiffworkflow-groundhog-heartbeat.app.cloud.gov/api'",
  "status_code": 500
}
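
The second message suggests the backend compares the redirect URL against its configured frontend URL with a prefix check. A minimal sketch of that kind of check follows; it is not the upstream implementation, and the function name and exception class are stand-ins chosen for illustration.

```python
class InvalidRedirectUrlError(Exception):
    """Stand-in for the backend error of the same name."""


def validate_redirect_url(redirect_url: str, frontend_url: str) -> None:
    """Raise unless redirect_url starts with the configured frontend URL."""
    if not redirect_url.startswith(frontend_url):
        raise InvalidRedirectUrlError(
            f"Invalid redirect url was given: '{redirect_url}'. "
            f"It must start with the frontend url: '{frontend_url}'"
        )


if __name__ == "__main__":
    site = "https://spiffworkflow-groundhog-heartbeat.app.cloud.gov"
    # Frontend URL set to the API path: the check fails, as in the error above.
    try:
        validate_redirect_url(f"{site}/", f"{site}/api")
    except InvalidRedirectUrlError as exc:
        print("rejected:", exc)
    # Frontend URL set to the site root: the same redirect passes.
    validate_redirect_url(f"{site}/", site)
    print("accepted")
```

Read this way, the error points at the backend's frontend-URL setting holding the API URL rather than the URL the frontend is actually served from.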

mogul commented Mar 29, 2024

Solved in the Spiff Discord!
image

mogul commented Mar 29, 2024

(I'm tearing that instance down now since the route was published and I don't want people monkeying with it while we work.)

mogul commented Mar 30, 2024

Current status: I am redirected to the login page, but when I try to log in with the default creds I'm told I have an invalid token... but! this is progress!

{
  "error_code": "invalid_token",
  "message": "Login failed. Please try again",
  "status_code": 401
}

mogul commented Apr 4, 2024

Current status: We were having trouble successfully logging in. I brought that back to the SpiffWorkflow Discord and they reproduced the problems we were having locally on a path-based setup. They made a fix and better documented how to host the API on a subdomain upstream. I think Cloud Foundry does the proxying/path removal as they expect, so we should be good there. We need to rebase our tree and generate a new image to see if their changes resolved the problem we're seeing!

@asteel-gsa asteel-gsa force-pushed the deploy-to-cloud-gov branch from 4120182 to 2e78298 Compare April 4, 2024 20:22
@asteel-gsa

main sync'd w/ upstream, branch rebased onto main.
Image Building Now

mogul commented Apr 4, 2024

> I think Cloud Foundry does the proxying/path removal as they expect, so we should be good there.

Looks like I was wrong; CF doesn't strip the path element from requests delivered to an app that's on a subpath of a route.

mogul commented Apr 5, 2024

We should be able to strip the path from the request via judicious application of nginx to implement a proxy. Three options for that spring to mind:

  1. If we ensure nginx is installed in the spiffworkflow image, I think we can do that by moving the invocation of the spiffworkflow-backend app's start-command to a sidecar command in the manifest, and then making nginx the main command. However, I've never tried making a sidecar with a Docker image app...
  2. If we move from deploying spiffworkflow as a Docker image app to deploying it as a python_buildpack app, then we can definitely go the sidecar route... but at that point we're back in the territory of trying to work out things that are already solved in the upstream docker-compose.yml path, which was why we stopped work on the buildpack path earlier in this branch. (We may still need to revisit that path for production in any case but I don't want to tackle that until we successfully run spiffworkflow in our environment, so we can unblock the FAC team's adoption of it.)
  3. We can deploy spiffworkflow-backend on an internal route, and then deploy a separate proxy app on the /api subpath of the public route using the nginx_buildpack.

Having written these out, I think that the third option is the most straightforward in terms of solving the specific problem we have right now without introducing unknowns into the spiffworkflow side, where we're still getting our bearings.
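
Whichever option is chosen, the proxy's job is the same: remove the /api prefix before the request reaches the backend. As an illustration of that rewrite only, here is a small Python WSGI middleware. It is not the nginx configuration proposed above, and `StripPrefix` and `echo_app` are hypothetical names.

```python
class StripPrefix:
    """WSGI middleware that removes a leading path prefix such as /api."""

    def __init__(self, app, prefix: str):
        self.app = app
        self.prefix = prefix.rstrip("/")

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        # Match the prefix only on a path-segment boundary (/api or /api/...).
        if path == self.prefix or path.startswith(self.prefix + "/"):
            environ = dict(environ)
            # Record the prefix in SCRIPT_NAME so the app can still see it.
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + self.prefix
            environ["PATH_INFO"] = path[len(self.prefix):] or "/"
        return self.app(environ, start_response)


def echo_app(environ, start_response):
    """Tiny WSGI app that reports the path it was asked for."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["PATH_INFO"].encode()]


if __name__ == "__main__":
    wrapped = StripPrefix(echo_app, "/api")
    body = wrapped({"PATH_INFO": "/api/v1.0/login"}, lambda status, headers: None)
    print(b"".join(body).decode())  # prints /v1.0/login
```

A separate proxy app (option 3) would perform the same rewrite outside the backend process, which is why it avoids touching the spiffworkflow image.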

mogul commented Apr 5, 2024

There's an upstream PR just now:

> if backend is generating urls, we need to be careful. perhaps we try to generate a URL that mimics the current url (/api prefix if we are being accessed at /api and / if not)

We should see what happens with it!
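
The idea quoted from the upstream PR can be illustrated roughly as follows. This is a sketch, not the upstream change: `external_url` and its parameters are hypothetical, and it only shows the rule of mirroring the prefix the request arrived with.

```python
def external_url(base_url: str, request_path: str, target_path: str,
                 prefix: str = "/api") -> str:
    """Build a URL for target_path that mirrors how the request arrived.

    If the incoming request path started with the prefix, generated URLs
    keep it; otherwise they are rooted at /.
    """
    accessed_with_prefix = (
        request_path == prefix or request_path.startswith(prefix + "/")
    )
    mount = prefix if accessed_with_prefix else ""
    return f"{base_url.rstrip('/')}{mount}/{target_path.lstrip('/')}"


if __name__ == "__main__":
    base = "https://spiffworkflow-groundhog-heartbeat.app.cloud.gov"
    # Accessed under /api: the generated URL keeps the prefix.
    print(external_url(base, "/api/v1.0/ui/", "/v1.0/login"))
    # Accessed at the root: the generated URL has no prefix.
    print(external_url(base, "/v1.0/ui/", "/v1.0/login"))
```

This only works if the backend can see the original path, which depends on whether the platform or a proxy strips the prefix before the request arrives.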

mogul commented Apr 5, 2024

> There's an upstream PR just now:
> [...]
> We should see what happens with it!

Huzzah!

@mogul mogul force-pushed the deploy-to-cloud-gov branch from 2e78298 to bef2118 Compare April 5, 2024 20:45

mogul commented Apr 7, 2024

The upstream change was still generating bad URLs. I've got a PR in flight that should fix this sub-path case. I merged that branch into this branch so I can test it with our images, once they're built.

mogul commented Apr 8, 2024

Now chasing sartography#1350. Image build in progress.

mogul commented Apr 8, 2024

Having merged in the pending branch from upstream, current status:
image
😎 😎 😎

mogul commented Apr 8, 2024

Thinking now about what I would want to add to this before we present on it...

For demo purposes:

  • Show it populated with the examples from the upstream repository
  • Show creating a ServiceAccount associated with a user, getting the api_key, and then reproducing the Web API example live

For production-readiness:

  • Show it using Postgres database instead of sqlite
  • Show it using the cloud.gov IdP service

@mogul mogul force-pushed the deploy-to-cloud-gov branch from e292811 to 75df1d1 Compare April 9, 2024 00:03

mogul commented Apr 9, 2024

@asteel-gsa just rebased and force-pushed... Can you confirm this is working for you too?

mogul commented Apr 9, 2024

I should be clearer... I'm able to log in, but I'm still getting tons of 502s and non-functional pages. Are you also able to at least log in?

mogul commented Apr 9, 2024

> I'm still getting tons of 502s and non-functional pages.

This was an OOM situation... The frontend and backend need more than the 256M each we'd given them, and while cloud.gov restarted either of them we'd see blank pages and errors in the Chrome Inspector console. Now they're at 512M each (still small enough to deploy in a cloud.gov sandbox account) and everything seems to be working.
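
The memory arithmetic is simple but worth writing down. A small sketch, assuming a 1G (1024M) sandbox memory quota as mentioned in this branch's commit notes; `fits_quota` is an illustrative helper, not part of any cloud.gov tooling.

```python
def fits_quota(app_memory_mb: dict[str, int], quota_mb: int = 1024) -> bool:
    """Return True if the apps' combined memory fits within the quota."""
    return sum(app_memory_mb.values()) <= quota_mb


if __name__ == "__main__":
    before = {"frontend": 256, "backend": 256}
    after = {"frontend": 512, "backend": 512}
    for label, apps in (("256M each", before), ("512M each", after)):
        total = sum(apps.values())
        print(f"{label}: {total}M total, fits 1G quota: {fits_quota(apps)}")
```

Under that assumption, 512M each uses the whole quota, so any additional app in the same space would require lowering one of the limits.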

asteel-gsa commented Apr 9, 2024

> @asteel-gsa just rebased and force-pushed... Can you confirm this is working for you too?

I am only able to access https://spiffworkflow((slug)).app.cloud.gov/api/v1.0/ui/ and am unable to get to the login screen for the frontend with latest. Probably best to touch base and figure out where the diff is, so we can document.

@asteel-gsa

> @asteel-gsa just rebased and force-pushed... Can you confirm this is working for you too?

> I am only able to access https://spiffworkflow((slug)).app.cloud.gov/api/v1.0/ui/ and am unable to get to the login screen for the frontend with latest. Probably best to touch base and figure out where the diff is, so we can document.

Figured out what needed to be done:

cf bind-security-group public_networks_egress ORGNAME --lifecycle running --space SPACENAME

That bound the public_networks_egress security group to my sandbox space, which allowed the redirects to occur when going to the frontend. I can now see the login splash and use the default credentials to access the application.

asteel-gsa commented Apr 10, 2024

Figured out how to get the process-models "examples" to show up..

inside /bin/clone_process_models.sh there is a command

git clone -b "$SPIFFWORKFLOW_BACKEND_GIT_SOURCE_BRANCH" "$SPIFFWORKFLOW_BACKEND_GIT_PUBLISH_CLONE_URL" "$SPIFFWORKFLOW_BACKEND_BPMN_SPEC_ABSOLUTE_DIR"

That seems to have worked to drop the git fork we wanted into /app/process_models/, which gave us this.
image
image

I did, however, need to rm -rf /app/process_models first, since /app/bin/boot_server_in_docker seems to git init that directory.
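
The manual steps described here (clear the directory, then clone the configured branch into it) could be sketched in Python as below. This is an illustration under assumptions, not the upstream clone_process_models script: the function name and the injectable `run` parameter are hypothetical, and the environment variable names are the ones from the git clone command above.

```python
import os
import shutil
import subprocess


def clone_process_models(env=None, run=subprocess.run):
    """Clone the process-models repo into the configured directory.

    Removes any existing directory first, because git clone refuses to
    clone into a directory that already exists and is not empty.
    """
    env = os.environ if env is None else env
    branch = env["SPIFFWORKFLOW_BACKEND_GIT_SOURCE_BRANCH"]
    url = env["SPIFFWORKFLOW_BACKEND_GIT_PUBLISH_CLONE_URL"]
    target = env["SPIFFWORKFLOW_BACKEND_BPMN_SPEC_ABSOLUTE_DIR"]

    shutil.rmtree(target, ignore_errors=True)  # equivalent of rm -rf
    argv = ["git", "clone", "-b", branch, url, target]
    run(argv, check=True)  # raises CalledProcessError if git fails
    return argv
```

Passing `run` as a parameter is only there so the command can be inspected without touching the network; in a container it would default to `subprocess.run`.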

Not entirely sure why directly calling this script won't work. It seems executable in the repo, but has no output at all.

Attempted to add it to the command block with no execution. Going to mess around with it, but this is the current blocker for me.

command: |
  /app/bin/boot_server_in_docker
  /app/bin/clone_process_models

@asteel-gsa

> Attempted to add it to the command block with no execution. Going to mess around with it, but this is the current blocker for me.

Figured it out (I think). The current stash has the working version, but it hasn't been committed because, for whatever reason, @mogul, with your latest commits I get a crash loop when attempting to connect to the database, even though the URI looks correct. Something with SQLAlchemy is still a bit weird for the Postgres instance. If you send me your aws_rds cf service plan, and it works for you, I can tear down my db_instance and retry, but for now my stash is using the sqlite db.

Stash References:
/bin/boot_server_in_docker

# ignore this stash, I was testing something, but of course, as I pushed and was about to wait for a build on the backend, this was unnecessary
- # git init "${SPIFFWORKFLOW_BACKEND_BPMN_SPEC_ABSOLUTE_DIR}"
+ git init "${SPIFFWORKFLOW_BACKEND_BPMN_SPEC_ABSOLUTE_DIR}"
git config --global --add safe.directory "${SPIFFWORKFLOW_BACKEND_BPMN_SPEC_ABSOLUTE_DIR}"

spiffworkflow_backend/config/local_development.py

- SPIFFWORKFLOW_BACKEND_GIT_USERNAME = "sartography-automated-committer"
- SPIFFWORKFLOW_BACKEND_GIT_USER_EMAIL = f"{SPIFFWORKFLOW_BACKEND_GIT_USERNAME}@users.noreply.github.com"
+ SPIFFWORKFLOW_BACKEND_GIT_USERNAME = environ.get(
+     "SPIFFWORKFLOW_BACKEND_GIT_USERNAME",
+     default="sartography-automated-committer",
+ )
+ SPIFFWORKFLOW_BACKEND_GIT_USER_EMAIL = environ.get(
+     "SPIFFWORKFLOW_BACKEND_GIT_USER_EMAIL",
+     default=f"{SPIFFWORKFLOW_BACKEND_GIT_USERNAME}@users.noreply.github.com",
+ )

manifest.yml

  command: |
    # Get the postgres URI from the service binding. (SQL Alchemy insists on "postgresql://".🙄)
    export SPIFFWORKFLOW_BACKEND_DATABASE_URI=$( echo $VCAP_SERVICES | jq -r '.["aws-rds"][].credentials.uri' | sed -e s/postgres/postgresql/ ) 
    # export SPIFFWORKFLOW_BACKEND_DATABASE_URI=$( echo $VCAP_SERVICES | jq -r '.["aws-rds"][].credentials.uri' | sed -e s/postgres/postgresql/ )
    /app/bin/clone_process_models
    /app/bin/boot_server_in_docker

# VCAP_SERVICES
SPIFFWORKFLOW_BACKEND_DATABASE_URI: "sqlite:///db.sqlite3"

# Rearrange later
# https://github.com/sartography/spiff-arena/blob/293aa867a1cef056c5bee3ef037be31047fdc49e/spiffworkflow-backend/src/spiffworkflow_backend/config/default.py#L157-L179
SPIFFWORKFLOW_BACKEND_GIT_PUBLISH_CLONE_URL: ((git-process-models))
SPIFFWORKFLOW_BACKEND_GIT_USERNAME: "asteel-gsa"
SPIFFWORKFLOW_BACKEND_GIT_USER_EMAIL: "[email protected]"
SPIFFWORKFLOW_BACKEND_GIT_COMMIT_ON_SAVE: "true"
SPIFFWORKFLOW_BACKEND_GIT_SOURCE_BRANCH: "main"
SPIFFWORKFLOW_BACKEND_BPMN_SPEC_ABSOLUTE_DIR: "/app/process_models"
SPIFFWORKFLOW_BACKEND_GIT_SSH_PRIVATE_KEY: "/tmp/ssh_private_key.XXXXXX"

# https://github.com/sartography/spiff-arena/blob/main/spiffworkflow-backend/bin/clone_process_models
# https://github.com/sartography/spiff-arena/blob/main/spiffworkflow-backend/bin/find_sample_process_models


SPIFFWORKFLOW_BACKEND_DATABASE_TYPE: "sqlite"
# SPIFFWORKFLOW_BACKEND_DATABASE_TYPE: "postgres"
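
The jq and sed pipeline in the command block above can also be expressed in Python, which makes the scheme rewrite explicit. A minimal sketch, assuming VCAP_SERVICES has the shape the jq filter implies (an `aws-rds` list whose entries carry `credentials.uri`); `database_uri_from_vcap` is a hypothetical helper and the sample credentials are made up.

```python
import json


def database_uri_from_vcap(vcap_services: str, label: str = "aws-rds") -> str:
    """Extract the first bound database URI and normalize its scheme.

    SQLAlchemy expects "postgresql://", while the binding may supply
    "postgres://", so only the scheme prefix is rewritten.
    """
    services = json.loads(vcap_services)
    # Unlike the jq filter, which emits every binding, this takes the first.
    uri = services[label][0]["credentials"]["uri"]
    if uri.startswith("postgres://"):
        uri = "postgresql://" + uri[len("postgres://"):]
    return uri


if __name__ == "__main__":
    sample = json.dumps(
        {"aws-rds": [{"credentials": {"uri": "postgres://user:pw@db.example:5432/spiff"}}]}
    )
    print(database_uri_from_vcap(sample))
```

Rewriting only the scheme avoids a subtle hazard of the unanchored sed expression, which replaces the first occurrence of "postgres" wherever it appears in the URI.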

vars.yml / vars.yml-template

git-process-models: https://github.com/gsa-tts/gsa-process-models.git

Live Link: Here

I'll keep the stash until we figure out the postgres issue, then commit it up, but I'm adding the code here for reference on what needed to be done, @mogul.

mogul and others added 29 commits December 3, 2024 08:01

We're not tackling how to parse postgres service details out of VCAP_SERVICES just yet, sticking with sqlite for now.

* Rename stuff to be clearer that the "slug" isn't random
* Include the slug in the app name to facilitate parallel deployments in the same space.
* Default to 1 image, so you can fit two deployments in a 1G sandbox space

This is asking for the URL to the frontend that the backend should use, not vice-versa!

Enough to function properly, but still fits in a sandbox account

Use the bound Postgres service instead of SQLite

- Process model examples are shown
- Process model edits are committed to repo
- Process model publishing works

We want to make it obvious that this variable is only for the backend app

We needed to
* add a network policy to enable the backend to hit the internal route
* add the Cloud Foundry-provided CA cert bundle to the Docker container at startup time
* ensure Python picks up added CA certs

We don't want any app to fail to start up the first time someone pushes this manifest. So we start with the connector, then the backend, then the frontend.

Upstream improvement: loop when there are errors like this
@mogul mogul force-pushed the deploy-to-cloud-gov branch from 4c0bbe2 to 1a93e3f Compare December 3, 2024 16:01