Deploy OTP to AWS load balancer and manage OTP servers in separate collection (and misc other fixes) #225
Conversation
Adds a separate controller/collection to manage OTP servers and allow for deploying to new EC2 servers and adding them to a load balancer, rather than building an OTP graph over the wire.
This change allows servers to be assigned to a specific project or left open for any project in the application.
Merge IBI dev branch into catalogueglobal dev (conflicts: pom.xml)
Codecov Report
```
@@             Coverage Diff             @@
##                dev     #225     +/-  ##
============================================
- Coverage     19.83%   18.72%   -1.11%
  Complexity      361      361
============================================
  Files           149      150       +1
  Lines          7462     7963     +501
  Branches        996     1076      +80
============================================
+ Hits           1480     1491      +11
- Misses         5850     6339     +489
- Partials        132      133       +1
```
Continue to review full report at Codecov.
Resolved review threads (outdated) on:
- src/main/java/com/conveyal/datatools/manager/controllers/api/ServerController.java (3 threads)
- src/main/java/com/conveyal/datatools/manager/controllers/api/DeploymentController.java
- src/main/java/com/conveyal/datatools/manager/jobs/DeployJob.java (3 threads)
```java
if (!deployment.r5) {
    lines.add(String.format("aws s3 --region us-east-1 cp %s/%s %s ", routerDir, OTP_GRAPH_FILENAME, getS3GraphURI()));
    String s3BuildLogPath = joinToS3FolderURI(getBuildLogFilename());
    lines.add(String.format("aws s3 --region us-east-1 cp $BUILDLOGFILE %s ", s3BuildLogPath));
```
It may make more sense to upload the logs first in case the graph wasn't built. If the graph wasn't built, the script might error out with a file-not-found error.
Will that actually happen? I suppose I can make the change regardless, but I don't think the execution of any line is dependent on the previous line, is it?
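The concern above can be sketched in Python (the real script is bash generated from Java; the function and file names here are illustrative, not from the PR): upload the build log before the graph, and skip artifacts that were never written, so a failed graph build still leaves a log in S3.

```python
import os

def upload_artifacts(router_dir, upload, graph_filename="Graph.obj",
                     build_log_filename="otp-build.log"):
    """Upload the build log before the graph so the log survives even when
    the graph build failed and the graph file was never written. `upload`
    stands in for an `aws s3 cp`-style call; filenames are illustrative."""
    uploaded = []
    for name in (build_log_filename, graph_filename):
        path = os.path.join(router_dir, name)
        if os.path.exists(path):  # skip missing artifacts instead of erroring
            upload(path)
            uploaded.append(name)
    return uploaded
```

With this ordering, a build that crashes before producing a graph still uploads its log, and no upload command fails on a missing file.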
src/main/java/com/conveyal/datatools/manager/jobs/MonitorServerStatusJob.java (resolved thread)
See other comments here.
```java
/**
 * Construct the user data script (as string) that should be provided to the AMI and executed upon EC2 instance
 * startup.
 */
private String constructUserData(boolean graphAlreadyBuilt) {
```
I'm increasingly hesitant about the design of this method. I think it is going to limit us moving forward to writing bash in Java strings. Something I would really like to see is more detailed reporting and logging of the progress of building and loading a graph, with automatic retries when something fails that can be retried. We could attempt to do this with bash, but it would be ugly. If we're trying to ship this quickly, then fine, I can live with it, but I think we should be making a separate startup script specifically for this where we can do more detailed things.

Furthermore, right now this is basically using Amazon S3 as a database of sorts for storing various information, when we already have mongo and postgres instances. Maybe it would make sense for the startup script to directly edit the mongo db, or to make queries to datatools-server. Either way, we should be thinking about this feature long-term and about the easiest way to create maintainable code.
Thanks for the comments. Perhaps there is a better way to handle what gets run in user data, but at some level it is going to require running some bash (unless we opt to include some kind of startup script on the AMI, or write a script to a file that gets fetched by the instance). Because of this, I think our best option for now is to keep everything contained in a single place -- this bash script.

Regarding using S3 as a database, can you describe what you mean more fully? We're just storing static files that were generated as a result of the deploy process, and we already store GTFS zip files in S3. I'm not sure what the concern is there. The alternative you're proposing, as I understand it (the OTP instance connecting to the mongo instance and handing off files/messages), sounds far more complex.
That's what I'm saying. I'm recommending creating a script in some kind of programming language -- be it Java, Python, or JavaScript. This startup script would then communicate directly with either mongo db or datatools-server, perhaps via HTTP requests. As it is right now, all of this bash is in one place, but we have to construct it using strings in Java. Therefore, we can't view it nicely formatted in an IDE, we can't lint it or do static-type analysis, it's much harder to import 3rd-party libraries, and it is impossible to write isolated tests.
Regarding s3 I'm fine with most of what's going on. Most of the data makes sense to store in a raw format (logs, GTFS files, OSM pbf). The thing in particular that I'm concerned about is constantly checking for bundle download and graph build completion and who knows what else we may add in the future by checking for the existence of an s3 file. I'm not recommending that OTP connect to the mongo instance, but I am recommending that a replacement of the bash script do this and handle the checks of bundle download, graph build, and graph load. These intermediate files that are waited upon seem like the more complex option to me when the bash replacement script could just be writing to the mongo database directly or making an HTTP request to datatools-server, or maybe even using some other messaging framework.
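The suggestion above can be sketched in Python. Note that the endpoint URL and payload shape below are entirely hypothetical -- datatools-server does not expose this API; this only illustrates reporting deploy progress over HTTP instead of via S3 marker files.

```python
import json
from urllib import request

def build_status_payload(job_id, stage, ok=True, detail=None):
    """Pure helper so the payload can be unit tested in isolation.
    All field names are hypothetical."""
    return {"jobId": job_id, "stage": stage, "ok": ok, "detail": detail}

def report_status(base_url, job_id, stage, ok=True, detail=None):
    """POST a deploy-stage update to a hypothetical datatools-server endpoint."""
    body = json.dumps(build_status_payload(job_id, stage, ok, detail)).encode()
    req = request.Request(
        f"{base_url}/api/manager/deployjob/{job_id}/status",  # hypothetical route
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status
```

A startup script built this way pushes each transition (bundle downloaded, graph built, graph loaded) as a single request, rather than having the server poll S3 for marker files.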
Furthermore, using s3 as a database like this will cost more money, since s3 has per-request pricing: the more we scale, the more this costs us. The price seems low at $0.005 per 1k requests, but let's say new graphs are built on a nightly basis and graph building takes 1 hour for a largeish region with elevation data. Polling s3 every 5 seconds for an hour is about 720 requests per day, or roughly 263,000 a year. If we have 100 regions doing this, that's about 26 million requests a year, or roughly $130 a year for just these s3 requests, compared to $0 for using a database or sending an HTTP request to the datatools-server instance.
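Working the arithmetic through (one poll every 5 seconds during a 1-hour nightly build, 100 regions, at the quoted $0.005 per 1k requests):

```python
# Rough cost of polling S3 for build-status marker files.
S3_PRICE_PER_1K_REQUESTS = 0.005  # USD, approximate request pricing

polls_per_build = 60 * 60 // 5                    # every 5 s for 1 h -> 720
polls_per_region_year = polls_per_build * 365     # nightly builds -> 262,800
polls_all_regions = polls_per_region_year * 100   # 100 regions -> 26,280,000
annual_cost = polls_all_regions / 1000 * S3_PRICE_PER_1K_REQUESTS  # ~ $131
```

Even at that scale the absolute cost is small; the stronger argument against S3-as-database is maintainability, not price.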
```java
@@ -121,6 +130,16 @@ public String getDeploymentId () {
        return deployment.id;
    }

    /** Increment the completed servers count (for use during ELB deployment) and update the job status. */
    public void incrementCompletedServers() {
```
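The reason this counter needs care can be sketched in Python (class and method names are illustrative, not from the PR): several per-instance monitor jobs may report completion concurrently, so updates must be guarded against lost increments.

```python
import threading

class ServerDeployTracker:
    """Track how many EC2 instances have finished deploying behind the ELB.
    Concurrent monitor jobs may report completion at the same time, so the
    counter is guarded by a lock."""

    def __init__(self, total):
        self.total = total
        self._completed = 0
        self._lock = threading.Lock()

    def increment_completed(self):
        with self._lock:
            self._completed += 1
            return self._completed

    @property
    def percent_complete(self):
        with self._lock:
            return 100 * self._completed // self.total
```

In the Java code the equivalent concern is typically handled with a synchronized method or an AtomicInteger.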
```java
@@ -88,52 +92,70 @@ public void jobLogic() {
    }
    // First, check that OTP has started up.
    status.update("Prepping for graph build...", 20);
    String bundleUrl = String.join("/", ipUrl, BUNDLE_DOWNLOAD_COMPLETE_FILE);
    long bundleDownloadStartTime = System.currentTimeMillis();
```
I like this addition as well.
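The monitor job polls a URL until a marker file appears, tracking elapsed time. A generic sketch of that poll-with-timeout pattern in Python (interval and timeout values are illustrative; the injectable clock and sleep exist only to make the sketch testable):

```python
import time

def wait_for(check, timeout_s=300, interval_s=5,
             clock=time.monotonic, sleep=time.sleep):
    """Poll `check` until it returns True or `timeout_s` elapses.
    Returns elapsed seconds on success; raises TimeoutError otherwise."""
    start = clock()
    while clock() - start < timeout_s:
        if check():
            return clock() - start
        sleep(interval_s)
    raise TimeoutError(f"condition not met within {timeout_s}s")
```

Recording the start time, as the diff above does with `bundleDownloadStartTime`, is what makes it possible to report how long each stage took and to give up cleanly instead of spinning forever.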
I'm giving my approval, but only on the understanding that this PR is a first proof-of-concept feature that will probably need a more thorough update in the future. The 2 items I still have issues with are:
- Writing the startup script as bash in strings in Java code. Ideally, it would be good to replace this with a startup script written in a scripting language like Python or even JavaScript, for the reasons mentioned here.
- Having a more discrete way of viewing and managing deployments, as noted here. Depending on how clients and users manage deployments, this may or may not be a good idea. We'll see.
🎉 This PR is included in version 3.6.0 🎉 The release is available on GitHub release. Your semantic-release bot 📦🚀
Checklist
- Appropriate branch selected (all PRs must first be merged to dev before they can be merged to master)

Description
Note: this includes code originally contained in catalogueglobal#124
We have come to realize that the way we deploy new graphs to OTP (making a "build over the wire" request to existing EC2 servers) isn't reliable at all. This PR, in conjunction with a UI PR, adds to Data Tools the option to set up a "server" that targets a load balancer rather than an existing EC2 server: it starts new EC2 instances, builds or loads the graph there, and adds the instances to the load balancer.
Another important change here that is related to deployments is to move the OtpServer objects stored in the Project POJO to their own MongoDB collection. Now, these objects are tied to projects by projectId and fetched via a JsonProperty-annotated getter method. This additionally allows for one of these server targets to be tied to a project or (if projectId is null) available across all projects in the application.
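The projectId-or-null lookup described above can be sketched as follows (field names are assumed from the description, and MongoDB documents are modeled as plain dicts for illustration):

```python
def servers_for_project(servers, project_id):
    """Return OtpServer records usable by a project: those tied to it via
    projectId, plus those whose projectId is None and are therefore
    available to all projects in the application."""
    return [s for s in servers if s.get("projectId") in (None, project_id)]
```

In the real implementation this filtering happens server-side via a JsonProperty-annotated getter backed by a MongoDB query on the new OtpServer collection.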
Finally, this PR does some other housekeeping:
- fixes catalogueglobal#120
- removes unused/legacy snapshot classes
- fixes catalogueglobal#110
- fixes catalogueglobal/datatools-ui#234
Notes on configuration

Instructions for setting up EC2 auto deploy: the deployment module in server.yml should look like: