The All of Us Workbench API is largely built on Spring Boot. It also uses Swagger for autogenerating REST APIs.
The primary application database is a Google Cloud SQL instance running MySQL. We modify its schema using Liquibase migrations, which live in db/changelog/.
When creating a new database migration, add an XML file to the changelog directory that describes the migration to perform. This new file should be named db.changelog-###-description-of-migration.xml, where ### is the number of the newest migration + 1. Then, add the file to the end of the db.changelog-master.xml file.
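The naming rule above can be sketched as a small helper. This is only an illustration: the class MigrationNames and its method are hypothetical (they are not part of the repository), and zero-padding the number to three digits is an assumption read off the ### placeholder.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Workbench codebase.
final class MigrationNames {
  // Matches numbered changelogs only; db.changelog-master.xml does not match.
  private static final Pattern NUMBERED = Pattern.compile("db\\.changelog-(\\d+)-.*\\.xml");

  // Returns the file name for the next migration, given the existing file names.
  static String next(List<String> existingFiles, String description) {
    int newest = 0;
    for (String file : existingFiles) {
      Matcher m = NUMBERED.matcher(file);
      if (m.matches()) {
        newest = Math.max(newest, Integer.parseInt(m.group(1)));
      }
    }
    // Assumption: numbers are zero-padded to three digits.
    return String.format("db.changelog-%03d-%s.xml", newest + 1, description);
  }
}
```

The new file still has to be added by hand to the end of db.changelog-master.xml.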
We generally name database objects in lower_snake_case.
Liquibase migrations are applied (indirectly) via a bash script called db/run-migrations.sh. Do not run this script directly; instead, call one of the following project.rb commands:
- dev-up (starts a local dev server)
- run-local-all-migrations (runs migrations on the workbench and cdr schemas)
- run-local-rw-migrations (runs migrations only on the workbench schema)
Data Access Objects (DAOs) are interfaces that describe what operations the application can perform on the database. Generally, we give DAOs the CamelCased name of the database table. All of the Workbench DAOs extend Spring's CrudRepository, which means they can do simple operations like find-by-primary-key, save, delete-by-primary-key, etc.
CrudRepository also allows queries to be automatically derived from the name of a method defined on the interface. As such, you will see methods like findByUserIdAndWorkspaceId with no apparent implementation. The implementation is generated by Spring.
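To make the idea of a method with no apparent implementation more concrete, here is a plain-Java simulation of query derivation. It is not how Spring does it internally: it uses a java.lang.reflect.Proxy over an in-memory list, and DbWorkspaceUser, WorkspaceUserDao, and DerivedQueryDemo are hypothetical names invented for this sketch. It also splits naively on "And", which real derivation does not.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type; not a real Workbench model.
record DbWorkspaceUser(long userId, long workspaceId, String role) {}

// Hypothetical DAO: the method name alone describes the query.
interface WorkspaceUserDao {
  List<DbWorkspaceUser> findByUserIdAndWorkspaceId(long userId, long workspaceId);
}

final class DerivedQueryDemo {
  // Builds a DAO implementation at runtime by interpreting "findByXAndY" method names.
  static WorkspaceUserDao daoOver(List<DbWorkspaceUser> rows) {
    InvocationHandler handler = (proxy, method, args) -> {
      String name = method.getName();
      if (!name.startsWith("findBy")) {
        throw new UnsupportedOperationException(name);
      }
      // "UserIdAndWorkspaceId" -> ["UserId", "WorkspaceId"] (naive split, for illustration only)
      String[] properties = name.substring("findBy".length()).split("And");
      List<DbWorkspaceUser> matches = new ArrayList<>();
      for (DbWorkspaceUser row : rows) {
        boolean allEqual = true;
        for (int i = 0; i < properties.length; i++) {
          // "UserId" -> record accessor userId()
          String accessor =
              Character.toLowerCase(properties[i].charAt(0)) + properties[i].substring(1);
          Object value = DbWorkspaceUser.class.getMethod(accessor).invoke(row);
          allEqual &= value.equals(args[i]);
        }
        if (allEqual) {
          matches.add(row);
        }
      }
      return matches;
    };
    return (WorkspaceUserDao) Proxy.newProxyInstance(
        WorkspaceUserDao.class.getClassLoader(),
        new Class<?>[] {WorkspaceUserDao.class},
        handler);
  }
}
```

In the real application, the interface would extend CrudRepository and Spring would supply the implementation against the database; the sketch only shows why declaring the method is enough.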
To define a query that uses a foreign key, there are a couple of options. First, you could use the @JoinColumn / @[One|Many]ToOne annotations on the corresponding DB model, as described below. Second, you could use an @Query annotation to describe the query directly as a string (JPQL by default, or SQL with nativeQuery = true).
DB Models represent a single row of a single table in the database, possibly with extender tables attached. We denote DB models by prefixing Db to the name of the database table. DB models heavily use jakarta.persistence annotations to describe where in the database table each part of the model lives.
In order to automatically gather data from extender tables, we use the @JoinColumn, @OneToOne, @OneToMany, and @ManyToOne annotations, together with the mappedBy attribute of the relationship annotations. Documentation and blogs on the usage of these annotations can be found in these links:
- https://www.baeldung.com/jpa-join-column
- https://www.baeldung.com/hibernate-one-to-many
- https://www.baeldung.com/jpa-joincolumn-vs-mappedby
- https://docs.jboss.org/hibernate/jpa/2.1/api/javax/persistence/JoinColumn.html (contains See Also links to the rest of these annotations)
Examples of the use of these annotations can be found in the following places:
Much of the structure of DB models can be autogenerated in IntelliJ by defining the class members and then doing Generate -> Getter and Setter and Generate -> equals() and hashCode(). This will not automatically add the required annotations; those must be added manually.
Services are interfaces that declare methods of business logic. ServiceImpls are classes implementing their correspondingly named Service and the business logic methods declared therein.
Practically, this allows one ServiceImpl to implement multiple interfaces. This is used in several places, such as UserServiceImpl.
Services / ServiceImpls are generally named after a feature or functional area, like 'UserService', 'CohortService', or 'MonitoringService'.
ServiceImpls can call into other services, but we try to keep the number of dependencies down. We try to avoid having ServiceImpls call into a large number of DAOs, especially DAOs not directly related to the functional area for which the ServiceImpl is responsible. Ideally, each DAO would roll up to exactly one Service.
As a positive example, the UserServiceImpl calls into the UserDao, the UserTermsOfServiceDao, and the VerifiedInstitutionalAffiliationDao, all three of which are directly related to the UserService. As an example of something to avoid, the WorkspaceServiceImpl (and a number of Controllers!) talks to the UserDao without going through the UserService.
ServiceImpls should handle only application logic. They should not need explicit knowledge of the database, the REST API, or the UI.
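The layering described in this section can be sketched in plain Java. Everything here is simplified and hypothetical: the members of DbUser, the isActiveUser and canCreateWorkspace methods, and the plain UserDao interface are stand-ins chosen for illustration, and the real classes look different (the real UserDao is a Spring repository, for instance).

```java
import java.util.Optional;

// Hypothetical, simplified DB model.
record DbUser(long userId, String username, boolean disabled) {}

// Stand-in for the DAO; in the real codebase this would be a Spring repository.
interface UserDao {
  Optional<DbUser> findById(long userId);
}

// Service: declares business logic, with no mention of the database or the REST API.
interface UserService {
  boolean isActiveUser(long userId);
}

// ServiceImpl: the only class in this sketch that talks to UserDao.
final class UserServiceImpl implements UserService {
  private final UserDao userDao;

  UserServiceImpl(UserDao userDao) {
    this.userDao = userDao;
  }

  @Override
  public boolean isActiveUser(long userId) {
    // Unknown users are treated as inactive (an assumption made for this sketch).
    return userDao.findById(userId).map(user -> !user.disabled()).orElse(false);
  }
}

// A different functional area depends on UserService rather than reaching into UserDao.
final class WorkspaceServiceImpl {
  private final UserService userService;

  WorkspaceServiceImpl(UserService userService) {
    this.userService = userService;
  }

  boolean canCreateWorkspace(long userId) {
    return userService.isActiveUser(userId);
  }
}
```

Keeping UserDao behind UserService means a change to how users are stored only has to be absorbed in one ServiceImpl.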
Controllers take Request objects from the API, call into the Service layer, and return Response objects back to the API. Controllers implement autogenerated APIControllers. All Controllers must be located in the org.pmiops.workbench.api package.
Controller methods should generally be small and should not contain any actual logic. The most common conditional in the Controller layer should be checking a feature flag. Most of the work done by Controllers should be unpacking data from a Request object or packaging data into a Response object.
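A minimal sketch of a Controller method in this style follows. All names are hypothetical (CreateCohortRequest, CohortResponse, the FeatureFlags interface, and the CohortService signature), and the sketch leaves out the autogenerated APIController interface and the Spring wiring that a real Controller would have.

```java
// Hypothetical request and response models; in the real project these are generated from the YAML.
record CreateCohortRequest(String name, String criteria) {}

record CohortResponse(long cohortId, String name) {}

// Hypothetical service signature: all business logic lives behind it.
interface CohortService {
  long createCohort(String name, String criteria);
}

// Hypothetical feature flag source.
interface FeatureFlags {
  boolean enableCohortCreation();
}

final class CohortController {
  private final CohortService cohortService;
  private final FeatureFlags flags;

  CohortController(CohortService cohortService, FeatureFlags flags) {
    this.cohortService = cohortService;
    this.flags = flags;
  }

  CohortResponse createCohort(CreateCohortRequest request) {
    // The only conditional in the Controller: a feature flag check.
    if (!flags.enableCohortCreation()) {
      throw new UnsupportedOperationException("Cohort creation is disabled");
    }
    // Unpack the request, delegate to the service, package the response.
    long cohortId = cohortService.createCohort(request.name(), request.criteria());
    return new CohortResponse(cohortId, request.name());
  }
}
```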
There are four types of models in the system - API models, DB models, and API Request and Response models. Mappers are utility interfaces that own methods that convert between these types of models. We use MapStruct annotations to describe these mappings and autogenerate the implementation of the Mappers.
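As an illustration of what a Mapper does, the sketch below writes out by hand roughly what a generated implementation looks like. The DbCohort and Cohort models and the dbModelToClient method are hypothetical; with MapStruct, only the interface would be written (annotated with @Mapper, plus something like @Mapping(source = "cohortId", target = "id") on the method), and the implementation class would be generated.

```java
// Hypothetical DB model.
record DbCohort(long cohortId, String name, long creatorId) {}

// Hypothetical API model; note that the id field is named differently.
record Cohort(long id, String name) {}

// With MapStruct, this interface plus annotations is all that would be written by hand.
interface CohortMapper {
  Cohort dbModelToClient(DbCohort dbCohort);
}

// Hand-written stand-in for the generated implementation.
final class CohortMapperImpl implements CohortMapper {
  @Override
  public Cohort dbModelToClient(DbCohort dbCohort) {
    if (dbCohort == null) {
      return null;
    }
    // cohortId -> id, name -> name; creatorId is not exposed on the API model.
    return new Cohort(dbCohort.cohortId(), dbCohort.name());
  }
}
```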
The actual REST API is defined in a series of YAML files according to the Swagger OpenAPI spec. swagger-codegen takes these YAML files, merges them, and generates all the classes defined by the YAMLs. The command to do so is ./project.rb compile-generated-java.
One non-Swagger-standard thing about these YAML files is that the tag param is used to indicate which Controller / APIController an endpoint should be associated with.
Each REST endpoint is defined under PATHS in the YAML file.
API models are defined under DEFINITIONS in the YAML file. Swagger autogenerates classes from these. Both API models and DB models are currently used in the Controller and Service layers. While it is agreed that only one type of model should be used throughout the Service layer, there is no consensus on which type of model that should be, nor on how high a priority conforming to such a standard should be.
Swagger interprets the data sent to a given endpoint according to the Request model definition associated with that endpoint and throws an error if the data is improperly formatted. Likewise, Swagger validates that the data being returned from a given endpoint conforms to the Response model definition associated with that endpoint and throws an error if it does not.