Merge from BI Server #2
base: brapi-server-v2
some initial performance experiments including indexing and batching database operations
added IF NOT EXISTS to CREATE INDEX statements
…nd how performance can be improved for fetching OUs, Observations, and ObsVars
… searches Also did some code cleanup, and added a logback.xml config file
… problem. It was not needed. The ScaleVlalidValueCategories were present (even though it is set to fetch LAZY) because of the SQL like statements called on searchQuery
…ocker Create local containerized keycloak docker example, update README
There's a lot of stuff here. I'm hoping that Pete, the BI devs who worked on this (or whoever is interested), and I can meet in the new year to go through all these points together, so we can start bucketing what we want to keep, change, etc.
I also made some comments on the code. I'm not really expecting these to change at this point; I'm just pointing them out and want to see whether we can refactor or change them later.
if(includeObservations) {
    log.debug("Fetching observations for OUs");
    for(ObservationUnitEntity entity : page) {
        log.trace("Fetching observations for OU: " + entity.getId());
        entity.getObservations();
Are we calling an accessor just to get around the lazy loading? If so, that isn't really what lazy loading is for. Lazy loading is meant to fetch data only when the app actually accesses it for business purposes. This method, though, appears to return a data set directly to the user and has very little business logic. Instead of working around the lazy loading, I feel we should create a service/repo method that grabs the data the user needs in one go. As written, it seems to issue more DB round trips than are necessary to carry out this transaction.
If we do in fact expect to load this on a given record fetch (which is what this method seems to do), I'd argue the `Observation` entity attached to `ObservationUnit` should be eagerly loaded rather than lazily loaded, depending on whether there are other business use cases.
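To make the round-trip concern concrete, here is a minimal plain-Java sketch. It assumes a hypothetical in-memory `FakeObservationStore` that counts queries in place of the real database. Touching `getObservations()` on each OU behaves roughly like `fetchPerUnit` (one query per OU, the classic N+1 pattern), while a dedicated repository method that loads the whole page with a single `IN (...)` query behaves like `fetchBatched`. All names here are illustrative, not the server's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class LazyVsBatchedFetch {
    // Hypothetical stand-in for an observation row; only the FK to its OU matters here.
    record Observation(String id, String observationUnitId) {}

    // Hypothetical in-memory "database" that counts how many queries are issued.
    static final class FakeObservationStore {
        private final List<Observation> rows;
        int queryCount = 0;

        FakeObservationStore(List<Observation> rows) {
            this.rows = rows;
        }

        // Roughly what initializing one lazy collection costs: one SELECT per OU.
        List<Observation> findByObservationUnitId(String ouId) {
            queryCount++;
            return rows.stream()
                    .filter(o -> o.observationUnitId().equals(ouId))
                    .collect(Collectors.toList());
        }

        // One SELECT ... WHERE observation_unit_id IN (:ids) for the whole page.
        List<Observation> findByObservationUnitIdIn(List<String> ouIds) {
            queryCount++;
            return rows.stream()
                    .filter(o -> ouIds.contains(o.observationUnitId()))
                    .collect(Collectors.toList());
        }
    }

    // Mirrors calling entity.getObservations() for each OU in the page: N queries.
    static Map<String, List<Observation>> fetchPerUnit(FakeObservationStore store, List<String> ouIds) {
        Map<String, List<Observation>> result = new HashMap<>();
        for (String ouId : ouIds) {
            result.put(ouId, store.findByObservationUnitId(ouId));
        }
        return result;
    }

    // One query for the page, then group the rows by OU in memory.
    static Map<String, List<Observation>> fetchBatched(FakeObservationStore store, List<String> ouIds) {
        Map<String, List<Observation>> result = new HashMap<>();
        for (String ouId : ouIds) {
            result.put(ouId, new ArrayList<>()); // OUs with no observations still get an empty list
        }
        for (Observation o : store.findByObservationUnitIdIn(ouIds)) {
            result.get(o.observationUnitId()).add(o);
        }
        return result;
    }
}
```

Both paths return the same data; only the number of round trips differs. On the JPA side, a `JOIN FETCH` or a repository query that selects observations by a list of OU ids gives the batched behavior, and Hibernate's `@BatchSize` on the collection is another way to get similar batching for lazy initialization.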
if(!page.isEmpty()) {
    observationUnitRepository.fetchXrefs(page, ObservationUnitEntity.class);
    fetchTreatments(page);
    fetchObsUnitLevelRelationships(page);
}
Why is this not in its own separate code path? Does it really need to depend on an already expensive query to decide whether it runs, or could we create a diverging code path or endpoint that executes this query independently? I need to know more about the use case here.
A follow-up question is why this extra fetching code needs to exist at all. If we configured the attributes in these entities as eager, we wouldn't need it. It looks like a workaround for defining them lazily, and it issues more queries than necessary to grab the data required.
If the volume of data for these entities per observation unit is too large to load them eagerly, we can define a native query that grabs exactly the data we want in one go.
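One shape the "native query in one go" could take, sketched in plain Java: a single joined query returns flat rows, and the service folds them into nested DTOs in memory, so there is one round trip regardless of page size. The SQL in the comment uses hypothetical table and column names, and `Row`, `UnitDto`, and `Treatment` are illustrative types, not the server's classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FlatRowAssembly {
    // Hypothetical shape of one row from a single joined native query, e.g.
    //   SELECT ou.id, ou.name, t.factor, t.modality
    //   FROM observation_unit ou
    //   LEFT JOIN treatment t ON t.observation_unit_id = ou.id
    //   WHERE ou.id IN (:ids)
    record Row(String unitId, String unitName, String treatmentFactor, String treatmentModality) {}

    record Treatment(String factor, String modality) {}

    record UnitDto(String id, String name, List<Treatment> treatments) {}

    // Folds the flat rows into one DTO per unit, preserving the query's row order.
    static List<UnitDto> assemble(List<Row> rows) {
        Map<String, UnitDto> byId = new LinkedHashMap<>();
        for (Row r : rows) {
            UnitDto dto = byId.computeIfAbsent(
                    r.unitId(), id -> new UnitDto(id, r.unitName(), new ArrayList<>()));
            // The LEFT JOIN yields null treatment columns for units without treatments.
            if (r.treatmentFactor() != null) {
                dto.treatments().add(new Treatment(r.treatmentFactor(), r.treatmentModality()));
            }
        }
        return new ArrayList<>(byId.values());
    }
}
```

The trade-off is that the query has to be maintained by hand, but it replaces the separate xref/treatment/level fetches with one statement whose cost is easy to see.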
searchQuery.leftJoinFetch("germplasm", "germplasm")
        .leftJoinFetch("*germplasm.pedigree", "pedigree")
        .leftJoinFetch("cross", "cross")
        .leftJoinFetch("position", "position")
        .leftJoinFetch("*position.geoCoordinates", "geoCoordinates")
        .leftJoinFetch("seedLot", "seedLot")
        .leftJoinFetch("study", "study")
        .leftJoinFetch("*study.experimentalDesign", "experimentalDesign")
        .leftJoinFetch("*study.growthFacility", "growthFacility")
        .leftJoinFetch("*study.lastUpdate", "lastUpdate")
        .leftJoinFetch("*study.location", "studyLocation")
        .leftJoinFetch("*study.trial", "studyTrial")
        .leftJoinFetch("*studyTrial.program", "studyTrialProgram")
        .leftJoinFetch("trial", "trial")
        .leftJoinFetch("*trial.program", "trialProgram")
        .leftJoinFetch("program", "program");
This bypasses lazy loading and makes it eager. Why, and do we really need all of these to be left joins?
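On the left-join question: for an optional to-one association such as `germplasm`, a LEFT JOIN keeps OUs that have no germplasm, while an INNER JOIN silently drops them, which is probably why these are written as left joins. A minimal in-memory model of the two semantics (the record names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class LeftVsInnerJoin {
    // Hypothetical OU with an optional to-one germplasm reference (null = no germplasm).
    record Unit(String id, String germplasmId) {}

    record Germplasm(String id, String name) {}

    record Joined(String unitId, String germplasmName) {}

    private static Germplasm lookup(Unit u, Map<String, Germplasm> germplasmById) {
        // Guard against null keys: immutable maps such as Map.of reject get(null).
        return u.germplasmId() == null ? null : germplasmById.get(u.germplasmId());
    }

    // INNER JOIN semantics: units without a matching germplasm disappear from the result.
    static List<Joined> innerJoin(List<Unit> units, Map<String, Germplasm> germplasmById) {
        List<Joined> out = new ArrayList<>();
        for (Unit u : units) {
            Germplasm g = lookup(u, germplasmById);
            if (g != null) {
                out.add(new Joined(u.id(), g.name()));
            }
        }
        return out;
    }

    // LEFT JOIN semantics: every unit is kept; missing germplasm columns come back null.
    static List<Joined> leftJoin(List<Unit> units, Map<String, Germplasm> germplasmById) {
        List<Joined> out = new ArrayList<>();
        for (Unit u : units) {
            Germplasm g = lookup(u, germplasmById);
            out.add(new Joined(u.id(), g == null ? null : g.name()));
        }
        return out;
    }
}
```

So switching to inner joins would only be safe where the association is mandatory. Whether each fetch is needed at all is a separate question of which fields the response actually uses.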
if(!page.isEmpty()) {
    observationRepository.fetchXrefs(page, ObservationEntity.class);
}
Why is this not in its own separate code path? Does it really need to depend on an already expensive query to decide whether it runs, or could we create a diverging code path or endpoint that executes this query independently? I need to know more about the use case here.
searchQuery.leftJoinFetch("observationVariable", "observationVariable")
        .leftJoinFetch("*observationVariable.crop", "varCrop")
        .leftJoinFetch("*observationVariable.method", "varMethod")
        .leftJoinFetch("*observationVariable.ontology", "varOntology")
        .leftJoinFetch("*observationVariable.scale", "varScale")
        .leftJoinFetch("*observationVariable.trait", "varTrait")
        .leftJoinFetch("season", "season")
        .leftJoinFetch("program", "program")
        .leftJoinFetch("trial", "trial")
        .leftJoinFetch("geoCoordinates", "geoCoordinates")
        .leftJoinFetch("observationUnit", "observationUnit")
        .leftJoinFetch("*observationUnit.position", "position")
        .leftJoinFetch("*position.geoCoordinates", "ouGeoCoordinates")
        .leftJoinFetch("*observationUnit.germplasm", "ouGermplasm")
        .leftJoinFetch("*ouGermplasm.pedigree", "pedigree")
        .leftJoinFetch("*observationUnit.study", "ouStudy")
        .leftJoinFetch("study", "study")
        .leftJoinFetch("*study.experimentalDesign", "experimentalDesign")
        .leftJoinFetch("*study.growthFacility", "growthFacility")
        .leftJoinFetch("*study.lastUpdate", "lastUpdate");
This bypasses lazy loading and makes it eager. Why, and do we really need all of these to be left joins?
CREATE OR REPLACE FUNCTION sync_list_related_tables_soft_deleted()
RETURNS TRIGGER AS $$
BEGIN
    -- Update list_external_references
    UPDATE public.list_external_references
    SET soft_deleted = NEW.soft_deleted
    WHERE list_entity_id = NEW.id;

    -- Update list_item
    UPDATE public.list_item
    SET soft_deleted = NEW.soft_deleted
    WHERE list_id = NEW.id;

    RETURN NEW;
END;
$$
LANGUAGE plpgsql;

-- Create a trigger on the list table
CREATE TRIGGER sync_soft_deleted_status
AFTER UPDATE OF soft_deleted ON public.list
FOR EACH ROW
WHEN (OLD.soft_deleted IS DISTINCT FROM NEW.soft_deleted)
EXECUTE FUNCTION sync_list_related_tables_soft_deleted();
I'm gonna need a walkthrough of the purpose of this, and why it's needed, before we decide whether it belongs in the prod server.
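As a starting point for that walkthrough, here is my reading of what the trigger does, written as a plain-Java model (the class and field names are hypothetical; this is not how the server stores data). When a list's `soft_deleted` flag actually changes, the new value is copied onto that list's external references and items. An update that leaves the flag unchanged does not touch the children, and the sync is one-directional: updating a child row directly does nothing to the parent list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SoftDeleteTriggerModel {
    // A row in public.list.
    static final class ListRow {
        final String id;
        boolean softDeleted; // NOT NULL DEFAULT FALSE in the migration

        ListRow(String id) {
            this.id = id;
        }
    }

    // A row in list_external_references or list_item, pointing at its parent list.
    static final class ChildRow {
        final String id;
        final String listId;
        boolean softDeleted;

        ChildRow(String id, String listId) {
            this.id = id;
            this.listId = listId;
        }
    }

    final Map<String, ListRow> lists = new HashMap<>();
    final List<ChildRow> externalReferences = new ArrayList<>();
    final List<ChildRow> items = new ArrayList<>();
    int triggerFirings = 0;

    // Models: UPDATE public.list SET soft_deleted = :value WHERE id = :listId
    void updateListSoftDeleted(String listId, boolean newValue) {
        ListRow list = lists.get(listId);
        if (list == null) {
            return; // the UPDATE matched no row, so the row-level trigger never fires
        }
        boolean oldValue = list.softDeleted;
        list.softDeleted = newValue;
        // WHEN (OLD.soft_deleted IS DISTINCT FROM NEW.soft_deleted): the column is NOT NULL,
        // so a plain != on booleans has the same meaning here.
        if (oldValue != newValue) {
            triggerFirings++;
            for (ChildRow r : externalReferences) {
                if (r.listId.equals(listId)) r.softDeleted = newValue; // WHERE list_entity_id = NEW.id
            }
            for (ChildRow r : items) {
                if (r.listId.equals(listId)) r.softDeleted = newValue; // WHERE list_id = NEW.id
            }
        }
    }
}
```

One consequence worth raising in the walkthrough: because the trigger copies `NEW.soft_deleted` unconditionally, restoring a list also sets `soft_deleted = false` on every one of its items, including any that had been soft-deleted individually beforehand.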
ALTER TABLE public.list
ADD COLUMN soft_deleted BOOLEAN NOT NULL DEFAULT FALSE;

ALTER TABLE public.list_external_references
ADD COLUMN soft_deleted BOOLEAN NOT NULL DEFAULT FALSE;

ALTER TABLE public.list_item
ADD COLUMN soft_deleted BOOLEAN NOT NULL DEFAULT FALSE;
`archived` is a much better name/term than `soft_deleted`, IMO.
-- First, drop the existing foreign key constraint
ALTER TABLE ONLY public.vendor_file_sample
DROP CONSTRAINT IF EXISTS fke3tnyn895kve2kgixku4j7htb;

ALTER TABLE ONLY public.callset
DROP CONSTRAINT IF EXISTS fkhreq22htrftm3dul7nfsg1agk;
We should definitely consider keeping these cascades in the main branch of the prod server.
public interface ListRepository extends BrAPIRepository<ListEntity, String>{
    public Page<ListEntity> findAllBySearchAndNotDeleted(SearchQueryBuilder<ListEntity> searchQuery, Pageable pageReq);

    @Query("SELECT l FROM ListEntity l WHERE l.id = :id AND l.softDeleted = false")
Definitely think we want to keep this functionality (and similar elsewhere) in the prod server base branch.
@Query("UPDATE TrialEntity t SET t.softDeleted = :softDeleted WHERE t.id = :trialId")
int updateSoftDeletedStatus(@Param("trialId") String trialId, @Param("softDeleted") boolean softDeleted);
I'm starting to get the feeling these soft-delete queries could be generified. I'm not sure why we need to define them in every entity that needs them. Maybe it's because not every entity has this column, but we could use some kind of repo inheritance for entities that support deletion: a repo that extends from `BrAPIRepository` and contains these soft-delete methods, generified for any entity that needs them.
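A minimal plain-Java sketch of that inheritance idea, assuming a hypothetical `SoftDeletable` interface implemented by every entity that carries the flag. Storage is an in-memory map so the sketch runs without Spring; the point is that the soft-delete methods live once in a generic parent and each repository simply inherits them. In Spring Data JPA itself, the shared parent would typically be annotated `@NoRepositoryBean` and could use the `#{#entityName}` SpEL placeholder in its `@Query` strings (for example `UPDATE #{#entityName} e SET e.softDeleted = :softDeleted WHERE e.id = :id`, plus `@Modifying` for the update), so one query definition serves every subtype.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical contract for entities that carry the soft_deleted (or archived) flag.
interface SoftDeletable {
    String getId();

    boolean isSoftDeleted();

    void setSoftDeleted(boolean softDeleted);
}

// The soft-delete methods written once; each concrete repository only supplies storage.
interface SoftDeletableRepository<T extends SoftDeletable> {
    Map<String, T> storage();

    // Counterpart of "... WHERE e.id = :id AND e.softDeleted = false".
    default Optional<T> findByIdAndNotDeleted(String id) {
        T entity = storage().get(id);
        return (entity == null || entity.isSoftDeleted()) ? Optional.empty() : Optional.of(entity);
    }

    // Counterpart of updateSoftDeletedStatus: returns the number of rows updated (0 or 1).
    default int updateSoftDeletedStatus(String id, boolean softDeleted) {
        T entity = storage().get(id);
        if (entity == null) {
            return 0;
        }
        entity.setSoftDeleted(softDeleted);
        return 1;
    }
}

// Hypothetical entity and repository; ListEntity, TrialEntity, etc. would follow the same pattern.
class DemoTrial implements SoftDeletable {
    private final String id;
    private boolean softDeleted;

    DemoTrial(String id) {
        this.id = id;
    }

    @Override public String getId() { return id; }

    @Override public boolean isSoftDeleted() { return softDeleted; }

    @Override public void setSoftDeleted(boolean softDeleted) { this.softDeleted = softDeleted; }
}

class DemoTrialRepository implements SoftDeletableRepository<DemoTrial> {
    private final Map<String, DemoTrial> rows = new HashMap<>();

    @Override public Map<String, DemoTrial> storage() { return rows; }
}
```

Entities without the column simply don't implement `SoftDeletable`, and their repositories keep extending the plain base, which also addresses the "not every entity has this column" concern.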
No description provided.