Merge from BI Server #2

Open
wants to merge 66 commits into
base: brapi-server-v2

Conversation

jloux-brapi

No description provided.

mlm483 and others added 30 commits December 17, 2024 16:27
some initial performance experiments including indexing and batching database operations
added IF NOT EXISTS to CREATE INDEX statements
…nd how performance can be improved for fetching OUs, Observations, and ObsVars
… searches

Also did some code cleanup, and added a logback.xml config file
… problem. It was not needed. The ScaleValidValueCategories were present (even though it is set to fetch LAZY) because of the SQL LIKE statements called on searchQuery
@jloux-brapi jloux-brapi mentioned this pull request Dec 17, 2024
…ocker

Create local containerized keycloak docker example, update README

@jloux-brapi jloux-brapi left a comment

There's a lot of stuff here. I'm hoping that Pete, the BI devs who worked on this (or whoever is interested), and I can have a meeting in the new year and go through all these points together so we can start bucketing the stuff we want to keep, change, etc.

I also made some comments pertaining to the code. I'm not really expecting these to change at this point; I'm just pointing them out and want to see if we can refactor or change them at some point.

Comment on lines +177 to +181
if(includeObservations) {
log.debug("Fetching observations for OUs");
for(ObservationUnitEntity entity : page) {
log.trace("Fetching observations for OU: " + entity.getId());
entity.getObservations();

Are we calling an accessor just to get around the lazy loading? If so, that is not really what lazy loading is intended for. Lazy loading exists so that data is fetched only when the app accesses it for business purposes. This method, though, appears to return a data set directly to a user and contains very little business logic. Instead of working around the lazy loading, I think we should create a service/repo method that grabs the data the user needs in one go. As written, this seems to issue more DB round trips than are necessary to carry out this transaction.

If we do in fact expect to load this on a given record fetch (which is what this method seems to be), I'd argue the Observation entity attached to ObservationUnit should be eagerly loaded, not lazily loaded, depending on whether there are other business use cases.
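
To make the round-trip concern concrete, here is a minimal in-memory sketch. It is not the project's code: FakeObservationStore and FetchComparison are hypothetical names, and the "queries" are just a counter standing in for database round trips. It only contrasts touching the accessor once per observation unit with one batched fetch for the whole page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the observation table; not the real repository.
class FakeObservationStore {
    private final Map<String, List<String>> observationsByUnitId = new HashMap<>();
    int queryCount = 0; // counts simulated round trips to the database

    void add(String unitId, String observation) {
        observationsByUnitId.computeIfAbsent(unitId, k -> new ArrayList<>()).add(observation);
    }

    // Simulates what a lazy accessor triggers: one query per observation unit.
    List<String> findByUnitId(String unitId) {
        queryCount++;
        return observationsByUnitId.getOrDefault(unitId, Collections.emptyList());
    }

    // Simulates a single "WHERE observation_unit_id IN (...)" query for the whole page.
    Map<String, List<String>> findByUnitIds(List<String> unitIds) {
        queryCount++;
        Map<String, List<String>> result = new HashMap<>();
        for (String id : unitIds) {
            result.put(id, observationsByUnitId.getOrDefault(id, Collections.emptyList()));
        }
        return result;
    }
}

class FetchComparison {
    // Per-entity access, as in the loop under review: N queries for N units.
    static int lazyPerUnit(FakeObservationStore store, List<String> page) {
        store.queryCount = 0;
        for (String unitId : page) {
            store.findByUnitId(unitId);
        }
        return store.queryCount;
    }

    // One batched fetch for the whole page: a single query regardless of page size.
    static int batched(FakeObservationStore store, List<String> page) {
        store.queryCount = 0;
        store.findByUnitIds(page);
        return store.queryCount;
    }
}
```

With a page of N observation units, lazyPerUnit counts N simulated queries while batched counts one. Whether the real ORM batches the lazy loads differently depends on its configuration, which this sketch does not model.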

Comment on lines +267 to +271
if(!page.isEmpty()) {
observationUnitRepository.fetchXrefs(page, ObservationUnitEntity.class);
fetchTreatments(page);
fetchObsUnitLevelRelationships(page);
}

Why is this not in its own separate code path? Does it really need to depend on an already expensive query to determine its execution, or can we create a diverging code path or endpoint where this query runs independently? I need to know more about the use case here.

A follow-up question is why this extra fetching code needs to exist at all. If we configured the attributes in these entities as eager, we wouldn't need any of it. This just seems like a workaround for defining them lazily, and it creates more transactions than necessary to grab the data required.

If the scale of the data per observation unit is too large to define these attributes as eager, we can define a native query that grabs the data we actually want in one go.
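
As a rough illustration of the "one go" idea, the sketch below assumes a single joined query has already returned flat rows of (observation unit id, treatment) and shows only the in-memory grouping step. FlatRowGrouping and the row layout are hypothetical; the real query and column names would come from the schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FlatRowGrouping {
    // Each row is {observationUnitId, treatment}, as one joined query might return them.
    // A null treatment means the unit had no matching treatment row (left join).
    static Map<String, List<String>> groupTreatmentsByUnit(List<String[]> rows) {
        Map<String, List<String>> byUnit = new LinkedHashMap<>();
        for (String[] row : rows) {
            // Register the unit even when it has no treatments, so it is not lost.
            List<String> treatments = byUnit.computeIfAbsent(row[0], k -> new ArrayList<>());
            if (row[1] != null) {
                treatments.add(row[1]);
            }
        }
        return byUnit;
    }
}
```

The trade-off is that the joined result set repeats the parent columns once per child row, so this suits moderate fan-out better than very large collections.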

Comment on lines +198 to +213
searchQuery.leftJoinFetch("germplasm", "germplasm")
.leftJoinFetch("*germplasm.pedigree", "pedigree")
.leftJoinFetch("cross", "cross")
.leftJoinFetch("position", "position")
.leftJoinFetch("*position.geoCoordinates", "geoCoordinates")
.leftJoinFetch("seedLot", "seedLot")
.leftJoinFetch("study", "study")
.leftJoinFetch("*study.experimentalDesign", "experimentalDesign")
.leftJoinFetch("*study.growthFacility", "growthFacility")
.leftJoinFetch("*study.lastUpdate", "lastUpdate")
.leftJoinFetch("*study.location", "studyLocation")
.leftJoinFetch("*study.trial", "studyTrial")
.leftJoinFetch("*studyTrial.program", "studyTrialProgram")
.leftJoinFetch("trial", "trial")
.leftJoinFetch("*trial.program", "trialProgram")
.leftJoinFetch("program", "program");

This bypasses lazy loading and makes it eager. Why, and do we really need all of these to be left joins?
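
On the join-type question, the practical difference is what happens to a parent row whose association is missing. The sketch below is a plain-Java model, not the query builder: JoinSketch and its data layout are hypothetical, and which associations are actually optional in the schema is an assumption that would need checking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class JoinSketch {
    // Each unit is {unitId, germplasmId}; germplasmId may be null (no germplasm linked).
    // leftJoin = true keeps units without a match; false models an inner join.
    static List<String> join(List<String[]> units, Map<String, String> germplasmNames, boolean leftJoin) {
        List<String> rows = new ArrayList<>();
        for (String[] unit : units) {
            String name = unit[1] == null ? null : germplasmNames.get(unit[1]);
            if (name != null) {
                rows.add(unit[0] + ":" + name);
            } else if (leftJoin) {
                // Left join: the parent row survives with a null on the joined side.
                rows.add(unit[0] + ":null");
            }
            // Inner join: a parent with no match is dropped from the result.
        }
        return rows;
    }
}
```

So an inner join is only a safe replacement where the association is guaranteed to be present; for optional associations it would silently shrink the page.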

Comment on lines +248 to +250
if(!page.isEmpty()) {
observationRepository.fetchXrefs(page, ObservationEntity.class);
}

Why is this not in its own separate code path? Does it really need to depend on an already expensive query to determine its execution, or can we create a diverging code path or endpoint where this query runs independently? I need to know more about the use case here.

Comment on lines +177 to +196
searchQuery.leftJoinFetch("observationVariable", "observationVariable")
.leftJoinFetch("*observationVariable.crop", "varCrop")
.leftJoinFetch("*observationVariable.method", "varMethod")
.leftJoinFetch("*observationVariable.ontology", "varOntology")
.leftJoinFetch("*observationVariable.scale", "varScale")
.leftJoinFetch("*observationVariable.trait", "varTrait")
.leftJoinFetch("season", "season")
.leftJoinFetch("program", "program")
.leftJoinFetch("trial", "trial")
.leftJoinFetch("geoCoordinates", "geoCoordinates")
.leftJoinFetch("observationUnit", "observationUnit")
.leftJoinFetch("*observationUnit.position", "position")
.leftJoinFetch("*position.geoCoordinates", "ouGeoCoordinates")
.leftJoinFetch("*observationUnit.germplasm", "ouGermplasm")
.leftJoinFetch("*ouGermplasm.pedigree", "pedigree")
.leftJoinFetch("*observationUnit.study", "ouStudy")
.leftJoinFetch("study", "study")
.leftJoinFetch("*study.experimentalDesign", "experimentalDesign")
.leftJoinFetch("*study.growthFacility", "growthFacility")
.leftJoinFetch("*study.lastUpdate", "lastUpdate");

This bypasses lazy loading and makes it eager. Why, and do we really need all of these to be left joins?

Comment on lines +30 to +53
CREATE OR REPLACE FUNCTION sync_list_related_tables_soft_deleted()
RETURNS TRIGGER AS $$
BEGIN
-- Update list_external_references
UPDATE public.list_external_references
SET soft_deleted = NEW.soft_deleted
WHERE list_entity_id = NEW.id;

-- Update list_item
UPDATE public.list_item
SET soft_deleted = NEW.soft_deleted
WHERE list_id = NEW.id;

RETURN NEW;
END;
$$
LANGUAGE plpgsql;

-- Create a trigger on the list table
CREATE TRIGGER sync_soft_deleted_status
AFTER UPDATE OF soft_deleted ON public.list
FOR EACH ROW
WHEN (OLD.soft_deleted IS DISTINCT FROM NEW.soft_deleted)
EXECUTE FUNCTION sync_list_related_tables_soft_deleted();

I'm gonna need a walkthrough on the purpose and need of this to determine if we need this in the prod server.
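
As a starting point for that walkthrough, here is my reading of what the trigger does, modeled in plain Java. ListTables and SoftDeleteRow are hypothetical names and the lists stand in for the three tables; this is only a sketch of the behavior as I understand it from the SQL, not a statement of its intended purpose.

```java
import java.util.ArrayList;
import java.util.List;

// One row in any of the three tables, keyed by the id of the list it belongs to.
class SoftDeleteRow {
    final String listId;
    boolean softDeleted;

    SoftDeleteRow(String listId) {
        this.listId = listId;
    }
}

class ListTables {
    final List<SoftDeleteRow> lists = new ArrayList<>();
    final List<SoftDeleteRow> listItems = new ArrayList<>();
    final List<SoftDeleteRow> listExternalReferences = new ArrayList<>();

    // Models an UPDATE of list.soft_deleted followed by the AFTER UPDATE trigger.
    // Returns how many child rows the trigger body touched.
    int updateSoftDeleted(String listId, boolean newValue) {
        int synced = 0;
        for (SoftDeleteRow list : lists) {
            if (!list.listId.equals(listId)) {
                continue;
            }
            // WHEN (OLD.soft_deleted IS DISTINCT FROM NEW.soft_deleted)
            boolean changed = list.softDeleted != newValue;
            list.softDeleted = newValue;
            if (changed) {
                synced += sync(listItems, listId, newValue);
                synced += sync(listExternalReferences, listId, newValue);
            }
        }
        return synced;
    }

    // Copies the parent's flag onto every child row that references the list.
    private static int sync(List<SoftDeleteRow> children, String listId, boolean value) {
        int count = 0;
        for (SoftDeleteRow child : children) {
            if (child.listId.equals(listId)) {
                child.softDeleted = value;
                count++;
            }
        }
        return count;
    }
}
```

If that reading is right, the trigger keeps the child tables' flags in lockstep with the parent list, and repeating an update with the same value does nothing.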

Comment on lines +20 to +27
ALTER TABLE public.list
ADD COLUMN soft_deleted BOOLEAN NOT NULL DEFAULT FALSE;

ALTER TABLE public.list_external_references
ADD COLUMN soft_deleted BOOLEAN NOT NULL DEFAULT FALSE;

ALTER TABLE public.list_item
ADD COLUMN soft_deleted BOOLEAN NOT NULL DEFAULT FALSE;

archived is a much better name/term than soft_deleted, IMO.

Comment on lines +3 to +8
-- First, drop the existing foreign key constraint
ALTER TABLE ONLY public.vendor_file_sample
DROP CONSTRAINT IF EXISTS fke3tnyn895kve2kgixku4j7htb;

ALTER TABLE ONLY public.callset
DROP CONSTRAINT IF EXISTS fkhreq22htrftm3dul7nfsg1agk;

We should definitely consider keeping these cascades in the main branch of the prod server.


public interface ListRepository extends BrAPIRepository<ListEntity, String>{
public Page<ListEntity> findAllBySearchAndNotDeleted(SearchQueryBuilder<ListEntity> searchQuery, Pageable pageReq);

@Query("SELECT l FROM ListEntity l WHERE l.id = :id AND l.softDeleted = false")

Definitely think we want to keep this functionality (and similar elsewhere) in the prod server base branch.

Comment on lines +30 to +31
@Query("UPDATE TrialEntity t SET t.softDeleted = :softDeleted WHERE t.id = :trialId")
int updateSoftDeletedStatus(@Param("trialId") String trialId, @Param("softDeleted") boolean softDeleted);

I'm starting to get the feeling these soft-delete queries could be generified. I'm not sure why we need to define them in every entity that needs them. I suppose it may be because not every entity has this column, but I think we could employ some kind of repo inheritance: a repo that extends BrAPIRepository and contains these soft-delete methods, generified for any entity that needs them.
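
Roughly what I have in mind, as a dependency-free sketch. SoftDeletable, InMemorySoftDeleteRepository, and ExampleTrial are hypothetical names; in the real code the shared base would presumably extend BrAPIRepository and be backed by JPA rather than a map, so treat this only as the shape of the inheritance.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical contract for entities that carry a soft_deleted column.
interface SoftDeletable {
    String getId();
    boolean isSoftDeleted();
    void setSoftDeleted(boolean softDeleted);
}

// Hypothetical shared base: the soft-delete methods are written once for any such entity.
class InMemorySoftDeleteRepository<T extends SoftDeletable> {
    private final Map<String, T> store = new LinkedHashMap<>();

    void save(T entity) {
        store.put(entity.getId(), entity);
    }

    // Generic counterpart of "WHERE id = :id AND softDeleted = false".
    Optional<T> findByIdAndNotDeleted(String id) {
        return Optional.ofNullable(store.get(id)).filter(e -> !e.isSoftDeleted());
    }

    // Generic counterpart of updateSoftDeletedStatus; returns rows affected (0 or 1).
    int updateSoftDeletedStatus(String id, boolean softDeleted) {
        T entity = store.get(id);
        if (entity == null) {
            return 0;
        }
        entity.setSoftDeleted(softDeleted);
        return 1;
    }
}

// Minimal example entity; TrialEntity or ListEntity would implement the interface similarly.
class ExampleTrial implements SoftDeletable {
    private final String id;
    private boolean softDeleted;

    ExampleTrial(String id) {
        this.id = id;
    }

    public String getId() { return id; }
    public boolean isSoftDeleted() { return softDeleted; }
    public void setSoftDeleted(boolean softDeleted) { this.softDeleted = softDeleted; }
}
```

Entities without the column simply would not implement the interface, so their repositories would stay as they are.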

6 participants