Merge from BI Server #2

jloux-brapi · 2024-12-17T21:58:27Z

No description provided.

jloux-brapi

There's a lot of stuff here. I'm hoping that Pete, the BI devs who worked on this (or whoever is interested), and I can meet in the new year and go through all these points together so we can start bucketing the stuff we want to keep, change, etc.

I also made some comments on the code itself. I'm not expecting any of it to change at this point; I'm just pointing things out to see whether we can refactor or change them later.

jloux-brapi · 2024-12-18T19:13:59Z

src/main/java/org/brapi/test/BrAPITestServer/service/pheno/ObservationUnitService.java

+		if(includeObservations) {
+			log.debug("Fetching observations for OUs");
+			for(ObservationUnitEntity entity : page) {
+				log.trace("Fetching observations for OU: " + entity.getId());
+				entity.getObservations();


Are we calling an accessor just to get around the lazy loading? If so, that is not really what lazy loading is meant for. Lazy loading should fetch data only when the app accesses it for business purposes. This method, though, appears to return a data set directly to the user and has very little business logic. Instead of working around the lazy loading, I feel we should create a service/repo method that grabs the data the user needs in one go. As written, this seems to make more DB round trips than are necessary to carry out the transaction.

If we do expect to load observations on a given record fetch (which is what this method seems to do), I'd argue the Observation collection attached to ObservationUnit should be loaded eagerly rather than lazily, depending on whether there are other business use cases.
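To make the concern concrete, here is a minimal sketch of the difference between touching a lazy collection once per observation unit and fetching the whole page's observations in one call. Everything in it is hypothetical: FakeObservationStore is an in-memory stand-in that only counts lookups, and the observation_unit_id column named in the comment is an assumption. Whether the real accessor call issues a query per entity depends on how the persistence provider initializes the collection.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the observation table; it only counts lookups.
class FakeObservationStore {
    private final Map<String, List<String>> observationsByUnitId = new HashMap<>();
    int queryCount = 0;

    void put(String unitId, List<String> observations) {
        observationsByUnitId.put(unitId, observations);
    }

    // One lookup per observation unit, which is what initializing a lazy
    // collection inside a loop amounts to.
    List<String> findByUnitId(String unitId) {
        queryCount++;
        return observationsByUnitId.getOrDefault(unitId, List.of());
    }

    // One lookup for the whole page, comparable to
    // WHERE observation_unit_id IN (:ids) (column name assumed).
    Map<String, List<String>> findByUnitIds(List<String> unitIds) {
        queryCount++;
        Map<String, List<String>> result = new HashMap<>();
        for (String unitId : unitIds) {
            result.put(unitId, observationsByUnitId.getOrDefault(unitId, List.of()));
        }
        return result;
    }
}

class FetchStrategyDemo {
    // Returns how many lookups the per-unit strategy needs for the page.
    static int perUnitFetch(FakeObservationStore store, List<String> page) {
        int before = store.queryCount;
        for (String unitId : page) {
            store.findByUnitId(unitId);
        }
        return store.queryCount - before;
    }

    // Returns how many lookups the batched strategy needs for the page.
    static int batchedFetch(FakeObservationStore store, List<String> page) {
        int before = store.queryCount;
        store.findByUnitIds(page);
        return store.queryCount - before;
    }

    public static void main(String[] args) {
        FakeObservationStore store = new FakeObservationStore();
        store.put("ou1", List.of("obs1", "obs2"));
        store.put("ou2", List.of("obs3"));
        List<String> page = List.of("ou1", "ou2", "ou3");
        System.out.println("per-unit lookups: " + perUnitFetch(store, page));
        System.out.println("batched lookups: " + batchedFetch(store, page));
    }
}
```

The per-unit count grows with the page size while the batched count stays at one, which is the gap a dedicated repo method would close.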

jloux-brapi · 2024-12-18T19:29:36Z

src/main/java/org/brapi/test/BrAPITestServer/service/pheno/ObservationUnitService.java

+		if(!page.isEmpty()) {
+			observationUnitRepository.fetchXrefs(page, ObservationUnitEntity.class);
+			fetchTreatments(page);
+			fetchObsUnitLevelRelationships(page);
+		}


Why is this not in its own separate code path? Does its execution really need to depend on an already expensive query? Or can we create a diverging code path or endpoint where we can execute this query independently? I need to know more about the use case here.

A follow-up question is why this extra fetching code needs to exist at all. If the attributes on these entities were configured as eager, we wouldn't need it. This just seems like a workaround for defining them lazily, and it creates more database round trips than necessary to grab the required data.

If the volume of data per observation unit is too large to define these associations as eager, we can define a native query that grabs the data we actually want in one go.

jloux-brapi · 2024-12-18T19:30:35Z

src/main/java/org/brapi/test/BrAPITestServer/service/pheno/ObservationUnitService.java

+		searchQuery.leftJoinFetch("germplasm", "germplasm")
+				   .leftJoinFetch("*germplasm.pedigree", "pedigree")
+				   .leftJoinFetch("cross", "cross")
+				   .leftJoinFetch("position", "position")
+				   .leftJoinFetch("*position.geoCoordinates", "geoCoordinates")
+				   .leftJoinFetch("seedLot", "seedLot")
+				   .leftJoinFetch("study", "study")
+				   .leftJoinFetch("*study.experimentalDesign", "experimentalDesign")
+				   .leftJoinFetch("*study.growthFacility", "growthFacility")
+				   .leftJoinFetch("*study.lastUpdate", "lastUpdate")
+				   .leftJoinFetch("*study.location", "studyLocation")
+				   .leftJoinFetch("*study.trial", "studyTrial")
+				   .leftJoinFetch("*studyTrial.program", "studyTrialProgram")
+				   .leftJoinFetch("trial", "trial")
+				   .leftJoinFetch("*trial.program", "trialProgram")
+				   .leftJoinFetch("program", "program");


This bypasses lazy loading and effectively makes these associations eager. Why, and do all of these really need to be left joins?
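For context on the left-join question, here is a small sketch of what the join type changes in the result set. UnitRow and its fields are hypothetical simplifications, not the real entities: a left join keeps observation units whose association is missing, while an inner join silently drops them, so the choice matters for any association that can be null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified observation unit: germplasmName is null when no germplasm is linked.
class UnitRow {
    final String id;
    final String germplasmName;

    UnitRow(String id, String germplasmName) {
        this.id = id;
        this.germplasmName = germplasmName;
    }
}

class JoinSemanticsDemo {
    // LEFT JOIN semantics: every unit is returned; a missing germplasm shows up as null.
    static List<String> leftJoin(List<UnitRow> units) {
        List<String> rows = new ArrayList<>();
        for (UnitRow unit : units) {
            rows.add(unit.id + " -> " + unit.germplasmName);
        }
        return rows;
    }

    // INNER JOIN semantics: units without a germplasm drop out of the result.
    static List<String> innerJoin(List<UnitRow> units) {
        List<String> rows = new ArrayList<>();
        for (UnitRow unit : units) {
            if (unit.germplasmName != null) {
                rows.add(unit.id + " -> " + unit.germplasmName);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<UnitRow> units = new ArrayList<>();
        units.add(new UnitRow("ou1", "G-001"));
        units.add(new UnitRow("ou2", null));
        System.out.println("left join rows:  " + leftJoin(units));
        System.out.println("inner join rows: " + innerJoin(units));
    }
}
```

So an inner join is only a safe swap for associations that are guaranteed to be present on every observation unit.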

jloux-brapi · 2024-12-18T19:31:32Z

src/main/java/org/brapi/test/BrAPITestServer/service/pheno/ObservationService.java

+		if(!page.isEmpty()) {
+			observationRepository.fetchXrefs(page, ObservationEntity.class);
+		}


Why is this not in its own separate code path? Does its execution really need to depend on an already expensive query? Or can we create a diverging code path or endpoint where we can execute this query independently? I need to know more about the use case here.

jloux-brapi · 2024-12-18T19:32:16Z

src/main/java/org/brapi/test/BrAPITestServer/service/pheno/ObservationService.java

+		searchQuery.leftJoinFetch("observationVariable", "observationVariable")
+				.leftJoinFetch("*observationVariable.crop", "varCrop")
+				.leftJoinFetch("*observationVariable.method", "varMethod")
+				.leftJoinFetch("*observationVariable.ontology", "varOntology")
+				.leftJoinFetch("*observationVariable.scale", "varScale")
+				.leftJoinFetch("*observationVariable.trait", "varTrait")
+				.leftJoinFetch("season", "season")
+				.leftJoinFetch("program", "program")
+				.leftJoinFetch("trial", "trial")
+				.leftJoinFetch("geoCoordinates", "geoCoordinates")
+				.leftJoinFetch("observationUnit", "observationUnit")
+				.leftJoinFetch("*observationUnit.position", "position")
+				.leftJoinFetch("*position.geoCoordinates", "ouGeoCoordinates")
+				.leftJoinFetch("*observationUnit.germplasm", "ouGermplasm")
+				.leftJoinFetch("*ouGermplasm.pedigree", "pedigree")
+				.leftJoinFetch("*observationUnit.study", "ouStudy")
+				.leftJoinFetch("study", "study")
+				.leftJoinFetch("*study.experimentalDesign", "experimentalDesign")
+				.leftJoinFetch("*study.growthFacility", "growthFacility")
+				.leftJoinFetch("*study.lastUpdate", "lastUpdate");


This bypasses lazy loading and effectively makes these associations eager. Why, and do all of these really need to be left joins?

jloux-brapi · 2024-12-20T20:37:47Z

src/main/resources/db/migration/V002.004__add_soft_deleted_column_to_list.sql

+CREATE OR REPLACE FUNCTION sync_list_related_tables_soft_deleted()
+RETURNS TRIGGER AS $$
+BEGIN
+    -- Update list_external_references
+UPDATE public.list_external_references
+SET soft_deleted = NEW.soft_deleted
+WHERE list_entity_id = NEW.id;
+
+-- Update list_item
+UPDATE public.list_item
+SET soft_deleted = NEW.soft_deleted
+WHERE list_id = NEW.id;
+
+RETURN NEW;
+END;
+$$
+LANGUAGE plpgsql;
+
+-- Create a trigger on the list table
+CREATE TRIGGER sync_soft_deleted_status
+    AFTER UPDATE OF soft_deleted ON public.list
+    FOR EACH ROW
+    WHEN (OLD.soft_deleted IS DISTINCT FROM NEW.soft_deleted)
+EXECUTE FUNCTION sync_list_related_tables_soft_deleted();


I'm going to need a walkthrough of the purpose of this trigger, and why it's needed, to determine whether we want it in the prod server.
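As a starting point for that walkthrough, this is how I read the migration, sketched as an in-memory model. The class and field names are hypothetical, and the model only reflects what the SQL above shows: when the flag on a list row actually changes, the new value is copied to that list's external references and list items, and an update that leaves the flag unchanged does nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a row in public.list.
class ListRow {
    final String id;
    boolean softDeleted;

    ListRow(String id) {
        this.id = id;
    }
}

// Hypothetical stand-in for a row in list_item or list_external_references.
class ChildRow {
    final String listId;
    boolean softDeleted;

    ChildRow(String listId) {
        this.listId = listId;
    }
}

class SoftDeleteSyncDemo {
    final List<ChildRow> externalReferences = new ArrayList<>();
    final List<ChildRow> listItems = new ArrayList<>();

    // Models an UPDATE of list.soft_deleted plus the trigger.
    // Returns how many child rows were touched by the trigger body.
    int updateSoftDeleted(ListRow list, boolean newValue) {
        // Mirrors WHEN (OLD.soft_deleted IS DISTINCT FROM NEW.soft_deleted);
        // the column is NOT NULL, so a plain inequality check is equivalent here.
        boolean changed = list.softDeleted != newValue;
        list.softDeleted = newValue;
        if (!changed) {
            return 0;
        }
        return sync(externalReferences, list) + sync(listItems, list);
    }

    // Copies the list's flag to every child row that points at it.
    private static int sync(List<ChildRow> children, ListRow list) {
        int touched = 0;
        for (ChildRow child : children) {
            if (child.listId.equals(list.id)) {
                child.softDeleted = list.softDeleted;
                touched++;
            }
        }
        return touched;
    }
}
```

If that reading is right, the open question is whether the child tables need their own soft_deleted column at all, or whether queries could filter through the parent list instead.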

jloux-brapi · 2024-12-20T20:39:15Z

src/main/resources/db/migration/V002.004__add_soft_deleted_column_to_list.sql

+ALTER TABLE public.list
+    ADD COLUMN soft_deleted BOOLEAN NOT NULL DEFAULT FALSE;
+
+ALTER TABLE public.list_external_references
+    ADD COLUMN soft_deleted BOOLEAN NOT NULL DEFAULT FALSE;
+
+ALTER TABLE public.list_item
+    ADD COLUMN soft_deleted BOOLEAN NOT NULL DEFAULT FALSE;


archived would be a much better name/term than soft_deleted, IMO.

jloux-brapi · 2024-12-20T20:44:31Z

src/main/resources/db/migration/V002.005__cascade_delete_sample.sql

+-- First, drop the existing foreign key constraint
+ALTER TABLE ONLY public.vendor_file_sample
+DROP CONSTRAINT IF EXISTS fke3tnyn895kve2kgixku4j7htb;
+
+ALTER TABLE ONLY public.callset
+DROP CONSTRAINT IF EXISTS fkhreq22htrftm3dul7nfsg1agk;


We should definitely consider bringing these cascades into the main branch of the prod server.

jloux-brapi · 2024-12-20T20:47:15Z

src/main/java/org/brapi/test/BrAPITestServer/repository/core/ListRepository.java


 public interface ListRepository extends BrAPIRepository<ListEntity, String>{
+    public Page<ListEntity> findAllBySearchAndNotDeleted(SearchQueryBuilder<ListEntity> searchQuery, Pageable pageReq);
+
+    @Query("SELECT l FROM ListEntity l WHERE l.id = :id AND l.softDeleted = false")


I definitely think we want to keep this functionality (and similar functionality elsewhere) in the prod server base branch.

jloux-brapi · 2024-12-20T21:01:36Z

src/main/java/org/brapi/test/BrAPITestServer/repository/core/TrialRepository.java

+	@Query("UPDATE TrialEntity t SET t.softDeleted = :softDeleted WHERE t.id = :trialId")
+	int updateSoftDeletedStatus(@Param("trialId") String trialId, @Param("softDeleted") boolean softDeleted);


I'm starting to get the feeling these soft-delete queries could be made generic. I'm not sure why we need to define them in every repository that needs them. I suppose it may be because not every entity has this column, but I think we could use repo inheritance here: a repository that extends BrAPIRepository and contains these soft-delete methods, written generically, which the repositories for entities that need soft deletion would then extend.
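Roughly what I have in mind, as a plain-Java sketch rather than working repository code. All names here (SoftDeletable, SoftDeleteRepository, the Demo classes) are hypothetical, and the in-memory implementation exists only so the sketch runs. In the real code base the shared interface would presumably extend BrAPIRepository, and I believe Spring Data JPA's #{#entityName} placeholder in @Query could keep the JPQL entity-agnostic, but that would need checking against our setup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical contract for any entity that carries a softDeleted flag.
interface SoftDeletable {
    String getId();
    boolean isSoftDeleted();
    void setSoftDeleted(boolean softDeleted);
}

// Hypothetical shared repository; in the real code base this role could be
// played by an interface extending BrAPIRepository.
interface SoftDeleteRepository<T extends SoftDeletable> {
    int updateSoftDeletedStatus(String id, boolean softDeleted);
    Optional<T> findByIdAndNotDeleted(String id);
}

// In-memory implementation, only here to make the sketch runnable.
class InMemorySoftDeleteRepository<T extends SoftDeletable> implements SoftDeleteRepository<T> {
    private final Map<String, T> rows = new HashMap<>();

    void save(T entity) {
        rows.put(entity.getId(), entity);
    }

    // Returns the number of rows changed, like the int returned by the UPDATE query.
    @Override
    public int updateSoftDeletedStatus(String id, boolean softDeleted) {
        T entity = rows.get(id);
        if (entity == null) {
            return 0;
        }
        entity.setSoftDeleted(softDeleted);
        return 1;
    }

    // Hides soft-deleted rows, like the "AND l.softDeleted = false" clause.
    @Override
    public Optional<T> findByIdAndNotDeleted(String id) {
        return Optional.ofNullable(rows.get(id)).filter(e -> !e.isSoftDeleted());
    }
}

// Hypothetical base class and entities standing in for TrialEntity, ListEntity, etc.
abstract class DemoEntity implements SoftDeletable {
    private final String id;
    private boolean softDeleted;

    DemoEntity(String id) {
        this.id = id;
    }

    @Override public String getId() { return id; }
    @Override public boolean isSoftDeleted() { return softDeleted; }
    @Override public void setSoftDeleted(boolean softDeleted) { this.softDeleted = softDeleted; }
}

class DemoTrial extends DemoEntity {
    DemoTrial(String id) { super(id); }
}

class DemoList extends DemoEntity {
    DemoList(String id) { super(id); }
}
```

The soft-delete logic is written once and both DemoTrial and DemoList pick it up through the type parameter, so entities without the column simply would not implement the contract.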

mlm483 and others added 30 commits December 17, 2024 16:27

[BI-1771] - performance experiments

aa958f9

some initial performance experiments including indexing and batching database operations

[BI-1771] - added germplasm-related indexes

d370995

[BI-1771] - reverted index annotation

853c4b6

[BI-1771] - reverted whitespace changes

1bafb97

[BI-1771] - made create_indexes.sql idempotent

511744a

added IF NOT EXISTS to CREATE INDEX statements

[BI-1909] Exploring converting *ToOne relationships to lazy loading a…

17f208b

…nd how performance can be improved for fetching OUs, Observations, and ObsVars

[BI-1909] NPE fix, converting sysouts to use Logger

65f41bb

[BI-1909] Updating GermplasmService to eagerly fetch related data for…

f213636

… searches Also did some code cleanup, and added a logback.xml config file

[BI-1909] suggested changes

444ac44

Create pull.yml

7ae834a

[BI-1945] - saving work in progress

f3963b0

[BI-1945] - store additionalInfo on primary entities as JSONB

14347ea

[BI-1945] - optimized imports

0a6c1c6

[BI-1945] - updated template, readme

dfe5e82

[BI-1945] - cleaned up SQL dummy data migrations

32a1c91

[BI-1945] - updated template

c31973b

[BI-1945] - removed unused class

3d73553

[BI-1945] - renamed method

213d51b

[BI-1945] - handled null case

ea38550

[BI-1945] - added stringtype=unspecified to template

14d780c

[BI-1945] - removed debug log

027eff8

[BI-1945] - optimized imports

2e57478

[BI-1945] - removed additionalInfo joins

246e6a1

fixed NPE

[BI-1945] - updated README.md

44c6a6a

[BI-1945] - added ON DELETE CASCADE to xref linking tables

4d25468

[BI-2051] - batched inserts for efficiency

530ae23

[BI-2051] - removed comment

1777923

[BI-2040] The fetchScaleValidValueCategories() method was causing the…

9bb3096

… problem. It was not needed. The ScaleValidValueCategories were present (even though it is set to fetch LAZY) because of the SQL like statements called on searchQuery

[BI-2040] removed all calls to fetchXXXXX(page) and the methods thems…

dc3aa96

…leves

[BI-2040] removed unused import-statements from ObservationVariableSe…

f3c8871

…rvice.java

mlm483 and others added 27 commits December 17, 2024 16:47

[BI-2304] - used zero-indexing in migration

1409352

[BI-2304] - changes based on review

e2548a7

create batch entity and controller endpoints

8281201

json deserialize searchrequest sub-types

92aa47f

return batchDbID in POST batches response results

7b6b61a

refactor

e74184b

add constraint to cascade on delete for list_item

5c00c3d

add soft_deleted column to list tables

beb95c4

modify list repository to include clause softDelete=false

9ebbfb1

use where clause and transactional annotations

ed8ee46

soft-delete batch lists

0c38c55

add hard delete of single trial

0191666

create hard-delete endpoint for single sample

0547cbc

cascade delete for trial and sample

d659040

add soft_delete columns to trial and sample related tables

0d4ac86

create sample service soft-delete method

fc10679

add soft-delete method to trial service

aa8b86f

create Trial and Sample components

a5d8656

optimize imports

b96e1c9

respond with 204 for successful delete

f2036bf

return entity dbids in response for POST deleteBatch

cbdd0f3

add batchDeleteDbId query param to GET endpoints for lists, trials, s…

493f88f

…amples

create new exception class for wrong batch delete type

6fa2c21

delete batch entity when deleting batch contents

71cdbb1

add cascade delete constraints for study foreign keys

bd2b039

rename batch to batch delete

0efca81

Solve dummy data migration issues

b57a17d

jloux-brapi mentioned this pull request Dec 17, 2024

Merge from BI server #1

Closed

Merge pull request plantbreeding#75 from plantbreeding/keycloak-dev-d…

b7ecfda

…ocker Create local containerized keycloak docker example, update README
