Skip to content
This repository has been archived by the owner on Jan 20, 2025. It is now read-only.

Optimise Transformer PSC #262

Open
tiredpixel opened this issue Apr 30, 2024 · 0 comments
Open

Optimise Transformer PSC #262

tiredpixel opened this issue Apr 30, 2024 · 0 comments

Comments

@tiredpixel
Copy link
Contributor

tiredpixel commented Apr 30, 2024

Transformation is very slow. It takes around 2.03 s/statement (*):

I, [2024-04-30T12:57:31.688140 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393859288357183009020135870562338] /company/15694555/persons-with-significant-control/individual/7AYbJehiTTD91IcUhbXdBvumeaY
I, [2024-04-30T12:57:33.697593 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393859531351272751560599986503714] /company/12462557/persons-with-significant-control/individual/jjdgflZphXeM0pPxRkQkZs5kbHM
I, [2024-04-30T12:57:35.710237 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393860093501778872363166224875554] /company/02496500/persons-with-significant-control/individual/3QzT_0qRrYgG6E4rTdpVpMEWENQ
I, [2024-04-30T12:57:38.723755 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393860683457578844302203481489442] /company/NI696959/persons-with-significant-control/individual/qV-mSBMeZPPSXn2mGU_T_hcZjlA
I, [2024-04-30T12:57:40.731559 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393861147685093576319806568661026] /company/NI047824/persons-with-significant-control/individual/h0i1_kzZJGYCUHwlUbezCFHf9NI
I, [2024-04-30T12:57:43.743871 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393861385843480040401753985777698] /company/SC592338/persons-with-significant-control/individual/CPAtOXYPPh2bprJsArvAXScn288
I, [2024-04-30T12:57:45.757287 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393861905681582474692299109433378] /company/08491828/persons-with-significant-control/individual/2p9zF0twy2ehnKfDkJOWeWtCbBc
I, [2024-04-30T12:57:48.775943 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393862146257820578003573595439138] /company/15694559/persons-with-significant-control/individual/tKng5vVsd8XjFteN4dFDy0ej68M
I, [2024-04-30T12:57:50.793828 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672393862406176871795148846157266978] /company/11325059/persons-with-significant-control/individual/VQyyTGV40yj-uBRvDEC4MGQhb_M
I, [2024-04-30T12:57:53.815453 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394004352201981846837448732049442] /company/12208333/persons-with-significant-control/individual/s_E1ELb-HXARDTizpSdoT88hz0g
I, [2024-04-30T12:57:55.836929 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394004890173971575347431476297762] /company/15694561/persons-with-significant-control/individual/AokX43_Re-u54us-uib6l800pSs
I, [2024-04-30T12:57:57.860308 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394005883910995298572613084774434] /company/15694565/persons-with-significant-control/individual/M51kbMDneeZlVlBLkKP6XjSf160
I, [2024-04-30T12:57:59.879818 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394007039644078850158172823355426] /company/15547479/persons-with-significant-control/individual/8Znvy7tV9M7VuyBRluVSs8Cbdis
I, [2024-04-30T12:58:00.744422 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394007555855403825604830422892578] /company/15512467/persons-with-significant-control/individual/q-O8qcELY7atL9GtbCzi4L2IhHQ
I, [2024-04-30T12:58:02.884764 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394007852042229631188978225905698] /company/15545523/persons-with-significant-control/individual/iXEklbEKlTUZswR1QqhBCYtcKYE
I, [2024-04-30T12:58:03.729477 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394008402103477555845252717215778] /company/15547510/persons-with-significant-control/individual/JoGuhDIatSMKhdRakaOH5P4Lmos
I, [2024-04-30T12:58:04.429314 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394009458704643899031151410413602] /company/15563627/persons-with-significant-control/individual/Yhb4Ev3VME8uaphRpnLByDgeknM
I, [2024-04-30T12:58:05.156642 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394010336384788939252000966574114] /company/15603805/persons-with-significant-control/individual/fTFm-H4_iJs4CVsEEoDEiLwVhFU
I, [2024-04-30T12:58:05.877892 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394010674884018431348169884303394] /company/15607311/persons-with-significant-control/individual/wvtX8efpM-Ehsj7vGcR5RwF-0IU
I, [2024-04-30T12:58:08.909962 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394147026000435306532951918903330] /company/14796746/persons-with-significant-control/individual/95cXqZ-14aTkGpC7jjU8MJ0xRJE
I, [2024-04-30T12:58:10.933029 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394147864994954119085599164989474] /company/15694572/persons-with-significant-control/individual/jT3EiIgt0BbS5p2Dq6G0ABOdeKI
I, [2024-04-30T12:58:12.954265 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394148082601601649718850612101154] /company/15694574/persons-with-significant-control/individual/zktlC6GFce-khAoOygiG-5d_K0I
I, [2024-04-30T12:58:14.976031 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394286094782094675404118692986914] /company/09568573/persons-with-significant-control/individual/Y5zBOJaAmDfJ35r9LUJCXB_L86k
I, [2024-04-30T12:58:16.993920 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394286435699175806729545960128546] /company/15135615/persons-with-significant-control/individual/6KP-hx2juhDTdhlodF7RM-jYLXk
I, [2024-04-30T12:58:19.011294 #1]  INFO -- : [shardId-000000000002] [49651600875111508319672394286950701574962561574384959522] /company/15694579/persons-with-significant-control/individual/JcnmYPWoI3AYo4MdpJM1PaQQXXs
D, [2024-04-30T12:58:22.441569 #1] DEBUG -- : [shardId-000000000000] LAG: 86400s | N: 0

(*): Note that these measurements are from a development environment on a small Elasticsearch instance, not a production Elasticsearch cluster.

I think this has most likely always been the case, although it's possible that things are currently running slower than they used to (I am not sure why, if so). There could be a mixture of reasons for this, including: design, extensive work within Ruby code, Elasticsearch roundtrip, Elasticsearch write-confirmations.

Given our current design and performance characteristics, this means we couldn't cope with even 1 event/s from upstream PSC datasource, and that if the transformation stream falls behind, catching up could take an exceedingly long time.

It's worth spending some time trying to see if there are any simple performance improvements which could be made, here. Anything more complex would likely be too risky or time-consuming to change, but if some small improvements could be found here or there (or perhaps some regression discovered and fixed), even a 0.5 s/event increase in speed would compound to our benefit.

Although this prioritises optimisation of the PSC data transformation pipeline only, any benefits would likely also be applicable to SK and DK pipelines.

This was already known about from bulk data imports, but work on #251 made this more obvious and easier to measure.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant