Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#191: add main logic for EidGeneratorStrategy #192

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

Chuburashka
Copy link

@Chuburashka Chuburashka commented Jun 5, 2024

This is an approach for solving #191.

  • add a database column eid.
  • this gets by default filled with the same value as the id column (converted to UUID just as the current Java code is doing it)
  • An EidGeneratorStrategy can be injected to generate this eid value in a different way (e.g. randomly).
  • When sending events, the eid value is used for the events.

@Chuburashka Chuburashka requested review from fbrns and ePaul as code owners June 5, 2024 00:26
@ePaul ePaul changed the title add main logic for EidGeneratorStrategy #191: add main logic for EidGeneratorStrategy Jun 5, 2024
@ePaul ePaul self-assigned this Jun 5, 2024
@ePaul ePaul added enhancement auto-configuration everything about the auto-configuration features nakadi-submission persistence everything around DB access labels Jun 5, 2024

public interface EidGeneratorStrategy {

UUID generateEid();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether we want to be more flexible here to allow for future extensions?
Like generating the UUID as a hash of the content?
Or a different one depending on the event type?

So my suggestion would be to have this method take some parameter(s) – either the EventLog object (though with the current approach this will be tricky, as it's only created afterwards) or the values that go into it (i.e. eventType, payload, compactionKey).

The two implementations we provide can just ignore these parameters, but if someone needs something specialized, we don't have to do another incompatible change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, I would prefer to use payload object for generation eid, but because we have just Object value I decided do nothing with that.

Added in last commit:

  1. Moved generateEid call to the end createEventLog method
  2. Added EventLog as parameter.

Personally I not so much like this solution, because it a little bit tricky and depends on what time we initialise eid field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a deeper thought, this breaks the package layering a bit. The *.impl packages (EventLog is part of it) should not be exposed in the user interface of the library (i.e. the EidGeneratorStrategy).

So that only leaves the second option of "take all the values which go into the EventLog", which makes the signature a bit unwieldly.

I think now that it should be possible to retrofit this also later – we can for now have just the no-argument abstract generateEid method in the interface, and later add a default method with more parameters, which will then be called by the library. The few applications which need a data-specific eid can overwrite this method, applications which have some other non-data-specific need can implement the abstract method, and the big majority of applications can just use one of the pre-made implementations via the factory methods (or even auto-configuration).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I removed EventLog.
I agree that we can add additional method later.

For some extra cases I extracted EventLogBuilder interface and clients can implement this and build EventLog as they needed

Comment on lines 13 to 19
static EidGeneratorStrategy noop() {
return (EventLog eventLog) -> null;
}

static EidGeneratorStrategy random() {
return (EventLog eventLog) -> UUID.randomUUID();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a bit of Javadocs to these methods:

Suggested change
static EidGeneratorStrategy noop() {
return (EventLog eventLog) -> null;
}
static EidGeneratorStrategy random() {
return (EventLog eventLog) -> UUID.randomUUID();
}
/**
* A built-in strategy which does not generate an eid (which means the library will fall back
* to converting the sequential DB ID into an UUID).
* (This is the default strategy, and equivalent to the behavior before this interface was introduced.)
*/
static EidGeneratorStrategy noop() {
return (EventLog eventLog) -> null;
}
/**
* A built-in strategy which will assign a random (type 4) UUID, ignoring the data.
* You should only use this if your consumers don't depend on the eid for ordering
* of events.
*/
static EidGeneratorStrategy random() {
return (EventLog eventLog) -> UUID.randomUUID();
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Comment on lines 109 to 110
" (:eventType, :eventBodyData, :flowId, :created, :lastModified, :lockedBy, :lockedUntil, :compactionKey, " +
" COALESCE(:eid, CAST(LPAD(TO_HEX(CAST(CURRVAL('nakadi_events.event_log_id_seq') as BIGINT)), 32, '0') AS UUID)))",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep the "convert on reading" fallback, then we can just put NULL here when using the noop strategy, right?
No need to do the complicated conversion on insert.

Suggested change
" (:eventType, :eventBodyData, :flowId, :created, :lastModified, :lockedBy, :lockedUntil, :compactionKey, " +
" COALESCE(:eid, CAST(LPAD(TO_HEX(CAST(CURRVAL('nakadi_events.event_log_id_seq') as BIGINT)), 32, '0') AS UUID)))",
" (:eventType, :eventBodyData, :flowId, :created, :lastModified, :lockedBy, :lockedUntil, :compactionKey, " +
" :eid)",

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but decided that if we provide a new column it would be better to provide a value too.
For me it's better to check data in the DB than to try to understand what the value will be send

But as you wish, in general I agree to remove this complex function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't change this, I think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I left the function for now.
If you still think that better to remove it, I'll do it


import org.zalando.nakadiproducer.eventlog.impl.EventLog;

public interface EventLogBuilder {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should go into the impl package, as this is depending on the impl class EventLog, and should be something which normal library users don't need to be exposed to.
(Leaving it public is okay, maybe for corner cases.)

EidGeneratorStrategy eidGeneratorStrategy) {
return new EventLogMapper(objectMapper, flowIdComponent, eidGeneratorStrategy);
@ConditionalOnMissingBean
public EventLogBuilder eventLogMapper(ObjectMapper objectMapper, FlowIdComponent flowIdComponent,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public EventLogBuilder eventLogMapper(ObjectMapper objectMapper, FlowIdComponent flowIdComponent,
public EventLogBuilder eventLogBuilder(ObjectMapper objectMapper, FlowIdComponent flowIdComponent,

README.md Outdated
@@ -313,6 +313,15 @@ public SnapshotEventGenerator snapshotEventGenerator(MyService service) {
}
```

### EID generation strategy (optional)
EID - is unique identifier for nakadi events. We must provide it for each event. By default, the library generates EID based on id of the event in the database.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
EID - is unique identifier for nakadi events. We must provide it for each event. By default, the library generates EID based on id of the event in the database.
The `eid` is a unique identifier for Nakadi events, which is required by Nakadi for each event on submission.
By default, the library generates eid based on a database sequence (which is also used internally to identify
the event log entries before sending them out).

README.md Show resolved Hide resolved
Comment on lines 53 to 57
if (eid == null) {
return new UUID(0, id);
}

return eid;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For performance reasons, we likely want to only do the conversion once.

Suggested change
if (eid == null) {
return new UUID(0, id);
}
return eid;
if (eid == null) {
eid = new UUID(0, id);
}
return eid;

Currently we do it once before submitting, and once again after checking the failure results from Nakadi.
(It's been like that in the old code, but if we can improve this now, even better.)

This makes also the lookup a tiny bit faster, as it'll now be an == comparison before comparing the content.

Comment on lines 47 to 51
/**
* This is only needed for backward compatibility.
*
* <p>For instance 213 will be converted to "00000000-0000-0000-0000-0000000000d5"</p>
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/**
* This is only needed for backward compatibility.
*
* <p>For instance 213 will be converted to "00000000-0000-0000-0000-0000000000d5"</p>
*/
/**
* Returns the eid to be used for submitting the event. If none was stored, we'll convert it from the DB-ID.
*
* <p>For instance 213 will be converted to "00000000-0000-0000-0000-0000000000d5"</p>
*/

Comment on lines 53 to 59
public UUID getEid() {
if (eid == null) {
eid = new UUID(0, id);
}

return eid;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I gave you a bad suggestion here, this now causes NPEs when trying to store the EventLog in the database.

Suggested change
public UUID getEid() {
if (eid == null) {
eid = new UUID(0, id);
}
return eid;
}
public UUID getEid() {
if (eid == null && id != null) {
eid = new UUID(0, id);
}
return eid;
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry.
I ran integration-test task instead of install.

And integration-test executes succesfully and shown BUILD SUCCESS (although in the logs we can find NPE messages)

@@ -77,8 +82,7 @@ public void testDeleteMultipleEvents() {

List<EventLog> remaining = findAllEventsInDB();
assertThat(remaining, hasSize(1));
assertThat(remaining.get(0).getId(), is(notDeleted.getId()));
assertThat(remaining.get(0).getFlowId(), is(notDeleted.getFlowId()));
assertThat(remaining.get(0), equalTo(notDeleted));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line worked when you wrote it, but now it fails, as EventLog doesn't have the generated equals method anymore, so it just does the == comparison.

I'd just revert this line of the change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed equalTo to Matchers.samePropertyValuesAs

Comment on lines 107 to 113
public void testInsertEventWithDefaultEid() {
persistTestEvent("FLOW_ID");
EventLog actual = findAllEventsInDB().get(0);

EventLog expected = buildEventLog("FLOW_ID", 1, buildEid(1));
assertEvent(actual, expected);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've expanded the test to make sure it's also working when inserting multiple events:

Suggested change
public void testInsertEventWithDefaultEid() {
persistTestEvent("FLOW_ID");
EventLog actual = findAllEventsInDB().get(0);
EventLog expected = buildEventLog("FLOW_ID", 1, buildEid(1));
assertEvent(actual, expected);
}
public void testInsertSingleEventsWithDefaultEid() {
persistTestEvent("FLOW_ID_1");
persistTestEvent("FLOW_ID_2");
List<EventLog> eventLogs = findAllEventsInDB();
EventLog actual1 = eventLogs.get(0);
EventLog actual2 = eventLogs.get(1);
EventLog expected1 = buildEventLog("FLOW_ID_1", 1, buildEid(1));
EventLog expected2 = buildEventLog("FLOW_ID_2", 2, buildEid(2));
assertEvent(actual1, expected1);
assertEvent(actual2, expected2);
}
@Test
@Transactional
public void testBulkInsertEventWithDefaultEid() {
List<EventLog> eventLogsToPersist = Arrays.asList(
buildEventLog("FLOW_ID_1"),
buildEventLog("FLOW_ID_2")
);
eventLogRepository.persist(eventLogsToPersist);
List<EventLog> eventLogsFound = findAllEventsInDB();
EventLog actual1 = eventLogsFound.get(0);
EventLog actual2 = eventLogsFound.get(1);
EventLog expected1 = buildEventLog("FLOW_ID_1", 1, buildEid(1));
EventLog expected2 = buildEventLog("FLOW_ID_2", 2, buildEid(2));
assertEvent(actual1, expected1);
assertEvent(actual2, expected2);
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Chuburashka
Copy link
Author

@ePaul
I removed function from INSERT query in the last commit.

Also I reverted this changes for EventLog.getEid() method.

Because I think if we decided to leave eid field null by default better to leave it null and client side should decide what to do with this field. And we do it in the same place as before in EventTransmissionService

@ePaul
Copy link
Member

ePaul commented Jun 26, 2024

I deployed a release-candidate version (21.1.0-RC-eidgenerator) to our internal Nexus.

Feel free to try this out in your projects.

There are some merge conflicts with some of the other open PRs, so I couldn't easily create a release candidate containing the changes for all of them.

@ePaul
Copy link
Member

ePaul commented Jul 17, 2024

👍

return () -> UUID.randomUUID();
}

UUID generateEid();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ePaul
Are you agree with UUID as a returned type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would speak against it? (I might be forgetting some discussion we had 3 weeks ago, please remind me if there was something.)

Maybe we should add a @Nullable annotation and add Javadoc explaining what returning null means, though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we haven't spoken about it.
I asked because in the fahrschein and nakadi libraries eid is just a string.
And we can expect some client who uses not UUID for eid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nakadi's API requires a string in UUID-format.
Passing a non-UUID string will break things, so making it an UUID here is fine. (We of course need to convert it to a string before submitting to Nakadi, or rather before passing it to Fahrschein, but that's just a .toString().)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks
I checked the nakadi and they validate eid field format as UUID

@ePaul
Copy link
Member

ePaul commented Jul 18, 2024

👍

@ePaul ePaul added the major label Oct 7, 2024
@Chuburashka
Copy link
Author

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-configuration everything about the auto-configuration features enhancement major nakadi-submission persistence everything around DB access
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants