Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A more complex structure of generated version 7 UUIDs is needed #1

Open
sergeyprokhorenko opened this issue Mar 19, 2023 · 23 comments
Open

Comments

@sergeyprokhorenko
Copy link

Hi Andrew,

Could you please describe in the README page the structure of your generated version 7 UUIDs, something like https://habr.com/ru/post/658855/comments/#comment_24491432

@x4m
Copy link
Owner

x4m commented Mar 19, 2023

Hi! Well, pull requests are welcome :)
Unfortunately I'm better at writing code than docs :(

@sergeyprokhorenko
Copy link
Author

sergeyprokhorenko commented Mar 19, 2023

Andrey, your implementation is too simplified and therefore is not reliable enough. It needs to be advanced for database applications. It does not take into account many concerns that are described in the standard draft and were discussed in detail during its development:

  • millisecond level of precision,
  • excluded leap seconds,
  • handling of the system clock move backward,
  • counter for monotonicity,
  • randomly initialized counter portion,
  • two counter rollover guard methods,
  • optional node identifiers,
  • reseeded CSPRNG,
  • impermissibility of fatal errors,
  • UUID as left part of the key field.

I strongly advise you to read the sections:
6.1. Timestamp Considerations
6.2. Monotonicity and Counters
6.4. Distributed UUID Generation
6.8. Unguessability
6.12. DBMS and Database Considerations

@x4m
Copy link
Owner

x4m commented Mar 19, 2023

Actually the code is taken from standard draft :)
If you wish a better standard adherence, consider sending a pull request. I'll be happy to review it.
But I'd propose to send amendments to IETF group first, because the code is taken from there...

@sergeyprokhorenko
Copy link
Author

sergeyprokhorenko commented Mar 20, 2023

Luckily this code was thrown out from the new version of standard draft: https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/02/

I am not a programmer. Actualy I am a system analist and I use SQL, not C language. So I cannot write a pull request for you.

It would be a pity if a wonderful updated standard got a poor implementation in a hurry. It's well known that every complex problem has a simple solution that doesn't work.

@sergeyprokhorenko sergeyprokhorenko changed the title Description of the structure of generated version 7 UUIDs is necessary A more complex structure of generated version 7 UUIDs is needed Mar 20, 2023
@x4m
Copy link
Owner

x4m commented Mar 20, 2023

OK, I can help with C part. Can you plz sort needed improvements by some priority and help me develop incremental changed towards "real" UUIDv7 and UUIDv8?
I cannot allocate big chunks of time, but can gradually work on this. What is easiest change that will make current implementation better?

@sergeyprokhorenko
Copy link
Author

sergeyprokhorenko commented Mar 20, 2023

OK, I can help with C part. Can you plz sort needed improvements by some priority and help me develop incremental changed towards "real" UUIDv7 and UUIDv8? I cannot allocate big chunks of time, but can gradually work on this. What is easiest change that will make current implementation better?

I'll try to do it this week

@sergeyprokhorenko
Copy link
Author

sergeyprokhorenko commented Mar 22, 2023

Required UUIDv7 generator features for PostgreSQL

No. UUIDv7 feature for PostgreSQL Section of draft-ietf-uuidrev-rfc4122bis-02 Field length and position Complexity Importance Goals
1 Millisecond level of precision 5.7. UUID Version 7 The leftmost 48 bits of UUID low high Monotonicity, Space saving
2 Excluded leap seconds 5.7. UUID Version 7   high low Monotonicity
3 Temporal time shift forward to compensate for accidental system clock reset backwards 6.1. Timestamp Considerations   high high Monotonicity, Fault tolerance
4 Periodical random time shift 6.1. Timestamp Considerations   high low Secrecy, Unguessability
5 Generating UUID on system clock failure with last known timestamp 6.1. Timestamp Considerations   low high Fault tolerance
6 Counter 6.2. Monotonicity and Counters 15 bits next to the right of the timestamp low high Monotonicity, Space saving
7 Randomly initialized counter portion 6.2. Monotonicity and Counters The rightmost 14 bits of the counter low high Uniqueness
8 Zero initialized counter portion 6.2. Monotonicity and Counters The leftmost 1 bit of the counter low high Guarding against counter rollovers
9 Timestamp increment 6.2. Monotonicity and Counters   high low Guarding against counter rollovers
10 Random node identifier 6.4. Distributed UUID Generation The rightmost few bits of the UUID high low Uniqueness
11 Reseeded CSPRNG 6.8. Unguessability   low high Uniqueness, Unguessability
12 Optional segment next to the right of UUID in the key DB columns 6.12. DBMS and Database Considerations 32 bits next to the right of the UUID low high Identification of system, service, table, source, operation, message, shard, partition etc., Check sum
13 New data type UUID160 with optional 32-bit segment next to the right of the UUID n/a   high high Better developer experience
14 Keeping the UUID generator settings in named JSON format separate from program code (non-standard feature) n/a   high low Better developer experience
15 Setting the UUID output format (binary, integer, "hex-and-dash" string format, non-standard Crockford's Base32) 4. UUID Format   high low Better developer experience
16 Generation of UUIDs or UUID fields in advance 6.3. UUID Generator States   high low Better performance
17 Automated migration of key fields to UUID and UUID160 data types providing backup, order preservation, referential integrity and replacing autoincrement with generation, preserving the natural key n/a   high high Better developer experience

@x4m
Copy link
Owner

x4m commented Mar 22, 2023

Is it all applied to v7 only? We are not looking at v8 right now, are we?

I think in v7 we already have property (1). So maybe let's start with (11) "Reseeded CSPRNG". How often should we reseed PRGN? I think it is already reseeded on fork()s.

@sergeyprokhorenko
Copy link
Author

Is it all applied to v7 only? We are not looking at v8 right now, are we?

It is correct. UUIDv8 is not intended for mass market

I think in v7 we already have property (1).

I don't understand C language well, so I didn't see this property in your program

So maybe let's start with (11) "Reseeded CSPRNG". How often should we reseed PRGN?

It is up to you

I think it is already reseeded on fork()s.

Maybe. It's better to check

@sergeyprokhorenko
Copy link
Author

OK, I can help with C part. ... I cannot allocate big chunks of time, but can gradually work on this.

Andrey,
I strongly advise you to co-operate with the PostgreSQL community, such as
Oleg Bartunov [email protected]
Ivan Panchenko [email protected]
Ivan Frolkov [email protected]

pgsql-hackers: https://www.postgresql.org/message-id/PH0PR11MB5029DF5E0A0EAF8E3CC2C652BBA29@PH0PR11MB5029.namprd11.prod.outlook.com

@x4m
Copy link
Owner

x4m commented Mar 23, 2023

Are you aware that you are posting a link to my thread in pgsql-hackers?
(and yes, surely I'll cooperate with PgPro if they express interest in the topic)

@sergeyprokhorenko
Copy link
Author

Ivan Frolkov

Yes, I am

@x4m
Copy link
Owner

x4m commented Mar 23, 2023

Sure, I'll chat with Ivan next monday on pgconf.ru .

So, let's work on this implementation. Where do we start to do it better?

@sergeyprokhorenko
Copy link
Author

sergeyprokhorenko commented Mar 23, 2023

I know about the conference: https://pgconf.ru/2023/345765
We should start from co-operation with Postgres Professional: discussion of the features required and work planning

@sergeyprokhorenko
Copy link
Author

sergeyprokhorenko commented Mar 24, 2023

I talked to Ivan Panchenko about ULIDs with counters 4 years ego. But it was too early

Oleg Bartunov was aware of this discussion

@sergeyprokhorenko
Copy link
Author

I added the 17th feature into the table

@sergeyprokhorenko
Copy link
Author

Andrey, did you manage to discuss the joint development of the UUIDv7 generator with Ivan Frolkov?

@x4m
Copy link
Owner

x4m commented Mar 29, 2023

Not yet, the conference start on April 2nd (I was wrong about next Monday, sorry)

@sergeyprokhorenko
Copy link
Author

My drawing of the proposed UUIDv7 structure:
UUID structure

@x4m
Copy link
Owner

x4m commented Jul 29, 2023 via email

@x4m
Copy link
Owner

x4m commented Jul 29, 2023

BTW this 15 bits counter and 65 bits of entropy (compared to 16 bits counter and 64 bits of entropy) is going to be significant performance drainer. We do not generate entropy bit by bit. We are going to generate 5 unaligned bytes and overwrite 7 bits by counter.
Personally I don't see any benefits from this counter at all.

@x4m
Copy link
Owner

x4m commented Jul 29, 2023

OK, I've read the standard. This particular version https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc4122bis-08#name-uuid-version-7

  1. the counter is totally optional. So current implementation adheres to standard.
  2. randomness does not sprawl to counter's byte. So having counter will not affect performance.

@sergeyprokhorenko
Copy link
Author

sergeyprokhorenko commented Jul 30, 2023

Andrey,

My Telegram account is https://t.me/SergeyProkhorenko (@SergeyProkhorenko)

The rate of UUIDv7 generation by modern computers usually does not allow filling the counter completely longer than 15 bits. But I don't mind a little increase in the length of the counter.

The counter must be big-endian, as is the timestamp.

The counter has the following benefits:

  • Higher search speed for records created as a result of bulk generation of records at submillisecond intervals when the system clock is not sufficiently accurate
  • Space saving (for longer UUIDv7's pseudo-random segment and therefore better collision resistance) compared to submillisecond timestamp segment
  • Better collision resistance due to initialization with a pseudo-random number compared to submillisecond timestamp segment

The initialization every millisecond and increment of the 15-bit counter are independent of the generation of the 59-bit UUIDv7's pseudo-random segment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants