Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Filter by indecies #255

Open
blouflashdb opened this issue Jan 3, 2025 · 11 comments
Open

Question: Filter by indecies #255

blouflashdb opened this issue Jan 3, 2025 · 11 comments

Comments

@blouflashdb
Copy link

Is it faster if I do filtering by properties that are a primary or secondary index?

There is already a built in way to filter fast by the documents KvId via startId and endId but I dont seem to find anything that allows something like this using denokvs build in start and end of list

I would love to do something like this with this awesome library.

const result = await db.users.getManyBySecondaryOrder("age", {
  startId: 18,
  endId: 20
})
@oliver-oloughlin
Copy link
Owner

Hey, that's an interesting use case! When finding documents it will always be faster with a primary id, as it is a single lookup. But in this case im not sure how indices will actually come into play, as the provided methods work on a single index value, while if I understand correctly, here you want to filter documents by a range of index values. I see how that could be very useful for faster filtering. Definitely something I want to explore when I get the time, thanks for bringing it up!

@oliver-oloughlin
Copy link
Owner

I was a little too quick with my reasoning. As the secondary order methods do in fact work on the range of index values as opposed to a single value, this might actually work as expected. I will do some testing now! And it might make a lot of sense to add equivalent methods for primary indices. As for your question then: It will probably be equally fast for filtering with primary as with secondary indices, as both are a list operation under the hood. Tye difference only being that filtering by primary index will only collect unique entries, while secondary index filtering allows for duplicate values.

@oliver-oloughlin
Copy link
Owner

Ok so the startId and endId in a secondary order call does unfortunately still refer to the actual ID of the document. I'm not certain it is possible to introduce a start and end value for index values, as the id of the document is required to form the full document key, which are used under the hood for startKey and endKey.

@blouflashdb
Copy link
Author

Thanks for your replies.

So my goal is to get all users with the age from 18 to 20 in a fast way. Since age is an secondary index meaning it is part of a KvKey the start and end from kv.list could be used to retrieve them very fast. I believe the built in filter of kvdex would loop over all documents which is a bit to slow when there are alot of ducuments in a collection.

@oliver-oloughlin
Copy link
Owner

oliver-oloughlin commented Jan 4, 2025

The best way to achieve what you want right now is probably to get the first document with age = 18, and the last document with age = 20, and then use getManyBySecondaryOrder("age", { startId: first18.id, endId: last20.id }). You can get the first with age = 18 by using getOneBySecondaryIndex("age", 18), and then the last with age = 20 using getOneBySecondaryIndex("age", 20, { reverse: true }). I haven't tested this yet so it's just a proposal, but I think it should work.

@blouflashdb
Copy link
Author

I will try this out tomorrow but I am not sure if this will work because startAt and endAt is for the document Id and between first document id with age 18 and last document id with age 20 could be a document id with age 21.

Lets imagine this collection:

id: 1
username: test
age: 18

id: 2
username: test2
age: 21

id: 3
username: test3
age: 20

If i put in startAt: 1 and endAt: 3 I think I will also get document with id 2

@oliver-oloughlin
Copy link
Owner

oliver-oloughlin commented Jan 5, 2025

When using getManyBySecondaryOrder("age") the docuemnts are ordered by age, so it should be possible to select a start and end id, as the age part preceeds the id part of the document key. But, we need additional start and end properties so that we know to construct the indexed key instead of the normal document key. This shouldn't be too difficult to implement. What I really want to do is to abstract this so we dont need to manually get the first and last document, but instead can simply specify the range like you first proposed. One problem is that it requires a document with age = 18, and one with age = 20 to exist. I will investigate if this can be solved somehow.

@blouflashdb
Copy link
Author

Actually let me show you my real usecase.

Currently I use denokv like this:

I add transactions like this

export interface PayoutTransaction {
  hash: string; // unique
  amount: number;
  type: string;
  recipient: string; // not equal to address
}

async function addPayout(
  address: string,
  datetime: number,
  payoutTransaction: PayoutTransaction,
) {
  await kv.set(
    ["reward_payout", address, datetime],
    payoutTransaction,
  );
}

And then the user (address) can fetch his reward_payout`s between two timestamps from my backend like this:

    const address = getRouterParam(event, 'address') // from GET request
    const start = getRouterParam(event, 'start') // from GET request
    const end = getRouterParam(event, 'end') // from GET request

    const startDate = new Date(start!)
    const startDateTime = startDate.setUTCHours(0, 0, 0, 0)

    const endDate = new Date(end!)
    const endDateTime = endDate.setUTCHours(23, 59, 59, 999)

    const list = kv.list<PayoutTransaction>({ start: ['reward_payout', address, startDateTime], end: ['reward_payout', address, endDateTime] })

This is very fast and I want to be able to do the same with kvdex. Also note that in this case the startDateTime and endDateTime are not required to have a value. For example the user can fetch his data from the Year 2026 to Year 2027 without the backend crashing even tho there are no keys with this start end end date.

Also I would like to know how I need to do my schema for this. Could you please give me an example for this case?

@oliver-oloughlin
Copy link
Owner

oliver-oloughlin commented Jan 5, 2025

Ah, I see, that's really useful information. I made this little example, which works when I run it:

import { collection, kvdex, type KvObject, model } from "jsr:@olli/[email protected]";

async function sleep(ms: number) {
  await new Promise((r) => setTimeout(r, ms));
}

interface PayoutTransaction extends KvObject {
  hash: string; // unique
  amount: number;
  type: string;
  recipient: string; // not equal to address
}

const db = kvdex({
  kv: await Deno.openKv(":memory:"),
  schema: {
    payoutTransactions: collection(model<PayoutTransaction>(), {
      idGenerator: () => Date.now(),
      indices: {
        hash: "primary",
      },
    }),
  },
});

await db.payoutTransactions.add({
  hash: "unique_hash_1",
  amount: 100,
  type: "type",
  recipient: "recipient_id",
});

await sleep(100);

const start = Date.now();

await db.payoutTransactions.add({
  hash: "unique_hash_2",
  amount: 200,
  type: "type",
  recipient: "recipient_id",
});

await sleep(100);

await db.payoutTransactions.add({
  hash: "unique_hash_3",
  amount: 300,
  type: "type",
  recipient: "recipient_id",
});

await sleep(100);

const end = Date.now();

await db.payoutTransactions.add({
  hash: "unique_hash_4",
  amount: 400,
  type: "type",
  recipient: "recipient_id",
});

const { result } = await db.payoutTransactions.getMany({
  startId: start,
  endId: end,
});

console.log(result);

I hope this helps. You can see here we override the default id generator, setting it to a number timestamp (same should actually work with the default id, as it is a ULID, but that would be a string in that case). Then we can select a range based on the timestamp id. If you really want ["reward_payout", "address"] to be part of the underlying document key, then you can use a nested schema like so:

const db = kvdex({
  kv: await Deno.openKv(":memory:"),
  schema: {
    reward_payout: {
      address: collection(model<PayoutTransaction>(), {
        idGenerator: () => Date.now(),
        indices: {
          hash: "primary",
        },
      }),
    },
  },
});

Edit: Ignore that last part, just realised address is a dynamic value. That would mean its not actually possible to mimic.

@blouflashdb
Copy link
Author

blouflashdb commented Jan 5, 2025

I dont understand how include the address in your example. It is not a const string but the identifier of the user. For example this:

const listUser1 = kv.list<PayoutTransaction>({ start: ['reward_payout', "NQ77 0000 0000 0000 0001", startDateTime], end: ['reward_payout', "NQ77 0000 0000 0000 0001", endDateTime] })
const listUser2 = kv.list<PayoutTransaction>({ start: ['reward_payout', "NQ88 0000 0000 0000 0002", startDateTime], end: ['reward_payout', "NQ88 0000 0000 0000 0002", endDateTime] })

@blouflashdb
Copy link
Author

Do you think you could add support for this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants