
Feature Requests Roadmap


(this is derived from an email exchange between two contributors)

In no particular order:

Ability to define an aggregation pattern that can be saved as a schema in the database

Use case

I know my typical aggregation pattern, so I define it ONCE in/as the database schema.

Later, when I live-query data or use deferred data processing, I can reuse the already-saved aggregation pattern.

This feature shall facilitate ease of use and save time.

Comment

I've thought about this one for sure. It's actually possible to do externally already, just not very magically. I'll learn more when I get more internal people pushing it.
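As a rough illustration of the "external" approach, the sketch below saves the pattern once as an ordinary JSON document and later rebuilds query parameters from it. The /patterns location, the field names and the group/ptr/reducer parameters are assumptions for illustration only, not a documented API.

```go
// Sketch: a saved aggregation pattern kept as a plain document and reused
// when building queries. Everything about the HTTP layout here is assumed.
package aggpattern

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// AggPattern is a hypothetical "saved schema" for a recurring aggregation.
type AggPattern struct {
	GroupMillis int    `json:"group"`   // bucket size, e.g. 300000 for 5 minutes
	Pointer     string `json:"ptr"`     // JSON pointer into each stored document
	Reducer     string `json:"reducer"` // e.g. "avg", "sum", "count"
}

// savePattern stores the pattern once, as an ordinary document.
func savePattern(base, name string, p AggPattern) error {
	body, err := json.Marshal(p)
	if err != nil {
		return err
	}
	resp, err := http.Post(base+"/patterns/"+url.PathEscape(name),
		"application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// queryURL rebuilds a query from a previously saved pattern.
func queryURL(base, db string, p AggPattern, from, to string) string {
	v := url.Values{}
	v.Set("from", from)
	v.Set("to", to)
	v.Set("group", fmt.Sprint(p.GroupMillis))
	v.Set("ptr", p.Pointer)
	v.Set("reducer", p.Reducer)
	return base + "/" + db + "/_query?" + v.Encode()
}
```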

Deferred data processing

Use case

When the dataset [the input to data-processing tasks] is too big, i.e. live queries become "unresponsive", or when most reporting/analytics tasks are done with aggregated data.

Idea

Use something like stored procedures for aggregation/map/reduce/pipes [as in Mongo 2.2].

Example

I collect events from application servers. For my reporting purposes I do not need the raw data, but I need it aggregated at 5 min, 15 min, 1 hr, 12 hr and 24 hr intervals. I do not need the aggregated data in real time, and there are 20 people like me who hammer the aggregated data. I (or the application) define rules for data aggregation/pipes/etc. and schedule when they shall run [e.g. at night]; that is, I schedule a batch-processing task which aggregates data according to the already-defined db schema and saves the results into the database. When I query the data later on, I can use the pre-computed results instead of running a live query each time.

Comment

I could probably prioritize the query/doc processing and get most of this out of the way, or something like what I've been thinking about for #4.
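A hedged sketch of what the scheduled batch roll-up from the example could look like, assuming an HTTP query interface: the host and port, the _query endpoint, its group/ptr/reducer parameters and the roll-up database naming are all assumptions, not the actual API.

```go
// Sketch of the deferred-processing idea: a scheduled job re-runs the
// aggregation for each interval and stores the pre-computed result in a
// separate database, so readers hit the roll-up instead of the raw events.
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

// The intervals from the example: 5 min, 15 min, 1 hr, 12 hrs and 24 hrs.
var intervals = []time.Duration{
	5 * time.Minute, 15 * time.Minute, time.Hour, 12 * time.Hour, 24 * time.Hour,
}

func rollUp(base, db string, group time.Duration) error {
	// Hypothetical query: group documents into fixed buckets and reduce
	// them server-side.
	q := fmt.Sprintf("%s/%s/_query?group=%d&ptr=/value&reducer=avg",
		base, db, group.Milliseconds())
	resp, err := http.Get(q)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	// Save the pre-computed result into a roll-up database named after the
	// interval, e.g. events_rollup_5m0s.
	out, err := http.Post(fmt.Sprintf("%s/%s_rollup_%s", base, db, group),
		"application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	return out.Body.Close()
}

func main() {
	for range time.Tick(24 * time.Hour) { // e.g. run the batch nightly
		for _, g := range intervals {
			if err := rollUp("http://localhost:3133", "events", g); err != nil {
				log.Println("rollup failed:", err)
			}
		}
	}
}
```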

Availability

Idea

Add master-slave(s) replication and automatic failover. This shall facilitate ease of use, reduce maintenance, and help user adoption.

Comment

I've been tempted to add replication -- not because I need it, but because it's just really easy. Master-slave is completely trivial. Master-master isn't hard, but requires tracking a tiny bit of state that I don't have an easy way to do yet. It'd be worth it just for fun.
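For illustration, a toy pull-based master-to-slave loop might look like the sketch below. The _all endpoint, its "from" parameter and the key-to-document response format are assumptions about the wire format, and automatic failover is left out entirely.

```go
// Toy master→slave pull loop: the slave remembers the newest key it has
// copied and periodically asks the master for anything newer.
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"net/url"
	"strings"
	"time"
)

func pull(master, slave, db, since string) (string, error) {
	resp, err := http.Get(master + "/" + db + "/_all?from=" + url.QueryEscape(since))
	if err != nil {
		return since, err
	}
	defer resp.Body.Close()

	// Assume the dump is a map of timestamp-key → document.
	docs := map[string]json.RawMessage{}
	if err := json.NewDecoder(resp.Body).Decode(&docs); err != nil {
		return since, err
	}
	for k, doc := range docs {
		req, err := http.NewRequest(http.MethodPut,
			slave+"/"+db+"/"+k, strings.NewReader(string(doc)))
		if err != nil {
			return since, err
		}
		rsp, err := http.DefaultClient.Do(req)
		if err != nil {
			return since, err
		}
		rsp.Body.Close()
		if k > since {
			since = k // timestamp keys sort lexically
		}
	}
	return since, nil
}

func main() {
	since := ""
	for range time.Tick(10 * time.Second) {
		var err error
		if since, err = pull("http://master:3133", "http://slave:3133",
			"events", since); err != nil {
			log.Println("replication error:", err)
		}
	}
}
```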

Sharding data automatically and/or easily across available nodes

Comment

I have a lot of infrastructure for this. To be efficient, I need something like _all_docs that doesn't include the values and/or something like get that evaluates jsonpointer. Then you could pretty well round-robin your writes and have a front-end that does this last-step work. Harvest the range from all nodes concurrently while collating the keys. Once you find a boundary from every node, you have a fully defined chunk and can start doing reductions on it. A slightly harder, but more efficient integration is to have two-phase reduction and let the leaves do a bunch of the work while the central thing just does collation. You wouldn't be able to stream results in that scenario, though.
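A small sketch of the front-end role described above, under the assumption that each node can already stream its keys in sorted order; the node list and the in-memory slices stand in for the real concurrent HTTP harvesting.

```go
// Front-end sketch: round-robin writes across shard nodes, then merge each
// node's already-sorted key stream into one ordered sequence before reducing.
package shardfe

import "sync/atomic"

type Frontend struct {
	nodes []string // e.g. {"http://n1:3133", "http://n2:3133"}
	next  uint64
}

// pickNode round-robins writes across the available nodes.
func (f *Frontend) pickNode() string {
	n := atomic.AddUint64(&f.next, 1)
	return f.nodes[n%uint64(len(f.nodes))]
}

// mergeSorted is the collation step: each node returns its keys in order and
// the front-end merges those streams. Once every node has produced a key past
// some boundary, everything below that boundary is a fully defined chunk and
// can be handed off for reduction.
func mergeSorted(perNode [][]string) []string {
	idx := make([]int, len(perNode))
	var out []string
	for {
		bestNode := -1
		for i, keys := range perNode {
			if idx[i] < len(keys) &&
				(bestNode == -1 || keys[idx[i]] < perNode[bestNode][idx[bestNode]]) {
				bestNode = i
			}
		}
		if bestNode == -1 {
			return out
		}
		out = append(out, perNode[bestNode][idx[bestNode]])
		idx[bestNode]++
	}
}
```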

Some way to ensure that data is never lost

Ability to guarantee that data was safely delivered to the database and stored there.

Comment

Is this as simple as disabling DELETE, and disabling PUT except where a document doesn't already exist?
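If the answer is yes, it could be little more than a gatekeeper in front of the store. A minimal sketch, assuming an HTTP front-end and a hypothetical exists() lookup:

```go
// One possible reading of the comment above, as middleware: refuse DELETE
// outright, and refuse a PUT that would overwrite an existing document, so
// data can only ever be added. exists() is a stand-in for whatever lookup
// the real store provides.
package appendonly

import "net/http"

func AppendOnly(next http.Handler, exists func(path string) bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch {
		case r.Method == http.MethodDelete:
			http.Error(w, "deletes are disabled", http.StatusForbidden)
		case r.Method == http.MethodPut && exists(r.URL.Path):
			http.Error(w, "existing documents cannot be overwritten", http.StatusConflict)
		default:
			next.ServeHTTP(w, r)
		}
	})
}
```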