Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for DuckDb #153

Closed
azimov opened this issue Jul 23, 2021 · 6 comments
Closed

Support for DuckDb #153

azimov opened this issue Jul 23, 2021 · 6 comments
Labels
enhancement New functionality that could be added help wanted Will probably not be addressed by the package maintainer, but could be addressed by someone else

Comments

@azimov
Copy link
Contributor

azimov commented Jul 23, 2021

I've recently found some information on DuckDb which is an embedded database similar to Sqlite but is designed for analytics use cases.

Some of the appealing aspects of this project are:

  • Apparently supports full postgresql query support, so very few (if any) changes would need to be made to SqlRender to support it
  • Vectorised approach to querying, is significantly faster than sequential operations in Sqlite
  • Support for parallel operations with multiple threading
  • Has R DBI support and can be installed from CRAN as C++ source or binary
  • Very few dependencies - it appears to just depend on DBI
  • Single file storage, like sqlite, which is handy for dataset transfer
  • In memory support, which could be extremely useful for fast unit tests
  • Support for schemas (which I don't believe SQLite does)

It also mentions native support for R and python pandas data.frames, I'm not sure what this means but bulk upload could be trivially supported with a call to duckdb_read_csv

A big negative is that it's not a very mature project, so it should be properly evaluated and well tested.

@schuemie
Copy link
Member

schuemie commented Oct 6, 2021

@ablack3 also seems to be considering DuckDB here. Maybe you can work on evaluating DuckDB together?

@ablack3
Copy link
Collaborator

ablack3 commented Apr 12, 2022

Yes I would like to support duckdb. One more big benefit is support for date and datetimes in duckdb. (Sqlite does not have a date or datetime type and the workarounds are ok but not great.)

This could be done similarly to SQLite. However I don't think the way DatabaseConnector currently implements SQLite connection is ideal. It creates a new DBI connection object and puts the SQLite connection inside. Ideally DatabaseConnector would be able to use a RSqlite connection or duckdb connection directly without wrapping in another connection object. If this were the case then DatabaseConnector could simply use all the DBI methods defined in the duckdb package. If the driver backend is already implemented we should just use it rather than create new methods. So refactoring DatabaseConnector to divide the parts that implement the DBI methods (the backend driver for jdbc connections) and all the other functions that are not part of the driver should let us use duckdb (or any other driver) pretty easily. We would still need SQLRender support but since duckdb essentially uses postgres sql it shouldn't be a big addition.

I started working on this and feel like it is the right direction. It seems like maybe it could be done without breaking the interface (all existing tests would pass on the refactored version) but I'm not completely sure.

image

@schuemie schuemie added exploratory Changes that require some research first help wanted Will probably not be addressed by the package maintainer, but could be addressed by someone else labels Jun 7, 2022
@ablack3
Copy link
Collaborator

ablack3 commented Jan 30, 2023

Eunomia support for duckdb is moving forward. Rather than the major change I outlined above I'd like to propose a smaller change so that duckb can be supported by DatabaseConnector in exactly the same way that it supports Sqlite. It's really just a copy-paste of the sqlite code where we change RSqlite to duckdb. I'll open a PR. Tagging @fdefalco

@ablack3 ablack3 mentioned this issue Jan 30, 2023
@schuemie
Copy link
Member

The new createDbiConnectionDetails() function should be helpful in developing this, although for DuckDB to really be supported this function should eventually be used under the hood like SQLite and like Spark via ODBC. (DatabaseConnector should support a specific set of DBI drivers, not DBI drivers in general, because behavior would be unpredictable)

@schuemie schuemie added enhancement New functionality that could be added and removed exploratory Changes that require some research first labels Jan 31, 2023
@schuemie
Copy link
Member

Implemented

@ablack3
Copy link
Collaborator

ablack3 commented Apr 20, 2023

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New functionality that could be added help wanted Will probably not be addressed by the package maintainer, but could be addressed by someone else
Projects
None yet
Development

No branches or pull requests

3 participants