---
layout: docu
title: Frequently Asked Questions
---
DuckDB is maintained by Dr. Mark Raasveldt & Prof. Dr. Hannes Mühleisen along with many other contributors from all over the world. Mark and Hannes have set up the DuckDB Foundation that collects donations and funds development and maintenance of DuckDB. Mark and Hannes are also co-founders of DuckDB Labs, which provides commercial services around DuckDB. Several other DuckDB contributors are also affiliated with DuckDB Labs.
DuckDB's initial development took place at the Database Architectures Group at the Centrum Wiskunde & Informatica (CWI) in Amsterdam, The Netherlands.
Ducks are amazing animals. They can fly, walk and swim. They can also live off pretty much everything. They are quite resilient to environmental challenges. A duck's song will bring people back from the dead and inspires database research. They are thus the perfect mascot for a versatile and resilient data management system. Also the logo designs itself.
DuckDB is the name of the MIT-licensed open-source project.
The [DuckDB Foundation]({% link foundation/index.html %}) is a non-profit organization that holds the intellectual property of the DuckDB project.
Its statutes also ensure DuckDB remains open-source under the MIT license in perpetuity.
Donations to the DuckDB Foundation directly fund DuckDB development.
DuckDB Labs is a company based in Amsterdam that provides commercial support services for DuckDB.
DuckDB Labs employs the core contributors of the DuckDB project.
MotherDuck is a venture-backed company creating a hybrid cloud/local platform using DuckDB.
MotherDuck contracts with DuckDB Labs for development services, and DuckDB Labs owns a portion of MotherDuck.
See the partnership announcement for details.
To learn more about MotherDuck, see the CIDR 2024 paper on MotherDuck and the MotherDuck documentation.
You can download the DuckDB Logo here:
- Horizontal logo: [svg](/images/logo-dl/DuckDB_Logo-horizontal.svg) / [png](/images/logo-dl/DuckDB_Logo-horizontal.png) / [pdf](/images/logo-dl/DuckDB_Logo-horizontal.pdf)
Inverted variants for dark backgrounds:
- Horizontal logo: [svg](/images/logo-dl/DuckDB_Logo-horizontal-dark-mode.svg) / [png](/images/logo-dl/DuckDB_Logo-horizontal-dark-mode.png) / [pdf](/images/logo-dl/DuckDB_Logo-horizontal-dark-mode.pdf)
The DuckDB logo & website were designed by Jonathan Auch & Max Wohlleber.
Please consult the [trademark guidelines for DuckDB™]({% link trademark_guidelines.md %}).
DuckDB supports [persistent storage]({% link docs/connect/overview.md %}#persistent-database) and stores the database as a single file, which includes all tables, views, indexes, macros, etc. present in the database. DuckDB's [storage format]({% link docs/internals/storage.md %}) uses a compressed columnar representation, which is compact but allows for efficient bulk updates. DuckDB can also run in [in-memory mode]({% link docs/connect/overview.md %}#in-memory-database), where no data is persisted to disk.
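
As a rough illustration of the two modes, the sketch below attaches one persistent and one in-memory database from SQL. The file name `my_database.duckdb` and the aliases `persistent_db` and `scratch_db` are hypothetical names chosen for this example.

```sql
-- Attach a persistent database file (hypothetical name); DuckDB creates it if it does not exist.
ATTACH 'my_database.duckdb' AS persistent_db;
CREATE TABLE persistent_db.tbl AS SELECT 42 AS answer;

-- Attach an additional in-memory database; its contents are not persisted to disk.
ATTACH ':memory:' AS scratch_db;
CREATE TABLE scratch_db.tbl AS SELECT 42 AS answer;
```
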
The type of storage used to run DuckDB has a [significant performance impact]({% link docs/guides/performance/environment.md %}#disk). In general, using SSDs (SATA or NVMe SSDs) leads to superior performance compared to HDDs. The recommended storage location depends on the workload. For read-only workloads, the DuckDB database can be stored on local disks, on remote endpoints such as [HTTPS]({% link docs/extensions/httpfs/https.md %}), and on cloud object storage such as [AWS S3]({% link docs/extensions/httpfs/s3api.md %}) and similar providers. For read-write workloads, storing the database on instance-attached storage yields the best performance. Network-attached cloud storage such as AWS EBS also works, and its performance can be fine-tuned with the guaranteed IOPS settings. Based on our experience, we advise against running read-write DuckDB workloads on on-premises network-attached storage (NAS). These setups are often slow and result in spurious failures that are difficult to troubleshoot.
It is a common misconception that DuckDB is an in-memory database. While DuckDB can work in-memory, it is not an in-memory database. DuckDB can make use of available memory for caching, but it also fully supports disk-based persistence and [offloading larger-than-memory operations]({% link docs/guides/performance/how_to_tune_workloads.md %}#larger-than-memory-workloads-out-of-core-processing) to disk.
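
A minimal sketch of the settings that relate to larger-than-memory processing is shown below. The limit of `4GB` and the directory `/tmp/duckdb_spill` are illustrative values assumed for this example, not recommendations.

```sql
-- Cap the memory DuckDB may use (illustrative value).
SET memory_limit = '4GB';

-- Directory for temporary files written when an operation
-- does not fit in memory (hypothetical path).
SET temp_directory = '/tmp/duckdb_spill';
```
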
Since version 0.10.0 (released in February 2024), DuckDB is backwards-compatible when reading database files, i.e., newer versions of DuckDB are always able to read database files created with an older version of DuckDB. DuckDB also provides partial forwards-compatibility on a best-effort basis. See the [storage page]({% link docs/internals/storage.md %}) for more details. Compatibility is also guaranteed between different DuckDB clients (e.g., Python and R): a database file created with one client can be read with other clients.
DuckDB does not use explicit SIMD (single instruction, multiple data) instructions because they greatly complicate portability and compilation. Instead, DuckDB uses implicit SIMD, where we go to great lengths to write our C++ code in such a way that the compiler can auto-generate SIMD instructions for the specific hardware. As an example of why this is a good idea, it took 10 minutes to port DuckDB to the Apple Silicon architecture.
We welcome experiments comparing DuckDB's performance to other systems. To ensure a fair comparison, we have a few recommendations. First, try to use the [latest DuckDB version available as a nightly build]({% link docs/installation/index.html %}), which often has significant performance improvements compared to the last stable release. Second, consider consulting our DBTest 2018 paper Fair Benchmarking Considered Difficult: Common Pitfalls In Database Performance Testing for guidelines on how to avoid common issues in benchmarks. Third, study the DuckDB [Performance Guide]({% link docs/guides/performance/overview.md %}), which has best practices for ensuring optimal performance. Finally, please report the DuckDB version: the version number for stable versions, or the commit hash for nightly builds.
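
One way to look up the version information to report is sketched below; either statement can be run from any DuckDB client.

```sql
-- Returns the library version together with the source (commit) identifier.
PRAGMA version;

-- Returns only the version string.
SELECT version();
```
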
DuckDB was designed with both data science and data engineering workloads in mind. Therefore, you can use DuckDB's SQL syntax to be highly flexible, or very precise, depending on your needs.
For data science users, who often run queries in an interactive fashion, DuckDB offers several mechanisms for quickly exploring data sets.
For example, CSV files can be loaded by [auto-inferring their schema]({% link docs/data/csv/auto_detection.md %}) using `CREATE TABLE tbl AS FROM 'input.csv'`.
Moreover, there are numerous SQL shorthands known as [“friendly SQL”]({% link docs/sql/dialect/friendly_sql.md %}) for more concise expressions, e.g., the [`GROUP BY ALL` clause]({% link docs/sql/query_syntax/groupby.md %}#group-by-all).
For data engineering use cases, DuckDB allows full control over the loading process, so it is possible to define the precise schema using a `CREATE TABLE tbl ⟨schema⟩` statement and populate it using a [`COPY` statement]({% link docs/sql/statements/copy.md %}) that specifies the CSV's dialect (delimiter, quotes, etc.).
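
The two loading styles described above can be sketched as follows. The table name `tbl_strict`, its columns, and the dialect options are assumptions made for illustration; adjust them to the actual contents of `input.csv`.

```sql
-- Interactive style: the schema is inferred from the file.
CREATE TABLE tbl AS FROM 'input.csv';

-- Pipeline style: the schema is declared explicitly (hypothetical columns) ...
CREATE TABLE tbl_strict (
    id     INTEGER,
    name   VARCHAR,
    amount DECIMAL(10, 2)
);

-- ... and the CSV dialect is spelled out in the COPY statement.
COPY tbl_strict FROM 'input.csv' (FORMAT csv, HEADER true, DELIMITER ',', QUOTE '"');
```
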
Most friendly SQL extensions are simple to rewrite to SQL queries that are fully compatible with PostgreSQL.
For example, the `GROUP BY ALL` clause can be replaced with a `GROUP BY` clause and an explicit list of columns.
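
As a small illustration of such a rewrite, the two queries below should be equivalent. The table `sales` and its columns are hypothetical.

```sql
-- DuckDB friendly SQL: group by every non-aggregated column in the SELECT list.
SELECT city, product, sum(amount) AS total
FROM sales
GROUP BY ALL;

-- Portable form with the grouping columns listed explicitly.
SELECT city, product, sum(amount) AS total
FROM sales
GROUP BY city, product;
```
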
DuckDB's use cases can be split into roughly three major categories. First, DuckDB can be used for interactive data analysis by a user (“data science”). Second, it can serve as a pipeline component for automated data processing (“data engineering”). Third, DuckDB can be deployed in novel architectures, where one traditionally couldn't run an analytical database management system but DuckDB is available thanks to its portability. These architectures include running DuckDB in browsers (using the WebAssembly client) and on smartphones. Additionally, DuckDB's extensions unlock use cases such as geospatial analysis and deep integration with other database systems. And finally, in some cases, DuckDB doesn't even need data to be a database.
Please check the [release calendar]({% link docs/dev/release_calendar.md %}) for the planned release date of the next stable version of DuckDB.
Currently, we do not maintain a public development roadmap. We discuss planned developments at DuckCon events (typically held twice a year). See the most recent overview talk at DuckCon #5.
The DuckDB website is hosted by GitHub Pages; its repository is at `duckdb/duckdb-web`.
When the documentation is browsed from a desktop computer, every page has a “Page Source” button on the top that navigates you to its Markdown source file.
Pull requests to fix issues or to expand the documentation section on DuckDB's features are very welcome.
Before opening a pull request, please consult our Contributor Guide.