MariaDB + DuckDB: A New Playground for Analytics - A First Look at the New Storage Engine - lefred blog: tribulations of a MariaDB Community Advocate

MariaDB just announced it has learned to quack: the new DuckDB storage engine has joined the large family of storage engines in MariaDB Server.

The idea is very interesting: use MariaDB Server as usual, but create some tables using ENGINE=DuckDB and benefit from DuckDB’s columnar storage and vectorized execution for analytical queries.

In other words, we can keep our transactional workload in InnoDB, and use DuckDB tables for analytics… in the same MariaDB instance. Not a new concept, but a new and powerful implementation!

This is, of course, not something I would recommend for production today; it’s still alpha, but it’s exactly the kind of feature I like to test, and I recommend you test it to help it move to the next quality level as quickly as possible.

Let’s have a look.

The Goal

In this post, I want to test:

how to enable the DuckDB storage engine in MariaDB
how to create a DuckDB table
how to load some data
how to query it
how to join an InnoDB table with a DuckDB table
how far we can go with DuckDB-specific features like Parquet
some limitations

Installation

For this first test, I used the packages provided by MariaDB’s CI and dbdeployer.

Please remember that these packages are unsigned and for testing only.

$ wget https://ci.mariadb.org/68929/amd64-fedora-42/mariadb-11.4.13-linux-x86_64.tar.gz

$ dbdeployer unpack mariadb-11.4.13-linux-x86_64.tar.gz

After installing the server, we need to load the plugin at startup by modifying the configuration file (my.sandbox.cnf within dbdeployer’s sandbox):

plugin-maturity=alpha
plugin-load-add=ha_duckdb.so
duckdb-memory-limit=8G

The plugin-maturity=alpha line is mandatory because the plugin is currently marked as alpha.

I also increased duckdb-memory-limit, because the default value is only 1G and that can quickly become the first limitation when loading or querying larger datasets.

We can restart the instance and verify in the error log that the plugin was correctly loaded:

$ ./restart

$ ./show_log | tail -n 11
2026-06-11 12:44:52 0 [Note] InnoDB: Buffer pool(s) load completed at 260611 12:44:52
2026-06-11 12:44:52 0 [Note] DuckDB: registered MySQL-compatible function overloads 
(octet_length, length, ascii, ord, hex, oct, bin, locate, mid, rtrim,
 ltrim, regexp_instr, regexp_replace, regexp_substr, json_unquote,
 json_contains)
2026-06-11 12:44:52 0 [Note] DuckDB: cross-engine scan registered (_mdb_scan + replacement scan)
2026-06-11 12:44:52 0 [Note] DuckDB: DuckdbManager::Initialize succeed, path=/var/run/media/fred/DATA/DBDEPLOYER/sandboxes/msb_11_4_13/data/duckdb.db
2026-06-11 12:44:52 0 [Note] DuckDB storage engine initialized
2026-06-11 12:44:52 0 [Note] Plugin 'FEEDBACK' is disabled.
2026-06-11 12:44:52 0 [Note] Plugin 'wsrep-provider' is disabled.
2026-06-11 12:44:52 0 [Note] Server socket created on IP: '127.0.0.1', port: '11413'.
2026-06-11 12:44:52 0 [Note] mariadbd: Event Scheduler: Loaded 0 events
2026-06-11 12:44:52 0 [Note] /var/run/media/fred/DATA/DBDEPLOYER/opt/11.4.13/bin/mariadbd: ready for connections.
Version: '11.4.13-MariaDB'  socket: '/tmp/mysql_sandbox11413.sock'  port: 11413  Source distribution

We can connect to our instance with the mariadb client and check the engines using SHOW ENGINES\G and we should see the following:

...
*************************** 7. row ***************************
      Engine: DUCKDB
     Support: YES
     Comment: DuckDB storage engine
Transactions: YES
          XA: YES
  Savepoints: NO

We can also see new global variables and status variables dedicated to DuckDB:

mariadb > show global variables like 'duc%';
+-------------------------------------------+---------------------+
| Variable_name                             | Value               |
+-------------------------------------------+---------------------+
| duckdb_appender_allocator_flush_threshold | 67108864            |
| duckdb_checkpoint_threshold               | 268435456           |
| duckdb_copy_ddl_in_batch                  | ON                  |
| duckdb_disabled_optimizers                |                     |
| duckdb_dml_in_batch                       | ON                  |
| duckdb_explain_output                     | PHYSICAL_ONLY       |
| duckdb_force_no_collation                 | OFF                 |
| duckdb_log_options                        |                     |
| duckdb_max_temp_directory_size            | 0                   |
| duckdb_max_threads                        | 0                   |
| duckdb_memory_limit                       | 8589934592          |
| duckdb_merge_join_threshold               | 4611686018427387904 |
| duckdb_require_primary_key                | ON                  |
| duckdb_scheduler_process_partial          | ON                  |
| duckdb_temp_directory                     |                     |
| duckdb_update_modified_column_only        | ON                  |
| duckdb_use_direct_io                      | OFF                 |
| duckdb_use_double_for_decimal             | ON                  |
+-------------------------------------------+---------------------+
18 rows in set (0.002 sec)

mariadb > show global status like '%duck%';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Duckdb_commit               | 0     |
| Duckdb_rollback             | 0     |
| Duckdb_rows_delete          | 0     |
| Duckdb_rows_delete_in_batch | 0     |
| Duckdb_rows_insert          | 0     |
| Duckdb_rows_insert_in_batch | 0     |
| Duckdb_rows_update          | 0     |
| Duckdb_rows_update_in_batch | 0     |
+-----------------------------+-------+
8 rows in set (0.002 sec)

All good, we can proceed with the next step!

Creating a DuckDB table

My very first test is to create a DuckDB table:

mariadb > CREATE DATABASE ducktest;

mariadb > USE ducktest;

mariadb > CREATE TABLE sales ( 
            id BIGINT PRIMARY KEY, 
            product VARCHAR(64), 
            amount DECIMAL(12,2), 
            sold_at TIMESTAMP ) ENGINE=DuckDB;
ERROR 1478 (HY000): Table storage engine 'DuckDB' does not support the create option 'non-utf8 charset'

This error is expected, as Roman states in his announcement, DuckDB rejects non-UTF8 charsets. The default on 11.4 is latin1.

mariadb > CREATE TABLE sales ( 
            id BIGINT PRIMARY KEY, 
            product VARCHAR(64), 
            amount DECIMAL(12,2), 
            sold_at TIMESTAMP ) ENGINE=DuckDB DEFAULT CHARSET utf8mb4;
Query OK, 0 rows affected (0.014 sec)

Let’s add some rows and try some queries:

mariadb > INSERT INTO sales VALUES (1, 'MariaDB mug', 12.00, NOW()),
             (2, 'DuckDB sticker', 2.50, NOW()),
             (3, 'Sea lion plush', 25.00, NOW()),
             (4, 'DuckDB sticker', 2.50, NOW());
Query OK, 4 rows affected (0.007 sec)
Records: 4  Duplicates: 0  Warnings: 0

mariadb > SELECT * FROM sales;
+----+----------------+--------+---------------------+
| id | product        | amount | sold_at             |
+----+----------------+--------+---------------------+
|  1 | MariaDB mug    |  12.00 | 2026-06-11 12:49:44 |
|  2 | DuckDB sticker |   2.50 | 2026-06-11 12:49:44 |
|  3 | Sea lion plush |  25.00 | 2026-06-11 12:49:44 |
|  4 | DuckDB sticker |   2.50 | 2026-06-11 12:49:44 |
+----+----------------+--------+---------------------+
4 rows in set (0.007 sec)

mariadb > SELECT product, SUM(amount) AS revenue 
          FROM sales GROUP BY product ORDER BY revenue DESC;
+----------------+---------+
| product        | revenue |
+----------------+---------+
| Sea lion plush |   25.00 |
| MariaDB mug    |   12.00 |
| DuckDB sticker |    5.00 |
+----------------+---------+
3 rows in set (0.007 sec)

mariadb > SELECT p.category, SUM(s.amount) AS revenue 
            FROM sales s 
            JOIN products p ON p.product = s.product 
           GROUP BY p.category ORDER BY revenue DESC;
+----------+---------+
| category | revenue |
+----------+---------+
| toys     |   25.00 |
| goodies  |   17.00 |
+----------+---------+
2 rows in set (0.003 sec)

So far, nothing special from a SQL user’s point of view.

And that’s exactly the point!

Mixing InnoDB and DuckDB

The really interesting part is that we can use both engines in the same schema.

Let’s create a small InnoDB table:

mariadb > CREATE TABLE products ( 
            product VARCHAR(64) PRIMARY KEY, 
            category VARCHAR(64) 
          ) ENGINE=InnoDB; 

mariadb > INSERT INTO products VALUES ('MariaDB mug', 'goodies'),
          ('DuckDB sticker', 'goodies'), 
          ('Sea lion plush', 'toys');

This is where this engine becomes very interesting.

We can imagine keeping operational data in InnoDB and moving analytical or historical data into DuckDB tables.

AirportDB

Let’s try DuckDB with a bigger dataset.

I’ve loaded a well-known dataset for those who have tested MySQL HeatWave: airportdb.

I could not load the mysqlsh dump in MariaDB, so I had to load a logical sql dump.

Only the largest table (booking) was loaded into InnoDB, and another copy in DuckDB for comparison.

The load in InnoDB was almost endless… I wasn’t expecting it to be so long, so I didn’t measure it, but it was looooooong. The import in DuckDB was much, much faster!

To make a fair comparison, I reduced the duckdb_memory_limit to 4GB and increased the innodb_buffer_pool_size to 4GB too.

I restarted the MariaDB Server after the load.

mariadb > > SELECT count(*) FROM booking;
+----------+
| count(*) |
+----------+
| 54304619 |
+----------+
1 row in set (26.656 sec)

mariadb > SELECT count(*) FROM booking_duck;
+----------+
| count(*) |
+----------+
| 54304619 |
+----------+
1 row in set (0.005 sec)

We can already see a big difference.

Let’s compare a query, first in InnoDB:

mariadb > SELECT a.airlinename, COUNT(*) AS bookings 
            FROM booking b JOIN flight f ON b.flight_id = f.flight_id 
            JOIN airline a ON f.airline_id = a.airline_id 
           GROUP BY a.airlinename ORDER BY bookings DESC LIMIT 10;
+----------------------+----------+
| airlinename          | bookings |
+----------------------+----------+
| Vanuatu Airlines     |   743353 |
| Peru Airlines        |   703840 |
| Micronesia Airlines  |   700704 |
| Yugoslavia Airlines  |   697159 |
| Hungary Airlines     |   693488 |
| Falkland Is Airlines |   666531 |
| Ethiopia Airlines    |   660362 |
| Tunisia Airlines     |   638740 |
| Swaziland Airlines   |   602344 |
| Puerto Rico Airlines |   602130 |
+----------------------+----------+
10 rows in set (27.973 sec)

And now in DuckDB (joining InnoDB tables too):

mariadb > SELECT a.airlinename, COUNT(*) AS bookings 
            FROM booking_duck b JOIN flight f ON b.flight_id = f.flight_id
            JOIN airline a ON f.airline_id = a.airline_id 
           GROUP BY a.airlinename ORDER BY bookings DESC LIMIT 10;
+----------------------+----------+
| airlinename          | bookings |
+----------------------+----------+
| Vanuatu Airlines     |   743353 |
| Peru Airlines        |   703840 |
| Micronesia Airlines  |   700704 |
| Yugoslavia Airlines  |   697159 |
| Hungary Airlines     |   693488 |
| Falkland Is Airlines |   666531 |
| Ethiopia Airlines    |   660362 |
| Tunisia Airlines     |   638740 |
| Swaziland Airlines   |   602344 |
| Puerto Rico Airlines |   602130 |
+----------------------+----------+
10 rows in set (0.456 sec)

No contest!

Disk Size

If we check the disk consumption, we can see that for the booking table, the InnoDB tablespace is 7.2GB:

$ ls -lh booking*
-rw-rw---- 1 fred fred 2.5K Jun 11 13:53 booking_duck.frm
-rw-rw---- 1 fred fred 2.5K Jun 11 14:05 booking.frm
-rw-rw---- 1 fred fred 7.2G Jun 11 17:47 booking.ibd

DuckDB’s data consumes much less space:

$ ls -lh duck*
-rw-rw---- 1 fred fred 333M Jun 12 08:42 duckdb.db
-rw-rw---- 1 fred fred 8.1K Jun 12 09:47 duckdb.db.wal

This is also something very important, and that’s a significant benefit!

Almost 22 times smaller, a 95.4% size reduction!

Some limitations

During my test, I encountered some limitations, but don’t forget that this is still an Alpha release:

auto_increment

Currently, there is no support for auto_increment. Of course, since the primary purpose is to migrate existing data from InnoDB to DuckDB for analytics, the primary key won’t change, so there is no need to generate it.

But in other situations, supporting auto_increment could be nice too.

mariab > CREATE TABLE invoice ( 
           id BIGINT AUTO_INCREMENT PRIMARY KEY, 
           product VARCHAR(64), 
           amount DECIMAL(12,2), 
           sold_at TIMESTAMP ) ENGINE=DuckDB DEFAULT CHARSET utf8mb4;
ERROR 1164 (42000): Storage engine DUCKDB doesn't support AUTO_INCREMENT columns

Ok, so let’s see if we have other alternatives.

Let’s start with using a SEQUENCE:

mariadb > CREATE SEQUENCE primary_key_seq START WITH 1 INCREMENT BY 1 CACHE 100;
Query OK, 0 rows affected (0.014 sec)

mariadb > CREATE TABLE invoice ( 
            id BIGINT PRIMARY KEY DEFAULT(NEXT VALUE FOR primary_key_seq),
            product VARCHAR(64), 
            amount DECIMAL(12,2), 
            sold_at TIMESTAMP ) ENGINE=DuckDB DEFAULT CHARSET utf8mb4;
ERROR 1296 (HY000): Got error 168 'Binder Error: DEFAULT value cannot contain column names

LINE 1: ...(nextval("ducktest"."primary_key_seq")),"product" VARCHAR COLLATE NOCASE.NOACCENT,"amount" DECIMAL(12,2),"sold_at" TIMESTA...
      ' from DuckDB

This is not supported, so let’s try with something else: UUID

mariadb > CREATE TABLE invoice ( 
            id UUID DEFAULT(uuid()) PRIMARY KEY, 
            product VARCHAR(64), 
            amount DECIMAL(12,2), 
            sold_at TIMESTAMP ) ENGINE=DuckDB DEFAULT CHARSET utf8mb4;
Query OK, 0 rows affected (0.016 sec)

It seems to work… but…

mariadb > INSERT INTO invoice (product, amount, sold_at) 
          VALUES ('MariaDB mug', 12.00, NOW()), 
                 ('DuckDB sticker', 2.50, NOW()), 
                 ('Sea lion plush', 25.00, NOW()), 
                 ('DuckDB sticker', 2.50, NOW());
Query OK, 4 rows affected (0.009 sec)
Records: 4  Duplicates: 0  Warnings: 0

mariadb > SELECT id, product, amount FROM invoice;
+--------------------------------------+----------------+--------+
| id                                   | product        | amount |
+--------------------------------------+----------------+--------+
| 00000000-0000-0000-0000-000000000000 | MariaDB mug    |  12.00 |
| 00000000-0000-0000-0000-000000000000 | DuckDB sticker |   2.50 |
| 00000000-0000-0000-0000-000000000000 | Sea lion plush |  25.00 |
| 00000000-0000-0000-0000-000000000000 | DuckDB sticker |   2.50 |
+--------------------------------------+----------------+--------+
4 rows in set (0.003 sec)

We can see that the uuid() is not generated as expected. But this breaks the Primary Key constraint.

This seems to be an issue with UUID data type because even when provided, it stays ‘00000000-0000-0000-0000-000000000000’:

mariadb > insert into invoice values ('65297af6-6630-11f1-8ad2-5e1b9081e705','fred',10, now());
Query OK, 1 row affected (0.011 sec)

mariadb > > SELECT id, product, amount FROM invoice;
+--------------------------------------+----------------+--------+
| id                                   | product        | amount |
+--------------------------------------+----------------+--------+
| 00000000-0000-0000-0000-000000000000 | MariaDB mug    |  12.00 |
| 00000000-0000-0000-0000-000000000000 | DuckDB sticker |   2.50 |
| 00000000-0000-0000-0000-000000000000 | Sea lion plush |  25.00 |
| 00000000-0000-0000-0000-000000000000 | DuckDB sticker |   2.50 |
| 00000000-0000-0000-0000-000000000000 | fred           |  10.00 |
+--------------------------------------+----------------+--------+
5 rows in set (0.002 sec)

So it’s better to provide the primary key value when inserting a record, but avoid UUIDs.

In DuckDB, auto-incrementing is not supported, but sequences are. Those sequences are different than the MariaDB and are internal to DuckDB. I’ve tried to create one in DuckDB, and it worked, but I can’t use it from MariaDB yet. This is something to investigate further.

The scope of DuckDB is to ingest large volumes of data in bulk very quickly and provide very fast querying, not to insert rows one by one.

Insert from select

Inserting data into a DuckDB table from an InnoDB table works as expected. However, the other way around is not supported:

mariadb > INSERT INTO invoice2 (SELECT * FROM invoice);
ERROR 1296 (HY000): Got error 122 'Catalog Error: Table with name invoice2 does not exist!
Did you mean "invoice"?' from DuckDB

You could then think to first create the second table as a DuckDB table and then change it to InnoDB. But this won’t work as expected:

mariadb > CREATE TABLE invoice2 ( id INT UNSIGNED PRIMARY KEY, 
              product VARCHAR(64), 
              amount DECIMAL(12,2), 
              sold_at TIMESTAMP ) engine=DuckDB default CHARSET utf8mb4;
Query OK, 0 rows affected (0.017 sec)

mariadb > INSERT INTO invoice2 (SELECT * FROM invoice);
Query OK, 1 row affected (0.009 sec)
Records: 1  Duplicates: 0  Warnings: 0

mariadb > SELECT * FROM invoice2;
+----+----------------+--------+---------------------+
| id | product        | amount | sold_at             |
+----+----------------+--------+---------------------+
|  1 | MariaDB mug    |  12.00 | 2026-06-12 09:42:25 |
|  2 | DuckDB sticker |   2.50 | 2026-06-12 09:42:25 |
|  3 | Sea lion plush |  25.00 | 2026-06-12 09:42:25 |
|  4 | DuckDB sticker |   2.50 | 2026-06-12 09:42:25 |
|  4 | NULL           |   NULL | NULL                |
+----+----------------+--------+---------------------+
5 rows in set (0.004 sec)

We can see the last extra line; we will have to remove it before changing the table to InnoDB, as the Primary Key constraint will fail (the value is not unique).

Subquery

When using the airportdb, there is a query I like to use as an example; however, this query is not yet working:

mariadb > SELECT airlinename, sum(sold_seat)/sum(capacity) 'load factor'            
           FROM ( 
             SELECT  flight_id, COUNT(*) sold_seat
              FROM booking_duck GROUP BY flight_id) seat_sold
                  JOIN flight using (flight_id)  
                  JOIN airplane USING (airplane_id)
                  JOIN airline ON airline.airline_id = flight.airline_id
              GROUP BY airlinename ORDER BY airlinename LIMIT 10;
ERROR 1296 (HY000): Got error 122 'Parser Error: Expected a constant as type modifier' from DuckD

Loading data faster

Now let’s go back to where DuckDB shines: loading data.

It’s possible to use DuckDB’s native COPY through the run_in_duckdb() function. This avoids pushing all rows through the normal MariaDB SQL layer one by one.

$ ls -lh booking.csv 
-rw-r--r-- 1 fred fred 2.2G Jun 12 09:57 booking.csv

First, we create the table:

mariadb > CREATE TABLE `booking2` (
       `booking_id` int(11) NOT NULL,
       `flight_id` int(11) NOT NULL,
       `seat` char(4) DEFAULT NULL,
       `passenger_id` int(11) NOT NULL,
       `price` decimal(10,2) NOT NULL,
       PRIMARY KEY (`booking_id`),
       UNIQUE KEY `seatplan_unq` (`flight_id`,`seat`),
       KEY `flight_idx` (`flight_id`),
       KEY `passenger_idx` (`passenger_id`)
     ) ENGINE=DUCKDB DEFAULT CHARSET=utf8mb4;
Query OK, 0 rows affected (0.012 sec)

Then we load the CSV file using the run_in_duckd() super function:

mariadb > SELECT run_in_duckdb("COPY airport.booking2 FROM 'booking.csv' (HEADER, DELIMITER ',') ")\G
*************************** 1. row ***************************
run_in_duckdb("COPY airport.booking2 FROM 'booking.csv' (HEADER, DELIMITER ',') "): Count
BIGINT	
[ Rows: 1]
54304619

1 row in set (4.104 sec)

mariadb > SELECT count(*) FROM booking2;
+----------+
| count(*) |
+----------+
| 54304619 |
+----------+
1 row in set (0.013 sec)

That was extremely fast: 54,304,619 rows in 4 seconds!!

It’s also possible to import Parquet files!

Exporting DuckDB table to Parquet

One of the first comments I got when I shared DuckDB’s announcement on social media was whether it would be possible to back up the data to a Parquet file.

The good news is: Yes, it’s possible!

Once again, the run_in_duckdb()function is what we need:

mariadb > SELECT run_in_duckdb(" COPY airport.booking_duck TO 'booking_backup.parquet' (FORMAT parquet) ")\G
*************************** 1. row ***************************
run_in_duckdb(" COPY airport.booking_duck TO 'booking_backup.parquet' (FORMAT parquet) "): Count	
BIGINT	
[ Rows: 1]
54304619

1 row in set (1.597 sec)

Let’s check the generated file:

$ ls -lh booking_backup.parquet 
-rw-rw---- 1 fred fred 703M Jun 12 10:28 booking_backup.parquet

$ file booking_backup.parquet 
booking_backup.parquet: Apache Parquet

And we can verify it:

$ parquet-tools inspect booking_backup.parquet 

###### file meta data ######
created_by: DuckDB version v1.5.2 (build 0000000000)
num_columns: 5
num_rows: 54304619
num_row_groups: 442
format_version: 1.0
serialized_size: 187066


###### Columns ######
booking_id
flight_id
seat
passenger_id
price

###### Column(booking_id) ######
name: booking_id
path: booking_id
max_definition_level: 1
max_repetition_level: 0
physical_type: INT32
logical_type: Int(bitWidth=32, isSigned=true)
converted_type (legacy): INT_32
compression: SNAPPY (space_saved: -0%)

###### Column(flight_id) ######
name: flight_id
path: flight_id
max_definition_level: 1
max_repetition_level: 0
physical_type: INT32
logical_type: Int(bitWidth=32, isSigned=true)
converted_type (legacy): INT_32
compression: SNAPPY (space_saved: -0%)

###### Column(seat) ######
name: seat
path: seat
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
compression: SNAPPY (space_saved: 1%)

###### Column(passenger_id) ######
name: passenger_id
path: passenger_id
max_definition_level: 1
max_repetition_level: 0
physical_type: INT32
logical_type: Int(bitWidth=32, isSigned=true)
converted_type (legacy): INT_32
compression: SNAPPY (space_saved: -0%)

###### Column(price) ######
name: price
path: price
max_definition_level: 1
max_repetition_level: 0
physical_type: INT64
logical_type: Decimal(precision=10, scale=2)
converted_type (legacy): DECIMAL
compression: SNAPPY (space_saved: 47%)

It’s also possible to export the result of a query into a Parquet file, but the query must only include tables in DuckDB, not with other storage engines like InnoDB:

mariadb > SELECT run_in_duckdb(" COPY ( SELECT flight_id, AVG(price) AS price_avg
            FROM airport.booking_duck
            GROUP BY flight_id order by price_avg DESC ) TO
        'booking_prices.parquet' (FORMAT parquet, COMPRESSION zstd)")\G
*************************** 1. row ***************************
run_in_duckdb(" COPY ( SELECT flight_id, AVG(price) AS price_avg
            FROM airport.booking_duck
           GROUP BY flight_id order by price_avg DESC ) TO 'booking_prices.parquet' (FORMAT parquet, COMPRESSION zstd)"): Count	
BIGINT	
[ Rows: 1]
462553

1 row in set (0.123 sec)

Just for Fun

DuckDB is a lightweight SQL database engine designed for analytics.

Think of it as SQLite for data analysis.

It is useful because it can query files directly, including:

CSV
Parquet
JSON
Arrow

If MariaDB Server is not running, there will be a conflicting lock otherwise. You can use duckdb’s Python module with your duckdb.db inside MariaDB’s datadir.

Be careful, you could destroy everything!

$ python3 - <<'PY'
import duckdb

con = duckdb.connect("duckdb.db")
print(con.sql("SHOW SCHEMAS"))
PY
┌───────────────┬─────────────┬─────────┐
│ database_name │ schema_name │ current │
│    varchar    │   varchar   │ boolean │
├───────────────┼─────────────┼─────────┤
│ duckdb        │ airport     │ false   │
│ duckdb        │ ducktest    │ false   │
│ duckdb        │ main        │ true    │
└───────────────┴─────────────┴─────────┘

You can also install DuckDB locally, and you will get the CLI client:

$ curl https://install.duckdb.org | sh
*** DuckDB Linux/MacOS installation script, version 1.5.3 ***


         .;odxdl,            
       .xXXXXXXXXKc          
       0XXXXXXXXXXXd  cooo:  
      ,XXXXXXXXXXXXK  OXXXXd 
       0XXXXXXXXXXXo  cooo:  
       .xXXXXXXXXKc          
         .;odxdl,  


#################################### 100.0%

Successfully installed DuckDB 1.5.3 to /home/fred/.duckdb/cli/1.5.3/duckdb
Updated symlink from /home/fred/.duckdb/cli/latest/duckdb to
                     /home/fred/.duckdb/cli/1.5.3/duckdb

Hint: Append the following line to your shell profile:
export PATH="/home/fred/.duckdb/cli/latest":$PATH
Also created a symlink from /home/fred/.local/bin/duckdb 
                         to /home/fred/.duckdb/cli/latest/duckdb

To launch DuckDB 1.5.3 now, type
/home/fred/.duckdb/cli/latest/duckdb

This allows you to connect to your database:

$ duckdb duckdb.db 
DuckDB v1.5.3 (Variegata)
Enter ".help" for usage hints.
duckdb D .databases
┌─────────────────────────────────────────────────────────────────────────────┐
│                                  databases                                  │
│                                                                             │
│ duckdb /run/media/fred/data/dbdeployer/sandboxes/msb_11_4_13/data/duckdb.db │
└─────────────────────────────────────────────────────────────────────────────┘

Can we do more?

But we can go even further: let’s imagine we want to use Arrow files instead of Parquet. This is also possible.

You will need to install the plugin, load it, and then use it. If you have the same version of DuckDB on your system as the one in MariaDB, you can manage the plugin directly from the duckdb client CLI.

The latest version is 1.5.3, and the one built with MariaDB is 1.5.2. This means the easiest way to do it is to use the super function run_in_duckdb():

mariadb > SELECT run_in_duckdb("INSTALL nanoarrow FROM community");
*************************** 1. row ***************************
run_in_duckdb("INSTALL nanoarrow FROM community") 
Success	
BOOLEAN	
[ Rows: 0]

1 row in set (0.881 sec)

mariadb > SELECT run_in_duckdb("LOAD nanoarrow");
*************************** 1. row ***************************
run_in_duckdb("LOAD nanoarrow")
Success	
BOOLEAN	
[ Rows: 0]

1 row in set (0.034 sec)

mariadb > SELECT run_in_duckdb(" COPY airport.booking_duck TO 'booking_backup.arrow' (FORMAT arrow) ")\G
*************************** 1. row ***************************
run_in_duckdb(" COPY airport.booking_duck TO 'booking_backup.arrow' (FORMAT arrow) "): Count	
BIGINT	
[ Rows: 1]
54304619

1 row in set (2.185 sec)

We have the file exported now:

ls -lh booking_backup.arrow 
-rw-rw---- 1 fred fred 1.8G Jun 12 13:25 booking_backup.arrow

And we can verify it’s correct:

 $ duckdb 
DuckDB v1.5.3 (Variegata)
Enter ".help" for usage hints.
memory D LOAD nanoarrow;
memory D SELECT * FROM read_arrow('booking_backup.arrow') LIMIT 10;
┌────────────┬───────────┬─────────┬──────────────┬───────────────┐
│ booking_id │ flight_id │  seat   │ passenger_id │     price     │
│   int32    │   int32   │ varchar │    int32     │ decimal(10,2) │
├────────────┼───────────┼─────────┼──────────────┼───────────────┤
│          1 │      3863 │ NULL    │         2947 │        110.10 │
│          2 │      3863 │ 25F     │        20814 │        173.10 │
│          3 │      3863 │ 25E     │        33340 │        323.15 │
│          4 │      3863 │ 24D     │        21401 │        445.85 │
│          5 │      3863 │ 24G     │        24581 │        198.32 │
│          6 │      3863 │ 24F     │        16902 │        275.88 │
│          7 │      3863 │ 23E     │        31818 │        170.03 │
│          8 │      3863 │ 22F     │         1878 │        221.48 │
│          9 │      3863 │ 22C     │        16535 │        264.82 │
│         10 │      3863 │ 21A     │        32301 │        131.94 │
└────────────┴───────────┴─────────┴──────────────┴───────────────┘
  10 rows                                               5 columns

Conclusion

My first impression is that I really like it!

MariaDB already has multiple storage engines, and this one brings something different: lightweight analytical processing inside the same server with a very popular analytical engine.

The DuckDB storage engine for MariaDB is indeed a very interesting addition to the MariaDB ecosystem.

It won’t replace InnoDB, and it may not replace ColumnStore either.

But it could become a very nice solution for local analytics, HTAP experiments, reporting tables, and data lake interactions.

The most interesting hack for me is the ability to use run_in_duckdb() to access DuckDB features that are not yet available through the MariaDB SQL parser.

And yes, this includes Parquet.

So now we can have a sea lion that quacks… and exports Parquet too.

Enjoy MariaDB and DuckDB!

Cookie	Duration	Description
cookielawinfo-checbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

MariaDB + DuckDB: A New Playground for Analytics – A First Look at the New Storage Engine