Summary
A vec0 virtual table created and populated on macOS (Darwin arm64) corrupts on the first write attempt from a Linux (x86_64) host after the underlying SQLite file is byte-for-byte rsync'd. Read-only queries against the rsync'd DB work fine. Both hosts run sqlite-vec 0.1.9.
Environment
| | macOS (writer) | Linux (reader/writer) |
|---|---|---|
| OS | macOS 15.x (Darwin arm64) | Debian (x86_64) |
| Python | 3.12 | 3.12 |
| sqlite-vec (pip) | 0.1.9 | 0.1.9 |
| SQLite | 3.49.x | 3.40.x |
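To fill in the table above the same way on each host, a small sketch can collect these fields. `environment_report` is a hypothetical helper, not part of sqlite-vec. It assumes sqlite-vec's `vec_version()` SQL function is available once the extension loads, and it leaves that field as `None` when the package is missing or this Python build lacks `enable_load_extension`. `sqlite3.sqlite_version` is the SQLite library the Python module is linked against, which is presumably what the SQLite row refers to.

```python
import platform
import sqlite3
import sys


def environment_report() -> dict:
    """Gather the fields of the Environment table for the current host."""
    report = {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        # Version of the SQLite library this Python's sqlite3 module is linked against.
        "sqlite": sqlite3.sqlite_version,
        "byteorder": sys.byteorder,
        "sqlite_vec": None,  # stays None if the extension cannot be loaded here
    }
    try:
        import sqlite_vec  # optional: only importable where the pip package is installed
    except ImportError:
        return report
    conn = sqlite3.connect(":memory:")
    try:
        conn.enable_load_extension(True)
        sqlite_vec.load(conn)
        conn.enable_load_extension(False)
        report["sqlite_vec"] = conn.execute("SELECT vec_version()").fetchone()[0]
    except (AttributeError, sqlite3.Error):
        # AttributeError: this Python build was compiled without extension loading.
        pass
    finally:
        conn.close()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it on both the macOS and the Linux host gives two reports that can be pasted side by side.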
Reproduction
- On macOS: create a DB with a `vec0` virtual table and populate ~50k rows.
```python
import sqlite3
import sqlite_vec

conn = sqlite3.connect("rss.db")
conn.enable_load_extension(True)
sqlite_vec.load(conn)
conn.execute("CREATE VIRTUAL TABLE vec_articles USING vec0(article_id INTEGER PRIMARY KEY, embedding float[768])")
# ... insert 50,593 rows, then conn.commit()
```
- Rsync the file to Linux: `rsync -avz rss.db host:rss.db`. Verify that `sha256sum` matches and that `sqlite3 rss.db "PRAGMA integrity_check"` returns `ok` on the Linux side.
- On Linux, perform any write that touches `vec_articles` via the extension, even one that should be a no-op when there's nothing new to insert. Example via a higher-level wrapper:
```python
conn = sqlite3.connect("rss.db")
conn.enable_load_extension(True)
sqlite_vec.load(conn)
# any CREATE TABLE IF NOT EXISTS / INSERT path that re-creates or touches vec_articles
```
- Result: the DB file shrinks (~620 MB → ~470 MB), and a subsequent `PRAGMA integrity_check` reports `database disk image is malformed`. The article rows and the `vec_articles_rowids` shadow table both lose data.
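The checks used in the steps above (file hash, `PRAGMA integrity_check`, file size) can be captured together so the Linux-side state is recorded identically before and after the write. This is a minimal stdlib-only sketch; `db_snapshot` is a hypothetical helper name, and `page_count`/`freelist_count` are extra PRAGMAs not mentioned in the original report. It opens the file read-only so that taking the snapshot cannot itself modify the DB.

```python
import hashlib
import os
import sqlite3
from pathlib import Path


def db_snapshot(path: str) -> dict:
    """Record size, SHA-256 and SQLite's own consistency checks for one DB file."""
    sha = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in 1 MiB chunks so large DBs are not read into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            sha.update(chunk)
    # Read-only URI: the snapshot must not write to the file it is inspecting.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        integrity = [row[0] for row in conn.execute("PRAGMA integrity_check")]
        page_count = conn.execute("PRAGMA page_count").fetchone()[0]
        freelist_count = conn.execute("PRAGMA freelist_count").fetchone()[0]
    finally:
        conn.close()
    return {
        "size_bytes": os.path.getsize(path),
        "sha256": sha.hexdigest(),
        "integrity": integrity,
        "page_count": page_count,
        "freelist_count": freelist_count,
    }
```

Calling `db_snapshot("rss.db")` on Linux before and after the write and diffing the two dicts would show the reported symptoms as a drop in `size_bytes` and an `integrity` list other than `["ok"]`.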
Expected
The DB should remain valid across platforms when bytes are unchanged. Either:
- vec0 should detect platform-incompatible shadow table state and refuse to write (loud failure), or
- vec0 shadow tables should be platform-neutral so cross-host transfer + write works.
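Until vec0 offers the loud-failure option itself, an application could approximate it. The sketch below is an application-side stopgap under stated assumptions, not sqlite-vec behavior: it stamps the creating host's `platform.system()`/`platform.machine()` into a hypothetical `app_db_origin` table and refuses writes from a host whose tag differs. It would block the Darwin arm64 → Linux x86_64 write described here, but it inspects nothing about the actual shadow-table state.

```python
import platform
import sqlite3

# Hypothetical application-side metadata table; not part of sqlite-vec.
ORIGIN_TABLE = "app_db_origin"


def host_tag() -> str:
    """Identify the host platform, e.g. 'Darwin-arm64' or 'Linux-x86_64'."""
    return f"{platform.system()}-{platform.machine()}"


def stamp_origin(conn: sqlite3.Connection) -> None:
    """Record the creating platform; call once on the host that builds vec_articles."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {ORIGIN_TABLE} (tag TEXT NOT NULL)")
    conn.execute(f"DELETE FROM {ORIGIN_TABLE}")
    conn.execute(f"INSERT INTO {ORIGIN_TABLE} (tag) VALUES (?)", (host_tag(),))
    conn.commit()


def assert_same_origin(conn: sqlite3.Connection) -> None:
    """Raise before any write if this host differs from the one that created the DB."""
    try:
        row = conn.execute(f"SELECT tag FROM {ORIGIN_TABLE}").fetchone()
    except sqlite3.OperationalError:  # table missing: DB was never stamped
        row = None
    if row is None:
        raise RuntimeError("DB has no origin stamp; refusing to write")
    if row[0] != host_tag():
        raise RuntimeError(f"DB created on {row[0]}, this host is {host_tag()}; refusing to write")
```

The wrapper would call `assert_same_origin(conn)` before any code path that touches `vec_articles`, turning a silent corruption into an exception.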
Workaround
Open the DB read-only on the non-creator host:
```python
import sqlite3
import sqlite_vec
conn = sqlite3.connect("file:rss.db?mode=ro", uri=True)
conn.execute("PRAGMA query_only = 1")
conn.enable_load_extension(True); sqlite_vec.load(conn)
# run SELECTs only: these work correctly
```
Hybrid FTS5 + vec0 SELECT queries return correct ranked results. Only writes corrupt.
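To confirm that opening the file this way turns a stray write into an immediate error rather than a change to the file, the following stdlib-only sketch builds a throwaway DB, reopens it read-only as in the workaround above, and attempts an INSERT. A plain `articles` table stands in for `vec_articles` and sqlite-vec is not loaded, so this exercises only SQLite's read-only handling, not vec0 itself; `open_query_only` and `demo` are hypothetical names.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_query_only(path: str) -> sqlite3.Connection:
    """Open an existing DB read-only and additionally forbid writes via query_only."""
    conn = sqlite3.connect(Path(path).resolve().as_uri() + "?mode=ro", uri=True)
    conn.execute("PRAGMA query_only = 1")
    return conn


def demo() -> tuple[int, str]:
    """Create a tiny DB, reopen it read-only, read from it, then attempt a write."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        rw = sqlite3.connect(path)
        rw.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
        rw.execute("INSERT INTO articles (title) VALUES ('hello')")
        rw.commit()
        rw.close()

        ro = open_query_only(path)
        try:
            count = ro.execute("SELECT count(*) FROM articles").fetchone()[0]
            try:
                ro.execute("INSERT INTO articles (title) VALUES ('world')")
                error = ""
            except sqlite3.OperationalError as exc:
                error = str(exc)  # SQLite rejects the write with a read-only error
        finally:
            ro.close()
    return count, error
```

Reads succeed and the INSERT raises `sqlite3.OperationalError`, which is the "loud failure" behavior that the Expected section asks vec0 to provide for writes.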
Why this matters
Common deployment pattern: do expensive embedding on a beefy laptop, rsync the DB to a small server for read-only querying. Today this silently corrupts on the server's first write, even if that write would have been a logical no-op.
Happy to provide a minimized self-contained repro script if helpful.