Local Testing of Distributed Systems

The Problem: Building a data system that runs across multiple machines (laptop, NAS, cloud) creates a testing nightmare:

Integration tests: Require real databases, filesystems, network connections
Test fixtures: Need to set up multiple devices or mock their state
Debugging: Stack traces span multiple services, hard to isolate failures
CI/CD: Tests take hours, flake randomly, depend on external services
Cost: Cloud infrastructure for testing gets expensive
Isolation: One slow test slows down the entire suite

The Solution: Test distributed logic locally using in-memory databases, mock adapters, and dependency injection. Each test runs in milliseconds, so a suite of thousands finishes in seconds, with no external services needed.

This article explains how to write tests that verify complex pipelines (enumerate → reconcile → classify) in pure memory, shows patterns from Replicator, and provides copy-paste ready examples.

The Integration Testing Problem
Local Testing Strategy
In-Memory Databases
Mock Filesystems (memfs)
- Using memfs Library
- Manual Mock Filesystem
Mock Adapters
- Repository Adapter (Database Queries)
- AI Provider Adapter
Full Pipeline Testing
- Test: Enumerate → Reconcile → Classify
Error Injection Testing
Parametric Testing
- Test Multiple Scenarios
Performance Testing
- Measure In-Memory Pipeline Speed
Real Examples from Replicator
- Test: Enumerate with mtime-skip
- Test: Reconcile (Deduplication)
Key Takeaways
Benefits Summary
References

The Integration Testing Problem

Imagine testing file deduplication across multiple volumes:

// Traditional integration test (DON'T do this)
describe('File Deduplication', () => {
  it('finds duplicates across volumes', async () => {
    // Step 1: Create real directories
    const vol1 = '/mnt/vol1/test-12345';
    const vol2 = '/mnt/vol2/test-12345';
    await createDirectory(vol1);
    await createDirectory(vol2);

    // Step 2: Write test files (slow!)
    const file1 = `${vol1}/data.bin`;
    const file2 = `${vol2}/data.bin`;
    await fs.promises.writeFile(file1, Buffer.alloc(1024 * 1024)); // 1MB
    await fs.promises.writeFile(file2, Buffer.alloc(1024 * 1024)); // Same content

    // Step 3: Open a database and run migrations (slow!)
    const db = new Database(':memory:'); // In-memory, yet still slower than plain objects
    await runMigrations(db);

    // Step 4: Enumerate both volumes (slow!)
    await runEnumerate(db, vol1);
    await runEnumerate(db, vol2);

    // Step 5: Hash files (slow!)
    await runHash(db, vol1);
    await runHash(db, vol2);

    // Step 6: Reconcile (finally!)
    const dupes = await runReconcile(db);

    // Step 7: Assert
    expect(dupes).toHaveLength(1);
    expect(dupes[0].copies).toHaveLength(2);

    // Step 8: Cleanup (slow!)
    await fs.promises.rm(vol1, { recursive: true });
    await fs.promises.rm(vol2, { recursive: true });
    db.close();
  });
});

// Total time: 10+ seconds per test
// CI suite: 1000 tests × 10s = 10,000s ≈ 2.8 hours when run sequentially
// Flakiness: Network timeouts, disk full, permission errors, etc.

Problems:

Slow: Writing and re-reading test files on real disks takes seconds per test
Flaky: Network, permissions, disk space, concurrent access issues
Hard to debug: Failures happen in a subprocess or daemon, so stack traces are unclear
Requires infrastructure: Need multiple volumes, cloud services, etc.
Expensive: Cloud resources for testing add up
Sequential: Can’t parallelize easily (disk contention)

Local Testing Strategy

The solution: Test distributed logic in-process, using in-memory implementations of every dependency:

┌─────────────────────────────────────────────────────────────┐
│ TEST: findDuplicates()                                      │
├─────────────────────────────────────────────────────────────┤
│ 1. Mock Filesystem                                          │
│    ├─ /vol1/data.bin (content "abc123")                    │
│    └─ /vol2/data.bin (content "abc123")                    │
│                                                             │
│ 2. In-Memory Database                                       │
│    ├─ files table (empty, ready to populate)              │
│    └─ schema loaded (milliseconds)                         │
│                                                             │
│ 3. Mock Adapters                                            │
│    ├─ FileSystem: mock returns instant results            │
│    ├─ Database: in-memory SQLite                          │
│    └─ Network: mocked responses                           │
│                                                             │
│ 4. Run Pipeline (in same process!)                         │
│    ├─ enumerate(dep: mockFs, mockDb)                      │
│    ├─ reconcile(dep: mockDb)                              │
│    └─ classify(dep: mockAI)                               │
│                                                             │
│ 5. Assert (all data in memory, instant)                    │
│    └─ expect(dupes[0].copies).toHaveLength(2)             │
└─────────────────────────────────────────────────────────────┘

Total time: 10-100 milliseconds (no I/O)
Parallelizable: Run thousands of tests in parallel
Flakiness: None from external services, since nothing leaves the process
Debug: Full stack traces, normal breakpoints
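
The dependency-injection idea in the diagram can be sketched in a few lines. The names below (`FileSource`, `findDuplicateGroups`, `inMemorySource`) are hypothetical and not part of Replicator. The point is only that the logic receives its I/O as a parameter, so a test can hand it a plain in-memory object.

```typescript
// Hypothetical interface: the only I/O the duplicate finder needs.
interface FileSource {
  listFiles(): string[];
  readContent(path: string): string;
}

// Pure logic: groups paths by content and keeps groups with two or more members.
function findDuplicateGroups(source: FileSource): string[][] {
  const byContent = new Map<string, string[]>();
  for (const path of source.listFiles()) {
    const content = source.readContent(path);
    const group = byContent.get(content) ?? [];
    group.push(path);
    byContent.set(content, group);
  }
  return Array.from(byContent.values()).filter((group) => group.length > 1);
}

// In-memory implementation for tests; production code would wrap real fs calls.
function inMemorySource(files: Record<string, string>): FileSource {
  return {
    listFiles: () => Object.keys(files),
    readContent: (path) => files[path],
  };
}

const duplicateGroups = findDuplicateGroups(
  inMemorySource({
    '/vol1/data.bin': 'abc123',
    '/vol2/data.bin': 'abc123',
    '/vol2/other.bin': 'zzz',
  })
);
console.log(duplicateGroups); // [ [ '/vol1/data.bin', '/vol2/data.bin' ] ]
```

Because `findDuplicateGroups` never imports `fs` or a database driver, the same function can run against real adapters in production and against this object in tests.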

In-Memory Databases

SQLite In-Memory

SQLite supports :memory: databases that exist only in RAM:

import Database from 'better-sqlite3';

// Create in-memory database
const db = new Database(':memory:');

// Load schema
db.exec(`
  CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL,
    size INTEGER NOT NULL,
    content_hash TEXT
  );

  CREATE TABLE directories (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL,
    parent_id INTEGER
  );
`);

// Use it like normal
db.prepare('INSERT INTO files VALUES (NULL, ?, ?, NULL)')
  .run('/test.txt', 1024);

const result = db.prepare('SELECT * FROM files').get();
console.log(result); // { id: 1, path: '/test.txt', size: 1024, content_hash: null }

db.close();

Fixture Builders

Create helpers to populate test databases:

/**
 * Build a test database with files and directories.
 */
function createTestDatabase(fixture: {
  files: Array<{ path: string; size: number; hash?: string }>;
  directories: Array<{ path: string }>;
}) {
  const db = new Database(':memory:');

  // Schema
  db.exec(`
    CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT, size INTEGER, hash TEXT);
    CREATE TABLE directories (id INTEGER PRIMARY KEY, path TEXT);
  `);

  // Populate files
  const insertFile = db.prepare('INSERT INTO files VALUES (NULL, ?, ?, ?)');
  for (const file of fixture.files) {
    insertFile.run(file.path, file.size, file.hash ?? null); // ?? keeps an empty-string hash intact
  }

  // Populate directories
  const insertDir = db.prepare('INSERT INTO directories VALUES (NULL, ?)');
  for (const dir of fixture.directories) {
    insertDir.run(dir.path);
  }

  return db;
}

// Usage
const db = createTestDatabase({
  files: [
    { path: '/data/file1.txt', size: 1024, hash: 'abc123' },
    { path: '/data/file2.txt', size: 1024, hash: 'abc123' }, // duplicate
  ],
  directories: [{ path: '/data' }],
});

const dupes = db.prepare(`
  SELECT hash, COUNT(*) as count FROM files GROUP BY hash HAVING count > 1
`).all();

expect(dupes).toHaveLength(1);
expect(dupes[0].count).toBe(2);

db.close();
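
For larger scenarios, writing every fixture row by hand gets tedious. A small generator can produce the `files` array that `createTestDatabase` expects. `duplicateGroup` and `FileFixture` are hypothetical names introduced for this sketch, and the `/volN/` path layout is an assumption.

```typescript
type FileFixture = { path: string; size: number; hash?: string };

// Hypothetical helper: `copies` files sharing one hash, spread across /vol1, /vol2, ...
function duplicateGroup(
  name: string,
  hash: string,
  copies: number,
  size = 1024
): FileFixture[] {
  const files: FileFixture[] = [];
  for (let i = 1; i <= copies; i++) {
    files.push({ path: `/vol${i}/${name}`, size, hash });
  }
  return files;
}

const fixtureFiles: FileFixture[] = [
  ...duplicateGroup('photo.jpg', 'hash-photo', 3),
  ...duplicateGroup('notes.txt', 'hash-notes', 1),
];

console.log(fixtureFiles.map((f) => f.path));
// [ '/vol1/photo.jpg', '/vol2/photo.jpg', '/vol3/photo.jpg', '/vol1/notes.txt' ]
```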

Snapshots (Assert on Database State)

/**
 * Extract database state as JSON for snapshot testing.
 */
function databaseSnapshot(db: Database.Database) {
  const files = db.prepare('SELECT * FROM files ORDER BY id').all();
  const directories = db.prepare('SELECT * FROM directories ORDER BY id').all();
  const duplicates = db.prepare(`
    SELECT hash, COUNT(*) as count FROM files GROUP BY hash HAVING count > 1
  `).all();

  return { files, directories, duplicates };
}

it('consolidates duplicates', () => {
  const db = createTestDatabase({
    files: [
      { path: '/a/file.txt', size: 100, hash: 'same' },
      { path: '/b/file.txt', size: 100, hash: 'same' },
    ],
    directories: [{ path: '/a' }, { path: '/b' }],
  });

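  runConsolidate(db); // Modifies database

  // Illustrative snapshot, not captured output. It assumes a schema and snapshot
  // helper extended with an isHardlink flag, and a duplicates query that ignores
  // hardlinked rows. Notes must stay outside the template string, because the
  // snapshot text is compared literally (keys sorted, no comments).
  expect(databaseSnapshot(db)).toMatchInlineSnapshot(`
    {
      "directories": [
        {
          "id": 1,
          "path": "/a",
        },
        {
          "id": 2,
          "path": "/b",
        },
      ],
      "duplicates": [],
      "files": [
        {
          "hash": "same",
          "id": 1,
          "path": "/a/file.txt",
          "size": 100,
        },
        {
          "hash": "same",
          "id": 2,
          "isHardlink": true,
          "path": "/b/file.txt",
          "size": 100,
        },
      ],
    }
  `);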

  db.close();
});
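
Snapshots only stay stable if values that change between runs, such as timestamps, are removed before comparison. The sketch below shows one way to do that. `stripVolatile` is a hypothetical helper and `scanned_at` is an invented column used purely for illustration.

```typescript
// Hypothetical helper: returns copies of the rows without the named volatile keys.
// Keys are emitted in sorted order so the serialized form is stable.
function stripVolatile(
  rows: Array<Record<string, unknown>>,
  volatileKeys: string[]
): Array<Record<string, unknown>> {
  return rows.map((row) => {
    const stable: Record<string, unknown> = {};
    for (const key of Object.keys(row).sort()) {
      if (volatileKeys.indexOf(key) === -1) stable[key] = row[key];
    }
    return stable;
  });
}

const rawRows = [
  { id: 1, path: '/a/file.txt', scanned_at: 1700000000000 },
  { id: 2, path: '/b/file.txt', scanned_at: 1700000000123 },
];

const stableRows = stripVolatile(rawRows, ['scanned_at']);
console.log(JSON.stringify(stableRows));
// [{"id":1,"path":"/a/file.txt"},{"id":2,"path":"/b/file.txt"}]
```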

Mock Filesystems (memfs)

Using memfs Library

import { fs, vol } from 'memfs';

// Create an in-memory filesystem
vol.fromJSON({
  '/data': null, // Directory
  '/data/file1.txt': 'content1',
  '/data/file2.txt': 'content2',
  '/data/subdir': null,
  '/data/subdir/file3.txt': 'content3',
});

// `fs` from memfs is an fs-compatible API backed by `vol`; it never touches the real disk.
// To redirect the built-in `fs` module for code under test, use your test runner's module mocking.

// Works like real filesystem, but all in memory
const content = fs.readFileSync('/data/file1.txt', 'utf-8');
expect(content).toBe('content1');

const files = fs.readdirSync('/data');
expect(files).toEqual(['file1.txt', 'file2.txt', 'subdir']);

// Cleanup
vol.reset();

Manual Mock Filesystem

If you’re using a custom filesystem interface, create a mock:

/**
 * In-memory mock filesystem for testing.
 */
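export function createMockFileSystem(
  structure: Record<string, string | null>
) {
  // Stable metadata: repeated stat() calls on one path must agree, otherwise
  // mtime comparisons and inode-based checks become nondeterministic.
  const FIXED_MTIME_MS = 1_700_000_000_000;
  const inoByPath = new Map<string, number>();
  Object.keys(structure).forEach((p, i) => inoByPath.set(p, i + 1));

  return {
    async stat(path: string) {
      const entry = structure[path];
      if (entry === null) {
        // Directory
        return {
          isDirectory: true,
          isFile: false,
          size: 4096,
          mtimeMs: FIXED_MTIME_MS,
          ino: inoByPath.get(path) ?? 0,
        };
      } else if (typeof entry === 'string') {
        // File
        return {
          isDirectory: false,
          isFile: true,
          size: Buffer.byteLength(entry),
          mtimeMs: FIXED_MTIME_MS,
          ino: inoByPath.get(path) ?? 0,
        };
      }
      return null; // Not found
    },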

    async listDir(path: string) {
      const entries: string[] = [];
      const prefix = path.endsWith('/') ? path : path + '/';

      for (const filepath of Object.keys(structure)) {
        if (filepath.startsWith(prefix) && filepath !== path) {
          const relative = filepath.slice(prefix.length);
          // Only direct children (not nested)
          if (!relative.includes('/')) {
            entries.push(relative);
          }
        }
      }

      return entries.map((name) => ({
        name,
        isDirectory: structure[`${prefix}${name}`] === null,
        isFile: typeof structure[`${prefix}${name}`] === 'string',
      }));
    },

    async readFile(path: string) {
      const content = structure[path];
      if (typeof content === 'string') {
        return Buffer.from(content);
      }
      return null;
    },
  };
}

// Usage
const mockFs = createMockFileSystem({
  '/data': null,
  '/data/file1.txt': 'Hello World',
  '/data/file2.txt': 'Hello World', // Duplicate
});

const stat = await mockFs.stat('/data/file1.txt');
expect(stat?.size).toBe(11); // Byte length of "Hello World"
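
Later sections pass this kind of mock to `enumerateTree`, which is Replicator code not shown in this article. As a stand-in, a minimal recursive walker over the same `listDir` shape might look like the sketch below. `walkTree`, `MiniFs`, and `miniFs` are hypothetical names; the real enumerator also deals with scan status, errors, and event emission.

```typescript
// Hypothetical minimal filesystem shape, modeled on the mock's listDir().
interface DirEntry { name: string; isDirectory: boolean; isFile: boolean }
interface MiniFs { listDir(path: string): Promise<DirEntry[]> }

// Depth-first walk that yields the full path of every file under `root`.
async function* walkTree(fs: MiniFs, root: string): AsyncGenerator<string> {
  for (const entry of await fs.listDir(root)) {
    const fullPath = `${root}/${entry.name}`;
    if (entry.isDirectory) {
      yield* walkTree(fs, fullPath);
    } else if (entry.isFile) {
      yield fullPath;
    }
  }
}

// Tiny in-memory MiniFs: maps each directory to its direct children.
function miniFs(tree: Record<string, DirEntry[]>): MiniFs {
  return { listDir: async (path) => tree[path] ?? [] };
}

// Drains the generator into a sorted array so results are easy to assert on.
async function collectFiles(fs: MiniFs, root: string): Promise<string[]> {
  const found: string[] = [];
  for await (const path of walkTree(fs, root)) found.push(path);
  return found.sort();
}

const demoFs = miniFs({
  '/data': [
    { name: 'file1.txt', isDirectory: false, isFile: true },
    { name: 'subdir', isDirectory: true, isFile: false },
  ],
  '/data/subdir': [{ name: 'file2.txt', isDirectory: false, isFile: true }],
});

collectFiles(demoFs, '/data').then((files) => console.log(files));
// [ '/data/file1.txt', '/data/subdir/file2.txt' ]
```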

Mock Adapters

Repository Adapter (Database Queries)

/**
 * Mock repository that returns test data instead of querying database.
 */
export function createMockRepository(data: {
  scanStatuses: Record<string, { id: number; mtime_ms: number }>;
  subdirs: Record<string, string[]>;
  dirPatterns: Record<string, string>; // name → label
}) {
  return {
    async getDirScanStatus(path: string) {
      return data.scanStatuses[path] || null;
    },

    async getKnownSubdirs(path: string) {
      return data.subdirs[path] || [];
    },

    matchDirPattern(name: string) {
      for (const [pattern, label] of Object.entries(data.dirPatterns)) {
        if (name.includes(pattern)) {
          return { label };
        }
      }
      return null;
    },

    async findDuplicateHashes(hashes: Map<string, string[]>) {
      // Return the input (mock just echoes back)
      return hashes;
    },
  };
}

// Usage
const mockRepo = createMockRepository({
  scanStatuses: {
    '/data': { id: 1, mtime_ms: 1000 },
  },
  subdirs: {
    '/data': ['/data/subdir1', '/data/subdir2'],
  },
  dirPatterns: {
    'backup': 'Backup',
    'cache': 'Cache',
  },
});

const status = await mockRepo.getDirScanStatus('/data');
expect(status?.id).toBe(1); // status is null for unknown paths
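
Hand-written mocks like this one do not record how they were used. When a test needs to assert that a dependency was called, or was not called, a small wrapper can track calls without pulling in a mocking library. `recordCalls` is a hypothetical helper, and the lookup below is a simplified synchronous stand-in for a repository method.

```typescript
// Hypothetical helper: wraps a function and remembers the arguments of each call.
function recordCalls<A extends unknown[], R>(fn: (...args: A) => R) {
  const calls: A[] = [];
  const wrapped = (...args: A): R => {
    calls.push(args);
    return fn(...args);
  };
  return { wrapped, calls };
}

// Stand-in for a repository lookup (synchronous to keep the sketch short).
const knownStatuses: Record<string, { id: number } | undefined> = {
  '/data': { id: 1 },
};
const lookup = recordCalls((path: string) => knownStatuses[path] ?? null);

const hit = lookup.wrapped('/data');
const miss = lookup.wrapped('/missing');

console.log(hit, miss);                      // { id: 1 } null
console.log(lookup.calls.map((c) => c[0]));  // [ '/data', '/missing' ]
```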

AI Provider Adapter

/**
 * Mock AI provider for testing classification.
 */
export function createMockAIProvider(
  classifications: Record<string, string>
) {
  return {
    async classify(files: string[]): Promise<Record<string, string>> {
      const result: Record<string, string> = {};
      for (const file of files) {
        // Return predetermined classifications
        result[file] = classifications[file] || 'unknown';
      }
      return result;
    },

    async inferConfidence(category: string): Promise<number> {
      // Always high confidence in tests
      return 0.95;
    },
  };
}

// Usage
const mockAI = createMockAIProvider({
  '/data/photo.jpg': 'image',
  '/data/video.mp4': 'video',
  '/data/document.pdf': 'document',
});

const classifications = await mockAI.classify([
  '/data/photo.jpg',
  '/data/video.mp4',
]);
expect(classifications['/data/photo.jpg']).toBe('image');

Full Pipeline Testing

Test: Enumerate → Reconcile → Classify

it('processes files end-to-end without I/O', async () => {
  // ---- SETUP ----

  // 1. Mock filesystem (in memory)
  const mockFs = createMockFileSystem({
    '/data': null,
    '/data/photo1.jpg': 'JPEG_DATA_1',
    '/data/photo2.jpg': 'JPEG_DATA_1', // Same content = duplicate
    '/data/video.mp4': 'VIDEO_DATA',
    '/data/archive': null,
    '/data/archive/backup.zip': 'ZIP_DATA',
  });

  // 2. In-memory database
  const db = new Database(':memory:');
  db.exec(`
    CREATE TABLE files (
      id INTEGER PRIMARY KEY,
      path TEXT NOT NULL,
      size INTEGER,
      hash TEXT
    );
    CREATE TABLE classifications (
      id INTEGER PRIMARY KEY,
      file_id INTEGER,
      category TEXT
    );
  `);

  // 3. Mock adapters
  const mockRepo = createMockRepository({
    scanStatuses: {},
    subdirs: {
      '/data': ['/data/archive'],
    },
    dirPatterns: {
      'archive': 'Archive',
    },
  });

  const mockAI = createMockAIProvider({
    '/data/photo1.jpg': 'image',
    '/data/photo2.jpg': 'image',
    '/data/video.mp4': 'video',
    '/data/archive/backup.zip': 'archive',
  });

  // ---- RUN PIPELINE ----

  // Step 1: Enumerate (populate files table)
  const enumDeps = {
    stat: mockFs.stat.bind(mockFs),
    listDir: mockFs.listDir.bind(mockFs),
    getDirScanStatus: mockRepo.getDirScanStatus.bind(mockRepo),
    getKnownSubdirs: mockRepo.getKnownSubdirs.bind(mockRepo),
  };

  const files: Array<{ path: string; size: number }> = [];
  for await (const event of enumerateTree('/data', 1, enumDeps, () => {})) {
    if (event.type === 'file') {
      files.push({ path: event.path, size: event.size });
    }
  }

  // Insert into database
  const insertFile = db.prepare('INSERT INTO files VALUES (NULL, ?, ?, NULL)');
  for (const file of files) {
    insertFile.run(file.path, file.size);
  }

  // Step 2: Hash files (real SHA-256 over the mock content; assumes
  // `import { createHash } from 'node:crypto'` at the top of the test file)
  const mockHash = (content: string) =>
    createHash('sha256').update(content).digest('hex');

  const photoHash = mockHash('JPEG_DATA_1');
  db.prepare('UPDATE files SET hash = ? WHERE path IN (?, ?)')
    .run(photoHash, '/data/photo1.jpg', '/data/photo2.jpg');

  const videoHash = mockHash('VIDEO_DATA');
  db.prepare('UPDATE files SET hash = ? WHERE path = ?')
    .run(videoHash, '/data/video.mp4');

  const archiveHash = mockHash('ZIP_DATA');
  db.prepare('UPDATE files SET hash = ? WHERE path = ?')
    .run(archiveHash, '/data/archive/backup.zip');

  // Step 3: Find duplicates
  const dupes = db.prepare(`
    SELECT hash, COUNT(*) as count FROM files GROUP BY hash HAVING count > 1
  `).all();

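  // Step 4: Classify (one batched call, then one insert per file)
  const fileRows = db.prepare('SELECT id, path FROM files').all() as Array<{ id: number; path: string }>;
  const categories = await mockAI.classify(fileRows.map((row) => row.path));
  const insertClassification = db.prepare('INSERT INTO classifications VALUES (NULL, ?, ?)');
  for (const fileRow of fileRows) {
    insertClassification.run(fileRow.id, categories[fileRow.path]);
  }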

  // ---- ASSERTIONS ----

  // Found 4 files total
  expect(files).toHaveLength(4);

  // Found 1 duplicate hash (photo1 and photo2)
  expect(dupes).toHaveLength(1);
  expect(dupes[0].count).toBe(2);

  // Classified correctly
  const classified = db.prepare(`
    SELECT f.path, c.category FROM files f
    JOIN classifications c ON f.id = c.file_id
    ORDER BY f.path
  `).all();

  expect(classified).toEqual([
    { path: '/data/archive/backup.zip', category: 'archive' },
    { path: '/data/photo1.jpg', category: 'image' },
    { path: '/data/photo2.jpg', category: 'image' },
    { path: '/data/video.mp4', category: 'video' },
  ]);

  db.close();
});

Speed: ~50ms (no I/O, all in memory)
Isolation: Runs in parallel with other tests
Debuggability: Normal breakpoints, full stack traces

Error Injection Testing

Test: Handle Missing Files

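it('handles missing files gracefully', async () => {
  const mockFs = createMockFileSystem({
    '/data': null,
    '/data/exists.txt': 'content',
    // /data/missing.txt does NOT exist
  });

  // Inject the error: the listing reports missing.txt, but stat() cannot find it
  // (as if the file were deleted between listDir and stat)
  const listDirWithGhost = async (path: string) => {
    const entries = await mockFs.listDir(path);
    return path === '/data'
      ? [...entries, { name: 'missing.txt', isDirectory: false, isFile: true }]
      : entries;
  };

  const enumDeps = {
    stat: mockFs.stat.bind(mockFs),
    listDir: listDirWithGhost,
    getDirScanStatus: async () => null,
    getKnownSubdirs: async () => [],
  };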

  const errors: any[] = [];
  const emitter = (event: any) => {
    if (event.type === 'error') {
      errors.push(event);
    }
  };

  for await (const _ of enumerateTree('/data', 1, enumDeps, emitter)) {
    // Process events
  }

  // Should have recorded the missing file as an error
  expect(errors).toContainEqual(
    expect.objectContaining({
      type: 'error',
      path: '/data/missing.txt',
    })
  );
});

Test: Database Lock Handling

it('handles database conflicts', async () => {
  // Mock repository that simulates a lock conflict on the first call only
  let callCount = 0;
  const mockRepo = {
    async getDirScanStatus(path: string) {
      callCount++;
      if (callCount === 1) {
        throw new Error('database is locked');
      }
      return null;
    },

    async getKnownSubdirs() {
      return [];
    },
  };

  // First call fails with the injected lock error
  await expect(mockRepo.getDirScanStatus('/data')).rejects.toThrow('database is locked');

  // Second call succeeds (lock released); this is what a caller's retry would see
  const result = await mockRepo.getDirScanStatus('/data');
  expect(result).toBeNull();
  expect(callCount).toBe(2);
});
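
The test above only exercises the mock; the retry itself would live in the code under test. A minimal sketch of such a wrapper follows. `withRetry`, its parameters, and the 'database is locked' check are assumptions for illustration, not Replicator's actual retry logic. The delay function is injected so that tests never wait on a real timer.

```typescript
// Hypothetical retry wrapper: retries only on lock errors, up to maxAttempts.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void>
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      const isLock =
        err instanceof Error && err.message.indexOf('database is locked') !== -1;
      if (!isLock || attempt === maxAttempts) break;
      await sleep(attempt * 10); // linear backoff; tests pass a no-op sleep
    }
  }
  throw lastError;
}

// Fake operation that fails twice with a lock error, then succeeds.
let attempts = 0;
const flakyQuery = async (): Promise<string> => {
  attempts++;
  if (attempts <= 2) throw new Error('database is locked');
  return 'ok';
};

const noSleep = async (_ms: number): Promise<void> => {};

const retryResult = withRetry(flakyQuery, 5, noSleep);
retryResult.then((value) => console.log(value, attempts)); // ok 3
```

Errors other than the lock error are rethrown immediately, so a genuine bug in the query is not hidden behind retries.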

Test: Timeout Handling

it('cancels long operations', async () => {
  async function* slowEnumerate() {
    for (let i = 0; i < 1000; i++) {
      await new Promise((r) => setTimeout(r, 10)); // Simulate slow I/O
      yield { type: 'progress', count: i };
    }
  }

  const controller = new AbortController();
  setTimeout(() => controller.abort(), 100); // Cancel after 100ms

  const startTime = Date.now();
  let count = 0;
  for await (const _event of slowEnumerate()) {
    count++;
    if (controller.signal.aborted) break; // break also finalizes the generator
  }
  const elapsed = Date.now() - startTime;

  // Stopped shortly after the 100ms abort instead of running the full 10 seconds.
  // The bounds are deliberately loose: tight timing assertions flake on busy CI machines.
  expect(elapsed).toBeLessThan(1000);
  expect(count).toBeGreaterThan(0);
  expect(count).toBeLessThan(100); // Processed some items but not all
});
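
The test above relies on real timers, which makes it slower and timing-sensitive. One alternative, sketched here under the assumption that the enumerator accepts an `AbortSignal`, is to abort from inside the loop after a fixed number of items so that the outcome is exact. `enumerateWithSignal` and `runUntilAborted` are hypothetical stand-ins, not Replicator functions.

```typescript
// Hypothetical enumerator: yields item indexes until done or the signal aborts.
async function* enumerateWithSignal(
  total: number,
  signal: AbortSignal
): AsyncGenerator<number> {
  for (let i = 0; i < total; i++) {
    if (signal.aborted) return; // cooperative cancellation check
    yield i;
  }
}

// Consumes items and aborts after `abortAfter` of them, with no timers involved.
async function runUntilAborted(abortAfter: number): Promise<number[]> {
  const controller = new AbortController();
  const seen: number[] = [];
  for await (const item of enumerateWithSignal(1000, controller.signal)) {
    seen.push(item);
    if (seen.length === abortAfter) controller.abort(); // deterministic trigger
  }
  return seen;
}

runUntilAborted(3).then((seen) => console.log(seen)); // [ 0, 1, 2 ]
```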

Parametric Testing

Test Multiple Scenarios

const testCases = [
  {
    name: 'single file',
    files: [{ path: '/data/a.txt', size: 100 }],
    expectedDuplicates: 0,
  },
  {
    name: 'two identical files',
    files: [
      { path: '/data/a.txt', size: 100, hash: 'same' },
      { path: '/data/b.txt', size: 100, hash: 'same' },
    ],
    expectedDuplicates: 1,
  },
  {
    name: 'three files, two duplicates',
    files: [
      { path: '/data/a.txt', size: 100, hash: 'hash1' },
      { path: '/data/b.txt', size: 100, hash: 'hash1' },
      { path: '/data/c.txt', size: 200, hash: 'hash2' },
    ],
    expectedDuplicates: 1,
  },
  {
    name: 'three groups of duplicates',
    files: [
      { path: '/data/a.txt', size: 100, hash: 'hash1' },
      { path: '/data/b.txt', size: 100, hash: 'hash1' },
      { path: '/data/c.txt', size: 200, hash: 'hash2' },
      { path: '/data/d.txt', size: 200, hash: 'hash2' },
      { path: '/data/e.txt', size: 300, hash: 'hash3' },
      { path: '/data/f.txt', size: 300, hash: 'hash3' },
    ],
    expectedDuplicates: 3,
  },
];

describe.each(testCases)('$name', ({ files, expectedDuplicates }) => {
  it('detects correct duplicate groups', () => {
    const db = createTestDatabase({ files, directories: [] });

    const dupes = db.prepare(`
      SELECT hash, COUNT(*) as count FROM files GROUP BY hash HAVING count > 1
    `).all();

    expect(dupes).toHaveLength(expectedDuplicates);

    db.close();
  });
});

Performance Testing

Measure In-Memory Pipeline Speed

it('processes 100k files in <1 second', async () => {
  // Create large in-memory filesystem
  const structure: Record<string, string | null> = {
    '/data': null,
  };

  for (let i = 0; i < 100_000; i++) {
    structure[`/data/file${i}.txt`] = `Content ${i}`;
  }

  const mockFs = createMockFileSystem(structure);

  // Time listing the directory and stat-ing every entry through the mock
  const start = performance.now();

  let count = 0;
  for (const entry of await mockFs.listDir('/data')) {
    const stat = await mockFs.stat(`/data/${entry.name}`);
    if (stat?.isFile) count++;
  }

  const elapsed = performance.now() - start;

  // In-memory enumeration should be very fast
  expect(elapsed).toBeLessThan(1000); // Less than 1 second
  expect(count).toBe(100_000);
});
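
A single timing sample is noisy, so a threshold like the one above can fail on a loaded machine. One common mitigation is to take the median of several runs. `medianDurationMs` is a hypothetical helper; the clock is injectable so the helper itself can be checked deterministically.

```typescript
// Hypothetical helper: runs `fn` several times and returns the median duration in ms.
function medianDurationMs(
  fn: () => void,
  runs: number,
  now: () => number = Date.now
): number {
  const durations: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = now();
    fn();
    durations.push(now() - start);
  }
  durations.sort((a, b) => a - b);
  const mid = Math.floor(durations.length / 2);
  return durations.length % 2 === 1
    ? durations[mid]
    : (durations[mid - 1] + durations[mid]) / 2;
}

// Deterministic demo: a fake clock that returns scripted readings.
const ticks = [0, 5, 5, 8, 8, 50]; // three runs lasting 5ms, 3ms, and 42ms
let tickIndex = 0;
const fakeClock = () => ticks[tickIndex++];

const medianMs = medianDurationMs(() => {}, 3, fakeClock);
console.log(medianMs); // 5
```

The 42ms outlier does not move the median, which is why a median-based threshold tends to be steadier than a single measurement.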

Real Examples from Replicator

Test: Enumerate with mtime-skip

// From src/__tests__/enumerate.test.ts
it('skips unchanged directories', async () => {
  const mockFs = createMockFileSystem({
    '/data': null,
    '/data/file1.txt': 'content',
    '/data/subdir': null,
    '/data/subdir/file2.txt': 'content',
  });

  // Pin mtimes so the stored scan status matches what stat() reports
  const FIXED_MTIME_MS = 1_700_000_000_000;
  const statWithFixedMtime = async (path: string) => {
    const stat = await mockFs.stat(path);
    return stat ? { ...stat, mtimeMs: FIXED_MTIME_MS } : null;
  };

  // Record which directories are actually listed
  const listedDirs: string[] = [];
  const listDirSpy = async (path: string) => {
    listedDirs.push(path);
    return mockFs.listDir(path);
  };

  const mockRepo = createMockRepository({
    scanStatuses: {
      '/data/subdir': {
        id: 123,
        mtime_ms: FIXED_MTIME_MS,
        scan_status: 'enumerated',
      },
    },
    subdirs: {
      '/data/subdir': [],
    },
    dirPatterns: {},
  });

  const enumDeps: EnumerationDeps = {
    stat: statWithFixedMtime,
    listDir: listDirSpy,
    getDirScanStatus: mockRepo.getDirScanStatus.bind(mockRepo),
    getKnownSubdirs: mockRepo.getKnownSubdirs.bind(mockRepo),
  };

  for await (const _ of enumerateTree('/data', 1, enumDeps, () => {})) {
    // Process events
  }

  // /data/subdir should be skipped (mtime unchanged).
  // Assumption: a skipped directory is never passed to listDir.
  expect(listedDirs).toContain('/data');
  expect(listedDirs).not.toContain('/data/subdir');
});

Test: Reconcile (Deduplication)

// From src/__tests__/reconcile.test.ts
it('generates consolidation plans', async () => {
  const db = new Database(':memory:');
  db.exec(`
    CREATE TABLE physical_files (
      id INTEGER PRIMARY KEY,
      content_hash TEXT,
      size INTEGER
    );
    CREATE TABLE file_paths (
      id INTEGER PRIMARY KEY,
      path TEXT,
      phys_file_id INTEGER
    );
  `);

  // Insert test data: 2 files with same hash
  db.prepare('INSERT INTO physical_files VALUES (NULL, ?, ?)')
    .run('same_hash', 1000);

  db.prepare('INSERT INTO file_paths VALUES (NULL, ?, ?)')
    .run('/vol1/file.txt', 1);

  db.prepare('INSERT INTO physical_files VALUES (NULL, ?, ?)')
    .run('same_hash', 1000);

  db.prepare('INSERT INTO file_paths VALUES (NULL, ?, ?)')
    .run('/vol2/file.txt', 2);

  // Run reconciliation
  const mockDeps = {
    async findDuplicateHashes() {
      const rows = db.prepare(`
        SELECT content_hash, GROUP_CONCAT(fp.path) as paths
        FROM physical_files pf
        JOIN file_paths fp ON pf.id = fp.phys_file_id
        GROUP BY content_hash HAVING COUNT(*) > 1
      `).all();

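      // Note: splitting on ',' assumes no path contains a comma
      const result = new Map<string, string[]>();
      for (const row of rows as Array<{ content_hash: string; paths: string }>) {
        result.set(row.content_hash, row.paths.split(','));
      }
      return result;
    },

    selectSourceOfTruth(copies: string[]) {
      return copies[0]; // Use first as source
    },
  };

  const reconciler = createLocalReconciler(db);
  const plans: Array<{ copies: unknown[] }> = [];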

  for await (const plan of reconciler.reconcile([], mockDeps)) {
    plans.push(plan);
  }

  // Should generate one consolidation plan (for the duplicate files)
  expect(plans).toHaveLength(1);
  expect(plans[0].copies).toHaveLength(2);

  db.close();
});
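
`createLocalReconciler` is Replicator code and is not reproduced in this article. To make the shape of the result concrete, here is a minimal in-memory sketch of how duplicate groups could be turned into plans. `buildPlans` and the `Plan` type are hypothetical; only `selectSourceOfTruth` and `copies` echo names from the test above.

```typescript
// Hypothetical plan shape: one source of truth plus every copy in the group.
interface Plan {
  contentHash: string;
  source: string;
  copies: string[];
}

// Turns hash-to-paths groups into plans, skipping hashes with a single path.
function buildPlans(
  groups: Map<string, string[]>,
  selectSourceOfTruth: (copies: string[]) => string
): Plan[] {
  const plans: Plan[] = [];
  groups.forEach((copies, contentHash) => {
    if (copies.length < 2) return; // a single path is not a duplicate
    plans.push({ contentHash, source: selectSourceOfTruth(copies), copies });
  });
  return plans;
}

const hashGroups = new Map<string, string[]>([
  ['same_hash', ['/vol1/file.txt', '/vol2/file.txt']],
  ['unique_hash', ['/vol1/other.txt']],
]);

const demoPlans = buildPlans(hashGroups, (copies) => copies[0]);
console.log(demoPlans.length, demoPlans[0].source); // 1 /vol1/file.txt
```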

Key Takeaways

Use in-memory SQLite: :memory: databases are fast and realistic
Create mock adapters: Filesystem, database, network—all become functions
Test full pipelines: Enumerate → reconcile → classify in milliseconds
Use parametric tests: Multiple scenarios, same test code
Inject errors: Test error handling without flaky infrastructure
Measure performance: Know your baseline speed without I/O overhead
Snapshot database state: Assert on derived results, not just return values

Benefits Summary

Aspect	Integration Tests	Local Tests
Speed (per test)	10+ seconds	10-100ms
Parallelization	Limited (disk contention)	Limited only by CPU and RAM
Flakiness	High (network, permissions)	None from external services
Debug	Stack traces span services	Normal breakpoints
Infrastructure	Real databases, filesystems	In-memory objects
Cost	Expensive (cloud resources)	Zero (RAM only)
Isolation	Tests interfere	Complete isolation
Throughput	100s of tests/hour	1000s of tests/minute

References

better-sqlite3: https://github.com/WiseLibs/better-sqlite3
memfs: https://github.com/streamich/memfs
Vitest: https://vitest.dev/ (fast test runner)
Replicator test suite: src/__tests__/ and *.test.ts files
In-Memory Databases: https://en.wikipedia.org/wiki/In-memory_database
Test Doubles: “Test Driven Development: By Example” - Kent Beck