Cloud backup has a physics problem that no startup can solve. At any realistic residential upload speed, terabytes of data cannot cross the internet in a human time frame. The correct response is not a faster cloud — it is an architecture that treats the physical drive as a first-class transport layer.
The Numbers Nobody Talks About
Before the product argument, the math.
| Data | 20 Mbps upload (real) | 100 Mbps upload (advertised) | 1 Gbps fiber |
|---|---|---|---|
| 100 GB | 11 hours | 2.2 hours | 13 min |
| 1 TB | 4.6 days | 22 hours | 2.2 hours |
| 10 TB | 46 days | 9 days | 22 hours |
| 50 TB | 230 days | 46 days | 4.6 days |
| 500 TB | 6.3 years (effectively never) | 463 days | 46 days |
The 20 Mbps column is not pessimistic. It is the median. The FCC’s Measuring Broadband America program consistently finds that real-world residential upload speeds sit at 20–30% of the advertised rate during peak hours. A household sold “100 Mbps” internet will push 15–25 Mbps to a cloud endpoint over a sustained multi-hour backup window. Contention, packet loss, throttling, and provider-side rate shaping all erode the theoretical number.
The 1 Gbps column is the aspirational case: gigabit fiber, available to roughly 30% of US households as of 2024 and often provisioned with asymmetric upload that caps at 500 Mbps or less. Even at true gigabit upload, 50 TB is a 4.6-day job.
This is the bandwidth wall. It is not a UX problem. It is not a pricing problem. It is arithmetic.
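The table reduces to a single division. A minimal sketch that reproduces it, assuming decimal units (1 TB = 10^12 bytes) and a link that sustains its rate for the entire transfer; the function names are illustrative:

```python
GB, TB = 10**9, 10**12          # decimal storage units, in bytes
MBPS, GBPS = 10**6, 10**9       # link rates, in bits per second


def transfer_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Seconds needed to move size_bytes over a link sustaining the given rate."""
    return size_bytes * 8 / rate_bits_per_s


def humanize(seconds: float) -> str:
    """Render a duration in the largest convenient unit."""
    if seconds < 3600:
        return f"{seconds / 60:.0f} min"
    if seconds < 86400:
        return f"{seconds / 3600:.1f} hours"
    return f"{seconds / 86400:.1f} days"


# Print one row per dataset size, one column per link rate.
for label, size in [("100 GB", 100 * GB), ("1 TB", TB), ("10 TB", 10 * TB), ("50 TB", 50 * TB)]:
    cells = [humanize(transfer_seconds(size, rate)) for rate in (20 * MBPS, 100 * MBPS, GBPS)]
    print(label, *cells, sep=" | ")
```

Protocol overhead and retries are ignored here, so real transfers will be somewhat slower than these figures.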
Why This Is Counterintuitive
Cloud storage products have been so successful for small data that users extend the mental model upward, where it breaks.
Dropbox works. Google Drive works. iCloud photo sync works. These products handle documents and photos in the low-gigabyte range, where a 2-hour upload is a one-time cost and the ongoing sync is a trickle of new files. The user experience at 50 GB is excellent, and it extrapolates badly.
Backblaze Personal Backup — the most honest of the major cloud backup products — acknowledges on its help pages that “initial backup can take weeks” for large data sets. The phrase “unlimited backup” is technically accurate: you can upload unlimited data, as long as you have unlimited time. Most users never finish. Backblaze’s own data shows the median backup set is around 700 GB. The product was designed around that user.
CrashPlan, Arq, iCloud, Google Photos, and Amazon Photos are all architecturally identical on this dimension: they assume that file content moves over the internet. They are right for the 700 GB user and wrong for anyone else.
The product implications:
- Initial backup for >1 TB users often never completes
- Restore at scale is the same problem in reverse, timed precisely when the user is most stressed
- Egress fees on B2, S3, and Glacier punish the restore path further
- A 10 TB restore at 20 Mbps takes 46 days, and even at a 100 Mbps download speed it takes 9 days, longer than most disaster recovery windows
None of these companies is doing anything wrong. They built for the median. The median was the right call for 2010. Storage has grown faster than bandwidth since then.
Kryder’s Law vs. Nielsen’s Law
The root cause of the bandwidth wall is a gap between two exponential growth curves.
Kryder’s Law (storage density): Hard drive areal density doubles roughly every 18 months. In practice, the cost per gigabyte of spinning disk fell at approximately 30–40% per year for decades, though the pace has slowed recently: a 4 TB external drive that cost $180 in 2015 costs $65 in 2025, a decline of roughly 10% per year.
Nielsen’s Law (bandwidth): High-end residential bandwidth grows at approximately 50% per year — slower than Moore’s Law for compute (60%) and slower than Kryder’s Law for storage. Nielsen documented this in 1998 and updates confirm it has held with R² = 0.99 through 2023.
The divergence: storage capacity outpaces bandwidth roughly 2:1 over the long run. Every year, the average household has proportionally more data than its internet connection can move in the same time budget. The wall gets higher as storage gets cheaper.
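The two laws are quoted in different units: one as an annual decline in cost per gigabyte, the other as annual growth in speed. A small sketch for putting them on the same axis, using only the rates quoted above; the helper names are illustrative:

```python
import math


def capacity_growth_from_cost_decline(annual_decline: float) -> float:
    """Annual growth in gigabytes-per-dollar implied by a cost-per-GB decline.

    If a gigabyte costs (1 - d) times last year's price, a fixed budget
    buys 1 / (1 - d) times as many gigabytes.
    """
    return 1 / (1 - annual_decline) - 1


def doubling_time_years(annual_growth: float) -> float:
    """Years for a quantity growing at annual_growth to double."""
    return math.log(2) / math.log(1 + annual_growth)


for decline in (0.30, 0.40):
    growth = capacity_growth_from_cost_decline(decline)
    print(f"{decline:.0%} cost decline -> {growth:.0%} more capacity per dollar per year")

print(f"50% bandwidth growth doubles every {doubling_time_years(0.50):.1f} years")
```

A 30% annual cost decline corresponds to about 43% more capacity per dollar each year, and a 40% decline to about 67%. This arithmetic covers only price and speed; how much data a household actually accumulates also depends on what it chooses to buy and capture, which the sketch does not model.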
This is not a trend that gigabit rollouts will fix. Even if every home had symmetric 1 Gbps fiber tomorrow — which would require approximately $250 billion in infrastructure investment — the 50 TB user is still looking at a 4.6-day initial upload. And in five years, that user will have 100 TB.
The correct architectural response is not to wait for faster internet. It is to stop treating the internet as the primary transport layer.
Sneakernet: Still Faster Than the Internet for Bulk Data
The term “sneakernet” — physically carrying data on a disk, as opposed to sending it over the network — dates to the 1970s. It was the subject of an early computing joke: “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” The observation is attributed to various people over the years, most prominently Andrew Tanenbaum in his textbook Computer Networks, but the underlying math has been correct at every scale since the joke was first made.
A single 8 TB external hard drive, hand-carried across town, transfers 8 TB at whatever speed you can walk. For perspective:
| Transport | Effective throughput | Time to deliver |
|---|---|---|
| Walking with 8 TB drive (30 min carry) | ~35 Gbps equivalent | 30 min |
| 100 Mbps residential upload | 0.1 Gbps | 7.4 days (8 TB) |
| 1 Gbps fiber | 1 Gbps | 18 hours (8 TB) |
| FedEx overnight, 20x 8 TB drives (160 TB) | ~15–30 Gbps equivalent | 12–24 hours |
The FedEx bandwidth calculation is not a joke. It is how enterprise data centers moved data in the early cloud era and how some still do. AWS Snowball, Azure Data Box, and Seagate Lyve Mobile are all commercial productizations of this insight — physical appliances shipped to a data center because it is genuinely faster and cheaper than the network for initial seed loads above a few terabytes.
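The "equivalent throughput" of a physical carry is just payload divided by elapsed time. A minimal sketch, assuming decimal terabytes; the optional `copy_seconds` parameter is an addition of this sketch, for anyone who wants to count the time spent loading the drive as well as the time in transit:

```python
def effective_gbps(payload_bytes: float, transit_seconds: float,
                   copy_seconds: float = 0.0) -> float:
    """Equivalent link rate, in Gbps, of physically moving payload_bytes.

    transit_seconds is the time the media spends travelling.
    copy_seconds optionally adds the time needed to fill the media first.
    """
    return payload_bytes * 8 / (transit_seconds + copy_seconds) / 1e9


TB = 10**12

# One 8 TB drive carried for 30 minutes.
walk = effective_gbps(8 * TB, 30 * 60)

# Twenty 8 TB drives shipped overnight, best and worst case.
ship_fast = effective_gbps(160 * TB, 12 * 3600)
ship_slow = effective_gbps(160 * TB, 24 * 3600)

print(f"walk: {walk:.1f} Gbps, ship: {ship_slow:.1f} to {ship_fast:.1f} Gbps")
```

Counting the time to fill the drive lowers the figure considerably, but that copy runs locally and unattended, which is the point: the slow step happens over a cable on your desk rather than over the uplink.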
AWS Snowball Edge can hold up to 210 TB. A round-trip from your location to an AWS data center takes approximately 5–7 business days. For datasets above ~20 TB, Snowball is faster than uploading on any connection that sustains less than a few hundred Mbps. AWS stopped accepting new Snowball Edge customers in late 2025 as network speeds improved at the enterprise tier, but the threshold where physical transport beats the network has merely moved upward, not disappeared. For consumers, it has not moved at all.
Azure Data Box offers 120 TB and 525 TB capacity options with AES-256 encryption and NIST 800-88 erasure after ingestion. These products exist because the largest customers of cloud providers found the bandwidth wall before the rest of us did, and needed a different answer.
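The threshold where shipping beats uploading can be estimated directly: it is the amount of data the link could move during the shipping round trip. A rough sketch, assuming the round trip is counted in calendar days and the link sustains its rate throughout; the function name is illustrative:

```python
def break_even_tb(upload_mbps: float, shipping_days: float) -> float:
    """Dataset size in TB at which uploading takes exactly as long as shipping.

    Above this size, the shipped device arrives before the upload would finish.
    """
    seconds = shipping_days * 86400
    bits_moved = upload_mbps * 1e6 * seconds
    return bits_moved / 8 / 1e12


for mbps in (20, 100, 1000):
    print(f"{mbps} Mbps, 7-day round trip: break-even at {break_even_tb(mbps, 7):.1f} TB")
```

On this estimate, a 7-day round trip breaks even at about 1.5 TB for a 20 Mbps uplink and about 76 TB for a true gigabit uplink. A break-even near 20 TB corresponds to a sustained rate of roughly 265 Mbps.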
The Correct Architecture: Physical First
If cloud backup is not viable as a primary strategy for datasets above 1 TB, the architecture has to change.
The tiers, from fastest to recover to slowest:
Tier 1: Local NVMe (primary machine)
- The metadata index: file catalog, block manifest, drive registry
- Always-on, low-latency access to the map of everything
- No file content required; this is the navigation layer
Tier 2: Local drive copy (same building)
- A second drive in the same location: robocopy, rsync, or block-level copy
- Covers single-drive failure, the most common failure mode
- Recovery time: hours, not days
Tier 3: Offsite drive (different physical location)
- A drive at a relative's house, a safe deposit box, or on rotation
- Covers building-level events: fire, flood, theft
- Recovery time: hours to retrieve, hours to copy
Tier 4: Cloud metadata only
- The catalog: file names, hashes, drive assignments, priority scores
- Kilobytes per file, gigabytes for millions of files
- Cloudflare R2: ~$0.30/month for 20 GB of metadata
- Enables disaster recovery: a new machine downloads the catalog and knows exactly which drive has what
Tier 5: Selective cloud content
- The top 1,000 photos, key documents, irreplaceable records
- Chosen by priority scoring, not random sampling
- 5–10 GB of content, cloud-viable on any connection
- True offsite copy for the files that matter most
The key inversion: file content does not go to the cloud. Metadata does. Identity goes to the cloud. The actual bytes live on drives you own, at locations you control.
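To make the inversion concrete, here is a hypothetical sketch of what a single catalog record might look like. The field names, drive labels, and JSON encoding are assumptions for illustration, not a description of any shipping schema:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CatalogEntry:
    """Metadata for one file; the file content itself stays on drives."""
    path: str
    size_bytes: int
    sha256: str                                       # hex-encoded content hash
    drives: list[str] = field(default_factory=list)   # labels of drives holding a copy
    priority: float = 0.0                             # higher means more important


def to_cloud_record(entry: CatalogEntry) -> bytes:
    """Serialize an entry to the compact JSON that would be uploaded."""
    return json.dumps(asdict(entry), separators=(",", ":")).encode("utf-8")


def drives_for(catalog: list[CatalogEntry], path: str) -> list[str]:
    """Answer the recovery question: which drives hold this file?"""
    for entry in catalog:
        if entry.path == path:
            return entry.drives
    return []


# Placeholder hashes and labels, for illustration only.
catalog = [
    CatalogEntry("photos/2009/birthday/IMG_0412.jpg", 4_200_000, "ab" * 32,
                 drives=["DRIVE-A", "OFFSITE-1"], priority=0.9),
    CatalogEntry("downloads/installer.dmg", 650_000_000, "cd" * 32,
                 drives=["DRIVE-A"], priority=0.1),
]

print(len(to_cloud_record(catalog[0])), "bytes of metadata for a 4.2 MB photo")
```

A record like this is a few hundred bytes regardless of how large the file is, which is why the catalog stays small enough to upload over any connection.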
Tier 5 is the insight that makes this viable without abandoning cloud entirely. A family has 50 TB of data. Not all 50 TB is equally important. Raw home video of a birthday party from 2009 matters less than the 200 best photos from that party. Key documents — passport scans, insurance cards, tax returns — are tens of megabytes, not terabytes. The emotionally irreplaceable subset of a large personal archive is routinely less than 10 GB.
10 GB is cloud-viable. Even at 20 Mbps upload, 10 GB takes 67 minutes. Do that once, and the most important files have true offsite cloud redundancy. Do it on a schedule, and new irreplaceable files reach the cloud within days of being created.
The cloud does not disappear from this architecture. It just does what it is good at: storing small, high-value data reliably with global availability.
The Incremental Delta Problem Is Solvable
The initial backup is the hard problem. Ongoing changes are not.
A family generating photos and video at a typical modern rate produces approximately 10–50 GB per month of new content. iPhone Pro cameras generate roughly 40 MB per photo in ProRAW format; 4K video at 30 fps is around 10 GB per hour in HEVC. A family of four who shoots casually (birthdays, vacations, school events) might capture 25–100 GB in a good month.
At 20 Mbps upload, 50 GB takes approximately 5.5 hours. A nightly background sync during sleeping hours handles the ongoing delta with no user-visible impact. The bandwidth wall exists for the initial terabytes. Once the seed is done, the wall disappears for normal usage patterns.
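Spread across a month, the same volume becomes a short nightly job. A minimal sketch, assuming new content arrives evenly and the uplink sustains its rate; the function name and the 30-night default are illustrative:

```python
def nightly_minutes(monthly_gb: float, upload_mbps: float, nights: int = 30) -> float:
    """Minutes of upload per night needed to keep up with monthly_gb of new data."""
    bytes_per_night = monthly_gb * 1e9 / nights
    seconds = bytes_per_night * 8 / (upload_mbps * 1e6)
    return seconds / 60


for monthly_gb in (10, 50, 100):
    minutes = nightly_minutes(monthly_gb, upload_mbps=20)
    print(f"{monthly_gb} GB/month at 20 Mbps: {minutes:.0f} min per night")
```

At 20 Mbps, 50 GB a month works out to about 11 minutes of upload per night, and even 100 GB a month stays under half an hour.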
This argues for a clear two-phase product:
Phase 1: Seed — local drives handle the initial capture. Physical transport handles the initial offsite copy. Cloud handles only the metadata and the selective Tier 5 subset. No upload window required.
Phase 2: Ongoing — new content is small enough for cloud upload. Tier 5 is updated on a schedule. Local and offsite drives are updated during drive rotation. The system is in steady state.
Most cloud backup products treat Phase 2 as the default and Phase 1 as a setup problem the user must solve themselves. Users with large datasets are stuck in Phase 1 forever. The product design should treat Phase 1 as a ceremony: explicit, named, with a defined completion event.
Priority Replication: Not All Data Is Equal
The bandwidth constraint forces a prioritization problem that most backup systems ignore.
A typical backup system copies files in one of two orders: the order the filesystem presents them (arbitrary), or by modification time (recent first). Neither order reflects what the user actually cares about.
Consider a drive that needs to leave (going offsite for the first time, or rotating out to a relative’s house) in the next 4 hours. At the ~100 MB/s sequential rate of a USB 3.0 spinning drive, that window could hold about 1.4 TB in theory; with many small files and filesystem overhead, the system can copy perhaps 400 GB. The pool has 2 TB of data. What goes on the drive?
Without priority awareness, the answer is the first 400 GB the filesystem happens to encounter. This might be a collection of downloaded Linux ISOs, old installer files, and cached application data — content that matters least.
With priority scoring, the system asks: which files are irreplaceable? Which have no other backup copy? Which are emotionally or practically important?
A scoring heuristic:
- Uniqueness: files that exist only on one drive score highest
- Category: family photos and videos score higher than downloads and installers
- Access recency: recently accessed files may indicate current importance
- Explicit user signal: starred, tagged, or marked-as-important files
The drive that goes in the car carries the highest-scoring 400 GB. If the house burns down, the recovery set contains the most important 20% of the family’s data — not a random 20%.
This is only possible when the system knows what matters. Generic backup tools do not. They treat a wedding photo and a .dmg file identically.
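The scoring heuristic and the fill step can be sketched in a few lines. Everything specific here is an assumption for illustration: the weights, the category table, the recency curve, and the greedy fill (which takes files in score order and skips any that no longer fit, rather than solving an optimal packing problem):

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    copies: int              # number of drives currently holding this file
    category: str            # e.g. "photo", "video", "document", "download"
    days_since_access: int
    starred: bool = False    # explicit user signal


# Hypothetical category weights; a real system would tune these.
CATEGORY_WEIGHT = {"photo": 1.0, "video": 0.9, "document": 0.8, "download": 0.1}


def score(f: FileInfo) -> float:
    """Combine the four signals into one number; higher means copy sooner."""
    uniqueness = 1.0 / max(f.copies, 1)               # a single copy scores highest
    category = CATEGORY_WEIGHT.get(f.category, 0.3)
    recency = 1.0 / (1.0 + f.days_since_access / 365)
    user = 1.0 if f.starred else 0.0
    return 0.4 * uniqueness + 0.3 * category + 0.1 * recency + 0.2 * user


def select_for_drive(files: list[FileInfo], budget_bytes: int) -> list[FileInfo]:
    """Pick files in descending score order until the byte budget is used up."""
    chosen, used = [], 0
    for f in sorted(files, key=score, reverse=True):
        if used + f.size_bytes <= budget_bytes:
            chosen.append(f)
            used += f.size_bytes
    return chosen
```

Called with the pool's file list and a 400 GB budget, `select_for_drive` returns the set that should go in the car. The budget itself would come from the drive's capacity and the time available.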
The Snowball Moment for Consumers
AWS Snowball was created in 2015 to solve a problem enterprises had been working around for decades: at some scale, physically shipping a device is faster and cheaper than sending bits over a wire.
The original Snowball was a 50 TB appliance in a ruggedized case. You ordered it, it arrived, you loaded your data, and shipped it back. AWS ingested the data and it appeared in S3. The service was explicitly positioned as the solution when uploading over the internet would take weeks.
At 50 TB, even gigabit fiber requires 4.6 days of continuous upload. At 500 Mbps symmetric — genuinely exceptional residential service — it takes over 9 days. AWS Snowball exists because these numbers do not change.
Azure Data Box operates identically. Seagate Lyve Mobile is the same concept for hybrid enterprise environments. These are not niche products. They represent the correct architectural response at scale.
For consumers, there is no commercial equivalent. The consumer equivalent of Snowball is carrying a drive in a bag.
Heirloom should make this ceremony explicit. When a drive is designated offsite, the product should:
- Calculate what fits given the drive capacity and available time
- Queue content in priority order
- Show a pre-departure manifest: “This drive will contain N files covering your [date range] photos, N key documents, and N videos. Copy time: X hours.”
- Verify the copy with hashes before departure
- Record the departure: drive label, contents manifest, destination, date
- On return: verify no blocks changed during the drive’s absence
This is not a workaround. This is the product. “Drive departure” is a first-class event, not an edge case.
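The verify-and-record steps of the ceremony could look roughly like the following. This is a sketch under stated assumptions: it hashes whole files with SHA-256 rather than individual blocks, and the manifest layout and function names are hypothetical:

```python
import hashlib
from datetime import date
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large videos do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(drive_root: Path, label: str, destination: str) -> dict:
    """Record what is on the drive at departure: label, destination, date, hashes."""
    files = {
        p.relative_to(drive_root).as_posix(): sha256_of(p)
        for p in sorted(drive_root.rglob("*"))
        if p.is_file()
    }
    return {
        "drive_label": label,
        "destination": destination,
        "departed": date.today().isoformat(),
        "files": files,
    }


def verify_on_return(drive_root: Path, manifest: dict) -> list[str]:
    """Return the relative paths that are missing or whose content hash changed."""
    problems = []
    for rel_path, expected in manifest["files"].items():
        candidate = drive_root / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel_path)
    return problems
```

The manifest would be stored in the catalog rather than only on the drive itself, so that the record survives if the drive does not. An empty list from `verify_on_return` means every file recorded at departure came back unchanged.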
Competitive Landscape: What Everyone Else Gets Wrong
iCloud / Google Photos
Both are bandwidth-first by design. iCloud will cheerfully begin “optimizing” your iPhone storage by removing local copies once the cloud upload completes, which assumes the upload does complete. At 20 Mbps, 200 GB is roughly a day of continuous upload, and phones rarely upload continuously, so initial uploads above that size can stretch into weeks on typical home connections. Until the upload finishes, the device holds the only copy, and the user has no cloud backup for the duration.
Neither product has a physical transport option. Neither product can explain to a user what to do with 10 TB of photos.
Backblaze Personal Backup
The most bandwidth-honest of the cloud backup products. Unlimited backup, no file size limits. Backblaze’s own support documentation states that initial backup for large data sets takes “weeks to months” and recommends leaving the computer connected to power and internet continuously.
Backblaze does offer a “Fireball” seeding service, though for its B2 Cloud Storage product rather than Personal Backup: they ship a storage device, you copy your data to it, ship it back, and it seeds your bucket. This is exactly the physical-first insight, offered as a paid add-on, positioned as an advanced workaround rather than a default path.
The restore path is similarly constrained. Full restore from Backblaze over the internet means downloading terabytes at whatever speed your connection permits. They also offer physical restore on a drive for $189 (refunded if returned within 30 days). Again: physical transport as an afterthought.
Syncthing
Open-source continuous file synchronization. Excellent engineering. Syncthing keeps a full copy of every file on each device, transfers only the blocks that changed, and handles conflict resolution well. It does not solve the bandwidth problem; it uses bandwidth. An initial 1 TB sync between two devices using Syncthing requires moving 1 TB of data over whatever network is available.
Syncthing also has no concept of priority: it will sync a .DS_Store file with the same urgency as a wedding video. There is no scoring, no drive departure ceremony, no verified manifest.
Heirloom
Local-first. Metadata to cloud, content to drives. Physical transport is the primary offsite path, not an emergency workaround. Priority scoring determines what goes on a drive when time is limited. Verified manifests make the handoff auditable. The product is designed around the user who has 5 TB today and will have 20 TB in three years, and for whom cloud-first backup has never worked.
The Pitch
“Your files never leave your house — except on a drive you carry yourself.”
This is not a technical limitation dressed up as a feature. It is the correct architecture for personal data at scale, derived from the physics of bandwidth growth.
Cloud backup is the right answer for documents, credentials, and the top 1% of irreplaceable photos. It is the wrong answer for the raw footage, the unedited archives, the full-resolution originals that make up the bulk of a modern family’s digital life.
The bandwidth wall is not going away. Nielsen’s Law at 50% annual bandwidth growth cannot close the gap against Kryder’s Law at 30–40% annual storage cost decline. Every year the median household has more data relative to its upload capacity, not less. The products that will win are the ones built for that reality, not the ones still assuming that all data can eventually cross the wire.
Physical first is not a legacy approach. It is the only architecture that works at the scale personal data has already reached.
Derived from a product design conversation, 2026-03-20. Research sources: Nielsen’s Law (NN/g, 1998–2023 longitudinal validation, R²=0.99); AWS Snowball product documentation; Azure Data Box specifications (120 TB and 525 TB, AES-256 encryption, NIST 800-88 erasure); Backblaze Hard Drive Stats 2023 (274,622 drives under management, 35 active models); Backblaze initial backup time support documentation; FCC Measuring Broadband America program upload speed findings; AWS Snowball pricing page (effective November 2025: new customers redirected to DataSync/Data Transfer Terminal). Sneakernet bandwidth comparison methodology from Andrew Tanenbaum, Computer Networks.