So far, on my KendiEgemen channel and across my blog posts, I have repeatedly covered topics like offline LLMs, solar panels, and generators. But offline LLMs hallucinate, which means we also need solid offline data sources. That’s exactly what this post is about.
There are three “what if” scenarios:
In all three cases, you need the same thing: knowledge that works without internet. This folder (/mnt/d/wikipedia) is exactly that. Roughly 170 GB of data — fits on an external drive, easily fits on an SSD, can be backed up, can be carried.
What’s inside, and why it’s there — in order.
| Category | Size | Purpose |
|---|---|---|
| English Wikipedia (3 formats) | ~140 GB | General knowledge — the starting point for almost everything |
| Turkish Wikipedia (3 formats) | ~10 GB | Same idea, local language |
| OpenStax books (122 textbooks) | 11 GB | College-level education — math, physics, biology, anatomy, law… |
| Medical/field archive (offline-medical-survival/) | 622 MB | When there is no doctor / system collapse |
| Kiwix Desktop + DuckDB tools | 470 MB | To actually open and use everything |
| Total | ~170 GB | |
A 256 GB or 512 GB portable SSD handles this comfortably. Cost: a few hundred dollars. Payoff: when your connection to the world breaks, you still hold a meaningful chunk of humanity’s knowledge.
Same Wikipedia, different formats for different jobs:
wikipedia_en_all_maxi_2026-02.zim (124 GB) and wikipedia_tr_all_maxi_2026-02.zim (9 GB)

Kiwix ZIM format. Full Wikipedia including images. You get the familiar browsing experience, just offline. Snapshot from February 2026.
How to use: run kiwix-desktop.exe from kiwix-desktop_windows_x64_2.5.1/ → File → Open ZIM. Or host it on your local network with kiwix-serve.exe so phones can access it too.
Doomsday tip: this is the single most critical file set. Back it up to a second disk first.
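Copying a 124 GB file to a second disk is only useful if the copy is intact. Here is a minimal sketch of a checksum comparison using only the Python standard library. The function names and the backup path in the usage note are hypothetical and not part of the original folder.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so a huge ZIM never has to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, backup: Path) -> bool:
    """Return True only if both files exist and have identical SHA-256 digests."""
    if not (original.is_file() and backup.is_file()):
        return False
    # Cheap check first: if the sizes differ there is no need to hash anything.
    if original.stat().st_size != backup.stat().st_size:
        return False
    return sha256_of(original) == sha256_of(backup)
```

For example, `copies_match(Path("D:/wikipedia/wikipedia_en_all_maxi_2026-02.zim"), Path("E:/backup/wikipedia_en_all_maxi_2026-02.zim"))`, where `E:/backup` stands in for wherever your second disk is mounted. Hashing both copies reads every byte, so expect it to take a while on spinning disks.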
enwiki-latest-pages-articles.xml.bz2 (24 GB) and trwiki-latest-pages-articles.xml.bz2 (983 MB)

Raw Wikipedia XML dump. Fresher than the ZIM files (May 2026), but with no images and raw wikitext markup. Best for people building pipelines, writing custom parsers, or wanting newer text.
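If you want to write your own parser against the XML dump, the main trick is to stream it rather than decompress it. Below is a minimal sketch using only the standard library. It assumes the usual MediaWiki export layout (`<page>` elements that each contain a `<title>`), and the function names are mine, not from any existing tool.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterator


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix so the code ignores the export schema version."""
    return tag.rsplit("}", 1)[-1]


def iter_titles(dump_path: str) -> Iterator[str]:
    """Yield page titles from a .xml.bz2 dump without decompressing it to disk."""
    with bz2.open(dump_path, "rb") as stream:
        context = ET.iterparse(stream, events=("start", "end"))
        _event, root = next(context)  # the first event is the start of the root element
        for event, elem in context:
            if event != "end":
                continue
            name = local_name(elem.tag)
            if name == "title":
                yield elem.text or ""
            elif name == "page":
                # Discard finished pages so memory stays flat on a multi-GB dump.
                root.clear()
```

Usage: `for title in iter_titles("D:/wikipedia/trwiki-latest-pages-articles.xml.bz2"): ...`. The same loop can be extended to read each page's `<text>` element, which holds the raw wikitext.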
hf_wikipedia/20231101.{en,tr}/*.parquet (12 GB)

HuggingFace’s cleaned parquet Wikipedia snapshot (November 2023). Older, but the text is clean: templates expanded, markup stripped. Perfect for SQL search, pandas analysis, and ML workflows.
With DuckDB CLI from duckdb_cli-windows-amd64.zip:
```sql
SELECT title FROM 'D:/wikipedia/hf_wikipedia/20231101.tr/*.parquet'
WHERE title ILIKE '%Atatürk%';
```
The three formats complement each other: ZIM = reading, Parquet = search/analysis, XML = freshest + your own tools.
Inside openstax/: 122 PDFs — the full OpenStax (Rice University) open textbook catalog. Peer-reviewed, free, openly licensed.
Coverage includes math (algebra, calculus 1-3, contemporary math), physics, chemistry, biology (general + AP + concepts), anatomy and physiology, astronomy, micro/macro economics, business ethics, business law, U.S. government, sociology, psychology, anthropology, clinical nursing skills, plus Spanish calculus translations.
A motivated student can self-study a huge chunk of the high school and early undergraduate curriculum with this alone. All it takes is minimal electricity, and even without a grid, a charged tablet plus a small solar setup covers that.
Doomsday tip: if you have kids (or might), this folder is one of the strongest education assets you can store. Adding Khan Academy/Kolibri offline packs is a good next project.
offline-medical-survival/: medicine for collapse conditions

This folder is the smallest content collection (622 MB), but arguably the most critical. It is built for scenarios with no doctor, no pharmacy, and no ambulance.
It has 13 subfolders, each targeting a different problem area:
- hesperian/: Where There Is No Doctor (2025) and Where There Is No Dentist (2020)
- military/: U.S. Army field manuals: medical, first aid, survival, water/wells, tactical comms, emergency war surgery
- who-iris/: WHO core clinical handbooks (hospital care, surgery, essential medicines, maternal care, IMCI/IMAI, etc.)
- who-icrc/: WHO/ICRC Basic Emergency Care
- first-aid/: IFRC and ICRC first aid references
- merck-msd/: MSD manual pages as backup references
- msf/: Médecins Sans Frontières field clinical guides
- icrc-war-surgery/: ICRC War Surgery Vol. 1
- cdc/: emergency supply guidance + travel medicine pages
- drug-labels/: DailyMed + FDA Orange Book pages
- openstax/: OpenStax Anatomy & Physiology duplicate for quick reference
- maritime/: WHO International Medical Guide for Ships
- niosh/: NIOSH Pocket Guide to Chemical Hazards

This archive was built by health.sh (wget-based) and health_fixup.py (Python for WHO IRIS DSpace 7 API). Both are idempotent: rerun them and they only fetch missing files.
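The "rerun and only fetch what is missing" behavior is easy to reproduce in your own download scripts. The sketch below is not the code of health.sh or health_fixup.py. It is a hypothetical `fetch_missing` helper that illustrates the same idea with the standard library, and it assumes the file name is simply the last path segment of the URL.

```python
from pathlib import Path
from typing import Callable, Iterable, List
from urllib.request import urlretrieve


def _download(url: str, target: Path) -> None:
    """Default downloader: a plain HTTP(S) GET via the standard library."""
    urlretrieve(url, str(target))


def fetch_missing(
    urls: Iterable[str],
    dest_dir: Path,
    download: Callable[[str, Path], None] = _download,
) -> List[str]:
    """Download only files that are not already present; return the URLs fetched."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    fetched = []
    for url in urls:
        # Simplification: take the last path segment as the local file name.
        target = dest_dir / url.rstrip("/").rsplit("/", 1)[-1]
        if target.exists() and target.stat().st_size > 0:
            continue  # already on disk, so a rerun costs nothing
        partial = target.with_name(target.name + ".part")
        download(url, partial)
        # Rename only after the download finished, so an interrupted run
        # never leaves a truncated file that looks complete.
        partial.replace(target)
        fetched.append(url)
    return fetched
```

Writing to a `.part` file first is a design choice of this sketch: a file under its final name is always a complete download, so the existence check alone is enough to decide whether to skip it.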
Gap: wilderness/ is currently empty. Wilderness Medical Society guidelines and several PMC/Sage resources are paywalled. military/ survival docs and Hesperian material partly cover the need, but not fully.
Two .zip files in /mnt/d/wikipedia:
- Kiwix Desktop (kiwix-desktop.exe)
- DuckDB CLI (duckdb.exe)

The older README.md explains usage details. This post explains the “why.”
| Scenario | Does this archive help? | Notes |
|---|---|---|
| Short internet outage (hours) | Massively | Wikipedia + OpenStax are enough alone |
| Long local outage (weeks) | Yes, strongly | Medical archive + Wikipedia can be life-saving |
| Regional war | Yes, materially | Field medicine, water sourcing, tactical comms |
| Long social collapse (years) | Limited but critical | Foundation for education and rebuilding |
| Personal offline need (flight/boat/mountain) | Of course | Everything you need is already there |
What’s still missing:
All addable. Each one is a weekend project.
Knowledge decays:
- Wikipedia ZIMs: newer snapshots are published at download.kiwix.org/zim/wikipedia/
- Medical archive: rerun health.sh and health_fixup.py yearly (they skip existing files with wget -c)
- OpenStax: _openstax_download.py pulls in newly published books
- Drug labels: refresh drug-labels/ yearly

Keep disks in two locations: one primary working copy, and one in a separate physical location in case of fire, theft, or flood.
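To see which snapshots are due for a refresh, you can read the date straight out of the ZIM file names, which end in `_YYYY-MM.zim`. This is a minimal sketch. The 12-month threshold is my assumption, chosen to mirror the yearly cadence above, and the function names are hypothetical.

```python
import re
from datetime import date
from typing import Iterable, List, Optional

# Matches the trailing "_YYYY-MM.zim" used by names like wikipedia_en_all_maxi_2026-02.zim
SNAPSHOT = re.compile(r"_(\d{4})-(\d{2})\.zim$")


def snapshot_age_months(filename: str, today: date) -> Optional[int]:
    """Return the snapshot's age in whole calendar months, or None if no date is found."""
    match = SNAPSHOT.search(filename)
    if match is None:
        return None
    year, month = int(match.group(1)), int(match.group(2))
    return (today.year - year) * 12 + (today.month - month)


def stale_snapshots(
    filenames: Iterable[str], today: date, max_age_months: int = 12
) -> List[str]:
    """List the ZIM files whose snapshot is older than max_age_months (assumed threshold)."""
    stale = []
    for name in filenames:
        age = snapshot_age_months(name, today)
        if age is not None and age > max_age_months:
            stale.append(name)
    return stale
```

Run it over `os.listdir("D:/wikipedia")` with `date.today()` and it returns only the ZIM files past the threshold. Files without a date in the name are ignored.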
Building this folder took a weekend. If you ever need it, it can be worth a decade of accumulated knowledge. Preparedness is not paranoia; it’s insurance. You don’t buy insurance because you hope disaster happens — you buy it because you hope it doesn’t, and you still want a fallback.
What you have now: the world’s largest encyclopedia (in two languages), a university-level curriculum, field-grade life-saving medical manuals, military field references, water/comms/navigation basics. Pull the cable — it still works.
Good news: communities like Kiwix make this possible with thousands of ZIMs. Bad news: if you don’t build your own kit, no one builds it for you.
And one more thing: if you finished reading this, back it up. Now.
In a potential disaster scenario, you can run RAG on top of these documents with your LLM (LM Studio supports this natively) and reach critical factual information with dramatically lower hallucination risk.
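To make the RAG idea concrete, here is a toy version of the retrieval step: rank text passages against a question, then build a prompt that tells the model to answer only from those passages. This is a sketch with simple TF-IDF-style keyword scoring. It is not how LM Studio implements it, real setups typically use embedding search, and every name here is hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def top_passages(query: str, passages: List[str], k: int = 3) -> List[str]:
    """Return up to k passages that share terms with the query, best match first."""
    counts_per_passage = [Counter(tokenize(p)) for p in passages]
    # Document frequency: in how many passages does each term appear?
    doc_freq = Counter(term for counts in counts_per_passage for term in counts)
    total = len(passages)
    query_terms = set(tokenize(query))

    def score(counts: Counter) -> float:
        # Rare terms weigh more than terms that appear in every passage.
        return sum(
            counts[term] * math.log((total + 1) / (doc_freq[term] + 0.5))
            for term in query_terms
            if term in counts
        )

    scored = [(score(counts), passage) for counts, passage in zip(counts_per_passage, passages)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for value, passage in scored[:k] if value > 0]


def build_prompt(query: str, passages: List[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved sources."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the numbered sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
```

The passages would come from your archive, for example paragraphs from the parquet Wikipedia text or text extracted from the medical PDFs. The instruction to admit when the sources lack an answer is what pushes the model away from guessing.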