The best knowledge system is the one you actually use and iterate. A few weeks ago I started building with Claude on top of my Obsidian vault, and have steadily been doing just that. Between skill and agent creation, plus many MCP servers to make it easier to interact with the APIs for my various services, I started looking inwards at my hard drive to see what else I could tap into.
For likely more than a decade I’ve been building an ebook collection on my computer, hosting them in an open source app called Calibre. Now wouldn’t it be awesome (I asked myself) if I could find a way to get Claude to be able to access these within my regular sessions?
So, I asked it, and the answer was yes. That would indeed be awesome, and 1,700 carefully organized ebooks in Calibre would no longer be entirely worthless. Instead, I could turn that library (years of Humble Bundle purchases, tech manuals, business books, security references) into a mineable resource for the tool I actually use as a thinking partner: Claude.
When I’m writing about email security for the newsletter, I don’t need Claude’s general knowledge. I need the specific passage from Kevin Mitnick where he explains how attackers craft urgency, because that’s what I remember learning. When I’m prepping for a consulting call about change resistance, I want Kahneman’s actual words about loss aversion, not a paraphrase from someone’s blog.
So I spent a week building what is objectively too much infrastructure for this problem, but that’s part of the process.
A Python CLI extracts text from PDFs, ePubs, and MOBIs, chunks it into digestible pieces, and pushes everything into ChromaDB running on my Mac Mini. An MCP server exposes it to Claude. Five “librarian” agents know which books matter for which domains—security, psychology, writing craft, tech history, fiction.
The flow is:
Calibre → book-indexer → ChromaDB → MCP server → Claude Code
In practice, everything broke at least once.
My first version loaded entire books into memory, generated all embeddings at once, then pushed to ChromaDB. Worked fine until I fed it a 400-page manual and watched my Mac grind to a halt. Switched to batching 50 chunks at a time.
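The fix is simple in hindsight: embed and push in fixed-size batches instead of holding the whole book in memory. A sketch of the batching logic — the `embed` callable and `collection.add` signature stand in for however your embedding model and ChromaDB collection are wired up:

```python
from itertools import islice

def batched(iterable, size=50):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

def index_chunks(chunks, embed, collection, book_id, batch_size=50):
    """Embed and store chunks batch_size at a time, so a
    400-page book never lives in memory all at once."""
    for i, batch in enumerate(batched(chunks, batch_size)):
        embeddings = embed(batch)  # e.g. SentenceTransformer.encode
        collection.add(
            ids=[f"{book_id}-{i * batch_size + j}" for j in range(len(batch))],
            documents=batch,
            embeddings=embeddings,
        )
```

The per-chunk IDs are deterministic, so re-running the indexer overwrites rather than duplicates.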
After some teething issues I’ve now got roughly 175 documents indexed, using about 2GB of ChromaDB storage. The security librarian agent I built has the 2600 magazine back issues, Mitnick, and the social engineering shelf. The influence librarian holds Cialdini, Kahneman, and—oddly useful—some magic and mentalism books.
Perplexity and Google are fine for web searches, but I’ve got a wealth of information on my own drives that I know to be authentic and valid. Now when I ask about social engineering techniques, Claude searches my books first, cites them properly, then supplements with web research. The sourcing is mine. The context is specific. And I’m not copy-pasting excerpts like it’s 2019.
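The citation step is mostly just reshaping ChromaDB’s query response. A hypothetical sketch, assuming the indexer stored `author`, `title`, and `page` in each chunk’s metadata (the nested-list shape matches what `collection.query` returns: one inner list per query text):

```python
def format_citations(results, snippet_len=80):
    """Turn a ChromaDB-style query response into citation strings
    that pair each retrieved passage with its source book."""
    citations = []
    for doc, meta in zip(results["documents"][0], results["metadatas"][0]):
        snippet = doc[:snippet_len]
        citations.append(
            f'{meta["author"]}, "{meta["title"]}", p. {meta["page"]}: {snippet}'
        )
    return citations
```

The MCP server returns these strings to Claude, which is what lets it cite a specific page instead of vaguely gesturing at a book.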
Was it worth it? The searching works. But really I just enjoyed building it. The vibe coding with Opus 4.5 remains a ‘magical’ experience for someone like me who just wants non-production-level tools that get the job done, even if they’re strung together with the digital equivalent of gaffer tape.
What I’d do differently: start with one category instead of indexing everything at once. Pull more metadata from Calibre — tags, series info, custom columns. Handle scanned PDFs that fail silently. Maybe write some actual tests.
But here’s the real lesson: generic AI knowledge is fine until it isn’t. When you need your references, your sources, your accumulated context—that’s when personal knowledge infrastructure starts to matter.
Most people won’t build this, and I think that’s a good thing. As I see it, this ‘golden age’ of AI lies in what we choose to create with it and make our own. If you’ve got a library gathering digital dust, the pieces are ChromaDB for storage, PyMuPDF for PDF extraction, sentence-transformers for embeddings, and chunks of around 500 characters with some overlap. The MCP server is just a thin wrapper.
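That chunking recipe, as a minimal sketch — the 500/100 sizes are starting points, not tuned values:

```python
def chunk_text(text, size=500, overlap=100):
    """Split text into ~size-character chunks, each sharing its
    first `overlap` characters with the end of the previous chunk,
    so a sentence straddling a boundary survives in one piece."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Splitting on character counts rather than sentences is crude, but the overlap papers over most boundary damage and it works on any language or format.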
The hard part isn’t the technology. It’s deciding which books actually deserve to be in Claude’s context window.
This is far from perfect, but it’s perfectly functional. I’ll keep tinkering.