Reading across books with Claude Code

Dec 21, 2025

I want to explore how LLMs can enrich reading by augmenting the original text, not merely reducing it to a summary.

I gave Claude Code the tools to discover connections in a library of 100 non-fiction books.

It found dozens of trails: thematic sequences of excerpts.
Here’s a part of one such trail, linking deception in the startup world to the social psychology of mass movements:

How it works

Pipeline diagram

The books were selected from Hacker News’ favourites, which I previously scraped and visualized.

Claude browses the books a chunk at a time. A chunk is a segment of roughly 500 words that aligns with paragraphs when possible. This length is a good balance between saving tokens and providing enough context for ideas to breathe.
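The post doesn't show its chunker. Below is a minimal sketch of paragraph-aligned chunking. It assumes paragraphs are separated by blank lines and treats the word count from text.split() as a good enough proxy for length. Whole paragraphs are packed greedily until the next one would push the chunk past roughly 500 words. Only a paragraph that exceeds the target on its own is cut on word boundaries. The function name chunk_paragraphs is hypothetical.

```python
def chunk_paragraphs(text: str, target_words: int = 500) -> list[str]:
    """Greedily pack whole paragraphs into chunks of roughly target_words words."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in paragraphs:
        words = para.split()
        # Fallback: a paragraph longer than the target is cut on word boundaries.
        if len(words) > target_words:
            if current:
                chunks.append("\n\n".join(current))
                current, current_len = [], 0
            for i in range(0, len(words), target_words):
                chunks.append(" ".join(words[i:i + target_words]))
            continue
        # Close the current chunk if this paragraph would overshoot the target.
        if current and current_len + len(words) > target_words:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += len(words)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Keeping paragraphs whole means chunks vary in length. A paragraph is never split unless it has to be, which matches the "aligns with paragraphs when possible" goal.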

Chunks are indexed by topic, and topics are themselves indexed for search. This makes it easy to look up all passages in the corpus that relate to, say, deception.
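Here is a toy version of the two-level index that makes the structure concrete. It assumes topics are plain string labels and uses naive substring matching over them. The real search index presumably does something smarter, but the post doesn't say what. TopicIndex and its methods are hypothetical names.

```python
from collections import defaultdict


class TopicIndex:
    """Hypothetical index: topic label -> chunk ids, plus a search over the labels."""

    def __init__(self) -> None:
        self.chunks_by_topic: dict[str, set[int]] = defaultdict(set)

    def add_chunk(self, chunk_id: int, topics: list[str]) -> None:
        # Index each chunk under every topic extracted from it.
        for topic in topics:
            self.chunks_by_topic[topic.lower()].add(chunk_id)

    def search_topics(self, query: str) -> list[str]:
        # Naive case-insensitive substring match on topic labels (an assumption).
        q = query.lower()
        return sorted(t for t in self.chunks_by_topic if q in t)

    def passages_for(self, query: str) -> set[int]:
        """All chunk ids tagged with any topic whose label matches the query."""
        result: set[int] = set()
        for topic in self.search_topics(query):
            result |= self.chunks_by_topic[topic]
        return result


index = TopicIndex()
index.add_chunk(1, ["Deception", "Business deal"])
index.add_chunk(2, ["Deception", "Theranos"])
index.add_chunk(3, ["Self-deception"])
print(index.passages_for("deception"))  # {1, 2, 3}
```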

This works well when you know what to look for, but search alone can’t tell you which topics are present to begin with. There are over 100,000 extracted topics, far too many to be browsed directly. To support exploration, they are grouped into a hierarchical tree structure.

This yields around 1,000 top-level topics. They emerge from combining lower-level topics, and not all of them are equally useful:

However, this Borgesian taxonomy is good enough for Claude to piece together what the books are about.
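The post doesn't say how the tree is built. One common approach, assumed here purely for illustration, is agglomerative clustering of topic embeddings: repeatedly merge the two closest clusters and keep the merged node as the parent of both. The sketch uses centroid linkage with Euclidean distance and stops once a target number of roots remains. It takes O(n³) time overall, so it only shows the idea. For 100,000 topics you would want a library implementation or approximate nearest-neighbour search. The names and the merged-node labels are placeholders.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    centroid: list[float]
    size: int = 1
    children: list["Node"] = field(default_factory=list)


def build_tree(leaves: dict[str, list[float]], n_top: int) -> list[Node]:
    """Centroid-linkage agglomeration until n_top root nodes remain."""
    nodes = [Node(label, vec) for label, vec in leaves.items()]
    while len(nodes) > n_top:
        # Closest pair of current roots (brute force over all pairs).
        i, j = min(
            ((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))),
            key=lambda p: math.dist(nodes[p[0]].centroid, nodes[p[1]].centroid),
        )
        a, b = nodes[i], nodes[j]
        size = a.size + b.size
        # Size-weighted mean of the two centroids.
        centroid = [(x * a.size + y * b.size) / size for x, y in zip(a.centroid, b.centroid)]
        # Placeholder label; a real system needs a better way to name merged nodes.
        merged = Node(f"{a.label} + {b.label}", centroid, size, [a, b])
        nodes.pop(j)  # j > i, so removing j first keeps index i valid
        nodes.pop(i)
        nodes.append(merged)
    return nodes
```

Each root's subtree is one of the "top-level topics". Because roots are formed only by merging, some of them will inevitably be odd combinations, as the post notes.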

Claude uses the topic tree and the search via a few CLI tools.
They allow it to:

To generate the trails, the agent works in stages.

Generate ideas, research a trail, connect the highlights
  1. First, it scans the library and the existing trails, and proposes novel trail ideas. It mainly browses the topic tree to find unexplored areas and rarely reads full chunks in depth.
  2. Then, it takes a specific idea and turns it into a trail. It receives seed topics from the previous stage and browses many chunks. It extracts excerpts (specific sequences of sentences) and decides how best to order them to support an insight.
  3. Finally, it adds highlights and edges between consecutive excerpts.
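One way to picture what these stages produce is a simple data model, which the post does not specify and which is assumed here. A trail is an ordered list of excerpts, each a contiguous run of sentences from one chunk with optional highlighted phrases. Between each pair of consecutive excerpts sits one edge of connecting text. The class names and the validate check are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Excerpt:
    chunk_id: int
    sentences: list[str]  # contiguous run of sentences taken from one chunk
    highlights: list[str] = field(default_factory=list)  # phrases emphasised in stage 3


@dataclass
class Trail:
    title: str
    excerpts: list[Excerpt] = field(default_factory=list)
    edges: list[str] = field(default_factory=list)  # edges[i] links excerpts[i] and excerpts[i + 1]

    def validate(self) -> None:
        # Stage 3 should add exactly one edge per pair of consecutive excerpts.
        if len(self.edges) != max(len(self.excerpts) - 1, 0):
            raise ValueError("expected one edge between each pair of consecutive excerpts")
        # Highlights must quote text that actually appears in the excerpt.
        for ex in self.excerpts:
            text = " ".join(ex.sentences)
            for h in ex.highlights:
                if h not in text:
                    raise ValueError(f"highlight {h!r} not found in chunk {ex.chunk_id}")
```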

What I learned

Agents really are quite good

My first attempt at this system consisted of LLM modules with carefully hand-assembled contexts.

On a whim, I ran Claude with access to the debugging tools I’d been using and a minimal prompt: “find something interesting.” It immediately did a better job at pulling in what it needed than the pipeline I was trying to tune by hand, while requiring much less orchestration. Pushing as much of the work as possible into the agent’s loop was a clear improvement.

I ended up using Claude as my main interface to the project.
Initially I did so because it inferred the sequence of CLI calls I wanted to run faster than I could recall them. Then, I used it to automate tasks which weren’t rigid enough to be scripted traditionally.

The latter opened up options that I wouldn’t have considered before. For example, I changed my mind on how short I wanted excerpts to be. I communicated my new preference to Claude, which then looked through all the existing trails and edited them as necessary, weighing how each cut changed the overall meaning of the trail. Before, I would likely have considered all existing trails outdated and generated new ones, because the required edits would have been too nuanced to specify.

In general, agents have widened my ambitions.
Because agents take care of the boilerplate, I no longer shy away from the tedious parts. Revision is cheap, so I don’t need to plow ahead with suboptimal choices just because they’d be too costly to undo. This, in turn, keeps up the momentum and lets me focus on the joyful, creative aspects of the work.

Ask the agent what it needs

My focus went from optimising prompts to implementing better tools for Claude to use, moving up a rung on the abstraction ladder.

My mental model of the AI component changed: from a function mapping input to output, to a coworker I was assisting. I spent my time thinking about the affordances that would make the workflow better, as if I were designing them for myself. That they were to be used by an agent was a mere detail.

This worked because the agent is now intelligent enough that the way it uses these tools overlaps with my own mental model. It is generally easy to empathise with it and predict what it will do.

Initially I watched Claude’s logs closely and tried to guess where it was lacking a certain ability. Then I realised I could simply ask it to provide feedback at the end and list the functionality it wished it had. Claude was excellent at proposing new commands and capabilities that would make the work more efficient.

Claude suggested improvements, which Claude implemented, so Claude could do the work better. At least I’m still needed to pay for the tokens — for now.

Novelty is a useful guide

It’s hard to quantify interestingness as an objective to optimise for.
Why Greatness Cannot Be Planned makes the case that chasing novelty is often a more fruitful approach. While its conclusions are debated, I’ve found this idea to be a good fit for this project.

As a sign of the times, this novelty search was implemented in two ways:

  1. by biasing the search algorithm towards under-explored topics and books
  2. by asking Claude nicely

A topic’s novelty score was calculated as the mean distance from its embedding to the embeddings of its k nearest neighbours. A book’s novelty score was the average novelty of the unique topics it contained. These scores were used to rank search results, so that those which were both relevant and novel were more likely to be seen.
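Below is a brute-force sketch of the two scores as described. It assumes Euclidean distance between embeddings, since the post doesn't name the metric. The post also doesn't say how relevance and novelty are combined when ranking. The weighted sum below is an assumption. It rescales novelty to [0, 1] and assumes relevance is already in [0, 1], so the two terms are comparable. All function names are hypothetical.

```python
import math


def topic_novelty(embeddings: dict[str, list[float]], k: int) -> dict[str, float]:
    """Mean distance from each topic to its k nearest neighbours (needs >= 2 topics)."""
    scores = {}
    for name, vec in embeddings.items():
        dists = sorted(math.dist(vec, other) for o, other in embeddings.items() if o != name)
        nearest = dists[:k]
        scores[name] = sum(nearest) / len(nearest)
    return scores


def book_novelty(book_topics: set[str], topic_scores: dict[str, float]) -> float:
    """Average novelty over the unique topics a book contains."""
    return sum(topic_scores[t] for t in book_topics) / len(book_topics)


def rank(results: list[tuple[str, float]], topic_scores: dict[str, float],
         weight: float = 0.5) -> list[tuple[str, float]]:
    """Sort (topic, relevance) pairs by an assumed blend of relevance and novelty."""
    max_novelty = max(topic_scores.values()) or 1.0

    def combined(item: tuple[str, float]) -> float:
        topic, relevance = item
        return (1 - weight) * relevance + weight * topic_scores[topic] / max_novelty

    return sorted(results, key=combined, reverse=True)


scores = topic_novelty({"a": [0, 0], "b": [0, 1], "c": [0, 2], "z": [10, 0]}, k=1)
print(scores["z"])  # 10.0: far from everything, so the most novel
```

With weight=0 this reduces to ranking by relevance alone. Raising the weight lets an isolated but relevant topic overtake a slightly more relevant one from a crowded region.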

On a prompting level, Claude starts the ideation phase by looking at all the existing trails and is asked to avoid any conceptual overlap. This works fairly well, though Claude is often distracted by topics related to secrecy, systems theory, or tacit knowledge.

It’s as if the very act of finding connections in a corpus summons an aspect of Umberto Eco and puts it in a conspiratorial mindset.

How it’s implemented

Implementation diagram
<topics query="deception" count="1">
  <topic id="47193" books="7" score="0.0173" label="Deception">
    <chunk id="186" book="1">
      <topic id="47192" label="Business deal"/>
      <topic id="47108" label="Internal conflict"/>
      <topic id="46623" label="Startup founders"/>
    </chunk>
    <chunk id="1484" book="4">
      <topic id="51835" label="Gawker Media"/>
      <topic id="53006" label="Legal Action"/>
      <topic id="52934" label="Maskirovka"/>
      <topic id="52181" label="Strategy"/>
    </chunk>
    <chunk id="2913" book="9">
      <topic id="59348" label="Blood testing system"/>
      <topic id="59329" label="Elizabeth Holmes"/>
      <topic id="59352" label="Investor demo"/>
      <topic id="59349" label="Theranos"/>
    </chunk>
  </topic>
</topics>
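The output above appears to be what the topic search tool returns for "deception". It shows the matching topic with its book count and score, followed by each chunk that topic tags, together with that chunk's other topics. The post doesn't explain the score attribute or show the code. As a sketch, output of this shape can be produced with Python's standard xml.etree.ElementTree. This assumes results are already held in memory as plain dicts (a hypothetical shape) and that count is the number of matching topics.

```python
import xml.etree.ElementTree as ET


def render_topics(query: str, matches: list[dict]) -> str:
    """Serialise hypothetical in-memory search results into the XML shape shown above."""
    root = ET.Element("topics", query=query, count=str(len(matches)))
    for m in matches:
        topic = ET.SubElement(root, "topic", id=str(m["id"]), books=str(m["books"]),
                              score=f"{m['score']:.4f}", label=m["label"])
        for chunk in m["chunks"]:
            c = ET.SubElement(topic, "chunk", id=str(chunk["id"]), book=str(chunk["book"]))
            # Co-occurring topics give the agent a quick sense of each chunk's context.
            for tid, label in chunk["topics"]:
                ET.SubElement(c, "topic", id=str(tid), label=label)
    return ET.tostring(root, encoding="unicode")


xml_str = render_topics("deception", [{
    "id": 47193, "books": 7, "score": 0.0173, "label": "Deception",
    "chunks": [{"id": 186, "book": 1, "topics": [(47192, "Business deal")]}],
}])
```

A compact, nested format like this lets the agent skim which books and neighbouring topics a concept touches before deciding which chunks to read in full.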