Some API vendors give you an API doc in a giant custom-edited PDF file. In my case it’s >1200 pages, with a “helpful” table of contents that itself spans about 20 pages.
Well, I dislike reading giant PDF docs, I love writing Ruby, there's an awesome RubyLLM gem, and Gemini supports PDF parsing. So maybe I can just throw together a quick CLI tool that answers questions for me? Alas, Gemini is limited to 1,000 pages. And even if it weren't, sending the entire doc with every question would probably be wasteful. RubyLLM supports tools, though, so maybe I can do something clever with that.
Let’s Read PDF Text Locally
My doc is mostly text, and there aren't any pictures in it that I care about, so this part is easy. After a quick search I found a gem called pdf-reader. Perfect. I'll start with a little tool.
bin/ask_api_doc
```ruby
#!/usr/bin/env ruby
require 'ruby_llm'
require 'pdf-reader'

class PdfPageReader < RubyLLM::Tool
  DOC = PDF::Reader.new('docs/big-doc.pdf')

  description 'Read the text of any set of pages from the doc.'
  param :page_numbers,
    desc: 'Comma-separated page numbers (first page: 1). (e.g. "12, 14, 15")'

  def execute(page_numbers:)
    puts "\n-- Reading pages: #{page_numbers}\n\n"

    page_numbers = page_numbers.split(',').map { _1.strip.to_i }
    pages = page_numbers.map { |num|
      # Out-of-range numbers (including 0) map to nil instead of a wrong page.
      [num, (DOC.page(num) if num.between?(1, DOC.page_count))]
    }

    {
      pages: pages.map { |num, page|
        # There are lines drawn with dots in my doc.
        # So I squeeze them to save tokens.
        { page: num, text: page&.text&.squeeze('.') }
      }
    }
  rescue => e
    { error: e.message }
  end
end
```
Now my LLM can use the tool to extract text from any page.
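The table of contents alone is pages 31-49, so every lookup there means the model spelling out 19 separate numbers. If that gets tedious, here's a minimal sketch of a more forgiving parser for the `page_numbers` string. `parse_page_spec` is a hypothetical helper, not part of RubyLLM or pdf-reader, and it assumes you'd also update the param's `desc` so the model knows ranges like "31-49" are allowed.

```ruby
# Hypothetical helper (not part of any gem): expands a page spec such as
# "12, 14-16" into [12, 14, 15, 16]. Page numbers are 1-based; anything
# outside 1..page_count is dropped, and duplicates are removed.
def parse_page_spec(spec, page_count)
  spec.split(',').flat_map { |part|
    first, last = part.split('-', 2).map { |s| s.strip.to_i }
    # A bare number is one page; "a-b" is every page from a to b, capped at
    # page_count so a typo like "40-99999" doesn't build a huge array.
    last.nil? ? [first] : (first..[last, page_count].min).to_a
  }.select { |n| n.between?(1, page_count) }.uniq
end

parse_page_spec("12, 14-16", 1249)  # => [12, 14, 15, 16]
parse_page_spec("0, 5, 2000", 1249) # => [5]
```

Inside `execute`, `parse_page_spec(page_numbers, DOC.page_count)` could stand in for the `split`/`map` line, which also makes the bounds check there unnecessary.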
And We’re Basically Done
Unlike “draw the rest of the owl”, the rest of the code really is straightforward (it goes right after the class above):
```ruby
# Grab the key from my 1Password. Backticks keep the trailing newline, so strip it.
GEMINI_API_KEY = `op read "op://Private/Google Gemini API Personal/credential"`.strip

RubyLLM.configure do |config|
  config.gemini_api_key = GEMINI_API_KEY
end

chat =
  RubyLLM
    .chat(model: 'gemini-2.5-pro-preview-03-25') # Pick a model.
    .with_tool(PdfPageReader.new)                # Add the tool.
    .with_instructions(<<~TEXT)                  # Add general instructions.
      Use the provided tool to find the requested info in the multi-page doc.
      Ask for multiple pages at a time to avoid round trips.

      Respond only with the results of your findings. Don't do ASCII tables,
      I prefer text and bullet points.

      To find info, use the table of contents. Make sure you scan the full table
      of contents before you give up. Don't go to irrelevant parts of the doc
      unless absolutely needed.

      Total number of pages: 1249
      Table of contents is on pages: 31-49
    TEXT

response = chat.ask(ARGV.join(' ')) { |chunk|
  print chunk.content
}

# Some stats at the end
puts "\n\n-----------\n"
puts "Input tokens: #{response.input_tokens}"
puts "Output tokens: #{response.output_tokens}"
puts "Total tokens: #{response.input_tokens.to_i + response.output_tokens.to_i}"
```
That’s it.
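One thing worth hardening: backticks hand back the command's raw stdout (trailing newline included) and don't raise when the command exits with an error, say because 1Password is locked, so a failed `op read` can leave you with an empty key and a confusing API error later. Here's a minimal sketch using Ruby's standard `Open3.capture2`; `read_secret` is a hypothetical helper name, and it assumes `op` is on your PATH.

```ruby
require 'open3'

# Hypothetical helper: runs a command (no shell involved) and returns its
# stdout with surrounding whitespace, including the trailing newline, stripped.
# Raises if the command exits unsuccessfully instead of returning an empty value.
def read_secret(*cmd)
  out, status = Open3.capture2(*cmd)
  raise "`#{cmd.join(' ')}` failed (exit #{status.exitstatus})" unless status.success?
  out.strip
end

# Usage in bin/ask_api_doc:
# GEMINI_API_KEY = read_secret('op', 'read', 'op://Private/Google Gemini API Personal/credential')
```

Passing the arguments separately also sidesteps shell quoting around the spaces in the 1Password item name.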
Now I can ask a question and sit back, watching the LLM scan the table of contents, read the relevant pages, and spit out a tailored response. Pretty nice!
(Below is just sample output, not what’s really in my doc.)
```
❯ bin/ask_api_doc "what are all available statuses?"

-- Reading pages: 31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49

-- Reading pages: 1123

The available statuses are:

- `ACTIVE`: The default status for a new object.
- `INACTIVE`: The object is inactive and cannot be used.
- `PENDING`: The object is pending approval or activation.
- `ARCHIVED`: The object has been archived and is no longer active.
- `DELETED`: The object has been deleted and cannot be recovered.
- `SUSPENDED`: The object has been suspended and cannot be used.
- `EXPIRED`: The object has expired and is no longer valid.

-----------
Input tokens: 95288
Output tokens: 643
Total tokens: 95931
```
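Keep in mind those stats come from `response`, which is only the last request. Each tool round trip (here, the TOC read and then page 1123) is a separate API call that resends the conversation so far, so a single question costs more than the final numbers show. Here's a minimal sketch that totals usage over the whole conversation, assuming each message in `chat.messages` exposes `input_tokens` and `output_tokens`, with nil for messages that didn't come back from the API (my question, the tool results). That's how RubyLLM's message objects look as far as I can tell; `total_usage` is a hypothetical helper name.

```ruby
# Hypothetical helper: sums token usage over every message in a conversation.
# Messages that didn't come from an API call report nil, which to_i turns into 0.
def total_usage(messages)
  input  = messages.sum { |m| m.input_tokens.to_i }
  output = messages.sum { |m| m.output_tokens.to_i }
  { input: input, output: output, total: input + output }
end

# In bin/ask_api_doc, after chat.ask:
#   usage = total_usage(chat.messages)
#   puts "Input tokens (all calls): #{usage[:input]}"
#   puts "Output tokens (all calls): #{usage[:output]}"
```

Since the helper only relies on those two methods, it doesn't need to know anything about RubyLLM's classes.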
I bet there are more involved “talk to your docs” solutions out there, but this was quick and easy, and I can tweak it as needed.