Vision, PDF & File Input

Intermediate

Claude is multimodal: you can include images and documents (like PDFs) in a message and ask questions about them — extract data, describe a chart, summarize a contract, read a screenshot.

Three ways to send a file

  1. Base64 — encode the bytes inline in the request. Simplest for one-offs.
  2. URL — point to a hosted file.
  3. Files API — upload once, then reference it by file_id across many requests (avoids re-uploading large files).
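# Conceptual: an image content block alongside text (see docs for exact fields)
import base64

# "chart.png" is a placeholder path; substitute your own image file
with open("chart.png", "rb") as f:
    b64 = base64.standard_b64encode(f.read()).decode("utf-8")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": b64}},
        {"type": "text", "text": "What does this chart show? Give the top 3 takeaways."},
    ],
}]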
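
The three delivery options listed above differ mainly in the `source` object of the content block. The sketch below builds one block per option. The base64 shape matches the example above; the `url` and `file` source shapes and the helper names are assumptions for illustration, so confirm the exact fields in the API reference (the Files API may also require an extra beta header).

```python
import base64


def image_block_base64(data: bytes, media_type: str = "image/png") -> dict:
    """Inline the image bytes as base64 text."""
    encoded = base64.standard_b64encode(data).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": encoded},
    }


def image_block_url(url: str) -> dict:
    """Reference an image hosted at a publicly reachable URL (assumed shape)."""
    return {"type": "image", "source": {"type": "url", "url": url}}


def image_block_file(file_id: str) -> dict:
    """Reference a file uploaded earlier through the Files API (assumed shape)."""
    return {"type": "image", "source": {"type": "file", "file_id": file_id}}
```

Any of the three blocks can take the place of the image entry in the `content` list; the text block that carries the question stays the same.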

PDFs work similarly, using a document content block instead of an image block. Claude can read both the text and the visual layout, and you can request citations that point to the passages in the document an answer came from.
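
A PDF request might be assembled as in the sketch below. The `document` block with a base64 source mirrors the image example; the `citations` field and the helper names `pdf_block` and `build_messages` are assumptions for illustration, so check the API reference for the exact fields.

```python
import base64


def pdf_block(pdf_bytes: bytes, cite: bool = True) -> dict:
    """Build a document content block from raw PDF bytes."""
    encoded = base64.standard_b64encode(pdf_bytes).decode("utf-8")
    return {
        "type": "document",
        "source": {"type": "base64", "media_type": "application/pdf", "data": encoded},
        # Assumed field: asks for answers that cite passages in the document
        "citations": {"enabled": cite},
    }


def build_messages(pdf_bytes: bytes, question: str) -> list:
    """Pair the document block with a text question in one user message."""
    return [{
        "role": "user",
        "content": [pdf_block(pdf_bytes), {"type": "text", "text": question}],
    }]
```

Putting the document before the question keeps the message in the same order as the image example: the file first, then the instruction about it.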

What it's great for

  • Extraction — pull structured data from invoices, forms, tables (pair with Structured Output).
  • Understanding visuals — charts, diagrams, screenshots, UI.
  • Document Q&A — ask questions across a long PDF with citations.
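
For the extraction case, one common pattern is to ask for a JSON reply and parse it into a typed record so that missing or malformed fields fail loudly. The sketch below is a minimal version under stated assumptions: the `Invoice` fields, the prompt wording, and `parse_invoice` are hypothetical, and it assumes the model's reply text contains only the JSON object.

```python
import json
from dataclasses import dataclass


@dataclass
class Invoice:
    vendor: str
    total: float
    currency: str


# Hypothetical prompt to send alongside the invoice image or PDF block
EXTRACTION_PROMPT = (
    "Extract the vendor, total, and currency from this invoice. "
    'Reply with only a JSON object like {"vendor": "...", "total": 0.0, "currency": "..."}.'
)


def parse_invoice(reply_text: str) -> Invoice:
    """Parse the JSON reply; raises KeyError or ValueError on bad input."""
    raw = json.loads(reply_text)
    return Invoice(
        vendor=str(raw["vendor"]),
        total=float(raw["total"]),
        currency=str(raw["currency"]),
    )
```

Failing on a missing key is deliberate here: a silent default would hide exactly the extraction errors worth catching.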

Tips

  • Reuse with file_id when sending the same big document repeatedly: you skip re-uploading the bytes on every request, which saves bandwidth and upload time.
  • Resolution and file size matter: very large images may be downscaled before the model sees them, so check the documented size and dimension limits before sending.
  • Verify extracted numbers — vision is strong but not infallible; spot-check critical figures (Hallucinations).
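
One cheap way to spot-check extracted figures is an arithmetic consistency test: if the line items do not add up to the stated total, flag the document for human review. The sketch below assumes the amounts were extracted as decimal strings; the function name `totals_agree` and the one-cent tolerance are illustrative choices, not part of any API.

```python
from decimal import Decimal


def totals_agree(line_items: list[str], stated_total: str, tolerance: str = "0.01") -> bool:
    """Return True when the extracted line items sum to the extracted total."""
    # Decimal avoids the rounding drift that float arithmetic adds to currency sums
    computed = sum((Decimal(item) for item in line_items), Decimal("0"))
    return abs(computed - Decimal(stated_total)) <= Decimal(tolerance)


# Usage: a transposed digit in the total ("52.00" instead of "25.00") is caught
needs_review = not totals_agree(["19.99", "5.01"], "52.00")
```

A passing check does not prove the extraction is right, since two errors can cancel out, but a failing one is a reliable signal to look at the source document.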
