Vision, PDF & File Input
Claude is multimodal: you can include images and documents (like PDFs) in a message and ask questions about them — extract data, describe a chart, summarize a contract, read a screenshot.
Three ways to send a file
- Base64 — encode the bytes inline in the request. Simplest for one-offs.
- URL — point to a hosted file.
- Files API — upload once, then reference it by file_id across many requests (avoids re-uploading large files).
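# Conceptual: an image content block alongside text (see docs for exact fields)
import base64

# Read the image and base64-encode it for inline transport ("chart.png" is a placeholder path)
with open("chart.png", "rb") as f:
    b64 = base64.standard_b64encode(f.read()).decode("utf-8")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": b64}},
        {"type": "text", "text": "What does this chart show? Give the top 3 takeaways."},
    ],
}]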
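The other two delivery methods change only the source object inside the image block. The sketch below builds all three variants as plain dictionaries so the difference is easy to see. It assumes the source shapes {"type": "url", ...} and {"type": "file", "file_id": ...}; the helper names, the example URL, and the file ID are hypothetical, and the Files API may require an extra beta header, so check the current docs for exact fields.

```python
import base64


def image_from_base64(raw_bytes: bytes, media_type: str = "image/png") -> dict:
    """Image block with the bytes encoded inline in the request."""
    data = base64.standard_b64encode(raw_bytes).decode("utf-8")
    return {"type": "image", "source": {"type": "base64", "media_type": media_type, "data": data}}


def image_from_url(url: str) -> dict:
    """Image block that points at a hosted file (assumed source shape)."""
    return {"type": "image", "source": {"type": "url", "url": url}}


def image_from_file_id(file_id: str) -> dict:
    """Image block that references a previously uploaded file (assumed source shape)."""
    return {"type": "image", "source": {"type": "file", "file_id": file_id}}


def user_message(block: dict, question: str) -> dict:
    """Wrap one file block and one text question into a user message."""
    return {"role": "user", "content": [block, {"type": "text", "text": question}]}


# Hypothetical URL, used only to show the resulting message structure.
msg = user_message(image_from_url("https://example.com/chart.png"), "What does this chart show?")
```

Keeping the block builders separate from the message wrapper means the question and the rest of the request stay the same when you switch from inline bytes to a URL or a file_id.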
PDFs work similarly, using a document content block. Claude reads both the text and the visual layout, and you can request citations that point back to the part of the document an answer came from.
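A document block for a PDF might look like the sketch below. It assumes a base64 source with media type application/pdf and a citations field that switches citations on; the helper name pdf_block, the title value, and the placeholder bytes are hypothetical, so treat the field names as something to confirm against the docs.

```python
import base64


def pdf_block(pdf_bytes: bytes, title: str, cite: bool = True) -> dict:
    """Build a document content block for an inline PDF (assumed field names)."""
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.standard_b64encode(pdf_bytes).decode("utf-8"),
        },
        "title": title,
        # Assumption: citations are requested per document block.
        "citations": {"enabled": cite},
    }


# Placeholder bytes stand in for a real PDF read from disk.
block = pdf_block(b"%PDF-1.4 placeholder", title="Service contract")
messages = [{
    "role": "user",
    "content": [block, {"type": "text", "text": "What is the termination notice period?"}],
}]
```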
What it's great for
- Extraction — pull structured data from invoices, forms, tables (pair with Structured Output).
- Understanding visuals — charts, diagrams, screenshots, UI.
- Document Q&A — ask questions across a long PDF with citations.
Tips
- Reuse with file_id when sending the same big document repeatedly — cheaper and faster.
- Resolution/size matter — very large images may be downscaled; check limits.
- Verify extracted numbers — vision is strong but not infallible; spot-check critical figures (Hallucinations).
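Part of the spot-check can be automated when a document carries its own redundancy, for example an invoice whose line items should add up to the stated total. The sketch below is a minimal version of that idea; the field names (line_items, amount, total) are a hypothetical extraction schema, not something the API defines.

```python
from decimal import Decimal


def totals_match(invoice: dict, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Return True if the extracted line items sum to the extracted total."""
    line_sum = sum(
        (Decimal(str(item["amount"])) for item in invoice["line_items"]),
        Decimal("0"),
    )
    return abs(line_sum - Decimal(str(invoice["total"]))) <= tolerance


# Hypothetical extraction result, as it might come back from a structured-output call.
extracted = {
    "line_items": [{"amount": "19.99"}, {"amount": "5.01"}],
    "total": "25.00",
}

# Flag the document for human review when the figures disagree.
needs_review = not totals_match(extracted)
```

A passing check does not prove every figure is right, but a failing one is a cheap signal that a number was misread and the document deserves a manual look.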
Next
- Structured Output
- Generating Real Files (docx/pptx/xlsx/pdf)
- Tokens & Pricing — images cost tokens too