[AI Summary]: Chandra is a highly accurate OCR model that converts images and PDFs into structured HTML/Markdown/JSON while preserving detailed layout information. It handles handwriting well, reconstructs forms accurately (including checkboxes), and supports tables, mathematical formulas, and complex layouts. It also extracts images and diagrams along with their captions and structured data, and supports more than 40 languages. Inference runs in one of two modes: locally through HuggingFace or remotely through a vLLM server.
- Developer: Datalab
- License: Apache 2.0
- Platform: GitHub
- Languages: 40+ supported