gothitech/GT-REX • Image-Text-to-Text • 3B
Building the Future of Document AI
Gothi Tech LLP is an AI-first company specializing in Document Intelligence and Vision-Language Models. We build production-grade AI systems that help enterprises automate document processing, extract structured data, and unlock insights from unstructured documents at scale.
Our mission is to make document understanding fast, accurate, and accessible for businesses of all sizes.

Models

| Model | Type | Size | Description |
|---|---|---|---|
| GT-REX | Vision-Language (VLM) | ~7B | Production OCR model for enterprise document understanding, text extraction, and structured data output |

Variants

| Variant | Resolution | Speed | Best For |
|---|---|---|---|
| Nano | 640px | ~1-2s | High-volume batch processing (100+ docs/min) |
| Pro | 1024px | ~2-5s | Standard workflows — invoices, contracts, forms |
| Ultra | 1536px | ~5-10s | Fine print, dense tables, legal and medical docs |
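
As a rough illustration of how the table above might feed a preprocessing step, the sketch below stores the three variants as data and downscales page dimensions to a variant's resolution. It assumes that the listed resolution refers to the longer image side, which the table does not state. `Variant`, `VARIANTS`, `fit_to_variant`, and `sequential_docs_per_minute` are hypothetical names for illustration, not part of any GT-REX API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    resolution_px: int              # resolution column from the table
    latency_s: tuple[float, float]  # approximate (min, max) seconds per document


# Values copied from the variants table.
VARIANTS = {
    "nano": Variant("Nano", 640, (1.0, 2.0)),
    "pro": Variant("Pro", 1024, (2.0, 5.0)),
    "ultra": Variant("Ultra", 1536, (5.0, 10.0)),
}


def fit_to_variant(width: int, height: int, variant: Variant) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most variant.resolution_px.

    Assumption: the table's resolution refers to the longer image side.
    Images already within the limit are returned unchanged (no upscaling).
    """
    longest = max(width, height)
    if longest <= variant.resolution_px:
        return width, height
    scale = variant.resolution_px / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def sequential_docs_per_minute(variant: Variant) -> tuple[float, float]:
    """Throughput range (slowest, fastest) when documents are processed one at a time."""
    fastest, slowest = variant.latency_s
    return 60.0 / slowest, 60.0 / fastest
```

For an A4 page scanned at 300 DPI (2480 × 3508 px), `fit_to_variant(2480, 3508, VARIANTS["pro"])` gives `(724, 1024)`. One-at-a-time processing at 1-2 s per document works out to 30-60 docs/min, so the 100+ docs/min figure for Nano presumably relies on batching or parallel requests.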

Capabilities

| Area | Capabilities |
|---|---|
| Document AI | Intelligent Document Processing (IDP), automated data extraction |
| OCR | High-accuracy text extraction from printed and scanned documents |
| Handwriting | Recognition and transcription of handwritten text |
| Structured Output | Extract data as JSON, Markdown tables, key-value pairs, custom schemas |
| Multi-Language | Document understanding across multiple languages |
| Table Extraction | Accurate extraction of complex and nested table structures |
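
To make the structured-output row more concrete, here is a minimal sketch of requesting JSON and validating the reply on the client side. The prompt wording and key names are hypothetical, and the sketch assumes the model may wrap its JSON in a Markdown code fence; GT-REX's actual output format is not documented here. `parse_json_output` and `missing_keys` are illustrative helpers, not part of the model's API.

```python
import json
import re

# Hypothetical prompt; the wording GT-REX responds to best is not documented here.
INVOICE_PROMPT = (
    "Extract the invoice number, date, and total from this document. "
    'Respond with JSON only, using the keys "invoice_number", "date", "total".'
)

_TICKS = "`" * 3  # Markdown code-fence delimiter
_FENCE = re.compile(_TICKS + r"(?:json)?\s*(.*?)\s*" + _TICKS, re.DOTALL)


def parse_json_output(text: str) -> dict:
    """Parse a JSON object from model output, tolerating a Markdown code fence."""
    match = _FENCE.search(text)
    payload = match.group(1) if match else text.strip()
    data = json.loads(payload)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


def missing_keys(data: dict, required: tuple[str, ...]) -> list[str]:
    """Return the required keys that are absent from the parsed output, in order."""
    return [key for key in required if key not in data]
```

Checking for missing keys before passing the result downstream keeps a partial extraction from being mistaken for a complete one.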

Industries

| Industry | Use Cases |
|---|---|
| 🏦 Finance | Invoice processing, receipt scanning, bank statements |
| ⚖️ Legal | Contract analysis, clause extraction, legal filings |
| 🏥 Healthcare | Medical records, prescriptions, lab reports |
| 🏛️ Government | Form processing, ID verification, tax documents |
| 🛡️ Insurance | Claims processing, policy documents |
| 📦 Logistics | Shipping labels, waybills, packing lists |

Quick Start

```python
from vllm import LLM, SamplingParams
from PIL import Image

# Load GT-REX through vLLM; trust_remote_code allows the repository's custom model code to run.
llm = LLM(
    model="gothitech/GT-REX",
    trust_remote_code=True,
    max_model_len=4096,
    gpu_memory_utilization=0.75,
)

image = Image.open("document.png")

# temperature=0.0 selects greedy decoding, so repeated runs give the same extraction.
outputs = llm.generate(
    [{"prompt": "Extract all text from this document.", "multi_modal_data": {"image": image}}],
    sampling_params=SamplingParams(temperature=0.0, max_tokens=4096),
)
print(outputs[0].outputs[0].text)
```
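
The single-image call above extends naturally to several pages, since `llm.generate` accepts a list of requests. The sketch below only builds that list and collects the results, so it runs without a GPU; `build_requests` and `collect_texts` are hypothetical helper names, and the request shape is simply the one used in the snippet above.

```python
from typing import Any, Iterable


def build_requests(images: Iterable[Any], prompt: str) -> list[dict]:
    """Build one request dict per image, all sharing the same prompt."""
    return [
        {"prompt": prompt, "multi_modal_data": {"image": image}}
        for image in images
    ]


def collect_texts(outputs: Iterable[Any]) -> list[str]:
    """Take the first completion's text from each result, as the snippet above does."""
    return [result.outputs[0].text for result in outputs]


# Usage with the objects from the snippet above (file names are placeholders):
# pages = [Image.open(path) for path in ("page1.png", "page2.png")]
# outputs = llm.generate(
#     build_requests(pages, "Extract all text from this document."),
#     sampling_params=SamplingParams(temperature=0.0, max_tokens=4096),
# )
# texts = collect_texts(outputs)
```

Submitting pages together lets vLLM schedule them as a batch, which is likely the mode the high-volume use case has in mind.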

Links

| Resource | Link |
|---|---|
| 🌐 Website | gothi.in |
| 🤗 HuggingFace | gothitech |
| 🦖 GT-REX Model | gothitech/GT-REX |
🇮🇳 Proudly built in India by Gothi Tech LLP