JTPA2.5-VL-3B-Phone-Agent-VI
A lightweight Vietnamese-optimized vision-language model for Android phone automation. This model understands Vietnamese UI interactions and generates structured action sequences for mobile device control.
Model Information
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Framework: LLaMA-Factory
- Language: Vietnamese (vi) & English (en)
- License: Apache 2.0
Capabilities
This model excels at:
- 📱 Phone GUI Understanding: Analyzes Android screenshots and identifies UI elements
- 🗣️ Vietnamese Instructions: Processes natural language commands in Vietnamese
- ⚡ Action Generation: Produces structured actions (tap, type, swipe, back, home, wait)
- 🔄 Multi-step Workflows: Handles complex sequential tasks across multiple screens
Example Usage
Input: Screenshot + Vietnamese instruction
Output: Structured action in JSON format

    {
      "action": "tap",
      "x": 540,
      "y": 1200,
      "reason": "Click on the search button"
    }
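Before acting on a reply, it can help to parse and validate it. The following is a minimal sketch, assuming the model returns one JSON object that may be surrounded by extra text. The helper name `parse_action` is hypothetical and not part of the model or any library. The set of action names is taken from the Supported Actions row of this card.

```python
import json

# Action names as listed in this model card.
SUPPORTED_ACTIONS = {"tap", "type", "swipe", "back", "home", "wait"}


def parse_action(raw: str) -> dict:
    """Parse the outermost {...} span of a model reply and validate it."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    action = json.loads(raw[start:end + 1])

    name = action.get("action")
    if name not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {name!r}")
    if name == "tap":
        # The example output in this card shows numeric "x" and "y" for a tap.
        for key in ("x", "y"):
            if not isinstance(action.get(key), (int, float)):
                raise ValueError(f"tap action needs a numeric {key!r}")
    return action
```

Rejecting unknown action names early keeps a malformed reply from reaching the device.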
Quick Start
Deploy with vLLM
python -m vllm.entrypoints.openai.api_server --model johnnietien/JTPA2.5-VL-3B-phone-agent-vi --trust-remote-code --port 8000
Query the Model
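    import base64

    import requests

    def encode_image(path):
        # Read the screenshot and return it as a base64 string for the data URL.
        with open(path, "rb") as f:
            return base64.b64encode(f.read()).decode("utf-8")

    payload = {
        # vLLM serves the model under the name passed to --model
        # unless --served-model-name is set.
        "model": "johnnietien/JTPA2.5-VL-3B-phone-agent-vi",
        "messages": [{
            "role": "user",
            "content": [
                # Chat Completions content parts use the types "text" and "image_url".
                # Instruction in Vietnamese: "Open the MoMo app".
                {"type": "text", "text": "Mở ứng dụng MoMo"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{encode_image('screenshot.png')}"}},
            ],
        }],
        "max_tokens": 256,
    }

    response = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
    response.raise_for_status()
    # "choices" is a list; the reply is in its first element.
    print(response.json()["choices"][0]["message"]["content"])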
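For multi-step workflows, the single request above is typically repeated: capture a screenshot, ask the model for the next action, execute it, and continue. The loop below is a sketch under stated assumptions, not the author's implementation. The names `run_task`, `capture`, `query`, and `execute` are hypothetical, and the stop condition (`query` returning `None`) is an assumption, since this card does not define a completion signal.

```python
from typing import Callable, Optional


def run_task(
    instruction: str,
    capture: Callable[[], bytes],
    query: Callable[[str, bytes], Optional[dict]],
    execute: Callable[[dict], None],
    max_steps: int = 10,
) -> list:
    """Repeat screenshot -> model -> action until `query` returns None
    or the step budget is exhausted. Returns the executed actions."""
    history = []
    for _ in range(max_steps):
        screenshot = capture()
        action = query(instruction, screenshot)
        # Assumption: the caller's `query` returns None when the task is done.
        if action is None:
            break
        execute(action)
        history.append(action)
    return history
```

Passing the three steps in as callables keeps the loop independent of how screenshots are taken or how actions reach the device, and the `max_steps` cap prevents an endless loop if the model never signals completion.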
Integration
Perfect for:
- Open-phone agent framework
- Mobile application testing & QA automation
- Vietnamese smartphone accessibility tools
- RPA platforms for mobile app workflows
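For testing and automation setups that drive a device over adb, each action could be translated into an `adb shell input` command. This is a minimal sketch: the function name `action_to_adb` is hypothetical, and the field names `text`, `x1`, `y1`, `x2`, and `y2` are assumptions, because this card only shows the fields of a `tap` action.

```python
def action_to_adb(action: dict) -> list:
    """Translate one action dict into an adb argument list.

    Returns an empty list for "wait", which needs no device command.
    """
    name = action["action"]
    base = ["adb", "shell", "input"]
    if name == "tap":
        return base + ["tap", str(action["x"]), str(action["y"])]
    if name == "type":
        # `adb shell input text` expects spaces encoded as %s.
        # Assumed field name: "text".
        return base + ["text", action["text"].replace(" ", "%s")]
    if name == "swipe":
        # Assumed field names: "x1", "y1", "x2", "y2".
        return base + ["swipe"] + [str(action[k]) for k in ("x1", "y1", "x2", "y2")]
    if name == "back":
        return base + ["keyevent", "KEYCODE_BACK"]
    if name == "home":
        return base + ["keyevent", "KEYCODE_HOME"]
    if name == "wait":
        return []  # nothing to send; the caller pauses instead
    raise ValueError(f"unsupported action: {name!r}")
```

A non-empty result can be passed to `subprocess.run(..., check=True)`. Note that `input text` may not handle non-ASCII characters such as Vietnamese diacritics, so typed text may need a different input path on some devices.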
Performance Specifications
| Aspect | Details |
|---|---|
| Model Size | 3B parameters |
| Context Length | 4K tokens |
| Inference Speed | ~0.5-1.0 sec/action (T4 GPU) |
| VRAM Requirement | 6-8 GB (BF16), 4-6 GB (quantized) |
| Supported Actions | tap, type, swipe, back, home, wait |
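As a rough sanity check on the VRAM row, the memory taken by the weights alone can be estimated from the parameter count and the bits per parameter. This sketch assumes exactly 3 billion parameters (the nominal size) and covers weights only. Activations, the KV cache, and runtime overhead come on top, which presumably accounts for the rest of the figures in the table. The quantization bit widths are assumptions, since the table does not name a quantization scheme.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


bf16_gb = weight_memory_gb(3e9, 16)  # 2 bytes per parameter
int8_gb = weight_memory_gb(3e9, 8)   # assumed 8-bit quantization
int4_gb = weight_memory_gb(3e9, 4)   # assumed 4-bit quantization
print(bf16_gb, int8_gb, int4_gb)     # prints 6.0 3.0 1.5
```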
Use Responsibly
✅ Intended for:
- Personal device automation
- Mobile app testing
- Accessibility solutions
- Research purposes
❌ Not intended for:
- Unauthorized account access
- Bypassing security mechanisms
- Unauthorized data collection
Citation
    @misc{jtpa2024phone,
      author = {Johnnie Tien},
      title = {JTPA2.5-VL-3B-Phone-Agent-VI: Vietnamese Mobile GUI Agent},
      year = {2024},
      howpublished = {\url{https://huggingface.co/johnnietien/JTPA2.5-VL-3B-phone-agent-vi}}
    }
Version: 1.0 | Status: Production-Ready | Last Updated: December 2024