One Vision-Language-Action Model for GUI Agent
Qinghong (Kevin) Lin
KevinQHLin
AI & ML interests
Vision-Language Model, Video Understanding, Human-AI Interaction
Recent Activity
liked a dataset 1 day ago
ServiceNow/VideoCUA
upvoted an article about 1 month ago
When Vision Meets Code
authored a paper about 1 month ago
Learning Video Context as Interleaved Multimodal Sequences