OpenGVLab

community

https://github.com/opengvlab

AI & ML interests

Computer Vision

Recent Activity

ganlinyang updated a collection about 4 hours ago

heroding77 authored a paper 7 days ago

TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents

Papers

InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision

VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs

ganlinyang

updated a collection about 4 hours ago

Vlaser

Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning • 6 items • Updated about 4 hours ago • 4

heroding77

authored 2 papers 7 days ago

TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents

Paper • 2602.02196 • Published 14 days ago • 33

OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

Paper • 2602.05843 • Published 11 days ago • 57

prithivMLmods

posted an update 7 days ago

Post

2836

Introducing FLUX.2-Klein-LoRA-Studio, a demo for image editing using specialized LoRA adapters built for the FLUX.2-Klein-Distilled model. It features an edit-style gallery for multi-style image editing, including de-light, face swap, mannequin, and more. Try the demo below.

🤗Demo: prithivMLmods/FLUX.2-Klein-LoRA-Studio
🤗Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
🤗GitHub: https://github.com/PRITHIVSAKTHIUR/FLUX.2-Klein-LoRA-Studio

To learn more, visit the app page or the respective model pages.

Xrenya

in OpenGVLab/InternVideo2-Stage2_1B-224p-f4 10 days ago

Error when using model

#2 opened 10 days ago by

yangxue

submitted a paper to Daily Papers 10 days ago

RISE-Video: Can Video Generators Decode Implicit World Rules?

Paper • 2602.05986 • Published 11 days ago • 26

prithivMLmods

posted an update 11 days ago

Post

827

GLM OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. It delivers high accuracy and strong generalization with a blazing-fast inference pipeline. The demo is live. Try it now. 🤗🚀

✨ Demo: prithivMLmods/GLM-OCR-Demo
✨ Multimodal Implementations: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨ GitHub: https://github.com/PRITHIVSAKTHIUR/GLM-OCR-Demo

Xrenya

in OpenGVLab/InternVL2_5-1B 12 days ago

Unable to load model on Google Colab notebook

#8 opened 12 days ago by

prithivMLmods

posted an update 12 days ago

Post

2137

Introducing the Qwen-Image-Edit-3D-Lighting-Control app, featuring 8× horizontal and 3× elevational lighting positions for precise 3D lighting control. It enables studio-level lighting using fast Qwen Image Edit inference, paired with Multi-Angle-Lighting adapters. 🔦

🔥 Space: prithivMLmods/Qwen-Image-Edit-3D-Lighting-Control
✅ Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
📂 GitHub: https://github.com/PRITHIVSAKTHIUR/Qwen-Image-Edit-3D-Lighting-Control

ynhe

in OpenGVLab/VideoChat-Flash-Qwen2_5-7B_InternVideo2-1B 17 days ago

Enable on CPU

#1 opened 18 days ago by

prithivMLmods

posted an update 18 days ago

Post

3624

Daggr UI version of the Qwen3-TTS demo, with nodes for custom voice, voice design, qwen3-asr, and voice cloning. 🔥
No remote Spaces are used for API inference; all functions run in-app.
Powered by t4-m and built with daggr@0.5.2 and gradio@6.

👉Demo: prithivMLmods/Qwen3-TTS-Daggr-UI
⭐Github: https://github.com/PRITHIVSAKTHIUR/Qwen3-TTS-Daggr-UI

kpzhang996

submitted a paper to Daily Papers 19 days ago

World Craft: Agentic Framework to Create Visualizable Worlds via Text

Paper • 2601.09150 • Published Jan 14 • 20

prithivMLmods

posted an update 20 days ago

Post

2682

Qwen-Image-Edit-Object-Manipulator Space is now featured in Hugging Face Space of the Week. It enables object manipulation such as extracting objects, adding designs, and removing objects or designs from the red highlighted area using specialized adapters.

🔥Do enjoy the demo! ~ prithivMLmods/Qwen-Image-Edit-Object-Manipulator

Collections:
🧨Adapters-1: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-exps
🧨Adapters-2: https://huggingface.co/collections/prithivMLmods/qie-jan-23-26
🧨Adapters-3: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-object-manipulator

⭐Github: https://github.com/PRITHIVSAKTHIUR/Qwen-Image-Edit-Object-Manipulator

To learn more, visit the app page or the respective model pages.

kpzhang996

submitted a paper to Daily Papers 21 days ago

MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences

Paper • 2601.07251 • Published Jan 12 • 11

prithivMLmods

posted an update 24 days ago

Post

3039

Introducing QIE-2511-Zoom-Master for highlight-guided area zoom-in, enabling lossless zooming within a drawn square area, and QIE-2511-Object-Remover-v2 for precise object or highlight-guided area cleanup. These experimental adapters are trained on top of QIE-2511. Find the adapters below.

🕹️QIE-2511-Zoom-Master : prithivMLmods/QIE-2511-Zoom-Master
🕹️QIE-2511-Object-Remover-v2: prithivMLmods/QIE-2511-Object-Remover-v2

🤗Demo: prithivMLmods/Qwen-Image-Edit-Object-Manipulator

📂Collection: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-exps

To learn more, visit the app page or the respective model pages.

Eurayka

authored a paper 27 days ago

LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning

Paper • 2601.10129 • Published Jan 15 • 11

ownerEli

authored a paper about 1 month ago

STEP3-VL-10B Technical Report

Paper • 2601.09668 • Published Jan 14 • 193

Eurayka

submitted a paper to Daily Papers about 1 month ago

LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning

Paper • 2601.10129 • Published Jan 15 • 11

YYangzzzz

authored a paper about 1 month ago

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Paper • 2601.07779 • Published Jan 12 • 28