| /share/liangzy/miniconda3/envs/vllm/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml |
| warnings.warn( |
| Total Video Size: 31436 |
|
0%| | 0/31436 [00:00<?, ?it/s]
100%|ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ| 31436/31436 [00:00<00:00, 847983.72it/s] |
| Total Clips Size: 37658 |
| Start: 0, End: 10000 |
| to process size: 10000 |
| Total size: 10000 |
| Sample show: <|im_start|>system |
| You are an AI assistant tasked with generating **high-quality object recognition questions and answers** based on a given video. |
|
|
| ## TASK: |
| Your task is to generate **one** high-quality **object recognition question** that requires identifying visible objects, people, or immediate scene attributes from the video. |
|
|
| You must also provide **4 answer options (AβD)**, with only one correct answer, which must be directly observable in the visual content. |
|
|
| ## INSTRUCTIONS: |
| - **Focus on Visual Entities**: The question must test the modelβs ability to recognize **objects, tools, clothing, colors, or explicit scene elements**. |
| - **Ground in Visuals**: All answers must be verifiable by pausing a single frame. Avoid actions, motivations, or temporal reasoning. |
| - **Avoid Abstract Questions**: Only ask about clearly visible physical entities (e.g., 'What is the man holding?' not 'Why is the man holding this?'). |
| - **Natural Language**: Keep questions concise and neutral (e.g., 'What color is the car?' not 'Can you describe the car's color?'). |
| - **Exclusion Questions**: Include at least one 'NOT visible' question per video (e.g., 'Which item is NOT in the scene?'). |
| - **Format**: Format the output as a list of dictionaries with the following keys: |
| - `'Q'`: The question. |
| - `'Options'`: A list of four answer options labeled 'A', 'B', 'C', and 'D'. |
| - `'Answer'`: The correct answer (e.g., `'A'`, `'B'`, etc.). |
|
|
| ## EXAMPLES: |
| 1. {'Q': 'What is the woman wearing on her head?', |
| 'Options': [ |
| 'A. A red hat', |
| 'B. A black scarf', |
| 'C. A white helmet', |
| 'D. No head covering' |
| ], |
| 'Answer': 'A'} |
|
|
| 2. {'Q': 'Which object is NOT visible on the table?', |
| 'Options': [ |
| 'A. A knife', |
| 'B. A plate', |
| 'C. A laptop', |
| 'D. A spoon' |
| ], |
| 'Answer': 'C'} |
|
|
| 3. {'Q': 'What type of vehicle is parked in the driveway?', |
| 'Options': [ |
| 'A. A motorcycle', |
| 'B. A pickup truck', |
| 'C. A bicycle', |
| 'D. A school bus' |
| ], |
| 'Answer': 'B'} |
|
|
| ## GUIDELINES FOR CREATING QUESTIONS: |
| - **Specificity**: Ask about singular, clearly defined objects or attributes (e.g., 'What tool is in the manβs hand?' not 'What tools are visible?'). |
| - **Visual Certainty**: Ensure the correct answer is unambiguous and visible in a single frame. |
| - **Plausible Distractors**: Wrong options should be visually similar (e.g., other kitchen tools if asking about a pan). |
| - **No Implicit Knowledge**: Avoid questions requiring domain knowledge (e.g., 'What brand is the car?' is invalid unless the logo is visible). |
|
|
| ## OUTPUT FORMAT: |
| [{'Q': 'Your question here...', 'Options': ['A. ...', 'B. ...', 'C. ...', 'D. ...'], 'Answer': 'Correct answer here...'}]<|im_end|> |
| <|im_start|>user |
| I have provided you with a video description generated by Qwen2.5-VL. Based on this description and the system instructions, generate **one** high-quality object recognition question-and-answer pair. |
|
|
| ## REQUIREMENTS: |
| - The question must focus on **identifying visible objects, people, or scene attributes** (e.g., colors, tools, clothing). |
| - The answer must be directly observable in the description without any reasoning or inference. |
| - Include at least one 'NOT visible' question type if possible. |
|
|
| ## OUTPUT FORMAT: |
| [{'Q': 'Your question here...', 'Options': ['A. ...', 'B. ...', 'C. ...', 'D. ...'], 'Answer': 'Correct answer here...'}] |
|
|
| **Only return the QA pair in the specified JSON list format.**<|im_end|> |
| <|im_start|>assistant |
|
|
|
|