synthesis / ObjectRecognition /logs /object_0_10000.log

Upload folder using huggingface_hub

55500d6 verified 12 months ago

4.44 kB

	/share/liangzy/miniconda3/envs/vllm/lib/python3.10/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
	warnings.warn(
	Total Video Size: 31436
	0%\| \| 0/31436 [00:00<?, ?it/s] 100%\|██████████████████████████████████████████████████████████████████████\| 31436/31436 [00:00<00:00, 847983.72it/s]
	Total Clips Size: 37658
	Start: 0, End: 10000
	to process size: 10000
	Total size: 10000
	Sample show: <\|im_start\|>system
	You are an AI assistant tasked with generating high-quality object recognition questions and answers based on a given video.

	## TASK:
	Your task is to generate one high-quality object recognition question that requires identifying visible objects, people, or immediate scene attributes from the video.

	You must also provide 4 answer options (A–D), with only one correct answer, which must be directly observable in the visual content.

	## INSTRUCTIONS:
	- Focus on Visual Entities: The question must test the model’s ability to recognize objects, tools, clothing, colors, or explicit scene elements.
	- Ground in Visuals: All answers must be verifiable by pausing a single frame. Avoid actions, motivations, or temporal reasoning.
	- Avoid Abstract Questions: Only ask about clearly visible physical entities (e.g., 'What is the man holding?' not 'Why is the man holding this?').
	- Natural Language: Keep questions concise and neutral (e.g., 'What color is the car?' not 'Can you describe the car's color?').
	- Exclusion Questions: Include at least one 'NOT visible' question per video (e.g., 'Which item is NOT in the scene?').
	- Format: Format the output as a list of dictionaries with the following keys:
	- `'Q'`: The question.
	- `'Options'`: A list of four answer options labeled 'A', 'B', 'C', and 'D'.
	- `'Answer'`: The correct answer (e.g., `'A'`, `'B'`, etc.).

	## EXAMPLES:
	1. {'Q': 'What is the woman wearing on her head?',
	'Options': [
	'A. A red hat',
	'B. A black scarf',
	'C. A white helmet',
	'D. No head covering'
	],
	'Answer': 'A'}

	2. {'Q': 'Which object is NOT visible on the table?',
	'Options': [
	'A. A knife',
	'B. A plate',
	'C. A laptop',
	'D. A spoon'
	],
	'Answer': 'C'}

	3. {'Q': 'What type of vehicle is parked in the driveway?',
	'Options': [
	'A. A motorcycle',
	'B. A pickup truck',
	'C. A bicycle',
	'D. A school bus'
	],
	'Answer': 'B'}

	## GUIDELINES FOR CREATING QUESTIONS:
	- Specificity: Ask about singular, clearly defined objects or attributes (e.g., 'What tool is in the man’s hand?' not 'What tools are visible?').
	- Visual Certainty: Ensure the correct answer is unambiguous and visible in a single frame.
	- Plausible Distractors: Wrong options should be visually similar (e.g., other kitchen tools if asking about a pan).
	- No Implicit Knowledge: Avoid questions requiring domain knowledge (e.g., 'What brand is the car?' is invalid unless the logo is visible).

	## OUTPUT FORMAT:
	[{'Q': 'Your question here...', 'Options': ['A. ...', 'B. ...', 'C. ...', 'D. ...'], 'Answer': 'Correct answer here...'}]<\|im_end\|>
	<\|im_start\|>user
	I have provided you with a video description generated by Qwen2.5-VL. Based on this description and the system instructions, generate one high-quality object recognition question-and-answer pair.

	## REQUIREMENTS:
	- The question must focus on identifying visible objects, people, or scene attributes (e.g., colors, tools, clothing).
	- The answer must be directly observable in the description without any reasoning or inference.
	- Include at least one 'NOT visible' question type if possible.

	## OUTPUT FORMAT:
	[{'Q': 'Your question here...', 'Options': ['A. ...', 'B. ...', 'C. ...', 'D. ...'], 'Answer': 'Correct answer here...'}]

	Only return the QA pair in the specified JSON list format.<\|im_end\|>
	<\|im_start\|>assistant