name: code-review-env
version: "1.0.0"
description: >
  A real-world code review environment where an AI agent identifies bugs in Python pull requests.
  The agent must find real bugs, avoid false positives, and not approve broken code.
  Includes a red herring in the hard task to test false positive resistance.
author: Team Phoenix
tags:
  - openenv
  - code-review
  - real-world
  - security
  - python
tasks:
  - id: easy
    description: Find 3 bugs in a simple Python data processing function
    difficulty: easy
    max_steps: 8
  - id: medium
    description: Find 4 security vulnerabilities in a Python web API endpoint
    difficulty: medium
    max_steps: 15
  - id: hard
    description: Find 6 security and architectural bugs across 3 files in an async cryptographic service while avoiding a red herring
    difficulty: hard
    max_steps: 25
observation_space:
  type: object
  fields:
    task_id: str
    language: str
    pr_title: str
    pr_description: str
    code_diff: str
    full_file: str
    existing_comments: list
    step_number: int
    max_steps: int
    review_status: str
action_space:
  operations:
    - add_comment
    - approve
    - request_changes
    - done
    - inspect_file
    - inspect_lines
  fields:
    line_number: int (required for add_comment)
    severity: str (critical|major|minor|nit)
    category: str (bug|security|performance|style)
    message: str
    summary: str (required for approve and request_changes)
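
The constraints listed under action_space could be checked on the client side before an action is sent to the environment. The following is a minimal sketch of such a check, not the environment's own implementation. The `Action` dataclass, the `validate_action` helper, and the `operation` field name are hypothetical; only the operation names, the allowed severity and category values, and the "required for" rules come from the manifest above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Values copied from the action_space section of the manifest.
OPERATIONS = {
    "add_comment", "approve", "request_changes",
    "done", "inspect_file", "inspect_lines",
}
SEVERITIES = {"critical", "major", "minor", "nit"}
CATEGORIES = {"bug", "security", "performance", "style"}


@dataclass
class Action:
    # "operation" is an assumed field name; the manifest lists the
    # operations but does not say which key carries them.
    operation: str
    line_number: Optional[int] = None
    severity: Optional[str] = None
    category: Optional[str] = None
    message: Optional[str] = None
    summary: Optional[str] = None


def validate_action(action: Action) -> List[str]:
    """Return a list of problems; an empty list means the action looks valid."""
    errors: List[str] = []
    if action.operation not in OPERATIONS:
        errors.append(f"unknown operation: {action.operation!r}")
    # line_number is required for add_comment.
    if action.operation == "add_comment" and action.line_number is None:
        errors.append("line_number is required for add_comment")
    # summary is required for approve and request_changes.
    if action.operation in {"approve", "request_changes"} and not action.summary:
        errors.append(f"summary is required for {action.operation}")
    if action.severity is not None and action.severity not in SEVERITIES:
        errors.append(f"invalid severity: {action.severity!r}")
    if action.category is not None and action.category not in CATEGORIES:
        errors.append(f"invalid category: {action.category!r}")
    return errors
```

For example, `validate_action(Action("approve"))` reports the missing summary, while a fully populated `add_comment` action returns an empty list. Returning a list of messages rather than raising on the first problem lets an agent see every issue with a malformed action in a single step.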