# OpenProblems Spatial Transcriptomics Agent Rules

## Build & Development Commands

### Viash Component Development
- **Use `viash run` for executing components**: `viash run src/methods/component_name/config.vsh.yaml -- --input_train data.h5ad --output result.h5ad`
- **Build components with Docker engine**: Always specify `--engine docker` for consistent environments
- **Test individual components**: Use `viash test src/methods/component_name/config.vsh.yaml` before integration
- **Run parallel testing**: Execute `viash ns test --parallel --engine docker` for comprehensive validation
- **Validate configurations**: Every component must have a valid `config.vsh.yaml` file
- **Use test data**: Always test with resources from `resources_test/` directory first
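
The `viash run` invocation above can be assembled programmatically when a script has to launch several components. The following is a minimal sketch: `viash_run_command` is a hypothetical helper, the config path and argument names are the placeholders from the example above, and it assumes `viash` is available on `PATH`. It only builds the argument list and does not run anything.

```python
from __future__ import annotations

import shlex


def viash_run_command(config: str, args: dict[str, str], engine: str = "docker") -> list[str]:
    """Build the argv for `viash run`; component arguments go after `--`."""
    cmd = ["viash", "run", config, "--engine", engine, "--"]
    for name, value in args.items():
        cmd += [f"--{name}", value]
    return cmd


cmd = viash_run_command(
    "src/methods/component_name/config.vsh.yaml",  # placeholder path
    {"input_train": "data.h5ad", "output": "result.h5ad"},
)
# Print a copy-pasteable shell line for inspection.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute it; keeping construction and execution separate makes the command easy to log and test.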

### Nextflow Workflow Commands
- **Run workflows locally**: Use `nextflow run workflow.nf` with proper parameters
- **Inspect pipeline configuration**: Execute `nextflow config` from the project directory to print the resolved configuration; this checks configuration only, not workflow script syntax
- **Use profiles**: Specify appropriate profiles with `-profile docker,test` for development
- **Inspect past runs**: Use `nextflow log` to list previous executions and debug issues; follow live progress in the console output or `.nextflow.log`
- **Resume failed runs**: Apply the `-resume` flag to reuse cached results from tasks that already completed successfully

### Docker Integration Commands
- **Build component images**: Use Docker engine through Viash for consistency
- **Test containerized components**: Verify all dependencies are included in containers
- **Push to registries**: Use standardized tagging conventions for component images
- **Validate environments**: Ensure Python/R environments match OpenProblems specifications

## Testing Guidelines

### Component Testing Strategy
- **Run unit tests first**: Execute `viash test` on individual components before integration
- **Test with multiple datasets**: Validate components work across different spatial datasets
- **Validate input/output formats**: Ensure h5ad files maintain proper structure and metadata
- **Test edge cases**: Include empty datasets, datasets with a single cell or spot, and boundary conditions
- **Verify Docker builds**: Confirm all components build successfully in containerized environments

### Integration Testing Approach
- **Test complete workflows**: Run end-to-end pipelines with realistic data sizes
- **Validate metric calculations**: Ensure accuracy metrics produce expected ranges and distributions
- **Test control methods**: Verify positive and negative controls behave as expected
- **Cross-validate results**: Compare outputs across different methods for consistency
- **Performance benchmarking**: Measure execution time and memory usage for scalability
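
For the benchmarking point, the sketch below shows one way to measure wall-clock time and peak memory for a single Python call using only the standard library. `benchmark` is a hypothetical helper, not part of any OpenProblems tooling. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory used by native libraries may be undercounted.

```python
import time
import tracemalloc
from typing import Any, Callable, Tuple


def benchmark(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float, int]:
    """Run func once; return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raises.
        tracemalloc.stop()
    return result, elapsed, peak


result, elapsed, peak = benchmark(sum, range(1000))
print(f"result={result} elapsed={elapsed:.6f}s peak={peak} bytes")
```

A single run is noisy; repeating the call and reporting the median is usually more informative when comparing methods.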

### Quality Assurance Checklist
- **Check GitHub Actions**: Ensure all CI/CD checks pass before merging
- **Validate test coverage**: Confirm critical code paths are tested
- **Review error handling**: Test failure modes and error message clarity
- **Verify reproducibility**: Ensure identical inputs produce identical outputs
- **Test resource requirements**: Validate memory and compute constraints are met
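
One simple way to check reproducibility is to run a component twice on the same input and compare the outputs. The sketch below compares files by SHA-256 digest; `file_sha256` and `outputs_match` are hypothetical helper names. Byte-level equality is a strict criterion: files in container formats such as HDF5 may differ in bytes while holding equal data, in which case comparing the loaded arrays is the better check.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(first: Path, second: Path) -> bool:
    """True when the two output files are byte-identical."""
    return file_sha256(first) == file_sha256(second)
```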

## Code Style & Guidelines

### Viash Component Structure
- **Follow standard layout**: Organize components with `config.vsh.yaml`, a script (`script.py` or `script.R`), and a test (`test.py` or `test.R`)
- **Use descriptive names**: Component names should clearly indicate their function and scope
- **Define clear inputs/outputs**: Specify all required and optional parameters with types
- **Include comprehensive metadata**: Add author, description, keywords, and version information
- **Implement proper logging**: Use structured logging for debugging and monitoring
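
As an illustration of the logging point, here is a minimal sketch of a logger setup for a Python component script, using the standard `logging` module. `make_logger`, the logger name, and the message fields are illustrative assumptions, not an OpenProblems convention.

```python
import logging
import sys


def make_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Create (or fetch) a logger that writes timestamped lines to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")
        )
        logger.addHandler(handler)
    return logger


log = make_logger("component_name")
# key=value pairs keep messages easy to grep and parse.
log.info("loaded input n_obs=%d n_vars=%d", 100, 2000)
```

Writing to stderr keeps log lines separate from anything the script prints to stdout.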

### Python Code Standards
- **Follow PEP 8**: Use consistent indentation, naming, and formatting
- **Use type hints**: Annotate function parameters and return types
- **Handle AnnData objects**: Follow scanpy/squidpy conventions for spatial data manipulation
- **Implement error handling**: Use `try`/`except` blocks with informative error messages
- **Document functions**: Include docstrings with parameter descriptions and examples
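
The standards above can be combined in one small function. The following is a sketch only: `load_params` and the idea of a JSON parameter file are hypothetical, chosen to show type hints, a docstring, and `try`/`except` with informative messages in a few lines.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_params(path: Path) -> Dict[str, Any]:
    """Load parameters from a JSON file.

    Args:
        path: Location of the parameter file.

    Returns:
        The parsed parameters as a dictionary.

    Raises:
        FileNotFoundError: If ``path`` does not exist.
        ValueError: If the file is not valid JSON or is not a JSON object.
    """
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError as err:
        raise FileNotFoundError(f"Parameter file not found: {path}") from err
    try:
        params = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"Invalid JSON in {path}: {err}") from err
    if not isinstance(params, dict):
        raise ValueError(f"Expected a JSON object in {path}, got {type(params).__name__}")
    return params
```

Re-raising with `from err` keeps the original traceback attached, which helps when the failure surfaces inside a container log.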

### R Code Standards
- **Use tidyverse conventions**: Apply consistent data manipulation and visualization patterns
- **Handle Seurat objects**: Follow best practices for spatial transcriptomics analysis
- **Implement proper error handling**: Use tryCatch with meaningful error messages
- **Document functions**: Include roxygen2 documentation for all functions
- **Use consistent naming**: Apply snake_case for functions and variables

### Configuration Management
- **Use YAML for configs**: Structure configuration files with clear hierarchies
- **Define resource requirements**: Specify CPU, memory, and disk requirements accurately
- **Include version constraints**: Pin software versions for reproducibility
- **Document parameters**: Provide clear descriptions and default values
- **Validate inputs**: Implement parameter validation and type checking
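
Parameter validation inside a script can be as simple as checking names and types against a small schema. The sketch below assumes a hypothetical schema (`input`, `output`, `n_neighbors`) and a hypothetical `validate_params` helper; real components would derive the expected parameters from their own `config.vsh.yaml`.

```python
from typing import Any, Dict, List, Mapping, Tuple

# Hypothetical schema: parameter name -> (expected type, required)
SCHEMA: Dict[str, Tuple[type, bool]] = {
    "input": (str, True),
    "output": (str, True),
    "n_neighbors": (int, False),
}


def validate_params(
    params: Mapping[str, Any],
    schema: Mapping[str, Tuple[type, bool]] = SCHEMA,
) -> List[str]:
    """Return validation errors; an empty list means the parameters pass."""
    errors: List[str] = []
    for name in params:
        if name not in schema:
            errors.append(f"unknown parameter: {name}")
    for name, (expected, required) in schema.items():
        value = params.get(name)
        if value is None:
            if required:
                errors.append(f"missing required parameter: {name}")
            continue
        # bool is a subclass of int, so reject it explicitly for non-bool types.
        wrong_type = not isinstance(value, expected) or (
            isinstance(value, bool) and expected is not bool
        )
        if wrong_type:
            errors.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
    return errors
```

Collecting all errors before failing lets a user fix every problem in one pass instead of one per run.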

## Documentation Guidelines

### Component Documentation
- **Write clear descriptions**: Explain the biological/computational problem being addressed
- **Document algorithm details**: Describe the core methodology and implementation approach
- **Provide usage examples**: Include concrete examples with sample data and parameters
- **List dependencies**: Document all required software, packages, and versions
- **Include references**: Cite relevant papers and methodological sources

### Task Documentation Structure
- **Define task motivation**: Explain the biological significance and research gaps addressed
- **Describe datasets**: Detail input data types, formats, and expected characteristics
- **Outline methods**: List implemented methods with brief algorithmic descriptions
- **Specify metrics**: Define evaluation criteria and interpretation guidelines
- **Document controls**: Explain positive and negative control implementations

### Workflow Documentation
- **Create process diagrams**: Visualize workflow steps and data flow
- **Document parameters**: Explain all configurable options and their effects
- **Provide troubleshooting**: Include common issues and resolution strategies
- **List output formats**: Describe all generated files and their contents
- **Include performance notes**: Document expected runtime and resource usage

### API Documentation Standards
- **Use OpenAPI specifications**: Document REST endpoints with complete schemas
- **Provide request/response examples**: Include realistic data samples
- **Document error codes**: Explain all possible error conditions and responses
- **Include authentication**: Detail security requirements and token usage
- **Maintain versioning**: Document API changes and backwards compatibility

## Collaboration & Review Guidelines

### Pull Request Standards
- **Create focused PRs**: Address single features or bug fixes per request
- **Write descriptive titles**: Clearly summarize changes and their purpose
- **Include comprehensive descriptions**: Explain motivation, changes, and testing performed
- **Add reviewers**: Tag appropriate domain experts and maintainers
- **Respond to feedback**: Address review comments promptly and thoroughly

### Code Review Process
- **Review for correctness**: Verify algorithmic implementation and logic
- **Check for consistency**: Ensure adherence to established patterns and conventions
- **Validate testing**: Confirm adequate test coverage and quality
- **Assess documentation**: Review clarity and completeness of documentation
- **Consider performance**: Evaluate computational efficiency and scalability

### Community Engagement
- **Use GitHub discussions**: Engage in technical discussions and feature planning
- **Participate in Discord**: Join real-time conversations and collaboration
- **Follow issue templates**: Use structured formats for bug reports and feature requests
- **Share knowledge**: Contribute to documentation and community resources
- **Mentor newcomers**: Help onboard new contributors to the ecosystem

## Quality Control & Validation

### Data Quality Standards
- **Validate spatial coordinates**: Ensure x,y coordinates are properly formatted and scaled
- **Check gene expression**: Verify count matrices have appropriate ranges and distributions
- **Assess metadata completeness**: Confirm required annotations and sample information
- **Test data integrity**: Validate file formats and cross-reference identifiers
- **Monitor data provenance**: Track data sources and processing steps
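
The coordinate and count checks above can be sketched with plain Python sequences so the example runs without extra dependencies. `check_coordinates` and `check_counts` are hypothetical helpers; in a real component the inputs would typically come from an AnnData object (for example, the scanpy/squidpy convention of storing coordinates in `adata.obsm["spatial"]`). The count check assumes raw counts, so it would not apply to normalized layers.

```python
import math
from typing import List, Sequence


def check_coordinates(coords: Sequence[Sequence[float]]) -> List[str]:
    """Return problems found in rows that should each hold a finite (x, y) pair."""
    problems: List[str] = []
    for i, row in enumerate(coords):
        if len(row) != 2:
            problems.append(f"row {i}: expected 2 values, got {len(row)}")
        elif not all(math.isfinite(v) for v in row):
            problems.append(f"row {i}: non-finite coordinate")
    return problems


def check_counts(counts: Sequence[Sequence[float]]) -> List[str]:
    """Return entries that are not non-negative whole numbers (raw counts assumed)."""
    problems: List[str] = []
    for i, row in enumerate(counts):
        for j, value in enumerate(row):
            # NaN and infinity fail is_integer(), so they are flagged too.
            if value < 0 or not float(value).is_integer():
                problems.append(f"entry ({i}, {j}): {value!r} is not a non-negative whole number")
    return problems
```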

### Results Validation Process
- **Cross-method comparison**: Compare results across different algorithmic approaches
- **Statistical validation**: Apply appropriate statistical tests and multiple comparison corrections
- **Biological interpretation**: Ensure results align with known biological principles
- **Reproducibility testing**: Verify consistent results across multiple runs
- **External validation**: Compare against published benchmarks and literature
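
As one example of a multiple comparison correction, the sketch below implements the Benjamini-Hochberg procedure in plain Python. This method is chosen here for illustration only; the guidelines above do not prescribe a specific correction, and `benjamini_hochberg` is a hypothetical helper name.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)
```

In practice, established implementations from statistics libraries are preferable to hand-written ones; the sketch is meant to make the procedure explicit.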

### Performance Monitoring
- **Track execution metrics**: Monitor runtime, memory usage, and resource consumption
- **Assess scalability**: Test performance across different data sizes and complexities
- **Monitor quality metrics**: Track accuracy, precision, recall, and domain-specific measures
- **Evaluate user experience**: Gather feedback on usability and documentation quality
- **Continuous improvement**: Regularly review and optimize component performance