🕷️ CyberScraper 2077 API

Advanced Web Scraping API with AI-Powered Content Extraction

Version 1.0.0

🚀 Quick Start

  1. Make a simple scrape request to /api/scrape
  2. For multiple requests, create a session first using /api/session
  3. Use the session ID for subsequent requests to /api/session/{session_id}/scrape
  4. Always close sessions when done using DELETE /api/session/{session_id}
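The four steps above can be sketched in Python with the `requests` library. Note that the session-create response body is not documented on this page, so the `session_id` field name used below is an assumption; adjust it to match your deployment.

```python
import requests

BASE_URL = "https://grazieprego-scrapling.hf.space"

def scrape_batch(urls, query, http=requests):
    """Create a session, scrape each URL with it, and always close it."""
    # 1. Create a persistent session (Quick Start step 2)
    resp = http.post(f"{BASE_URL}/api/session", json={"model_name": "alias-fast"})
    session_id = resp.json()["session_id"]  # ASSUMED field name, not documented above
    results = []
    try:
        # 2. Reuse the session for each URL (step 3)
        for url in urls:
            r = http.post(
                f"{BASE_URL}/api/session/{session_id}/scrape",
                json={"url": url, "query": query},
            )
            results.append(r.json())
    finally:
        # 3. Always close the session, even if a scrape fails (step 4)
        http.delete(f"{BASE_URL}/api/session/{session_id}")
    return results
```

Passing `http=requests` keeps the HTTP layer injectable, which makes the function easy to exercise without a live server.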

📡 API Endpoints

GET /health

Check if the API is running

Example:

curl https://grazieprego-scrapling.hf.space/health

Response:

{
  "status": "ok",
  "message": "CyberScraper 2077 API is running"
}

POST /api/scrape

Stateless scrape endpoint - a new extractor is created for each call

Request Body:

  • url (string) - The URL to scrape
  • query (string) - The extraction query/instruction
  • model_name (string, optional) - AI model to use (default: 'alias-fast')

Example (cURL):

curl -X POST https://grazieprego-scrapling.hf.space/api/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "query": "Extract all product prices"
  }'

Example (Python):

import requests

response = requests.post(
    'https://grazieprego-scrapling.hf.space/api/scrape',
    json={
        'url': 'https://example.com',
        'query': 'Extract prices'
    }
)
print(response.json())

POST /api/session

Create a persistent scraping session for multiple requests

Request Body:

  • model_name (string, optional) - AI model to use (default: 'alias-fast')

Example:

curl -X POST https://grazieprego-scrapling.hf.space/api/session \
  -H "Content-Type: application/json" \
  -d '{"model_name": "alias-fast"}'
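Example (Python) - a minimal sketch; the response schema isn't shown above, so the `session_id` field name is an assumption:

```python
import requests

def create_session(model_name="alias-fast", post=requests.post):
    """Create a persistent session and return its ID for later requests."""
    resp = post(
        "https://grazieprego-scrapling.hf.space/api/session",
        json={"model_name": model_name},
    )
    # "session_id" is an ASSUMED field name; check your deployment's response
    return resp.json().get("session_id")
```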

POST /api/session/{session_id}/scrape

Scrape using an existing session context (more efficient for multiple requests)

Path Parameters:

  • session_id (string) - UUID of the session

Request Body:

  • url (string) - The URL to scrape
  • query (string) - The extraction query
  • model_name (string, optional)

Example:

curl -X POST https://grazieprego-scrapling.hf.space/api/session/uuid-here/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/page1",
    "query": "Extract titles"
  }'

DELETE /api/session/{session_id}

Close a session and release resources

Path Parameters:

  • session_id (string) - UUID of the session to close

Example:

curl -X DELETE https://grazieprego-scrapling.hf.space/api/session/uuid-here

💡 Best Practices

  • Use stateless /api/scrape for one-off requests
  • Use sessions for batch processing multiple URLs
  • Always close sessions when finished to free resources
  • Handle errors gracefully (500 errors may occur on complex sites)
  • Set appropriate timeouts for slow-loading pages
  • Implement retry logic for production use
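The last three practices combine naturally into a small helper. This is a sketch, not part of the API itself; the timeout, retry count, and backoff values are illustrative defaults.

```python
import time
import requests

def scrape_with_retry(url, query, retries=3, timeout=60, backoff=1.0,
                      post=requests.post):
    """POST to /api/scrape with a timeout, retrying 5xx and network errors."""
    for attempt in range(retries):
        try:
            resp = post(
                "https://grazieprego-scrapling.hf.space/api/scrape",
                json={"url": url, "query": query},
                timeout=timeout,  # generous cap for slow-loading pages
            )
            if resp.status_code < 500:
                return resp.json()  # success, or a client error worth surfacing
        except requests.RequestException:
            pass  # timeout or connection error: fall through and retry
        if attempt < retries - 1:
            time.sleep(backoff * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s...
    raise RuntimeError(f"Scrape of {url} failed after {retries} attempts")
```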

⚠️ Error Handling

  • 404 - Session not found (for session endpoints)
  • 500 - Internal server error - check the detail message

Common Issues:

  • URL unreachable or timeout
  • JavaScript-heavy sites may require different approaches
  • Bot protection may block requests
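Putting the codes above together, a client might branch like this. A sketch only: the shape of the error body is an assumption based on the "detail message" mentioned above.

```python
import requests

BASE_URL = "https://grazieprego-scrapling.hf.space"

def safe_session_scrape(session_id, url, query, post=requests.post):
    """Scrape via a session and map the documented error codes to results."""
    resp = post(
        f"{BASE_URL}/api/session/{session_id}/scrape",
        json={"url": url, "query": query},
    )
    if resp.status_code == 404:
        # Session not found: it expired or was closed; create a new one
        return {"error": "session_not_found"}
    if resp.status_code == 500:
        # Internal error: surface the detail message (bot protection,
        # unreachable URL, JS-heavy page, ...); "detail" field is ASSUMED
        return {"error": resp.json().get("detail", "internal server error")}
    return resp.json()
```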