| # LLM Models for Deepfake Annotation |
|
|
| ## Overview |
|
|
| The pipeline now includes **6 LLM options** in individual cells for easy comparison: |
|
|
| 1. **Deepseek** - Testing (use first!) |
| 2. **Qwen (API)** - Chinese (Alibaba Cloud) |
| 3. **Llama** - American (Meta) |
| 4. **Mixtral** - French (Mistral AI) |
| 5. **Gemma** - American Open Source (Google) |
| 6. **Qwen-2.5-32B Local** - FREE local inference (NEW!) |
|
|
| ## The 6 LLMs |
|
|
| ### 1. Deepseek (Testing) |
| **Cell 10** |
|
|
| - **Model**: deepseek-chat |
| - **Provider**: DeepSeek |
| - **API**: https://platform.deepseek.com/ |
| - **Cost**: ~$0.14-0.28 per 1M tokens (~$1-2 for 10k entries) |
- **Use case**: **Test this first!** Cheapest option to verify the pipeline works
| - **API Key**: `misc/credentials/deepseek_api_key.txt` |
|
|
| --- |
|
|
| ### 2. Qwen API (Chinese) |
| **Cells 11-12** |
|
|
| - **Model**: qwen-max (automatically uses Qwen3-Max) |
| - **Provider**: Alibaba Cloud DashScope |
| - **API**: https://dashscope.aliyun.com/ |
| - **Cost**: Variable (check Alibaba pricing) |
| - **Use case**: Chinese company, strong multilingual support |
| - **API Key**: `misc/credentials/qwen_api_key.txt` |
| - **Note**: Uses latest Qwen3-Max when you specify `qwen-max` |
|
|
| --- |
|
|
### 3. Llama (American)
**Cells 13-14**

- **Model**: meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
- **Provider**: Together AI (hosting Meta's model)
- **Developer**: Meta (American)
- **API**: https://www.together.ai/
- **Cost**: ~$0.90 per 1M tokens (~$5-10 for 10k entries)
- **Use case**: Open-source American model, good quality
- **API Key**: `misc/credentials/together_api_key.txt`

---

### 4. Mixtral (French)
**Cells 15-16**

- **Model**: open-mixtral-8x22b
- **Provider**: Mistral AI
- **Developer**: Mistral AI (French)
- **API**: https://mistral.ai/
- **Cost**: ~$2 per 1M tokens (~$10-20 for 10k entries)
- **Use case**: European alternative, Mixture-of-Experts architecture
- **API Key**: `misc/credentials/mistral_api_key.txt`
- **Note**: Using open-mixtral-8x22b (cheaper than mistral-large)

---

### 5. Gemma (American Open Source)
**Cells 17-18**

- **Model**: google/gemma-2-27b-it
- **Provider**: Together AI (hosting Google's model)
- **Developer**: Google (American)
- **API**: https://www.together.ai/ (same as Llama)
- **Cost**: ~$0.80 per 1M tokens (~$4-8 for 10k entries)
- **Use case**: American open-source alternative, competitive quality
- **API Key**: `misc/credentials/together_api_key.txt` (same as Llama)
- **Note**: Fully open-source, can be self-hosted

---

### 6. Qwen-2.5-32B Local (FREE!)
**Cells 19-20** (NEW!)

- **Model**: qwen2.5:32b-instruct
- **Provider**: Ollama (local inference)
- **Setup**: https://ollama.com/
- **Cost**: **$0** (FREE - no API costs!)
- **Requirements**:
  - A100 80GB GPU (or similar)
  - ~25GB VRAM during inference
  - ~20GB storage for model download
  - Ollama installed
- **Speed**: 5-10 tokens/sec on A100 (~100-200 samples/hour)
- **Use case**:
  - ✅ Large datasets (>1000 samples) where cost matters
  - ✅ Privacy-sensitive research data
  - ✅ Offline processing
  - ✅ Strong multilingual support
- **Setup guide**: See `QWEN_LOCAL_SETUP.md`
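For the local Qwen option, a single prompt can be sent to a running Ollama server through its HTTP generate endpoint. This is only a sketch: it assumes `ollama serve` is running on the default port (11434) and that the model has already been pulled; the function names here are illustrative, not the notebook's actual code.

```python
import json
import urllib.request

def build_generate_request(prompt: str, model: str = "qwen2.5:32b-instruct") -> dict:
    """Payload for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local_qwen(prompt: str) -> str:
    """Send one prompt to the local Ollama server and return the reply text."""
    payload = json.dumps(build_generate_request(prompt)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Since everything stays on localhost, no API key is needed for this option.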
|
|
| --- |
|
|
| ## Cost Comparison (10,000 entries) |
|
|
| | Model | Provider | Cost | Time | Origin | |
| |-------|----------|------|------|--------| |
| **Qwen-2.5-32B Local** | Ollama (local) | **$0** | ~50-100 hrs | 🇨🇳 Chinese |
| **Deepseek** | DeepSeek | ~$1-2 | ~5-10 hrs | 🇨🇳 Chinese |
| **Gemma 2** | Together AI | ~$4-8 | ~5-10 hrs | 🇺🇸 American (open) |
| **Llama 3.1** | Together AI | ~$5-10 | ~5-10 hrs | 🇺🇸 American (open) |
| **Mixtral** | Mistral AI | ~$10-20 | ~5-10 hrs | 🇫🇷 French (open) |
| **Qwen API** | Alibaba | Variable | ~5-10 hrs | 🇨🇳 Chinese |
|
|
| **Note**: Local inference is FREE but slower. Good for large datasets where cost matters more than time. |
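As a sanity check, the cost figures above can be reproduced with simple arithmetic. The ~500 tokens per entry below is an assumption (prompt plus response); adjust it to your actual prompt length:

```python
# Rough per-model cost estimate for the full run.
# TOKENS_PER_ENTRY is an assumption; tune to your prompts.
TOKENS_PER_ENTRY = 500
ENTRIES = 10_000

PRICE_PER_M = {          # USD per 1M tokens, from the sections above
    "deepseek": 0.21,    # midpoint of the $0.14-0.28 range
    "gemma": 0.80,
    "llama": 0.90,
    "mixtral": 2.00,
}

def estimated_cost(price_per_m_tokens: float) -> float:
    """Total USD for ENTRIES entries at the given per-million-token price."""
    return TOKENS_PER_ENTRY * ENTRIES / 1_000_000 * price_per_m_tokens

for model, price in PRICE_PER_M.items():
    print(f"{model:>8}: ~${estimated_cost(price):.2f}")
```

At these assumptions the totals land inside the ranges quoted above (e.g. ~$1 for Deepseek, ~$10 for Mixtral).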
|
|
| ## Recommended Testing Order |
|
|
| ### 1. Start with Deepseek |
| ```python |
| # Cell 10 |
| TEST_MODE = True |
| TEST_SIZE = 10 |
| ``` |
- **Why**: Cheapest option; verifies the pipeline works
| - **Cost**: Pennies for 10 samples |
|
|
| ### 2. Compare on Small Sample |
| Pick 2-3 models and run on same 100 samples: |
| ```python |
| # In each cell: |
| TEST_MODE = True |
| TEST_SIZE = 100 |
| ``` |
|
|
| **Good combinations:** |
| - Budget: Deepseek + Gemma |
| - Quality: Llama + Mixtral |
| - Geographic: Qwen + Llama + Mixtral |
|
|
| ### 3. Production Run |
| Choose best model from testing and run full dataset: |
| ```python |
| TEST_MODE = False |
| MAX_ROWS = None # or 20000 |
| ``` |
|
|
| ## API Key Setup |
|
|
| ### For Deepseek & Qwen (separate keys): |
| ```bash |
| echo "your-deepseek-key" > misc/credentials/deepseek_api_key.txt |
| echo "your-qwen-key" > misc/credentials/qwen_api_key.txt |
| ``` |
|
|
| ### For Llama & Gemma (same Together AI key): |
| ```bash |
| echo "your-together-key" > misc/credentials/together_api_key.txt |
| ``` |
| Both Llama and Gemma use the same Together AI key! |
|
|
| ### For Mixtral: |
| ```bash |
| echo "your-mistral-key" > misc/credentials/mistral_api_key.txt |
| ``` |
|
|
| ## Output Files |
|
|
| Each LLM saves to a separate file: |
|
|
| ``` |
| data/CSV/ |
| βββ deepseek_annotated_POI_test.csv # Deepseek test |
| βββ deepseek_annotated_POI.csv # Deepseek full |
| βββ qwen_annotated_POI_test.csv # Qwen API test |
| βββ qwen_annotated_POI.csv # Qwen API full |
| βββ qwen_local_annotated_POI_test.csv # Qwen Local test (NEW!) |
| βββ qwen_local_annotated_POI.csv # Qwen Local full (NEW!) |
| βββ llama_annotated_POI_test.csv # Llama test |
| βββ llama_annotated_POI.csv # Llama full |
| βββ mixtral_annotated_POI_test.csv # Mixtral test |
| βββ mixtral_annotated_POI.csv # Mixtral full |
| βββ gemma_annotated_POI_test.csv # Gemma test |
| βββ gemma_annotated_POI.csv # Gemma full |
| ``` |
|
|
| ## Comparing Results |
|
|
| After running multiple LLMs, compare results: |
|
|
| ```python |
| import pandas as pd |
| |
| # Load results from different models |
| deepseek_df = pd.read_csv('data/CSV/deepseek_annotated_POI_test.csv') |
| qwen_df = pd.read_csv('data/CSV/qwen_annotated_POI_test.csv') |
| qwen_local_df = pd.read_csv('data/CSV/qwen_local_annotated_POI_test.csv') # NEW! |
| llama_df = pd.read_csv('data/CSV/llama_annotated_POI_test.csv') |
| mixtral_df = pd.read_csv('data/CSV/mixtral_annotated_POI_test.csv') |
| gemma_df = pd.read_csv('data/CSV/gemma_annotated_POI_test.csv') |
| |
| # Compare profession distributions |
| print("Deepseek professions:", deepseek_df['profession_llm'].value_counts().head()) |
| print("Qwen API professions:", qwen_df['profession_llm'].value_counts().head()) |
| print("Qwen Local professions:", qwen_local_df['profession_llm'].value_counts().head()) # NEW! |
| print("Llama professions:", llama_df['profession_llm'].value_counts().head()) |
| print("Mixtral professions:", mixtral_df['profession_llm'].value_counts().head()) |
| print("Gemma professions:", gemma_df['profession_llm'].value_counts().head()) |
| |
| # Compare specific cases |
| print("\nIrene identification:") |
| print("Deepseek:", deepseek_df[deepseek_df['real_name'] == 'Irene']['full_name'].values) |
| print("Qwen API:", qwen_df[qwen_df['real_name'] == 'Irene']['full_name'].values) |
| print("Qwen Local:", qwen_local_df[qwen_local_df['real_name'] == 'Irene']['full_name'].values) |
| print("Llama:", llama_df[llama_df['real_name'] == 'Irene']['full_name'].values) |
| print("Mixtral:", mixtral_df[mixtral_df['real_name'] == 'Irene']['full_name'].values) |
| print("Gemma:", gemma_df[gemma_df['real_name'] == 'Irene']['full_name'].values) |
| ``` |
|
|
| ## Model Characteristics |
|
|
### Deepseek
- ✅ Very cheap
- ✅ Good for testing
- ⚠️ Less documentation
- 🇨🇳 Chinese company

### Qwen (Qwen3-Max)
- ✅ Latest version automatically used
- ✅ Strong multilingual
- ✅ Good Asian name recognition
- 💰 Variable cost
- 🇨🇳 Chinese company (Alibaba)

### Llama 3.1 70B
- ✅ Open-source
- ✅ Strong overall performance
- ✅ Well-documented
- ✅ American (Meta)
- 💰 Mid-range cost

### Mixtral 8x22B
- ✅ Open-source
- ✅ MoE architecture (efficient)
- ✅ European alternative
- 💰 Mid-range cost
- 🇫🇷 French company

### Gemma 2 27B
- ✅ Fully open-source
- ✅ Can self-host
- ✅ American (Google)
- ✅ Cheap via API
- ✅ Good quality for size

### Qwen-2.5-32B Local (NEW!)
- ✅ **FREE** - $0 cost (no API fees)
- ✅ **PRIVATE** - Data never leaves your machine
- ✅ **OFFLINE** - Works without internet
- ✅ **HIGH QUALITY** - 32B parameter model
- ✅ Strong multilingual support
- ⚠️ Slower than the APIs (5-10 tokens/sec on A100)
- ⚠️ Requires: A100 80GB GPU, ~25GB VRAM, Ollama installed
- 🇨🇳 Chinese company (Alibaba)
- 📦 Model size: ~20GB download
|
|
| ## Decision Matrix |
|
|
| ### If you prioritize... |
|
|
| **FREE / Zero Cost**: Use **Qwen-2.5-32B Local** (no API fees!) |
|
|
| **Cost** (with API): Use **Deepseek** or **Gemma** |
|
|
| **Quality**: Use **Qwen-2.5-32B Local**, **Llama**, or **Mixtral** |
|
|
| **Privacy**: Use **Qwen-2.5-32B Local** (data stays on your machine) |
|
|
| **American/Open Source**: Use **Gemma** or **Llama** |
|
|
| **Asian Names**: Use **Qwen** (API or Local - strong multilingual) |
|
|
| **European Provider**: Use **Mixtral** |
|
|
| **Testing**: Use **Deepseek** first, always! |
|
|
| ## Running Multiple Models |
|
|
| You can run all 6 models in sequence: |
|
|
| ```python |
| # 1. Run Cell 10 (Deepseek) - verify works (~$1-2 for 10k) |
| # 2. Run Cell 12 (Qwen API) - Chinese perspective (~variable cost) |
| # 3. Run Cell 14 (Llama) - American perspective (~$5-10 for 10k) |
| # 4. Run Cell 16 (Mixtral) - European perspective (~$10-20 for 10k) |
| # 5. Run Cell 18 (Gemma) - Open source perspective (~$4-8 for 10k) |
| # 6. Run Cell 20 (Qwen-2.5-32B Local) - FREE local inference ($0!) |
| ``` |
|
|
| Each saves to its own file, so you can compare results! |
|
|
| ## Notes |
|
|
| - **Llama and Gemma use the same API key** (Together AI) |
| - All models use the **same 9 profession categories** |
| - All models have **automatic retries** with exponential backoff |
| - All models **save progress** every 10 rows |
| - All models are **resumable** if interrupted |
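The retry behaviour mentioned above can be sketched as a small wrapper (a minimal illustration under stated assumptions, not the notebook's exact code):

```python
import random
import time

def with_retries(call, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry a flaky API call with exponential backoff plus jitter.

    The delay doubles each attempt (1s, 2s, 4s, ...); the final
    failure is re-raised so the caller can log and skip that row.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

Wrapping each per-row request like this means the progress-saving and resume logic only ever sees a success or a final failure.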
|
|
| ## Summary |
|
|
| You now have **6 LLM options** to choose from: |
|
|
1. 🧪 **Deepseek** - Test first (cheapest API)
2. 🇨🇳 **Qwen3-Max API** - Chinese, strong multilingual
3. 🇺🇸 **Llama 3.1 70B** - American, open-source
4. 🇫🇷 **Mixtral 8x22B** - French, open-source MoE
5. 🇺🇸 **Gemma 2 27B** - American open-source (Google)
6. 💰 **Qwen-2.5-32B Local** - FREE local inference (NEW!)
|
|
Each model runs in its own cell, so they're easy to run and compare!
|
|
**Recommended workflow**:
1. Test with Deepseek (Cell 10) to verify the pipeline works
2. For small datasets (<1000 samples): use an API model (Deepseek/Gemma/Llama)
3. For large datasets (>1000 samples): use Qwen-2.5-32B Local (Cell 20) - FREE!
|
|