Mokhtar committed on
Commit c1aab9f · 1 Parent(s): e4721a6

Readme Updates

Files changed (1)
  1. README.md +10 -54
README.md CHANGED
@@ -1,58 +1,14 @@
- # Image Captioning with SOTA Models

- This project provides a unified API for Image Captioning using various State-of-the-Art (SOTA) models as well as a custom ResNet+GPT2 implementation.

- ## Supported Models

- 1. **BLIP (Bootstrapping Language-Image Pre-training)**
- * Model: `Salesforce/blip-image-captioning-large`
- * Status: **Default** (Best Performance)
- * Description: Produces highly accurate and detailed captions.

- 2. **ViT-GPT2**
- * Model: `nlpconnect/vit-gpt2-image-captioning`
- * Status: Available
- * Description: Uses Vision Transformer (ViT) encoder and GPT-2 decoder.
-
- 3. **ResNet50 + GPT-2 (Custom)**
- * Model: Custom implementation trained from scratch.
- * Status: Legacy / Experimental
- * Description: Good for learning purposes or custom datasets.
-
- ## Installation
-
- 1. Clone the repository.
- 2. Install dependencies:
- ```bash
- pip install -r requirements.txt
- ```
-
- ## Configuration
-
- Edit `config/config.py` to select the model:
-
- ```python
- class Config:
- # ...
- MODEL_TYPE = "blip" # Options: "blip", "vit_gpt2", "resnet_gpt2"
- ```
-
- ## Running the API
-
- Start the FastAPI server:
-
- ```bash
- python main.py --mode api
- ```
-
- Open your browser at `http://localhost:8001` to use the drag-and-drop interface.
-
- ## Training (ResNet+GPT2 only)
-
- To train the custom model:
-
- 1. Set `MODEL_TYPE = "resnet_gpt2"` in config.
- 2. Run:
- ```bash
- python main.py --mode train
- ```
+ ---
+ title: Captioning
+ emoji: 🐨
+ colorFrom: gray
+ colorTo: red
+ sdk: docker
+ pinned: false
+ license: mit
+ ---

+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
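The lines added by this commit form a front-matter block at the top of README.md, which appears to be the Spaces configuration that the linked reference describes. As a rough illustration of what that block contains, the sketch below reads the flat `key: value` pairs out of it using only the standard library. The helper name `parse_front_matter` and the `README_TEXT` constant are hypothetical and are not part of the repository; the parser assumes one scalar per line and is not a substitute for a real YAML library.

```python
# Hypothetical helper, not part of the repository: reads the simple
# "key: value" front matter that this commit adds to README.md.
README_TEXT = """---
title: Captioning
emoji: 🐨
colorFrom: gray
colorTo: red
sdk: docker
pinned: false
license: mit
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
"""


def parse_front_matter(text: str) -> dict[str, str]:
    """Return the flat key/value pairs between the leading '---' markers."""
    lines = text.splitlines()
    # Front matter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing marker reached: the block is complete.
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    # No closing marker found: treat the block as malformed.
    return {}


config = parse_front_matter(README_TEXT)
print(config["sdk"])
```

All values come back as strings under this assumption, so `pinned` is the text `false` rather than a boolean; a full YAML parser would be needed for typed values or nested keys.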