hashF committed · Commit 5db7150 · verified · 1 Parent(s): b03b3c3

Create README.md

  1. README.md +70 -0
README.md ADDED
@@ -0,0 +1,70 @@
+ ---
+ license: mit
+ language:
+ - ar
+ - en
+ library_name: keras
+ ---
+ # Arabic Semantic / Sentiment Classification using BiLSTM
+
+ This repository contains a TensorFlow/Keras-based **Bidirectional LSTM (BiLSTM)** model for Arabic text classification.
+ The model is designed for **binary classification tasks** such as sentiment or semantic polarity detection.
+
+ ## Overview
+ - **Language:** Arabic
+ - **Task:** Binary text classification
+ - **Model:** BiLSTM neural network
+ - **Framework:** TensorFlow / Keras
+ - **Focus:** Emoji-aware preprocessing and Arabic stemming
+
+ This project combines classical NLP preprocessing with deep learning to handle informal Arabic text, including emojis.
+
+ ---
+
+ ## Model Architecture
+ The neural network architecture consists of:
+
+ - Embedding layer (vocabulary size = 10,000, embedding dim = 128)
+ - Bidirectional LSTM (128 units, `return_sequences=True`)
+ - Dropout (rate = 0.5)
+ - Bidirectional LSTM (64 units)
+ - Dense layer (32 units, ReLU)
+ - Output layer (1 unit, Sigmoid)
+
+ Training configuration:
+
+ - Loss function: **Binary Crossentropy**
+ - Optimizer: **Adam (learning rate = 0.001)**
+
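The layer list above could be expressed in Keras roughly as follows. This is a minimal sketch, not the author's training script: the function name `build_model`, the input length of 100 (taken from the padding length stated in this README), and the `accuracy` metric are assumptions made for illustration.

```python
def build_model(vocab_size=10_000, embed_dim=128, max_len=100):
    """Build and compile a BiLSTM binary classifier matching the listed layers."""
    # Imported here so the module can be loaded even where TensorFlow is absent.
    from tensorflow import keras

    layers = keras.layers
    model = keras.Sequential([
        keras.Input(shape=(max_len,)),            # padded sequences of token ids
        layers.Embedding(vocab_size, embed_dim),  # 10,000 x 128 lookup table
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Dropout(0.5),
        layers.Bidirectional(layers.LSTM(64)),    # returns only the final state
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),    # probability of the positive class
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss="binary_crossentropy",
        metrics=["accuracy"],  # assumption: the README does not name a metric
    )
    return model
```

Calling `build_model().summary()` prints the per-layer output shapes and parameter counts, which is a quick way to check the sketch against the saved model.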
+ ---
+
+ ## Preprocessing Pipeline
+ Inference inputs must go through the same preprocessing steps, in the same order, **exactly as during training**:
+
+ 1. Emoji conversion using `demoji`
+ 2. Whitespace and regex normalization
+ 3. Tokenization using NLTK
+ 4. Arabic stemming using **ISRIStemmer**
+ 5. Keras tokenization and padding (max length = 100)
+
+ This pipeline helps the model handle:
+ - Informal Arabic
+ - Social media text
+ - Emoji-heavy content
+
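Steps 1 to 4 might look roughly like the sketch below. The README does not give the exact regex rules or the `demoji` call used in training, so the character classes in `normalize`, the use of `demoji.replace_with_desc`, and the helper names `normalize` and `preprocess` are all assumptions; replace them with the training-time rules if they differ. Step 5 needs the fitted Keras tokenizer and is not shown here.

```python
import re


def normalize(text):
    """Step 2 (assumed rules): keep Arabic/Latin letters, digits, and spaces."""
    # Replace everything outside the Arabic block, ASCII letters, and digits.
    text = re.sub(r"[^\u0600-\u06FFA-Za-z0-9\s]", " ", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text):
    """Steps 1-4: emoji conversion, normalization, tokenization, stemming."""
    # Third-party imports are local so `normalize` works without them installed.
    import demoji
    from nltk.stem.isri import ISRIStemmer
    from nltk.tokenize import word_tokenize  # needs NLTK's punkt tokenizer data

    # Step 1: replace each emoji with its text description (assumed variant).
    text = demoji.replace_with_desc(text, sep=" ")
    # Step 2: regex and whitespace cleanup.
    text = normalize(text)
    # Steps 3 and 4: tokenize, then stem each token with ISRIStemmer.
    stemmer = ISRIStemmer()
    return [stemmer.stem(token) for token in word_tokenize(text)]
```

`word_tokenize` requires a one-time download of NLTK tokenizer data (`nltk.download("punkt")`, or `"punkt_tab"` on recent NLTK releases).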
+ ---
+
+ ## Files in This Repository
+
+ | File | Description |
+ |------|-------------|
+ | `lstm_text_model.h5` | Trained BiLSTM model |
+ | `tokenizer.pkl` | Keras tokenizer (must match training) |
+ | `label_encoder.pkl` | Label encoder for output mapping |
+ | `requirements.txt` | Python dependencies |
+
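As a hedged illustration of how these files fit together, the sketch below loads the three artifacts and classifies one already-preprocessed text. It assumes `label_encoder.pkl` holds a scikit-learn `LabelEncoder`, that training joined stemmed tokens with spaces, that padding used the `pad_sequences` defaults, and that a 0.5 decision threshold applies; none of these is confirmed by this README. The helper names are hypothetical, and `keras.utils.pad_sequences` requires a reasonably recent TensorFlow.

```python
import pickle

MAX_LEN = 100  # padding length stated in the Preprocessing Pipeline section


def load_artifacts(model_path="lstm_text_model.h5",
                   tokenizer_path="tokenizer.pkl",
                   encoder_path="label_encoder.pkl"):
    """Load the model, the fitted tokenizer, and the label encoder."""
    from tensorflow import keras  # local import: only needed when loading

    model = keras.models.load_model(model_path)
    # Only unpickle files from a source you trust.
    with open(tokenizer_path, "rb") as f:
        tokenizer = pickle.load(f)
    with open(encoder_path, "rb") as f:
        label_encoder = pickle.load(f)
    return model, tokenizer, label_encoder


def to_label_index(probability, threshold=0.5):
    """Map a sigmoid output to class index 0 or 1 (threshold is an assumption)."""
    return int(probability >= threshold)


def predict_label(tokens, model, tokenizer, label_encoder):
    """Classify one text given as a list of preprocessed (stemmed) tokens."""
    from tensorflow import keras

    ids = tokenizer.texts_to_sequences([" ".join(tokens)])
    padded = keras.utils.pad_sequences(ids, maxlen=MAX_LEN)
    probability = float(model.predict(padded, verbose=0)[0][0])
    label = label_encoder.inverse_transform([to_label_index(probability)])[0]
    return label, probability
```

Pickled tokenizers and encoders are sensitive to library versions, so installing the versions pinned in `requirements.txt` before loading is the safer route.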
+ ---
+
+ ## How to Use the Model
+
+ ### Installation
+ ```bash
+ pip install -r requirements.txt