2026-04-08 18:01:30 - medigenius - INFO - DatabaseService initialized
INFO:medigenius:DatabaseService initialized
2026-04-08 18:01:34 - medigenius - INFO - ChatService initialized
INFO:medigenius:ChatService initialized
===========================================================
Start Build Offline Database from E:\AI\DeepMed\backend\data
Target vector DB: E:\AI\DeepMed\backend\data\chroma_db
===========================================================
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u1ea3' in position 47: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 208, in _build_retrievers
    logger.info("--- T\u1ea3i Embedding Model ---")
Message: '--- T\u1ea3i Embedding Model ---'
Arguments: ()
2026-04-08 18:01:35 - medigenius - INFO - --- T\u1ea3i Embedding Model ---
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u1ea3' in position 21: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 208, in _build_retrievers
    logger.info("--- T\u1ea3i Embedding Model ---")
Message: '--- T\u1ea3i Embedding Model ---'
Arguments: ()
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u1eaf' in position 47: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 216, in _build_retrievers
    raw_docs = _load_documents_from_folder(data_path)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 144, in _load_documents_from_folder
    logger.info("--- B\u1eaft \u0111\u1ea7u quét th\u01b0 m\u1ee5c: %s ---", folder_path)
Message: '--- B\u1eaft \u0111\u1ea7u quét th\u01b0 m\u1ee5c: %s ---'
Arguments: ('E:\\AI\\DeepMed\\backend\\data',)
2026-04-08 18:01:45 - medigenius - INFO - --- B\u1eaft \u0111\u1ea7u quét th\u01b0 m\u1ee5c: E:\AI\DeepMed\backend\data ---
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u1eaf' in position 21: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 216, in _build_retrievers
    raw_docs = _load_documents_from_folder(data_path)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 144, in _load_documents_from_folder
    logger.info("--- B\u1eaft \u0111\u1ea7u quét th\u01b0 m\u1ee5c: %s ---", folder_path)
Message: '--- B\u1eaft \u0111\u1ea7u quét th\u01b0 m\u1ee5c: %s ---'
Arguments: ('E:\\AI\\DeepMed\\backend\\data',)
WARNING:pypdf._reader:Ignoring wrong pointing object 49 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 55 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 68 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 90 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 101 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 208 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 210 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 257 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 265 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 426 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 622 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 623 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 625 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 626 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 627 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 633 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 647 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 648 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 650 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 651 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 655 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 656 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 658 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 659 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 10 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 13 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 21 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 56 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 80 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 345 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 16 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 93 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 150 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 207 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 734 0 (offset 0)
WARNING:pypdf._reader:Ignoring wrong pointing object 883 0 (offset 0)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u1ed5' in position 43: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 216, in _build_retrievers
    raw_docs = _load_documents_from_folder(data_path)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 187, in _load_documents_from_folder
    logger.info("T\u1ed5ng c\u1ed9ng \u0111ã load: %d tài li\u1ec7u g\u1ed1c.", len(documents))
Message: 'T\u1ed5ng c\u1ed9ng \u0111ã load: %d tài li\u1ec7u g\u1ed1c.'
Arguments: (7963,)
2026-04-09 07:57:54 - medigenius - INFO - T\u1ed5ng c\u1ed9ng \u0111ã load: 7963 tài li\u1ec7u g\u1ed1c.
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u1ed5' in position 17: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 216, in _build_retrievers
    raw_docs = _load_documents_from_folder(data_path)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 187, in _load_documents_from_folder
    logger.info("T\u1ed5ng c\u1ed9ng \u0111ã load: %d tài li\u1ec7u g\u1ed1c.", len(documents))
Message: 'T\u1ed5ng c\u1ed9ng \u0111ã load: %d tài li\u1ec7u g\u1ed1c.'
Arguments: (7963,)
2026-04-09 07:57:57 - medigenius - INFO - Split vào 30311 chunks.
INFO:medigenius:Split vào 30311 chunks.
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u1ea1' in position 47: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 239, in _build_retrievers
    logger.info("--- T\u1ea1o Index d\u1eef li\u1ec7u m\u1edbi vào ChromaDB ---")
Message: '--- T\u1ea1o Index d\u1eef li\u1ec7u m\u1edbi vào ChromaDB ---'
Arguments: ()
2026-04-09 07:57:58 - medigenius - INFO - --- T\u1ea1o Index d\u1eef li\u1ec7u m\u1edbi vào ChromaDB ---
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u1ea1' in position 21: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 239, in _build_retrievers
    logger.info("--- T\u1ea1o Index d\u1eef li\u1ec7u m\u1edbi vào ChromaDB ---")
Message: '--- T\u1ea1o Index d\u1eef li\u1ec7u m\u1edbi vào ChromaDB ---'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0110' in position 42: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 254, in _build_retrievers
    logger.info("\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)", len(batch), min(i + batch_size, len(splits)), len(splits))
Message: '\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)'
Arguments: (5000, 5000, 30311)
2026-04-09 08:01:32 - medigenius - INFO - \u0110ã mã hóa và \u0111\u01b0a 5000 documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: 5000/30311)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0110' in position 16: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 254, in _build_retrievers
    logger.info("\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)", len(batch), min(i + batch_size, len(splits)), len(splits))
Message: '\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)'
Arguments: (5000, 5000, 30311)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0110' in position 42: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 254, in _build_retrievers
    logger.info("\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)", len(batch), min(i + batch_size, len(splits)), len(splits))
Message: '\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)'
Arguments: (5000, 10000, 30311)
2026-04-09 08:05:22 - medigenius - INFO - \u0110ã mã hóa và \u0111\u01b0a 5000 documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: 10000/30311)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0110' in position 16: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 254, in _build_retrievers
    logger.info("\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)", len(batch), min(i + batch_size, len(splits)), len(splits))
Message: '\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)'
Arguments: (5000, 10000, 30311)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0110' in position 42: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 254, in _build_retrievers
    logger.info("\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)", len(batch), min(i + batch_size, len(splits)), len(splits))
Message: '\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)'
Arguments: (5000, 15000, 30311)
2026-04-09 08:09:14 - medigenius - INFO - \u0110ã mã hóa và \u0111\u01b0a 5000 documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: 15000/30311)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0110' in position 16: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 254, in _build_retrievers
    logger.info("\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)", len(batch), min(i + batch_size, len(splits)), len(splits))
Message: '\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)'
Arguments: (5000, 15000, 30311)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0110' in position 42: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 254, in _build_retrievers
    logger.info("\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)", len(batch), min(i + batch_size, len(splits)), len(splits))
Message: '\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)'
Arguments: (5000, 20000, 30311)
2026-04-09 08:13:07 - medigenius - INFO - \u0110ã mã hóa và \u0111\u01b0a 5000 documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: 20000/30311)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0110' in position 16: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 254, in _build_retrievers
    logger.info("\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)", len(batch), min(i + batch_size, len(splits)), len(splits))
Message: '\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)'
Arguments: (5000, 20000, 30311)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0110' in position 42: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 254, in _build_retrievers
    logger.info("\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)", len(batch), min(i + batch_size, len(splits)), len(splits))
Message: '\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)'
Arguments: (5000, 25000, 30311)
2026-04-09 08:16:52 - medigenius - INFO - \u0110ã mã hóa và \u0111\u01b0a 5000 documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: 25000/30311)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0110' in position 16: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 254, in _build_retrievers
    logger.info("\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)", len(batch), min(i + batch_size, len(splits)), len(splits))
Message: '\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)'
Arguments: (5000, 25000, 30311)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0110' in position 42: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 254, in _build_retrievers
    logger.info("\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)", len(batch), min(i + batch_size, len(splits)), len(splits))
Message: '\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)'
Arguments: (5000, 30000, 30311)
2026-04-09 08:20:57 - medigenius - INFO - \u0110ã mã hóa và \u0111\u01b0a 5000 documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: 30000/30311)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0110' in position 16: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 254, in _build_retrievers
    logger.info("\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)", len(batch), min(i + batch_size, len(splits)), len(splits))
Message: '\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)'
Arguments: (5000, 30000, 30311)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0110' in position 42: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 254, in _build_retrievers
    logger.info("\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)", len(batch), min(i + batch_size, len(splits)), len(splits))
Message: '\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)'
Arguments: (311, 30311, 30311)
2026-04-09 08:21:12 - medigenius - INFO - \u0110ã mã hóa và \u0111\u01b0a 311 documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: 30311/30311)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u0110' in position 16: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 254, in _build_retrievers
    logger.info("\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)", len(batch), min(i + batch_size, len(splits)), len(splits))
Message: '\u0110ã mã hóa và \u0111\u01b0a %d documents vào ChromaDB (Ti\u1ebfn \u0111\u1ed9: %d/%d)'
Arguments: (311, 30311, 30311)
2026-04-09 08:21:16 - medigenius - INFO - Fast Retriever: BM25(30311 docs) + Vector, k=15
INFO:medigenius:Fast Retriever: BM25(30311 docs) + Vector, k=15
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u1ea3' in position 47: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 302, in _build_retrievers
    logger.info("--- T\u1ea3i Reranker Model (BGE-reranker-v2-m3) trên [%s] ---", DEVICE)
Message: '--- T\u1ea3i Reranker Model (BGE-reranker-v2-m3) trên [%s] ---'
Arguments: ('cpu',)
2026-04-09 08:21:20 - medigenius - INFO - --- T\u1ea3i Reranker Model (BGE-reranker-v2-m3) trên [cpu] ---
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u1ea3' in position 21: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 302, in _build_retrievers
    logger.info("--- T\u1ea3i Reranker Model (BGE-reranker-v2-m3) trên [%s] ---", DEVICE)
Message: '--- T\u1ea3i Reranker Model (BGE-reranker-v2-m3) trên [%s] ---'
Arguments: ('cpu',)
C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\site-packages\huggingface_hub\file_download.py:143: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\Pham Ba Thuong\.cache\huggingface\hub\models--BAAI--bge-reranker-v2-m3. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations. To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
WARNING:huggingface_hub.file_download:Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
WARNING:huggingface_hub.file_download:Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
WARNING:huggingface_hub.file_download:Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' in position 73: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 312, in _build_retrievers
    logger.info("Deep Retriever: Ensemble(k=25) \u2192 Reranker(top_n=5)")
Message: 'Deep Retriever: Ensemble(k=25) \u2192 Reranker(top_n=5)'
Arguments: ()
2026-04-09 08:33:26 - medigenius - INFO - Deep Retriever: Ensemble(k=25) \u2192 Reranker(top_n=5)
--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\logging\__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
  File "C:\Users\Pham Ba Thuong\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' in position 47: character maps to <undefined>
Call stack:
  File "E:\AI\DeepMed\build_db_offline.py", line 25, in <module>
    _build_retrievers(DATA_DIR, CHROMA_DB_PATH)
  File "E:\AI\DeepMed\backend\app\services\chat_service.py", line 312, in _build_retrievers
    logger.info("Deep Retriever: Ensemble(k=25) \u2192 Reranker(top_n=5)")
Message: 'Deep Retriever: Ensemble(k=25) \u2192 Reranker(top_n=5)'
Arguments: ()
===========================================================
FINISH BUILD INDEX!
===========================================================
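
Note on the repeated "Logging error" entries in this log: each one is raised inside `logging`'s `emit` when a stream encoded as cp1252 receives Vietnamese characters or the `→` arrow. The build itself completes. A minimal sketch of one possible workaround follows, assuming the log streams are ordinary text streams that support `reconfigure()` (Python 3.7+). The helper `force_utf8`, the `demo` function, and the logger name `utf8_demo` are hypothetical and do not come from the DeepMed code.

```python
import io
import logging


def force_utf8(stream):
    """Switch a text stream to UTF-8 in place if it supports reconfigure()."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        # backslashreplace: escape anything unencodable instead of raising.
        reconfigure(encoding="utf-8", errors="backslashreplace")
    return stream


def demo():
    # Simulate a cp1252 stream with an in-memory byte buffer.
    raw = io.BytesIO()
    console = io.TextIOWrapper(raw, encoding="cp1252", write_through=True)

    handler = logging.StreamHandler(force_utf8(console))
    logger = logging.getLogger("utf8_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)

    # Same kind of message that triggered the errors in the log.
    logger.info("--- Tải Embedding Model ---")
    logger.removeHandler(handler)
    return raw.getvalue().decode("utf-8")


if __name__ == "__main__":
    print(ascii(demo()))
```

In a script like `build_db_offline.py`, the equivalent step would presumably be calling `force_utf8(sys.stdout)` and `force_utf8(sys.stderr)` before any logging handlers are created. Setting the environment variable `PYTHONUTF8=1` or `PYTHONIOENCODING=utf-8` before launching Python is another option that needs no code change.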