NeuroMed-Cardio-0.8B

A multi-component AI system for cardiomegaly detection from chest X-ray images. The system combines segmentation, object detection, radiomics feature extraction, and an artificial neural network to produce a single final prediction.


Clinical Context: A cardiothoracic ratio (CTR) ≥ 0.5 on a chest X-ray is the standard clinical threshold for cardiomegaly — an early indicator of heart failure.


System Architecture

The pipeline runs through two parallel branches. All outputs are fused inside a final ANN for the binary cardiomegaly prediction.

Chest X-Ray
      │
      ▼
[Preprocessing]
Gamma Correction → Gaussian Blur → CLAHE
      │
      ├──────────────────────────────────┐
      │                                  │
      ▼                                  ▼
[Segmentation Branch]            [YOLOv11 Branch]
U-Net++ (Heart)                  Heart + Lung
DeepLabV3+ (Lungs)               Bounding Box Detection
      │                                  │
      ├── Segmentation CTR          YOLO CTR
      ├── Heart/Lung Area Ratio          │
      ├── Heart Left–Right Width         │
      └── Radiomics (55 features)        │
            │                            │
            └──────────┬─────────────────┘
                       ▼
                 Ensemble CTR
              (YOLO 95% + Segmentation 5%)
                       │
                       ▼
                [ANN — 63 Features]
                512 → 256 → 128 neurons
                       │
                       ▼
             Cardiomegaly Prediction

Preprocessing

Raw X-ray images vary significantly across different scanner devices — contrast imbalances and sensor noise directly affect model performance. A three-stage preprocessing pipeline is applied before any model sees the image:

Technique         What It Does
Gamma Correction  Reveals cardiac tissue and vascular boundary details hidden in dark regions
Gaussian Blur     Suppresses pixel-level sensor noise and smooths edges for cleaner detection
CLAHE             Locally enhances contrast to bring out fine anatomical structures

All images are resized to 256×256 pixels.
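
The exact preprocessing parameters are not published; as an illustration, the first stage (gamma correction) can be sketched in plain numpy. The gamma value here is a hypothetical example, and in practice OpenCV (`cv2.GaussianBlur`, `cv2.createCLAHE`) would typically supply the remaining two stages:

```python
import numpy as np

def gamma_correct(img, gamma=0.7):
    """Brighten dark regions of an 8-bit X-ray (gamma < 1 lifts shadows)."""
    normalized = img.astype(np.float32) / 255.0
    corrected = np.power(normalized, gamma)
    return (corrected * 255.0).astype(np.uint8)
```

With gamma below 1, mid-dark pixel values are lifted while black and white endpoints stay fixed, which is what exposes cardiac boundary detail in underexposed regions.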


Segmentation Models

U-Net++ — Heart Segmentation

Predicts the heart mask at the pixel level.

  • Architecture: U-Net++ with nested skip connections (detail-preserving)
  • Parameters: ~8 million
  • Performance: 89% IoU score
  • Regularization: BatchNormalization + Dropout (prevents overfitting)
  • Output: Binary mask isolating the heart region

DeepLabV3+ — Lung Segmentation

Produces separate masks for the left and right lungs.

  • Backbone: ResNet101
  • Key Component: ASPP (Atrous Spatial Pyramid Pooling) — captures multi-scale contextual features simultaneously
  • Mechanism: Atrous (dilated) convolution captures wide receptive fields without losing fine boundary detail
  • Output: Separate masks for left lung and right lung

YOLOv11 Branch — Object Detection

Runs in parallel with segmentation and produces an independent CTR estimate.

  • Model: YOLOv11-large
  • Detected Structures: Heart, Left Lung, Right Lung
  • Keypoint Detection: Lower corners of lungs and heart — used to measure horizontal extents
  • Data Augmentation: Mosaic, Mixup, Horizontal/Vertical Flip
  • Training: 50 epochs, batch size 64
  • CTR Calculation: Heart width and lung width are derived from detected bounding box coordinates

CTR Calculation

Segmentation-Based CTR

Canny edge detection is applied to the heart and lung masks. Horizontal extents are measured from the detected boundaries:

CTR = Heart Width / Lung Width
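
As a minimal sketch of this ratio (the pipeline derives the extents from Canny edges; here they are read directly from the binary mask columns, which gives the same horizontal extents for a clean mask):

```python
import numpy as np

def mask_width(mask):
    """Maximal horizontal extent of a binary mask, in pixels."""
    cols = np.nonzero(mask.any(axis=0))[0]
    return int(cols[-1] - cols[0] + 1)

def segmentation_ctr(heart_mask, lung_mask):
    # CTR = maximal cardiac width / maximal thoracic (lung) width
    return mask_width(heart_mask) / mask_width(lung_mask)
```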

Ensemble CTR

The two branches are combined with a weighted average:

Ensemble CTR = (YOLO CTR × 0.95) + (Segmentation CTR × 0.05)

YOLO contributes high overall accuracy; the segmentation branch adds fine anatomical detail as a complementary correction signal.
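
The weighted average above reduces to a one-line fusion function:

```python
def ensemble_ctr(yolo_ctr, seg_ctr, w_yolo=0.95):
    """Weighted fusion of the two CTR estimates (0.95 / 0.05 per the document)."""
    return w_yolo * yolo_ctr + (1.0 - w_yolo) * seg_ctr
```

For example, a YOLO estimate of 0.50 and a segmentation estimate of 0.60 fuse to 0.505, which crosses the clinical CTR ≥ 0.5 threshold.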

Ensemble performance improvement:

Model              MAPE    SMAPE   MAE    RMSE
YOLO only          5.58%   5.32%   0.02   0.04
Segmentation only  6.03%   5.64%   0.03   0.05
Ensemble           5.02%   4.87%   0.02   0.03

Feature Extraction

The 63 features fed into the ANN come from four sources:

1. Ensemble CTR

The weighted CTR value computed above; this is the primary clinical measurement.

2. Heart/Lung Area Ratio

Computed as: Heart Pixel Area / Lung Pixel Area

Complements CTR by providing an area-based measurement rather than a width-based one. Particularly useful in cases of chest deformity or lung disease where width measurements alone can be misleading.
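
A minimal sketch of the area ratio from the two binary segmentation masks:

```python
import numpy as np

def heart_lung_area_ratio(heart_mask, lung_mask):
    """Pixel-count ratio of the segmented heart to the segmented lungs."""
    return np.count_nonzero(heart_mask) / np.count_nonzero(lung_mask)
```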

3. Heart Left–Right Width

Measures how much the heart extends to the left and right of the cardiac midline.

  • The esophageal axis is estimated by predicting its upper and lower endpoints
  • The angle between these points is computed via arctan2 and extended into a full axis line
  • The segmented heart is split along this line
  • Left-side and right-side widths are measured separately

This determines the direction of cardiac enlargement, providing clinically meaningful spatial context beyond a single ratio.
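
The splitting steps above can be sketched as follows. The axis endpoints and the per-row linear interpolation are illustrative assumptions; in the full pipeline the axis angle comes from `np.arctan2` on the predicted endpoints:

```python
import numpy as np

def left_right_widths(heart_mask, top_pt, bottom_pt):
    """Split the heart mask along an estimated axis and measure each side's width.

    top_pt / bottom_pt are (x, y) endpoints of the esophageal-axis estimate.
    """
    (x0, y0), (x1, y1) = top_pt, bottom_pt
    slope = (x1 - x0) / (y1 - y0)          # dx per row; assumes a non-horizontal axis
    ys, xs = np.nonzero(heart_mask)
    axis_x = x0 + (ys - y0) * slope        # axis x-position at each heart pixel's row
    left = np.clip(axis_x - xs, 0, None)   # distances of pixels left of the axis
    right = np.clip(xs - axis_x, 0, None)  # distances of pixels right of the axis
    return float(left.max()), float(right.max())
```

A strongly asymmetric pair of widths indicates the direction of enlargement, e.g. a left-dominant width suggests left-ventricular enlargement.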

4. Radiomics — 55 Features

Extracted from the heart segmentation mask using PyRadiomics.

  • 186 candidate features are initially extracted covering intensity distribution, tissue homogeneity, shape, and texture
  • 55 clinically significant and statistically contributive features are selected for the final model
  • These features allow the model to reason about the statistical character of the image, not just its visual appearance
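
PyRadiomics reads its feature selection from a YAML parameter file. The fragment below is a hypothetical sketch using feature-class names from the PyRadiomics documentation; the project's actual 55-feature selection is not published:

```yaml
# params.yaml — illustrative only; the real 55-feature subset is not disclosed
imageType:
  Original: {}
featureClass:
  firstorder: []   # intensity distribution
  shape2D: []      # heart-mask shape
  glcm: []         # texture / tissue homogeneity
setting:
  label: 1
  force2D: true    # chest X-rays are 2D
```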

CNN — Parallel Image Classifier

A custom Residual Attention CNN runs alongside the feature extraction pipeline and provides an independent classification signal.

  • Input: 256×256 grayscale chest X-ray
  • Parameters: ~7.5 million
  • Loss Function: Binary Crossentropy
  • Optimizer: Adam, 100 epochs
Component                      Purpose
Residual (skip) blocks         Prevent information loss in deeper layers
Attention mechanism            Focuses the model on the cardiac and pulmonary region
Dropout + Batch Normalization  Prevents overfitting
Latent supervision             Trains on both intermediate and final outputs for balanced learning
  • External Test Performance: 80.8% accuracy, 81.8% F1 score

ANN — Final Fusion Model

All features (63-dimensional vector) are fused in a single fully-connected ANN that produces the final prediction.

  • Architecture: 512 → 256 → 128 neurons (fully connected)
  • Input: CNN output + Ensemble CTR + Heart/Lung Area Ratio + Heart Left-Right Width + 55 Radiomics features
  • Regularization: BatchNormalization + Dropout
  • Output: Cardiomegaly present / not present (binary)
  • External Test Performance: 84.0% accuracy, 85.3% F1 score
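
To make the layer shapes concrete, here is a pure-numpy sketch of the forward pass with randomly initialized weights. The actual model is a trained fully-connected network with BatchNormalization and Dropout, which are omitted here for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# 63 input features -> 512 -> 256 -> 128 -> 1 output probability
dims = [63, 512, 256, 128, 1]
weights = [rng.standard_normal((i, o)) * 0.01 for i, o in zip(dims, dims[1:])]

def forward(features):
    x = features
    for W in weights[:-1]:
        x = relu(x @ W)              # dense layer + ReLU
    return sigmoid(x @ weights[-1])  # binary cardiomegaly probability
```

A probability above 0.5 would be read as "cardiomegaly present" under the usual binary decision rule.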

Synthetic Data Augmentation with GANs

To address low diversity in cardiomegaly cases, four GAN architectures were trained and evaluated using FID scores (lower = better):

Model          FID Score
CGAN           62
DCGAN          72
IAGAN          40
StyleGAN2-ADA  25

StyleGAN2-ADA's Adaptive Data Augmentation (ADA) mechanism enables stable training even with limited data, achieving the lowest FID score and the most realistic synthetic X-ray images.


Explainable AI (XAI)

The system not only produces a prediction — it explains why.

  • Grad-CAM: Generates heatmaps showing which image regions the CNN focused on. Confirms the model attends to the cardiac and pulmonary area rather than irrelevant regions.
  • SHAP Analysis: Quantifies each feature's individual contribution to the final prediction. CNN output is the highest contributor, followed by YOLO CTR and area ratio.

Training Data

A total of 80,572 images from five datasets were assembled into a balanced training set (40,286 cardiomegaly / 40,286 non-cardiomegaly).

Dataset           Institution                       Images  Split
MIMIC-CXR         MIT, USA                          51,693  Training
PadChest          Hospital San Juan, Spain          12,600  Training
CheXpert          Stanford University, USA           6,046  Training
VinDr-CXR         VinBigData, Vietnam                4,598  Training
BRAX              Hospital Albert Einstein, Brazil   2,572  Hard Training
NIH ChestX-Ray14  NIH Clinical Center, USA           3,063  External Test

Performance Summary

Model / Component             Metric         Value
U-Net++ (Heart Segmentation)  IoU            89%
YOLOv11 CTR                   MAPE / SMAPE   5.58% / 5.32%
Segmentation CTR              MAPE / SMAPE   6.03% / 5.64%
Ensemble CTR                  MAPE / SMAPE   5.02% / 4.87%
CNN Classifier                F1 / Accuracy  81.8% / 80.8%
ANN (Final)                   F1 / Accuracy  85.3% / 84.0%

Installation & Validation

This is the official technical documentation prepared by the Ethosoft team for Teknofest validators.

Python Version

Python 3.12 must be installed on the validation device. This is required for full library compatibility.

NVIDIA GPU Setup (Optional)

If the validation device has an NVIDIA GPU and GPU acceleration is desired, follow the steps below. This step can be skipped for CPU-only runs.

⚠️ On Windows, GPU acceleration will not work for TensorFlow. It can be used for PyTorch only.
⚠️ Non-NVIDIA GPUs are not supported.

  1. Visit the CUDA Toolkit download page for your device type and follow the installation steps.
  2. After installation, verify by running the following in your terminal:
nvidia-smi

Library Installation

Windows

  1. Activate your target Python environment (skip if installing globally).
  2. Run:
pip install -r winrequirements.txt

Linux

  1. Activate your target Python environment (skip if installing globally).
  2. Run:
pip install -r requirements.txt

PyRadiomics Installation

The cardiomegaly task uses PyRadiomics, which cannot be installed directly via pip and requires a manual build. Git must be installed on your system.

  1. Open a terminal in the documentation folder.
  2. Run:
git clone https://github.com/AIM-Harvard/pyradiomics.git
pip install -e pyradiomics/[dev,docs,test]

Running Cardiomegaly / CTR Validation

From the documentation folder, with your Python environment active, run:

python kardiyomegaliteknefes.py --base_path [IMAGE_PATH] --output_json [OUTPUT_JSON_PATH]

Replace the bracketed values with your own paths:

Argument       Description
--base_path    Path to the folder containing the chest X-ray images
--output_json  Path where the output JSON file will be saved