# SCFDepth: A Single-Step Coarse-to-Fine Diffusion Framework for Monocular Depth Estimation

This repository is based on [Marigold](https://marigoldmonodepth.github.io), CVPR 2024 Best Paper: [**Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation**](https://arxiv.org/abs/2312.02145)

[Website](https://haruko386.github.io/research)
[License](https://www.apache.org/licenses/LICENSE-2.0)
[Hugging Face Model](https://huggingface.co/developy/ApDepth)
[Hugging Face Demo](https://huggingface.co/spaces/developy/ApDepth)

[**Haruko386**](https://haruko386.github.io/),
[Shuai Yuan](https://syjz.teacher.360eol.com/teacherBasic/preview?teacherId=23776),
[Mingbo Lei](https://github.com/Ltohka),
[Yibo Chen](#)

> We present **SCFDepth**, a diffusion model and associated fine-tuning protocol for monocular depth estimation, based on Marigold. Its core innovation addresses the limited feature-representation capability of diffusion models. Following Marigold, our model is derived from Stable Diffusion and fine-tuned on synthetic data (Hypersim and Virtual KITTI 2), achieving strong results in object-edge refinement.
## 📢 News

- 2025-10-25: Inspired by DepthMaster, we propose a two-stage loss-function training strategy based on `ApDepth V1-0`: in the first stage we perform foundational training with an MSE loss, and in the second stage we learn edge structures through an FFT loss. Based on this, we introduce ApDepth V1-1.
- 2025-10-09: We propose a novel diffusion-based depth estimation framework guided by pre-trained models.
- 2025-09-23: We changed Marigold from **stochastic multi-step generation** to **deterministic one-step perception**.
- 2025-08-10: Working on optimizations in feature expression.<br>
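The FFT loss mentioned above is not spelled out in this README. As a rough illustration of the idea only (an assumption, not the actual ApDepth loss), one can penalize differences between the frequency-domain amplitude spectra of predicted and ground-truth depth maps, which weights edge structure more heavily than a plain pixel-wise MSE:

```python
import numpy as np

def fft_amplitude_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """MSE between the FFT amplitude spectra of two depth maps.

    High-frequency components correspond to edges, so matching amplitude
    spectra emphasizes edge structure. Illustrative sketch only.
    """
    pred_amp = np.abs(np.fft.fft2(pred))
    target_amp = np.abs(np.fft.fft2(target))
    return float(np.mean((pred_amp - target_amp) ** 2))

# Identical inputs give exactly zero loss; any deviation is penalized.
depth = np.random.default_rng(0).random((64, 64))
print(fft_amplitude_loss(depth, depth))  # 0.0
```

The actual two-stage schedule (MSE first, FFT second) would apply a loss like this only in the second stage.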
## 🚀 Usage

**We offer several ways to interact with SCFDepth**:

1. A free online interactive demo is available here: <a href="https://huggingface.co/spaces/developy/ApDepth"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Demo-purple" height="18"></a>

3. Local development instructions with this codebase are given below.
## 🛠️ Setup

The model was trained on:

- Ubuntu 22.04 LTS, Python 3.12.9, CUDA 11.8, `NVIDIA RTX 6000 Ada Generation`

The inference code was tested on:

- Ubuntu 22.04 LTS, Python 3.12.9, CUDA 11.8, `NVIDIA GeForce RTX 4090`

### 🪧 A Note for Windows users

We recommend running the code in WSL2.
Clone the repository (requires git):

```bash
git clone https://github.com/Dimon0000000/SCFDepth.git
cd SCFDepth
```

### 💻 Dependencies
**Using Conda:**
Create a conda environment and install dependencies into it:

```bash
conda create -n SCFDepth python=3.12.9
conda activate SCFDepth
pip install -r requirements.txt
```

> [!NOTE]
>
> Keep the environment activated before running the inference script.
> Activate the environment again after restarting the terminal session.
## 🏃 Testing on your images

```bash
python run.py \
    --checkpoint prs-eth/marigold-v1-0 \
    --ensemble_size 1 \
    --input_rgb_dir input/in-the-wild_example \
    --output_dir output/in-the-wild_example
```
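If, like upstream Marigold, the script also saves raw predictions as NumPy arrays (an assumption; check your output directory), they can be loaded and normalized for further analysis. The in-memory buffer below merely stands in for a real `.npy` file to keep the sketch self-contained:

```python
import io
import numpy as np

# Fabricated prediction standing in for a file written by run.py.
buffer = io.BytesIO()
depth = np.random.default_rng(0).random((480, 640)).astype(np.float32)
np.save(buffer, depth)
buffer.seek(0)

pred = np.load(buffer)
# Normalize to [0, 1] for visualization or metric computation.
pred_norm = (pred - pred.min()) / (pred.max() - pred.min())
print(pred_norm.shape, float(pred_norm.min()), float(pred_norm.max()))
```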
You can find all results in `output/in-the-wild_example`. Enjoy!

The default settings are optimized for the best result. However, the behavior of the code can be customized:

- Trade-off between **accuracy** and **speed** (a larger value results in better accuracy at the cost of slower inference):
  - `--ensemble_size`: Number of inference passes in the ensemble.

- By default, the inference script resizes input images to the *processing resolution*, and then resizes the prediction back to the original resolution. This gives the best quality, as Stable Diffusion, from which SCFDepth is derived, performs best at 768x768 resolution.
  - `--processing_res`: the processing resolution; set as 0 to process the input resolution directly. When unassigned (`None`), will read default setting from model config. Default: ~~768~~ `None`.
  - `--output_processing_res`: produce output at the processing resolution instead of upsampling it to the input resolution. Default: False.
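Why a larger `--ensemble_size` improves accuracy can be sketched with synthetic data (an illustration only; the pipeline's actual aggregation may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
truth = np.linspace(0.0, 1.0, 1000)  # synthetic ground-truth depth values

def ensemble_error(n_passes: int) -> float:
    """Mean absolute error after median-merging n noisy predictions."""
    preds = truth + rng.normal(0.0, 0.1, size=(n_passes, truth.size))
    merged = np.median(preds, axis=0)
    return float(np.mean(np.abs(merged - truth)))

# More passes -> lower error, at the cost of proportionally more inference time.
for n in (1, 5, 10):
    print(n, round(ensemble_error(n), 4))
```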
### ⬇ Checkpoint cache

By default, the [checkpoint](https://huggingface.co/developy/ApDepth) is stored in the Hugging Face cache.
The `HF_HOME` environment variable defines its location and can be overridden, e.g.:

```bash
export HF_HOME=$(pwd)/cache
```

<!--
Alternatively, use the following script to download the checkpoint weights locally:

```bash
bash script/download_weights.sh marigold-v1-0
```
-->
At inference, specify the checkpoint path:

```bash
python run.py \
    --checkpoint checkpoints/SCFDepth \
    --ensemble_size 1 \
    --input_rgb_dir input/in-the-wild_example \
    --output_dir output/in-the-wild_example
```
## 🦿 Evaluation

```bash
bash script/eval/11_infer_nyu.sh
bash script/eval/12_eval_nyu.sh
```

Alternatively, use the following script to evaluate all datasets:

```bash
bash script/eval/00_test_all.sh
```

You can find the results under `output/eval`.

> [!IMPORTANT]
>
> Although the seed has been set, the results might still be slightly different on different hardware.
**Evaluating results**

Only the U-Net is updated and saved during training. To use the inference pipeline with your training result, replace the `unet` folder in the `train_SCFDepth` checkpoint with the one in the `checkpoint` output folder. Then refer to [this section](#evaluation) for evaluation.

> [!IMPORTANT]
>
> Although random seeds have been set, the training result might be slightly different on different hardware. It's recommended to train without interruption.
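The U-Net swap described above might look like the following (a sketch: the exact checkpoint directory layout and file names are assumptions, and mock directories are created here only to make the example self-contained):

```shell
# Mock the assumed layouts (real runs create these directories).
mkdir -p output/train_SCFDepth/checkpoint/latest/unet
mkdir -p checkpoints/SCFDepth
touch output/train_SCFDepth/checkpoint/latest/unet/diffusion_pytorch_model.safetensors

# Swap the newly trained U-Net into the inference checkpoint.
rm -rf checkpoints/SCFDepth/unet
cp -r output/train_SCFDepth/checkpoint/latest/unet checkpoints/SCFDepth/unet
```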
## ✏️ Contributing

Please refer to [this](CONTRIBUTING.md) instruction.
## 🎓 Citation

Please cite our paper:

```bibtex
@InProceedings{haruko26scfdepth,
    title={SCFDepth: A Single-Step Coarse-to-Fine Diffusion Framework for Monocular Depth Estimation},
    author={Haruko386 and Yuan Shuai},
    booktitle={Under review},
    year={2026}
}
```

## 🎫 License