developy committed · verified · commit ac05de6 · 1 parent: f9bd0da

Update README.md

Files changed (1): README.md (+51 −63)
README.md CHANGED
@@ -1,21 +1,23 @@
- # ApDepth: Aiming for Precise Monocular Depth Estimation Based on Diffusion Models

  This repository is based on [Marigold](https://marigoldmonodepth.github.io), CVPR 2024 Best Paper: [**Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation**](https://arxiv.org/abs/2312.02145)

  [![Website](doc/badges/badge-website.svg)](https://haruko386.github.io/research)
  [![License](https://img.shields.io/badge/License-Apache--2.0-929292)](https://www.apache.org/licenses/LICENSE-2.0)
- [![Static Badge](https://img.shields.io/badge/build-Haruko386-brightgreen?style=flat&logo=steam&logoColor=white&logoSize=auto&label=steam&labelColor=black&color=gray&cacheSeconds=3600)](https://steamcommunity.com/profiles/76561198217881431/)
  [![Hugging Face Model](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-green)](https://huggingface.co/developy/ApDepth)
  [![Hugging Face Demo](https://img.shields.io/badge/🤗%20Hugging%20Face-Demo-purple)](https://huggingface.co/spaces/developy/ApDepth)

- [Haruko386](https://haruko386.github.io/),
- [Shuai Yuan](https://syjz.teacher.360eol.com/teacherBasic/preview?teacherId=23776)

  ![cover](doc/cover.jpg)

- >We present **ApDepth**, a diffusion model, and associated fine-tuning protocol for monocular depth estimation. Based on Marigold. Its core innovation lies in addressing the deficiency of diffusion models in feature representation capability. Our model followed Marigold, derived from Stable Diffusion and fine-tuned with synthetic data: Hypersim and VKitti, achieved ideal results in object edge refinement.

  ## 📢 News

  - 2025-10-09: We propose a novel diffusion-based deep estimation framework guided by pre-trained models.
  - 2025-09-23: We change Marigold from **Stochastic multi-step generation** to **Deterministic one-step perception**
  - 2025-08-10: Trying to make some optimizations in Feature Expression<br>
@@ -23,7 +25,7 @@ This repository is based on [Marigold](https://marigoldmonodepth.github.io), CVP

  ## 🚀 Usage

- **We offer several ways to interact with Marigold**:

  1. A free online interactive demo is available here: <a href="https://huggingface.co/spaces/developy/ApDepth"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Demo-purple" height="18"></a>

@@ -32,10 +34,13 @@ This repository is based on [Marigold](https://marigoldmonodepth.github.io), CVP
  3. Local development instructions with this codebase are given below.

  ## 🛠️ Setup

  The inference code was tested on:

- - Ubuntu 22.04 LTS, Python 3.12.9, CUDA 11.8, GeForce RTX 4090 & GeForce RTX 5080 (pip)

  ### 🪧 A Note for Windows users

@@ -50,8 +55,8 @@ We recommend running the code in WSL2:
  Clone the repository (requires git):

  ```bash
- git clone https://github.com/Haruko386/ApDepth.git
- cd ApDepth
  ```

  ### 💻 Dependencies
@@ -59,12 +64,18 @@ cd ApDepth
  **Using Conda:**
  Alternatively, create a Python native virtual environment and install dependencies into it:

- conda create -n apdepth python==3.12.9
- conda activate apdepth
- pip install -r requirements.txt

- Keep the environment activated before running the inference script.
- Activate the environment again after restarting the terminal session.

  ## 🏃 Testing on your images

@@ -81,8 +92,7 @@ This setting corresponds to our paper. For academic comparison, please run with
  ```bash
  python run.py \
  --checkpoint prs-eth/marigold-v1-0 \
- --denoise_steps 50 \
- --ensemble_size 10 \
  --input_rgb_dir input/in-the-wild_example \
  --output_dir output/in-the-wild_example
  ```
@@ -94,10 +104,9 @@ You can find all results in `output/in-the-wild_example`. Enjoy!
  The default settings are optimized for the best result. However, the behavior of the code can be customized:

  - Trade-offs between the **accuracy** and **speed** (for both options, larger values result in better accuracy at the cost of slower inference.)
- - `--ensemble_size`: Number of inference passes in the ensemble. For LCM `ensemble_size` is more important than `denoise_steps`. Default: ~~10~~ 5 (for LCM).
- - `--denoise_steps`: Number of denoising steps of each inference pass. For the original (DDIM) version, it's recommended to use 10-50 steps, while for LCM 1-4 steps. When unassigned (`None`), will read default setting from model config. Default: ~~10 4 (for LCM)~~ `None`.

- - By default, the inference script resizes input images to the *processing resolution*, and then resizes the prediction back to the original resolution. This gives the best quality, as Stable Diffusion, from which Marigold is derived, performs best at 768x768 resolution.

  - `--processing_res`: the processing resolution; set as 0 to process the input resolution directly. When unassigned (`None`), will read default setting from model config. Default: ~~768~~ `None`.
  - `--output_processing_res`: produce output at the processing resolution instead of upsampling it to the input resolution. Default: False.
@@ -111,28 +120,25 @@ The default settings are optimized for the best result. However, the behavior of

  ### ⬇ Checkpoint cache

- By default, the [checkpoint](https://huggingface.co/prs-eth/marigold-v1-0) is stored in the Hugging Face cache.
  The `HF_HOME` environment variable defines its location and can be overridden, e.g.:

  ```bash
  export HF_HOME=$(pwd)/cache
  ```
-
  Alternatively, use the following script to download the checkpoint weights locally:

  ```bash
  bash script/download_weights.sh marigold-v1-0
- # or LCM checkpoint
- bash script/download_weights.sh marigold-lcm-v1-0
- ```

  At inference, specify the checkpoint path:

  ```bash
  python run.py \
- --checkpoint checkpoint/marigold-v1-0 \
- --denoise_steps 50 \
- --ensemble_size 10 \
  --input_rgb_dir input/in-the-wild_example \
  --output_dir output/in-the-wild_example
  ```
@@ -163,44 +169,27 @@ bash script/eval/11_infer_nyu.sh
  bash script/eval/12_eval_nyu.sh
  ```

- Note: although the seed has been set, the results might still be slightly different on different hardware.
-
- ## 🏋️ Training
-
- Based on the previously created environment, install extended requirements:
-
- ```bash
- pip install -r requirements++.txt -r requirements+.txt -r requirements.txt
- ```
-
- Set environment parameters for the data directory:

  ```bash
- export BASE_DATA_DIR=YOUR_DATA_DIR # directory of training data
- export BASE_CKPT_DIR=YOUR_CHECKPOINT_DIR # directory of pretrained checkpoint
  ```

- Download Stable Diffusion v2 [checkpoint](https://huggingface.co/stabilityai/stable-diffusion-2) into `${BASE_CKPT_DIR}`

- Prepare for [Hypersim](https://github.com/apple/ml-hypersim) and [Virtual KITTI 2](https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds-vkitti-2/) datasets and save into `${BASE_DATA_DIR}`. Please refer to [this README](script/dataset_preprocess/hypersim/README.md) for Hypersim preprocessing.

- Run training script

- ```bash
- python train.py --config config/train_marigold.yaml --no_wandb
- ```

- Resume from a checkpoint, e.g.

- ```bash
- python train.py --resume_run output/train_marigold/checkpoint/latest --no_wandb
- ```
-
- Evaluating results
-
- Only the U-Net is updated and saved during training. To use the inference pipeline with your training result, replace `unet` folder in Marigold checkpoints with that in the `checkpoint` output folder. Then refer to [this section](#evaluation) for evaluation.

- **Note**: Although random seeds have been set, the training result might be slightly different on different hardwares. It's recommended to train without interruption.

  ## ✏️ Contributing

@@ -215,17 +204,16 @@ Please refer to [this](CONTRIBUTING.md) instruction.

  ## 🎓 Citation
- Waitting for publishing⏱️
- <!-- Please cite our paper:

  ```bibtex
- @InProceedings{ke2023repurposing,
- title={Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation},
- author={Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
- booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
- year={2024}
  }
- ``` -->

  ## 🎫 License
 
+ # SCFDepth: A Single-Step Coarse-to-Fine Diffusion Framework for Monocular Depth Estimation

  This repository is based on [Marigold](https://marigoldmonodepth.github.io), CVPR 2024 Best Paper: [**Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation**](https://arxiv.org/abs/2312.02145)

  [![Website](doc/badges/badge-website.svg)](https://haruko386.github.io/research)
  [![License](https://img.shields.io/badge/License-Apache--2.0-929292)](https://www.apache.org/licenses/LICENSE-2.0)
  [![Hugging Face Model](https://img.shields.io/badge/🤗%20Hugging%20Face-Model-green)](https://huggingface.co/developy/ApDepth)
  [![Hugging Face Demo](https://img.shields.io/badge/🤗%20Hugging%20Face-Demo-purple)](https://huggingface.co/spaces/developy/ApDepth)

+ [**Haruko386**](https://haruko386.github.io/),
+ [Shuai Yuan](https://syjz.teacher.360eol.com/teacherBasic/preview?teacherId=23776),
+ [Mingbo Lei](https://github.com/Ltohka),
+ [Yibo Chen](#)

  ![cover](doc/cover.jpg)

+ > We present **SCFDepth**, a diffusion model and associated fine-tuning protocol for monocular depth estimation, built on Marigold. Its core contribution is addressing the limited feature representation capability of diffusion models. Like Marigold, our model is derived from Stable Diffusion and fine-tuned on the synthetic Hypersim and Virtual KITTI 2 datasets, achieving strong results in object edge refinement.

  ## 📢 News
+ - 2025-10-25: Inspired by DepthMaster, we propose a two-stage loss-training strategy based on `ApDepth V1-0`: the first stage performs foundational training with an MSE loss, and the second stage learns edge structures through an FFT loss. On this basis, we introduce ApDepth V1-1.
  - 2025-10-09: We propose a novel diffusion-based depth estimation framework guided by pre-trained models.
  - 2025-09-23: We changed Marigold from **stochastic multi-step generation** to **deterministic one-step perception**.
  - 2025-08-10: Started optimizing the feature representation<br>
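The two-stage objective in the 2025-10-25 news entry can be sketched as follows. This is an illustrative reconstruction, not the repository's actual loss code: the function names (`mse_loss`, `fft_loss`, `two_stage_loss`) and the `0.1` weight are hypothetical.

```python
import numpy as np

def mse_loss(pred: np.ndarray, target: np.ndarray) -> float:
    # Stage 1: foundational pixel-space objective.
    return float(np.mean((pred - target) ** 2))

def fft_loss(pred: np.ndarray, target: np.ndarray) -> float:
    # Stage 2: compare amplitude spectra; high-frequency
    # components carry the edge structure of the depth map.
    pred_amp = np.abs(np.fft.fft2(pred))
    target_amp = np.abs(np.fft.fft2(target))
    return float(np.mean(np.abs(pred_amp - target_amp)))

def two_stage_loss(pred: np.ndarray, target: np.ndarray, stage: int) -> float:
    # Stage 1 uses MSE alone; stage 2 adds the frequency-domain term.
    if stage == 1:
        return mse_loss(pred, target)
    return mse_loss(pred, target) + 0.1 * fft_loss(pred, target)  # hypothetical weight
```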
 

  ## 🚀 Usage

+ **We offer several ways to interact with SCFDepth**:

  1. A free online interactive demo is available here: <a href="https://huggingface.co/spaces/developy/ApDepth"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-Demo-purple" height="18"></a>

  3. Local development instructions with this codebase are given below.

  ## 🛠️ Setup
+ The model was trained on:
+
+ - Ubuntu 22.04 LTS, Python 3.12.9, CUDA 11.8, `NVIDIA RTX 6000 Ada Generation`

  The inference code was tested on:

+ - Ubuntu 22.04 LTS, Python 3.12.9, CUDA 11.8, `NVIDIA GeForce RTX 4090`

  ### 🪧 A Note for Windows users
 
 
  Clone the repository (requires git):

  ```bash
+ git clone https://github.com/Dimon0000000/SCFDepth.git
+ cd SCFDepth
  ```

  ### 💻 Dependencies

  **Using Conda:**
  Create a Python virtual environment and install dependencies into it:

+ ```bash
+ conda create -n SCFDepth python==3.12.9
+ conda activate SCFDepth
+ pip install -r requirements.txt
+ ```
+
+ > [!NOTE]
+ >
+ > Keep the environment activated before running the inference script.
+ > Activate the environment again after restarting the terminal session.
 

  ## 🏃 Testing on your images

  ```bash
  python run.py \
  --checkpoint prs-eth/marigold-v1-0 \
+ --ensemble_size 1 \
  --input_rgb_dir input/in-the-wild_example \
  --output_dir output/in-the-wild_example
  ```

  The default settings are optimized for the best result. However, the behavior of the code can be customized:

  - Trade-off between **accuracy** and **speed**: a larger value results in better accuracy at the cost of slower inference.
+ - `--ensemble_size`: Number of inference passes in the ensemble.

+ - By default, the inference script resizes input images to the *processing resolution*, and then resizes the prediction back to the original resolution. This gives the best quality, as Stable Diffusion, from which SCFDepth is derived, performs best at 768x768 resolution.

  - `--processing_res`: the processing resolution; set as 0 to process the input resolution directly. When unassigned (`None`), will read default setting from model config. Default: ~~768~~ `None`.
  - `--output_processing_res`: produce output at the processing resolution instead of upsampling it to the input resolution. Default: False.
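For intuition about `--ensemble_size`: Marigold-style ensembling aggregates several affine-invariant predictions after aligning each one's scale and shift. A minimal numpy sketch, with hypothetical helper names rather than the repository's implementation:

```python
import numpy as np

def align_scale_shift(pred: np.ndarray, ref: np.ndarray) -> np.ndarray:
    # Least-squares fit of s * pred + t to the reference prediction.
    s, t = np.polyfit(pred.ravel(), ref.ravel(), deg=1)
    return s * pred + t

def ensemble_depth(preds: list) -> np.ndarray:
    # Align every member to the first one, then take the pixel-wise median.
    aligned = [preds[0]] + [align_scale_shift(p, preds[0]) for p in preds[1:]]
    return np.median(np.stack(aligned), axis=0)
```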
 

  ### ⬇ Checkpoint cache

+ By default, the [checkpoint](https://huggingface.co/developy/ApDepth) is stored in the Hugging Face cache.
  The `HF_HOME` environment variable defines its location and can be overridden, e.g.:

  ```bash
  export HF_HOME=$(pwd)/cache
  ```
+ <!--
  Alternatively, use the following script to download the checkpoint weights locally:

  ```bash
  bash script/download_weights.sh marigold-v1-0
+ ``` -->

  At inference, specify the checkpoint path:

  ```bash
  python run.py \
+ --checkpoint checkpoints/SCFDepth \
+ --ensemble_size 1 \
  --input_rgb_dir input/in-the-wild_example \
  --output_dir output/in-the-wild_example
  ```
 
  bash script/eval/12_eval_nyu.sh
  ```

+ Alternatively, use the following script to evaluate all datasets:

  ```bash
+ bash script/eval/00_test_all.sh
  ```
+
+ You can find the results under `output/eval`.
+
+ > [!IMPORTANT]
+ >
+ > Although the seed has been set, the results might still be slightly different on different hardware.
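For reference, evaluation scripts of this kind typically report affine-invariant depth metrics such as AbsRel and δ1. A hedged numpy sketch of those two metrics (illustrative only, not the repository's evaluation code):

```python
import numpy as np

def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    # Mean absolute relative error over valid ground-truth pixels.
    mask = gt > 0
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))

def delta1(pred: np.ndarray, gt: np.ndarray) -> float:
    # Fraction of valid pixels with max(pred/gt, gt/pred) < 1.25.
    mask = gt > 0
    ratio = np.maximum(pred[mask] / gt[mask], gt[mask] / pred[mask])
    return float(np.mean(ratio < 1.25))
```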

+ **Evaluating results**

+ Only the U-Net is updated and saved during training. To use the inference pipeline with your training result, replace the `unet` folder in the inference checkpoint with the one from the `checkpoint` folder of the `train_SCFDepth` output. Then refer to [this section](#evaluation) for evaluation.
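As a concrete illustration, swapping in the trained U-Net might look like the snippet below. Both paths are assumptions based on the default output layout, so adjust them to your setup:

```python
import shutil
from pathlib import Path

# Hypothetical paths: trained U-Net from the training output,
# copied into the checkpoint folder used for inference.
src = Path("output/train_SCFDepth/checkpoint/latest/unet")
dst = Path("checkpoints/SCFDepth/unet")

if src.is_dir():
    # Replace the inference checkpoint's U-Net with the freshly trained one.
    shutil.rmtree(dst, ignore_errors=True)
    shutil.copytree(src, dst)
```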
 
 

+ > [!IMPORTANT]
+ >
+ > Although random seeds have been set, the training result might still be slightly different on different hardware. It's recommended to train without interruption.
 
 
 
 
 
 
 
 
  ## ✏️ Contributing

  ## 🎓 Citation
+ Please cite our paper:

  ```bibtex
+ @InProceedings{haruko26scfdepth,
+ title={SCFDepth: A Single-Step Coarse-to-Fine Diffusion Framework for Monocular Depth Estimation},
+ author={Haruko386 and Yuan Shuai},
+ booktitle = {Under review},
+ year={2026}
  }
+ ```

  ## 🎫 License