AniAggarwal committed (verified)
Commit a084a7a · 0 Parent(s)

Initial commit.
.gitattributes ADDED
@@ -0,0 +1,37 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Gigi_1_512.png filter=lfs diff=lfs merge=lfs -text
+ Gigi_1_512.png_uplift_dinov3-splus16-4-PCA.png filter=lfs diff=lfs merge=lfs -text
Gigi_1_512.png ADDED

Git LFS Details

  • SHA256: 083e6064bed642ba81046ea0e1861447bcf92ebf8f57e60f0505453a65a40d35
  • Pointer size: 131 Bytes
  • Size of remote file: 453 kB
Gigi_1_512.png_uplift_dinov3-splus16-4-PCA.png ADDED

Git LFS Details

  • SHA256: b35c9eb0aee475642c218620856e80db59d4f9ef58d201f8af20d52f58a5d861
  • Pointer size: 131 Bytes
  • Size of remote file: 166 kB
Gigi_1_512.png_uplift_dinov3-splus16-base-feature-PCA.png ADDED
README.md ADDED
@@ -0,0 +1,129 @@
+ ---
+ license: mit
+ library_name: pytorch
+ tags:
+ - feature-upsampling
+ - pixel-dense-features
+ - computer-vision
+ - dinov3
+ - vision-transformer
+ - uplift
+ datasets:
+ - ILSVRC/imagenet-1k
+ ---
+
+ # UPLiFT for DINOv3-S+/16
+
+ | Input Image | Base DINOv3 Features | UPLiFT Upsampled Features |
+ |:-----------:|:--------------------:|:-------------------------:|
+ | ![Input](Gigi_1_512.png) | ![Base Features](Gigi_1_512.png_uplift_dinov3-splus16-base-feature-PCA.png) | ![UPLiFT Features](Gigi_1_512.png_uplift_dinov3-splus16-4-PCA.png) |
+
+ This is the official pretrained **UPLiFT** (Efficient Pixel-Dense Feature Upsampling with Local Attenders) model for the **DINOv3-S+/16** backbone.
+
+ UPLiFT is a lightweight method for upscaling features from pretrained vision backbones into pixel-dense feature maps. It uses Local Attenders to efficiently upsample low-resolution backbone features while preserving semantic information.
+
+ ## Model Details
+
+ | Property | Value |
+ |----------|-------|
+ | **Backbone** | DINOv3-S+/16 (`vit_small_plus_patch16_dinov3.lvd1689m`) |
+ | **Backbone Channels** | 384 |
+ | **Patch Size** | 16 |
+ | **Upsampling Factor** | 2x per iteration |
+ | **Local Attender Size** | N=17 |
+ | **Training Dataset** | ImageNet |
+ | **Training Image Size** | 448x448 |
+ | **License** | MIT |
+
+ ## Links
+
+ - **Paper**: [Coming Soon]
+ - **GitHub**: [https://github.com/mwalmer-umd/UPLiFT](https://github.com/mwalmer-umd/UPLiFT)
+ - **Project Website**: [https://www.cs.umd.edu/~mwalmer/uplift/](https://www.cs.umd.edu/~mwalmer/uplift/)
+
+ ## Installation
+
+ ```bash
+ pip install 'uplift[vit] @ git+https://github.com/mwalmer-umd/UPLiFT.git'
+ ```
+
+ ## Quick Start
+
+ ```python
+ import torch
+ from PIL import Image
+
+ # Load model (weights auto-download from HuggingFace)
+ model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov3_splus16')
+
+ # Run inference
+ image = Image.open('your_image.jpg')
+ features = model(image)  # Returns pixel-dense features
+ ```
+
+ ## Usage Options
+
+ ### Adjust Upsampling Iterations
+
+ Control the number of iterative upsampling steps (default: 4). Each iteration upsamples by 2x, so the default of 4 iterations brings 16x16-patch features to full pixel resolution (2^4 = 16):
+
+ ```python
+ # Fewer iterations = lower output resolution and lower memory usage
+ model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov3_splus16', iters=2)
+ ```
+
+ ### Raw UPLiFT Model (Without Backbone)
+
+ Load only the UPLiFT upsampling module, without the DINOv3 backbone:
+
+ ```python
+ model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov3_splus16',
+                        include_extractor=False)
+ ```
+
+ ### Return Base Features
+
+ Get both the upsampled and the original backbone features:
+
+ ```python
+ model = torch.hub.load('mwalmer-umd/UPLiFT', 'uplift_dinov3_splus16',
+                        return_base_feat=True)
+ upsampled_features, base_features = model(image)
+ ```
+
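+ ### Visualize Features with PCA
+
+ The feature images at the top of this card are PCA projections of the feature channels. Below is a minimal sketch of that style of visualization; it assumes the model returns a `(1, C, H, W)` tensor (the exact output format is an assumption, not guaranteed by this card) and uses scikit-learn:
+
+ ```python
+ import numpy as np
+ import torch
+ from PIL import Image
+ from sklearn.decomposition import PCA
+
+ # `model` and `image` as in the Quick Start above.
+ with torch.no_grad():
+     features = model(image)  # assumed shape: (1, C, H, W)
+
+ _, C, H, W = features.shape
+ flat = features.squeeze(0).permute(1, 2, 0).reshape(-1, C).cpu().numpy()  # (H*W, C)
+
+ # Project the C channels onto 3 principal components, then normalize to [0, 1] as RGB.
+ rgb = PCA(n_components=3).fit_transform(flat)
+ rgb = (rgb - rgb.min(axis=0)) / (np.ptp(rgb, axis=0) + 1e-8)
+
+ Image.fromarray((rgb.reshape(H, W, 3) * 255).astype(np.uint8)).save('feature_pca.png')
+ ```
+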
+ ## Architecture
+
+ UPLiFT consists of three components:
+
+ 1. **Encoder**: Processes the input image with a series of convolutional blocks, producing dense representations that guide feature upsampling
+ 2. **Decoder**: Upsamples features using transposed convolutions with bilinear residual connections
+ 3. **Local Attender**: A local-neighborhood attention pooling module that maintains semantic consistency with the original features
+
+ The model uses encoder sharing: a single encoder pass is reused across all upsampling iterations for efficiency.
+
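+ This flow can be summarized in a conceptual sketch. Note that `encoder`, `decoder`, and `attender` below are hypothetical stand-ins, not the names or signatures used in the actual UPLiFT code:
+
+ ```python
+ # Conceptual sketch only -- the callables are hypothetical stand-ins
+ # for the real UPLiFT modules.
+ def uplift_forward(image, base_feat, encoder, decoder, attender, iters=4):
+     guide = encoder(image)         # single shared encoder pass (encoder sharing)
+     feat = base_feat               # low-resolution backbone features
+     for _ in range(iters):         # each iteration upsamples 2x
+         up = decoder(feat, guide)  # transposed convs + bilinear residual
+         feat = attender(up, feat)  # Local Attender preserves semantic consistency
+     return feat                    # pixel-dense features after `iters` doublings
+ ```
+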
+ ## Intended Use
+
+ This model is designed for:
+
+ - Creating pixel-dense feature maps from DINOv3 features
+ - Dense prediction tasks such as semantic segmentation and depth estimation (see the sketch below)
+ - Feature visualization and analysis
+ - Research on vision foundation models
+
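+ As an illustration of the dense-prediction use case, here is a minimal linear-probe sketch. The `(1, 384, H, W)` output shape (384 being the backbone channel count from the table above) and the 1x1-conv head are assumptions for illustration, and the probe must be trained before its predictions are meaningful:
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ # Hypothetical linear probe: a 1x1 conv mapping 384-dim dense features
+ # to per-pixel class logits.
+ NUM_CLASSES = 21  # e.g., PASCAL VOC; choose to match your task
+ probe = nn.Conv2d(384, NUM_CLASSES, kernel_size=1)
+
+ # `model` and `image` as in the Quick Start; assumed output (1, 384, H, W).
+ with torch.no_grad():
+     dense = model(image)
+ logits = probe(dense)        # (1, NUM_CLASSES, H, W)
+ pred = logits.argmax(dim=1)  # (1, H, W) per-pixel class indices
+ ```
+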
+ ## Limitations
+
+ - Optimized specifically for DINOv3-S+/16 features; may not generalize to other backbones without retraining
+ - Performance depends on the quality of the underlying DINOv3 features
+ - Higher iteration counts increase computation time
+
+ ## Citation
+
+ If you use UPLiFT in your research, please cite our paper.
+
+ [citation coming soon]
+
+ ## Acknowledgements
+
+ This work builds upon:
+ - [DINOv3](https://github.com/facebookresearch/dinov3) by Meta AI
+ - [timm](https://github.com/huggingface/pytorch-image-models) for model loading
uplift_dinov3-splus16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4b226a20c6d8b1ff5274d67eda4304c575ee0ac729093aec7bfec4eaa110be94
+ size 3170760