Fuse N views into a global state,
then decode a single coherent surface from it.
Variable views → one latent.
Fuse N views into a fixed state of K=128 tokens.
Decode at any resolution.
Decode as many points as you want from one global state.
Independent flow, coherent surface.
While points flow independently, our guidance couples them.
State-of-the-art performance.
Evaluated on 8 benchmarks, from 2 to 32 views.
New surface dataset for real-world scenes.
~10.5K DL3DV scenes, full scene meshes.
01
Overview
Pipeline:
N input views (N = 16)
→ Encoder Eφ: VGGT backbone + Perceiver compressor
→ Global state z: K = 128 tokens, fixed for any N
→ Decoder vθ: per-point flow-matching ODE
→ M oriented points: M up to 10⁶
Global state. Surflo turns a variable number of unposed images into a single global latent rather than a stack of per-view tokens.
Arbitrary resolution. Each surface point is then decoded independently by a flow-matching ODE conditioned on that latent, so any number of points can be sampled from a single encoder pass.
Coupling via guidance. Finally, we introduce a communication mechanism based on a shared rendering loss. At each step of the ODE integration, the points are converted into 3D Gaussians and rendered with Gaussian Splatting. This rendering guidance limits disagreement between nearby query points.
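In symbols, the three ideas above could be written roughly as follows. This is a notational sketch, not the paper's exact formulation: the token width d, the guidance weight λ, the time convention t ∈ [0, 1], the noise initialisation x_i(0), and the sign of the guidance term are all assumptions. Only the encoder, the decoder, z, K, N, and M are named on this page.

```latex
% Encoder: N unposed images -> fixed-size global state (d is an assumed token width)
z = E_\phi(I_1, \dots, I_N) \in \mathbb{R}^{K \times d}, \qquad K = 128

% Decoder: each of the M points follows its own ODE, conditioned only on z
\frac{\mathrm{d} x_i(t)}{\mathrm{d} t} = v_\theta\big(x_i(t),\, t \mid z\big),
\qquad x_i(0) \sim \mathcal{N}(0, I), \qquad i = 1, \dots, M

% Guidance: a shared rendering loss over all points corrects each velocity
\tilde{v}_i = v_\theta\big(x_i, t \mid z\big)
  - \lambda \, \nabla_{x_i} \mathcal{L}_{\mathrm{render}}(x_1, \dots, x_M)
```

Without the last term the M trajectories never interact; with it, the loss gradient is the only place where one point depends on the others.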
02
One global state, one coherent surface
Modern feed-forward 3D models such as VGGT, DUSt3R, and DepthAnything-3 produce one pointmap per view. The representation therefore grows linearly with the number of views, which leads to both noise and redundancy.
Surflo encodes the entire image set into a global state z: one fixed-size representation, regardless of how many views you provide. From it, we decode a single coherent surface at whichever resolution we ask for.
The result is a representation of geometry that, by construction,
captures only what is shared across views.
Geometry is what remains invariant under transformations of view.
Comparison: Surflo (one shared latent) vs. VGGT pointmaps (N independent pointmaps).
03
Decoding explicit surfaces
Each scene below is a Surflo reconstruction from 16 unposed images. For each scene, a total of 100K points are decoded from the global state before being assembled into a mesh.
For the sake of visualization, RGB colors were computed by naively averaging over input images.
Buzz · 16 views · textured
04
A global state filtering out redundancy
Adding more input images doesn't grow the global state. The K = 128 latent tokens simply see more information through cross-attention, leading to more complete reconstructions.
Please see the Surflo points below, with increasing number of input images. For the sake of visualization, RGB colors are computed by naively averaging over input images.
Woody · 17 input images · K = 128 latent
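The fixed size of the state follows from how a Perceiver-style compressor reads its input: a fixed set of K latent queries cross-attends to however many view tokens the backbone produces. Below is a minimal NumPy sketch of that read, assuming a single attention head and random stand-ins for the backbone features and the weights. The names `compress` and `encode`, the tokens-per-view count `T`, and the width `d` are illustrative and are not Surflo's API.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def compress(view_tokens, latent_queries, w_k, w_v):
    """One cross-attention read: K latent queries attend to all view tokens."""
    keys = view_tokens @ w_k                                      # (N*T, d)
    values = view_tokens @ w_v                                    # (N*T, d)
    scores = latent_queries @ keys.T / np.sqrt(keys.shape[-1])    # (K, N*T)
    return softmax(scores, axis=-1) @ values                      # (K, d)

rng = np.random.default_rng(0)
K, T, d = 128, 64, 32            # K is from the page; T and d are hypothetical
latent_queries = rng.normal(size=(K, d))
w_k = rng.normal(size=(d, d)) / np.sqrt(d)
w_v = rng.normal(size=(d, d)) / np.sqrt(d)

def encode(n_views):
    # Random stand-in for backbone features: T tokens per view, all views concatenated.
    tokens = rng.normal(size=(n_views * T, d))
    return compress(tokens, latent_queries, w_k, w_v)

# The state has the same shape whether 2, 16, or 32 views are provided.
shapes = {n: encode(n).shape for n in (2, 16, 32)}
print(shapes)
```

More views only lengthen the key/value axis that the softmax sums over, so the output keeps K rows regardless of N.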
05
One latent, any output resolution
One encoder forward pass, one global state — decode the surface at
whichever density you can afford. No re-encoding required.
Panels (each shown as points and as a mesh): 8K points · 32K points · 128K points
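Decoding at any density amounts to integrating the same per-point ODE for as many queries as desired. The sketch below uses explicit Euler steps and a toy velocity field that carries noise onto a sphere. The sphere, the four-number "latent" `z`, and the functions `toy_velocity` and `decode` are hypothetical stand-ins for the learned decoder and the 128-token state.

```python
import numpy as np

def toy_velocity(x, t, z):
    """Hypothetical stand-in for the learned velocity: straight-line flow
    from the current point to its radial projection on a sphere described by z."""
    center, radius = z[:3], z[3]
    offset = x - center
    target = center + radius * offset / np.linalg.norm(offset, axis=-1, keepdims=True)
    return (target - x) / (1.0 - t)

def decode(z, num_points, num_steps=20, seed=0):
    """Integrate num_points independent ODEs from noise with explicit Euler steps."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(num_points, 3))      # one noise sample per query point
    dt = 1.0 / num_steps
    for k in range(num_steps):
        # Every row is updated from its own state only: rows never interact.
        x = x + dt * toy_velocity(x, k * dt, z)
    return x

z = np.array([0.0, 0.0, 0.0, 1.0])            # toy "latent": sphere center and radius
clouds = {m: decode(z, m) for m in (8_000, 32_000, 128_000)}
for m, cloud in clouds.items():
    print(m, cloud.shape, np.abs(np.linalg.norm(cloud, axis=1) - 1.0).max())
```

The latent `z` is computed once and reused for every density; only the number of rows being integrated changes.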
06
Points communicate through rendering
Independent ODEs are cheap and parallel, but two nearby queries can resolve the same surface ambiguity differently. We couple the points at inference time with a guidance mechanism: at each ODE step, we render the points with Gaussian splatting, back-propagate an image-space loss, and update the velocities. The rendering gradient is the channel through which points communicate.
No guidance: plain flow matching
+ photometric: L1 + DSSIM
+ monodepth expert: depth-order regulariser
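The structure of such a guided step can be sketched without a renderer. In the toy below, each point flows independently toward one of two nearby sheets (z = +0.1 or z = -0.1), which mimics two queries settling on different answers. The shared loss is the spread of the z-coordinates, used here purely as a stand-in for the Gaussian-splatting image loss, with its gradient written analytically instead of back-propagated. The functions, the weight of 20, and the step count are assumptions chosen for illustration.

```python
import numpy as np

def toy_velocity(x, t, target):
    """Hypothetical per-point velocity: straight-line flow to that point's own target."""
    return (target - x) / (1.0 - t)

def coupling_grad(x):
    """Gradient of the stand-in shared loss 0.5 * sum_i (z_i - mean(z))**2.
    In Surflo this role is played by the gradient of a rendering loss."""
    grad = np.zeros_like(x)
    grad[:, 2] = x[:, 2] - x[:, 2].mean()
    return grad

def integrate(x0, target, guidance_weight, num_steps=50):
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = toy_velocity(x, k * dt, target)            # computed independently per point
        v = v - guidance_weight * coupling_grad(x)     # correction from the shared loss
        x = x + dt * v
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=(1000, 3))
# Ambiguity: each point picks the sheet matching the sign of its initial z.
target = np.column_stack([x0[:, 0], x0[:, 1], 0.1 * np.sign(x0[:, 2])])

unguided = integrate(x0, target, guidance_weight=0.0)
guided = integrate(x0, target, guidance_weight=20.0)
spread_unguided = unguided[:, 2].std()
spread_guided = guided[:, 2].std()
print(spread_unguided, spread_guided)
```

With the weight set to zero the points land exactly on their two separate sheets. With guidance on, every velocity is nudged by a term that depends on all the other points, and the gap between the sheets narrows.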
07
Comparison with state-of-the-art
The scenes below compare Surflo against VGGT and Gaussian Wrapping, the leading method for surface reconstruction from images. Surflo holds up even on tough captures with strong exposure variation and transparent objects; see for instance the translucent Totoro figurine in Scene 01.
Each scene shows a reference image, a point comparison (Surflo vs. VGGT pointmaps), and a mesh comparison (Surflo vs. Gaussian Wrapping).
Scene 01: Totoro · Custom capture · 16 views · transparent figurine
Scene 02: Gallos · Custom capture · 16 views
Scene 03: Caterpillar · Tanks & Temples · 16 views
Scene 04: Garden · Mip-NeRF 360 · 16 views
Scene 05: Ignatius · Tanks & Temples · 16 views
Scene 06: Robot · Custom capture · 16 views
08
Numbers
Surflo is trained only on our augmented version of DL3DV and evaluated on eight benchmarks: four standard novel-view synthesis datasets, with reference surfaces computed from dense views using the state-of-the-art meshing method Gaussian Wrapping, and four benchmarks with native surface ground truth. Every method sees the same 16 unposed input views per scene; we report Chamfer Distance (CD ↓) and F1-score (F1 ↑).
Per-view feed-forward baselines are reported with TSDF fusion to a single global mesh. Surflo is reported in two rows, with and without the shared rendering guidance.
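For readers unfamiliar with the two metrics, here is one common way to compute them between point sets sampled from a predicted and a reference surface. The page does not state which Chamfer variant (plain or squared distances, sum or mean) or which F1 distance threshold is used, so the choices below are assumptions, and the brute-force nearest-neighbour search is only suitable for small point sets.

```python
import numpy as np

def nearest_distances(a, b):
    """For each point in a, the Euclidean distance to its nearest neighbour in b."""
    diff = a[:, None, :] - b[None, :, :]             # (len(a), len(b), 3)
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def chamfer_and_f1(pred, ref, threshold):
    d_pred = nearest_distances(pred, ref)            # accuracy: prediction -> reference
    d_ref = nearest_distances(ref, pred)             # completeness: reference -> prediction
    chamfer = 0.5 * (d_pred.mean() + d_ref.mean())   # symmetric mean distance (assumed variant)
    precision = (d_pred < threshold).mean()
    recall = (d_ref < threshold).mean()
    if precision + recall == 0:
        return chamfer, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return chamfer, 100.0 * f1                       # F1 as a percentage

# Toy check: the prediction is the reference square lifted by 0.01 along z.
ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
pred = ref + np.array([0.0, 0.0, 0.01])
cd, f1 = chamfer_and_f1(pred, ref, threshold=0.05)
cd_strict, f1_strict = chamfer_and_f1(pred, ref, threshold=0.005)
print(cd, f1, f1_strict)
```

The Chamfer distance does not depend on the threshold, while F1 drops from 100 to 0 once the threshold falls below the 0.01 offset, which is why the two metrics are usually reported together.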
Proxy surfaces obtained from dense views
(Benchmarks 1 to 4 in page order; benchmark names not recoverable.)

Chamfer distance ↓ (lower is better)
Method               Bench. 1   Bench. 2   Bench. 3   Bench. 4
VGGT + TSDF            0.0126     0.0113     0.0178     0.0193
DA3 + TSDF             0.0120     0.0177     0.0182     0.0210
NOVA3R                 0.0459     0.0432     0.0429     0.0550
2DGS                   0.0163     0.0161     0.0222     0.0204
RaDe-GS                0.0166     0.0170     0.0224     0.0202
Gaussian Wrapping      0.0168     0.0157     0.0201     0.0164
Surflo (no guid.)      0.0072     0.0053     0.0068     0.0116
Surflo (guid.)         0.0083     0.0056     0.0103     0.0109

F1-score ↑ (higher is better)
Method               Bench. 1   Bench. 2   Bench. 3   Bench. 4
VGGT + TSDF             69.23      77.46      60.64      62.30
DA3 + TSDF              72.30      70.80      59.91      54.03
NOVA3R                  30.51      32.99      25.60      27.61
2DGS                    60.10      62.95      51.08      59.54
RaDe-GS                 59.48      61.67      50.83      60.04
Gaussian Wrapping       60.67      64.94      57.86      64.54
Surflo (no guid.)       81.92      88.57      82.00      70.96
Surflo (guid.)          78.55      86.40      76.57      75.09
Native surface ground truth
(Benchmarks 5 to 8 in page order; benchmark names not recoverable.)

Chamfer distance ↓ (lower is better)
Method               Bench. 5   Bench. 6   Bench. 7   Bench. 8
VGGT + TSDF            0.0138     0.0270     0.0380     0.0226
DA3 + TSDF             0.0151     0.0875     0.0801     0.0220
NOVA3R                 0.0635     0.0413     0.0307     0.0771
2DGS                   0.0176     0.0295     0.0394     0.0234
RaDe-GS                0.0174     0.0303     0.0393     0.0242
Gaussian Wrapping      0.0145     0.0259     0.0460     0.0123
Surflo (no guid.)      0.0097     0.0103     0.0242     0.0114
Surflo (guid.)         0.0079     0.0114     0.0240     0.0070

F1-score ↑ (higher is better)
Method               Bench. 5   Bench. 6   Bench. 7   Bench. 8
VGGT + TSDF             74.08      59.64      28.93      59.89
DA3 + TSDF              69.07      52.61      17.93      60.95
NOVA3R                  27.65      32.13      31.41      27.41
2DGS                    62.03      48.19      28.78      53.57
RaDe-GS                 62.68      48.85      28.30      53.52
Gaussian Wrapping       66.86      55.64      30.17      62.96
Surflo (no guid.)       77.98      76.50      39.23      61.20
Surflo (guid.)          87.97      77.28      42.05      81.11
Surflo continues to outperform the baselines as the number of input views changes. We vary the number of unposed views per scene from 2 to 32 and report the same Chamfer Distance (CD ↓) and F1-score (F1 ↑) on the out-of-distribution (OOD) datasets Tanks & Temples and Mip-NeRF 360. Surflo leads at every view count, including the hard 2-view regime.
Varying input views
(Panels 1 to 8 in page order; the dataset and view count of each panel are not recoverable.)

Chamfer distance ↓ (lower is better)
Method               Panel 1   Panel 2   Panel 3   Panel 4   Panel 5   Panel 6   Panel 7   Panel 8
VGGT + TSDF           0.1444    0.0285    0.0138    0.0094    0.0755    0.0326    0.0244    0.0161
DA3 + TSDF            0.1428    0.0293    0.0210    0.0140    0.0746    0.0322    0.0220    0.0162
NOVA3R                0.2620    0.0502    0.0557    0.0423    0.0795    0.0642    0.0492    0.0562
2DGS                  0.1453    0.0316    0.0187    0.0152    0.0754    0.0344    0.0283    0.0217
RaDe-GS               0.1454    0.0314    0.0191    0.0156    0.0756    0.0352    0.0293    0.0215
Gaussian Wrapping     0.1476    0.0313    0.0176    0.0133    0.0766    0.0339    0.0251    0.0162
Surflo (no guid.)     0.1345    0.0135    0.0059    0.0049    0.0714    0.0192    0.0127    0.0071
Surflo (guid.)        0.1416    0.0198    0.0061    0.0049    0.0736    0.0263    0.0145    0.0137

F1-score ↑ (higher is better)
Method               Panel 1   Panel 2   Panel 3   Panel 4   Panel 5   Panel 6   Panel 7   Panel 8
VGGT + TSDF             6.83     53.62     70.85     84.83      9.63     40.79     55.29     65.56
DA3 + TSDF              6.97     47.90     61.64     74.59      9.24     40.31     55.17     64.75
NOVA3R                  5.78     30.29     31.78     35.54      8.46     21.41     28.01     18.95
2DGS                    6.09     43.05     55.95     68.43      8.73     37.15     47.45     52.13
RaDe-GS                 6.30     42.60     55.67     66.24      8.76     37.11     47.89     53.08
Gaussian Wrapping       5.24     45.28     60.04     72.10      7.28     41.61     53.97     62.41
Surflo (no guid.)       9.28     75.07     86.59     90.76     13.07     58.68     74.07     81.24
Surflo (guid.)          7.08     72.65     86.25     90.34     10.40     53.71     73.44     80.66
09
A new dataset for surface reconstruction
Alongside Surflo we will release an augmented version of DL3DV
where every scene ships with a full surface mesh covering both
foreground and background geometry. Each mesh is computed
with the state-of-the-art Gaussian Wrapping pipeline, giving
the community a large, scene-level supervision signal for surface reconstruction.
@article{guedon2026surflo,
title = {Surflo: Consistent 3D Surface Flow from a Global State},
author = {Gu{\'e}don, Antoine and Nakamura, Shu and Dufour, Nicolas
and Lei, Jiahui and Nishino, Ko and Kanazawa, Angjoo},
journal = {arXiv preprint},
year = {2026}
}