Surflo Consistent 3D Surface Flow from a Global State
arXiv preprint · 2026
Antoine Guédon* LIX, École Polytechnique
Shu Nakamura* Kyoto University
Nicolas Dufour* Kyutai
Jiahui Lei UC Berkeley
Ko Nishino Kyoto University
Angjoo Kanazawa UC Berkeley

* Equal contribution

Fuse N views into a global state,
then decode a single coherent surface from it.

Variable views, one latent.

Fuse N views into a fixed state of K=128 tokens.

Decode at any resolution.

Decode as many points as you want from one global state.

Independent flow, coherent surface.

While points flow independently, our guidance couples them.

State-of-the-art performance.

Evaluated on 8 benchmarks, from 2 to 32 views.

New surface dataset for real-world scenes.

~10.5K DL3DV scenes, full scene meshes.



01

Overview


02

One global state, One coherent surface

Modern feed-forward 3D models such as VGGT, DUSt3R, and DepthAnything-3 produce one pointmap per view. The representation therefore grows linearly with the number of views, which introduces both noise and redundancy.
Surflo encodes the entire image set into a global state z: one fixed-size representation, regardless of how many views you provide. From it, we decode a single coherent surface at whichever resolution we ask for.

The result is a representation of geometry that, by construction, captures only what is shared across views.

Geometry is what remains invariant under transformations of view.

Felix Klein · Erlangen Programme, 1872
Figure: Surflo, a single coherent surface decoded from one global state (one shared latent vs N independent pointmaps).
04

A Global State filtering out Redundancy

Adding more input images does not grow the global state. The K = 128 latent tokens simply see more information through cross-attention, leading to more complete reconstructions. The Surflo points below are shown for an increasing number of input images. For visualization, RGB colors are computed by naively averaging over the input images.
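The fixed-size property can be illustrated with a minimal sketch of latent cross-attention. This is not the Surflo encoder: the single head, the feature width D = 64, the 196 tokens per view, the random weights, and the names `cross_attend` and `fuse_views` are all assumptions made for illustration. Only K = 128 comes from the text above.

```python
import numpy as np

K, D = 128, 64  # K = 128 latent tokens as stated above; D = 64 is an arbitrary toy width


def cross_attend(latents, tokens, w_q, w_k, w_v):
    """Single-head cross-attention: the latents query the image tokens."""
    q = latents @ w_q                                   # (K, D)
    k = tokens @ w_k                                    # (M, D), M = total image tokens
    v = tokens @ w_v                                    # (M, D)
    scores = q @ k.T / np.sqrt(q.shape[-1])             # (K, M)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return latents + weights @ v                        # residual update, still (K, D)


def fuse_views(num_views, tokens_per_view=196, seed=0):
    """Hypothetical helper: fuse random stand-in image features into K latents."""
    rng = np.random.default_rng(seed)
    latents = rng.normal(size=(K, D))
    w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
    tokens = rng.normal(size=(num_views * tokens_per_view, D))
    return cross_attend(latents, tokens, w_q, w_k, w_v)


state_2 = fuse_views(num_views=2)
state_32 = fuse_views(num_views=32)
print(state_2.shape, state_32.shape)
```

Both calls print the same shape, (128, 64): the number of views only changes how many tokens the latents attend over, not the size of the state.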

05

One latent, Any output resolution

One encoder forward pass, one global state — decode the surface at whichever density you can afford. No re-encoding required.

Surflo points decoded at 8K, 32K, and 128K points.
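As a rough sketch of why the output density is a free choice, the toy example below integrates one ODE per query point, all conditioned on the same state. Everything here is an assumption for illustration: the "global state" `z` is just a sphere's center and radius, `velocity` is a hand-written field rather than a learned network, and `decode` uses plain Euler steps.

```python
import numpy as np


def velocity(x, t, z):
    """Toy velocity field: move each point toward the sphere described by z."""
    center, radius = z[:3], z[3]
    offset = x - center
    dist = np.linalg.norm(offset, axis=-1, keepdims=True)
    on_surface = center + radius * offset / dist
    return (on_surface - x) / (1.0 - t)


def decode(z, x0, steps=8):
    """Euler-integrate every query independently from noise x0 to the surface."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity(x, k * dt, z)
    return x


z = np.array([0.0, 0.0, 0.0, 1.0])  # stand-in state: unit sphere at the origin
rng = np.random.default_rng(0)
for n in (8_000, 32_000, 128_000):
    points = decode(z, rng.normal(size=(n, 3)))
    print(n, points.shape)
```

Because each row of `x0` is integrated on its own, decoding a subset of the queries gives the same result as decoding them together, and asking for more points only adds rows.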
06

Points communicate through rendering

Independent ODEs are cheap and parallel, but two nearby queries can settle on different interpretations of an ambiguous surface. We couple the points at inference time with a guidance mechanism: at each ODE step, we render the points with Gaussian splatting, back-propagate an image-space loss, and update the velocities. The rendering gradient is the channel through which points communicate.

  • No guidance: plain flow matching
  • + photometric: L1 + DSSIM
  • + monodepth expert: depth order regulariser
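The coupling can be sketched in a heavily simplified form. The example below is an assumption-laden toy, not the Surflo guidance: points are 2D (image coordinate, depth), the "renderer" is a 1D sum of Gaussians instead of Gaussian splatting, the loss is a mean squared error rather than L1 + DSSIM, the gradient is written by hand, and `splat`, `image_loss_and_grad`, and `guided_flow` are hypothetical names.

```python
import numpy as np


def splat(u, centers, sigma):
    """Render a 1D image: each pixel sums the Gaussian footprints of all points."""
    d = u[:, None] - centers[None, :]           # (P, W) point-to-pixel offsets
    w = np.exp(-0.5 * (d / sigma) ** 2)         # (P, W) splat weights
    return w.sum(axis=0), w, d


def image_loss_and_grad(u, target, centers, sigma):
    """Mean squared image error and its gradient with respect to point positions."""
    image, w, d = splat(u, centers, sigma)
    residual = image - target                   # (W,)
    loss = 0.5 * np.mean(residual ** 2)
    grad = (w * (-d / sigma ** 2)) @ residual / residual.size  # (P,)
    return loss, grad


def guided_flow(x0, velocity, target, centers, sigma, steps=16, scale=0.05):
    """Euler integration where each step's velocity is nudged by the render gradient."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        v = np.array(velocity(x, k * dt), dtype=float)   # independent per-point velocities
        _, grad = image_loss_and_grad(x[:, 0], target, centers, sigma)
        v[:, 0] -= scale * grad     # the shared residual couples all points; depth is untouched
        x = x + dt * v
    return x


rng = np.random.default_rng(0)
centers = np.linspace(-1.0, 1.0, 32)            # pixel centers of the toy image
sigma = 0.2
true_pts = rng.uniform(-0.8, 0.8, size=(16, 2))
target, _, _ = splat(true_pts[:, 0], centers, sigma)
x0 = rng.normal(scale=0.5, size=(16, 2))
biased = true_pts + 0.1 * rng.normal(size=(16, 2))   # each point's own imperfect estimate
guided = guided_flow(x0, lambda x, t: biased - x0, target, centers, sigma)
print(guided.shape)
```

The design point is that the residual `image - target` depends on every point at once, so the gradient received by one point reflects where all the others currently sit.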
07

Comparison with state-of-the-art

Drag the dividers to compare Surflo against VGGT and Gaussian Wrapping, the leading method for surface reconstruction from images. Surflo holds up even on tough captures with strong exposure variation and transparent objects — see for instance the translucent Totoro figurine in Scene 01.

Scene 01

Totoro

Custom capture · 16 views · transparent figurine
Panels: reference image · points (Surflo vs VGGT pointmaps) · mesh (Surflo vs Gaussian Wrapping)
Scene 02

Gallos

Custom capture · 16 views
Panels: reference image · points (Surflo vs VGGT pointmaps) · mesh (Surflo vs Gaussian Wrapping)
Scene 03

Caterpillar

Tanks & Temples · 16 views
Panels: reference image · points (Surflo vs VGGT pointmaps) · mesh (Surflo vs Gaussian Wrapping)
Scene 04

Garden

Mip-NeRF 360 · 16 views
Panels: reference image · points (Surflo vs VGGT pointmaps) · mesh (Surflo vs Gaussian Wrapping)
Scene 05

Ignatius

Tanks & Temples · 16 views
Panels: reference image · points (Surflo vs VGGT pointmaps) · mesh (Surflo vs Gaussian Wrapping)
Scene 06

Robot

Custom capture · 16 views
Panels: reference image · points (Surflo vs VGGT pointmaps) · mesh (Surflo vs Gaussian Wrapping)
08

Numbers

Surflo is trained only on our augmented version of DL3DV and evaluated on eight benchmarks: four standard novel-view synthesis datasets, whose reference surfaces are computed from dense views with the state-of-the-art meshing method Gaussian Wrapping, and four benchmarks with native surface ground truth. Every method sees the same 16 unposed input views per scene; we report Chamfer Distance (CD ↓) and F1-score (F1 ↑). Per-view feed-forward baselines are fused into a single global mesh with TSDF fusion. Surflo is reported both with and without the shared rendering guidance.
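For readers unfamiliar with the two metrics, here is a minimal sketch of one common way to compute them on point sets. The exact convention used for the numbers below (squared or unsquared distances, sum or mean of the two directions, the F1 distance threshold, how points are sampled from meshes) is not stated on this page, so the choices in `chamfer_and_f1` are assumptions.

```python
import numpy as np


def nearest_distances(a, b):
    """Distance from every point in a to its nearest neighbour in b (brute force)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def chamfer_and_f1(pred, gt, threshold):
    """Symmetric Chamfer distance and F1-score (in percent) at a distance threshold."""
    d_pred = nearest_distances(pred, gt)   # accuracy: predicted points to ground truth
    d_gt = nearest_distances(gt, pred)     # completeness: ground truth to prediction
    chamfer = 0.5 * (d_pred.mean() + d_gt.mean())
    precision = (d_pred < threshold).mean()
    recall = (d_gt < threshold).mean()
    denom = precision + recall
    f1 = 0.0 if denom == 0 else 2 * precision * recall / denom
    return chamfer, 100.0 * f1


# Toy check: a 3x3x3 unit grid against the same grid shifted by 0.1 along x.
grid = np.stack(np.meshgrid(*[np.arange(3.0)] * 3, indexing="ij"), axis=-1).reshape(-1, 3)
shifted = grid + np.array([0.1, 0.0, 0.0])
cd, f1 = chamfer_and_f1(shifted, grid, threshold=0.2)
print(round(cd, 4), f1)
```

In this toy case every nearest-neighbour distance is 0.1, so the Chamfer distance is 0.1 and the F1-score is 100 at a threshold of 0.2; tightening the threshold to 0.05 drops the F1-score to 0 while the Chamfer distance is unchanged.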

Proxy surfaces obtained from dense views

Chamfer distance (CD ↓, lower is better) and F1-score (F1 ↑, higher is better). Columns (1) to (4) are the four benchmarks in page order.

Method               CD ↓ (1)  F1 ↑ (1)  CD ↓ (2)  F1 ↑ (2)  CD ↓ (3)  F1 ↑ (3)  CD ↓ (4)  F1 ↑ (4)
VGGT + TSDF          0.0126    69.23     0.0113    77.46     0.0178    60.64     0.0193    62.30
DA3 + TSDF           0.0120    72.30     0.0177    70.80     0.0182    59.91     0.0210    54.03
NOVA3R               0.0459    30.51     0.0432    32.99     0.0429    25.60     0.0550    27.61
2DGS                 0.0163    60.10     0.0161    62.95     0.0222    51.08     0.0204    59.54
RaDe-GS              0.0166    59.48     0.0170    61.67     0.0224    50.83     0.0202    60.04
Gaussian Wrapping    0.0168    60.67     0.0157    64.94     0.0201    57.86     0.0164    64.54
Surflo (no guid.)    0.0072    81.92     0.0053    88.57     0.0068    82.00     0.0116    70.96
Surflo (guid.)       0.0083    78.55     0.0056    86.40     0.0103    76.57     0.0109    75.09

Native surface ground truth

Chamfer distance (CD ↓, lower is better) and F1-score (F1 ↑, higher is better). Columns (1) to (4) are the four benchmarks in page order.

Method               CD ↓ (1)  F1 ↑ (1)  CD ↓ (2)  F1 ↑ (2)  CD ↓ (3)  F1 ↑ (3)  CD ↓ (4)  F1 ↑ (4)
VGGT + TSDF          0.0138    74.08     0.0270    59.64     0.0380    28.93     0.0226    59.89
DA3 + TSDF           0.0151    69.07     0.0875    52.61     0.0801    17.93     0.0220    60.95
NOVA3R               0.0635    27.65     0.0413    32.13     0.0307    31.41     0.0771    27.41
2DGS                 0.0176    62.03     0.0295    48.19     0.0394    28.78     0.0234    53.57
RaDe-GS              0.0174    62.68     0.0303    48.85     0.0393    28.30     0.0242    53.52
Gaussian Wrapping    0.0145    66.86     0.0259    55.64     0.0460    30.17     0.0123    62.96
Surflo (no guid.)    0.0097    77.98     0.0103    76.50     0.0242    39.23     0.0114    61.20
Surflo (guid.)       0.0079    87.97     0.0114    77.28     0.0240    42.05     0.0070    81.11

Surflo continues to outperform the baselines as the number of input views changes. We vary the number of unposed views per scene from 2 to 32 and report the same Chamfer Distance (CD ↓) and F1-score (F1 ↑) on the out-of-distribution (OOD) datasets Tanks & Temples and Mip-NeRF 360. Surflo leads at every view count, including the hard 2-view regime.

Varying input views

Chamfer distance (CD ↓, lower is better) and F1-score (F1 ↑, higher is better). Columns (1) to (4) are the first four panels in page order.

Method               CD ↓ (1)  F1 ↑ (1)  CD ↓ (2)  F1 ↑ (2)  CD ↓ (3)  F1 ↑ (3)  CD ↓ (4)  F1 ↑ (4)
VGGT + TSDF          0.1444    6.83      0.0285    53.62     0.0138    70.85     0.0094    84.83
DA3 + TSDF           0.1428    6.97      0.0293    47.90     0.0210    61.64     0.0140    74.59
NOVA3R               0.2620    5.78      0.0502    30.29     0.0557    31.78     0.0423    35.54
2DGS                 0.1453    6.09      0.0316    43.05     0.0187    55.95     0.0152    68.43
RaDe-GS              0.1454    6.30      0.0314    42.60     0.0191    55.67     0.0156    66.24
Gaussian Wrapping    0.1476    5.24      0.0313    45.28     0.0176    60.04     0.0133    72.10
Surflo (no guid.)    0.1345    9.28      0.0135    75.07     0.0059    86.59     0.0049    90.76
Surflo (guid.)       0.1416    7.08      0.0198    72.65     0.0061    86.25     0.0049    90.34

Chamfer distance (CD ↓, lower is better) and F1-score (F1 ↑, higher is better). Columns (5) to (8) are the last four panels in page order.

Method               CD ↓ (5)  F1 ↑ (5)  CD ↓ (6)  F1 ↑ (6)  CD ↓ (7)  F1 ↑ (7)  CD ↓ (8)  F1 ↑ (8)
VGGT + TSDF          0.0755    9.63      0.0326    40.79     0.0244    55.29     0.0161    65.56
DA3 + TSDF           0.0746    9.24      0.0322    40.31     0.0220    55.17     0.0162    64.75
NOVA3R               0.0795    8.46      0.0642    21.41     0.0492    28.01     0.0562    18.95
2DGS                 0.0754    8.73      0.0344    37.15     0.0283    47.45     0.0217    52.13
RaDe-GS              0.0756    8.76      0.0352    37.11     0.0293    47.89     0.0215    53.08
Gaussian Wrapping    0.0766    7.28      0.0339    41.61     0.0251    53.97     0.0162    62.41
Surflo (no guid.)    0.0714    13.07     0.0192    58.68     0.0127    74.07     0.0071    81.24
Surflo (guid.)       0.0736    10.40     0.0263    53.71     0.0145    73.44     0.0137    80.66
09

A new Dataset for Surface Reconstruction

Alongside Surflo we will release an augmented version of DL3DV where every scene ships with a full surface mesh covering both foreground and background geometry. Each mesh is computed with the state-of-the-art Gaussian Wrapping pipeline, giving the community a large, scene-level supervision signal for surface reconstruction.

~10.5K scenes · indoor & outdoor · foreground + background · posed images, depth, mesh

Scene 01 — reference photo
Scene 01 — reference vs Gaussian Wrapping mesh
Scene 02 — reference photo
Scene 02 — reference vs Gaussian Wrapping mesh
Scene 03 — reference photo
Scene 03 — reference vs Gaussian Wrapping mesh
11

Citation

@article{guedon2026surflo,
  title       = {Surflo: Consistent 3D Surface Flow from a Global State},
  author      = {Gu{\'e}don, Antoine and Nakamura, Shu and Dufour, Nicolas
                  and Lei, Jiahui and Nishino, Ko and Kanazawa, Angjoo},
  journal     = {arXiv preprint},
  year        = {2026}
}