🌲 SelectAnyTree

A Promptable Instance Segmentation Model for
3D Forest LiDAR Point Clouds

1 Nagoya University, 2 RIKEN, 3 University of Freiburg, 4 UCLA, 5 University of Twente, 6 KC Machine Learning Lab, 7 HUST, 8 Ritsumeikan University
SelectAnyTree teaser — scene overview with three selected trees, top-down and side view

SelectAnyTree segments any individual tree in a 3D forest LiDAR scan with a user's click. Left: full scene overview with three highlighted trees. Right: zoom panels showing predicted masks. The scene is encoded once; each subsequent click is a cheap query.

Abstract

Automated instance segmentation of forest LiDAR point clouds is increasingly critical as forest monitoring moves toward scalable, detailed, 3D measurement. Yet progress is constrained by label scarcity for tree instances: a single hectare can hold millions of points and hundreds of overlapping, complex crowns, making manual annotation laborious and error-prone.

Inspired by the promptable paradigm of foundation segmentation models, we propose SelectAnyTree, a promptable instance segmentation model that delineates any individual tree in a 3D forest point cloud from a few clicks. SelectAnyTree introduces two key components: a click-to-query prompt encoder and a Canopy Height Model (CHM)-guided first prompt. The former turns each click into a single content query, encoding its 3D position and positive/negative polarity together with a pooled local backbone feature. The latter takes treetops from the CHM as a geometry- and ecology-guided first prompt that requires no user input.

We evaluate SelectAnyTree across seven diverse forest regions and an independent held-out test dataset, demonstrating strong generalization beyond the training domains. It segments a target tree to 78.2 IoU from a single click — 24.8 points above the strongest promptable baseline — and reaches every accuracy target with the fewest clicks, while using far fewer parameters and less inference time than prior promptable models.

Demo

Pre-computed from the test set. Each frame shows a top-down view (left) and a side view (right).

Legend: positive click, negative click, CHM treetop, true positive, false positive, false negative.

360° view — select any tree, click by click

Left: the whole plot with every tree segmented. Right: each tree, rotating, as prompts accumulate from 1 to 5 clicks.

SelectAnyTree — 360° rotating view of the scene and each segmented tree as clicks accumulate

Norway (NIBIO)

SelectAnyTree — Norway NIBIO, click-by-click segmentation

Australia (BlueCat)

SelectAnyTree — Australia BlueCat, click-by-click segmentation

Czech Republic (CULS)

SelectAnyTree — Czech Republic CULS, click-by-click segmentation

New Zealand (SCION)

SelectAnyTree — New Zealand SCION, click-by-click segmentation

Point-SAM vs. SelectAnyTree

Same scene, same click budget.

Point-SAM

Point-SAM comparison

SelectAnyTree (Ours)

SelectAnyTree comparison

Method

SelectAnyTree turns a structure-aware forest backbone into a promptable instance segmentation model through four stages:

  1. Scene encoding (once). The point cloud is voxelized and processed by a sparse MambaForest encoder to produce per-voxel features — cached and reused for all subsequent clicks.
  2. Prompt encoder. Each click set is mapped to a single content query via cylinder-pooled backbone features, random-Fourier positional encoding, and signed aggregation of click polarities.
  3. State-space query decoder. The query is decoded into a voxel-resolution binary mask using a Mamba-based decoder that captures long-range context in linear time — essential for large-scale forest scenes.
  4. CHM-guided first prompt. Canopy Height Model treetop positions provide geometry-guided "free" first prompts, bridging fully-automatic and interactive segmentation.
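
The CHM-guided first prompt in stage 4 can be illustrated with a small sketch. It assumes the points are already height-normalized (z is height above ground), rasterizes them into a max-height grid, and keeps cells that are the strict maximum of their neighborhood. The function names, the cell size, the window, and the `min_height` cutoff are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def canopy_height_model(points, cell=0.5):
    """Rasterize an (N, 3) point cloud into a max-height grid over XY."""
    xy_min = points[:, :2].min(axis=0)
    ij = np.floor((points[:, :2] - xy_min) / cell).astype(int)
    shape = tuple(ij.max(axis=0) + 1)
    chm = np.full(shape, -np.inf)               # empty cells stay at -inf
    np.maximum.at(chm, (ij[:, 0], ij[:, 1]), points[:, 2])
    return chm, xy_min

def treetops(chm, xy_min, cell=0.5, window=1, min_height=2.0):
    """Return (x, y, z) for cells that are the strict maximum of their window."""
    pad = np.pad(chm, window, constant_values=-np.inf)
    tops = []
    for i, j in zip(*np.nonzero(chm >= min_height)):
        patch = pad[i:i + 2 * window + 1, j:j + 2 * window + 1]
        is_max = patch.max() == chm[i, j]
        is_unique = (patch == chm[i, j]).sum() == 1
        if is_max and is_unique:
            x, y = xy_min + (np.array([i, j]) + 0.5) * cell   # cell center
            tops.append((float(x), float(y), float(chm[i, j])))
    return tops

def synthetic_cone(cx, cy, apex, radius=1.5, step=0.25):
    """Toy crown: a cone of points whose height drops with distance from (cx, cy)."""
    off = np.arange(-radius, radius + step, step)
    dx, dy = np.meshgrid(off, off)
    r = np.hypot(dx, dy).ravel()
    keep = r <= radius
    return np.column_stack([cx + dx.ravel()[keep],
                            cy + dy.ravel()[keep],
                            apex - 2.0 * r[keep]])

# Two toy trees: one 10 m tall at (2, 2), one 8 m tall at (7, 7).
points = np.vstack([synthetic_cone(2.0, 2.0, 10.0), synthetic_cone(7.0, 7.0, 8.0)])
chm, xy_min = canopy_height_model(points)
tops = treetops(chm, xy_min)
print([z for _, _, z in tops])  # [10.0, 8.0]
```

Each returned treetop could then serve as a positive first click for one tree, which is what lets the same model run without any user input.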

SelectAnyTree architecture. Scene encoding is performed once per plot; prompt encoding and decoding are repeated per click set.
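
To make the per-click-set prompt encoding more concrete, here is a minimal NumPy sketch of the three ingredients named in stage 2: cylinder pooling of backbone features, random-Fourier positional encoding, and signed aggregation over click polarities. All function names, the pooling radius, the feature sizes, and the choice of a sign-weighted mean are assumptions made for illustration; the actual model presumably uses learned projections that are not shown here.

```python
import numpy as np

def fourier_encode(xyz, freqs):
    """Random-Fourier features: [sin(2*pi*x@B), cos(2*pi*x@B)] for a fixed matrix B."""
    proj = 2.0 * np.pi * xyz @ freqs                 # (n_clicks, n_freqs)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

def cylinder_pool(click_xyz, voxel_xyz, voxel_feat, radius):
    """Mean-pool features of voxels inside a vertical cylinder around the click."""
    d_xy = np.linalg.norm(voxel_xyz[:, :2] - click_xyz[:2], axis=1)
    inside = d_xy <= radius
    if not inside.any():                             # fall back to the nearest voxel
        inside = d_xy == d_xy.min()
    return voxel_feat[inside].mean(axis=0)

def encode_click_set(clicks_xyz, polarity, voxel_xyz, voxel_feat, freqs, radius=0.5):
    """Map a click set to one query vector (polarity: +1 positive, -1 negative)."""
    pooled = np.stack([cylinder_pool(c, voxel_xyz, voxel_feat, radius)
                       for c in clicks_xyz])
    pos = fourier_encode(clicks_xyz, freqs)
    per_click = np.concatenate([pooled, pos], axis=-1)   # (n_clicks, C + 2F)
    signs = np.asarray(polarity, dtype=float)[:, None]
    return (signs * per_click).sum(axis=0) / len(clicks_xyz)

# Stand-in data: 200 voxels with 8-dim features, 4 random frequencies.
rng = np.random.default_rng(0)
voxel_xyz = rng.uniform(0.0, 10.0, size=(200, 3))
voxel_feat = rng.normal(size=(200, 8))
freqs = rng.normal(size=(3, 4))

clicks = np.array([[5.0, 5.0, 2.0], [6.0, 5.0, 2.0]])
query = encode_click_set(clicks, [+1, -1], voxel_xyz, voxel_feat, freqs)
print(query.shape)  # (16,)
```

Because the cached voxel features are only read, never recomputed, adding a click changes just this small vector before the decoder runs again.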

Results

78.2 IoU @ 1 click
+24.8 pts above best baseline
#1 fewest clicks at every IoU target
19.4M params — 16× smaller than Point-SAM

Interactive Segmentation

FOR-instanceV2 test set (in-distribution) and LAUTx (cross-dataset generalization, held-out). Bold = best, underline = second best.

Interactive segmentation comparison: IoU@Clicks and NoC@IoU on FOR-instanceV2 and LAUTx
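
For readers unfamiliar with the two metrics, the sketch below shows how IoU@k and NoC@IoU are commonly computed in interactive segmentation. It assumes the usual convention that a target which never reaches the threshold is charged the maximum click budget; whether this page's evaluation uses exactly that cap is an assumption. Masks are represented as sets of point indices, IoU is kept in [0, 1] rather than as a percentage, and all names are illustrative.

```python
def iou(pred, target):
    """Intersection over union of two collections of point (or voxel) indices."""
    pred, target = set(pred), set(target)
    union = pred | target
    return len(pred & target) / len(union) if union else 1.0

def iou_at_k(ious_per_click, k):
    """IoU@k: mask quality after the k-th click (k is 1-based)."""
    return ious_per_click[k - 1]

def noc_at(ious_per_click, threshold, max_clicks=None):
    """NoC@threshold: fewest clicks reaching the IoU threshold, capped at the budget."""
    cap = max_clicks if max_clicks is not None else len(ious_per_click)
    for n, value in enumerate(ious_per_click[:cap], start=1):
        if value >= threshold:
            return n
    return cap

# Toy example: a 10-point tree and the predicted mask after each of 3 clicks.
target = range(10)
masks = [
    list(range(6)) + [10, 11],   # click 1: misses 4 points, adds 2 false positives
    list(range(8)) + [10],       # click 2
    list(range(9)),              # click 3
]
ious = [iou(m, target) for m in masks]

print(iou_at_k(ious, 1))     # 0.5
print(noc_at(ious, 0.7))     # 2
print(noc_at(ious, 0.8))     # 3
print(noc_at(ious, 0.95))    # 3 (never reached, so the budget is charged)
```

Dataset-level numbers are then averages of these per-tree values.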

Model Size & Inference Efficiency

GPU wall-clock time (ms) per scene for a k-click session on the FOR-instanceV2 test set.

Model size and inference efficiency comparison
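
The shape of a k-click session, one scene encoding followed by k cheap queries, can be sketched as follows. The class, the stand-in encoder and decoder, and the additive cost model are hypothetical illustrations; the millisecond values are placeholders and not measurements from the paper.

```python
class ClickSession:
    """Encode a scene once, then answer any number of click queries."""

    def __init__(self, scene, encode_scene, decode_query):
        self._scene = scene
        self._encode_scene = encode_scene
        self._decode_query = decode_query
        self._features = None
        self.encode_calls = 0
        self.query_calls = 0

    def query(self, clicks):
        if self._features is None:           # only the first query pays for encoding
            self._features = self._encode_scene(self._scene)
            self.encode_calls += 1
        self.query_calls += 1
        return self._decode_query(self._features, clicks)


def session_cost_ms(k, encode_ms, query_ms, cached=True):
    """Total time for a k-click session under a simple additive cost model."""
    if cached:
        return encode_ms + k * query_ms
    return k * (encode_ms + query_ms)


# Stand-ins for the real encoder and decoder, just to exercise the control flow.
session = ClickSession(
    scene=[1.0, 2.0, 3.0],
    encode_scene=lambda scene: sum(scene),
    decode_query=lambda features, clicks: features + len(clicks),
)
all_clicks = ["c1", "c2", "c3", "c4", "c5"]
for k in range(1, 6):
    session.query(all_clicks[:k])

print(session.encode_calls, session.query_calls)   # 1 5

# Placeholder timings (not from the paper): 100 ms to encode, 5 ms per query.
print(session_cost_ms(5, 100.0, 5.0))                # 125.0
print(session_cost_ms(5, 100.0, 5.0, cached=False))  # 525.0
```

Under this model the gap between cached and uncached sessions grows linearly with the number of clicks, which is why per-scene session time is the quantity reported.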

Qualitative Results

Single-click segmentation compared against promptable baselines on the same target tree.

Single-click segmentation qualitative comparison against baselines

BibTeX

@article{nguyen2026selectanytree,
  title   = {SelectAnyTree: A Promptable Instance Segmentation Model for 3D Forest {LiDAR} Point Clouds},
  author  = {Nguyen, Trung Thanh and Lusk, Daniel and Gerberding, Kilian and Vajna-Jehle, Janusch and Vu, Tuan-Anh and Le, Duc Viet and Vo, Tu and Nguyen, Phi Le and Kawanishi, Yasutomo and Komamizu, Takahiro and Ide, Ichiro and Frey, Julian and Kattenborn, Teja},
  journal = {arXiv preprint arXiv:2606.27491},
  year    = {2026}
}