We describe a simple and fast algorithm for identifying friends-of-friends clusters and prove its correctness. The algorithm avoids unnecessary expensive neighbor queries, uses minimal memory overhead, and resists slowdown in high over-density regions. We define our algorithm formally based on pair enumeration, a problem that has been heavily studied in fast 2-point correlation codes; our reference implementation employs a dual KD-tree correlation function code. We construct halos in a hierarchical merger tree and use a splay operation to reduce the average cost of identifying the root of a cluster from $O[\log L]$ to $O[1]$ (where $L$ is the size of a cluster) without additional memory costs. This reduces the overall time complexity of merging trees from $O[L\log L]$ to $O[L]$, cutting the number of operations by orders of magnitude. We next introduce a pruning operation that skips pair enumeration between two fully self-connected KD-tree nodes. This improves the robustness of the algorithm, reducing the cost of exploring high-density peaks from $O[\delta^2]$ to $O[\delta]$. We show that for cosmological data sets the algorithm eliminates more than half of the enumerations for typically used linking lengths $b \sim 0.2$, and empirically scales as $O[\log b]$ in the large-$b$ (linking length) limit. Furthermore, our algorithm is extremely simple and easy to implement on top of an existing pair enumeration code, reusing the optimization effort that has been invested in fast correlation function codes.
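The core merging step, in which each enumerated pair joins two clusters and each cluster is identified by its root, can be illustrated with a union-find (disjoint-set) forest. The sketch below is a minimal, non-authoritative illustration under several assumptions: it uses a brute-force $O[N^2]$ pair enumeration in place of the dual KD-tree, takes an absolute linking length rather than $b$ in units of the mean interparticle separation, and uses standard path compression as a stand-in for the splay operation. The function names `find_root`, `union`, `enumerate_pairs`, and `fof_labels` are hypothetical, and the KD-tree node pruning is not shown.

```python
from typing import Iterator, List, Sequence, Tuple

Point = Sequence[float]


def find_root(parent: List[int], i: int) -> int:
    """Return the root of i's cluster, then point every visited node at the root.

    The second pass (path compression) is assumed here as a stand-in for the
    splay operation: later root lookups on this path take O(1) steps.
    """
    root = i
    while parent[root] != root:
        root = parent[root]
    # Re-attach every node on the path directly to the root.
    while parent[i] != root:
        parent[i], i = root, parent[i]
    return root


def union(parent: List[int], a: int, b: int) -> None:
    """Merge the clusters containing a and b (no-op if already merged)."""
    ra, rb = find_root(parent, a), find_root(parent, b)
    if ra != rb:
        # Attach the larger-index root under the smaller one so that
        # each cluster is labeled by its smallest particle index.
        if ra < rb:
            parent[rb] = ra
        else:
            parent[ra] = rb


def enumerate_pairs(points: Sequence[Point], linking_length: float) -> Iterator[Tuple[int, int]]:
    """Yield all pairs (i, j), i < j, closer than the linking length.

    Brute force for illustration only; a KD-tree pair enumerator would replace this.
    """
    ll2 = linking_length ** 2
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((x - y) ** 2 for x, y in zip(points[i], points[j]))
            if d2 <= ll2:
                yield i, j


def fof_labels(points: Sequence[Point], linking_length: float) -> List[int]:
    """Assign each particle the root index of its friends-of-friends cluster."""
    parent = list(range(len(points)))
    for i, j in enumerate_pairs(points, linking_length):
        union(parent, i, j)
    return [find_root(parent, i) for i in range(len(points))]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.3, 5.0)]
    print(fof_labels(pts, 0.6))  # prints [0, 0, 0, 3, 3]
```

Because the merge only consumes a stream of pairs, the brute-force `enumerate_pairs` can be swapped for any existing pair enumeration code without touching the union-find logic, which mirrors the reuse argument made above.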