We present an implementation of the hierarchical tree algorithm on the individual timestep algorithm (the Hermite scheme) for collisional $N$-body simulations, running on GRAPE-9 system, a special-purpose hardware accelerator for gravitational many-body simulations. Such combination of the tree algorithm and the individual timestep algorithm was not easy on the previous GRAPE system mainly because its memory addressing scheme was limited only to sequential access to a full set of particle data. The present GRAPE-9 system has an indirect memory addressing unit and a particle memory large enough to store all particles data and also tree nodes data. The indirect memory addressing unit stores interaction lists for the tree algorithm, which is constructed on host computer, and, according to the interaction lists, force pipelines calculate only the interactions necessary. In our implementation, the interaction calculations are significantly reduced compared to direct $N^2$ summation in the original Hermite scheme. For example, we can archive about a factor 30 of speedup (equivalent to about 17 teraflops) against the Hermite scheme for a simulation of $N=10^6$ system, using hardware of a peak speed of 0.6 teraflops for the Hermite scheme.
↧