← Carl Osborne

GPU N-Body Simulation

Real-time astronomy simulation with a GPU Barnes-Hut solver, parallel radix sort, and relativistic rendering.

2025 Swift · Metal github.com/osbo/astro

Astro: High-Performance Barnes-Hut N-Body Simulation

License: MIT

A sophisticated N-body gravitational simulation written in Swift and Metal, featuring advanced GPU optimizations, custom radix sorting, and optimized octree data structures. This project demonstrates high-performance computing techniques optimized for Apple Silicon Macs, achieving real-time simulation of 500,000+ particles through algorithmic optimization rather than brute force.

https://github.com/user-attachments/assets/08e9cf07-12ed-4a69-9199-036abba64c4d

Performance Benchmarks

Metric Value Context
Particle Count 500,000+ Real-time simulation
Frame Rate 120 FPS Target performance
Algorithm Complexity O(n log n) Barnes-Hut optimization
GPU Memory Efficiency Contiguous storage Zero pointer overhead
Spatial Resolution 21 bits per axis Morton code optimization

Core Technical Innovations

Layer-by-Layer Octree Storage

The octree implementation uses a storage pattern where each layer is stored sequentially from leaves to root, with collision space pre-allocated at the end of each layer. Layer offsets and sizes are mathematically pre-calculated based on maximum particle count, allowing each layer to determine the next layer's starting position without pointer traversal.

This design eliminates pointer-based tree structures, keeping all octree data in contiguous memory regions that maximize GPU residency and enable efficient parallel processing across the entire tree structure.

Custom GPU Radix Sort

The project implements a complete GPU radix sort from scratch, providing full control over the sorting process without relying on Metal Performance Shaders. This custom implementation features:

This custom implementation was necessary because Metal Performance Shaders don't provide the level of control needed for the specific Morton code sorting requirements of the Barnes-Hut algorithm.

Morton Code Optimization

The Morton code implementation supports up to 21 bits per spatial direction (63 total bits), providing extremely fine spatial resolution. Testing with particle counts from 10,000 to 500,000 revealed that using only 7 layers (21 bits total) provides the optimal balance between spatial partitioning and individual particle isolation.

At this depth, most individual particles maintain their own leaf nodes, ensuring maximum accuracy in the Barnes-Hut approximation while minimizing computational overhead. This optimization was determined through empirical testing rather than theoretical analysis.

Implementation Details

Barnes-Hut Force Calculation

The force calculation kernel implements a stack-based traversal algorithm for optimal performance. Each particle traverses the octree using a depth-first approach, maintaining a stack of nodes to visit. The algorithm applies the theta criterion (θ = 0.7) to determine when to approximate distant particle groups as single masses.

Key optimizations include:

GPU-Optimized Data Structures

All data structures are designed for optimal GPU caching and memory access patterns:

Engineering Trade-offs: Performance vs. Visual Fidelity

This project demonstrates deliberate engineering decision-making by prioritizing simulation scale over visual effects. The current version focuses entirely on core simulation performance, achieving 500,000+ particles through algorithmic optimization.

Previous Version: Visual Fidelity Focus

The previous version prioritized visual fidelity and advanced graphics techniques, showcasing sophisticated rendering capabilities:

https://github.com/user-attachments/assets/249034c1-7b04-4f1e-8b72-a393e5383a83

Advanced Gravitational Effects:

Comprehensive Post-Processing Pipeline:

Current Version: Performance Focus

The current version represents a deliberate engineering trade-off, removing advanced graphics features to achieve unprecedented simulation scale. This demonstrates:

Current Configuration

The simulation is currently configured for maximum particle count:

The configuration can be adjusted in Renderer.swift to optimize for different performance targets.

Summary

This project demonstrates expertise in:

Technical Skills

Engineering Judgment

The combination of algorithmic optimization, GPU optimization, and performance engineering makes this project a compelling demonstration of advanced software engineering skills in high-performance computing and graphics programming.


This project represents a comprehensive implementation of high-performance computing techniques, demonstrating expertise in GPU programming, algorithm optimization, and real-time graphics programming.