GPU-based Branchless Distance-Driven Projection and Backprojection.
Author | |
---|---|
Abstract |
:
Projection and backprojection operations are essential in a variety of image reconstruction and physical correction algorithms in CT. The distance-driven (DD) projection and backprojection are widely used for their highly sequential memory access pattern and low arithmetic cost. However, a typical DD implementation has an inner loop that adjusts the calculation depending on the relative position between voxel and detector cell boundaries. The irregularity of the branch behavior makes it inefficient to be implemented on massively parallel computing devices such as graphics processing units (GPUs). Such irregular branch behaviors can be eliminated by factorizing the DD operation as three branchless steps: integration, linear interpolation, and differentiation, all of which are highly amenable to massive vectorization. In this paper, we implement and evaluate a highly parallel branchless DD algorithm for 3D cone beam CT. The algorithm utilizes the texture memory and hardware interpolation on GPUs to achieve fast computational speed. The developed branchless DD algorithm achieved 137-fold speedup for forward projection and 188-fold speedup for backprojection relative to a single-thread CPU implementation. Compared with a state-of-the-art 32-thread CPU implementation, the proposed branchless DD achieved 8-fold acceleration for forward projection and 10-fold acceleration for backprojection. GPU based branchless DD method was evaluated by iterative reconstruction algorithms with both simulation and real datasets. It obtained visually identical images as the CPU reference algorithm. |
Year of Publication |
:
2017
|
Journal |
:
IEEE transactions on computational imaging
|
Volume |
:
3
|
Issue |
:
4
|
Number of Pages |
:
617-632
|
DOI |
:
10.1109/TCI.2017.2675705
|
Short Title |
:
IEEE Trans Comput Imaging
|
Download citation |