Hi Dustin,
In principle, Henrik's tip is right on target. We implemented the commands ImageFileApply, ImageFileFilter, and ImageFileScan to allow for out-of-core image processing. The limiting factor for you may be the small set of supported image file types (e.g. TIFF).
Hence, the first idea would be to convert your data to one of the supported file types and then use the out-of-core capability of Mathematica.
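As a rough sketch of what this first idea could look like once the data is available as TIFF: the file names below are hypothetical, and the example assumes channel values arrive scaled to the range 0 to 1, as they do in ImageApply. Please check the reference pages for the exact conventions.

```mathematica
(* Hypothetical file names; the input is assumed to have been converted to TIFF already *)
in = "frames.tif";

(* Pixel-wise operation, applied to the list of channel values of each pixel.
   Assumes values scaled to 0..1, so 1 - # inverts the image. *)
ImageFileApply[1 - # &, in, "negated.tif"];

(* Neighborhood operation: maximum over a range-1 neighborhood in each channel *)
ImageFileFilter[Max, in, 1, "dilated.tif"];

(* Scan without writing an output file: accumulate the sum of all channel values *)
total = 0;
ImageFileScan[(total += Total[#]) &, in];
total
```

None of these calls needs the whole image in memory at once, which is the point of the out-of-core functions.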
If that does not work, the second approach would be to open an InputStream, read and partition the incoming data into frames, and process them one at a time. I believe a single 8k frame should easily fit into memory. Mathematica has superb pattern matching capabilities to facilitate the parsing of any (binary) file format at hand.
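A minimal sketch of the stream approach could look as follows. It assumes a headerless file of raw 8-bit grayscale frames at 7680 x 4320; the file name, the frame layout, and the helper processFrame are all hypothetical and would have to be adapted to the actual file format.

```mathematica
(* Assumed layout: raw 8-bit grayscale frames, no header, stored back to back *)
{width, height} = {7680, 4320};
frameBytes = width*height;

(* Hypothetical per-frame operation; replace with the actual processing *)
processFrame[img_Image] := ImageMeasurements[img, "Mean"];

stream = OpenRead["frames.raw", BinaryFormat -> True];
results = {};
While[
  (* Read exactly one frame; a short or empty read ends the loop *)
  bytes = BinaryReadList[stream, "Byte", frameBytes];
  Length[bytes] == frameBytes,
  frame = Image[Partition[bytes, width], "Byte"];
  AppendTo[results, processFrame[frame]]
];
Close[stream];
results
```

Only one frame is held in memory at a time. For a format with a header, the header bytes would be read and parsed once before entering the loop.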
Last but not least, I would like to mention that Mathematica has some built-in CUDA / OpenCL functions, such as CUDAFourier, CUDAImageConvolve, etc. Hence, it is not always necessary to link to your own CUDA / OpenCL kernels.
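For illustration, a short sketch of the built-in CUDA functions, assuming a CUDA-capable GPU and a working CUDALink setup. The random test image and the 3 x 3 box kernel are placeholders only.

```mathematica
Needs["CUDALink`"]

If[CUDAQ[],
  (* Placeholder data: a random 512 x 512 grayscale image and a 3 x 3 mean kernel *)
  img = RandomImage[1, {512, 512}];
  kernel = ConstantArray[1/9., {3, 3}];

  (* Convolution and Fourier transform carried out on the GPU *)
  smoothed = CUDAImageConvolve[img, kernel];
  spectrum = CUDAFourier[RandomReal[1, 1024]];,

  Print["No supported CUDA device found."]
]
```

Checking CUDAQ[] first avoids errors on machines without supported hardware.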
For further information, please look up the ref-pages of the functions mentioned above.
Regards, Markus