The VideoCore IV doesn't have any OpenCL support, to my knowledge. So if someone were to write accelerated BLAS libraries for it, they would be low-level and completely specific to that GPU (if they were really beneficial at all).
Honestly, I think we would be happy just to get the VideoCore doing what it was designed to do for our software, as opposed to making it do something it was never intended to do. We also experimented with performance-tuned BLAS libraries for the ARM CPU, but the Raspberry Pi distribution ships ARMv6 [Raspberry Pi 1] software on its ARMv7 [Raspberry Pi 2] and ARMv8 [Raspberry Pi 3] devices too, which makes that very difficult. Libraries like OpenBLAS don't do the same sort of runtime CPU detection that things like Intel MKL do, so a build is tied to the architecture it was compiled for: the hand-tuned assembly for ARMv6 will often just crash on ARMv7, and a build that targets ARMv7 relies on operations that are missing when you attempt to run it on ARMv6.
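To make the "runtime CPU detection" point concrete, here is a minimal sketch of the dispatch idea in Python. It is only an illustration, not how MKL or OpenBLAS actually work: the kernel names and the `pick_kernel` helper are hypothetical, and the only real API used is `platform.machine()` from the standard library.

```python
import platform

# Hypothetical kernel names, for illustration only. A real library would
# map each entry to a compiled routine tuned for that architecture.
KERNELS = {
    "armv6l": "armv6_vfp",    # e.g. Raspberry Pi 1
    "armv7l": "armv7_neon",   # e.g. Raspberry Pi 2
    "aarch64": "armv8_neon",  # e.g. Raspberry Pi 3 running a 64-bit kernel
}


def pick_kernel(machine: str) -> str:
    """Return the tuned kernel for a machine string, or a portable fallback."""
    # Unknown machines get plain, untuned code that runs everywhere.
    return KERNELS.get(machine, "generic_c")


if __name__ == "__main__":
    # platform.machine() reports what the running kernel sees, which may not
    # match the CPU generation exactly (a 32-bit OS on newer hardware).
    print(pick_kernel(platform.machine()))
```

The fallback is the important part: anything the table doesn't recognise gets the slow but safe path instead of crashing on an unsupported instruction.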
I would definitely be interested if anyone did create such libraries, though. Of course, a tremendous amount of time and validation from vendors and the community would be required to get them up to the quality where they'll be accurate enough in all the scenarios we rely on them for. For now, I'm having to use very basic CBLAS because it's the best way to deliver software that runs on all generations of the Raspberry Pi and is relatively accurate.