Here's another order of magnitude, achieved by simply replacing Floor with Quotient, which will allow Tally to operate on an integer list.
In[1]:= d = RandomReal[1, 5*^6];
In[2]:= BinCounts[d, {0, 1, 0.1}] // AbsoluteTiming
Out[2]= {4.690474, {500670, 499971, 498900, 499459, 500789, 501261, 499914, 499137, 499619, 500280}}
In[3]:= SortBy[Tally@Floor[d, 0.1], First][[All, 2]] // AbsoluteTiming
Out[3]= {0.717640, {500670, 499971, 498900, 499459, 500789, 501261, 499914, 499137, 499619, 500280}}
In[4]:= SortBy[Tally@Quotient[d, 0.1], First][[All, 2]] // AbsoluteTiming
Out[4]= {0.053532, {500670, 499971, 498900, 499459, 500789, 501261, 499914, 499137, 499619, 500280}}
Because of what Tally (probably) does internally, this is not the most algorithmically efficient version, but it's the fastest I could come up with
in Mathematica. For few bins it's faster than what I managed with Compile.
Anyway, just the fact that the compiled version is so much faster than BinCounts or HistogramList shows that BinCounts could (should?) be optimized more than it is currently, at least for packed arrays and equal width bins. As a user I wouldn't want to rewrite these types of functions for speed.