At the time when StepMonitor is invoked, there is no solution available to differentiate yet. There is just a discrete grid of points
$t_1, t_2, \ldots$ for each time step and corresponding approximate solution values
$y_1, y_2, \ldots$. The plot works because, by that point, the solution has already been built up as an interpolating function, and Derivative knows how to differentiate those.
Of course, if you are solving an equation of the form
$y'(t) = f(t, y)$, then why not just use Sow[f[t, y[t]]] to effectively sow the derivative values at each step?
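A minimal sketch of that idea, assuming a first-order equation whose right-hand side is available as a function. The particular f, the initial condition, and the time range below are made up purely for illustration:

```mathematica
(* Hypothetical right-hand side, chosen only for illustration *)
f[t_, y_] := -2 t y;

(* Inside StepMonitor, t and y[t] hold the values at the step just taken,
   so f[t, y[t]] is the derivative value at that grid point *)
{sol, {derivs}} = Reap[
   NDSolve[{y'[t] == f[t, y[t]], y[0] == 1}, y, {t, 0, 2},
    StepMonitor :> Sow[{t, f[t, y[t]]}]]];

(* derivs is a list of {t_i, y'(t_i)} pairs, one per accepted step *)
ListPlot[derivs]
```

Once NDSolve has returned, the sown values can be compared with the derivative of the finished interpolating function, for example with Plot[Evaluate[y'[t] /. sol], {t, 0, 2}].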