I had a need/desire to figure out how the DistributionSmoothing option in Classify really works when the Method used is NearestNeighbors. After some poking around, I believe I figured it out. Here's conceptually what it does; I suspect the built-in code is several orders of magnitude more efficient. My code is for explanatory purposes only.

Get a list of the classes -- the right answers -- of the training set. Create another "augmentation list" that has one member of each possible class in the training set. Put these two lists together and find their distribution. Call this the "augmented global distribution."

Now get the distribution of the decisions your classifier makes about the neighbors. Call this the "local distribution."

Form a mixture distribution of the local distribution and the augmented global distribution. The weights on the mixture are 1 for the local distribution and s/n for the augmented global distribution, where s is the distribution smoothing parameter and n is the number of neighbors you are using.
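Since MixtureDistribution normalizes its weights, the predicted probability for a class $c$ works out to

$$P(c) \;=\; \frac{P_{\mathrm{local}}(c) + \frac{s}{n}\,P_{\mathrm{global}}(c)}{1 + \frac{s}{n}}.$$

At $s = 0$ this is the pure local distribution; as $s$ grows it shifts toward the augmented global distribution.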

The idea is that as the distribution smoothing parameter gets bigger, the particular distribution of neighbors in the locality matters less and less and the global distribution matters more. I assume one augments the list so that every class always gets nonzero probability; adding one member of each class is essentially add-one (Laplace) smoothing of the global distribution, and it avoids degenerate cases such as a global list containing just one class.
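The whole recipe fits in a few lines. Here is a sketch in Python — my reconstruction only; the function name and signature are my own invention, and this is the conjectured arithmetic, not Wolfram's actual implementation:

```python
from collections import Counter

def smoothed_probability(training_classes, neighbor_classes, s, cls):
    """Conjectured DistributionSmoothing rule: mix the local neighbor
    distribution with the augmented global distribution using mixture
    weights 1 (local) and s/n (global), then renormalize."""
    n = len(neighbor_classes)
    # "Augmented" training labels: the originals plus one of each class.
    augmented = list(training_classes) + list(set(training_classes))
    p_local = Counter(neighbor_classes)[cls] / n
    p_global = Counter(augmented)[cls] / len(augmented)
    # MixtureDistribution normalizes the weights {1, s/n}.
    return (p_local + (s / n) * p_global) / (1 + s / n)
```

The worked example below can be used to check this sketch against Classify's output.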

Here's a simple example.

data = {1 -> True, 2 -> True, 3 -> True, 4 -> False, 5 -> True,
6 -> True, 7 -> False, 8 -> False, 9 -> True};

Now form the augmented list and create augmentedGlobalDistribution using EmpiricalDistribution:

augmentedList = Join[Last /@ data, {True, False}]
augmentedGlobalDistribution = EmpiricalDistribution[augmentedList]
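To make the augmented counts concrete (plain arithmetic, not anything Mathematica-specific): the training set has six True and three False labels, so the augmented list has seven True and four False, and the augmented global probability of True is 7/11. For example, in Python:

```python
from collections import Counter

# The nine training labels from the example, plus one of each class.
labels = [True, True, True, False, True, True, False, False, True]
augmented = labels + [True, False]
counts = Counter(augmented)
print(counts[True], counts[False])    # 7 4
print(counts[True] / len(augmented))  # 7/11, about 0.636
```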

Form the local distribution. I'll arbitrarily select person 5 as the individual of interest. Note that Nearest counts the subject itself (at distance zero) as one of its own neighbors, so the three nearest points here are 5, 4, and 6.

subject = 5;
neighbors = 3;
nf = Nearest[data];
localDistribution = EmpiricalDistribution[nf[subject, neighbors]]

Now let's form the mixture distribution of the augmented global distribution and the local distribution. I do so by creating a function in which the argument ds represents the distribution smoothing parameter.

predictedProbability[ds_] :=
MixtureDistribution[{1, ds/neighbors}, {localDistribution,
augmentedGlobalDistribution}]

We now use that to predict the probability that our subject (person 5) will be True when I use distribution smoothing parameters of 0, 1, 2, and 4.

Map[PDF[predictedProbability[#], True] &, {0, 1, 2, 4}]

The answer is

{0.666667, 0.659091, 0.654545, 0.649351}

Now let's see what happens when we use Classify with the same distribution smoothing parameters:

Map[Classify[data,
Method -> {"NearestNeighbors", "NeighborsNumber" -> neighbors,
"DistributionSmoothing" -> #},
TrainingProgressReporting -> None][subject, "Probabilities"][
True] &, {0, 1, 2, 4}]

We get the same answers! So, is this proof that I have successfully reverse engineered the DistributionSmoothing option? No. But am I confident in my work? Yes. The algorithm makes sense.