
Simple Math Problem Shows Massive Flaw In All Machine Learning Algorithms

POSTED BY: David Johnston
3 Replies
POSTED BY: David Johnston
POSTED BY: Sean Clarke

Found something that works. I'm not sure yet whether it can be applied in other, more general situations. Basically, if the desired label appears anywhere in the data samples, it can find it: you just create every possible combination of the values.

So far it only works with Classify. With the numerical data set and Predict, it never gets the right answer.

subs = Union[Select[ArrayFlatten[Subsets[#] & /@ Transpose[catData], 1], Length[#] > 0 &]];

c = Classify[subs -> 1, PerformanceGoal -> "Quality"]

c[{"1st", "2nd", "3rd", "4th", "5th", "6th", "7th"}]

Out="10th"
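Here is a minimal sketch of the same augmentation wrapped as a reusable function. It assumes the input is a list of lists of category strings. The post does not show catData, so toyData below is made up for illustration, and augment is a hypothetical helper name. It uses Join @@ to concatenate the per-list subset collections, which should play the same role as ArrayFlatten[..., 1] in the line above.

```wolfram
(* Hypothetical toy data; the post's catData is not shown *)
toyData = {{"a", "b", "c"}, {"b", "d"}};

(* augment (hypothetical name): every non-empty subset of every inner list,
   concatenated into one list and de-duplicated (and sorted) by Union *)
augment[data_List] := Union[Select[Join @@ (Subsets /@ data), Length[#] > 0 &]]

(* 7 subsets from the first list + 3 from the second, minus the repeated {"b"} *)
augment[toyData]
```

Passing Transpose[catData] instead of toyData should mirror the column-wise version in the post.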

I am sure this would drastically slow down most processes. With this tiny data set, it ballooned to 2240 samples. I suppose it could be trimmed down by setting the Select length threshold to something like 50% of the average sample length.
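A quick count shows why the data balloons. Subsets of an n-element list returns 2^n subsets, one of which is empty. If Subsets is applied to m lists of lengths n_1, ..., n_m and only the empty subset is dropped (as in the first Select above), the number of candidates before Union removes repeats is:

$$N = \sum_{i=1}^{m} \left(2^{n_i} - 1\right)$$

For instance, a single 8-element list contributes 2^8 - 1 = 255 non-empty subsets on its own. Union can only shrink this, so the final count (2240 here) is at most N, but the bound itself grows exponentially with list length.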

subs = Union[Select[ArrayFlatten[Subsets[#] & /@ catData, 1], Length[#] > Total[Length[#] & /@ catData]/Length[catData]*.5 &]];

This trims it down to 598 variations, which is still an almost 600% bloat.
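The trimmed version can be sketched the same way. This is an illustration, not the post's exact code: augmentTrimmed and toyData are hypothetical names, and frac stands for the fraction of the mean list length (0.5 for the 50% threshold). The mean is computed once with With rather than being recomputed inside the Select predicate for every candidate.

```wolfram
(* Hypothetical toy data; the post's catData is not shown *)
toyData = {{"a", "b", "c"}, {"b", "d"}};

(* augmentTrimmed (hypothetical name): keep only subsets longer than
   frac times the mean length of the inner lists *)
augmentTrimmed[data_List, frac_?NumericQ] :=
  With[{minLen = frac*Mean[Length /@ data]},
    Union[Select[Join @@ (Subsets /@ data), Length[#] > minLen &]]]

(* mean length is 5/2, so the threshold is 1.25 and only subsets of 2+ items survive *)
augmentTrimmed[toyData, 0.5]
```

Subsets also accepts a size range, Subsets[list, {kmin, kmax}], which could avoid generating the short subsets at all; the filter form is kept here because it mirrors the Select in the post.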

I did notice that it fails if the Select length threshold gets close to the sample length. The average length is 8, but if I limit Select to lengths of 6, 7, or 8, it fails to find the right answer.

This doesn't really accomplish my goal of 3D predictions. If there are patterns in the columns, there should be a way to include them in an algorithm. Limiting the algorithm to rows only may be good for some things, but it's not good for what I believe is called semi-unsupervised learning.

In a way, using Transpose and Subsets is a shortcut to getting the right answer. I wonder whether there is a more mathematically sound way to do it that could be applied generally to any sequential data set.

POSTED BY: David Johnston