Group Abstract

Message Boards

WOLFRAM COMMUNITY

18.8K Views

3 Replies

2 Total Likes

View groups...

Follow this post

Share this post:

GROUPS:

Simple Math Problem Shows Massive Flaw In All Machine Learning Algorythims

David Johnston

David Johnston, Artificial Brilliance, Inc.

Posted 10 years ago

I am testing sequence prediction. If things happen in a specific sequence, what will the likely next step in the sequence be? Computational power should be included in machine learning algorithms but not even Google's "Prediction API" can solve this simple mathematical exercise. In fact, in my naivety, I reported it as a possible bug and they emailed me: As far as your example, the Prediction API will not learn that particular pattern because when trying to classify, it will choose from one of the output classes seen in the training set, so without instances of class "10th", it will not even know that such a class exists, does that make sense? I have used two simple data sets. Categorization: catData={ {"1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th"}, {"2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th"}, {"3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"}, {"4th", "5th", "6th", "7th", "8th", "9th", "10th", "1st"}, {"5th", "6th", "7th", "8th", "9th", "10th", "1st", "2nd"}, {"6th", "7th", "8th", "9th", "10th", "1st", "2nd", "3rd"}, {"7th", "8th", "9th", "10th", "1st", "2nd", "3rd", "4th"}, {"8th", "9th", "10th", "1st", "2nd", "3rd", "4th", "5th"}, {"9th", "10th", "1st", "2nd", "3rd", "4th", "5th", "6th"} } c=Classify[catData->1]; Classification request: c[{"1st","2nd","3rd","4th","5th","6th","7th"}] I have purposely left out a sequence pattern starting with "10th" to see if the model can predict it given the right data sample. Expected Result: Output="10th" Unexpected Result: Output="1st" I also attempted this with a regression model. Regression: predictData={ {1,2,3,4,5,6,7,8}, {2,3,4,5,6,7,8,9}, {3,4,5,6,7,8,9,10}, {4,5,6,7,8,9,10,1}, {5,6,7,8,9,10,1,2}, {6,7,8,9,10,1,2,3}, {7,8,9,10,1,2,3,4}, {8,9,10,1,2,3,4,5}, {9,10,1,2,3,4,5,6} } p=Predict[predictData->1]; Prediction request: p[{1,2,3,4,5,6,7}] Expected Result: Output=10 Unexpected Result: Output=6 This represents a real world scenario where I am trying to pick the best option to present to a user based on their past behavior. So the next item is unknown because no user has chosen it before and the classification label hasn't been added to the training yet because it's crowd generated dynamically. However, we don't want to just give them any random option. We want to give them something highly relevant during the first interaction with the system. The whole premise of "prediction" seems to imply the ability to predict a categorical label based on other example labels. In this data set, it should be easy to identify the pattern and auto generate 10th as a category label. In this scenario, mathematics is all that is needed to determine the next item in a sequence. I guess if we have tons of sequences not all in perfect order but close, machine learning algorithms becomes much more useful. If there is already a known solution to this issue, that would be amazing to know. Forgive me for not being trained/educated enough to question why this isn't already built into all machine learning algorithms already. I suppose someone could use transpose to create a model out of the columns in the dataset in addition to the rows. However, in my view, anything related to permutations, shifting, changing, organizing, reordering, etc. of the data to find better predictions should be natively built into the algorithms. Even auto generating new columns/features based on patterns, sequences, math equations, etc. that can be derived from the existing data should be automated. None of these things require human input and are mathematical in nature. I bring this to the Wolfram community because the WL is uniquely positioned to tap into knowledge and computation in order to create drastically superior machine learning functions. If everything can be symbolic, everything can be computed.

POSTED BY: David Johnston

3 Replies

Sort By:

David Johnston

David Johnston, Artificial Brilliance, Inc.

Posted 10 years ago

POSTED BY: David Johnston

Sean Clarke

Sean Clarke, Wolfram Research

Posted 10 years ago

Google's response is spot on. Fundamentally, it cannot return "10" if "10" isn't even a category that it was trained with. Classify is basically a set of supervised machine learning algorithms. Looking at your input data as a human being, I would (understanding the input to mean what it means to the Classify) return 1 as well. If you want to understand what Classify is doing, you can see what methods it chose using the ClassifierInformation function. If you do, you will see that Classify is using a series of markov chains, and that's not going to do what you want. You are asking for an algorithm that can robustly handle recognizing sequences with "anything related to permutations, shifting, changing, organizing, reordering, etc. of the data". I would encourage you to take a look into different machine learning algorithms, both supervised and unsupervised and see what kinds of properties they have. What you are asking for here, is not reasonable to expect from a generic ML algorithm. If you've discovered a "Massive Flaw in All Machine Learning Algorythims", it is that they less magical and intelligent than they often appear at first. Classify is by far the easiest to use ML tool I've ever seen, but if you want to push it beyond its conventional uses, you must understand how ML algorithms work and what their limitations are.

POSTED BY: Sean Clarke

David Johnston

David Johnston, Artificial Brilliance, Inc.

Posted 10 years ago

Found something that works. Not sure if it can be applied in other more general situations yet. Basically, if the desired label appears anywhere in the data samples, it can find it. Basically, you just create every combination possible. So far it only works with the Classify. Using the numerical data set and Predict it never gets the right answer. subs = Union[Select[ArrayFlatten[Subsets[#] & /@ Transpose[catData], 1], Length[#] > 0 &]]; c = Classify[subs -> 1, PerformanceGoal -> "Quality"] c[{"1st", "2nd", "3rd", "4th", "5th", "6th", "7th"}] Out="10th" I am sure this would drastically slow down most processes. With this tiny data set, it ballooned up to 2240 samples. I suppose it could be trimmed down by changing the Select length to something like 50% of the average sample length or something like that. subs = Union[Select[ArrayFlatten[Subsets[#] & /@ catData, 1], Length[#] > Total[Length[#] & /@ catData]/Length[catData]*.5 &]]; This trims it down to 598 variations, which is still an almost 600% bloat. I did notice it fails if your length Select gets close to the same length as the samples. The average length is 8 but if you limit Select to 6, 7 or 8 it will fail to find the right answer. This doesn't really accomplish my goal of 3D predictions. If there are patterns in the columns there should be a way to include that in an algorithm. Limiting the algo to rows only may be good for some things but it's not good for what I believe is called semi-unsupervised learning. In a way using Transpose and Subsets is a short cut to getting the right answer. I wonder if there is a more mathematically sound way to do it that could be generally applied to any sequential data set.

POSTED BY: David Johnston

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback