WOLFRAM COMMUNITY

GROUPS:

Data Science Wolfram Language Machine Learning

Issue in the classification optimization algorithm for large data sets?

Karol Lawniczak, University of Lodz

Posted 5 years ago

I believe I have encountered a bug in Mathematica 12. The `Classify[]` function throws errors when both of the following hold simultaneously:

1. The training set has significantly more than $10^5$ examples.
2. The `Method->"_"` option is absent, i.e. the procedure that searches for the optimal algorithm is active.

When either of these conditions is changed, the classifier training proceeds correctly. The problem also depends on the Mathematica version, as I elaborate further on.

The following errors appear together (repeated several times):

a) NetTrain::encgenfail2: Could not encode one or more inputs for "Output" port: supplied data was a length-64 vector of real numbers, but expected a class. The invalid inputs had indices {158629, ..., <<14>>}

b) LibraryFunction::typerr: An error occurred in the tree_evaluation.

c) Part::pkspec1: The expression -LibraryFunctionError[LIBRARYTYPEERROR, 1] cannot be used as a part specification.

These are followed by further errors related to list and iteration indices and function domains.

The function still returns a working classifier, but training takes a long time, and the result is sometimes obtained by a non-optimal method and performs accordingly. It seems that for large (but not very large!) data sets, the procedure that optimizes the classification method fails.

Running the classification with each method specified explicitly and selecting the best one is not a satisfactory solution, among other reasons because each method has its own variants, and these have meta-parameters that are optimized. I do not know whether, when a method is specified, optimization over its variants and meta-parameters is still performed or their default values are used. On smaller data sets, where both approaches work, classification with an explicitly specified method performs worse, even when that method is the one the automatic search selects as best.

In version 11.3 this error did not show up. I do not know, however, whether it was absent or simply invisible, because it seemed to me that the classification performance was (with large sets) insufficient and the choice of method surprising.

Does anyone have an idea how to force it to work properly, or is there any hope for a patch?

To be really precise, I attach links to the notebook and the database: data, notebook
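The manual approach mentioned above (train with each explicitly specified method, then pick the best) could be sketched as follows. This is only an illustration under assumptions: `compareMethods` is a hypothetical helper, the 20% hold-out split and the list of methods are arbitrary choices, and it does not address the open question of whether variants and meta-parameters are tuned when a method is specified.

```mathematica
(* Hypothetical helper, not part of the original post: train one classifier
   per named method and rank the methods by accuracy on a held-out set. *)
compareMethods[data_List, methods_List, testFraction_ : 0.2] :=
 Module[{shuffled, nTest, test, train},
  shuffled = RandomSample[data];                  (* random permutation *)
  nTest = Round[testFraction*Length[shuffled]];   (* size of the hold-out set *)
  test = Take[shuffled, nTest];
  train = Drop[shuffled, nTest];
  (* method name -> accuracy, sorted from best to worst *)
  Reverse @ Sort @ AssociationMap[
     ClassifierMeasurements[
       Classify[train, Method -> #], test, "Accuracy"] &,
     methods]
  ]

(* Illustrative call; the method list is an assumption *)
compareMethods[trainingset,
 {"DecisionTree", "RandomForest", "LogisticRegression", "NearestNeighbors"}]
```

The result is an association from method name to held-out accuracy, which makes it easy to compare against the classifier returned by the automatic search on a subsample where that search still works.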

POSTED BY: Karol Lawniczak

4 Replies

Karol Lawniczak, University of Lodz

Posted 5 years ago

The same topic is present on Mathematica StackExchange: link.

POSTED BY: Karol Lawniczak

Karol Lawniczak, University of Lodz

Posted 5 years ago

To be really precise, I attach links to the notebook and the database in the edited version of the question.

POSTED BY: Karol Lawniczak

Sander Huisman, University of Twente

Posted 5 years ago

Without the actual network, the audience here can only guess. Can you post the code etc?

POSTED BY: Sander Huisman

Karol Lawniczak, University of Lodz

Posted 5 years ago

The code is really simple:

`classify = Classify[trainingset]`

The problem is in the data set, or rather in its size, because its structure is ordinary and simple:

`trainingset = {{1.1397,"abc",5.76211,26.7396}->"A", {3.21085,"klm",47.1485,17.5633}->"C", {2.57019,"xyz",59.5656,13.73}->"A", ..., {1.04451,"klm",13.9758,1.44347}->"B"};`

where the length of the list is of the order of $n=10^6$.

When I subsample the training set to a length of $n'=10^5$, for example by `classify=Classify[RandomSample[trainingset,10^5]]`, the problem disappears. The problem also disappears when I specify a method, e.g. `classify=Classify[trainingset,Method->"DecisionTree"]` (or any other method). These errors did not appear in Mathematica version 11.3.

There is nothing more to specify about the code. The error messages are the ones I cited above.
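For readers without access to the attached data, the three cases described above can be tried on a synthetic stand-in. This is a minimal sketch under assumptions: `makeExample` is a hypothetical generator, the value ranges and the random class labels are invented to mimic the row structure shown above, and it has not been confirmed that such data triggers the same errors.

```mathematica
(* Hypothetical stand-in for the real data: each example is
   {real, string, real, real} -> class, as in the rows quoted above.
   Ranges and labels are assumptions, not taken from the real data set. *)
makeExample[] := {RandomReal[{1, 4}], RandomChoice[{"abc", "klm", "xyz"}],
    RandomReal[{0, 60}], RandomReal[{0, 30}]} ->
   RandomChoice[{"A", "B", "C"}];

n = 10^6;  (* order of magnitude stated in the post *)
synthetic = Table[makeExample[], n];

(* Case 1: automatic method search on the full set (the failing case) *)
full = Classify[synthetic];

(* Case 2: automatic method search on a subsample of 10^5 examples *)
sub = Classify[RandomSample[synthetic, 10^5]];

(* Case 3: fixed method on the full set *)
tree = Classify[synthetic, Method -> "DecisionTree"];
```

Because the labels here are random, the resulting classifiers carry no predictive meaning; the sketch is only useful for checking whether the messages appear in a given Mathematica version.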

POSTED BY: Karol Lawniczak
