Issue in the classification optimization algorithm for large data sets?

I believe I have encountered a bug in Mathematica 12.

The Classify[] function throws errors when both of the following hold:

  1. The training set has significantly more than $10^5$ examples.
  2. No Method->"_" option is given, i.e. the automatic search for the optimal algorithm is active.

(When either of these conditions is changed, the classifier training proceeds correctly.)

The problem depends on the Mathematica version. (I elaborate on this below.)
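The two conditions can be sketched as follows. This is only an illustration on hypothetical synthetic data (the names `n`, `features`, `labels`, `data`, and the choice of "LogisticRegression" are assumptions, not taken from my notebook), and I have not checked whether such data trigger the same messages.

```mathematica
(* Hypothetical synthetic data; this is NOT the data set from the post *)
n = 200000;  (* well above 10^5 examples *)
features = RandomReal[{-1, 1}, {n, 4}];
labels = If[Total[#] > 0, "A", "B"] & /@ features;
data = Thread[features -> labels];

(* Both conditions hold: large set, no Method option, automatic search active *)
cAuto = Classify[data];

(* Condition 2 changed: an explicitly specified method *)
cFixed = Classify[data, Method -> "LogisticRegression"];

(* Condition 1 changed: a subset well below 10^5 examples *)
cSmall = Classify[RandomSample[data, 50000]];

(* Which method the automatic search ended up with *)
Information[cAuto, "Method"]
```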

The following errors appear together (repeated several times):

a)

NetTrain::encgenfail2: Could not encode one or more inputs for "Output" port: supplied data was a length-64 vector of real numbers, but expected a class. The invalid inputs had indices {158629, ..., << 14 >>}

b)

LibraryFunction::typerr: An error occurred in the tree_evaluation.

c)

Part::pkspec1: The expression -LibraryFunctionError[LIBRARYTYPEERROR, 1] can not be used as a part specification.

and subsequent errors related to list and iteration indices and function domains.

Classify[] still returns a working classifier, but training takes a long time, and the result is sometimes built with a non-optimal method and performs correspondingly poorly.

It seems that for large (but not very large!) data sets, the procedure that searches for the optimal classification method fails.

Training with each method specified explicitly and then picking the best one is not a satisfactory solution, partly because each method has its own variants and meta-parameters that are optimized. I do not know whether, when a method is specified, optimization over its variants and meta-parameters is still done or their default values are used. On smaller data sets, where both approaches work, a classifier trained with an explicitly specified method can perform worse than the automatic one, even when the specified method is the one that the automatic search finds best.
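For reference, the manual comparison I mean looks roughly like the sketch below. The synthetic data, the 80/20 split, and the list of methods are assumptions made for illustration only. It does not settle whether variants and meta-parameters are tuned when the method is given explicitly.

```mathematica
(* Hypothetical synthetic data, small enough that training proceeds correctly *)
features = RandomReal[{-1, 1}, {20000, 4}];
data = Thread[features -> (If[Total[#] > 0, "A", "B"] & /@ features)];

(* Hold out 20% of the examples for testing *)
{train, test} = TakeDrop[RandomSample[data], Floor[0.8 Length[data]]];

(* A few of the built-in method names; the selection is arbitrary *)
methods = {"LogisticRegression", "RandomForest",
   "GradientBoostedTrees", "NearestNeighbors"};

(* Train one classifier per method and measure accuracy on the held-out set *)
results = Association @ Table[
    m -> ClassifierMeasurements[
      Classify[train, Method -> m], test, "Accuracy"],
    {m, methods}];

(* Methods ordered from best to worst accuracy *)
Reverse[Sort[results]]
```

The accuracy of `Classify[train]` without a Method option can be measured in the same way and compared against the best entry in `results`.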

In version 11.3 this error did not show up. I do not know, however, whether it was absent or simply invisible, because even then the classification performance on large sets seemed insufficient to me and the choice of method surprising.

Does anyone have an idea how to force it to work properly, or is there any hope for a patch?

To be really precise, I attach links to the notebook and the database:

POSTED BY: Karol Lawniczak
4 Replies

The same topic is present on Mathematica StackExchange: link.

POSTED BY: Karol Lawniczak

Without the actual network, the audience here can only guess. Can you post the code etc?

POSTED BY: Sander Huisman