Classifying Japanese characters from the Edo period

POSTED BY: Marco Thiel
10 Replies
POSTED BY: Daniel Lichtblau
POSTED BY: Marco Thiel
POSTED BY: Daniel Lichtblau
POSTED BY: Marco Thiel

With what seems to be the best tuning I could manage, it gets 98% on MNIST. In contrast, the best methods, which I believe are NN-based, hit around 99.7% correct, if memory serves. There is a related set from the US Postal Service that is somewhat more challenging, with the best methods "only" getting around 98%, I think.
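To put those percentages in perspective, it can help to compare error rates rather than accuracies. The following is only a rough sketch: the function names are made up for illustration, and the sample count of 10,000 assumes the standard MNIST test split, which may not be the split used for the figures above.

```python
def error_ratio(acc_a, acc_b):
    """Return how many times more errors accuracy acc_a implies than acc_b."""
    return (1.0 - acc_a) / (1.0 - acc_b)


def expected_errors(accuracy, n_samples):
    """Return the approximate number of misclassified samples."""
    return round((1.0 - accuracy) * n_samples)


# Assumption: 10,000 test images, as in the standard MNIST test split.
N_TEST = 10_000

ratio = error_ratio(0.98, 0.997)          # roughly 6.7 times as many errors
errors_98 = expected_errors(0.98, N_TEST)
errors_997 = expected_errors(0.997, N_TEST)
```

Under that assumption, 98% corresponds to about 200 misclassified digits and 99.7% to about 30, so the gap is larger than the raw percentages suggest.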

I confess I may have borrowed a brain for that particular bit of work.

POSTED BY: Daniel Lichtblau
POSTED BY: Vitaliy Kaurov
POSTED BY: Marco Thiel
POSTED BY: EDITORIAL BOARD

It'll take me a while to download the dataset and check, but it looks like there's a lot of hentaigana. I can only recognize some of them. https://en.wikipedia.org/wiki/Hentaigana

To summarize, older Japanese has many variant characters that can be used to represent the same sound, which makes your learning task much harder. If you could sort them out and learn the variant characters, you'd probably get even better results.
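One way to read "sort them out" is to give each variant glyph its own class during training and then collapse the predictions back to sounds when scoring. Here is a minimal sketch of that bookkeeping only. The variant names and the mapping are hypothetical placeholders, not labels from the actual dataset, and no classifier is included.

```python
# Hypothetical mapping from variant-level class labels to the sound each
# variant represents. Real labels would come from sorting the dataset.
VARIANT_TO_SOUND = {
    "ka_variant_1": "ka",
    "ka_variant_2": "ka",
    "ha_variant_1": "ha",
    "ha_variant_2": "ha",
}


def to_sound(variant_label):
    """Collapse a variant-level label to the sound it stands for."""
    return VARIANT_TO_SOUND[variant_label]


def sound_accuracy(predicted_variants, true_sounds):
    """Fraction of predictions whose sound matches the true sound."""
    if len(predicted_variants) != len(true_sounds):
        raise ValueError("predictions and ground truth differ in length")
    if not true_sounds:
        raise ValueError("no samples to score")
    hits = sum(to_sound(p) == t for p, t in zip(predicted_variants, true_sounds))
    return hits / len(true_sounds)


# Made-up predictions from a classifier trained on variant-level labels.
predicted = ["ka_variant_1", "ka_variant_2", "ha_variant_1", "ka_variant_1"]
truth = ["ka", "ka", "ha", "ha"]
score = sound_accuracy(predicted, truth)
```

The idea is that a prediction of the wrong variant still counts as correct if the sound matches, while the classifier itself only has to learn visually coherent classes.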

POSTED BY: Sean Clarke
POSTED BY: Marco Thiel