
Mathematica code to extract tabulated data after conversion from pdf to text

4 Replies
Posted 1 year ago
POSTED BY: Eric Rimbey
Posted 1 year ago

There is a lot of extraneous code in your question. Could you pare it down to just the essentials? I'm also not convinced that you need Classify for this, because you seem to just need to look for specific labels. And you seem to be using an external tool to convert PDF to text, but Mathematica can do that for you, so is there some reason why you need that tool?
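For example, if the fields you need always follow a fixed label, plain string patterns may be all you need. Here is a minimal sketch, assuming the value sits on the same line as its label; the labels, the sample text, and the helper name `valueAfterLabel` are made up for illustration:

```mathematica
(* Hypothetical helper: return the non-whitespace token that follows label on the same line,
   or Missing[...] if label does not occur in the text *)
valueAfterLabel[txt_String, label_String] :=
  With[{m = StringCases[txt, label ~~ " " ... ~~ v : Except[WhitespaceCharacter] .. :> v, 1]},
    If[m === {}, Missing["NotFound", label], First[m]]];

(* Made-up sample standing in for text extracted from an invoice PDF *)
sample = "Invoice Number: INV-0042\nTotal: 1,234.56";
valueAfterLabel[sample, "Total:"]  (* "1,234.56" *)
```

With a real file, something like `valueAfterLabel[Import["invoice.pdf", "Plaintext"], "Total:"]` would apply the same helper to the PDF's text layer, with no external converter. This assumes the PDF actually contains text rather than scanned images.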

I don't think you're going to get a definitive answer to this. It's one of those things where you need to keep adding test cases (and the code to handle them) as you encounter new invoice formats. But frankly, that seems easier than what you're trying to do now (if I even understand it).
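One way to organize that incremental approach, sketched under the assumption that each invoice layout can be recognized by some distinctive string such as the vendor name: keep a list with one entry per known layout, and append an entry whenever a new format shows up. The vendor markers, regular expressions, and function names here are hypothetical:

```mathematica
(* Hypothetical registry of known invoice layouts. Each entry has a "marker" string that
   identifies the layout and one regular expression per field, whose first capture group
   holds the value. Append a new entry whenever an invoice fails to match. *)
invoiceFormats = {
  <|"marker" -> "ACME Corp",
    "fields" -> <|
      "Number" -> RegularExpression["Invoice #\\s*(\\S+)"],
      "Total" -> RegularExpression["Amount Due:\\s*\\$?([\\d,.]+)"]|>|>,
  <|"marker" -> "Globex",
    "fields" -> <|
      "Number" -> RegularExpression["Invoice No\\.\\s*(\\S+)"],
      "Total" -> RegularExpression["TOTAL\\s+([\\d,.]+)"]|>|>
};

(* First capture group of the first match, or Missing if the pattern does not occur *)
firstCapture[txt_String, re_RegularExpression] :=
  With[{m = StringCases[txt, re -> "$1", 1]},
    If[m === {}, Missing["NotFound"], First[m]]];

(* Use the first layout whose marker occurs in the text; report unknown layouts explicitly *)
extractInvoice[txt_String] :=
  With[{fmt = SelectFirst[invoiceFormats, StringContainsQ[txt, #["marker"]] &]},
    If[MissingQ[fmt],
      Missing["UnknownFormat"],
      firstCapture[txt, #] & /@ fmt["fields"]]];
```

An unrecognized invoice comes back as `Missing["UnknownFormat"]` rather than a wrong guess, which tells you exactly when a new entry (and a new test case) is needed.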

POSTED BY: Eric Rimbey
Posted 1 year ago

Can you provide a couple sample PDFs?

POSTED BY: Eric Rimbey