Hello, I upgraded to Mathematica 11 because of its machine learning features. My question is probably 10 years too early, but I can ask. :-)
I am in no way a mathematician, nor do I want to become one, but as a developer I have run into an issue with trace files that might be solved by machine learning.
I need to parse trace files, which are pure text files. At the moment I do it by brute force, using my own brain to find patterns and then developing code to extract them. The thing is that I have already worked out some 150 patterns in one single trace file, so it would be nice to get some help from machine learning.
The complexity of these trace files is equivalent to taking 23 different books (I counted the potential pattern sources) and mixing their paragraphs in random order to produce one giant book. I need to be able to detangle the books (actually, to filter out the noise). But I cannot detangle them when I don't know what the contents of the books look like, or even how many books there are.
The thing is that the trace files are not intended to be parsed by a computer or a human. They were probably not even intended to be read. So the starts and ends of blocks are a bit unpredictable. My human brain knows that I have left the previous pattern because I suddenly come across a new block, which tells me that I probably missed the end of the previous one.
The problem is that there are multi-line blocks, so I must identify their starting and ending lines, but I can only do that when I know what the content is actually about.
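To make the "how many books" problem concrete, here is a rough sketch of a non-learning first pass, written in Python rather than Mathematica. It masks the parts of each line that vary (numbers, hex values) and counts the distinct line "shapes" that remain. The sample lines and the masking rules are invented for illustration; real trace files would need their own rules.

```python
import re
from collections import Counter

def line_shape(line: str) -> str:
    """Reduce a line to a rough 'shape' by masking the parts that vary."""
    # Mask hex values first, otherwise the digit rule would split them up.
    shape = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    shape = re.sub(r"\d+", "<NUM>", shape)
    # Collapse runs of whitespace so alignment differences do not matter.
    return re.sub(r"\s+", " ", shape).strip()

def count_shapes(lines):
    """Count how often each line shape occurs, ignoring blank lines."""
    return Counter(line_shape(l) for l in lines if l.strip())

# Hypothetical trace lines, made up for this example.
sample = [
    "12:00:01 conn 17 opened",
    "12:00:02 conn 18 opened",
    "watchdog alive 0x1F",
    "12:00:03 conn 17 closed",
    "watchdog alive 0x2A",
]

shapes = count_shapes(sample)
for shape, n in shapes.most_common():
    print(n, shape)
```

The number of distinct shapes is only a crude hint at how many pattern sources are mixed together, but shapes that occur very often and at arbitrary places are good candidates for noise.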
Could deep learning do this, and if so, how would you approach it?
I do have some ideas.
Convert the text to pixels and make it a bitmap. The color-coded lines could then be used to identify regions. In a second stage, go through every region to check whether it really is that type of block. If not, go back and try again with a different region.
Try to find the starting lines of blocks first. Then take the lines that follow each one and, in a second stage, confirm that these blocks are indeed complete and are not composed of even more blocks that I did not know about yet.
Before trying to parse the lines, remove the noise lines first (like watchdog lines that pop up in the middle of the blocks to be parsed).
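The last two ideas can be sketched roughly as follows, in Python, done by hand rather than learned. Everything specific here is an assumption made for the example: the `watchdog` prefix, the `BEGIN <name>` opener, and the sample trace are hypothetical and do not come from my real files. Since block endings are unreliable, the sketch treats the next opener as the implicit end of the previous block.

```python
import re

# Hypothetical patterns, assumed for illustration only.
NOISE = re.compile(r"^watchdog\b")
BLOCK_START = re.compile(r"^BEGIN (\w+)")

def split_blocks(lines):
    """Drop noise lines, then group the rest into (name, lines) blocks.

    A block runs from one opener up to the line before the next opener,
    because an explicit end marker cannot be relied on.
    """
    blocks = []
    current = None
    for line in lines:
        if NOISE.match(line):
            continue  # noise can appear anywhere, even inside a block
        m = BLOCK_START.match(line)
        if m:
            if current is not None:
                blocks.append(current)  # new opener closes the old block
            current = (m.group(1), [line])
        elif current is not None:
            current[1].append(line)
        # Lines before the first opener are ignored in this sketch.
    if current is not None:
        blocks.append(current)
    return blocks

trace = [
    "BEGIN session",
    "user=alice",
    "watchdog tick",
    "state=ready",
    "BEGIN transfer",
    "bytes=512",
]

blocks = split_blocks(trace)
for name, body in blocks:
    print(name, body)
```

The second stage, checking whether a block is really complete or hides further blocks, is the part I cannot write by hand and where I hope machine learning could help.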
The end goal is not a perfect parser but something that can help me develop code, or find anomalies in trace files that I would normally never be able to find.
Many thanks.