I've been tasked with query intent classification using non-probabilistic context-free grammars. The queries are in English and fairly broad: each one can match at most 2 of 20+ possible classifications (you may have already guessed: it's a chatbot). I'm trying, so far without success, to generate rules by intersecting (substring matching) triangular matrices that may or may not hold well-formed substrings. See section 4.4, 'Well-Formed Substring Tables', of http://www.nltk.org/book/ch08.html for the basic approach; my code is a copy of it. I'm looking for ideas on how to do this efficiently, and for alternative approaches, if there are any.
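For anyone reading along, here is a minimal sketch of the well-formed substring table idea applied to intent matching. It is a hypothetical variant, not the poster's code: it assumes the grammar is in Chomsky normal form, stores a *set* of nonterminals per cell (CKY-style) instead of the single nonterminal the NLTK book version keeps, and treats some top-level nonterminals as intent labels. The toy grammar and the names `LEXICON`, `BINARY`, `INTENTS`, `build_wfst`, `classify` and `substring_intents` are all made up for illustration.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (CNF).
# LEXICON maps a word to the set of preterminals that can produce it.
LEXICON = {
    "book": {"V_BOOK"},
    "cancel": {"V_CANCEL"},
    "a": {"DET"},
    "my": {"DET"},
    "flight": {"N_FLIGHT"},
    "table": {"N_TABLE"},
}

# BINARY maps a right-hand side (B, C) to every A such that A -> B C.
BINARY = {
    ("DET", "N_FLIGHT"): {"NP_FLIGHT"},
    ("DET", "N_TABLE"): {"NP_TABLE"},
    ("V_BOOK", "NP_FLIGHT"): {"BOOK_FLIGHT"},
    ("V_BOOK", "NP_TABLE"): {"BOOK_TABLE"},
    # One phrase can license two intents, e.g. a generic and a specific one.
    ("V_CANCEL", "NP_FLIGHT"): {"CANCEL_BOOKING", "CANCEL_FLIGHT"},
}

# Hypothetical intent labels: nonterminals that count as classifications.
INTENTS = {"BOOK_FLIGHT", "BOOK_TABLE", "CANCEL_BOOKING", "CANCEL_FLIGHT"}


def build_wfst(tokens):
    """Fill a well-formed substring table.

    wfst[i][j] holds every nonterminal that spans tokens[i:j];
    only the upper triangle (i < j) is ever filled.
    """
    n = len(tokens)
    wfst = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # Single-token spans: look up lexical categories.
    for i, word in enumerate(tokens):
        wfst[i][i + 1] = set(LEXICON.get(word.lower(), set()))
    # Longer spans: combine two adjacent shorter spans meeting at `mid`.
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            for mid in range(start + 1, end):
                for left, right in product(wfst[start][mid], wfst[mid][end]):
                    wfst[start][end] |= BINARY.get((left, right), set())
    return wfst


def classify(query):
    """Return the intents whose nonterminal spans the whole query."""
    tokens = query.split()
    if not tokens:
        return set()
    wfst = build_wfst(tokens)
    return wfst[0][len(tokens)] & INTENTS


def substring_intents(query):
    """Return intents found on any span, not only the whole query."""
    wfst = build_wfst(query.split())
    found = set()
    for row in wfst:
        for cell in row:
            found |= cell & INTENTS
    return found
```

With this grammar, `classify("book a flight")` gives `{"BOOK_FLIGHT"}` and `classify("cancel my flight")` gives both `CANCEL_BOOKING` and `CANCEL_FLIGHT`. For a query with extra words such as `"please book a table now"`, `classify` finds nothing because no rule covers the full query, while `substring_intents` still finds `BOOK_TABLE` by scanning every cell of the table. That per-cell scan is one way to get the "substring matching" behaviour without intersecting separate matrices. Each query is parsed once, in O(n³) time for a fixed grammar.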
First time poster and new to the NLP game.
Did you ever find a way to accomplish this? Two years later, the question is still unanswered.