Only cuz I hope it will be of interest to someone:
I work at a dental company where we do lots of 3D geometry pattern recognition. I did this with the best of them for a few years before learning that the right approach is often through some kind of
model fitting process. Recently a colleague developed a "model" for the entire pair of upper/lower arch of teeth, by doing principal component analysis on many examples of upper/lower arches. Using this model to fit new examples was a totally different exercise. Not like what you do in calculus, where the best fit is calculated, but rather something more like an exhaustive search through the possibilities. Nevertheless, the results are very effective.
I had similar success measuring the profiles of teeth and coming up with 2D polygon models, to fit to those profiles. In my case I could measure widths and heights of features in the profiles, and use these numbers to parameterize the 2D models. This gave a shortcut to finding a best model and a best fit. Before I developed a model based approach I spent a month or so trying to do pattern recognition solely with algebraic relations between the feature measurements. It just didn't work. But the models did (or did significantly better).
Somewhere around in there I started thinking about the ideas I published in the "Best Models" paper. While trying to write about them, at the same time I was seeing examples in how my cognitive processing was handling routine tasks: like getting up in the morning and going to the kitchen or, one time, when I started reading a paragraph. I noticed how the leading sentence of the paragraph gave me a frame of reference for the following sentences. That planted a seed.
It was in this intellectual environment that I was given a new assignment: To sort incoming text from the customers into "design" and "non-design" statements. It turns out customers use a text feature to override what they ordered or to ask for custom features. Sometimes they simply say "Hello" or "Merry Christmas". All of which makes order fulfillment harder. I was bound and determined to come at this problem using a best model approach.
Now everyone was an "expert" in Natural Language Processing and they were all braying about Bayesian Statistics (stuff I know how to do using AO Diagrams and Data Equilibrium) But I was convinced the best model approach could be applied to language and set about hacking together something that used keywords and tried to use key phrases and even tried to contain an information model for statements of particular importance. At some point I wished I had a dictionary of word patterns I could use, like a keyword dictionary. For example:
"As [blank] as possible"
I did not have that sort of generality and did the best I could. I learned that our customers had about 14 different topics they wrote about and came to call this "corpus" of text a narrow world. In a narrow world you want to sort text into its appropriate sub-topic and then try, as best as possible, to understand and be able to act on, whatever is expressed about that subtopic. Not only were the possibilities quite limited but, as a matter of fact, most people express themselves the same way or in one of several different ways. Since the possibilities are finite, processing text accurately should be possible.
I spent part of a year building an even better reader at home. I studied hotel reviews as a narrow world and tried to read what people were writing when they
complained about noise. It was an interesting subject and I built up some good word lists. But, at the time, there was much I did not understand about approaching language with geometric preconceptions. The best model approach lays out a specific Fibre Bundle description for the relation between a total 'Pattern Space' and a base "Measurement Space' [see illustration
here]. I never could understand which was which in the case of words. My noise reader was OK but got messier and messier and, in the end, Trip Advisor told me I could not mine their reviews. So dreams of money ended.
Then I spent a year thinking harder about word patterns, calling them narrative patterns, and deriving a new formalism called proto semantics for analyzing story structure. I wrote it up in a paper "The Elements of Narrative" and got it rejected by a couple of journals. (I am revising it.) In fact I got distracted from the pursuit of language programming by some actually startling discoveries about simple narrative patterns we live with. There is an entire world of meaning between words and thoughts that has never been discussed because there was never an adequate notation for it. So proto semantics opens other doors it is always tempting to wander through (even in my dottage) . Hence this Sphinxmoth blog.
Then I got sick and badly depressed for 2 month. Recovered with an bubbling up of joy and went to Woods Hole promising myself I would take another close look at noise complaint examples I still have. So this summer I looked hard at the examples and tried to translate them into the new proto semantic notation. As it turned out there were about 6 distinct stories being used to described noise. Like:
"The room was near the elevator"
or:
"Open windows let in the noise"
I showed an image of the six recently
here.
Then I started thinking harder about how to organize keyword dictionaries and had an important insight: once you arrange keyword lists into hierarchies you have, effectively, a number system. Then I realized that the "Measurement Space" of the best model approach
is this hierarchy of words. And that the "Pattern Space", contained the meanings and the ideal objects defined in proto semantics that were to be fitted to those meanings. So a sequence of words in a text, is a path not just through the tree of words but a path, lifted up, into the space of meanings. The final hurdle in this approach is to have a
goodness of fit calculation to compute how well a narrative pattern fits an incoming text. This is the key difficulty at present in Narwhal. (I just realized it has to be recursive and am writing this instead of thinking about that).
I want to say that pattern recognition really does work better with models. It is inevitable that one starts by measuring the data and looking at how relationships between the numbers relate to relationships between the patterns of interest. But it is a mistake to get stuck in the "Measurement Space", and it always goes better when you include an understanding of the "Pattern Space".
I was in Woods Hole a different time later in the summer and was looking
at the noise complaint narrative structure and was able, for the first
time, to conceive of how to write an automated "reader"
in general.
Once I saw the possibility, I decided it should be named after "Narrow
World" and "Narrative" and picked the name "Narwhal". It is also a nod to computer
languages with animal names. Also it is nice we are talking about a marine
mammal, since I had so much fun with it in Woods Hole this last
summer. Glenway and Barb let me rant about it.