Monday, February 25, 2013

The general solution to a discrete expert system - the Best Model Estimate

Here mu is a set of measurements, the phi are sections of mu (one for each i in {1, 2, ..., K}) thought of as parametrizations, and e is a real classification to be estimated. It is pretty easy to show that the average success rate of the estimator is the sum of the volumes of the Vi intersected with the inverse image of i under e( ). So when X has measure 1, the total error is 1 minus that sum: $1 - \sum_{i=1}^{K} \mathrm{vol}(V_i \cap e^{-1}(i))$.
I guess it makes sense to call this a best model estimate.

Thursday, February 21, 2013

Best fit step

Consider "fitting" a step shape (in red) to data (in blue) by simply matching step height to data height and choosing the first place the (lower) level encounters the data.
Once you have aligned the abstraction (in red) over the data (in blue), you can do a least squares fit. [This is the only evidence I have of the value of a least squares best model.]
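Here is a rough sketch of one way to read that recipe, in Python. The details are my assumptions, not something the pictures pin down: the step height is matched to the range of the data, the jump is placed at the first sample that rises above the midpoint, and the least squares fit with the jump held fixed is just the mean of each side. The name fit_step is made up for illustration.

```python
def fit_step(data):
    """Fit a rising two-level step to 1D data; return (k, low_fit, high_fit).

    The fitted step equals low_fit for indices < k and high_fit for indices >= k.
    """
    # Assumption: "matching step height to data height" means using the data's range.
    low, high = min(data), max(data)
    mid = (low + high) / 2.0

    # Assumption: align the jump with the first sample that leaves the lower level.
    k = next((i for i, y in enumerate(data) if y > mid), None)
    if k is None or k == 0:
        raise ValueError("expected a rising step with samples on both levels")

    # With the jump fixed at k, the least squares levels are the two side means.
    left, right = data[:k], data[k:]
    low_fit = sum(left) / len(left)
    high_fit = sum(right) / len(right)
    return k, low_fit, high_fit


samples = [0.1, -0.1, 0.0, 0.2, 1.1, 0.9, 1.0, 1.0]
k, low_fit, high_fit = fit_step(samples)
print(k, low_fit, high_fit)
```

The alignment does the hard part (choosing where the jump goes); once that is fixed, least squares only has to settle the two levels.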

A prediction game

Hey, this is kind of blogworthy:
Suppose that person A and person B agree that A will classify incoming objects into, say, n categories. And they agree that A will follow a systematic procedure. Person B has the task of predicting what person A will do when a new object arrives to be classified. Person B gets to examine the object first but must be in a different room from person A (so they cannot observe each other).
The essence of what an expert system must do is exactly this, but with the further burden that "person B" must be a computer equipped with rulers, calipers, sensors, detectors, etc. to process sensor data about incoming objects. So an expert system is an automated person B.
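A toy version of the game, as a Python sketch. Everything specific here is invented for illustration: the objects are rectangles given as (width, height), person A's systematic procedure sorts them by area into n = 3 categories, and the automated person B only has "calipers" that read the longest side. The score is the fraction of objects on which B predicts A's answer.

```python
def person_a(obj):
    """Hypothetical systematic procedure: classify a rectangle by its area."""
    width, height = obj
    area = width * height
    if area < 5:
        return 0
    if area < 50:
        return 1
    return 2


def person_b(obj):
    """Hypothetical automated predictor: it only measures the longest side."""
    longest = max(obj)
    if longest < 2:
        return 0
    if longest < 9:
        return 1
    return 2


def agreement_rate(objects):
    """Fraction of objects where B's prediction matches A's classification."""
    hits = sum(1 for obj in objects if person_a(obj) == person_b(obj))
    return hits / len(objects)


objects = [(1, 1), (2, 3), (5, 5), (1, 8), (10, 10), (3, 3), (1, 9)]
print(agreement_rate(objects))
```

B misses only the long thin rectangle (1, 9), where the longest side is a poor stand-in for area. That is the kind of corner case a measurement-only rule keeps running into.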

Sunday, February 17, 2013

An imaginary conversation

A customer walks into the store and speaks to the clerk:
Customer: Do you have any yellow enamel paint?
Clerk: What kind of yellow?
Customer: A pale, butter yellow.
Clerk: What is it for?
Customer: The eye of a merganser.
...there is a pause...
Clerk: Then you won't be needing very much.

Tuesday, February 12, 2013

Hooded Merganser 1

We'll see if it gets better or worse as I continue carving. [It got worse.]

Thursday, February 7, 2013

Pattern recognition by the method of best models

I have been hunting for words in some of the previous posts. Let me try again: 
Suppose you have a space of objects, each given by data that can be measured, and you wish to use the measurements to help recognize an object. Here is the method: use a discrete dictionary of parametrized ideal objects, called models, whose measurements are set to match the measurements of the given object you wish to recognize. There may be several models in the dictionary with these same measurements. Each such model is itself an object in the object space. Because the model is in the same space as the object to be recognized, comparison metrics can (and should) be based on model-to-object distance there, not on any distance concepts in the space of the measurements. The best model is the one closest to the object in object space, among those having the same measurements as the object to be recognized.
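To make that concrete, here is a minimal Python sketch under assumptions of my own choosing: the objects are short 1D signals, the measurements are just (min, max), and the dictionary holds three made-up model families (step, ramp, pulse). Each family is instantiated so that its min and max match the object's, and the winner is picked by Euclidean distance between signals, that is, in object space rather than measurement space.

```python
def measure(signal):
    """Base-space measurements of an object (assumed here: its min and max)."""
    return min(signal), max(signal)


def step_model(low, high, n):
    """Low for the first half, high for the second half."""
    return [low if i < n // 2 else high for i in range(n)]


def ramp_model(low, high, n):
    """Straight line from low to high."""
    return [low + (high - low) * i / (n - 1) for i in range(n)]


def pulse_model(low, high, n):
    """High in the middle third, low elsewhere."""
    return [high if n // 3 <= i < 2 * n // 3 else low for i in range(n)]


# Hypothetical discrete dictionary of parametrized ideal objects.
DICTIONARY = {"step": step_model, "ramp": ramp_model, "pulse": pulse_model}


def distance(a, b):
    """Model-to-object distance, computed in the object space."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def best_model(signal):
    """Name of the dictionary model closest to the signal.

    Every candidate is built from the signal's own measurements, so all
    candidates sit over the same point of the measurement space.
    """
    low, high = measure(signal)
    n = len(signal)
    candidates = {name: make(low, high, n) for name, make in DICTIONARY.items()}
    return min(candidates, key=lambda name: distance(candidates[name], signal))


noisy_step = [0.0, 0.1, -0.1, 0.0, 1.0, 0.9, 1.1, 1.0]
noisy_ramp = [0.0, 0.15, 0.3, 0.4, 0.6, 0.7, 0.85, 1.0]
print(best_model(noisy_step), best_model(noisy_ramp))
```

Notice that the three candidates are indistinguishable by their measurements; only the distance in object space separates them, which is the whole point.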
In particular, you want to avoid the trap of defining recognition in terms of regions in the measurement space. That can lead to expert systems with training instability and an infinity of corner cases. This new, better way of looking at it, with a "Total Space" of objects, a "Base Space" of measurements, and a method for inverting the measurements, is reminiscent of creating a section of a fibre bundle (like the logarithm).
But what is most evocative to me is that this recognition takes place in a context where measurement is possible: a context with some form of coordinate system and some mechanism for aligning the coordinates to the objects to be recognized, in order to perform measurements. Hence the recognition is a byproduct of a perception (the measurement) and a projection (the forming of models being compared to the data). If you think about it, this is a reasonable fit for how we navigate the world about us in a continuous feedback loop of perception and projection. But here is the hardest part of the idea: the initial measurement depends on a prior coordinate frame attachment which, itself, is a best model result. The process is inherently hierarchical and (I suppose) will work best when the pattern dictionaries are nested in the same way that details are related to the whole.
Since getting my PhD, I have been fascinated not just with the mathematics of moving frames but also with some of the applications of attaching coordinate frames to data (e.g. "Anatomical Frame Standards" for medical imaging). So to discover these ideas connected to those of another old friend, the logarithm, all in a way that is a reasonable description of a cognitive process, is quite gratifying. I'm sure it sounds loony, but I did use it successfully to solve a problem at work involving automatic feature detection for surfaces in 3D. So here is the full-on crazy: