Thursday, March 27, 2014

Why geometry of language?

The theory of best model classification is essentially geometric, and it seemed so successful in that domain that there was a strong desire to apply it to other things, along with an impression that it could be applied to reading text. Luckily I work at a place that needed language automation as well as geometry automation, and I got to try out an approach informed by those ideas in the narrow world of custom part design.

But at home, I am trying to really understand what is involved. This requires pursuing the analogy in more detail. What is the space of points, and what kind of data fitting, that can apply a best model approach to text? That question drove the formulation of a proto semantics and its definition of narrative fragments, the "geometric objects" of this linguistics. I have yet to carry out the complete program, including a goodness-of-fit metric (something to do with the number of slots filled in a fragment).
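To make the analogy concrete, here is a minimal sketch of what such a metric might look like, assuming a fragment is a set of named slots and goodness-of-fit is simply the fraction of slots that got filled. The slot names and the `Fragment` type are my own illustrative inventions, not part of the proto semantics as published:

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical sketch: a narrative fragment as a set of named slots.
// Goodness-of-fit is taken to be the fraction of slots filled.
struct Fragment {
    std::map<std::string, std::optional<std::string>> slots;

    double fit() const {
        if (slots.empty()) return 0.0;
        int filled = 0;
        for (const auto& [name, value] : slots)
            if (value.has_value()) ++filled;
        return static_cast<double>(filled) / slots.size();
    }
};
```

A fragment with two of four slots filled would then score 0.5; a richer metric might weight slots unequally, but the fraction already gives a way to rank competing fragments against the same text.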

Even in college, after reading Bertrand Russell, I became convinced that the way ideas merge together was often more intrinsic to the type of idea than to the grammar that combined the words for those ideas. So "or" and "and" took their meaning largely from whatever was being juxtaposed. I was fascinated by how I cannot think of a square and a circle as the same object, but can easily have a square and a redness as the same object. If I try to make a single object red and green, it splits the color in two. I still don't understand how the color and shape channels come to be what they are.

But although I do not understand this built-in logic that comes with our use of words and thoughts about things, there is a reasonable, practical way to use data-fitting ideas with text, provided you narrow the topic enough. Then the word definitions are what you make of them, and they reside in your dictionaries and C++ class designs. Language recognition built this way is real because the class definition is concrete and shared.
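A hedged sketch of what "definitions reside in your dictionaries and C++ class designs" might mean in practice: within a narrow topic like custom part design, a hand-built dictionary maps surface words onto concrete categories that the rest of the program understands. The category names and the `Dictionary` class here are illustrative assumptions, not the actual classes from that work:

```cpp
#include <map>
#include <string>

// Hypothetical sketch: in a narrow topic, a word's meaning is whatever
// concrete category the hand-built dictionary assigns it.
enum class Category { Shape, Dimension, Material, Unknown };

class Dictionary {
    std::map<std::string, Category> entries;
public:
    void define(const std::string& word, Category c) { entries[word] = c; }
    Category lookup(const std::string& word) const {
        auto it = entries.find(word);
        return it == entries.end() ? Category::Unknown : it->second;
    }
};
```

The point is that the definition is not floating in a general theory of meaning: it is a shared, inspectable artifact, and recognition succeeds or fails against that artifact.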
