Sunday, July 31, 2016

Goodness of fit metric for text matching

I believe it involves the quantity:

g = (num words read)^2 / (last - first + 1)
where a pattern is used to match text, and 'first' and 'last' are the first and last indices of words read in the text.
Here is why. Increasingly I think pattern matching should require all of the pattern to be filled in some way, so the number of words consumed will just be a function of how many of them are in the pattern. Hence (num words read) / (last - first + 1) simply measures how spread out those same words are in the incoming text. The additional factor of (num words read) in the numerator gives greater weight to longer patterns. You might want to consider G = g/(num words in text) so the quantity is never more than 1.
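For concreteness, here is a minimal Python sketch of how the metric might be computed; the function names and the word-position bookkeeping are my own assumptions, not actual Narwhal code:

    def goodness_of_fit(matched_positions):
        # matched_positions: indices in the text of the words the pattern consumed
        if not matched_positions:
            return 0.0
        n = len(matched_positions)                    # num words read
        first, last = min(matched_positions), max(matched_positions)
        return (n * n) / float(last - first + 1)      # g = n^2 / (last - first + 1)

    def normalized_fit(matched_positions, num_words_in_text):
        # G = g / (num words in text); never more than 1 when each word is read at most once
        return goodness_of_fit(matched_positions) / float(max(1, num_words_in_text))

    # e.g. a pattern consuming the words at positions 2, 3 and 5 of a 10-word text:
    # g = 9/4 = 2.25 and G = 0.225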

Saturday, July 30, 2016

Barbara Waksman

How to verify computer generated proofs that are too long for a human to read?

I propose that a proper implementation of Narwhal could solve this problem. In the previous post I showed how an "if...then" statement could be translated into a proto semantic expression. This would give Narwhal an opportunity to process arbitrarily long statements and, in particular, it might be set up to read a proof automatically to determine its correctness. As it crunched along through what would take a lifetime for a human to read, it could flag any incoherent statement occurring in the proof, or finish and declare the proof true or false; the final step would then be to verify Narwhal itself.
Verifying Narwhal might be considerably more within the scope of a lifetime. I am sure you could do it by verifying the lowest levels and the inductive/recursive mechanisms for increasing complexity.

Hypotheticals are contrasted, multi-part, 2nd order narratives

Example:
If you have time after work then you should come over for a drink 
This is a hypothetical chronology. Actually of the form
((you have time after work), (you come over for drinks))*
There is no need for a separate concept of 'hypothetical' as it already resides within the contrast operator and the forces derived from Truism #7. Given a stated contrast, there will be a desire to resolve it.
The narrative inside the parentheses that is contrasted is a second-order narrative with first-order sub-narratives. [The concept of "order" is new to proto semantics but needed for the Narwhal implementation.]

So this gives us that "if A then B" should translate into the proto semantic statement
(A,B)*
So for example "unless A, B" or "if not A then B" becomes:
 (A*,B)*
Obviously the Narwhal engine has its work cut out for it, making these translations from text internally.
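As a rough illustration of what that translation step might amount to, here is a toy Python sketch; the class and operator names are mine and only mimic the notation above:

    class Narrative:
        def __init__(self, text, contrasted=False):
            self.text = text
            self.contrasted = contrasted

        def contrast(self):                           # the '*' operator
            return Narrative(self.text, not self.contrasted)

        def __repr__(self):
            return self.text + ("*" if self.contrasted else "")

    def seq(a, b):                                    # the ',' operator
        return Narrative("(%r,%r)" % (a, b))

    A = Narrative("you have time after work")
    B = Narrative("you come over for drinks")
    print(seq(A, B).contrast())                       # "if A then B"   ->  (A,B)*
    print(seq(A.contrast(), B).contrast())            # "unless A, B"   ->  (A*,B)*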
Update: I am not sure about this. It could also be interpreted as A::B. Perhaps the logicians' usage is a bit of a corruption of the two interpretations [yeah, don't expect me to be nice].

Good vs Bad in proto semantics

Faced with how language builds up more complex statements from simpler ones, and given how easy it is for expressions to become combinatorially difficult to analyze, the ancients found that the True/False concept provided relief. As the thinking goes: True/False can be assigned to the smallest parts of an expression, and the value percolates up through all the complexity to reveal a single overall True/False "meaning" for the more complex expression. Maybe it was pretty clever of them to realize the problem and solve it this way.

I have the same problem with proto semantics. Given a complex statement, what exactly has been said? I take relief in the idea that something like a hotel review is an opinion. It expresses a value that is, in the end, either positive or negative, good or bad. So a Good/Bad value is meaningful for the smallest unit of an opinion statement and percolates up through the complexity to provide a single Good/Bad value for the whole. [Proto semantics also allows the "opinion" to be off topic or unintelligible.]

This means a Narwhal class must contain a "polarity" member variable that works to retain the overall value of an input text. It defaults to "1", which is equivalent to "good" or "true" or any other polarity that might be around. I am learning lots of things from trying to implement proto semantics in this Narwhal language.

But not all statements are value statements. Lots of technical language describes things with attributes, chronologies, and interactions, and is not concerned with positive or negative values. Those are just informational narratives. In that case the polarity can remain at its default ("Good") [where it can be ignored].
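A minimal sketch of what such a member could look like in Python; the class name and the crude negation handling are illustrative assumptions, not Narwhal's actual interface:

    class NarrativeReader:
        # hypothetical text-aware class carrying a running polarity
        def __init__(self):
            self.polarity = 1          # default "1" = "good"/"true"; informational text can ignore it

        def read(self, text):
            for word in text.split():
                if word in ("not", "no", "never"):
                    self.polarity = -self.polarity    # crude negation handling
            return "good" if self.polarity > 0 else "bad"

    # NarrativeReader().read("the room was not quiet at night")   # -> "bad"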

Tuesday, July 26, 2016

If Narwhal is right...

If Narwhal is right then I can hope geeks, from among the 4 billion people on the planet, will set about building little pockets of definition and after 5 years there will be standards committees and the collection of existing (and connected) pockets will spread out and provide a broad basis for understanding most language use. After 10 years it will be broad and deep and the internet will be able to understand what you type.
For example, someone might put their way of thinking on the internet, so people could use it without learning much about it.

You can't make this stuff up

Man fatally shoots doctor before killing elf at Berlin hospital

Monday, July 25, 2016

if A then B

A couple of thoughts. One is about how logicians co-opted a chronological term ("then") for a timeless logical relation. But actually it is like this: all mathematical certainty comes down to the same thing: either I am playing a game with someone or I am not. For example:
P is playing a game with B: when he hands her a ball, she will hand it back.
P will know B is not playing if the ball does not come back in a timely manner.
For another example: Suppose we take a game with this rule: whenever I hand you an assumption A, you will hand me back a predictable result B. I then hand you an assumption A. Now I know you will hand me back a result B, or know that you are not playing the game. Was it Aristotle who wrote this down?
A
A=>B
B
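For what it is worth, the same derivation can be written as a one-line formal proof; this Lean snippet is nothing more than a restatement of the schema above:

    example (A B : Prop) (a : A) (h : A → B) : B := h a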

Update: So what is at the root of the certainty that 'I am either playing a game or not'? I believe it derives from the same principle that defines the notion of a "channel" - where you can switch between channels but only watch one at a time. This happens in our heads for any category that works exclusively - like a color channel, or a shape channel, or an intensity channel, or a position channel (two things can't be in the same place), or an arbitrary True/False channel. We know that two different channel types are compatible when they can be superposed. I cannot tell if this derives from language or from perception, or what else.

Playing a game with shared rules is also an underlying principle of communication - as Grice would have it. Beings that depend on communication for their prosperity have evolved this mental capacity of creating new channels and then using them on purpose.

Saturday, July 23, 2016

Circling around the semantic underpinnings of symbolic logic

It is a goal of mine to better understand how mathematical necessity comes about. In some cases it seems to come from cultural games that develop expectations (like counting). Others arise naturally in the vocabulary of boxes and containers and, generally, 'thing' and 'place' words. But above all, I would hope to derive the necessity of symbolic logic ideas from simpler semantics. That hasn't happened yet, but I am getting some insights into how the proto semantic operators relate to language usage, in comparison with how logic operators co-opt the same usages.

The ',' of proto semantics is equivalent to the natural usage of the word "then", as in "we went to town, then we came home, then we ate lunch". Pretty close to "we went to town and we came home, and we ate lunch". Now this same "and" was borrowed by symbolic logic to mean a version of sequence that is independent of order. To question whether order is important in a sequential statement is to create a straw horse, meaningful to the logicians' co-opted version of "and" but not part of the original natural usage of "and".

Very much the same is true for "or" which does not really have a representation in proto semantics. It means, generally, to make a choice. To ask about whether it is an "exclusive" choice creates a straw horse, meaningful to the logicians who co-opted the term but not part of its original natural usage. A choice is a choice and the additional ("and not both") is an artificial add-on from logical usage. So why is "or" not part of proto semantics? I suppose it is too close to requiring a concept of 'collection' that is not available in "proto" land.

Finally, the '::' of proto semantics is the "because" or "so" of natural language. Its closest analog in logic is the "therefore" of syllogism. But interestingly, it often corresponds with a different natural usage of the word "and" [example?]. "He sailed beyond the horizon and that was the last they saw of him".

Update (in favor of proto semantics): the two natural meanings of "and" are captured by the notations ',' and '::'. Proto semantics has no concept of set, so no concept of "or" and choice - although that might be added in a post-proto semantics. Logicians have added the (unnatural) assumption of order-independence to the definition of "and" and the (unnatural) assumption of "both/not both" to the definition of "or". Meanwhile the natural word "then" is captured as ',' and is not locked into the logicians' warm embrace of the "if...then..." format. Also, the word "if" is in no way special in proto semantics. We say "if you are not busy after work then you should come over for drinks" and take this as a statement with a ',' in it for the "then". The "if" is more a matter of choice availability - something at the level of "or", and a bit out of reach of proto semantics.
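To put the two readings of "and" side by side, here is my own paraphrase of the examples above in the proto semantic notation:

    (we went to town , we came home , we ate lunch)
    (he sailed beyond the horizon :: that was the last they saw of him)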

Sunday, July 17, 2016

The Narwhal Language - a Python extension that implements semantic concepts

I started thinking about promoting my "proto semantics" via a computer language for narrow world language programming. It is fun and convenient to call this the Narwhal Language, and thinking about it has clarified several things.
At a theoretical level, I finally see that the base space in a Best Model implementation of language recognition is a structure of keyword dictionaries. Like this:
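Since the original illustration is not reproduced here, the following is only a guess at what such a structure of keyword dictionaries might look like, written as a Python literal with invented words:

    NOISE = {
        "SOUND":     ["noise", "sound", "clanking", "racket"],
        "SOURCE": {                                   # sub-dictionaries
            "TRAFFIC":   ["traffic", "highway", "street"],
            "EQUIPMENT": ["elevator", "air conditioner", "ice machine"],
        },
        "INTENSITY": ["loud", "faint", "quiet"],
        "AFFECT":    ["awake", "annoyed", "slept"],
    }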
Entities like this play the same role as numbers and measurements play in geometric pattern recognition. But it is the narrative patterns that play the role of geometric figures. These narrative patterns live in the total space of meanings. For noise, we have these functional narratives:
     SOUND
     SOUND_/TOD
     PROBLEM_/SOUND, (PROBLEM_/SOUND)*
     LOC _RELATION_/ SOURCE
     MATERIAL -OPACITY-> SOUND
    (SOURCE,LOCATION)_/INTENSITY :: SOUND


So Narwhal is designed around making these ideas accessible to a programmer who wants to write text-aware classes but wants to focus on the details of his subject, not on generalities about how language works.
At a practical level, one discovery helps me to see how Narwhal could be implemented. This is the separation of the summary narrative:
        (SOUND->[ME] :: [ME]_/AFFECT)
from the functional ones and the realization that the functional narratives need a hard coded mapping to the summary narrative. But once you see how to do that and see how text can be filtered through the functional narrative patterns, an implementation starts to become visible on the horizon.
Also at a practical level, it is worth clarifying the tree of keyword dictionaries illustrated above. One basic concept is the difference between OR'ing and XOR'ing of sub-dictionaries. Another is the difference between sub-dictionary and child dictionary. These are needed to specify the structure that I think is required.
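Under my own naming, the OR/XOR distinction could be sketched like this in Python; this is an assumption about intent, not Narwhal's implementation:

    def or_match(subdicts, word):
        # an OR group is satisfied by a word from any of its sub-dictionaries
        return any(word in d for d in subdicts)

    def xor_match(subdicts, word):
        # an XOR group insists the word fits exactly one sub-dictionary
        return sum(word in d for d in subdicts) == 1

    # e.g. SOURCE as an XOR group: a noise source is traffic or equipment, not both
    # xor_match([["traffic", "highway"], ["elevator", "ice machine"]], "elevator")   # True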
Back to the theoretical level for a moment: incoming text is seen as a path through the base space and its interpretation is a lifting to the total space. By using a goodness of fit measure that counts the number of words consumed, one hopes that the best fit lifting is a reasonable approximation to the "true" meaning that will forever be stored, untouchably, in different people's different minds.
Update: Not long after seeing that a computer language was possible, I was noodling around in a tongue-in-cheek sort of way trying to think of a name for the language. "Narwhal" works for narrative patterns as well as for narrow worlds and, given the number of computer languages that are named after animals, seemed like a winner. In fact, as soon as I had a name to use, I started using it and the project was launched.

Saturday, July 9, 2016

The abstract form of the noise statement

But interestingly, this is not the form that appears in natural expressions which mostly take the form of cause and effect statements: "the room was near the elevators and we were kept awake all night by the clanking of the old equipment". Or "the windows did little to block the sounds of heavy traffic from I-270".
This suggests the need for two levels of semantic processing in an automated reading system. A first level reads natural expressions using smaller "functional" story forms. These are then hard coded to fill in the larger general form. [For hotel rooms there are only about 5 of these smaller story forms that are about noise.]
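A toy Python sketch of the two levels, with made-up matching rules and slot names; it only illustrates the idea of functional forms feeding a hard coded mapping into the general form:

    def read_functional(text):
        # level 1: recognize one of the small "functional" story forms (toy rules)
        if "kept awake" in text or "block the sounds" in text:
            return {"form": "PROBLEM_/SOUND", "cause": text}
        return None

    def fill_general_form(match):
        # level 2: hard coded mapping from the functional match into the general form
        if match is None:
            return None
        return {"SOUND->[ME]": match["cause"], "[ME]_/AFFECT": "negative"}

    # fill_general_form(read_functional("we were kept awake all night by the clanking"))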
Update: This basic idea of two levels of semantic processing was the beginning of believing I might be able to implement these things in software.