Friday, September 30, 2016

Hate Speech may be a "Narrow World"

I hope to study hate speech and, if it uses a limited set of narratives, build a Narwhal class to read it. Using Narwhal to read hate speech seems like a natural application. I hope so, cuz it looks like the internet needs it.

I am up late reading about "AI and Hate Speech" (my Google search terms) and about a system that removes 90% of the offending speech. That is probably as good as a model-less approach will ever get.

(A few minutes later after reading more) It looks way more complicated than simply finding keywords and narrative patterns. So perhaps my "hope" is absurd.

Refloated my boat

Sent the current version of "The Elements of Narrative" to a journal that had solicited a submission: "IOSRJEN", the International Organization of Science Journal of Engineering. I'll probably regret it, as the organization is pretty 3rd world. The paper has been significantly cleaned up, so we'll see.
Update: I am caught flat-footed by the fact that it got accepted. Ah well, I could write shorter versions for other journals.

Friday, September 23, 2016

Lots of blogging over at "Rock Piles"

Cranked out ~40 pages of writing and pictures over at my main blog. Ach! I am putting off coding the readText() function with recursive temporary vaulting.
Update: nah! recursive, but vaulting only at the highest level.
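
Roughly what I have in mind, as a sketch only (the narrative object, its score() method, and the "vault" dictionary are placeholders I made up, not the real Narwhal code):

    # Recurse through sub-narratives, but only "vault" (permanently record)
    # a result once we are back at the top level of the call.
    def read_text(text, narrative, vault, top=True):
        sub_scores = [read_text(text, sub, vault, top=False)
                      for sub in narrative.sub_narratives]   # hypothetical attribute
        score = narrative.score(text, sub_scores)            # hypothetical scoring method
        if top:
            vault[narrative.name] = score                    # vaulting only at the highest level
        return score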

Origins of the word "Narwhal"

Got this from a comment in Dictionary.com [here] by a person named Kald:

And by the way, “narr” also means both a “fool” and “jester” in Norwegian… I would guess it came from the Latin word “narrare” (a story or a tale). Jester = storyteller. And I would not be surprised if that is the origin of your Hebrew word as well.

Update: So it looks like I am completing the circle of the word back to its roots. And while we are on the subject of its roots, how about the 4-beat "5 3-6 5 3" sung with the words "nah nah-na nah nah". I would imagine it to be older than these languages. I would not be surprised if birds sing something like it. [Or I could be wrong and crazy.]

(I am using 3-6 to mean dotted quarter note on three and eighth note on six.)

Update II: Korean song: 5-5 3-3 5, 3-3 4-4 3. Sing it and you will see a family relation with the English version.

Thursday, September 15, 2016

The Pre-History of Narwhal

Only cuz I hope it will be of interest to someone:

I work at a dental company where we do lots of 3D geometric pattern recognition. I did this with the best of them for a few years before learning that the right approach is often through some kind of model-fitting process. Recently a colleague developed a "model" for the entire upper/lower arch pair of teeth by doing principal component analysis on many examples of upper/lower arches. Using this model to fit new examples was a totally different exercise. Not like what you do in calculus, where the best fit is calculated, but rather something more like an exhaustive search through the possibilities. Nevertheless, the results are very effective.
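
To make "exhaustive search" concrete, here is a toy sketch (not the colleague's actual code; the mean shape, the components, and the coefficient grid are all stand-ins) of fitting a PCA-style model by brute force rather than by a closed-form calculation:

    import numpy as np

    # Fit a PCA shape model to a new example by searching a grid of
    # coefficients for the first two components, keeping the best match.
    def fit_by_search(target, mean_shape, components, grid=np.linspace(-3, 3, 25)):
        best_coeffs, best_err = None, np.inf
        for c0 in grid:
            for c1 in grid:
                candidate = mean_shape + c0 * components[0] + c1 * components[1]
                err = np.linalg.norm(candidate - target)
                if err < best_err:
                    best_coeffs, best_err = (c0, c1), err
        return best_coeffs, best_err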

I had similar success measuring the profiles of teeth and coming up with 2D polygon models to fit to those profiles. In my case I could measure widths and heights of features in the profiles and use these numbers to parameterize the 2D models. This gave a shortcut to finding a best model and a best fit. Before I developed a model-based approach I spent a month or so trying to do pattern recognition solely with algebraic relations between the feature measurements. It just didn't work. But the models did (or did significantly better).

Somewhere in there I started thinking about the ideas I published in the "Best Models" paper. While trying to write about them, I was also noticing how my own cognitive processing handled routine tasks: getting up in the morning and going to the kitchen or, one time, starting to read a paragraph. I noticed how the leading sentence of the paragraph gave me a frame of reference for the following sentences. That planted a seed.

It was in this intellectual environment that I was given a new assignment: To sort incoming text from the customers into "design" and "non-design" statements. It turns out customers use a text feature to override what they ordered or to ask for custom features. Sometimes they simply say "Hello" or "Merry Christmas". All of which makes order fulfillment harder. I was bound and determined to come at this problem using a best model approach.

Now everyone was an "expert" in Natural Language Processing, and they were all braying about Bayesian Statistics (stuff I know how to do using AO Diagrams and Data Equilibrium). But I was convinced the best model approach could be applied to language, and I set about hacking together something that used keywords, tried to use key phrases, and even tried to contain an information model for statements of particular importance. At some point I wished I had a dictionary of word patterns I could use, like a keyword dictionary. For example:
"As [blank] as possible"
I did not have that sort of generality and did the best I could. I learned that our customers had about 14 different topics they wrote about, and I came to call this "corpus" of text a narrow world. In a narrow world you want to sort text into its appropriate sub-topic and then try, as best as possible, to understand and act on whatever is expressed about that sub-topic. Not only were the possibilities quite limited but, as a matter of fact, most people express themselves the same way, or in one of a few different ways. Since the possibilities are finite, processing the text accurately should be possible.
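
A toy sketch of the kind of thing I was hacking together (the topics, keyword lists, and the one pattern are made-up illustrations, not the production code):

    import re

    # Keyword lists per topic, plus one word pattern with a blank slot.
    TOPIC_KEYWORDS = {
        "design":     ["margin", "occlusion", "contact", "thickness"],
        "non-design": ["hello", "thanks", "merry christmas"],
    }
    AS_AS_POSSIBLE = re.compile(r"as (\w+) as possible")   # "As [blank] as possible"

    def sort_statement(text):
        t = text.lower()
        m = AS_AS_POSSIBLE.search(t)
        if m:
            return "design", m.group(1)        # the word that fills the blank
        for topic, words in TOPIC_KEYWORDS.items():
            if any(w in t for w in words):
                return topic, None
        return "unsorted", None

So sort_statement("Make the contacts as light as possible") would come back as ("design", "light").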

I spent part of a year building an even better reader at home. I studied hotel reviews as a narrow world and tried to read what people were writing when they complained about noise. It was an interesting subject and I built up some good word lists. But, at the time, there was much I did not understand about approaching language with geometric preconceptions. The best model approach lays out a specific Fibre Bundle description for the relation between a total "Pattern Space" and a base "Measurement Space" [see illustration here]. I never could understand which was which in the case of words. My noise reader was OK but got messier and messier and, in the end, Trip Advisor told me I could not mine their reviews. So dreams of money ended.
 
Then I spent a year thinking harder about word patterns, calling them narrative patterns, and deriving a new formalism called proto semantics for analyzing story structure. I wrote it up in a paper, "The Elements of Narrative", and got it rejected by a couple of journals. (I am revising it.) In fact I got distracted from the pursuit of language programming by some actually startling discoveries about simple narrative patterns we live with. There is an entire world of meaning between words and thoughts that has never been discussed, because there was never an adequate notation for it. So proto semantics opens other doors it is always tempting to wander through (even in my dotage). Hence this Sphinxmoth blog.

Then I got sick and badly depressed for 2 months. I recovered with a bubbling up of joy and went to Woods Hole, promising myself I would take another close look at the noise complaint examples I still have. So this summer I looked hard at the examples and tried to translate them into the new proto semantic notation. As it turned out there were about 6 distinct stories being used to describe noise. Like:
"The room was near the elevator"
or:
"Open windows let in the noise"
I showed an image of the six recently here.
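
Two of the six, written as crude keyword templates rather than in the proto semantic notation (a sketch; the word lists and story names are illustrative only):

    SOUND     = ["noise", "noisy", "loud"]
    SOURCE    = ["elevator", "street", "bar", "traffic"]
    APERTURE  = ["window", "door", "wall"]
    PROXIMITY = ["near", "next to", "above", "beside"]

    # A "story" is a list of keyword groups that all have to show up.
    STORIES = {
        "room near a noise source":   [PROXIMITY, SOURCE],   # "The room was near the elevator"
        "aperture lets in the noise": [APERTURE, SOUND],      # "Open windows let in the noise"
    }

    def which_story(text):
        t = text.lower()
        for name, groups in STORIES.items():
            if all(any(word in t for word in group) for group in groups):
                return name
        return None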

Then I started thinking harder about how to organize keyword dictionaries and had an important insight: once you arrange keyword lists into hierarchies you have, effectively, a number system. Then I realized that the "Measurement Space" of the best model approach is this hierarchy of words, and that the "Pattern Space" contained the meanings and the ideal objects defined in proto semantics that were to be fitted to those meanings. So a sequence of words in a text is a path not just through the tree of words but a path, lifted up, into the space of meanings. The final hurdle in this approach is to have a goodness-of-fit calculation to compute how well a narrative pattern fits an incoming text. This is the key difficulty at present in Narwhal. (I just realized it has to be recursive and am writing this instead of thinking about that.)
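
A sketch of the "number system" idea, with made-up word lists (nothing here is the real Narwhal dictionary): each keyword list is a node in a tree, a word maps to the index path of its node, and a text becomes a sequence of paths that a narrative pattern can be scored against.

    TREE = {
        "sound":    {"loudness": ["noise", "loud"],  "quiet":  ["quiet", "peaceful"]},
        "location": {"room":     ["room", "suite"],  "source": ["elevator", "street"]},
    }

    def word_to_path(word):
        for i, branches in enumerate(TREE.values()):
            for j, words in enumerate(branches.values()):
                if word in words:
                    return (i, j)              # the word's "digits" in the hierarchy
        return None

    def text_to_paths(text):
        return [p for w in text.lower().split() if (p := word_to_path(w)) is not None]

    def goodness_of_fit(paths, pattern):
        # crude score: fraction of the pattern's paths found, in order, in the text
        k = 0
        for p in paths:
            if k < len(pattern) and p == pattern[k]:
                k += 1
        return k / len(pattern) if pattern else 0.0

For example, goodness_of_fit(text_to_paths("The room was near the elevator"), [(1, 0), (1, 1)]) comes out as 1.0.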

I want to say that pattern recognition really does work better with models. It is inevitable that one starts by measuring the data and looking at how relationships between the numbers relate to relationships between the patterns of interest. But it is a mistake to get stuck in the "Measurement Space", and it always goes better when you include an understanding of the "Pattern Space".

I was in Woods Hole another time later in the summer, looking at the noise complaint narrative structure, and was able, for the first time, to conceive of how to write an automated "reader" in general. Once I saw the possibility, I decided it should be named after "Narrow World" and "Narrative" and picked the name "Narwhal". It is also a nod to computer languages with animal names. And it is nice that we are talking about a marine mammal, since I had so much fun with it in Woods Hole this last summer. Glenway and Barb let me rant about it.

Tuesday, September 13, 2016

Bad becomes Good

Let's suppose a rule for the narrative X::Y that says: if one of the two is Bad and one is Good, then the whole statement is Bad; if they are both Good, or both Bad, then it is Good. I am assuming '::' has an interpretation as "if...then".

Here is an example that seems correct:
"if you like noise then you will like this hotel" (Bad-to-Good, a negative about the hotel)
"if you like noise then you will not like this hotel" (Bad-to-Bad, a positive about the hotel)
"if you like quiet you will not like this hotel" (Good-to-Bad, a negative about the hotel)
"if you like quiet you will like this hotel" (Good-to-Good, a positive about the hotel).

If this is the general case, then value or sentiment transmits through an "if...then" by this rule:

X\Y      Good   Bad
Good  |  Good   Bad
Bad   |  Bad    Good

Interestingly this can be derived from Truism 4 - "Things remain the same." The stories about things remaining the same are positive and the ones where things change are negative.
Update: I am not sure if this works in general. It may be because of the "like".
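
If the rule does hold, it is just an agreement check. A minimal sketch (with "Good"/"Bad" strings standing in for whatever the real sentiment values are):

    # Sentiment of "if X then Y": Good when X and Y agree, Bad when they differ.
    def if_then_sentiment(x, y):
        return "Good" if x == y else "Bad"

    assert if_then_sentiment("Bad",  "Good") == "Bad"    # "if you like noise then you will like this hotel"
    assert if_then_sentiment("Bad",  "Bad")  == "Good"   # "...then you will not like this hotel"
    assert if_then_sentiment("Good", "Bad")  == "Bad"    # "if you like quiet you will not like this hotel"
    assert if_then_sentiment("Good", "Good") == "Good"   # "if you like quiet you will like this hotel"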

Saturday, September 10, 2016

The "n" and the "d"

I was driving and trying to make a joke of the words "Black Widow" I saw on the back of a passing car. I tried "Black Window" and, for some reason, this really did not work. Pausing to think about it, it may be because the sounds "win" and "wid" are so different. It feels as if the "n" and "d" endings are almost opposites of each other.
Just a thought to follow up on sometime: the exclusive "|" of Narwhal is related to the reversal of sentiment value, as per the "block()" commands and the implementation of ' *'. The same thread of narrative exclusion may serve to describe very low-level aspects of behavior and language. In other words, the "n" versus "d" sounds may live in a world (of phonemes) that can also be explained using narrative patterns.

Tuesday, September 6, 2016

Narwhal's darkest hour

...writing the UpdateVault() routine

Sunday, September 4, 2016