Sphinxmoth: December 2016

Thursday, December 29, 2016

More Narwhal progress

I have been debugging Narwhal steadily over Christmas vacation. I wrote a preliminary regression test and then got bogged down in further bugs. But today I was free enough from deep bugs to begin using the regression test to tune the NoiseApp. Just now I took a chance and fired it up on a pretty long sentence I have been waiting to try:

"Not one to complain normally I was able to overlook this however I was not able to overlook the fact that the walls are paper thin-every footstep, toilet flush, tap turned, and word spoken was heard through the walls and to top it all off we were unfortunate to have a wedding party staying on our floor."

The result is quite excellent:

A 'sound' narrative is certainly being told, as well as a 'proximity' narrative. It would be great if this was the way it worked on most examples - even the 'affect' narrative gets a bit of a hit. And this starts to give me a perspective on the app narratives fitting around the text together.
Update: This was not as "new" a sentence as I thought. I had debugged it.

Wednesday, December 28, 2016

Revised GOF formula

I had an original gof formula (see here):

gof = (u/n) * (r/f)

where

u = num used slots of narrative
n = num slots of narrative
r = num words read (corrected for control words, dull words, and anything else I can skip)
f = (last word read index) - (first word read index) + 1

This had an issue when the narrative is a single VAR that is found in the text, because the formula gives it a "perfect match" compared with what usually happens with multi-slot narratives. So I engaged in the unpleasant exercise of compensating for the total number of words, thus penalizing all the other narratives in order to handle single-VAR narratives.
After a bit of soul searching: is it reasonable to think that a single VAR is not really a narrative? In the proto semantics I took the simplest narrative fragment to be a 'thing'. Perhaps it should be 'thing has attribute'? If that were so, it would mean the slot count should really be at least 2. So instead of leveraging the total word count into a formula that applies to everything, let us compensate with this version of the formula, which only differs for single-VAR narratives:

gof = (u/max(n,2)) * (r/f)
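
To see what the revision changes, here is a minimal Python sketch of the formula. The function name gof and the guard against zero values are my own assumptions for illustration, not the actual Narwhal code:

```python
def gof(u, n, r, f):
    """Revised goodness of fit: (u / max(n, 2)) * (r / f).

    u = number of used slots, n = number of slots,
    r = number of words read, f = span from first to last word read.
    Returning 0.0 for empty inputs is an assumption of this sketch.
    """
    if n <= 0 or f <= 0:
        return 0.0
    return (u / max(n, 2)) * (r / f)


# A single-VAR narrative found in the text no longer scores a perfect 1.0.
single_var = gof(u=1, n=1, r=1, f=1)

# A three-slot narrative is scored exactly as under the original formula.
three_slot = gof(u=3, n=3, r=3, f=4)

print(single_var, three_slot)
```

Only narratives with n = 1 are affected; for n of 2 or more, max(n,2) is just n and the score is unchanged.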

Tuesday, December 27, 2016

Language is precise

I think one reason mathematicians do not consider language the proper subject of mathematics is because they believe language is vague. By contrast I think language is precise and what seems vague is the use of implicit language. However implicit language follows very specific rules (laid out as the "Truisms") and it is highly efficient. When truisms are explicitly re-inserted into the sentences where they apply, the result is an object with well defined mathematical properties. By which I mean a sentence has an exact description as a formal expression, that is subject to well defined transformations by substitutions that preserve narrative role.

When I'm 64

That would be today.

Saturday, December 24, 2016

Antitrust Law should apply to Big Company Chatbots

Just read an article where some law professors are warning about the forms of user discrimination that large company "personal assistants" can embed. It seems totally obvious that a platform AI like Google Assistant should protect the customer, not the vendor, but Google's profit model is in conflict with customer protection. This is true in general: a large company with a profit agenda should not be allowed to also "advise" on what purchases to make. It is a conflict of interest and bad for the consumer.

Thursday, December 22, 2016

Worst mixed metaphor of 2016

(found on Venture Beat, without any preamble):
"The very best marketing campaigns grab them by the gonads. But it takes data to figure out exactly what goes into reaching out and touching someone."

What happened to editors?

Monday, December 19, 2016

Up late

Had some good experiences with Narwhal yesterday and today. It has been like a goose reluctant to lay a golden egg - that just started laying. Enough of Narwhal is working to begin to show strength - but I will continue to be paranoid that a show-stopping conjunction or punctuation mark will show up to spoil everything. Or that it will be too hard to define the narratives of a narrow world.
Update: actually what happens is that the mechanism of cause(X,Y) is challenged by the need to both work with bi-directional syntax ('as' versus 'so') and be able to mis parse a partial sentence that has lost an earlier part.

Monday, December 12, 2016

Wow, my code is working

I just tried the equivalent of SOUND_/[SOURCE]_/[INTENSITY]_/[TOD] using the highly nested and untested function Attribute(X, Y) along with 'implicit' notation "[]" in the syntax:

s = Attribute(SOUND, [Attribute(SOURCE, [Attribute(INTENSITY, [TOD])])])

And, what do you know? It seems to have worked, cuz the standard score on the standard sentence went up a bit. But it was slow!

Thursday, December 8, 2016

Probability of a subsequence

I am a little surprised to find no quick online answer for the question: what is the probability of finding a fixed 'search for' sequence of length K within a longer 'search in' sequence of length N?
Assume a common alphabet - say - {0,1}. The total number of sequences of length N is 2^N [actually more like (2^N - K)]. The number of these that begin with the 'search for' sequence is 2^(N-K). Call that set "D".

Consider applying the shift operator to the elements of D and how D has an orbit, within the whole space of sequences, that is the same as the set of sub sequences of interest.

The total number of things you get that way divided by 2^N is the desired probability. I suppose it is complicated because not all sequences have the same orbit size, so depending on where you start in D you get a different orbit length. So actually, rather than caring how to count these things, we should be more interested in how they fit together geometrically. You are looking at the points of a discrete surface within a discrete volume, so counting points may not be as interesting as other geometric properties of the sets. Not going to figure it out though. I see why now.

However we can say that the maximum orbit size for a point in D, under shifts, is N - cuz that is how many possible shifts are available. So the probability is < N*2^(N-K) / 2^N . So an estimate is:

probability < N / 2^K

Unfortunately this is a lousy estimate. It misses the subtler point that periodicity of 'search for' within 'search in' must bring the numerator down a lot.
Update: If N is a prime number then only a constant sequence like 0000000 can be self similar and have an orbit under the shift operator that is less than 2^N - K long.
Update: The proof is that if shift^n has a fixed point, then the subgroup with this fixed point would have order dividing a prime.
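
For small N the exact probability can simply be counted by brute force and compared with the N / 2^K estimate. The sketch below is only a check of the numbers, not a solution to the counting problem; the function names are made up for illustration:

```python
from itertools import product


def exact_probability(pattern, n):
    """Fraction of binary strings of length n that contain pattern
    as a contiguous run, found by enumerating all 2^n strings."""
    hits = sum(1 for bits in product("01", repeat=n)
               if pattern in "".join(bits))
    return hits / 2 ** n


def shift_bound(k, n):
    """The rough estimate from the text: N / 2^K."""
    return n / 2 ** k


n = 4
for pattern in ("11", "10"):
    p = exact_probability(pattern, n)
    print(pattern, p, "bound:", shift_bound(len(pattern), n))
```

With N = 4 and K = 2 the bound is 1, while the exact values are 0.5 for "11" and 0.6875 for "10". The self-overlapping pattern "11" is found in fewer sequences than "10", which is consistent with the remark that periodicity brings the numerator down.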

Wednesday, December 7, 2016

Implicit NARs in Narwhal

Not debugged but seems straightforward: implicit sub narratives are implemented in Narwhal through the clunky api of

nar.makeImplicit()

The result is that all sub nars and all VARs at the bottom will have a self.explicit field set to False. The consequence is in GOF slot counting, where we use the "active" slots in the denominator instead of the total number of slots. The "used slots" is as before, and the total "num slots" is as before. But now the "num active" is defined as found or explicit, and this is used in the denominator. This allows an unused and implicit part of the narrative to be ignored in the GOF - hence making the nar more multi-purpose.
There is a reward for filling implicit sub narratives: more words from the text are read.
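
A rough Python sketch of the slot counting just described. The Slot class and slot_ratio function are hypothetical stand-ins for illustration; only the "found or explicit" rule comes from the description above, and the real Narwhal data structures will differ:

```python
from dataclasses import dataclass


@dataclass
class Slot:
    found: bool     # the slot was filled from the text
    explicit: bool  # False once the slot has been made implicit


def slot_ratio(slots):
    """used / active, where a slot is active if it is found or explicit."""
    used = sum(1 for s in slots if s.found)
    active = sum(1 for s in slots if s.found or s.explicit)
    return used / active if active else 0.0


# One slot found, two unused slots that are implicit: they are ignored.
with_implicit = [Slot(True, True), Slot(False, False), Slot(False, False)]

# Same fill pattern, but the unused slots are explicit: they count against the fit.
all_explicit = [Slot(True, True), Slot(False, True), Slot(False, True)]

print(slot_ratio(with_implicit), slot_ratio(all_explicit))
```

An implicit slot that does get filled counts as both used and active, so filling it does not lower this ratio.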

Monday, December 5, 2016

Getting closer

Toot toot! (my own horn). After a lot of debugging of the NWObject, my first attempt at running the noise application gives good results:

I don't dare stress test it. But I'll have to. [In fact the next thing I tried failed.]
I was depressed thinking how hard it will be to market Narwhal to people who already think they know about language interfaces. Then I had a cheerful thought: I can write an article about the need for language interfaces to understand product reviews - and go into details of the noise complaint - as an example of the psychology of a particular specific "product" review. After all, I am an expert, and there is nothing technical about the topic.

Sunday, December 4, 2016

The "final" meaning

I puzzle over whether a filled narrative is the final form of information or whether that can be transformed, one last time, into a better, static data structure. Mostly, I conclude the efficient static structure is the filled narrative itself together with ifound[] information about what words were 'read' in the text.
