Tuesday, January 31, 2017

Some kind of watershed moment for this Sphinxmoth blog

For the first time, when I type "Barbara Waksman" into Google, the first link is to this blog. They may be tweaking the algorithms. I am pretty sure the quotation mark behavior changed in the last week. They might also be giving Sphinxmoth just a tiny bit higher priority for the obscure reason that I have been getting a few more visitors than usual. The "51-state flag" remains the only thing anyone ever comes to Sphinxmoth to read. But 1 visitor from Oberlin and 1 visitor from Miami - reading the real stuff - may have tipped the Google scale.

Monday, January 30, 2017

Just asking....don't mean to be political or anything like that...but...

Is there a law that says a tech company can't cut off free products - like Facebook, Google search, Wikipedia, Twitter, etc. - from certain parties that are deemed "undeserving"? Like how about Google says: no more free searches if you work for government; or Facebook says: no free accounts for government employees. Or Twitter shuts down if you live in DC?

That sort of thing. Is there something stopping these companies from building their own walls? If not, then posting pictures of civil rights icons and making vague statements about corporate values are pretty weak tea. How about a user policy against vitriol, that gets enforced in a spotty way? Or are these famous tech companies chicken? [We used to say: "Puk....Puk....Puk"].

PTVS is great...but

(P)ython (T)ools for (V)isual (S)tudio is great. But the red squiggles (telling me I have an illegal Python file format) are way too aggressive. When I am typing a comment it is busily trying to correct my indents, and when I type a blank line with the cursor not on a tab stop, more red squiggles for a while; until PTVS decides..."oh I guess it is OK".
Dear Microsoft: please put in a fixed 3 second delay before any red squiggles. For God's sake! At least give me a chance to finish typing.

PERSONAL chat bots and dealing with faceless bureaucracies - and THEIR corporate chat bots

Sure, big companies can save money and improve customer service, but do I want to wait around for my insurance carrier to get it together? What if their business model includes giving me the run-around?
This is one (more) example of why we need personal chatbots to do our bidding on the internet. I want to be able to tell my chatbot: call the insurance company, talk to their chatbot (or a person if they have one) and find out why they denied my recent claim.

Saturday, January 28, 2017

Thoughts in the shower

Narwhal is pushing me in a more mathematical direction, with the idea of segmented text supplanting the idea of tokenized text. Instead of tokens, a 'segment' replaces words of text with keyword ids called VARs. A VAR also stores the indices of the tokens where it is found.
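To make that concrete, here is a minimal sketch in Python (since Narwhal is Python). The `Var` class and `segment` function are names I am inventing for illustration, not Narwhal's actual API:

```python
# Hypothetical sketch of segmented text: each VAR records the keyword
# list it matched and the token indices where the match occurred.
class Var:
    def __init__(self, keyword_id, indices):
        self.keyword_id = keyword_id   # id of the keyword list that matched
        self.indices = indices         # token positions where it matched

def segment(tokens, keyword_lists):
    """Replace recognized tokens with VARs; the segment is the VAR list."""
    seg = []
    for kid, words in keyword_lists.items():
        hits = [i for i, t in enumerate(tokens) if t in words]
        if hits:
            seg.append(Var(kid, hits))
    # order VARs by first occurrence, mirroring the original token order
    seg.sort(key=lambda v: v.indices[0])
    return seg

tokens = "the quote was too high for my taste".split()
kl = {"PRICE": {"quote", "price"}, "TOO_HIGH": {"high", "expensive"}}
seg = segment(tokens, kl)
print([(v.keyword_id, v.indices) for v in seg])
# [('PRICE', [1]), ('TOO_HIGH', [4])]
```

The point of the structure is that downstream operations never touch raw text again; they work on the ordered VARs, and the stored indices let you recover where in the text each match came from.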

In the shower I was thinking about segmented text and the analogy to a spatial curve. We fit a narrative frame to the segment analogously with fitting a Frenet frame to the curve. So in the shower I ask "what do the Truisms have to do with this?". And that reminds me that things like Truism 4 specify the "parallel transport" for some of the dimensions of the moving frame, which in turn reminds me of the concept of Narrative Continuity introduced in Elements, which refers to a kind of connectivity property of coherent narratives. This needs to be seen as a kind of topological property of the segmented text.
In turn, I am reminded that the definition of Narrative Continuity is made awkward by the possibility of a sub narrative structure where a local variable introduced in one place doesn't occur in the next sub narrative but does occur in a later one. A key mathematical trick is to define away the problem. This leads to the idea of a coherent segment: one that can be divided into sub segments s1, s2, ..., sN so that if a local VAR appears in si then it also appears in si-1 or in si+1. All other VARs are assumed global.
Lemma: The alternating sum of the local VARs in a coherent segment cancel.
Proof: definition of coherent segment.
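The coherence condition is mechanical enough to check in code. A minimal sketch, assuming sub segments are just sets of VAR names (the function name `is_coherent` is my own, not from Narwhal):

```python
# Hypothetical check of the coherence condition: every local VAR occurring
# in sub-segment s_i must also occur in s_{i-1} or s_{i+1}.
def is_coherent(subsegments, local_vars):
    for i, s in enumerate(subsegments):
        for v in s:
            if v not in local_vars:
                continue  # global VARs are unconstrained
            prev_ok = i > 0 and v in subsegments[i - 1]
            next_ok = i < len(subsegments) - 1 and v in subsegments[i + 1]
            if not (prev_ok or next_ok):
                return False
    return True

# 'x' is local and always reappears in an adjacent sub-segment: coherent
print(is_coherent([{"x", "A"}, {"x"}, {"B"}], {"x"}))   # True
# 'x' skips a sub-segment, violating continuity: not coherent
print(is_coherent([{"x", "A"}, {"B"}, {"x"}], {"x"}))   # False
```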
Coming back to the relation of Truisms to segmented text, a Truism is an insert-able segment. So I guess the Truisms represent transformations or dimensions in the total space that permit narrative coherence to be implicit in the segmented text. It is like a Truism allows continuity violations...have to think about that.

Friday, January 27, 2017

"Announce.txt" from Narwhal

Contains something along these lines:
Version 1 uses direct text processing, which leads to the same KList lookups occurring over and over - completely inefficient. It was my goal for the code to be true to an intuition about the "moving topic", so pre-processing the text wasn't allowed. Now, for performance, I may need to move past intuition and conceptualize entities that are not directly intuitive. In particular the concept of 'token' needs to be replaced with the concept of a filled VAR. A sequence of tokens is replaced by a sequence of filled VARs called a "segment". Tokenizing is replaced by segmentation. All other operations, up to the NWReader, need to be re-written using segmented text.
But I have to develop new intuitions for segments. They will be the "spatial curves" to which we fit the moving frames of narrative.
I like that last sentence.

Voice interfaces can be more efficient than button pushing

I was trying to decide whether the final filled narrative, extracted from text, should be boiled down into a compact "final" data structure. To my surprise, every "compact" structure forced me to make choices that limited what was being stored and - **surprise!** - the most compact data structure was the narrative itself. In other words the narrative 'format' is closer to how we think, while tabular or hierarchical formats of the content have more unnatural structure and less information.

Sunday, January 22, 2017

I dreamt I was a butterfly

"I dreamt I was a butterfly....Now I do not know whether I was then a man dreaming I was a butterfly, or a bird dreaming I was a butterfly".

Friday, January 20, 2017

Wouldn't you want to domesticate a moose?

Yeah well apparently it's hard cuz they want to roam around in solitude most of the time over a large area - they are not herd animals. It would take time to develop different personalities.

Thursday, January 19, 2017

Pragmatic Linguistics is Begging to be Formalized

In "Elements of Narrative" I write:
 "I find that what I call narrative structure not only meets the requirements for a Semantic Frame as described by [Fillmore 3], but also gives a more specific representation to some of the ideas of semantic implicature [Grice 4] and semantic underdeterminacy [Carston 5, Belleri 6]."
I should have added - "an area of pragmatic linguistics that is begging to be formalized."

Monday, January 16, 2017

preoccupations of the moment

I am lucky to have so many things going on in a brain that just wants to roll over and go back to sleep on a weekend:
  • ver. 2 of Narwhal uses segmented indexing, rather than text, to fix performance problems
  • 'boolean' rules of value - how value propagates and combines from sub narrative to larger narrative
  • topology of narrative. Things like circular stories are invariant under word substitution. Narrative Continuity might be expressed as a rule of "conservation of energy" saying that all local variables cancel when we take alternating sums of sub narrative variables
  • set up dropbox for family photos
  • send out Impatience When a Red Light Turns Green for comments
Or I could hang out and watch TV. Tomorrow I have to go back to work, where the preoccupations are different and related to dentistry:
  • new features in the Virtual Articulator, for manipulating the Keep Out Zone (KOZ), user training, and patent filings related to crown design tools.
  • regular work on automation enhancements
  • start listening to voice recordings from customer service
  • make a scene about antiquated product catalogues

Sunday, January 15, 2017

Claims to old Semiconductor Metrology and Defect Inspection Algorithms

Just for the record, Nanometrics bought Soluris, which had bought IVS, where I developed the 'ripple' algorithm - for finding and measuring sub "peaks" in an optical signal. I also did the contact ("via") measurement algorithms. Tony Saporetti was one of three founders of IVS and did the initial metrology algorithms using WWII radar techniques - based on the mean of a Gaussian to measure the location of a "peak" in a signal. He was also an original specifier of "frame grabber" electronics, at the start of modern digital imaging - a real smart engineer. Insiders at Nanometrics will know that optical metrology depends on this use of the Gaussian mean. Tony also did the auto focus algorithm, and for years the people at Cognex (every time I made the mistake of interviewing there) tried to pry the concept from me [guys: look up the "triangle inequality"]. But I figured out how to latch onto sub peaks in an optical signal and how to measure between inflection points, and that is a key technology also.

Just for the record, KLA's bin defect pattern recognition uses the corpse of an algorithm I developed for a failed startup called "Applied Geometry Inc". The algorithm applies chi-squared to point scatters in a gridded field of view: you count events in each grid cell and compare the distribution to what it would be if there were no spatial pattern to the events. A good basic algorithm for measuring goodness of fit between a pattern (usually an outline in 2D) and a scatter. I filed a patent but it got rejected for the strangest reason - the reviewer took issue with my using the word "dotted" to describe a line in a diagram. The patent application must be on file.
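For what it's worth, the core of that idea fits in a few lines of Python. This is my own reconstruction for illustration - the grid size, coordinates, and function name are assumptions, not the original implementation:

```python
# Bin 2-D points into grid cells, then compare observed counts to the
# uniform expectation with a chi-squared statistic. Large values suggest
# the scatter has spatial structure (a "pattern").
def grid_chi_squared(points, nx, ny, xmax=1.0, ymax=1.0):
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        i = min(int(x / xmax * nx), nx - 1)
        j = min(int(y / ymax * ny), ny - 1)
        counts[j][i] += 1
    expected = len(points) / (nx * ny)   # uniform (no-pattern) expectation
    return sum((c - expected) ** 2 / expected
               for row in counts for c in row)

# A perfectly even spread scores 0; a tight cluster scores high.
even = [(0.125 + 0.25 * i, 0.125 + 0.25 * j) for i in range(4) for j in range(4)]
clustered = [(0.1, 0.1)] * 16
print(grid_chi_squared(even, 4, 4))       # 0.0   (one point per cell)
print(grid_chi_squared(clustered, 4, 4))  # 240.0 (all mass in one cell)
```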

These algorithms still exist and are in use, although the companies got bought by larger and larger parents.

Saturday, January 14, 2017

A circular narrative always has narrative continuity

I just wrote a circular narrative [here] that ends where it begins. If narrative continuity is defined as each local variable reappearing in a subsequent sub narrative, then it always holds when the narrative ends using the same variables it begins with.

Thursday, January 12, 2017

Writing a paper

I particularly like these lines, where I am commenting on the Bloom baby experiment: 
It is possible that babies recognize the sequence of a chest being closed – in fact that is one of the premises of the research. To the baby, the frustrated bunny is involved in a “Contrast” that is “Resolved” when the brown teddy bear helps him. In the second episode the white bunny is associated with the “Contrast” being not “Resolved” and perhaps this makes the baby uncomfortable. By this interpretation, pre verbal babies already have the narrative preference. They like one story more than the other.

So it would be like showing a puppet show to a New York commuter involving cars and red lights turning green; and determining later whether the driver who did or didn't go when he had the chance was a bad guy.

Kirsten Dunst

My wife and I were trying to remember the name of the actress and I was remembering one visual image of her face and outfit and not able to retrieve the name from this old memory. So I deliberately tried to remember her face and outfit from a different movie and - sure enough - the name came floating up as well.

Saturday, January 7, 2017


I am posting this lest I forget about it, as an unfinished idea. It is about a meta-notation on top of proto semantics. Some of it is pretty unclear, particularly the notational substitution '=>' versus the semantic '::'. So for what it is worth:

With reference to the proto semantics, here we develop some abstract entities built from sets of proto semantic narratives.

Narrative Context
To analyze the meaning of a sentence dynamically as it begins, continues, and ends requires some kind of space or context comprising a set of narrative fragments (with an internal structure that remains TBD). To represent the set, use curly braces around the narratives and a semicolon ‘;’ to separate them as elements of the set. For example {X_/A} is a context containing one narrative ‘X_/A’. And
{ X->T ; T_/A } is a context containing two narratives ‘X->T’ and ‘T_/A’.

Narrative equivalence
An immediate advantage of this notation is the possibility of an equivalence relation between narratives other than exact identity. For example, we will be able to equate {(X)} and {X} and assume other equivalences like {X_/(A_/B)} and {(X_/A)_/B}. Therefore we define a one-way equivalence ‘=>’ which permits substitution of the right hand side for the left hand side; and we write ‘=’ for bi-directional substitution. I will also use a ‘+’ to indicate that two contexts in braces are part of a single larger context. Some typical equivalences are given later as “postulates”.

Effects – notation relating actions to persistent properties
An effect is a persistent attribute arising from an action. As a convenience, for an effect A we use the notation dA to indicate the action it arises from or is changed by. Thus the verb narrative:
        X-dA -> Y
corresponds to the property ‘Y_/A’. For example we will write
        {X-dA->Y} => {Y_/A}
This convention can be modified or extended as needed to handle other aspects of the action/attribute relation, for example changes to the actor and differences between beginning and ending of a persistent property.

This convention is not part of proto semantics because it is a notational convention and not part of a narrative.

Another immediate advantage of the curly brace notation is the possibility of symbolic definition like the following.
Notational equivalences
{(X)} => {X}
{X}+{Y} = {X ; Y} 
{X_/[A]} = {[X_/A]}
{XdA->Y} => {Y_/A}
X**=>X (WRONG!)
There are probably many other notational equivalences that could be assumed. 
{X ; Y } =  {X,Y}
{ X_/(A_/B) } = { (X_/A)_/B }
{X->Y, Z->Y } = {(X,Z)->Y }
{X->Y, X->Z} = {X->(Y,Z)}
 X::(Y,Z) = (X::Y,X::Z)
[X::Y]* = [X::Y*]
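One could even mechanize the one-way substitutions. A toy sketch, treating narratives as literal strings and the letters X, Y, A as standing for themselves - so this is far weaker than the real notation, which would need pattern variables:

```python
# Toy illustration: treat the one-way equivalences '=>' as literal string
# rewrite rules and apply them until nothing changes.
RULES = [
    ("{(X)}", "{X}"),         # {(X)} => {X}
    ("{XdA->Y}", "{Y_/A}"),   # the assumption of effects
]

def rewrite(narrative, rules=RULES):
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in narrative:
                narrative = narrative.replace(lhs, rhs)
                changed = True
    return narrative

print(rewrite("{XdA->Y}"))  # {Y_/A}
print(rewrite("{(X)}"))     # {X}
```

A serious version would unify pattern variables against sub-narratives instead of matching literal strings, but even this toy makes the "one-way" character of ‘=>’ concrete: the left side can always be replaced by the right, never the reverse.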

Dynamic equivalences (and the ghosts of syllogism)
{XdA->Y} => {Y_/A; X_/A’ }  (This is the assumption of effects)
{A::B} => {B}
{A::B} + {B::C} => {A::C}    
{A::C} => {A::[B]} + {[B]::C}


Then there are those myriad grammatical modifiers: plurals, possessives, genders, etc. What do they do? They imbue the existing narrative fragment with auxiliary meanings, and luckily that can be expressed in the proto semantics using ‘_/’. But there is no end in sight for what conventions can be assumed about the implicit ‘[Z]’ in a particular manipulator. It is a large family of entities.
A manipulator is a narrative-valued function M() returning a narrative. Manipulators typically take the form:
M(X) = [X_/Z]
(Although I am sure you can come up with others.) For example “he” is a function M(X)=[X_/male]. The gender can remain implicit unless needed for later narrative continuity.

I don’t know how to formulate this. But by 6N,  ‘X_/[Z]’ is expected to be followed by ‘X_/Z’.
The parts of speech known as quantifiers or determiners such as “all”, “one”, “some”, “most”, “much”, “fifteen”, “every”, “the”, “a” are set-theoretic and beyond the scope of proto semantics. For now let us ignore words like “the” or “a” if they occur in an expression to be analyzed. In the same spirit, let us treat pluralized entities as nouns of type ‘thing’.
The word “she” adds female attribute to the noun role:
“she was hungry”
If desired we can diagram this with a plus sign and an implicit additional fragment:
        {she_/hungry} + {[she_/female]}
From our point of view, the conjunctions are a mixed bag. The word “because” is equivalent to the becoming operator ‘::’.   The words “and” and “or” perform the consecutive operator ‘,’. They take their meaning from the entities being juxtaposed. 
One version of the word “and”, in an asymmetric use, anticipates what follows:
“I was hungry and…”
We can diagram this as:
        {I_/hungry} + {[Z]}
Words like “but”, and “while” are particularly interesting as they both anticipate and exclude:
“I was hungry but…”

A more dynamic view of narrative allows for a collection of narrative fragments to evolve together within a context. Let me assume the context for an expression has already been narrowed down to being the description of an object – even (why not) an instance of a C++ object with a default constructor. Call this class the topic, or the narrow world, or the frame.
An expression comes in and is determined to be about the topic or not. If it is about the topic it is examined by the object instance and becomes part of its description – something you can now query the object about. It has become available information.
In this dynamic context a variety of narrative fragments can be considered equivalent and to define this equivalence we define resolution. We say a narrative resolves to another narrative when we observe it happening or wish to postulate that it happens. Here are some postulates:

The Diagramming Loop

Practical language recognition builds up dictionaries and narrative fragments, etc. But there is a lot of work needed to parse and prepare incoming text.

You have incoming text: 
You need topic-specific dictionaries, fragments, punctuation handling, and other things in order to consider all possible meaningful tokenizations. For example “subgingival” could be broken in two, and “1.7 mm” could be retained as a single token. However you do it, you end up considering one or more tokenizations:

Assume we have “dictionaries” (small synonym lists) D1, D2, D3 and one narrative fragment N(d1,d2,d3) where the di are from dictionary Di; and a tokenization.
~~~  ~~~  ~  ~~~~~ ~~  ~~~~~
Every token should be marked as something in a known dictionary, or as an unrecognized dreg. (The basic rule for dregs is that if they come in a group you can ignore them as off topic. But when they are mixed with dictionary words, it is a bit riskier to ignore them.) Anyway, you get an ordered list of tokens:
~~~   ~~~   ~   ~~~~~   ~~   ~~~~~
Let D be the union of the Di. Here is an algorithm:
Step: Find the first element of D among the ordered tokens and fill in that entry in N(). Then read to the right looking for other dictionary entries. Either all slots of N() are filled or we run out of tokens. The result is a partially or completely filled narrative object, say N1. Now throw away all tokens used and keep the dregs. If what is left is more than dregs, repeat the Step to produce another partially or completely filled N2. Proceed this way generating N1, N2, etc. until all that is left is dregs. The N1, N2, etc. are the candidate interpretations of the text.
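A rough Python sketch of the Step loop, assuming each dictionary is a set of synonyms and a filled N() is just a dict of slots (all names here are illustrative, not Narwhal's):

```python
# Scan tokens left to right, filling slots of a narrative N from the
# dictionaries D1..D3; unrecognized tokens are "dregs". Repeat on the
# leftovers until only dregs remain, yielding candidates N1, N2, ...
def diagram_loop(tokens, dictionaries):
    candidates = []
    remaining = list(tokens)
    while True:
        filled = {}
        leftover = []
        for tok in remaining:
            slot = next((name for name, words in dictionaries.items()
                         if tok in words and name not in filled), None)
            if slot:
                filled[slot] = tok
            else:
                leftover.append(tok)   # a dreg, for now
        if not filled:
            break                      # only dregs left
        candidates.append(filled)
        remaining = leftover
    return candidates

dicts = {"D1": {"price"}, "D2": {"too"}, "D3": {"high", "low"}}
text = "the price was too high not too low".split()
print(diagram_loop(text, dicts))
# [{'D1': 'price', 'D2': 'too', 'D3': 'high'}, {'D2': 'too', 'D3': 'low'}]
```

Note how the second pass over the leftovers recovers "too low" as a second, partially filled candidate - exactly the N1, N2 behavior described above.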

The theory of Best Models says we can entertain all the alternative Nj, and try to choose one based on a “goodness of fit measure”.  For the most part this will be a measure of how completely the narrative is filled and how “cleanly” within the grammar of the source language. Starting to get it? I think I am.