Monday, November 28, 2016

Narwhal reaches "Design Complete" stage

This means I have done all the basic typing I expect to do and now have to debug. In principle, the tough ideas and design choices are behind us, and Narwhal is at "Alpha". Of course I fear the debugging still ahead but, as I wrote in my GitHub commit, the NWObject (that is, the Narwhal object) works in at least one case.
As far as that goes, here is an encouraging first impression:
Of course a moment later it falls on its face - ah well.

Friday, November 18, 2016

Slicing the "search" problem differently

Suppose that Google focused on algorithms that were entirely personal and connected with a user's profile, while at the same time it built a neutral backend for its indexed data. I suppose the problem is that you cannot index without making assumptions. Anyway, if the profile became the thing, then you could always step out of it and do neutral exact searching if desired.

Wednesday, November 16, 2016

"End of Life" for a neural net

Because they are black boxes giving no window into the multidimensional measurement space where "clusters" are forming during machine learning, you will never see just how non-convex the classification regions are - how topologically different they are from the original regions in object space. Unjustified assumptions of convexity and a blind belief in the applicability of a random Euclidean metric have created a situation where, inevitably, a sample will get added from "category A" that is closer to the known examples from "category B". From there, it is only a matter of time before the two categories start merging and the system delivers more and more incorrect classifications.
It seems to me this is almost inevitable.
At first a neural net system seems great. A small number of examples have been added to the system and they are far enough apart to work as nearest-neighbor classifiers. But then we start adding other examples for greater "accuracy". From personal experience, two things are happening now. Counterexamples are starting to show up and so they are added as new training examples. Also the developer is beginning to be hypnotized (PRADHS - "pattern recognition algorithm developers hypnosis syndrome") into believing objects belong in the category if their system tells them that is where the object belongs. This leads to the addition of more and more boundary case examples. Rather than becoming more accurate, the system has actually become useless and incapable of delivering accuracy greater than a not-very-good level like 65%. That is machine learning "end of life".
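To make the merging of categories concrete, here is a toy sketch using a plain 1-nearest-neighbor classifier as a stand-in for the kind of system described above. The points, labels and the helper name nearest_label are all made up for illustration; this is not a model of any real neural net.

```python
import math

def nearest_label(query, examples):
    """Return the label of the training example closest to query (1-nearest-neighbor)."""
    return min(examples, key=lambda ex: math.dist(query, ex[0]))[1]

# Hypothetical training set: category A occupies two separate regions,
# with category B sitting between them, so A is not convex.
examples = [((0.0, 0.0), "A"), ((10.0, 0.0), "A"), ((5.0, 0.0), "B")]

# Assume this point is a genuine member of A that happens to lie nearer the B example.
counter_example = (4.0, 0.0)
before = nearest_label(counter_example, examples)

# The developer "fixes" this by adding the counterexample as a new A sample.
examples.append((counter_example, "A"))

# A nearby point that was previously closest to the B example is now claimed by A.
after = nearest_label((4.4, 0.0), examples)
print(before, after)
```

Each boundary-case sample added this way pushes one category's region further into the other's, which is the merging described above.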

I believe that Google may have reached end of life in its search algorithm. You can always find straw in a haystack but I am afraid that you can no longer find a needle there. As far as I am concerned, when I search for "Barbara Waksman" and they return pages with both words but not all pages where the words are adjacent, then what I am seeing is a whole lot of false positives. The world seems much too accepting of these Google errors. When Netflix makes the same error it is SO bad that I can utilize their bogus search results as a backdoor search for random movie titles that are not available otherwise - a different Netflix error. 
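The difference between "both words somewhere on the page" and "the words adjacent" is easy to state in code. The following is only a sketch with made-up page text and hypothetical function names; it says nothing about how any real search engine is implemented.

```python
import re

def words(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains_all_words(page, query):
    """Loose match: every query word appears somewhere on the page."""
    page_words = set(words(page))
    return all(w in page_words for w in words(query))

def contains_exact_phrase(page, query):
    """Exact match: the query words appear adjacent and in order."""
    p, q = words(page), words(query)
    return any(p[i:i + len(q)] == q for i in range(len(p) - len(q) + 1))

# Made-up page snippets for illustration.
pages = [
    "Photos of Barbara Waksman in the garden",
    "Barbara Jones wrote to Mr. Waksman",
]
loose = [contains_all_words(p, "Barbara Waksman") for p in pages]
exact = [contains_exact_phrase(p, "Barbara Waksman") for p in pages]
print(loose, exact)
```

The second snippet passes the loose test but fails the exact one; from the point of view of someone searching for the full name, it is a false positive.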

Tuesday, November 15, 2016

current GOF formula

(min(u,r)/N) * (r/F)
Where:
N = num slots in narrative
u = num slots used
F = num all words between first and last word read
r = number of words read

Update: Skip the min(u,r) but DO use the total number T of words, so the latest version of the formula is:
(u/N)*(r/F)*(r/T)
This makes a single word match less good than a multi word match; and it favors longer narrative patterns.
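As a sanity check, the two versions of the formula can be written out directly. This is a minimal sketch and not the code in the Narwhal repository; the function names and the example counts are made up, and N, F and T are assumed to be positive.

```python
def gof_original(u, N, r, F):
    """First version: (min(u, r) / N) * (r / F)."""
    return (min(u, r) / N) * (r / F)

def gof_updated(u, N, r, F, T):
    """Updated version: (u / N) * (r / F) * (r / T)."""
    return (u / N) * (r / F) * (r / T)

# Hypothetical 4-slot narrative matched against a 6-word text.
single = gof_updated(u=1, N=4, r=1, F=1, T=6)  # one word read
multi = gof_updated(u=3, N=4, r=3, F=4, T=6)   # three words read across a span of four
print(single, multi)
```

With these made-up counts the single-word match scores lower than the three-word match, because both u/N and r/T shrink when only one word is read.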

I should mention that dull words are discounted in F and in T. For no particular reason they are added to the numerator for (r/F) and subtracted from the denominator for (r/T). Either way the scores I am getting now look better (e.g. "1.0").
Update: I changed it again, so the score in the numerator of (r/T) is corrected by replacing r with r + ur, where ur is a count of dull and control words in the full segment of text.

Monday, November 14, 2016

Artificial Intelligence, embedded intelligence, and real intelligence in a computer

Count me as an AI skeptic. When I hear about the program that beat a Go master, I think: there is no way that program arose by machine learning or any "AI" technique. Instead, the programmers knew how to play the game well, and embedded their intelligence into a computer. Given sufficiently good Go-playing programmers who get a machine to do what they do, but with the advantage of seeing many steps ahead, it is not surprising that the program beats the master. All you need is near-master level programmers. This was real intelligence that they embedded into a computer game. But calling it an "AI success" is claiming a mantle of success that had nothing to do with so-called "AI".

On the other hand, I propose that a computer that brings me a Coke when I say "bring me a Coke", is exactly as intelligent as needed. It satisfies my requirement, and so it is real intelligence.

As for artificial intelligence, I guess it deserves its own title. Anyone know examples that are 85% accurate?
Update: I see "embedded intelligence" is already taken, how about "embedded cognition"? There's a buzzword for ya. OOPS! That's taken too. Can I go with "machine cognition"? Nope. All combinations of these words have been taken. So I might as well stick with "embedded intelligence".

Friday, November 11, 2016

Thursday, November 10, 2016

Exact string match searches are no longer possible on Google

I have been trying to get Google to "find" my photos of Barbara, here on Sphinxmoth. I believe Google does index this blog, so why don't they know about her? Frustration over this led me to search for "how to get exact matches?" and to discussions of how Google went with matching to "close variants" instead of exact matches, at the end of 2014. There doesn't seem to be any explanation except that it is somehow beneficial to their advertisers.

So I have a couple of thoughts about why Google had to stop providing exact matches. First off, why should a user care about Google's Ad revenue; and shouldn't Google prioritize users before customers?* Is it possible that exact search would conflict with broader "variant" searches that match customers to Ads? Given the difficulty of blending different algorithms without ad hoc decisions, I still wonder: why not add a button, to allow a choice of "exact matching"? 

 I suspect the reasons are deep and, in fact, would be embarrassing to admit: Google has been suckling too long on the 'bottle' of neural nets. They made the fundamental mistake of thinking they were "learning" and, in fact, they were only averaging. After a while the averages turn to mud. The coefficients become bloated with contradictory data. You add some 'poison' samples and (I know from personal experience) your entire library becomes corrupt, because it contains samples from too many diverse populations. Try treating a multi-modal distribution as a simple Gaussian!
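Here is a small illustration of that last point, with made-up numbers: a library containing samples from two separate populations, summarized as if it were a single Gaussian. The sample values and variable names are assumptions for the sake of the example.

```python
import statistics

# Hypothetical library of samples drawn from two distinct populations.
population_1 = [-5.2, -5.0, -4.9, -5.1, -4.8]
population_2 = [4.9, 5.1, 5.0, 5.2, 4.8]
library = population_1 + population_2

# Summarize the whole library as a single Gaussian (mean and standard deviation).
mean = statistics.mean(library)
stdev = statistics.stdev(library)

# The fitted "typical" value describes neither population:
# measure how far the nearest actual sample is from it.
closest_gap = min(abs(x - mean) for x in library)
print(f"mean={mean:.2f} stdev={stdev:.2f} nearest sample is {closest_gap:.2f} away")
```

The average lands in the empty gap between the two populations, and the spread is inflated to cover both. That is the "mud".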

If I am right, Google is doomed. So is Apple. Separately I see that Google's head of R&D is a neural net guru. How delightful.

(*) What part of "do no evil" did they forget?

Wednesday, November 9, 2016

A rare time I agree completely with Chomsky

"So if you get more and more data, and better and better statistics, you can get a better and better approximation to some immense corpus of text, like everything in The Wall Street Journal archives -- but you learn nothing about the language. "

Narwhal and the Virtual Articulator

Yesterday at work I saw my Virtual Articulator (C++) perform beautifully for the first time. Last night at home I saw my Narwhal (Python) outer loop "readText()" perform beautifully for the first time. The Virtual Articulator is more or less fully developed. For Narwhal, the worst of it may be over, but there remain miles to go before I sleep.
I have been agonizing about these two pieces of software for a while: the VA, at work, for most of the last year and Narwhal, at home, since August. It is interesting that these two have been on a sort-of parallel trajectory. Here is a VA demo.

Tuesday, November 8, 2016

AI discussions are like Science Fiction

Except the discussion authors seem to forget that it is fiction. Reading about how capitalism will fail because of AI. AI is going to take your job. I don't see it. Until a robotic hamburger delivery system is cost competitive with a human hamburger delivery system, why fictionalize? Cuz reality is boring?

Sunday, November 6, 2016

righ....t?

There are several ways to say the word "right" as a sharing of emotional state with the listener. In one case I say "right?" with a drawn out increasing inflection that asks the listener to share my puzzlement. In another case I say "right!" with a sharp termination that asks the listener to share my enthusiasm.

Interestingly, these different narratives are given by both a written word spelled r i g h t and by an inflection.

Saturday, November 5, 2016

Why does Google not see my photos of Barbara Waksman?

I posted so many. EG here. No reason why Sphinxmoth cannot help promote these photos of Barbara Waksman (I am writing it over and over for the damn search engines.) Barbara Waksman, Barbara Jones Waksman.

Friday, November 4, 2016

New Bread Recipe

2 cups of King Arthur Flour plus 1 tsp of salt in a large bowl
1 cup + 2 tbsp of lukewarm water and 1 tsp of yeast (eg Fleischmann "Active Dry") in a small bowl

Stir the yeast into the water in the small bowl until dissolved. Then pour it into the large bowl and mix until most material comes off the sides of the bowl. Dump it out on a surface to rest 5 minutes. [It is a moist dough and the moisture is redistributing evenly]. Clean the bowl.

Use a large flat knife to fold the dough once, then again at ninety degrees. Then put it back into the (clean) bowl. Cover it and let it sit for 3-4 hours at 65-70 F. It should inflate 3X or 4X.

After waiting, pull the dough away from the sides of the bowl, plopping it into the center (like Jacques Pepin) until it somewhat separates from the bowl, and you can dump it onto a flat surface. Clean the bowl. Use the flat knife to fold the dough twice again. By now there are bubbles in the dough, so try to handle it gently, and don't squash it while folding. Put the dough back in the bowl. The main difference with Pepin is that he seems to punch his dough down too much for my flour. I have to be gentler with it.

Let it rise for 3-4 hours again at 65-70 F. Again pull it gently out of the bowl and fold it twice with a flat knife. Then put it in the fridge for 4-5 hours. (Any longer and the bread gets rubbery.) After this rest, take the bread out of the fridge, let it come to room temperature and rise again for another 2-3 hours.

Then gently take the dough out of the bowl and form it into a loaf on a surface - either on the final baking pan (which you previously coated with corn meal, or with baking paper), or on a board from which you can transfer it to said baking pan after another hour. Then get ready to bake:

Oven at 425, spritz the bread and the oven interior with water and bake for 10 minutes. Reduce heat to 405, spritz again, and bake for another 17-18 minutes.

To cool the bread, do not put it on a horizontal surface. Instead, prop it up on something so it cools standing vertically, or on a side. That way the bubbles are less deformed during the cooling.

Keeping GitHub up to date

This blog was showing up as a "referring site" in my GitHub repository
 https://github.com/peterwaksman/Narwhal
Perhaps, when it gets archived, the current version of the blog main page no longer contains the link and no longer gets seen by GitHub as a referrer? So maybe I have to keep it fresh. Consider this an experiment.
Update: Success. The link from Narwhal has just returned.

Thursday, November 3, 2016

What is the "information model"? A summary narrative?

The underlying idea of Narwhal is: if you have a model for information to be found in text, you can start with the model, then see how much of it is filled from the text. This notion of an information model is close to what I understand as the database format they use at FrameNet to store the semantic frame alternatives.

So here is an anecdote: I am moving towards developing the higher level work of the Narwhal class - the work where multiple narratives interact. So I was thinking about an underlying information model, and it is very tempting to come up with a class definition for "Hotel" with sub classes for "Room" and all kinds of structure around descriptions of sound. Trying to diagram it, it quickly becomes a confusing mess of boxes and arrows. But that approach is an alternative to using a summary narrative, which captures all possible stories. For noise, various versions of this parent narrative occur, such as
Sound->Me :: Me_/affect , Me->staff : staff_/ action
I am starting to understand that this narrative format is much more concise than any attempt to break it out as a collection of connected boxes.

The key discovery, this summer, which enabled designing Narwhal, was the realization that this parent narrative rarely occurs but instead a variety of smaller partial narratives appear. These "empirical narratives" are to be read and then corresponding parts of the information model are to be set, and delivered. However, given the anecdote above, it is highly recommended that one thinks of the totality of information as itself a narrative, the "summary/parent" narrative, and that smaller partial narratives should be translated into versions of the parent. Hence the Narwhal developer can skip designing a "schema" for the information, and instead focus on determining a parent narrative and rules for translating the partial narratives into it.
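A rough sketch of what "translating partial narratives into the parent" could look like, with the parent narrative treated as a fixed set of slots. The slot names, the functions translate and merge, and the example texts are all hypothetical; none of this is taken from the Narwhal source.

```python
# Hypothetical slot names for the parent noise narrative; these are
# illustrative and not taken from the Narwhal source.
PARENT_SLOTS = ("sound", "affect", "staff", "action")

def translate(partial):
    """Place a partial narrative's values into a copy of the parent narrative."""
    parent = dict.fromkeys(PARENT_SLOTS)
    for slot, value in partial.items():
        if slot in parent:
            parent[slot] = value
    return parent

def merge(parents):
    """Combine several translated narratives, keeping the first value seen per slot."""
    merged = dict.fromkeys(PARENT_SLOTS)
    for p in parents:
        for slot, value in p.items():
            if merged[slot] is None and value is not None:
                merged[slot] = value
    return merged

# Two made-up partial narratives, each covering only part of the parent.
partials = [
    {"sound": "traffic noise", "affect": "could not sleep"},
    {"staff": "front desk", "action": "moved us to a quiet room"},
]
summary = merge(translate(p) for p in partials)
filled = sum(v is not None for v in summary.values())
print(summary, filled)
```

Nothing here needs a class hierarchy for "Hotel" or "Room": the parent narrative is the schema, and each partial narrative fills in whatever part of it the text supplies.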

Tuesday, November 1, 2016

Where is the chatbot-chatbot interaction concept?

I am mystified that with all the instant experts, none has grasped the need for data exchange between chatbots. There is lots of talk of collections of chatbots (on the subject of searching for a chatbot) but none considers the obvious: chatbots will interact with each other.

Most importantly the personal chatbot "space" must include a concept where my chatbot talks to a travel website chatbot. More generally, my chatbot does my shopping for me.