1/1/08

On "On Intelligence" (edited)

This post is history, see core article, parts 4, 6, & 8.

Derek Zahn, via AGI list, with my response:

> It seems like a reasonable and not uncommon idea that an AI could be built as a mostly-hierarchical autoassociative memory.
> As you point out, it's not so different from Hawkins's ideas. Neighboring "pixels" will correlate in space and time;
> "features" such as edges should become principal components given enough data, and so on.
> There is a bunch of such work on self-organizing the early visual system like this.
> That overall concept doesn't get you very far though; the trick is to make it work past the first few rather
> obvious feature extraction stages of sensory data, and to account for things like episodic memory,
> language use, goal-directed behavior, and all other cognitive activity that is not just statistical categorization.
> I sympathize with your approach and wish you luck.
> If you think you have something that produces more than Hawkins has with his HTM,
> please explain it with enough precision that we can understand the details.

I agree with you on Hawkins & HTM, but his main problem is conceptual.
He seems to be profoundly confused as to what the hierarchy should select for: generality or novelty. He nominates both, apparently not realizing that they're mutually exclusive. That makes it difficult to define a quantitative criterion for selection, which is key to my approach. The inconsistency leads to haphazard hacking in HTM. For example, he starts by comparing 2D frames in a binary fashion, which is pretty perverse for an incremental approach. I start from the very beginning, by comparing pixels: the limit of resolution. I quantify the degree of match right there, as a distinct variable, & also record & compare explicit coordinates & derivatives, while he simply junks all that information. HTM doesn't scale because it isn't consistent & incremental enough.
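
To make the pixel-level comparison concrete, here is a minimal sketch. Its definitions are assumptions for illustration only: match is quantified as the overlap between two compared pixels (the smaller of the two values), the derivative as their signed difference, and the coordinate is recorded alongside both; the function name compare_adjacent is likewise hypothetical.

```python
# Minimal sketch of pixel-level comparison (illustrative assumptions:
# match = overlap, i.e. the smaller of the two values; derivative = signed difference).

def compare_adjacent(pixels):
    """Compare each pixel to its predecessor, keeping explicit coordinate,
    degree of match, and derivative as distinct variables."""
    results = []
    for x in range(1, len(pixels)):
        prior, current = pixels[x - 1], pixels[x]
        match = min(prior, current)    # degree of match, kept as its own variable
        derivative = current - prior   # direction & magnitude of change
        results.append((x, match, derivative))
    return results

# Example: a short 1D scan line of brightness values.
print(compare_adjacent([10, 12, 12, 40, 41]))
# [(1, 10, 2), (2, 12, 0), (3, 12, 28), (4, 40, 1)]
```

Nothing is thresholded away at this stage: coordinate, match, & derivative are all preserved for comparison at higher levels, which is the point of contrast with HTM's binary comparison of 2D frames.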

Both generality & novelty are valuable, but only because they both contribute to predictive power, the ultimate value.
Generality is a macro-dimension of the cortical hierarchy because it is itself a retrospective measure of predictive power.
Besides, it takes an extended, hierarchically differentiated search to recognize generality.
With novelty, there are two distinct aspects: proximity & change. Recent inputs are relatively more predictive than older ones by virtue of their proximity to future inputs. Thus, proximity is a micro-dimension: the order of search within every level of generality. It's not hierarchical because the range of search & the resulting complexity of match are lower for novel inputs.
Change, on the other hand, has a "contrast" effect: its value is determined by, & subtracted from, the recurrent pattern it interrupts. In other words, change has "negative" value: it's important only to the extent that it cancels the positive predictive value of the interrupted pattern. Change within noise does not interrupt any pattern & has no independent value.
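
A tiny sketch of that "contrast" scoring, under an assumed rule (the specific cap & the name change_value are illustrative, not the actual algorithm): the value of a change is negative & bounded by the predictive value of the pattern it interrupts, and change within noise scores nothing.

```python
# Illustrative scoring of change as "contrast": its value is negative and is
# taken out of (capped by) the value of the recurrent pattern it interrupts.
# The rule below is an assumption for illustration, not the actual algorithm.

def change_value(pattern_value, change_magnitude):
    if pattern_value <= 0:
        return 0  # change within noise: no pattern to interrupt, no independent value
    return -min(pattern_value, change_magnitude)  # cancels part of the pattern's value

print(change_value(pattern_value=50, change_magnitude=20))  # -20: partial cancellation
print(change_value(pattern_value=0, change_magnitude=20))   #   0: noise
```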

I disagree that we need to specifically code for episodic memory, language, & action; to me these are "emergent properties" (damn, I hate that word :)).

There are more details in my top post & the related discussion on "DoxSpot".
