5/5/12

Discussions with Jim Bromer on AGI list



On Sat, Aug 18, 2012 at 9:51 AM,
Boris,
You and I do not need to understand the particle physics of cellular microbiology in order to study an introductory text of biology. And in order to learn what the text is presenting, we do not need to reduce everything mentioned in the text to the order of particle physics.
So while I agree that we need to go to the basis of knowledge to resolve some scalability issues, and derived knowledge is often based on raw sensory experience, the point that I am trying to make is that the basis of knowledge that we have to use in many scalability scenarios is not raw sensory experience.
For example, to really understand what is presented in the biology text we do not need to recall the sensory experience of reading. (I guess it would be nice to be able to do that but it is not necessary for the problem of learning to understand what the text referred to.) So we really do not need to reduce all problems to primitive forms.
Also, some scalability issues cannot be resolved just by having the foundations of the subject (or object) handy. The potential complexity of interrelations (as in derivable interrelations) may make scalability infeasible.
Jim Bromer

Boris Kazachenko wrote:
Jim, it would help if you tried to address specific points that I made.

> You and I do not need to understand the particle physics of cellular microbiology in order to study an introductory text of biology. And in order to learn what the text is presenting, we do not need to reduce everything mentioned in the text to the order of particle physics...

You are confusing ontological hierarchy, in which we always start from some arbitrary point, & epistemological hierarchy, in which the brain / civilization of brains *always* starts with analog / un-encoded / uncompressed data. GI is the algorithm of *unsupervised* pattern discovery; supervised education always piggybacks on the former, done by prior generations.

> For example, to really understand what is presented in the biology text we do not need to recall the sensory experience of reading.

Sensory experience is how we learn phonemes, alphabet, words, & the concepts behind basic words in the first place.

> Also, some scalability issues cannot be resolved just by having the foundations of the subject (or object) handy.

I said it's necessary, not sufficient. My whole approach is about cognitive economics: I quantify costs & benefits on the lowest level of representation. That's the basis for predictive search pruning, which is what scalability is all about.

> The potential complexity of interrelations (as in derivable interrelations) may make scalability infeasible.

We are living proof that in our world effective scalability *is* feasible.
From: Jim Bromer Saturday, August 18, 2012 10:14 AM

> We are living proof that in our world effective scalability *is* feasible.

I was talking about artificial general intelligence.
I am interested in discovering the weak points of your theory so that I am better able to understand it. Sorry if that is rude. The philosophical issue that I was discussing is whether or not you actually have some evidence, or really good reasons, to think that your approach to AGI will eventually work (without some major advancement in computer science outside of your work). Your response that "we are living proof..." really did not answer my question in this case. I am not arguing against the possibility of artificial general intelligence, but I was asking you why you think that your approach to scalability would make your AGI method feasible. I agree that we have to be able to examine the foundations of an (artificial) idea to make a convergence of (artificial) thoughts scalable (in some cases), but I was also saying that the reference to raw sensory data is not generally sufficient for general (artificial) reasoning.
Take a look at what you said in response to my comments:

> You are confusing ontological hierarchy, in which we always start from some arbitrary point, & epistemological hierarchy, in which the brain / civilization of brains *always* starts with analog / un-encoded / uncompressed data. GI is the algorithm of *unsupervised* pattern discovery; supervised education always piggybacks on the former, done by prior generations.

So you are saying that ontological hierarchy always piggybacks on epistemological hierarchy, which (I believe you are saying) is the algorithm of *unsupervised* pattern discovery based on analog uncompressed data. Aren't you effectively saying that ontological hierarchy has to be reduced to raw sensory data, since that has been the basis of your scalability argument?
I wasn't confusing ontological hierarchy with epistemological hierarchy, by the way. The question, which is irrelevant to your presentation but relevant to my effort to understand the substance of your presentation, is whether or not you realized that I hadn't. If it was a mistake that you made, then ok, but if you were misrepresenting my views in order to dismiss my comments, then I would discontinue taking that tack with you, because I have learned that it is almost hopeless to continue with people who do that. In the one case, you simply misunderstood what I was saying; in the other, you will insist that I am the one who does not understand a foundation of what we are talking about in order to avoid dealing with an issue of relative complexity that no one has solved.
Jim Bromer


> We are living proof that in our world effective scalability *is* feasible.
Jim: I was talking about artificial general intelligence.

Boris: That distinction is artificial, it's all algorithms.

Jim: I am interested in discovering the weak points of your theory so that I am better able to understand it. Sorry if that is rude.

Boris: I don't mind "rude", as long as it's interesting. Which it would be if you did address my weak points, but you can't unless you have a stronger alternative. I think the weak points are problems I am currently working on, but I can't explain them if you don't understand those that I already solved.

Jim:
I was asking you why you think that your approach to scalability would make your AGI method feasible. I agree that we have to be able to examine the foundations of an (artificial) idea to make a convergence of (artificial) thoughts scalable (in some cases), but I was also saying that the reference to raw sensory data is not generally sufficient for general (artificial) reasoning.

Boris: You must've missed this:
> I said it's necessary, not sufficient.
> My whole approach is about cognitive economics: I quantify costs & benefits on the lowest level of representation (& consistently translated on incrementally higher levels). That's the basis for predictive search pruning, which is what scalability is all about.

If you agree with this, show me who else is doing it. If there isn't anyone, then I am a frontrunner.

Jim: So you are saying that ontological hierarchy always piggybacks on epistemological hierarchy

Boris: Ontological hierarchy is the external reality; epistemological hierarchy is the sequence in which we discover an infinitesimal subset of that, via iterative application of an *unsupervised* pattern discovery algorithm that *always* starts with analog uncompressed data. If it's not analog, then it's already part of our collective epistemological hierarchy.

Jim: Aren't you effectively saying that ontological hierarchy has to be reduced to raw sensory data since that has been the basis of your scalability argument?

Boris: On the contrary, it's manifested to us via raw sensory data...

Jim Bromer: 

I guess you must mind my being rude since you are not able to appreciate the substance of my criticisms.

Boris: I think it's the other way around :). It should be obvious to both of us that I am a lot ruder than you. What we disagree on is which one of us doesn't appreciate the substance :).

Jim:
> My whole approach is about cognitive economics: I quantify costs & benefits on the lowest level of representation (& consistently translated on incrementally higher levels).
> That's the basis for predictive search pruning, which is what scalability is all about.

Are you saying that you consistently use costs and benefits derived at the lowest level of representation through all incremented higher levels?

Boris: Yes, except that "opportunity cost" of utilized computational resources is a feedback from relatively higher levels. Benefit is projected match: current match also adjusted by such downward feedback.

Jim: And then you are saying that is the basis of pruning searches based on predictions...(of what is being looked for?)

Boris: Yes, inputs are forwarded to higher levels if their additive projected match exceeds the opportunity cost of thus-expanded search. You're maximizing the predictive power of a whole system. I have a lot more details in my intro.

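A minimal sketch of the selection rule described above, under stated assumptions: each input already carries a match value from comparison on its own level, downward feedback from the higher level is reduced to two numbers (an additive adjustment to match and an opportunity cost), and selection is applied per input. The names `Input`, `select_for_higher_level`, `feedback_adjust`, and `opportunity_cost` are hypothetical, not taken from Boris's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Input:
    value: float
    match: float  # match computed by comparison on the current level


def select_for_higher_level(inputs: List[Input], feedback_adjust: float,
                            opportunity_cost: float) -> List[Input]:
    """Hypothetical rule: forward inputs whose projected match exceeds the cost."""
    selected = []
    for inp in inputs:
        # Benefit: current match adjusted by downward feedback (assumed additive).
        projected = inp.match + feedback_adjust
        # Forward only if the benefit exceeds the opportunity cost of expanded search.
        if projected > opportunity_cost:
            selected.append(inp)
    return selected


inputs = [Input(10, 5.0), Input(12, 1.0), Input(11, 3.5)]
chosen = select_for_higher_level(inputs, feedback_adjust=-0.5, opportunity_cost=2.0)
print([i.value for i in chosen])  # [10, 11]
```

Raising `opportunity_cost` (a stronger feedback from the higher level) prunes more inputs, which is the sense in which selection doubles as search pruning.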
Jim Bromer: On the other hand I am interested in conjectures about conceptual vectors and stuff like that

Boris: You can't formalize "conceptual" vectors, except in terms of "conceptual" coordinates.

Jim Bromer: Thanks for the smiley faces Boris...

I disagree that you have to multiply all the vectors in a pattern by a relative distance to a target coordinate in order to combine imagined complex ideas and related observations. Our theories are very different. (On the other hand I am interested in conjectures about conceptual vectors and stuff like that.)
I am interested in a continuation of the explanation of your theories and I hope to get back to it soon.


Jim Bromer: 

Where Boris and I disagree is that I feel that because of relativity the input source of an idea may not be the most elemental source of the idea that needs to be considered.

Boris:
Right, but that's the simplest assumption; you must make it unless you know otherwise. And you only know otherwise if you've discovered a more "elemental" (stable) source on some higher level of search & generalization. That would generate a focusing / motor feedback, always derived from prior feedforward. As I keep saying, complexity must be incremental :).

Jim: One simple example is that we can use our imagination and study of the subject of the concept in order to extend our ideas about the subject beyond those ideas which came directly from observations of it.

Boris:
This is interactive pattern projection, but you have to discover those patterns first. Technically, you simply multiply all the vectors in a pattern by a relative distance to a target coordinate. And then you compare multiple patterns projected to the same coordinate, & multiply the difference by relative strength of each pattern. That gives you a combined prediction, or probability distribution if the patterns are mutually exclusive :).
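The projection step described above can be sketched as follows. This is a hypothetical one-dimensional illustration: it assumes each pattern is summarized by a reference coordinate, a value, a single vector (d(input)/d(coordinate)), and a positive strength such as accumulated match. Projection multiplies the vector by the distance to the target coordinate. Combination moves from one projection toward the next by their difference times that pattern's relative strength, which works out to a strength-weighted mean. Normalized strengths stand in for the probability distribution in the mutually exclusive case. `Pattern`, `project`, `combine`, and `distribution` are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pattern:
    coord: float     # reference coordinate of the pattern
    value: float     # input value at that coordinate
    vector: float    # d(input) / d(coordinate)
    strength: float  # relative strength, e.g. accumulated match (assumed > 0)


def project(p: Pattern, target: float) -> float:
    # Multiply the vector by the relative distance to the target coordinate.
    return p.value + p.vector * (target - p.coord)


def combine(patterns: List[Pattern], target: float) -> float:
    # Running prediction: shift toward each new projection by the difference,
    # scaled by that pattern's share of the accumulated strength.
    prediction = project(patterns[0], target)
    total = patterns[0].strength
    for p in patterns[1:]:
        total += p.strength
        prediction += (project(p, target) - prediction) * p.strength / total
    return prediction


def distribution(patterns: List[Pattern]) -> List[float]:
    # If the patterns are mutually exclusive, normalized strengths act as probabilities.
    total = sum(p.strength for p in patterns)
    return [p.strength / total for p in patterns]


a = Pattern(coord=0.0, value=10.0, vector=2.0, strength=3.0)
b = Pattern(coord=4.0, value=30.0, vector=-1.0, strength=1.0)
print(project(a, 5.0), project(b, 5.0))  # 20.0 29.0
print(combine([a, b], 5.0))              # 22.25
print(distribution([a, b]))              # [0.75, 0.25]
```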
Jim Bromer: I don't understand your comments about detecting patterns. You said:

> This is interactive pattern projection, but you have to discover those
> patterns first. Technically, you simply multiply all the vectors in a
> pattern by a relative distance to a target coordinate. And then you
> compare multiple patterns projected to the same coordinate, & multiply
> the difference by relative strength of each pattern. That gives you a
> combined prediction, or probability distribution if the patterns are
> mutually exclusive.

Boris: That comment was about projecting patterns, not detecting them.

Jim: What kind of patterns are you talking about? How do the elemental observations (from the sensory device) get turned into vectors?

Boris: Comparisons generate derivatives. A vector is d(input) over d(coordinate). Conventionally, it's over multiple coordinates (dimensions), & the input can be a lower coordinate, but that's not essential.
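A minimal one-dimensional sketch of comparison producing derivatives. It assumes inputs arrive as samples at strictly increasing coordinates and that each input is compared only to its predecessor. The vector is d(input) over d(coordinate); extending this to multiple coordinates would mean one such ratio per dimension. `compare_sequence` is a hypothetical name.

```python
from typing import List, Tuple


def compare_sequence(inputs: List[float],
                     coords: List[float]) -> List[Tuple[float, float, float]]:
    """Compare each input to its predecessor; return (d_input, d_coord, vector) per pair."""
    derivatives = []
    for i in range(1, len(inputs)):
        d_input = inputs[i] - inputs[i - 1]
        d_coord = coords[i] - coords[i - 1]  # assumed non-zero (strictly increasing coords)
        derivatives.append((d_input, d_coord, d_input / d_coord))
    return derivatives


print(compare_sequence([3.0, 5.0, 9.0], [0.0, 1.0, 3.0]))
# [(2.0, 1.0, 2.0), (4.0, 2.0, 2.0)]
```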

Jim: Are you saying that the "higher level of search and generalization" are where/how the pattern vectors are created?

Boris: No, all levels.

Jim: Why or how would you pick out a particular target coordinate to use to combine a prediction?

Boris: Well, coordinate resolution is variable, so I am talking about a min->max span. Basically, vector projection is part of input selection for a higher-level search. The target coordinate span is a feedback from that higher level or, if there isn't one, current_search_span * selection_rate (selection_rate: preset lossiness / sparseness of representation on the higher level).
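The fallback rule for the target span can be written out directly. This is a minimal sketch, assuming the higher level's feedback is either a single span value or absent (`None`). `target_span` and `feedback_span` are hypothetical names; `current_search_span` and `selection_rate` follow Boris's wording.

```python
from typing import Optional


def target_span(current_search_span: float, selection_rate: float,
                feedback_span: Optional[float] = None) -> float:
    # Prefer the span fed back from the higher level, if that level exists.
    if feedback_span is not None:
        return feedback_span
    # Otherwise fall back on the preset lossiness / sparseness of the higher level.
    return current_search_span * selection_rate


print(target_span(100.0, 0.25))        # 25.0
print(target_span(100.0, 0.25, 40.0))  # 40.0
```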

Jim: Are you saying that all predictions have individual coordinates?

Boris: Individual coordinate span. It's what * where, you can't have a prediction without both.

Jim: That alone means that they would have to exist in dynamic virtual space of many dimensions. Forcing semantic values into 3-dimensional orthogonal space seems amazingly confused to me.

Boris: You keep confusing source with destination, because you insist on operating within your declarative memory, which is a rather superficial subset of your cognitive model :).
We *derive* all our "semantic" values from 4D-continuous observation, no need to "force" them into it.

Jim: What kind of space would your vectors exist in, how do they get there and why do you choose a particular coordinate for a combination of predictions?

Boris:
As I said, hierarchical search generates incremental syntax, & variables within it are individually evaluated for search on successive levels. The strongest variable, whether it's an original coordinate | modality or a derivative thereof, becomes a coordinate for a higher level. The strength here must be averaged over the higher-level span.
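Here is a hypothetical sketch of choosing the next coordinate. It assumes each input in a span is a mapping from variable name to a (value, strength) pair. The variable with the highest strength averaged over the span is picked, and the span is then reordered along that variable's values. The data layout and the names `Item`, `pick_coordinate`, and `reorder` are assumptions for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple

Item = Dict[str, Tuple[float, float]]  # variable name -> (value, strength)


def pick_coordinate(span: List[Item]) -> str:
    # Strength is averaged over the whole span, not taken from a single input.
    names = list(span[0])
    return max(names, key=lambda n: mean(item[n][1] for item in span))


def reorder(span: List[Item], coord: str) -> List[Item]:
    # The chosen variable becomes the coordinate: inputs are ordered by its value.
    return sorted(span, key=lambda item: item[coord][0])


span = [
    {"x": (0.0, 1.0), "d": (5.0, 4.0)},
    {"x": (1.0, 1.0), "d": (2.0, 3.0)},
    {"x": (2.0, 1.0), "d": (9.0, 5.0)},
]
coord = pick_coordinate(span)
print(coord)                                             # d
print([item["x"][0] for item in reorder(span, coord)])   # [1.0, 0.0, 2.0]
```

In this toy span the derivative `d` is stronger on average than the original coordinate `x`, so it takes over as the coordinate.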

It's hard to explain this on the "semantic" level, which is profoundly confused in humans anyway. But a good intermediate example is the Periodic Table. You take atomic mass (which is a derived, not an original variable) as the top coordinate, compare pH value along that coordinate, & notice recurrent periodicity in its variation. Since pH is a main chemical property, you then use it as a primary dimension that defines a period, & atomic mass becomes a secondary dimension that defines a sequence of periods. Both dimensions are derived; they may seem kind of halfway between original & "semantic", but the same derivation process will get you to the latter.
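The periodic-table example can be mimicked on toy data. This sketch assumes the values are already ordered along the top coordinate, so the list index plays the role of atomic mass. It finds the lag at which the sequence best matches a shifted copy of itself, then folds the sequence into rows: position within a period becomes the primary dimension, and the row index becomes the secondary one. The sequence is made up for illustration, not chemical data, and `find_period` and `fold` are hypothetical names.

```python
from typing import List


def find_period(values: List[float], max_lag: int) -> int:
    """Return the lag (1..max_lag) at which the sequence best matches itself."""
    def mismatch(lag: int) -> float:
        diffs = [abs(values[i] - values[i - lag]) for i in range(lag, len(values))]
        return sum(diffs) / len(diffs)
    # min() keeps the smallest lag among ties, i.e. the fundamental period.
    return min(range(1, max_lag + 1), key=mismatch)


def fold(values: List[float], period: int) -> List[List[float]]:
    """Primary dimension: position within a period; secondary: sequence of periods."""
    return [values[i:i + period] for i in range(0, len(values), period)]


seq = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
p = find_period(seq, max_lag=len(seq) // 2)
print(p)             # 4
print(fold(seq, p))  # [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
```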

Jim Bromer:

> "You keep confusing source with destination, because you insist on
> operating within your declarative memory, which is a rather
> superficial subset of your cognitive model :)."

Are you replying using your theory as a model of the mind (indeed, as a model of my mind!)

Boris: It's not *my* theory; a mainstream position in neuroscience is that the neocortex is a hierarchy of generalization, from primary sensory & motor areas to incrementally higher association areas. It's also well known that declarative memory is restricted to the latter. Besides, these things are tautologically self-evident to me.

Jim: with a smiley face to represent some humor about doing that?

Boris: That mostly represents my satisfaction with making a good point :).

Jim: And, are you saying that declarative memory is a destination in your model rather than a source? Is declarative memory derived? That is what you are saying right?

Boris: Yes, see the above. If you want a mainstream source, read "Cortex & Mind" by Joaquin Fuster, he is a top authority on neocortex.

Jim: Is your theory a theory of how the brain works, a theory for artificial general intelligence using computers or both?

Boris: Both, but the artificial version is a whole lot cleaner, the brain is loaded with evolutionary artifacts. For example, I don't have this artificial distinction between implicit & declarative memory, between sensory & motor hierarchies, & a bunch of other things.

Jim: Do you regularly see the kinds of thinking that people do in the terms of your model?

Boris: Yes, except that "my" part of it is well below the surface (low-level processing), the mainstream part is usually sufficient to qualitatively explain declarative thinking.

Jim Bromer: I am talking about my own theories now, please try to remain calm:

Boris:
It's hard, because you're maddeningly vague :). And you can't help being vague, because you don't get that complexity of representation, & all relevant definitions, must be incremental. You can only be explicit if you start from minimal complexity.

Jim:
I think that it is important to be able to store or to find the data that represents the basis of a concept or grouping of related concepts in order to resolve some issues that will become apparent as the AGI program learns about the concept or as it relates it to other concepts. However, the program will not be able to store all input that it is exposed to, and this basis has to be derived from, or represent, a collection of variations on the primary subject, so for those reasons the concept has to be composed of generalizations and variations.

Boris: Yes, I call those match & miss :).

Jim: Are you thinking of storing representations of all primitives that would be used by your program (raw sensory data) so that comparisons might be later made against some of them?

Boris: That would be buffering; it's optional for inputs that are pruned out (not selected for immediate search). The same cost-benefit analysis applies, but the cost of buffering is a lot lower than that of search. This is done on all levels.
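The same cost-benefit test applied to buffering might look like this minimal sketch. It assumes a single projected-match value per input and two fixed costs, with buffering cheaper than search. `route`, `search_cost`, and `buffer_cost`, along with their values, are hypothetical.

```python
def route(projected_match: float, search_cost: float, buffer_cost: float) -> str:
    """Decide what to do with an input; assumes buffer_cost < search_cost."""
    if projected_match > search_cost:
        return "search"   # worth immediate higher-level search
    if projected_match > buffer_cost:
        return "buffer"   # pruned from search, but cheap enough to keep
    return "discard"      # not worth even the lower buffering cost


for m in (5.0, 1.0, 0.2):
    print(m, route(m, search_cost=3.0, buffer_cost=0.5))
# 5.0 search
# 1.0 buffer
# 0.2 discard
```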

Jim: Or are the compressions going to be taken from generalizations of the variations of sensory data that commonly represent a particular event to be gauged?

Boris: Generalization is compression. There are all kinds of possible variations: syntactic complexity of inputs is incremental, & individual variables are pruned just like multi-variable inputs.
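One way to illustrate "generalization is compression" together with match & miss is to compare adjacent inputs, then merge consecutive comparisons whose match falls on the same side of an average into a single pattern that keeps only counts and sums. The definitions used here (match as the smaller of two comparands, miss as their difference) are assumptions not stated in this thread, as are the names `compare`, `form_patterns`, `Pattern`, and `ave`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pattern:
    positive: bool    # whether match in this run exceeds the average
    count: int = 0    # number of comparisons merged into the pattern
    match: float = 0.0
    miss: float = 0.0


def compare(a: float, b: float) -> Tuple[float, float]:
    # Assumed definitions: match = the smaller comparand, miss = the difference.
    return min(a, b), b - a


def form_patterns(inputs: List[float], ave: float) -> List[Pattern]:
    """Merge consecutive comparisons whose match is on the same side of `ave`."""
    patterns: List[Pattern] = []
    for a, b in zip(inputs, inputs[1:]):
        match, miss = compare(a, b)
        positive = match > ave
        if not patterns or patterns[-1].positive != positive:
            patterns.append(Pattern(positive))
        current = patterns[-1]
        current.count += 1
        current.match += match
        current.miss += miss
    return patterns


patterns = form_patterns([10, 11, 10, 2, 1, 2], ave=5)
print([(p.positive, p.count, p.match, p.miss) for p in patterns])
# [(True, 2, 20.0, 0.0), (False, 3, 4.0, -8.0)]
```

Five comparisons reduce to two patterns here. The summed variables are a lossy, compressed representation of the span.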

Jim: Are comparisons going to be made against partial decompressions of previously compressed representations?

Boris: This would be a comparison to feedback, & that's only cost-efficient if the feedback is aggregated over all inputs of higher-level search span. I call it evaluation for elevation, rather than comparison. All that & a lot more is in my intro.

Jim: You don't have to continue if you don't want to, however, I am curious about what you are talking about.

Boris: I'd love to continue, as long as we're talking substance.


