This post contains old comments on the knol.
Below that is ancient history, in case you want to see how things progressed.
Derek Zahn:
Understanding another human being's thoughts is hard. :)
Hi Boris,
Sorry for the delay... I wrote a long time ago something to the effect that I like to try understanding the ideas of other researchers working on AGI-related theories (at least those that seem to have some hope of being interesting) and wanted to try and understand yours. I have returned to your pages once in a while but have great difficulty even starting to try and get a grip on what you are writing about. Part of the blame for that is the difficulty of the subject matter, part is that I'm just not very smart, but mostly (and frustratingly) it is simply very hard for human beings to communicate with each other -- when reading, we have to fill in so much from our own viewpoint and experience, and that is a very error-prone process. So, although I'm afraid that my questions will be stupid and nitpicky and possibly a waste of time to answer, they are the only way for me to figure out how to interpret what you are saying. On the plus side, maybe any clarifications you make for me would be useful for other readers as well.
Although general motivations, and criticisms of other AI approaches can be fun, I'm going to ignore that stuff unless it becomes critical for my main purpose, which is understanding your theory in its current state.
One way to facilitate communication is to develop a concrete frame of reference as a starting point. So: although I imagine your theory is intended to be very general in nature (and thus applicable to a variety of agents and environments), it is helpful for me to pick a particular case, so that general points can be applied to this concrete situation... very general abstract theories are almost impossible to communicate from one person to another because there are so many possible interpretations of language; having a concrete situation as a reference will help me fill in some meaning.
So: Suppose I have a robot roaming around my neighborhood. It has one sense modality: a black-and-white video camera affixed to the front of the robot. At fixed intervals (say 30 times per second but the exact rate isn't important), a video frame gets digitized and handed to the "intelligence" program implementing your theory. Although it won't be needed for a while :) suppose that the robot has tank tracks for its drive and a signed output signal controls the speed of each side track.
Can we use this system as a concrete reference? Is it missing something needed for your theory to apply to it?
Assuming it's okay... from your description, I understand that you save a history of past input frames, indexed by their offset into the past. You also compute the derivative between two successive frames on a per-pixel basis using the numerical difference between pixel values at each point.
The goal is to predict the pixel values in the next input frame.
Ok, let me stop there to make sure we are on the same page. Comments? If you don't have time to mess with what is likely to be a bunch of incomprehension on my part, I understand.... in that case, just don't respond to my comment. :)
Take care,
Derek
http://supermodelling.net
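(A minimal sketch of the setup Derek describes above: a rolling history of frames plus a per-pixel derivative between successive frames. The NumPy arrays, the history depth, and the function names are assumptions of this illustration, not anything from the knol.)

```python
import numpy as np
from collections import deque

# Rolling history of past frames, indexed by offset into the past:
# history[0] is the most recent frame, history[1] the one before it, etc.
history = deque(maxlen=300)

def frame_derivative(prev_frame: np.ndarray, next_frame: np.ndarray) -> np.ndarray:
    """Per-pixel derivative between two successive grayscale frames:
    the signed numerical difference between pixel values at each point."""
    return next_frame.astype(np.int16) - prev_frame.astype(np.int16)

def on_new_frame(frame: np.ndarray):
    """Store the incoming frame and return its derivative vs. the previous one
    (None for the very first frame)."""
    derivative = frame_derivative(history[0], frame) if history else None
    history.appendleft(frame)
    return derivative
```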
Sorry for being difficult, Derek!
The problem is, to be on the same page we have to be on the same level of generalization, i.e. decontextualization. You're asking for concrete examples. While it is (theoretically) possible to explain how my algorithm will act in simple cases, such examples will not impress you. You'll need to understand why I think it can scale to complex cases, & that reasoning is necessarily *abstract*. But, for some mysterious reason, you do find my approach interesting, so I'll try:
> video frame gets digitized and handed to the "intelligence" program implementing your theory.
Actually, my theory *includes* digitization as the first step of compression, which maximizes correspondence_per_cost: my overall fitness function. This is important because these steps form a pattern, which must be indefinitely projectable, for the "program" to scale in complexity of such algorithms.
> I understand that you save a history of past input frames, indexed by their offset into the past. You also compute the derivative between two successive frames on a per-pixel basis using the numerical difference between pixel values at each point.
The goal is to predict the pixel values in the next input frame.
Given a non-random environment, every input *is* a prediction for subsequent inputs (no prediction is 100% certain anyway). These inputs have incremental dimensionality: from 0D pixels, to 1D patterns: sequences of matching pixels, to 2D patterns: sequences of matching 1D patterns, & then to 3D, TD, & discontinuously matching patterns.
This is an indefinitely extensible hierarchy, where older inputs (history) are selectively stored (patterns vs. noise) & searched on higher levels. Each higher-level pattern is a "prediction" for lower-level inputs. 2D frames have no special status in my approach.
Notice that I start by defining match. Then I define a pattern as a set of matching inputs, & derivatives are computed by comparing among individually selected (stronger than average) patterns within corresponding level of search. This is not an indiscriminate all-to-all indexing, that would be a transform. These derivatives then form vectors to project their patterns (further refining their predictive value), & to form their own patterns. All of that is selective (according to predictive values of each variable), otherwise you get a combinatorial explosion.
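(To make the 1D step above concrete, here is a minimal sketch, assuming match is the smaller of two compared pixels, the difference is kept as a derivative, and a pattern is a run of comparisons with the same sign of match minus an average ave; the field names M, D, L are illustrative, not the knol's definitions.)

```python
def form_1D_patterns(pixels, ave):
    """Compare laterally adjacent pixels; a pattern is a run of comparisons
    whose sign of (match - ave) stays the same. Each pattern accumulates
    summed match M, summed difference D, and span L (number of comparisons)."""
    patterns, current = [], None
    for prev, curr in zip(pixels, pixels[1:]):
        match = min(prev, curr)     # shared magnitude taken as match
        diff = curr - prev          # signed miss, kept as a derivative
        sign = match - ave > 0      # above-average match?
        if current and current["sign"] == sign:
            current["M"] += match
            current["D"] += diff
            current["L"] += 1
        else:
            if current:
                patterns.append(current)
            current = {"sign": sign, "M": match, "D": diff, "L": 1}
    if current:
        patterns.append(current)
    return patterns

# e.g. form_1D_patterns([10, 12, 11, 3, 2, 15], ave=8)
```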
Hi Boris,
You're right that we have to be on the same level of decontextualization; I was hoping to drag you down to my level :) because if we refer to concrete things (like that robot) there is less room for misunderstanding. If I generalize into abstractions I won't end up the same place as you because my abstractions aren't the same as yours... and the result is that I don't know what the words you use are supposed to mean.
I don't care about "impressiveness" on simple examples, just clarity.
I'll try to climb into the clouds, but it will probably take a while. :) So, a few questions to start with:
> correspondence_per_cost: my overall fitness function.
Correspondence of what? Measured how? What does "cost" mean and how is it measured?
> every input *is* a prediction for subsequent inputs (no prediction is 100% certain anyway).
> These inputs have incremental dimensionality: from 0D pixels, to 1D patterns: sequences
> of matching pixels, to 2D patterns: sequences of matching 1D patterns [...]
I certainly get that an input *can be used as* a prediction for subsequent inputs (by an entity whose goal is prediction, for example -- with a prediction algorithm), and for some inputs in some environments (like the robot example) there will be a correlation between in(t) and in(t+1). Other kinds of "inputs" (say... the value of an audio sensor in a Las Vegas casino sampled every 41 hours) may not have any discernible correlation at all. But I don't think it's right to say that an input *is* a prediction, which is a confusing conflation of terms.
I don't understand what you mean by "inputs have incremental dimensionality". Incremented by who? How did "1D patterns: sequences of matching pixels" become an "input"? You say "pixels" which implies a visual semantics for an input...
Maybe these questions illustrate the confusion I experience when I even begin to try and understand what it is you are talking about...
Thanks!
Derek
http://supermodelling.net
> If I generalize into abstractions I won't end up the same place as you because my abstractions aren't the same as yours... and the result is that I don't know what the words you use are supposed to mean.
My meanings are the most basic (decontextualized) possible, you *will* end in the same place if you just let go of your context (scary, I know). We all work off the same innate “algorithm”. If our generalizations don’t agree, then either we’re on different levels, or the level is too low for both of us.
> I don’t care about "impressiveness" on simple examples, just clarity.
But there must be a reason for you to *work* on understanding me, rather than a bunch of other things.
> Correspondence of what? Measured how? What does "cost" mean and how is it measured?
See section 1: definition of match, then of incrementally derived projected match. Cost (memory + operations) is initially the same for a basic comparison, so you normalize for it by subtracting average match from the prior search cycle: # comparisons. I’ve tried to explain all this in the knol, let me know what part is unclear. Beyond the first cycle, the cost is multiplied by additional # & power of comparisons, represented in the resulting patterns.
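(Read naively, the normalization described here reduces to a one-line criterion; the names below are assumptions of this sketch, not the knol's terms.)

```python
def normalized_match(match, prior_cycle_matches):
    """Normalize match for cost: subtract the average match of the prior
    search cycle, i.e. what the same comparison resources earned on average.
    Only comparisons with a positive result justify extending the search."""
    ave = sum(prior_cycle_matches) / len(prior_cycle_matches)
    return match - ave
```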
> I certainly get that an input *can be used as* a prediction for subsequent inputs…
It’s more basic than that, *any* prediction must be derived from past inputs. But these inputs have varying “predictive value”, both overall & for specific sources: lower-level locations. Patterns are inputs for higher levels, each representing multiple matching lower-level inputs. I try to quantify all of that.
> I don't understand what you mean by "inputs have incremental dimensionality". Incremented by who?
By comparing lower D patterns across higher-D coordinate, on a higher level of search.
> How did "1D patterns: sequences of matching pixels" become an "input"?
This is a hierarchy Derek, above-average lower-level patterns *are* higher-level inputs.
> You say "pixels" which implies a visual semantics for an input...
That's simply a visual version of maximal resolution 0D input, there is an equivalent in any modality.
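(A schematic of the hierarchy point, "above-average lower-level patterns *are* higher-level inputs", assuming each pattern carries a single summed predictive value; everything else about the per-level comparison is omitted.)

```python
from dataclasses import dataclass, field

@dataclass
class Pattern:
    value: float                                   # summed match: the pattern's predictive value
    elements: list = field(default_factory=list)   # lower-level inputs it represents

def higher_level_inputs(patterns, ave):
    """Selection between levels: only above-average patterns are passed up
    as inputs to the next level of search; the rest stay at their level of
    origin (or are stored at reduced resolution)."""
    return [p for p in patterns if p.value > ave]
```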
Hi Boris,
I'm interested in understanding you because I am curious about all serious detailed theories of intelligence. There are many different approaches to this, and I'm interested in any that have significant amounts of precision or detail and seem intuitively plausible (as opposed to shallow, fundamentally incoherent, inconsistent, or simply insane). The trick is understanding them. It would be relatively easy to convince myself that I understand you at a rough overview level... but such characterization just feeds my ego, it doesn't (usually) increase my actual knowledge or insight.
In an approach like yours, I am most interested in a few interrelated particulars (in as much detail as I can manage): the "language" that is used to express patterns at each level of abstraction (as a combination of inputs, or more), the specific way that temporal relationships are incorporated into patterns, and the method used to individuate patterns as learned entities. I am fairly certain that you believe you have explained all these things in your knols, but I have not yet succeeded in extracting this information from your text. I also believe that other people bounce off of your writing for similar reasons. You say that your meanings are the most basic (decontextualized) possible, but natural language doesn't work that way, and in fact the meanings of what you write are largely embedded in your own context; failure to recognize this is what causes incomprehensibility. Although we all share a lot of cultural context, we are islands in many ways, and we have to build stepping stones to cross the deep and murky inferential gaps.
I think I will try to take into account everything you have said in this conversation, along with your conversation with Ben on the AGI list, and start over from the beginning. I'll return after I have bashed away at that for a while. If you care to say anything more about the things above that I mentioned as particularly interesting, that would be cool.
Thanks for taking the time to answer my questions, and I wish further success for you as you continue to develop your ideas!
Derek Zahn
http://supermodelling.net
> You say that your meanings are the most basic (decontextualized) possible, but natural language doesn't work that way, and in fact the meanings of what you write are largely embedded in your own context; failure to recognize this is what causes incomprehensibility.
Maybe you can point out my biases, I promise to exterminate them without mercy :).
> I think I will try to take into account everything you have said in this conversation, along with your conversation with Ben on the AGI list,
That definitely turned you off :).
Todor Arnaudov:
Derek> I am fairly certain that you believe you have explained all these things in your knols, but I have not yet succeeded in extracting this information from your text. I also believe that other people bounce off of your writing for similar reasons.
I guess this is a deliberate filter - too much explicit explanation may make it seem too easy and obvious.
Derek: I am most interested in a few interrelated particulars (in as much detail as I can manage): the "language" that is used to express patterns at each level of abstraction (as a combination of inputs, or more), the specific way that temporal relationships are incorporated into patterns, and the method used to individuate patterns as learned entities. I am fairly certain that you believe you have explained all these things in your knols...
Boris: Yes I did, the "language" (I prefer "syntax") is simply a record of past operations, assigned to the data they produced. I tried to explain the initial set of such operations, & general principles that drive the expansion of this set.
Todor: I guess this is a deliberate filter...
Boris: it's partly deliberate in a sense that examples may mislead people into thinking that they understand the generalization, while in fact they only understand the examples. But mostly it's because creative writing is not my top priority, - I have work to do. And this is an exceptional problem, so most people *should* "bounce off".
Todor Arnaudov:
Higher match within derivatives in a pattern, than between templates and lower level output:
Boris: "I
won’t get into details here, but a higher level of feedback should suppress
empirical data entirely, & select only the operations that process it. That
would result in purely algebraic equations, “compared” to achieve mathematical
compression. We can expect that better math will facilitate future discovery of
empirical patterns, but at the cost of reduced correspondence of current memory
contents."
Todor: This may be another phenomenon, or not elaborated enough, or wrong, but I came up with it after reading this paragraph.
(1) Higher-level patterns get complex: long, carrying lots of derivatives and heavy operations.
(2) Comparison gets more expensive than the predictive benefits. In the past the level may have been more predictive, but if it gets expensive to support, the level can be either optimized or lost to free resources. While optimizing, the higher level suppresses the lower, because it no longer expects benefits from new lower-level input.
(3) The derivatives in long patterns (in patterns of any length) can turn into local coordinate spaces of their own. A hierarchy over the derivatives is initiated, as if they (or selected parts of them) were raw sensory inputs: one derivative becomes "x", another "y", another "iB", etc. Longer patterns are more likely to have linear dependencies and other correlations, and patterns within patterns will be discovered.
(4) The process of (3) can start if high matches - within the patterns themselves, or between templates at the same level (let's call them InternalMatch) - are discovered. I suspect this happens when this InternalMatch is higher than the match between this level and the output from the lower level.
(5) In brief, if at some point it gets more predictive and cheaper to do the algebra using the already collected higher-level derivatives than to compute and store new high-level derivatives from the lower-level derivatives (input), then do the algebra and stop accumulating more "junk" data.
> Comparison gets more expensive than the predictive benefits. In the past the level may have been more predictive, but if it gets expensive to support it, the level can be either optimized or lost to free resources.
You know I don’t like meaningless words like “optimized”. If a variable or a whole pattern becomes less predictive than the average per resources used, then it simply loses resolution: lower bits of value &| of coordinate (through aggregation across them).
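(One way to picture the loss of resolution described here, as a sketch rather than the knol's actual procedure: drop low-order bits of a value, and aggregate adjacent values to coarsen the coordinate.)

```python
def reduce_value_resolution(values, bits=2):
    """Drop the lowest `bits` bits of each value: cheaper to store and
    compare, at the cost of precision."""
    return [v >> bits for v in values]

def reduce_coordinate_resolution(values, span=2):
    """Aggregate every `span` adjacent values into one, so the coordinate
    axis itself loses resolution."""
    return [sum(values[i:i + span]) for i in range(0, len(values), span)]
```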
> While optimizing, higher level suppress lower because now it doesn't expect benefits from the new lower level input.
That’s what any feedback is for.
> The derivatives in long patterns (in any length patterns) can turn into local coordinate spaces on their own...
These sub-coordinates are formed | incremented with every new type of derivative. It’s not an optional process, you need them for selective access. Comparing across these “syntactic coordinates” is how you get higher powers of comparison (by division: iterative comparison between difference & match, etc.), dimensional proportions, & so on. But you’re right, I should make it more explicit.
> between templates at the same level (let's call them InternalMatch) - are discovered.
Between templates is not “internal”. You don’t compare across external & across syntactic coordinates at the same time, - that’s not incremental in complexity.
> I suspect - when this InternalMatch is higher than the match between this level and the output from the lower level.
Actually, comparison across syntax is done after evaluation before output, initially if its (across-level projected match) * (internal syntactic span) = *above average*. That means it’ll search on higher level, *&* is likely to be compressed by intra-comparison, which makes the search easier. It’s only after such intra-comparison that you can project & prioritize internal match independently from the external kind.
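(The trigger stated here reads as a simple product test; written out with assumed names:)

```python
def intra_comparison_worthwhile(projected_match, syntactic_span, ave):
    """Comparison across a pattern's own syntax (its internal variables) is
    tried when the across-level projected match, scaled by the internal
    syntactic span, is above average: such a pattern will be searched on a
    higher level anyway, and is likely to be compressed by intra-comparison."""
    return projected_match * syntactic_span > ave
```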
> In brief, if it once gets more predictive and cheaper to do the algebra using the already collected higher level derivatives, than to compute and store new high level derivatives from the lower level derivatives (input), then do the algebra and stop accumulating more "junk" data.
None of the above is about algebra. Internally or externally, you’re still comparing data, not operations. Comparing operations means comparing syntactic coordinates themselves, that’s what they stand for. (Can you think of initial types of such comparison?).
“Algebra” by itself is not predictive, it only gives you shorter equations to compute predictions from future data. It’s still all about data in the end, but math lets you be more selective in collecting it.
> You know I don't like meaningless words like "optimized". If a variable or a whole pattern becomes less predictive than the average per resources used, then it simply loses resolution: lower bits of value &| of coordinate (through aggregation across them).
OK, I know about lowering the resolution to increase match. "Optimize" here means: to make comparison of the same derivatives, at the same level, cheaper by finding correlations within the level's data and between derivatives in a pattern.
>> The derivatives in long patterns (in any length patterns) can turn into local coordinate spaces on their own...
>These sub-coordinates are formed | incremented with every new type of derivative.
>It’s not an optional process, you need them for selective access. Comparing across these “syntactic coordinates” is how you get higher powers of comparison (by division: iterative comparison between difference & match, etc.), dimensional proportions, & so on. But you’re right, I should make it more explicit.
OK, so that's when it's done (from the knol): "the power of comparison is increased if current match-per-costs predicts further improvement, as determined by “secondary” comparison of results from different powers of comparison, which forms algorithms or metapatterns."
>> While optimizing, higher level suppress lower because now it doesn't expect benefits from the new lower level input.
>That’s what any feedback is for.
I mean after a reliable formula is inferred, giving sufficiently high match/prediction, new lower level samples to improve prediction are not necessary.
>“Algebra” by itself is not predictive, it only gives you shorter equations to compute predictions from future data. It’s still all about data in the end, but math lets you be more selective in collecting it. (...)
>None of the above is about algebra. Internally or externally, you’re still comparing data, not operations. Comparing operations means comparing syntactic coordinates themselves, that’s what >they stand for. (Can you think of initial types of such comparison?).
Not yet. However, it seems there are not many combinations. There is position within the internal variables, and levels in the sub-coordinate hierarchy of that position; the basic comparison operations are just a few; and iteration is supposed to be repetition until a given match/miss is achieved (above/below average or so).
> OK, so that's when it's done (from the knol): "the power of comparison is increased if current match-per-costs predicts further improvement, as determined by "secondary" comparison of results from different powers of comparison, which forms algorithms or metapatterns."
That's only the first step: a comparison across adjacent derivation orders (syntactic coordinates). Beyond that are comparisons across syntactic discontinuity, such as between lengths of different dimensions within a pattern, & so on. I'll make a separate chapter on syntax in the next edit, coming soon. That'll include the "algebra" part, it really doesn't belong in the feedback section.
> I mean after a reliable formula is inferred, giving sufficiently high match/prediction, new lower level samples to improve prediction are not necessary.
I think you mean reliable *pattern*, algebraic formulas are not predictive per se. In that case, *local* sampling is suppressed by expectations, but in favor of more distant sampling. I covered that in the section on feedback: “Downward suppression of locations with expected inputs will result in a preference for exploration & discovery of new patterns, vs. confirmation of the old ones”.
> Not yet. However it seems there are not many combinations. There are position within the internal variables, levels in the sub-coordinate hierarchy of this position; basic comparison operations are just a few; iteration is supposed to be repetition until given match/miss is achieved (above/below average or so)
There is an infinite number of potential combinations, the trick is to explore them incrementally. Re iteration, it continues till match/cost is exhausted, not achieved.
Andrey Panin:
Boris, interesting perspective. Creating general AI is a very addictive problem, I have to say, one that fools many into thinking that its solution is just around the corner; alas, all existing approaches lead to a dead end. I share your hope that there are structural reasons for that, such as either real-world constraints that force those working on it to be practical in the short term, leading them to specialized solutions, or a lack of knowledge in those for whom this is just a hobby. I am curious if you have made any progress in the years since you posted this knol? Also I am curious to know if you discounted connectionist approaches (for anything other than perception) in favor of an algorithmic/symbolic approach, or do you think it's a false dichotomy?
My personal feeling is that the solution will be in the form of a NN, because I haven't seen anything else come conceptually close to linking what at first seem like completely unrelated pieces of information.
Thanks!
My knol is continuously updated, last time only a month ago. I am making "theoretical" progress, - simulation would be pointless since I refine the algorithm almost daily. What is it that you find interesting, &| unclear? I make no hard distinction between perception & "conceptual" levels, it's just a degree of generalization. The connectionist approach is not analytical enough, I think on the level of algorithms: nodes, not networks. Also, as I mentioned in the knol, it's not incremental enough, thus not scalable. I add one dimension at a time, starting from 0D, NNs start from 2D.
I am interested in how far from completion you think your algorithm is. Is it close enough to try it out? Because I am sure you know that no matter how nice a theory is, unless it's tested you can never be sure of what you have.
When you say NN is not scalable I hope you mean current implementations, but in theory it's the most scalable thing known as of now, by its concept - since our brain is but one version of it. Regarding dimension, not sure I see the limitation. A network of 1 node is 0D, isn't it?
What attracts me about NN is the concept of emergence of complexity out of simple units, which seems to be the underlying force in nature. To me, over-relying on an analytical approach is too brave a step, since it's basically saying: we will find an alternative way to recreate intelligence other than the path we already know leads to one. To me it's like trying to understand the intricacies of an anthill through architectural focus rather than by generalizing from a unit of interaction between one ant and another ant.
With regard to solving the intelligence issue, one of the issues I find most challenging (apart from infinitely many others) is this: how to implement an inner drive (aka motivation) that gives rise to switching and focusing attention? Because without it, an intelligent system would either be completely sensory-driven (without an internal importance filter), or it would engage in infinite pattern searching of one random (and possibly completely useless) problem. How do you address this problem?
> I am sure you know that no matter how nice a theory is, unless it's tested you can never be sure of what you have.
If you're not interested in a theory, you're talking to a wrong guy. I don't care for blind tinkering.
> When you say NN is not scalable I hope you mean current implementations, but in theory it's the most scalable thing known as of now, by its concept - since our brain is but one version of it.
It's a matter of interpretation. ANNs have very little to do with real neurons / columns, to understand the latter you should be a neuroscientist. That's a legitimate route, guess I am too "brave" for that.
> Regarding dimension, not sure I see the limitation. A network of 1 node is 0D, isn't it?
The limitation is inefficiency. Adding 1 dimension per level of search lets you select only the lower-D patterns that are strong enough to carry the overhead of additional coordinates. Without incremental selection you hit combinatorial explosion. Predictions are vectors, you can't have them without explicit coordinates.
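(A hedged reading of the cost argument, with the overhead term invented for illustration: a lower-D pattern is carried into the next dimension only if its match covers the cost of the extra coordinate it must then represent.)

```python
def promote_to_next_dimension(pattern_match, ave_match, coord_overhead):
    """Incremental dimensionality as selection: a lower-D pattern is carried
    into the next dimension only if its above-average match covers the cost
    of the extra coordinate it must then represent. Without this filter every
    pattern would be re-represented and re-compared with full higher-D
    coordinates, which is the combinatorial explosion referred to above."""
    return pattern_match - ave_match > coord_overhead
```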
> To me it's like trying to understand the intricacies of an anthill through architectural focus rather then by generalizing from a unit of interaction between one ant and another ant.
My approach *is* bottom-up, I start from pixels, you can't get any lower than that. But I do so using criteria derived from my definition of intelligence, without one you're flying blind.
> How to implement an inner drive (aka motivation) - that gives rise to switching and focusing attention, because without it an intelligent system would either be completely sensory driven (without internal importance filter) or it would engage in infinite pattern searching of one random (and could be completely useless) problem. How do you address this problem?
The only drive I care about is curiosity, - a cortical instinct. It's implemented by introducing a universal selection criterion, - predictive power. I am perfectly fine with "sensory-driven", the rest is either gross physiology or acquired through conditioning.
Your definition of intelligence gives too broad a range to be really useful as a discriminatory tool. An animal that hunts predicts and plans, a computer playing chess predicts and plans, a 2-year-old child predicts and plans, a semi-retarded person predicts and plans, etc. What matters is how wide the scope of prediction is and how good the planning needs to be, to be considered a success in achieving AGI. Currently success is very incremental, which brings about a moving target in terms of what would and would not be considered AGI. I would be very curious if someone would actually discover suitable intelligence criteria.
A computerized version of a neuron is first and foremost a conceptualized version - the reason I think building algorithmic AGI is braver than building a NN AGI is that the order of complexity is very different. It's much easier to concentrate on a small "unintelligent" building block (i.e. a neuron) which, once conceptualized correctly, will lead to intelligence, vs. trying to reconstruct intelligence directly, wouldn't you agree?
Regarding pixels and 0 dimensions - the brain is powerful enough that even being blind and deaf, and only having access to touch, which is a very crude input source, it is still able to acquire a picture of the world. So I am not convinced that dealing with 0 dimensions is really required for true intelligence.
Curiosity is an interesting criterion, but I don't think it is sufficient - imagine an autistic person staring for hours or days at a flame because he/she is curious to find a pattern in it and predict the way the flame will look, pixel by pixel, a few minutes later. Something inside us tells us "that's not important - move on". There is no way around the necessity of an internal selection criterion that says what's important and what's not, don't you think?
> Your definition of intelligence gives too broad a range to be really useful as a discriminatory tool. An animal that hunts predicts and plans, a computer playing chess predicts and plans, a 2-year-old child predicts and plans, a semi-retarded person predicts and plans, etc. What matters is how wide the scope of prediction is and how good the planning needs to be, to be considered a success in achieving AGI.
Precisely, intelligence is a matter of degree, & I am suggesting a way to quantify & maximize it. What are you arguing against?
> I would be very curious if someone would actually discover a suitable intelligence criteria.
"Suitable" is a two-way street.
> It's much easier to concentrate on a small "unintelligent" building block (i.e neuron) which, once conceptualized correctly will lead to intelligence, vs trying to reconstruct intelligence directly, wouldn't you agree?
I don't think it is conceptualized correctly, otherwise we'd have intelligent computers running around. You don't know what's easier till you've done it, Markram now wants ~1B$ & 10 years to get "close" doing it. What I do know is that there are ~250K neuroscientists beating around the bushes, & 1 of me making good progress theoretically. It takes guts to do AGI.
> Regarding pixels and 0 dimension - the brain is powerful enough that even being blind and deaf - and only having access to touch, which is a very crude input source - it is still able to acquire a picture of the world. So I am not convinced that dealing with 0 dimension is really required for true intelligence.
Pixels is just an example of 0D processing, any sense would do, though not as well as vision.
> Curiosity is an interesting criterion, but I don't think it is sufficient - imagine an autistic person staring for hours/days at a flame because he/she is curious to find a pattern to it and predict the way the flame will look pix by pix a few minutes later.
Curiosity is a motive, in psych. terms, a criterion is predictive power. You need to expand your scope of experience to maximize it, though the specific scope vs. precision trade-off depends on the noise in the inputs, & on the subject's time horizon. It's the same for autistics, they just put relatively greater value on precision.
> Precisely, intelligence is a matter of degree, & I am suggesting a way to quantify & maximize it. What are you arguing against?
I guess I missed/misunderstood the part where you quantified it, do you mind restating, for my benefit, how you quantify it? That would answer my intelligence test question as well.
> 1 of me making good progress theoretically
hence my question about how far from completion are you (as defined by your own min intelligence test) - do you have all necessary components in place (albeit in an unrefined state) or are there some that you still have to solve?
> You need expand your scope of experience to maximize it, though specific scope vs. precision trade-off depends on the noise in the inputs, & on subject's time horizon
I disagree that scope vs precision depends on inputs only. It has to depend in large part on internally set goals/values. Inputs don't assign values - with no values trying to understand the complexity of a dust mite is the equivalent to understanding how to solve humanity's garbage crisis. I think unless we give an internal drive/goal/value criteria - intelligence produced by us will be a) useless b) will be difficult to test since it may appear autistic to all.
> I guess I missed/misunderstood the part where you quantified it, do you mind restating, for my benefit, how you quantify it? That would answer my intelligence test question as well.
This whole knol is about that. 2nd paragraph: "the criterion must be predictive correspondence of recorded inputs..., - their cumulative match to future inputs".
I then quantified match on a single-variable level, later relative & unique match (2nd section), then introduced projected match (vs. contrast) & additive projection (vs. confirmation) in the 3rd section.
More abstract forms of correspondence (cumulative match) are defined by an incrementally complex algorithm, but allow for greater scope * precision of prediction.
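To make the single-variable case concrete, here is a minimal sketch of one way such a "match" between two scalar inputs could be scored. The min-based overlap, the residual difference, and the normalized (relative) match below are assumptions for illustration only, not the knol's own definitions:

    # Toy sketch of a single-variable "match" between two scalar inputs.
    # The min-based overlap and the names are illustrative assumptions,
    # not the knol's definitions.
    def compare(a, b):
        match = min(abs(a), abs(b))                       # shared magnitude
        difference = b - a                                # residual miss
        relative_match = match / max(abs(a), abs(b), 1)   # normalized overlap
        return match, difference, relative_match

    print(compare(5, 6))   # (5, 1, 0.833...)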
> hence my question about how far from completion are you...
Completion is when the algorithm can self-improve (add efficient complexity) through computer simulation faster than I can improve it theoretically. That depends largely on the basal complexity of the algorithm, & I don't feel it's complex enough yet. I have several levels in mind that don't quite fit the already established pattern; once I have a better pattern (of increasing complexity) it should scale better.
> I disagree that scope vs precision depends on inputs only.
I didn't say that it does.
> It has to depend in large part on internally set goals/values. Inputs don't assign values - with no values trying to understand the complexity of a dust mite is the equivalent to understanding how to solve humanity's garbage crisis. I think unless we give an internal drive/goal/value criteria - intelligence produced by us will be a) useless b) will be difficult to test since it may appear autistic to all.
That kind of loose talk kept philosophers busy for millennia. To be constructive you need to work bottom-up.
> Completion is when the algorithm can self-improve
you know, I am seeing it very often among AGI thinkers - what I think is a conflation of two independent problems. It's hard enough to build intelligence, but to merge it with the even harder problem of building the kind of intelligence that improves itself is, I think, an indication of not understanding the problem in the first place.
>That kind of loose talk kept philosophers busy for millennia. To be constructive you need to work bottom-up.
that's not a serious answer. I don't know anything about what kept philosophers busy - I don't study philosophy - but in building AGI I did run into the problem of needing an ability to shift focus. Selecting for predictiveness is not a sufficient criterion, because watching a movie a second time increases predictiveness for the next 2 hours - maximizing predictiveness forces us to keep watching the movie - but it takes something else to shift focus away. Everything you said thus far makes me think that you don't recognize this as a problem. I think that's something you will have to deal with when you actually try to run your program, if you ever get to that point. I think you are falling for the same fallacy as Jeff Hawkins does in his book, that is, assuming that intelligence can be passive, i.e. input dictates output, when intelligence has to be active and even proactive.
I already answered re shifting: predictive power = scope * precision, you need to increase both. And beyond that, I explained in the knol why you need discontinuous shifting, 3rd section, 4th paragraph:
"The next level of selection by feedback results in a preference for exploration over confirmation: we skip over too predictable sources / locations, thereby *reducing* match of new inputs to older templates. This doesn’t select for either proximity or contrast, & seems to contradict my premise. However, exploration should increase *projected* correspondence, which is a higher-level criterion than concurrently reduced *confirmed* correspondence."
Every issue you raised is addressed in the knol. You simply don't seem to care for theoretical understanding, & I don't care for tinkering. Too bad.
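As a rough illustration of the "skip over too predictable sources / locations" idea in the quoted paragraph, here is a toy selection rule; the threshold, the scoring, and the names are all assumptions made up for this sketch, not the knol's algorithm:

    # Toy illustration of exploration over confirmation: ignore sources whose
    # older templates already predict new inputs too well, and attend to the
    # least-predictable remaining source. Threshold and rule are assumptions.
    def select_source(sources, skip_threshold=0.9):
        # sources: dict of name -> recent confirmed match in [0, 1]
        explorable = {s: m for s, m in sources.items() if m < skip_threshold}
        if not explorable:                      # everything is too predictable
            return None
        return min(explorable, key=explorable.get)

    print(select_source({"wall": 0.99, "window": 0.6, "street": 0.3}))  # street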
> you know, I am seeing it very often among AGI thinkers - what I think is a conflation of two independent problems. It's hard enough to build intelligence, but to merge it with the even harder problem of building the kind of intelligence that improves itself is, I think, an indication of not understanding the problem in the first place.
It's not a different problem, - learning (increasing predictive correspondence) *is* self-improvement. And there should be no hard distinction between learning data & learning code, - both are driven by the same criterion, or fitness function. But it is a common fallacy to see intelligence as a fixed object.
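As a toy illustration of that point, the same criterion can rank both stored data (templates) and candidate code (predictor functions); the scoring rule and the candidates below are made up for this sketch, not taken from the knol:

    # Toy illustration of "one criterion for data and code": predictive match
    # (here just negative absolute error) ranks stored templates and candidate
    # predictor functions alike. Entirely illustrative assumptions.
    def score(prediction, actual):
        return -abs(actual - prediction)          # higher = better prediction

    history, nxt = [3, 5, 7, 9], 11

    templates = {"last": history[-1], "mean": sum(history) / len(history)}
    predictors = {"repeat": lambda h: h[-1],
                  "extrapolate": lambda h: 2 * h[-1] - h[-2]}

    best_template = max(templates, key=lambda k: score(templates[k], nxt))
    best_predictor = max(predictors, key=lambda k: score(predictors[k](history), nxt))
    print(best_template, best_predictor)          # 'last', 'extrapolate'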
Todor Arnaudov:
Events in programming are all the way through the hierarchy...
Hi Boris, I happened to check you out at the right moment - a few notes in a domain where I guess I'm competent:
> Besides, the events are assumed to be high-level concepts, preprocessed by human cognition. That’s the type of data programmers usually deal with, but general intelligence should not depend on preprocessing.
I beg to differ - not true for the "real programmers".
Events in programming start from hardware interrupts and binary flags (set/reset); it's an abstraction of "change", "difference" and "selection" (a message to the specific receiver who recognizes the event).
Also, in hardware and software there exists a deep hierarchy of abstraction, starting from "sensory inputs" (IC electrical inputs), flat and hierarchical blocks inside the IC, going to inter-IC connections, and multiple levels of redirection in the OS and the software.
IMHO the high-level view of events belongs more to people from the humanities, who have a hard time thinking of and remembering all those specific details.
> My approach, on the other hand, is to search for patterns within environmental input flow. I don’t even make a distinction between input patterns & problem-solving algorithms, -
OK.
> that’s an artifact of the way we design computers, to run hand-coded programs for specific tasks. It makes no sense in the context of continuous evolution of general intelligence, which should be recapitulated in AGI design.
I'm not sure the distinction comes from this per se; computers evolve to be ever more general tools, to run ever more general code (solve more general problems in one monolithic system) with ever less coding effort and ever more reuse and speed-ups - from assembler to functions, more complex built-in CPU instructions, ever higher-level languages, libraries, OOP, OSes, hardware abstractions etc.
I think the issue comes from the way most computer users think; they don't realize how the brain starts crunching data, or the basic principles of GI. I guess this is similar to the way some AGI-haters say "computers can't understand language" or "computers can never think", and explain it by claiming "computers do exactly what we tell them to do" ==> they, the users, are incompetent and can't understand language, so they can't make computers think.
Last edited Jul 13, 2011 6:24 PM
Hi Todor,
I was talking about "events" in BI (probabilistic calculus). They're assumed to be discrete, rather than artificially quantized analog sensory input flow. They call them "hypotheses" & "confirmations", does it sound like a low-level mindset to you? Re programming, I also meant high-level (symbolic) input data, rather than the code / hardware that manipulates it. All of our conscious experience is "high-level", more so for "people in humanities", but the programmers are not immune. Computers are general-purpose, but the programs aren't, except for their hardware interface. An example of the "artifact" I was talking about is separation between data cache & instruction cache on ALU level, I have no use for that.
Thanks for the reply!
About BI - sure, I also taught students that starting from high level is not going to scale, like Prolog, Cyc, expert systems, frame-based cognitive architectures etc.
>Re programming, I also meant high-level (symbolic) input data, rather than the code / hardware that manipulates it.
I tried to point to the understanding of the concept of "event" itself. For banking software, or for a researcher who's bad at programming, an "event" might be "being sunny or rainy". Real programmers and engineers who do DSP, computer vision, ML/RL or just low-level coding have a better "physical" idea.
>Re programming, I also meant high-level (symbolic) input data, rather than the code / hardware that manipulates it.
OK, but input data for some kind of software is as symbolic as quantized sensory matrix.
> All of our conscious experience is "high-level", more so for "people in humanities", but the programmers are not immune. Computers are general-purpose, but the programs aren't, except for their hardware interface. An example of the "artifact" I was talking about is separation between data cache & instruction cache on ALU level, I have no use for that.
This seems to me rather a detail and an optimization (parallelization): two independent buses (for speed, physical limitations); also the instruction/data division is for simplicity and speed (preferably sequential reading for part of the input); as for the cache - data changes more rapidly than instructions, because self-modifying machine code is usually forbidden today, etc.
It's a specialization, but it's transparent to the target work, and there will always be some sort of physical or practical basis which will force some design decisions at the low level of implementation.
As for seeing the instruction/data division as artificial/practical - I agree that it's a POV/frame what is to be interpreted as what; even for the "stupid" algorithms, data is a part of the running algorithm (the actual sequence and causal forces changing the system).
>Computers are general-purpose, but the programs aren't, except for their hardware interface.
Isn't it a matter of scope and complexity? Bigger "programs" such as OSes are pretty general-purpose, and complex application software gets more general during development. Sure, not AGI, but as the number of functions grows, they're generalized as long as their parameters and structure start to repeat. And after all, generalization starts from comparing specific samples; programming a complex system generates samples to be generalized - that's how functions, structured programming, OOP and Design Patterns originated.
As for philosophers/social science types and programmers - you give more favour to the former, but typical philosophers have no chance of formalizing intelligence themselves either, because they don't understand, don't care ("it's beneath them"), or don't have skills in programming, i.e. low-level data representation and processing. IMHO a lot of philosophy consists of simple, obvious, low-complexity concepts - higher generality doesn't strictly mean complex or hard to derive. However, these simple things are masked with big words.
Long ago I tried to explain to a philosopher that computers just seem "dull" to him because he sees them as "1 or 0", but actually he doesn't understand them. Well, let him say computers are doing exactly what he tells them after defining and understanding the dynamics of 10 or 100 billion dumb "1s and 0s" - he's pushing buttons, billions of bits are updating. Programming seems "dull" to them, the "generalist types", because it's too hard for them.
Bottom line from me on this point is that there's more than being a specialist/generalist or the depth of hierarchy; it's also the resolution and scale of processing you do over that hierarchy.
> I tried to point to the understanding of the concept of "event" itself.
That "concept" is meaningless by itself.
> As for seeing the instruction/data division as artificial/practical - I agree that it's a POV/frame what is to be interpreted as what,
Right, & cache division is just one example of such "hard" separation, in programmer's mind as well as in computer architecture. It won't be an "optimization" if your code is incrementally derived from your data.
>> Computers are general-purpose, but the programs aren't, except for their hardware interface.
> Isn't it a matter of scope and complexity...
No, it's not simple scaling, the higher levels are mostly application-specific handles. Look, if you want to talk superficially related computerese, may I suggest AGI list?
> Bottom line from me on this point is that there's more than being a specialist/generalist or the depth of hierarchy; it's also the resolution and scale of processing you do over that hierarchy.
That's as trivial as your earlier talk of "raw power".
I don't favor philosophers, I said many times that philosophy is the most dysfunctional discipline next to theology. I just don't talk about them as much, because they don't try to build an AGI.
But you do sound like a philosopher yourself, talking about anything *but* the actual subject matter.
Care to discuss something potentially constructive?
> But you do sound like a philosopher yourself, talking about anything *but* the actual subject matter.
OK... I'm not even warming up now, "pre-warming".
>Care to discuss something potentially constructive?
Sure...
...
>No, it's not simple scaling, there's a ton of application-specific biases mixed-in on higher levels.
Nobody claims this is scaling the way you do it, it's not AGI. I claim that software engineers know about scaling, even if they're spoiling it for practical reasons.
>Look, if you want to talk superficially related computerese, go to AGI list.
I don't want to, I just wanted to share a few thoughts on these computer-related topics.
>That's as trivial as "raw power" from your earlier attempts.
A bit reworded bottom line: Real programmers shouldn't be underestimated, they have "raw power" and an idea of scaling.
>I don't favor philosophers, I said many times that philosophy is the most dysfunctional discipline
I know, but you mention them as supposedly possessing a more appropriate mindset, while programmers are completely lost, in your opinion.
> Real programmers shouldn't be underestimated, they have "raw power" and an idea of scaling.
Show me.
> I know, but you mention them as supposed to possess a more appropriate mindset,
That was re "real" philosophers, not the kind you would hear about. Except for myself.
If I grew up in the west, I'd probably start by studying philosophy (esp. philosophy of science), but drop it after realizing that cognition must be defined at sensory level. Cognitive process is the only legitimate subject for philosophy, the fact that "philosophers" aren't working on it is a different matter.
Todor Arnaudov:
Task: Comparing a single-integer input to a fixed-length continuous sequence of older inputs
Hi Boris,
I'm loading my gun for a new shoot. :)
B>If you want to get constructive (meaningful), try to formalize comparing a single-integer input to a fixed-length continuous sequence of older inputs, & then form its prediction over the next sequence of the same length & direction.
Actually I got an idea immediately, even shared a bit with my students, but it seemed too simple. However, now I believe it should be simple; there shouldn't be rocket science in a few numbers, and all patterns should be derivable from the mere numbers and their relations, such as start value, differences, changes.
So, my first guess is that this seems similar to DSP and might be related to delta coding and linear prediction. For a start I thought only of the subtraction difference; it's effective for low-ratio smooth changes. However, I see now you've added more clues in the article, and also that division and logarithm are more appropriate for high-ratio and very high-ratio changes.
I see also that applying different kinds of comparison is needed in order to be able to *select* the right one if some matched and some mismatched; like in the following example with the shortest possible sequences:
First sequence:
[5 6]
Pattern:
Length = 1
Start = 5
Add Diff = 1
Ratio Diff = 1.2
Direction Diff = 1 (+)
New number:
[5 6] 7
Compare the difference to the last number of the sequence, and the match to the pattern:
Add Diff = 1 (match 1)
Ratio Diff = 7/6 (match 0.935)
Direction Diff = 0 (+) (match 1)
A new sequence, assuming the algorithm doesn't care about the mismatch of the Start value (coordinates).
[50 51]
Add Diff = 1 (m 1)
Ratio Diff = 51/50 (m 0.85)
Dir Diff = 0 (m 1)
What matches better is the Add Diff; the algorithm should ignore the ratio mismatch and will predict 52.
However:
[100 110]
Add Diff = 10 (m 0.1)
Ratio Diff = 1.1 (m 0.92)
Dir Diff = 0 (m 1)
Now the Add Diff match is very low, but the Ratio Diff match is almost identical to the match between the pattern and the new number, therefore: 7/6*110 = 128.33 = (int) 128.
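The arithmetic of this example can be transcribed as a small script (reading "match" as smaller over larger, which is an assumption); it only restates the ad hoc scheme above, which the reply below rejects - it is not the knol's algorithm:

    # Transcription of the ad hoc comparison above: additive difference, ratio,
    # direction, and a crude match score (smaller over larger). Illustrative
    # only - the reply below calls this approach wrong.
    def diffs(a, b):
        return {"add": b - a, "ratio": b / a, "dir": 1 if b >= a else -1}

    def match(x, y):
        x, y = abs(x), abs(y)
        return min(x, y) / max(x, y) if max(x, y) else 1.0

    pattern = diffs(5, 6)                      # from the first sequence [5 6]
    for seq in [(6, 7), (50, 51), (100, 110)]:
        new = diffs(*seq)
        scores = {k: match(pattern[k], new[k]) for k in ("add", "ratio")}
        best = max(scores, key=scores.get)     # better-matching kind of diff
        print(seq, scores, "-> project via", best)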
Last edited Jul 9, 2010 5:49 AM
Huh? This is not even wrong, - there's no algorithm, just ad hoc examples. I don't *ever* want *any* examples, - they pollute the mind. Use algebraic variables, not the actual numbers.
Actually, the examples *are* wrong. Forget about higher orders of comparison, DSP, & whatever other "hammers" you happen to know about; think in terms of the purpose. You keep talking about the differences, but the purpose is to project *match*, as a distinct variable. You don't predict the next input, - every past input is already a prediction. You need to quantify the accuracy (match) of that prediction over the next n comparisons, based on the past n comparisons. Hint: *projecting* a match means adjusting it for the "cost" of search, & for the competing projection of accumulated difference. If you figure this out, it'll be a first step down a long road.
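One very rough way to read that hint, with every quantity below being a guess rather than the knol's formula: extrapolate the match accumulated over the past n comparisons across the next n, discount it by a per-comparison search cost, and weigh it against the competing projection of accumulated difference:

    # Rough sketch of the hint above; the linear projection, the search cost,
    # and the comparison against projected difference are all assumptions.
    def project(past_match, past_diff, n_past, n_future, search_cost):
        proj_match = past_match / n_past * n_future - search_cost * n_future
        proj_diff = abs(past_diff) / n_past * n_future
        return proj_match, proj_diff, proj_match > proj_diff  # keep searching?

    print(project(past_match=40, past_diff=6, n_past=8, n_future=8, search_cost=1))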
OK, it's wrong and I've misinterpreted it, but "next sequence" in the question could also mean a new, different one, not only a continuation of the past.
My example was about: [a1, ..., an, x] -?-> [b1, ... bn, y=?], [c1, ... , cn, z=?], ...
While now I guess it should be: [a1, a2, ... an x b1 b2 ... bn] ==> (a1 .. an) x -?-> (b1 ... bn),
A correlation rather than an extrapolation. How the past input was predictive of the following input that *really happened*, rather than a prediction of the next value *before it happens*.
Prediction before a value happens should come later, using justified selected predictive patterns with quantified match. I'll think about it.
Maximizing predictive-correspondence which maximizes reward
Hi Boris,
A guess... (Or a new shot in the dark :) )
I think that the mind favors maximizing predictive-correspondence which maximizes reward; I suppose this is related to what you and psychologists call the hierarchy of needs. Maximizing predictive-correspondence/compression can be assumed as a form of reward in itself, as well as misses/errors - a "punishment", but there must also be lower "root" rewards to generate initial behavior and to drive initial focus on selected stimuli.
> past patterns are decreasingly predictive with the distance / delay from expected inputs,
> Recent inputs are relatively more predictive than the old ones by the virtue of their proximity to future inputs. Thus, proximity should determine the order of search within a level of generality.
Is it always so? I suspect it may not always be the case. It is possible to have delayed patterns, where activity “now” is dependent on changes that happened long ago. The fresh input buffers are cheapest to check quickly, even if they're not the most relevant, and if the input buffers are too short, mind has no choice, but searching for patterns there; a machine with longer buffer may learn much faster. There could be a “cache”/”stack” for old inputs which are expected to be predictive with a delay.
Also, such a correlation between recent inputs and close future inputs is apparent when the patterns are inertial/slow-changing/low-frequency ones and the activity passes through adjacent coordinates, like in the HTM basic vision demo. Many (or most) of the input patterns do, but I guess - not all.
Also rewarding old inputs can be much more predictive than new unrewarding ones, because mind searches how to maximize their predictive power to future inputs, while it may ignore and miss to evaluate recent inputs which are expected to be unrewarding (and nonthreatening), unless they are attached to rewarding ones making them rewarding as well (or such to avoid punishment).
Overall, I suspect that a reward function(s) need to be added to predictive correspondence, and proximity and recentness may need to be more abstract.
Regards
Todor
Last edited May 22, 2010 8:06 PM
> A guess... (Or a new shot in the dark :) )
At least you're shooting at the right target :).
> I think that the mind favors maximizing predictive-correspondence which maximizes reward; I suppose this is related to what you and psychologists call the hierarchy of needs.
My hierarchy is a sequential development of generalized means, which are then conditioned to become needs/wants. Basic cognition is driven by a very low-level inherited algorithm, without it this development can't even start.
> Maximizing predictive-correspondence/compression can be assumed as a form of reward in itself, as well as misses/errors - a "punishment", but there must also be lower "root" rewards to generate initial behavior and to drive initial focus on selected stimuli.
Initial behavior is instinctive, & curiosity is one of the most basic: the knowledge instinct: http://en.wikipedia.org/wiki/Leonid_Perlovsky
Initial cognition is driven by a low-level design of the neocortex (most likely the minicolumn: http://brain.oxfordjournals.org/cgi/content/full/125/5/935 ), it doesn't need any extra-cortical "rewards".
> It is possible to have delayed patterns, where activity “now” is dependent on changes that happened long ago. The fresh input buffers are cheapest to check quickly, even if they're not the most relevant, and if the input buffers are too short, mind has no choice, but searching for patterns there; a machine with longer buffer may learn much faster. There could be a “cache”/”stack” for old inputs which are expected to be predictive with a delay.
Yeah, that's what I call "higher levels of generalization". Those *are* older inputs, only compressed, & selected accordingly.
New Edit: my mistake, that's a good idea, though not well justified. See the first prize in the knol.
> Also rewarding old inputs can be much more predictive than new unrewarding ones, because mind searches how to maximize their predictive power to future inputs, while it may ignore and miss to evaluate recent inputs which are expected to be unrewarding (and nonthreatening), unless they are attached to rewarding ones making them rewarding as well (or such to avoid punishment).
You mean that we can make inputs more predictive by reproducing them? That means going way back to a lower stage of meta-evolution :).
> Overall, I suspect that a reward function(s) need to be added to predictive correspondence,
I'd suggest that you forget about subcortical nonsense, it's part of the problem, not part of the solution.
> and proximity and recentness may need to be more abstract.
That's already explained in the knol. It's true that a mind will skip over too predictable inputs, even if not driven by non-cognitive rewards. It's a form of novelty seeking that is not maximizing proximity, contrast, or even actual match. I didn't explain that in the knol. If you can define the criterion that's maximized in such "exploration mode", that would warrant a consolation prize :).
Boris
> At least you're shooting at the right target :)
Finally! ;)
>> Maximizing predictive-correspondence/compression can be assumed as a form of reward in itself, as well as misses/errors - a "punishment", but there must also be lower "root" rewards to generate initial behavior and to drive initial focus on selected stimuli.
> Initial behavior is instinctive, & curiosity is one of the most basic: the knowledge instinct: http://en.wikipedia.org/wiki/Leonid_Perlovsky
> Initial cognition is driven by a low-level design of the neocortex (most likely the minicolumn: http://brain.oxfordjournals.org/cgi/content/full/125/5/935 ), it doesn't need any extra-cortical "rewards".
Thanks for the links! I've missed Leonid and yes, I do have to check out the "raw scientific input" about the columns...
>>Also rewarding old inputs can be much more predictive than new unrewarding ones,
>>because mind searches how
>>to maximize their predictive power to future inputs, while it may ignore and miss to evaluate
>>recent inputs which are expected to be unrewarding (and nonthreatening), unless they are
>>attached to rewarding ones making them rewarding as well (or such to avoid punishment).
>>Overall, I suspect that a reward function(s) need to be added to predictive correspondence,
>You mean that we can make inputs more predictive by reproducing them? That means going
>way back to a lower stage of meta-evolution :).
>I'd suggest that you forget about subcortical nonsense, it's part of the problem, not part of the
>solution.
Elegantly said... :)
If I'm getting this right:
>My hierarchy is a sequential development of generalized means, which are then
>===conditioned to become needs/wants.===
Then to you this is the behavioral/conditioning part - this makes sense. I think that it is probably another hierarchy (what you call the hierarchy of needs), where the lower brains (brainstem, amygdala, hypothalamus) are higher levels of *control* (basic needs) than the highest level of the cognitive hierarchy, and the direction is evolutionarily backwards. At least this is true right when you "switch on" a human.
However, I do believe going back in meta-evolution makes sense, because subcortical regions are more primitive. Actually, inputs do get sort of more predictable (or at least the subject's behavior gets more predictable, so pleasing patterns are generally more predictive than unpleasing ones).
This is how love and addictions self-feed - by reproduction of recorded behaviors that led to a pleasure.
>> It is possible to have delayed patterns, where activity “now” is dependent on changes that
>>happened long ago. The fresh input buffers are cheapest to check quickly, even if they're not
>>the most relevant, and if the input buffers are too short, mind has no choice, but searching
>>for patterns there; a machine with longer buffer may learn much faster. There could be a
>>“cache”/”stack” for old inputs which are expected to be predictive with a delay.
> Yeah, that's what I call "higher levels of generalization". Those *are* older inputs, only compressed, & selected accordingly.
OK. :) At this point my terminology is “higher level virtual universes”, “higher level virtual simulators of virtual universes”, “higher level of control”.
The laws of physics of the higher-level universes are built from sequences and sets of lower-level laws, which on their own have their laws of physics and sub-universes. Laws of physics and virtual universes are predictive patterns (systems of patterns), extracted from sensory input and used to predict. On the lowest level, laws are not compressed, this is "the reality":
- in the real Universe, you have to simulate everything in order to predict and have an exact representation of the future at the Universe's meaningful resolution (Planck's constants etc.)
- in a thinking machine or human mind this is the raw sensory input that causes cognition to start
In order to interact/interface with the lowest level universe for the system, higher level must decompress its representations throughout the hierarchy, and each level down adds details, making the picture increasingly sharper.
>> and proximity and recentness may need to be more abstract.
>That's already explained in the knol.
Just going up in the hierarchy?
>It's true that a mind will skip over too predictable inputs, even if not driven by non-cognitive
>rewards. It's a form of novelty seeking that is not maximizing proximity, contrast, or even actual
>match. I didn't explain that in the knol. If you can define the criterion that's maximized in such
>"exploration mode", that would warrant a consolation prize :).
Nice... :)
My first intuitive guess is predictive range, compression ratio; I think it's related to minimum message length/Kolmogorov's complexity.
I'm not sure if these concepts are the answer to your question, but they sound interesting to me anyway. Sounds like “Predictability ...
Finally! ;)
>> Maximizing predictive-correspon
>>itself, as well as misses/errors – a “punishment”, but there must also be lower “root” rewards
>>to generate initial behavior and to drive initial focus on selected stimuli.
>Initial behavior is instinctive, & curiousity is one of the most basic: the knowledge instinct:
>http://en.wikipedia
>Initial cognition is driven by a low-level design of neocortex (most likely minicolumn:
>http://brain.oxford
>"rewards".
Thanks for the links! I've missed Leonid and yes, I do have to check out the "raw scientific input" about the columns...
>>Also rewarding old inputs can be much more predictive than new unrewarding ones,
>>because mind searches how
>>to maximize their predictive power to future inputs, while it may ignore and miss to evaluate
>>recent inputs which are expected to be unrewarding (and nonthreatening), unless they are
>>attached to rewarding ones making them rewarding as well (or such to avoid punishment).
>>Overall, I suspect that a reward function(s) need to be added to predictive correspondence,
>You mean that we can make inputs more predictive by reproducing them? That means going
>way back to a lower stage of meta-evolution :).
>I'd suggest that you forget about subcortical nonsense, it's part of the problem, not part of the
>solution.
Elegantly said... :)
If I'm getting this right:
>My hierarchy is a sequential development of generalized means, which are then
>===conditioned to become needs/wants.===
Then to you is behavioral/condition
However I do believe going back in meta-evolution makes sense, because subcortical regions are more primitive. Actually inputs do get sort of more predictable (or at least subject's behavior gets more predictable, so pleasing patterns are generally more predictive than not pleasing ones).
This is how love and addictions self-feed - by reproduction of recorded behaviors that led to a pleasure.
>> It is possible to have delayed patterns, where activity “now” is dependent on changes that
>>happened long ago. The fresh input buffers are cheapest to check quickly, even if they're not
>>the the most relevant, and if the input buffers are too short, mind has no choice, but searching
>>for patterns there; a machine with longer buffer may learn much faster. There could be a
>>“cache”/”stack” for old inputs which are expected to be predictive with a delay.
>Yeah, that's what I call "higher levels of generalization".
>Those *are* older inputs, only compressed, & >selected accordingly.
OK. :) At this point my terminology is “higher level virtual universes”, “higher level virtual simulators of virtual universes”, “higher level of control”.
The laws of physics of the higher-level universes are built from sequences and sets of lower-level laws, which in turn have their own laws of physics and sub-universes. Laws of physics and virtual universes are predictive patterns (systems of patterns), extracted from sensory input and used to predict. On the lowest level, laws are not compressed - this is "the reality":
- in the real Universe, you have to simulate everything in order to predict and have an exact representation of the future at the Universe's meaningful resolution (Planck's constants etc.)
- in a thinking machine or a human mind, this is the raw sensory input that causes cognition to start
In order to interact/interface with the lowest level universe for the system, higher level must decompress its representations throughout the hierarchy, and each level down adds details, making the picture increasingly sharper.
>> and proximity and recentness may need to be more abstract.
>That's already explained in the knol.
Just going up in the hierarchy?
Hmm, my comment seemed too long, continues:
>It's true that a mind will skip over too predictable inputs, even if not driven by non-cognitive
>rewards. It's a form of novelty seeking that is not maximizing proximity, contrast, or even actual
>match. I didn't explain that in the knol. If you can define the criterion that's maximized in such
>"exploration mode", that would warrant a consolation prize :).
Nice... :)
My first intuitive guess is predictive range, compression ratio; I think it's related to minimum message length/Kolmogorov's complexity.
I'm not sure if these concepts are the answer to your question, but they sound interesting to me anyway. Sounds like “Predictability Analysis/Calculus”. :)
- For how long into the future/in space predictions are expected to match real input, based on the pattern, and how much input data is enough to predict the whole future input generated by the pattern. This is particularly apparent for simple patterns that are expected to take a lot of time, like speaking aloud 1, 2, 3, ..., 1 million. :) Generally, if you know the end from the beginning, you don't need to keep attention on the process.
- Predictability range and predictability precision of the new input, based on the recent/immediate or local input from a pattern, or more generally - how parts of an input assist in prediction/compression of other parts of the input. Going meta - how parts of a pattern assist in prediction of other parts of the pattern itself.
I noticed this in the past in a section of my writings with speculations about interestingness in pictures: generally this photo http://eim.hit.bg/3/25/tee1.jpg would qualify as boring, while the next one as (more) interesting: http://eim.hit.bg/3/25/kalof94.jpg
Interestingness is subjective, but this is true at least for the measure below (a small code sketch after this comment illustrates it):
The first photo can be drawn from a portion of it, extended with a simple cycle with instructions on how to stretch and copy in perspective (implying the mind does this and stores images this way - compressed, applying transformations and operations). The second one can't be compressed that way (it is not so simple); there are also more meaningful, recognizable objects, so the mind needs to engage more. This is what interestingness is all about - engaging the mind to watch and try to predict what comes next. There are other aesthetic reasons for interestingness as well - emotional, "organic" appearance/smoothness, dynamics - expected possible change in pictures with animate objects; but that is another story.
- Function of predictability in time/space. How prediction precision changes as more data accumulates. If precision stops rising, rises too slowly, or reaches very high levels, the watching may stop - this is a saturation of the function of predictability through time. If I try to use your terms (hopefully correctly) - if it's not possible to discover increasingly predictive short-cuts for a particular pattern anymore, it may be skipped over. This rule skips noise as well.
It is possible for the function of predictability to rise in a moment, e.g. seeing a flat blue banner.
Also, for a level of the hierarchy, when predictability saturates - that is, when a level can predict the future with a precision over a threshold - the hierarchy may grow and start searching for more complex patterns (in my terms - constructing higher-level virtual universes/simulators of universes).
Right, hierarchy may grow and probably should try to grow all the time, but the upper level would not be reliable until the base level stabilizes.
Todor
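A minimal sketch of the two measures described above, assuming zlib's compression ratio as a crude stand-in for "can this input be generated from a small portion plus simple rules", and a simple saturation test on a running prediction-precision curve. The function names and thresholds are illustrative assumptions, not anything defined in the knol:

    import zlib

    def compressibility(data: bytes) -> float:
        # 1.0 ~ incompressible (engaging, or possibly noise); near 0 ~ highly redundant (boring)
        return len(zlib.compress(data, 9)) / max(len(data), 1)

    def predictability_saturated(precision_history, window=5, min_gain=0.01, ceiling=0.99):
        # Stop watching a pattern when prediction precision stops improving,
        # improves too slowly, or is already near-perfect ("flat blue banner" case).
        if not precision_history:
            return False
        if precision_history[-1] >= ceiling:
            return True
        if len(precision_history) <= window:
            return False
        return (precision_history[-1] - precision_history[-1 - window]) < min_gain

    flat = bytes([128] * 10000)                                   # "flat banner"
    mixed = bytes((i * 37 + i // 7) % 256 for i in range(10000))  # more varied input
    print(compressibility(flat), compressibility(mixed))          # very low vs. higher
    print(predictability_saturated([0.2, 0.4, 0.6, 0.8, 0.85, 0.9]))            # still improving -> False
    print(predictability_saturated([0.9, 0.91, 0.91, 0.91, 0.91, 0.91, 0.91]))  # stalled -> True

Note that pure noise also compresses poorly, which is why the saturation test is needed as a separate filter - as the comment itself says, "This rule skips noise as well."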
> Thanks for the links! I've missed Leonid and yes, I do have to check out the "raw scientific input" about the columns...
I like Perlovsky’s explanation of the “knowledge instinct”, but his “Dynamic Logic” doesn’t seem to be very deep.
> Then to you this is the behavioral/conditioning part - that makes sense. I think that is probably another hierarchy (what you call a hierarchy of needs), where lower brains (brainstem, amygdala, hypothalamus) are higher levels of *control* (basic needs) than the highest level of the cognitive hierarchy, and the direction is evolutionarily backwards.
There's no "control" - computer analogies are misleading. Analogical thinking is a blunt instrument, try to avoid it. Higher motives are the ones that ultimately win over, not the ones that develop earlier. All brain areas have an inherited structure that determines their initial (instinctive) operation. Brain stem, amygdala, hypothalamus develop earlier, & their instincts dominate at first. Basic curiosity is a "cortical instinct", likely driven by the structure of the minicolumn, & neocortex is the last to fully develop. But that's a genetically determined part. Postnatally, motivation develops by competitive conditioning of inherited motives & acquired value-loaded patterns in all of those areas. Conditioning is reinforcement of coincident (instrumental) & suppression of counter-incident (interfering) motives & stimuli patterns by all other motives. These patterns become acquired motives, but they're *not* lower than the original ones. Higher or lower is a matter of strength, not of origin. Cortical cognition discovers more general patterns that get relatively stronger because they stay instrumental longer. And curiosity itself is instrumental for discovery of all these patterns, so it ultimately becomes the top value & suppresses all others. You don't need any subcortical drives even to start, unless you have human physiology to take care of. But basic curiosity (I don't know the full "structure" of it yet) is only a start too. Introspective cognition derives higher orders of correspondence, developing things like mathematical curiosity.
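A toy sketch of one possible operational reading of "reinforcement of coincident & suppression of counter-incident motives"; the names, the learning rate, and the assumption that coincidence is detected elsewhere are all illustrative, not anything defined here:

    def condition(strengths, coincident, counter_incident, rate=0.1):
        # One conditioning step: reinforce motives/patterns that were instrumental,
        # suppress those that interfered.  How coincidence is detected is not modelled.
        updated = dict(strengths)
        for m in coincident:
            updated[m] = updated.get(m, 0.0) + rate
        for m in counter_incident:
            updated[m] = max(0.0, updated.get(m, 0.0) - rate)
        return updated

    motives = {"hunger": 1.0, "curiosity": 0.2}
    for _ in range(20):   # curiosity stays instrumental longer, so it ends up relatively stronger
        motives = condition(motives, coincident=["curiosity"], counter_incident=["hunger"])
    print(motives)        # curiosity has overtaken hunger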
> OK. :) At this point my terminology is “higher level virtual universes”, “higher level virtual simulators of virtual universes”, “higher level of control”.
I don't like your terminology. It's "fluffy": redundant, pretentious, fuzzy & misleading. That's a bad taste in science, as distinct from art. Artist thrives on analogical confusion, Scientist abhors it & craves analytical clarity. Make your choice. Your interpretations “sound” wrong on many levels, but you don’t really define your terms, to the extent that they're different from mine. If you think they’re more expressive, or your conclusions are different from mine, please explain how. Try to think more & talk less, you know, review & rewrite your reply for a few days before posting it :).
> In order to interact/interface with the lowest level universe for the system, higher level must decompress its representations throughout the hierarchy, and each level down adds details, making the picture increasingly sharper.
I don't know if there's any need for decompression, higher levels may only adjust focus (input span & resolution) for lower levels. Patterns of different scope / generality must be kept separate to avoid "paradoxes" :).
>>> and proximity and recentness may need to be more abstract.
>> That's already explained in the knol.
> Just going up in the hierarchy?
Up *&* down, that's what the hierarchy is all about. But neither direction is fully explained in the knol (even to the extent that I understand them), so use your imagination.
> My first intuitive guess is predictive range, compression ratio; I think it's related to minimum message length/Kolmogorov's complexity...
It all sounds vaguely relevant, but defining a criterion means quantifying it. It’s not match so it’s not an actual compression, or even future compression. “Expected”, “predicted”, "partial" - how do you derive those things from pixel-level inputs? Because if you can't do it there, you can't do it anywhere, - combinatorial explosion gets you. I’ve shown how to quantify a basic match, & that still stands as an initial criterion. How do you derive from it a higher-order criterion that drives exploration? You gave a bunch of higher-level examples, but I am not even going to bother with them, that's not where I operate.
If you want to get constructive (meaningful), try to formalize comparing a single-integer input to a fixed-length continuous sequence of older inputs, & then form its prediction over the next sequence of the same length & direction.
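One possible reading of this exercise, under the assumption that the "basic match" is quantified as the overlap of magnitudes (the smaller of the compared values) and that the projection simply extends the average step across the buffer - both are my assumptions, not the knol's actual derivation:

    def compare_to_buffer(new_input, buffer):
        # Compare one new single-integer input to a fixed-length sequence of older inputs:
        # match = overlap of magnitudes (the smaller value), difference = new - old.
        matches = [min(new_input, old) for old in buffer]
        diffs = [new_input - old for old in buffer]
        return matches, diffs

    def predict_next(buffer, new_input):
        # Naive projection over the next len(buffer) inputs: extend the average observed step.
        seq = buffer + [new_input]
        steps = [b - a for a, b in zip(seq, seq[1:])]
        avg_step = sum(steps) / len(steps) if steps else 0
        return [new_input + avg_step * (i + 1) for i in range(len(buffer))]

    older = [3, 5, 8, 12]    # fixed-length continuous sequence of older inputs
    new = 17
    print(compare_to_buffer(new, older))   # ([3, 5, 8, 12], [14, 12, 9, 5])
    print(predict_next(older, new))        # [20.5, 24.0, 27.5, 31.0]

Whether anything like this can be turned into the higher-order criterion that drives exploration is exactly the open question posed above; the sketch only fixes the data shapes involved.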
I realized an important difference - a different POV on a mind. My theory was not inspired by brains, the minicolumn hypothesis or the like; it was a sketch/direction aimed at a unifying theory of mind and systems evolution in the Universe.
Attempts to fit it exactly to brain regions cause a mess - there are overlaps and similarities to HTM, the minicolumn hypothesis, your theory, brains, but mine is different: digital, sketchy and not as precisely defined. I see there were implied things which were not clearly specified and separated.
My speculations were based on observations about causality/determinism (causal interdependency) and the tendency of evolving systems toward prediction - repetitive and predictive behavior with ever higher precision, resolution and range. "Control" in my writings was meaningful: it's a system's (module's) capability to predict and cause the future of what it controls with a certain probability/precision, where control is formalized as a write to a memory, i.e. making certain target changes in an output environment.
Mind is a compound/complex "control unit" itself, aiming at maximizing its capabilities to predict (imagine) and cause, where Universe is the ultimate control unit, "predicting" and causing everything at the maximum possible resolution, including mind itself, which is a "virtual sub-universe".
>There’s no “control”, - computer analogies are misleading.
My "mind sketch" was digital.
>(...) All brain areas have an inherited structure that determines their initial (instinctive) operation.
>Brain stem, amygdala, hypothalamus develop earlier, & their instincts dominate at first. Basic
>curiosity is a "cortical instinct", likely driven by the
>structure of minicolumn, & neocortex is the last to fully develop.
>(...) But basic curiosity (I don't know the full "structure" of it yet) is only a start too. Introspective
>cognition derives higher orders of correspondence, developing things like mathematical curiosity.
Thanks, I see.
>I don't like your terminology. It's "fluffy": redundant, pretentious, fuzzy & misleading.
>That's a bad taste in science, as distinct from art. Artist thrives
>on analogical confusion, Scientist abhors it & craves analytical clarity. Make your choice.
>Your interpretations “sound” wrong on many levels, but you don’t really define your terms, to the extent that they're different from mine. If you
>think they’re more expressive, or your conclusions are different from mine, please explain how.
I'd make both choices. :)
Sometimes your definitions remind me of my own observations, and my interpretations are related to my theory and make sense *there*. Right - this is a mess.
Match, comparison, difference between predicted and expected, compression, a basic algorithm that learns other algorithms and data and collects them, complexity growth and a sort of algorithmic complexity (but re-invented), etc. are some terms and topics from my writings. I'm not ready with a solid, compressed explanation yet, though.
>I don't know if there's any need for decompression, higher levels may only adjust focus (input span &
>resolution) for lower levels. Patterns of different scope / generality must be kept separate to avoid
>"paradoxes" :).
Does adjusting focus mean:
- selecting/allowing comparison with more recorded samples - a sort of widening of the span, a more general comparison;
- lowering the resolution, which allows recognition of fuzzy/pixelized images and results in a higher match ratio - a more general comparison?
I mean this: a word, a concept can be recorded and operated on with a few bits at the highest levels, but this is just a label; it makes sense in a high-level virtual universe (imagination), but it needs much more raw data in order to be derived from a low level and to be expressed back there.
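A small sketch of the two readings of "adjusting focus" listed above, assuming a min-over-max match ratio and block-averaging for resolution reduction - both are illustrative choices, not definitions from the knol:

    def match_ratio(a, b):
        # Fraction of total magnitude shared by two equal-length sequences (min over max).
        shared = sum(min(x, y) for x, y in zip(a, b))
        total = sum(max(x, y) for x, y in zip(a, b))
        return shared / total if total else 1.0

    def downsample(seq, factor):
        # Lower the resolution by averaging non-overlapping blocks of `factor` samples.
        return [sum(seq[i:i + factor]) / factor for i in range(0, len(seq), factor)]

    new = [10, 12, 30, 31, 10, 11, 29, 33]
    old = [11, 11, 33, 28, 12, 10, 31, 30]
    print(match_ratio(new, old))                                # full resolution: ~0.91
    print(match_ratio(downsample(new, 2), downsample(old, 2)))  # coarser: ~0.99, a more "general" match

Widening the span would simply mean running the same comparison against more stored sequences; the sketch only separates the two operations.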
>Up *&* down, that's what the hierarchy is all about. But neither direction is fully explained in the
>knol (even to the extent that I understand them), so
>use your imagination.
OK
>It all sounds vaguely relevant, but defining a criterion means quantifying it.
>It’s not match so it’s not an actual compression, or even future compression.
>“Expected”, “predicted”, "partial" - how do you derive those things from pixel-level inputs?
I perfectly understand that it should start from the lowest level and the mechanics must be precisely defined. That is what I'm supposed to do "when I manage to concentrate"...
>I’ve shown how to quantify a basic match, & that still stands as an initial criterion.
>How do you derive from it a higher order criterion that drives exploration?
>(...)
>try to formalize comparing a single-integer input to a fixed-length continuous sequence of older
>inputs, & then form its prediction over the next sequence of the same length & direction.
Thanks for the task! I may have a break now and will be back later.
Todor
> I realized an important difference - a different POV on a mind. My theory was not inspired by brains, the minicolumn hypothesis or the like,
Me neither, I am a generalist.
> it was a sketch/direction aimed at a unifying theory of mind and systems evolution in the Universe. My "mind sketch" was digital.
I suspect it was an attempt to project your computer experience into areas where it doesn't belong. Very typical for AI tinkerers, - lots of ambition, but no clue.
> Attempts to fit it exactly to brain regions cause a mess - there are overlaps and similarities to HTM, the minicolumn hypothesis, your theory, brains, but mine is different: digital, sketchy and not as precisely defined
You don't really understand things you can't define. Your attachment to ill-formed assumptions of your youth, as well as constant self-promotion, is probably a sign of insecurity.
> I'd make both choices. :)
That's not making a choice. You'll do neither well, & even "well" is useless here, only the-best-in-the-world will do.
> Does adjusting focus mean:
> - selecting/allowing comparison with more recorded samples - a sort of widening of the span, a more general comparison;
> - lowering the resolution, which allows recognition of fuzzy/pixelized images and results in a higher match ratio - a more general comparison?
Neither, both work "upward", focusing is downward. Guess again.
> I mean this: a word, a concept can be recorded and operated on with a few bits at the highest levels, but this is just a label; it makes sense in a high-level virtual universe (imagination), but it needs much more raw data in order to be derived from a low level and to be expressed back there.
Raw data is what you start with. It's lost during selective elevation & you won't regain it by decompression. Patterns on every level are search-range-defined. "Expressing" high-level patterns on lower levels will only create confusion about their "true" range (& you're confused enough :)). There's no need for it anyway: higher levels' "expectations" are compared to lower-level "experience" when the latter is selectively elevated, not vice-versa.
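A toy sketch of what "selective elevation" without decompression could look like, under the assumption that only patterns whose match exceeds a threshold are passed upward as summaries and compared there against higher-level expectations; all structures and names below are assumptions for illustration only:

    def elevate(patterns, threshold=0.5):
        # Lower level: pass upward only strong patterns, as summaries; raw detail stays below.
        return [{"summary": p["summary"], "match": p["match"]}
                for p in patterns if p["match"] > threshold]

    def check_expectations(expected, elevated):
        # Higher level: compare its expectations to the elevated experience, not vice-versa.
        seen = {p["summary"] for p in elevated}
        return {name: (name in seen) for name in expected}

    lower_level = [
        {"summary": "vertical_edge", "match": 0.9, "raw": [12, 14, 240, 238]},
        {"summary": "noise_patch",   "match": 0.1, "raw": [7, 201, 44, 133]},
    ]
    print(check_expectations(["vertical_edge", "corner"], elevate(lower_level)))
    # {'vertical_edge': True, 'corner': False}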
> I perfectly understand that it should start from the lowest level and the mechanics must be precisely defined. That is what I'm supposed to do "when I manage to concentrate"...
You won’t, until & unless you change lifestyle. You need a boring life.
>I suspect it was an attempt to project your computer experience into areas where it doesn't
>belong. Very typical for AI tinkerers, - lots of ambition, but no clue.
Don't forget imagination and creativity - my kingdom :) - "universal simulators of virtual universes" are engines of imagination. Indeed I think art gives many clues about intelligence and the big picture of mind.
>You don't really understand things you can't define.
>Your attachment to ill-formed assumptions of your youth,
>as well as constant self-promotion, is probably a sign of insecurity.
Insecure - I am, that is correct. I need to make a breakthrough in order to stabilize my life and income and start feeling more secure: a successful novel, a beautiful film with a touching performance or the like; and it's frustrating to balance time, wait, and be unable to raise the resources needed.
Self-promotion - I don't have real personal PR, an agent or the like; I'm not acknowledged yet. I must attract followers and make contacts somehow - I want to start up a business out of my art, after all. I'd prefer somebody else to promote me.
Youth assumptions - I want to focus on them, understand them and clear them out before throwing them away. I'm attached because I haven't finished with this.
See you in the next iteration!
T
Art = fluff. You love fluff, & crave attention, the rest is just an excuse.
Trying to focus on "understanding" the assumptions made when you understood a lot less than you do now is pathetic. You need to understand the subject matter - the cognitive algorithm.
I appreciate your badass wise sentences, but I like both art & science, and I wanted and still want to understand art as a cognitive process as well - it's part of the same machinery. The re-understanding operation is in progress, the new understanding is not in vain, and this won't take much; one of my immediate next AGI tasks is to manage to think and write about cognition in your terms - I will teach your stuff and your comments to my students on Friday.
BTW, I believe a little bit of promotion may help even such a detached person as you. You agree that collaboration is the best "cognitive accelerator", and I'm sure at least some of the famous and smart AGI people, such as Schmidhuber, would spend some time with your articles and might make others consider them.
All that's needed is to let him/them know about you somehow.
> I appreciate your badass wise sentences, but I like both art & science, and I wanted and still want to understand art as a cognitive process as well - it's part of the same machinery. The re-understanding operation is in progress, the new understanding is not in vain,
Generalization is a reduction. Yes, everything you know is related to it, but you won't get anywhere by piling things up.
> one of my immediate next AGI tasks is to manage to think and write about cognition in your terms - will teach your stuff and your comments to my students on Friday.
Holding my breath :)
> BTW, I believe a little bit of promotion may help even such a detached person as you. You agree that collaboration is the best "cognitive accelerator", and I'm sure at least some of the famous and smart AGI people, such as Schmidhuber, would spend some time with your articles and might make others consider them.
I appreciate your appreciation (& promotion), but you forgot the second best accelerator. The reason I am, IM!HO, a lightyear ahead of anyone else is that I gave up on recognition & collaboration with tinkerers+fluffers that populate the field. It's not what you got, it's how you use it. Smarts won't do any good if you lack motivation to focus on the only problem that matters. Famous people have their blinders on. They're too distracted by, & protective of, their fame to pay attention to some security bum who tells them that their lifework is a pile of irrelevant crap.
Yes, collaboration would be great, but... I despair. Anyone who knows how to punch right keywords into Google will find me (there's *nothing* else), & those who don't are likely to be more trouble than help.