Intelligence
This blog has only one frequently updated core post, at the top. It's an intro to my life's work, to be continued until I am obsolete.
Wednesday, February 1, 2012
Cognitive Algorithm
I define intelligence as an ability to predict & plan (self-predict) by discovering & projecting patterns. In other words, it’s a capacity for cognition, formalized as a hierarchical pattern discovery process. The algorithm implementing this process should start by cross-comparing quantized sensory input flow within a fixed range. Comparisons discover initial patterns, which are selectively forwarded to the next level for expanded search. Recursive comparison & selective elevation of input patterns on successive levels of search will discover increasingly general patterns or concepts. Higher levels generate downward feedback, ultimately a motor action, to focus on sources expected to be additively predictive for a target level.
Hierarchical approaches are pretty common, & many use some sort of pattern recognition. But none that I know of attempts to implement a strictly incremental growth in scope & complexity of discoverable patterns. This is critical for efficient scalability because it allows for input selection at each increment. Without such selection by theoretically derived criteria, a combinatorial explosion in potential search space is inevitable.
Most people find my writing on the subject “obtuse”, - that’s not for lack of effort. For a more gentle & contextual introduction see “On Intelligence” by Jeff Hawkins, but… it’s also more shallow & confused. I am actually pretty good at explaining normal things, but formalized intelligence must be a generalization of our entire experience. I think working on this level requires an extreme “generalist mindset”: encyclopedic theoretical curiosity at a young age, followed by systematic introspection & rigorous deduction from the resulting generalizations. This can’t be done as a hobby: many people, otherwise quite accomplished in superficially related fields, are reduced to piecemeal trial & error by the vast scope of this problem.
Contents:
1. Cognition vs. evolution & related approaches
2. Comparison: quantifying match & miss per input
3. Search: incremental range & derivation of comparisons
4. Patterns: syntactic expansion & re-integration
5. Feedback: selection by higher-level aggregate representations
6. Hierarchical short-cuts: selection by downward expectations
7. Notes on a working mindset & a prize for ideas
1. Cognition vs. evolution & related approaches.
We know one mechanism that produced a human-level intelligence: biological evolution.
Algorithmically very simple, at least initially, evolution alters heritable traits at random & selects those with above-average reproductive fitness. But evolution is horribly inefficient: selection is extremely coarse, on the level of a whole genome rather than individual traits, & intelligence is only one of many factors in reproductive fitness. And obviously, there’s nothing intelligent about random variation.
I think that, unlike any evolutionary algorithm, the cognitive process should be altered solely by the data: environmental stimuli, incrementally generalized into patterns & then into algorithms. These categories differ only in the degree of discovered recurrence, or generality.
A popular attitude in AI is that intelligence can be recognized but not defined. That seems absurd to me: recognition *is* a match between an input & a definition. Some researchers would agree with my definition, but none that I know of apply it as a criterion for bottom-up development. I believe this lack of theoretical integrity is the main reason why all AI/AGI attempts to date fail to scale, even in principle.
If intelligence is an ability to predict, then the cognitive fitness function is predictive correspondence of recorded inputs: their cumulative match to future inputs. We have no direct knowledge of the future, so predictive correspondence must be estimated by criteria that are found to correlate with it. Initial sensory inputs should be single variables, such as pixels in the case of vision. On this level, predictive correspondence can be estimated only by past matches of these variables. In the following sections, I’ll try to show how more complex multi-variable criteria are discovered on higher levels of search & form a downward feedback. Such “recursively self-improved” input selection should allow for scalable search expansion, where newly discovered predictive correspondence grows roughly in proportion to the quantity of searched inputs.
This intro obviously suffers from a dire scarcity of references. The dilemma here is that everything ever written is somehow related to my subject. But generalization is a reduction, & a cognitive algorithm is a meta-generalization, which demands an utterly ruthless reduction. Unfortunately, I have yet to see an approach that is sufficiently consistent with the principles outlined here to be a non-obvious foundation for mine.
My presentation is self-contained: strictly & formally bottom-up, thus can be understood without references. However, it does require a “clean” context: relinquishing assumptions from any higher-level approach.
Two of the closest approaches seem to be algorithmic information theory (AIT) & Bayesian inference (BI), which use the same criteria as mine: compression & prediction. A good introduction is “A Philosophical Treatise of Universal Induction” by S. Rathmanner & M. Hutter.
While an advance over static “frequentist” probability, BI & AIT still assume a “prior”, which doesn’t belong in a consistently inductive approach. To generalize it, Solomonoff introduced a universal prior: “a class of all models”. A priori infinity of this class means that he hit a combinatorial explosion even *before* receiving actual inputs, - a “solution” that only a mathematician may find interesting. In my approach, the models are simply past inputs & correlations among them. Environmentally specific priors could speed up learning, but a general pattern discovery algorithm must be the core to which such short-cuts are added, or from which they are removed.
Also perverse is the binary resolution of initial inputs in BI & AIT: confirmation / disconfirmation events. In reality, expectations are rarely matched or missed precisely, so the degree of confirmation must be quantified for individual events. Quantifying partial match would add a micro-grayscale to the binary value of events in Bayesian prediction, just as the latter added a macro-grayscale (partial probability) to the binary (true | false) predictions of classical logic.
Besides, the events are assumed to be high-level concepts, the kind that occupy our conscious minds. But a scalable search algorithm must start from sensory data processing that’s subconscious for us, rather than depend on human preprocessing. So, this definition of initial inputs in BI & AIT already shows a total lack of discipline in incrementing complexity: a fatal fault for any attempt at scalability.
There are plenty of loosely incremental approaches to AGI, but the increments always seem to be coarse & arbitrary. Thus, they miss the opportunity for intermediate selection, decreasing the efficiency of search as it expands. Of course, a general intelligence must have an indefinite range of search, so even a minor inefficiency of selection is multiplied at each search expansion, soon resulting in predominantly junk comparisons.
I propose a strictly incremental inductive approach: search must start with minimal-complexity inputs & proceed with minimal-complexity increments in scope & syntax for selected inputs. There is always only *one* logical next step: an additive complexity cost that brings the greatest additive predictive value:
2. Comparison: quantifying match & miss per input.
It seems obvious to me that all our non-combinatorial knowledge is ultimately derived from the senses. “Symbolic” data is implicitly encoded by the “neural algorithm” that derived it, & understanding this algorithm should be exponentially easier than automating further generalization without decoding the source. Thus, initial inputs for a generally intelligent algorithm should be context-free: un-encoded or analog, such as those received by the senses. Basically, if we can’t do it there, we can’t do it anywhere.
The “cognitive” purpose of processing inputs is to predict future inputs, where prediction is quantified as a match of an input to expectations. In a non-random environment, the most basic expectations are simply older inputs, & their estimated predictive value is the past average match among adjacent inputs. Subsequent individual matches refine expectations of future matches for a given input. I define match per comparison as a reduction (compression) of recorded magnitude by replacing a larger comparand with its derivative (miss) relative to a smaller comparand. Magnitude is the main criterion here because it represents the physical values that we want to predict. Coincident compression | expansion of used record space & processing time affects the costs, but not the benefits, of comparison & prediction.
Given incremental complexity of representation, initial inputs should have binary resolution. However, average binary match won’t justify the cost of comparison, which adds a syntactic overhead of newly differentiated match & miss to positionally distinct inputs. Rather, these binary inputs are compressed by digitization: a comparison across sequentially formed levels of scale, forming integers represented as a hierarchy of digits. Digitization is performed on all inputs within a shared coordinate. The resolution of such a coordinate is defined by feedback, to form integers that are large enough for an average match between them to merit the above-mentioned costs of comparison (match is a subset of magnitude).
Hence, the next order of compression is comparison across coordinates, initially defined with binary (before | after) resolution. Comparison forms signed derivatives; complemented by these derivatives, new inputs can losslessly & compressively replace older templates. At the same time, the current input’s match determines whether individual derivatives are also compared (vs. aggregated), forming successively higher derivatives. “Atomic” comparison is between a single-variable input & a template (older input):
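As an illustration only, digitization can be modeled as a binary counter, assuming that “comparison across levels of scale” means that two 1s at one level are matched & replaced by a single 1 at the next, coarser level (a carry). The helpers `digitize` & `digits_to_int` are hypothetical; this is a minimal sketch of the idea, not a specification:

```python
def digitize(bits):
    """Accumulate binary inputs into digits at increasing levels of scale.

    Hypothetical model: two 1s at level k are replaced by one 1 at level k + 1
    (a carry), so levels[k] holds the digit of weight 2**k.
    """
    levels = []
    for bit in bits:
        carry = bit
        k = 0
        while carry:
            if k == len(levels):
                levels.append(0)  # open a new, coarser level of scale
            total = levels[k] + carry
            levels[k] = total % 2
            carry = total // 2
            k += 1
    return levels


def digits_to_int(levels):
    """Read the hierarchy of digits back as an integer."""
    return sum(digit << k for k, digit in enumerate(levels))


stream = [1, 0, 1, 1, 0, 1, 1, 1]  # six 1s
levels = digitize(stream)
print(levels, digits_to_int(levels))  # [0, 1, 1] 6
```

Eight binary inputs are replaced by three digits, which is the kind of compression that makes subsequent comparison between the resulting integers worth its cost.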
Comparison: match = min(input, template), miss = dif(input - template), both aggregated over the span of constant sign.
Evaluation: match - average_match_per_average_difference_match, formed on the next search level.
Actually, evaluation can be increasingly complex, but I will need a meaningful feedback to elaborate.
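The atomic comparison & its aggregation can be sketched in Python. This is a minimal sketch under stated assumptions: the template is simply the preceding input, zero differences are counted with the positive sign, & `ave_m` is a hypothetical fixed average match per comparison, standing in for the average that the text says is formed on the next search level. The names `compare`, `aggregate_by_sign`, `evaluate` & `reconstruct` are illustrative:

```python
from dataclasses import dataclass


@dataclass
class Span:
    positive: bool  # sign of the misses in this span (zero counts as positive)
    L: int          # number of comparisons in the span
    m: int          # aggregated match
    d: int          # aggregated miss


def compare(inputs):
    """Compare each input to the preceding one, its template: (match, miss)."""
    return [(min(i, t), i - t) for t, i in zip(inputs, inputs[1:])]


def aggregate_by_sign(derivatives):
    """Aggregate match & miss over spans in which the miss keeps its sign."""
    spans = []
    for m, d in derivatives:
        positive = d >= 0
        if spans and spans[-1].positive == positive:
            span = spans[-1]
            span.L += 1
            span.m += m
            span.d += d
        else:
            spans.append(Span(positive, 1, m, d))
    return spans


def evaluate(spans, ave_m):
    """Relative match per span: aggregated match minus average match * L."""
    return [span.m - ave_m * span.L for span in spans]


def reconstruct(first, derivatives):
    """Regenerate all inputs from the first template plus the misses."""
    out = [first]
    for _, d in derivatives:
        out.append(out[-1] + d)
    return out


pixels = [10, 12, 15, 14, 9, 9, 11]
ders = compare(pixels)
spans = aggregate_by_sign(ders)
print(evaluate(spans, ave_m=10))            # [2, 3, -2]
print(reconstruct(pixels[0], ders) == pixels)  # True
```

The last line illustrates the lossless replacement described above: the first template plus the sequence of misses regenerates every input.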
Any comparison is an inverse arithmetic operation of incremental power: Boolean AND, subtraction, division, logarithm, & so on. Binary match is a sum of AND: partial identity of uncompressed bit strings, & miss is an offset. Binary comparison is useful for digitization, but it won’t further compress the integers produced thereby. This is a general principle: the products of a given-power comparison can be further compressed only by a higher-power comparison between them.
Thus, subsequent comparison between integers is done by subtraction, which increases match by compressing miss from offset to difference, in which opposite-sign bits cancel each other via carry. Match is increased because it is complementary to the difference: equal to the smaller of the comparands.
Division further reduces difference to a ratio, which can then be reduced to a logarithm, & so on. Thus, complementary match increases with the power of comparison. But the costs may grow even faster, for both operations & the incremental syntax needed to record incidental sign, fraction, & irrational fraction. The power of comparison is increased if current match & miss predict further improvement, as determined by higher-order comparison between the results from different powers of comparison. This forms algorithms, or meta-patterns. Again, I’ll need constructive feedback to elaborate on the mechanism.
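A toy illustration of the principle that the products of one power of comparison are compressed only by a higher power, assuming a geometric input sequence chosen for the purpose: repeated subtraction leaves the differences as variable as the inputs, while division reduces them to a single repeated ratio. The helper names are hypothetical:

```python
import math


def diffs(xs):
    """Subtraction: miss as the difference between consecutive comparands."""
    return [b - a for a, b in zip(xs, xs[1:])]


def ratios(xs):
    """Division: miss reduced to a ratio (comparands assumed positive)."""
    return [b / a for a, b in zip(xs, xs[1:])]


def log_diffs(xs):
    """Logarithm: each ratio becomes a difference of logarithms."""
    return [math.log(b) - math.log(a) for a, b in zip(xs, xs[1:])]


series = [3, 6, 12, 24, 48]   # hypothetical input: each value doubles
d1 = diffs(series)            # [3, 6, 12, 24]: as variable as the inputs
d2 = diffs(d1)                # [3, 6, 12]: subtracting again doesn't help
r = ratios(series)            # [2.0, 2.0, 2.0, 2.0]: one repeated value
print(d1, d2, r)
print(ratios(d1))             # [2.0, 2.0, 2.0]: differences compressed by division
print(log_diffs(series))      # each value is approximately log(2)
```

The example deliberately ignores costs: for inputs that vary additively rather than proportionally, subtraction alone would already leave a constant difference, & the extra syntax of fractions would not pay off.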
Note that in my approach, the relation of input’s resolution to its position’s resolution on 1D level is reversed from corresponding relation used in algorithmic information theory & Bayesian inference.
Transformation (as in Fourier transform) from input values to their derivatives is a basic compression method for positionally differentiated data. But it can’t scale without evaluation & selection of inputs (starting from pixels) for incremental expansion in range & power of comparison (transformation):
3. Search: incremental range & derivation of comparisons.
Assuming that the environment is not random, average match will decline with distance: older templates are decreasingly predictive of future inputs. To maintain proximity to future inputs, the queue is FIFO: the oldest template is displaced by a new input & becomes an output. Along with declining match, the continuous search span (number of templates in a queue) is also limited by the cost of increasing redundancy in the representation of derivatives (match & miss). Aggregated derivatives are redundant to compared templates & their derivatives, to the extent of overlap in their aggregation span. The length of a queue is limited by the distance at which these costs exceed match for an average comparison.
The queue’s outputs are evaluated for extended search on a higher-level queue, which must be increasingly selective to avoid combinatorial explosion. Selection criteria are a partial aggregated representation of the target queue, formed by adding corresponding variables of its inputs & subtracting those of its outputs.
The most basic criterion is an average match, multiplied by redundancy in representation of the output. I’ll expand on increasingly complex & predictive selection criteria in the “Feedback” part. Evaluation is a comparison of selection criteria in an output to those in the aggregated representation of a target queue: the output is selected as input to the higher level if the former are greater than the latter.
Non-selected outputs are aggregated within either a minimal span: between individually represented outputs, or a maximal span, which increases aggregated magnitude to a value that forms an average match in a higher-level comparison. Match is limited by (& crudely correlates with) the magnitude of comparands. Thus, a higher queue will consist of selected discrete templates, interlaced with aggregated templates. Even though the magnitude of the latter is increased, any higher-resolution matches within the aggregation span are lost, & variable span makes comparison between aggregated templates more difficult. Partial aggregation among outputs expands the coordinate span of a higher-level queue of fixed length.
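A FIFO template queue can be sketched as follows. This is a minimal sketch assuming a fixed, hypothetical queue `length` rather than one derived from the cost of redundancy, & assuming each template accumulates match & miss from every input that arrives while it is queued. `Template` & `TemplateQueue` are illustrative names:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Template:
    value: int
    m: int = 0  # match accumulated over all comparisons while queued
    d: int = 0  # miss accumulated over all comparisons while queued


class TemplateQueue:
    """FIFO queue: every new input is compared to all queued templates."""

    def __init__(self, length):
        self.length = length  # hypothetical fixed search span
        self.templates = deque()

    def push(self, value):
        """Compare, enqueue, and return the displaced oldest template, if any."""
        for t in self.templates:
            t.m += min(value, t.value)
            t.d += value - t.value
        self.templates.append(Template(value))
        if len(self.templates) > self.length:
            return self.templates.popleft()  # the oldest template becomes an output
        return None


queue = TemplateQueue(length=3)
outputs = [queue.push(x) for x in [5, 7, 6, 8, 4]]
print([o for o in outputs if o is not None])
# [Template(value=5, m=15, d=6), Template(value=7, m=17, d=-3)]
```

Each output carries the match & miss it accumulated over its whole search span, which is what the next level evaluates.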
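The running aggregate of a target queue & the basic selection test can be sketched as follows, assuming that only match is aggregated & that redundancy is supplied as a hypothetical numeric multiplier, since its measure is not specified here. `QueueAggregate` & `select` are illustrative names:

```python
class QueueAggregate:
    """Running aggregate of a target queue: inputs are added, outputs subtracted."""

    def __init__(self):
        self.n = 0    # number of templates currently represented
        self.m = 0.0  # summed match of those templates

    def add(self, m):
        self.n += 1
        self.m += m

    def remove(self, m):
        self.n -= 1
        self.m -= m

    def average_match(self):
        return self.m / self.n if self.n else 0.0


def select(output_m, target, redundancy=1.0):
    """Elevate an output if its match exceeds the target queue's average match,
    multiplied by the output's (hypothetical) redundancy factor."""
    return output_m > target.average_match() * redundancy


target = QueueAggregate()
for m in (10, 14, 12):
    target.add(m)
target.remove(10)                            # oldest template leaves the target queue
print(target.average_match())                # 13.0
print(select(15, target))                    # True
print(select(15, target, redundancy=1.5))    # False: 15 < 19.5
```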
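Forming a higher queue of interlaced discrete & aggregated templates can be sketched as follows. It assumes that an aggregate closes at whichever comes first: the next selected output (minimal span) or the point where its summed magnitude reaches a hypothetical `ave_magnitude` (maximal span). The function name & the tuple layout `(kind, magnitude, span)` are illustrative:

```python
def build_higher_queue(outputs, selected, ave_magnitude):
    """Interlace selected templates with aggregates of non-selected outputs.

    An aggregate closes at the next selected output (minimal span) or once its
    summed magnitude reaches ave_magnitude (maximal span), whichever comes first.
    """
    queue = []
    agg_sum, agg_len = 0, 0
    for value, keep in zip(outputs, selected):
        if keep:
            if agg_len:
                queue.append(("aggregate", agg_sum, agg_len))
                agg_sum, agg_len = 0, 0
            queue.append(("template", value, 1))
        else:
            agg_sum += value
            agg_len += 1
            if agg_sum >= ave_magnitude:
                queue.append(("aggregate", agg_sum, agg_len))
                agg_sum, agg_len = 0, 0
    if agg_len:  # flush a trailing partial aggregate
        queue.append(("aggregate", agg_sum, agg_len))
    return queue


magnitudes = [9, 3, 4, 2, 8, 1, 5]
selected = [True, False, False, False, True, False, False]
for entry in build_higher_queue(magnitudes, selected, ave_magnitude=6):
    print(entry)
```

The span recorded with each aggregate is what makes comparison between aggregated templates harder: two aggregates of equal magnitude may cover different numbers of outputs.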
Copies of non-selected outputs can also be stored in longer-term buffers, to delay the loss of detail. Such buffers are implemented in slower & cheaper media (tape vs. RAM), with possible compression by non-selective transforms, & multiple stages for parallel access. The buffers are accessed only if higher-level patterns that represent buffered templates at a lower resolution become strong enough to justify increased resolution, or if new inputs get relatively “close” to the buffered templates again.
For those with an ANN background, I want to make clear that each level of search here is a 1D queue, not a 2D layer of an ANN. Adding a single 1D expansion at a time allows incremental selection to compensate for the cost of each coordinate, & correspondingly more efficient scaling of search. So, while the internally represented dimensionality of input patterns is increasing, their external order remains a 1D sequence. Also, dimensions are added in the order of decreasing rate of change therein. This means that spatial dimensions (with controllable rate) must be scanned first, while comparison over a purely temporal sequence is delayed until accumulated variation within it justifies search for additional compression.
These principles are not empirically specific, but our practically universal primary environment is a spatio-temporal continuum. Here, initial levels should be incremental dimensions of search, & of resulting patterns: 0D → 1D → 2D → 3D → TD. Each additional dimension requires its own coordinate, increasing the costs of “syntactic overhead”. This implies incremental selection: the inputs for search on higher dimensions must have greater projected match, capable of bearing additional costs. Initial dimensional definition is binary: before | after; integer-level relative coordinates are only needed to represent discontinuity between inputs, produced by their selective representation on higher levels.
Sequential increase of dimensionality can be traced in most biological senses. In vision, original stimuli are aggregated into 0D “pixels” of brightness, represented by each rod or cone cell in the retina. Output here is a spike train produced by accumulate-&-fire, which in functional terms performs digitization. All of these cells “scan” in the same direction of eye tremor or a saccade, receiving a 1D queue of inputs. Some form of “comparison” over this queue might also be done within each sensor cell, or its 1D axon. 2D would be integration of 1D outputs of sensor cells across the retina, then the LGN & primary visual cortex. 3D is formed by integration across ocular dominance columns in V1|V2, colors are integrated in V4, & temporal sequences probably in V5 (MT) & beyond.
Even higher levels should process search & integration across sensory modalities, & then over increasing spatio-temporal distance between multi-modal & multi-dimensional patterns.
My approach may seem similar to Levin Search, but the latter randomly generates algorithms of incremental complexity, & selects those that happen to solve a problem or compress a bitstring. What I propose is a search for patterns within environmental input flow. Cognition must start with empirical data; pure math becomes cost-efficient only on much higher levels of generalization. In any case, a hard distinction between input patterns & algorithms only makes sense in the context of special-purpose programs. It fades away if the algorithms are seen as simply incrementally complex short-cuts to pattern discovery, themselves discovered by search across derivatives produced by prior comparisons. “Search” may sound too simple, & working intelligence is obviously very complex. But, given a general fitness function, incremental complexity is discoverable by higher-order comparisons:
4. Patterns: syntactic expansion & re-integration.
A pattern is simply a higher-level input; I use a different term here just to emphasize its increasing predictive value & internal complexity. Every level of search potentially adds a new layer of syntax to an input: selected (initially all) variables of an input are compared, & each compared variable “splits” into new derivatives: match & miss. These derivatives are either aggregated between comparisons or, in special cases, also individually compared. As with templates, minimal aggregation span is between individually compared derivatives, & maximal span is determined by average magnitude (thus match) of these derivatives on a higher level. Hence, a basic comparison cycle generates queues of interlaced individual & aggregated derivatives at each template variable, & conditional higher derivatives on each of the former.
I use “syntax” to mean identification of different variable types by their position (syntactic coordinate) within a pattern. This is analogous to recognizing parts of speech by their position within a sentence.
Initial variable types are different stimuli (such as different colors in vision) of corresponding senses (modalities). Subsequently, quantized inputs of each stimulus type are compared across the sequence of prior inputs, & then across incrementally higher spatial dimensions, as explained above.
Beyond original modalities & dimensions, comparisons & evaluations during search generate an indefinite number of secondary variable types. Such extended syntax records conditional operations that formed each variable, to cross-translate the variables for future comparisons & evaluations.
So, each basic comparison “splits” an input variable into two higher-order variables: relative match (m) & miss (d) between an input & a template. Both are signed, & each is aggregated across multiple comparisons within a span of constant sign, whose lengths are L(m) & L(d). Relative match determines comparison vs. aggregation for individual differences, forming an additional queue of ds within each positive L(m).
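The split into relative match & miss, their constant-sign spans L(m) & L(d), & the conditional comparison of ds within positive-m spans can be sketched as follows. It assumes relative match is `min(input, template) - ave`, with `ave` a hypothetical fixed average, & treats zero as non-positive; all function names are illustrative:

```python
def sign_spans(values):
    """Split a signed sequence into spans of constant sign.

    Returns (positive, L, total) per span; zero counts as non-positive here.
    """
    spans = []
    for v in values:
        positive = v > 0
        if spans and spans[-1][0] == positive:
            _, L, total = spans[-1]
            spans[-1] = (positive, L + 1, total + v)
        else:
            spans.append((positive, 1, v))
    return spans


def compare_inputs(inputs, ave):
    """Relative match m = min(input, template) - ave and miss d = input - template."""
    pairs = list(zip(inputs, inputs[1:]))
    ms = [min(i, t) - ave for t, i in pairs]
    ds = [i - t for t, i in pairs]
    return ms, ds


def compare_ds_within_positive_m(ms, ds):
    """Within each positive-m span, compare consecutive ds (dd = d - prior d);
    ds under non-positive m stay aggregated and are not compared."""
    dds, start = [], 0
    for positive, L, _ in sign_spans(ms):
        span_ds = ds[start:start + L]
        if positive:
            dds.append([b - a for a, b in zip(span_ds, span_ds[1:])])
        start += L
    return dds


ms, ds = compare_inputs([10, 12, 15, 14, 3, 2, 9], ave=8)
print(sign_spans(ms))   # L(m) spans: [(True, 3, 12), (False, 3, -17)]
print(sign_spans(ds))   # L(d) spans: [(True, 2, 5), (False, 3, -13), (True, 1, 7)]
print(compare_ds_within_positive_m(ms, ds))  # [[1, -4]]
```

Note that the m & d spans generally have different boundaries, which is why both lengths have to be recorded.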
On the following levels of search, same-type derivatives are also selectively compared between patterns. This generates secondary derivatives over greater distance, &|or over different types of coordinates. Such syntactic expansion is pruned by selective representation of variable types in each input, vs. their aggregation within a lower resolution syntax, coordinate, or magnitude.
A sufficiently complex syntax will justify comparing variables within a pattern & across “syntactic” coordinates, analogous to comparison across external coordinates. In fact, that’s what happens with higher-power comparisons. For example, division is an iterative comparison between a difference & a match, - across the order of derivation. Next, there should be selective comparisons across increasing syntactic discontinuity. For example, comparison of lengths L(m) & L(d) across different dimensions within a multi-D pattern will compute potentially recurrent dimensional proportions.
Internal comparisons can further compress a pattern, but at the cost of adding a higher-order syntax, which means that they must be increasingly selective. This selection will increase “discontinuity” over syntactic coordinates: operations necessary to convert the variables before comparison.
At some point, these operators will become “large” enough to merit direct comparison / search among them. This will produce purely algebraic equations, where the match (compression) is a reduction in the number of operations needed to produce a result. The first such short-cut is probably a version of the Pythagorean theorem, to be discovered during search in 2D as a way to compute cosines. Cosines are necessary to normalize lengths L(m) & L(d), initially in 1D, to the value they would have in an orthogonal “subjective” angle of 1D scan lines. Basically, while comparing 1D Ls (adjacent in 2D) by division, across an angle (1D distance & its derivatives), the algorithm should discover a relatively constant ratio between the ratio of 1D Ls & a second derivative of 1D distance. That ratio is a cosine, & the resulting normalization for POV change is so basic that it might have evolved & be genetically encoded in animals.
The patterns I described are not qualitatively different from our largely intuitive semantic concepts, - most of them are generalized empirical objects & processes. Given sufficient computational resources (combined with autonomously discoverable mathematical shortcuts), the search over incrementally complex derivatives should discover patterns / concepts on & beyond the level of natural language. Combined with the distances, derivatives can form “motor” feedback vectors that select sources (location & resolution of future inputs) projected to maximize predictive correspondence of a target.
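The geometry behind this normalization can be checked numerically. This is a minimal sketch assuming parallel scan lines at unit spacing & a straight stripe whose position shifts by `dx` from one line to the next; unlike the search proposed above, it simply applies the already known cosine rather than discovering it. `cosine_from_shift` & `normalize_length` are hypothetical helpers:

```python
import math


def cosine_from_shift(dx):
    """Cosine of the angle between a scan line and the pattern's normal, given
    dx: the pattern's shift between adjacent scan lines (line spacing = 1).
    Pythagorean theorem: the hypotenuse of the unit step and dx is hypot(1, dx)."""
    return 1 / math.hypot(1, dx)


def normalize_length(L, dx):
    """Convert a 1D length measured along a scan line to the width
    perpendicular to the pattern."""
    return L * cosine_from_shift(dx)


# A stripe 3 units wide, drifting 4/3 of a unit per scan line, is crossed
# over 3 * hypot(1, 4/3) = 5 units by each scan line.
print(normalize_length(5.0, 4 / 3))  # approximately 3.0
print(normalize_length(7.0, 0))      # 7.0: no drift, no correction
```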
5. Feedback: selection by higher-level aggregate representations.
This chapter will expand on the principles of selecting inputs for the next level of search by feedback. The most basic feedback is aggregated representation of selection criteria in a feedforward: inputs to the next level queue. Whole-queue representation may also contain averages of other input variables, for preliminary comparison, but that will have lower value than selection of an input as a whole.
As proposed above, the outputs are initially selected for extended search by their accumulated match, minus the average (normalized aggregate) match on the higher level, which is multiplied by the redundancy of output’s prior representation there. This selects for generality, or invariance in terms of Jeff Hawkins (in “On Intelligence”).
He also suggests selection by novelty, but that would be mutually exclusive with generality. The scope of discovered generality is limited by the span of experience searched by an input, & the longer it searches (especially over the following inputs) the less “novel” it becomes. Any pattern is defined by some sort of repetition, so prioritizing novelty per se would simply select for random noise.
Generality must be the ultimate criterion, but incrementally more abstract types of correspondence (which defines generality) are “novel” relative to the lower ones. The most basic form of such apparent novelty-seeking, that actually increases predictive power, is simply a preference for more recent inputs.
Recent inputs are relatively more predictive than the older ones because of their temporal proximity to future inputs. Thus, proximity should determine the span of search within a level of generality. But it can’t select for hierarchical elevation: search expansion increases the distance among comparands. Selection for proximity is implemented via FIFO design of template queues, see the “Search” chapter.
A higher form of such coincidental novelty is change, or the difference between S-T proximate inputs. Difference is an alternative representation of a comparand, but its association with match may make it stronger than the original one. A strong enough difference is compared to a queue of other selected or aggregate differences between the same template & other comparands, & then evaluated as an output. Evaluation is subtraction of the target queue’s average difference, which indicates the trend for subsequent inputs. This subtraction forms relative difference, similar to relative match.
The value of an output template is projected for a target queue by “recombining” it with the co-derived difference, which is also projected: multiplied by its relative distance to the average coordinate of a target queue. Projection of a difference “competes” with that of a template, so both must be adjusted proportionally: projected_template = template + average_distance * relative_difference * aggregate_difference / match.
So, the two most basic selection criteria are unique relative match & projected relative difference. These are initial derivatives (match & difference), minus corresponding averages (normalized aggregates) of a higher level. What makes selection more complex than simple comparison is the necessity to adjust (multiply) the averages by newly formed parameters: redundancy for match & distance for difference. These new parameters are products of prior input selection, which increases redundancy of a potential new input, or prior input aggregation, which increases coordinate span of a target queue.
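The two basic criteria can be written out directly. This sketch transcribes the formulas above literally, applying the operators in the order written; the text does not fix the units or scaling of each term, so the parameter names are hypothetical & the example values arbitrary:

```python
def relative_match(match, ave_match, redundancy):
    """Unique relative match: match minus the higher level's average match,
    multiplied by the output's redundancy there."""
    return match - ave_match * redundancy


def relative_difference(difference, ave_difference):
    """Difference minus the target queue's average difference."""
    return difference - ave_difference


def projected_template(template, ave_distance, rel_difference, agg_difference, match):
    """Literal transcription of the formula in the text:
    template + average_distance * relative_difference * aggregate_difference / match."""
    return template + ave_distance * rel_difference * agg_difference / match


rm = relative_match(match=20, ave_match=8, redundancy=2)   # 4
rd = relative_difference(difference=5, ave_difference=2)  # 3
print(rm, projected_template(100, ave_distance=2, rel_difference=rd,
                             agg_difference=6, match=12))  # 4 103.0
```

Dividing by `match` is what makes the projected difference “compete” with the template: the stronger the match, the smaller the adjustment contributed by the difference.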
Again, difference is primarily an alternative representation for compared template. Hawkins proposed that a main value of change | contrast is its “novelty”, but from my perspective that value is negative: contrast cancels positive predictive value of an interrupted pattern. The value here is not independent, but “borrowed” from the pattern. There’s plenty of “change” within random noise, but it has no value because there’s no pattern to be interrupted. So, selection by contrast is partial cancellation of a pattern by co-derived negative vectors, proportional to relative strength of difference vs. that of the match.
Further notes on Hawkins, semi-related to this chapter:
In Hawkins' HTM model (as in many conventional ANNs) match per pixel is defined with binary resolution, making evaluation meaningless. There is also no selection per 1D line of pixels, which would reduce complexity before adding higher dimensions. I guess he sees no need or opportunity for that because he doesn’t even “see” explicit coordinates. Hawkins begins selection by evaluating match between 2D frames: the level of complexity that leads to combinatorial explosion almost immediately (on the 4th layer of his ANN). Also, he arbitrarily ignores derivatives (0D miss), as well as coordinates & distances (1-4D miss). These parameters are necessary to form vectors, without which pattern prediction must remain extremely crude. I think such externally & internally coarse comparison & selection is the reason why HTM & similar RNN models don’t scale.
6. Hierarchical short-cuts: selection by downward expectations.
In addition to selection by representation of the next level, the output may also be filtered by direct feedback from the levels beyond the next. Such higher-level feedback is what we call “expectations”. Just like feedback from immediately higher level, these short-cuts select inputs for their additive generality, but with potentially higher derivation for average (expected) matches & differences. Expectations represented by a higher-level short-cut have correspondingly increased range.
A common objection to having predictive correspondence as a fitness function of intelligence is that it would keep you “staring at a wall”: lock into predictable environments. We tend to do the opposite, - skip over too predictable locations, thereby reducing match of new inputs to known templates. But, I submit, that’s because we maximize projected (higher-level) rather than current correspondence. The value of expectation is negative for potential inputs because confirmation is redundant, - predictive value of higher level templates will only increase (decrease) to the extent that the match (miss) is unexpected. Thus, expected match to higher level represented by a short-cut is a form of redundancy.
Downward suppression of locations with expected inputs will result in preference for exploration & discovery of new patterns vs. confirmation of the old ones. This is the opposite of upward selection for stronger patterns, but this “sign reversal” in selection criteria is a basic feature of feedback, - this also holds for subtracting the average, redundancy, & projected differences, explained in the previous section.
Focus on unknown locations is different from the elevation of unexpected, as suggested by Hawkins, because all outputs of a given location are equally suppressed by expectations. In relative terms, stronger patterns within a location (continuous search span) still win over the noise (weaker patterns). Expectations can’t be pattern-specific on a lower level because individual patterns would be “out of range” there. Basically, a range of search is what defines a level & its syntax. Thus, lower levels simply don’t have the syntax to represent the origin of higher-level patterns, & will “mistake” them for the local ones. Note that my interpretation is in agreement with the consensus in neuroscience that cortical layer-I feedback is modulatory in nature, rather than containing “driving“ inputs as Hawkins proposes.
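Uniform suppression by location can be sketched as follows, assuming a hypothetical representation in which each location holds a list of output matches & the higher level supplies one expected match per location. Because the same expectation is subtracted from every output in a location, the order of patterns within it is unchanged, while the choice between locations shifts toward the less expected one. The names `suppress` & `focus` & the example locations are illustrative:

```python
def suppress(locations, expectations):
    """Subtract each location's expected match from every output there.

    locations: {location: [match of each output]}
    expectations: {location: expected match}; absent means unexpected (0).
    """
    return {
        loc: [m - expectations.get(loc, 0) for m in matches]
        for loc, matches in locations.items()
    }


def focus(locations, expectations):
    """Choose the location with the greatest total residual match."""
    residual = suppress(locations, expectations)
    return max(residual, key=lambda loc: sum(residual[loc]))


locations = {"wall": [9, 7, 8], "window": [6, 2, 5]}
expectations = {"wall": 6, "window": 1}
print(suppress(locations, expectations))  # {'wall': [3, 1, 2], 'window': [5, 1, 4]}
print(focus(locations, {}))               # wall: strongest raw matches
print(focus(locations, expectations))     # window: least expected
```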
Of course, human curiosity is not purely cognitive; it is biased toward the survival value of information. Proximate & changing objects are more likely to affect a subject in the short term, thus attracting far more attention than they would for their immediate contribution to predictive power. That should also be the case for a purely cognitive system with enough introspection: it will seek impacts | materials that could build up its predictive capacity, & avoid those that threaten it. Maximizing cognitive capacity will increase predictive correspondence of self-representation, rather than that of external representation. But in the longer term, it will increase external predictive correspondence more than direct pursuit thereof.
“Seeking” in the above describes both modulatory feedback downward a hierarchy of representations, & a motor feedback as a modulation of sensors / actuators “below” that hierarchy. Modulation is an adjustment of scope & resolution for both coordinates & sensitivity of lower-level sources. The scope (temporal & spatial) is defined in units of resolution for corresponding coordinates, which is adjusted (reduced) in proportion to the variation in local output flow.
Higher types of curiosity, which I formalize as higher-syntax selection criteria, maximize more abstract forms of correspondence. Any representation can be seen as a partial & mediated form of reproduction, & such “abstraction” is what distinguishes any cognitive process from biomorphic evolution, which maximizes “direct” reproduction.
7. Notes on a working mindset & a prize for ideas.
The algorithm of intelligence is a meta-generalization of the entire cognitive experience, “lossy” in proportion to its vast scope. Such inductive lossiness is anathema to the deduction-oriented culture of math, computer science & artificial intelligence. I think this culture is largely responsible for the barely detectable progress in theoretical understanding of cognition, hence AI’s focus on narrow tasks.
Many researchers believe that AGI is a mathematical problem. I think that is a "man with a hammer" syndrome. Cognition is an incremental problem, - intelligence is a matter of degree. Higher math can accelerate learning, but is not necessary to initiate it on a human level, - cavemen didn’t do calculus.
Many more think it’s an engineering problem. As with math, it depends on how you define engineering. But there is an obvious trade-off between theoretical scope & practical certainty. General intelligence is the ultimate extreme of the former, while any conventional engineering heavily favors the latter.
At a higher level, the scientific approach to the problem is cognitive psychology. It’s far more intuitive in the way theories are formed, but these theories must then be confirmed by high-level observations, which are too coarse & slow to produce noticeable progress.
It seems obvious to me that AGI is a meta-scientific problem, requiring a more theoretical (detached from immediate verification) attitude than is acceptable in any established field. The method here must be introspective generalization, plus an understanding of unconscious low-level sensory information processing. Conceptually, it’s a province of philosophy, but philosophy is a clearly dysfunctional field. That’s not entirely surprising, - its main source of income is clueless freshmen, who replaced equally clueless aristocratic highbrows. Anyway, no one seems to be enough of a generalist. We didn’t evolve for this level of introspection, - there was no way to implement the results. Now we have computers, but human psychology is largely stuck in the farming age, if not retreating to the hunter-gatherer age.
So, that’s my excuse for the lack of credentials: none are sufficiently relevant & I don’t have time to waste. What I do have is a far more advanced work-in-progress, but I will need meaningful feedback to elaborate it. I will appreciate specific questions, pointers to flawed logic, or complementary approaches.
Lots of people think that a major problem in AI is a lack of funding. I disagree: Einstein didn't need to be paid to work on his theory. Real science (especially theoretical) is driven by curiosity, & that should be even more true of meta-science. Anyway, I made a few bucks on my investments & want to put the money to work. If you're interested in scalable complexity pattern discovery, here's an incentive:
I offer prizes from $100 to $100,000 for ideas that refute, correct, or further develop the principles explained above. $100 could be a "consolation prize" for ideas that I already incorporated but did not fully explain here, or that are largely cosmetic in nature. $100,000 is for a major advance, - there won’t be many. A winner will have an option to convert his prize into an interest in all commercial applications of a final algorithm, at the rate of $10,000 per 1% share. This would be informal & likely unprofitable, - mine is not a commercial enterprise. Again, I don't believe money can be the primary motivation here, but it may help a bit & has a way of attracting attention.
So far, - one winner: Todor Arnaudov, thank you very much! His idea seems obvious in retrospect (they always do), - simple buffering of old inputs after search. The "buffer" is accessed if the inputs' location gets relatively "close" again. It occurred to me more than once before, but I rejected it as redundant with the potential elevation of those same inputs. Now that he made me think about it again, & stuck to it, I realized that partial redundancy can be justifiable. Buffering is much cheaper than elevation, & could be done in parallel to delay the loss of detail. It didn’t feel right because the neocortex has no substrate for passive storage, but that should not matter here. The prize was $600, would've been higher, but... there was a problem with signal-to-noise ratio :). Well, it's a start (May 2010).
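A minimal sketch of the buffering idea as described above: inputs are kept, keyed by location, after search, and are recalled when a new search location comes within some distance of a buffered one. The 1D numeric location, the `radius` threshold, and the `InputBuffer` class are all hypothetical choices for illustration; the post does not specify how "close" is measured.

```python
from dataclasses import dataclass, field


@dataclass
class InputBuffer:
    """Hypothetical passive store of inputs kept after search, keyed by location."""
    radius: float                              # assumed "closeness" threshold
    entries: dict = field(default_factory=dict)

    def store(self, location, inputs):
        """Buffer a copy of the inputs searched at `location` (cheap, no elevation)."""
        self.entries[location] = list(inputs)

    def recall(self, location):
        """Return buffered inputs whose location is within `radius` of `location`."""
        hits = []
        for loc, inputs in self.entries.items():
            if abs(loc - location) <= self.radius:
                hits.extend(inputs)
        return hits


buf = InputBuffer(radius=2.0)
buf.store(10, [3, 4])
buf.store(50, [9])
print(buf.recall(11))   # [3, 4]: location 11 is "close" to 10 again
print(buf.recall(30))   # []: nothing buffered nearby
```

Storing a plain copy, rather than forming and elevating patterns, reflects the point that buffering is cheaper than elevation and can run alongside it.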
Todor also won $400 consolation prize for understanding some basic principles that weren’t clearly explained here. (September 2011).
Saturday, January 7, 2012
Discussions with Ben Goertzel on the AGI list
Some of my replies are edited for clarity:
Ben G: About predictive accuracy as an intelligence measure.... What matters is if the system is good at predicting which sequences of its action will lead to achievement of its goals in the contexts relevant to its life. This is different than, though related to, general predictive capability.
Boris: For a purely cognitive system the only goal is maximizing its predictive correspondence (accuracy * range). That goes for both internal information processing & external action (see the end of part 4). We can adapt it to our goals by pre-selecting inputs.
Ben G: "Hard distinction between input patterns & algorithms exists only for special-purpose programs. "
Hmmm, well clearly in the brain there's a distinction between its input patterns and the algorithms implicit in its wiring, no?
Boris: I think you're talking about a distinction between innate & acquired wiring patterns. Yes, some level of algorithms must be built-in to initiate learning at an acceptable speed, but the cut-off is not a qualitative distinction. Cognitive algorithm can then be refined & extended indefinitely. All of our math is such an extension.
Ben G: "... if a given location is projected to be important enough (per “hierarchical feedback” part), all its outputs are elevated losslessly & eventually compared in all possible combinations. "
How can you take all possible combinations, in reality?
Boris: You can't, not if you include combinations of subsequent derivatives, I was talking "in the limit".
Ben G: Don't you need to prune the space of possible combinations? How is this done?
Boris: I already described evaluation by projected match for empirical inputs & patterns. In pure math (on much higher levels), the criterion is reduction (as in equations), which is an equivalent of match. You prune the expressions with a below-average results-per-operation ratio.
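As a rough illustration of the pruning criterion in the reply above: score each candidate expression by results per operation and drop those below the average ratio. How "results" (reduction or match) and "operations" are counted is not specified here, so the `(results, operations)` pairs, the arithmetic mean, and the name `prune_below_average` are assumptions.

```python
def prune_below_average(candidates):
    """Keep candidates whose results-per-operation ratio is at least the average.

    `candidates` maps a name to a (results, operations) pair. Both the
    representation and the use of the arithmetic mean are assumptions.
    """
    ratios = {name: results / ops for name, (results, ops) in candidates.items()}
    average = sum(ratios.values()) / len(ratios)
    return {name for name, ratio in ratios.items() if ratio >= average}


candidates = {"a": (10, 2), "b": (3, 3), "c": (8, 1)}  # ratios 5, 1, 8; mean ~4.67
print(sorted(prune_below_average(candidates)))        # ['a', 'c']
```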
Ben G: Are the following simple points correct?
-- you're building a hierarchical pattern recognition network,
Boris: Obviously.
-- it's substantially aligned with the spatiotemporal structure of the perceived world
Boris: Initially, but a comparison sequence can be re-ordered on higher levels according to more compressive "coordinates".
-- higher levels of the network embody more abstract patterns, combining the outputs of the lower levels
Boris: Yes, but "network" implies lateral data transfers, while in my model the primary data transfer is vertical: across levels.
-- perception and action are carried out in the same network, so that action control and planning are part of the same process as top-down perceptual feedback
Boris: Right.
-- the system's goal is accurate prediction,
Boris: More like a "predicted... prediction", actual confirmation is not always necessary.
-- and somehow those patterns that lead to accurate predictions are going to be rewarded and have more likelihood of surviving and being used again
Boris: They're selected as inputs to higher levels, which have a longer search cycle, thus slower content "recycling".
Ben G: I don't think I'll be able to fully grok your design and algorithms in detail without putting us both through more QA than we want to do,
Boris: No problem. BTW, I may post (a constructive part of) this discussion as comment on the knol, do you mind? Might save me a few questions in the future.
Posted by Boris Kazachenko, last edited Sep 21, 2011 10:20 AM
> Boris: For a purely cognitive system the only goal is maximizing its predictive correspondence (accuracy * range).
Ben G: Hmmmm.. in that case I don't think "purely cognitive systems" are going to be very useful in reality. Given limited compute resources, they will be massively outperformed by systems that are oriented toward maximizing *useful* predictive correspondence...
Boris: You don't know what's "useful" till you have a big chunk of "correspondence" in the first place. And you can't define "useful" in general terms anyway, so this is just another one of your excuses for lazy thinking.
> Boris: I already described evaluation by projected match for empirical inputs & patterns. In pure math (on much higher levels), the criterion is reduction (as in equations), which is an equivalent of match. You prune the expressions with below-average results-per-operations ratio.
Ben G: Yeah, but there are very many expressions due to combinatorial explosion -- you can't just produce them all then prune the bad ones. Is your approach to incrementally build up complex expressions compositionally from simpler ones?
Boris: I think I have "incremental" in just about every paragraph in the knol, starting from the 1st. You seem to have too many things on your mind to keep track of this discussion.
Ben G: If so, why don't you run into the same problems as greedy learning systems? Presumably because the top-down feedback from the existing complex expressions guides the formation of new complex ones from simpler components, I suppose. But this is a key point and it's not very clear to me how your system does it...
Boris: Maybe you can point out what part of the process I already described is not clear to you.
Posted by Boris Kazachenko, last edited Sep 21, 2011 10:26 AM
> Boris: You don't know what's "useful" till you have a big chunk of "correspondence" in the first place. And you can't define "useful" anyway,
Ben G: It seems a baby learns pretty quickly that getting milk from the tit is useful, whereas the specific pattern of wrinkles on its blanket is irrelevant -- and this learning then helps focus its ongoing learning activity (which then leads to further focusing, etc.)
Boris: I am working on a scalable intelligence, not just another animal with primitive drives.
Ben G:
> so this is just another of your excuses for lazy thinking.
I wonder what purpose you think is served by throwing insults like that into a conversation?
I hope at least you find it entertaining; to me it's rather dull ;p
Boris: I have nothing to lose, - either I shock you into actually working, or save myself a distraction. Guess it's the latter.
Ben G: Boris,
I'm not shocked by being insulted, not even nontrivially annoyed -- it's par for the course on unmoderated email lists.
And I **am** already working on AGI, inasmuch as my personal economic situation permits (meaning, around 50% of my working time, i.e. about 30-35 hours/week).... I happen not to be working on it according to the precise approach you prefer, however...
> Boris: Maybe you can point out what part of the process I already described
> is not clear to you.
Ben G: Too many parts are unclear to me, and we're both busy, so I guess we should drop off the conversation here.
As I said, I'll be eager to read a detailed description of your ideas if/when you choose to publish one. The knol is evocative but has a high density of obscurities as compared to, say, a typical research paper.
Comments from the knol
Well, that didn't work out. 3.5 years, ~20K views, & 69 comments later, I am back on Blogger.
This post contains old comments on the knol.
Below that is ancient history, in case you want to see how things progressed.
Derek Zahn:
Understanding another human being's thoughts is hard. :)
Hi Boris,
Sorry for the delay... I wrote a long time ago something to the effect that I like to try understanding the ideas of other researchers working on AGI-related theories (at least those that seem to have some hope of being interesting) and wanted to try and understand yours. I have returned to your pages once in a while but have great difficulty even starting to try and get a grip on what you are writing about. Part of the blame for that is the difficulty of the subject matter, part is that I'm just not very smart, but mostly (and frustratingly) it is simply very hard for human beings to communicate with each other -- when reading, we have to fill in so much from our own viewpoint and experience, and that is a very error-prone process. So, although I'm afraid that my questions will be stupid and nitpicky and possibly a waste of time to answer, they are the only way for me to figure out how to interpret what you are saying. On the plus side, maybe any clarifications you make for me would be useful for other readers as well.
Although general motivations, and criticisms of other AI approaches can be fun, I'm going to ignore that stuff unless it becomes critical for my main purpose, which is understanding your theory in its current state.
One way to facilitate communication is to develop a concrete frame of reference as a starting point. So: although I imagine your theory is intended to be very general in nature (and thus applicable to a variety of agents and environments), it is helpful for me to pick a particular case, so that general points can be applied to this concrete situation... very general abstract theories are almost impossible to communicate from one person to another because there are so many possible interpretations of language; having a concrete situation as a reference will help me fill in some meaning.
So: Suppose I have a robot roaming around my neighborhood. It has one sense modality: a black-and-white video camera affixed to the front of the robot. At fixed intervals (say 30 times per second but the exact rate isn't important), a video frame gets digitized and handed to the "intelligence" program implementing your theory. Although it won't be needed for a while :) suppose that the robot has tank tracks for its drive and a signed output signal controls the speed of each side track.
Can we use this system as a concrete reference? Is it missing something needed for your theory to apply to it?
Assuming it's okay... from your description, I understand that you save a history of past input frames, indexed by their offset into the past. You also compute the derivative between two successive frames on a per-pixel basis using the numerical difference between pixel values at each point.
The goal is to predict the pixel values in the next input frame.
Ok, let me stop there to make sure we are on the same page. Comments? If you don't have time to mess with what is likely to be a bunch of incomprehension on my part, I understand.... in that case, just don't respond to my comment. :)
Take care,
Derek
http://supermodelling.net
Last edited Sep 20, 2011 4:34 PM
Sorry for being difficult, Derek!
The problem is, to be on the same page we have to be on the same level of generalization (= decontextualization). You’re asking for concrete examples. While it is (theoretically) possible to explain how my algorithm will act in simple cases, such examples will not impress you. You’ll need to understand why I think it can scale to complex cases, & that reasoning is necessarily *abstract*. But, for some mysterious reason, you do find my approach interesting, so I’ll try:
> video frame gets digitized and handed to the "intelligence" program implementing your theory.
Actually, my theory *includes* digitization as the first step of compression, which maximizes correspondence_per_cost: my overall fitness function. This is important because these steps form a pattern, which must be indefinitely projectable, for the “program” to scale in complexity of such algorithms.
> I understand that you save a history of past input frames, indexed by their offset into the past. You also compute the derivative between two successive frames on a per-pixel basis using the numerical difference between pixel values at each point.
> The goal is to predict the pixel values in the next input frame.
Given a non-random environment, every input *is* a prediction for subsequent inputs (no prediction is 100% certain anyway). These inputs have incremental dimensionality: from 0D pixels, to 1D patterns: sequences of matching pixels, to 2D patterns: sequences of matching 1D patterns, & then to 3D, TD, & discontinuously matching patterns.
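To make the first increment concrete (0D pixels to 1D patterns), here is a minimal sketch that compares each pixel in a row to its left neighbour and groups consecutive comparisons with the same sign of relative match into 1D patterns. The specific definitions are assumptions for illustration only: match between two pixels is taken as the smaller value, relative match subtracts a fixed average `ave`, and `form_1d_patterns` is a hypothetical name. The thread points to "section 1" for the actual definition of match.

```python
def form_1d_patterns(row, ave):
    """Group a row of pixels into runs of same-sign relative match.

    Illustrative assumptions: match between neighbours is min(a, b), relative
    match is that minus a fixed average `ave`, and difference is b - a.
    Each pattern accumulates its total relative match M and difference D.
    """
    patterns = []
    current = None
    for prior, pixel in zip(row, row[1:]):
        m = min(prior, pixel) - ave      # relative match of this comparison
        d = pixel - prior                # difference (derivative) along the row
        sign = m > 0                     # above- vs below-average match
        if current is None or current["sign"] != sign:
            current = {"sign": sign, "pixels": [], "M": 0, "D": 0}
            patterns.append(current)
        current["pixels"].append(pixel)
        current["M"] += m
        current["D"] += d
    return patterns


for p in form_1d_patterns([10, 12, 11, 2, 1, 9, 10], ave=5):
    print(p)
```

For that row the sketch yields three patterns: a matching run [12, 11], a non-matching run [2, 1, 9], and a matching run [10], each carrying summed match and difference that a higher level could compare.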
This is an indefinitely expansible hierarchy, where older inputs (history) are selectively stored (patterns vs. noise) & searched on higher levels. Each higher-level pattern is a "prediction" for lower-level inputs. 2D frames have no special status in my approach.
Notice that I start by defining match. Then I define a pattern as a set of matching inputs, & derivatives are computed by comparing among individually selected (stronger than average) patterns within the corresponding level of search. This is not an indiscriminate all-to-all indexing: that would be a transform. These derivatives then form vectors to project their patterns (further refining their predictive value), & to form their own patterns. All of that is selective (according to the predictive value of each variable), otherwise you get a combinatorial explosion.
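A sketch of the selective step described in the reply above: only patterns stronger than average are compared, and comparison is done parameter by parameter, yielding a difference and a match for each. Representing a pattern as a dict of numeric parameters including its total match `"M"`, selecting by mean `"M"`, comparing only consecutive selected patterns, and using min(a, b) as match are all assumptions made for illustration; `select_and_compare` is a hypothetical name.

```python
def select_and_compare(patterns):
    """Compare consecutive above-average patterns, parameter by parameter.

    Assumptions: each pattern is a dict of numeric parameters sharing the
    same keys, including "M" (its total match); selection uses the mean "M";
    per-parameter difference is b - a and match is min(a, b).
    """
    average = sum(p["M"] for p in patterns) / len(patterns)
    selected = [p for p in patterns if p["M"] > average]   # skip weaker patterns
    derivatives = []
    for prior, current in zip(selected, selected[1:]):
        derivatives.append({
            key: {"d": current[key] - prior[key], "m": min(current[key], prior[key])}
            for key in current
        })
    return selected, derivatives


patterns = [{"M": 9, "L": 4}, {"M": 1, "L": 2}, {"M": 7, "L": 3}, {"M": 3, "L": 5}]
selected, derivatives = select_and_compare(patterns)
print(selected)      # [{'M': 9, 'L': 4}, {'M': 7, 'L': 3}]
print(derivatives)   # [{'M': {'d': -2, 'm': 7}, 'L': {'d': -1, 'm': 3}}]
```

Skipping below-average patterns before comparison is what keeps the number of comparisons from growing with every possible pair.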
Posted by Boris Kazachenko, last edited Sep 14, 2011 4:24 PM
Hi Boris,
You're right that we have to be on the same level of decontextualization; I was hoping to drag you down to my level :) because if we refer to concrete things (like that robot) there is less room for misunderstanding. If I generalize into abstractions I won't end up the same place as you because my abstractions aren't the same as yours... and the result is that I don't know what the words you use are supposed to mean.
I don't care about "impressiveness" on simple examples, just clarity.
I'll try to climb into the clouds, but it will probably take a while. :) So, a few questions to start with:
> correspondence_per_cost: my overall fitness function.
Correspondence of what? Measured how? What does "cost" mean and how is it measured?
> every input *is* a prediction for subsequent inputs (no prediction is 100% certain anyway).
> These inputs have incremental dimensionality: from 0D pixels, to 1D patterns: sequences
> of matching pixels, to 2D patterns: sequences of matching 1D patterns [...]
I certainly get that an input *can be used as* a prediction for subsequent inputs (by an entity whose goal is prediction, for example -- with a prediction algorithm), and for some inputs in some environments (like the robot example) there will be a correlation between in(t) and in(t+1). Other kinds of "inputs" (say... the value of an audio sensor in a Las Vegas casino sampled every 41 hours) may not have any discernible correlation at all. But I don't think it's right to say that an input *is* a prediction, which is a confusing conflation of terms.
I don't understand what you mean by "inputs have incremental dimensionality". Incremented by who? How did "1D patterns: sequences of matching pixels" become an "input"? You say "pixels" which implies a visual semantics for an input...
Maybe these questions illustrate the confusion I experience when I even begin to try and understand what it is you are talking about...
Thanks!
Derek
http://supermodelling.net
Posted by Derek Zahn, last edited Sep 17, 2011 10:36 AM
> If I generalize into abstractions I won't end up the same place as you because my abstractions aren't the same as yours... and the result is that I don't know what the words you use are supposed to mean.
My meanings are the most basic (decontextualized) possible, you *will* end up in the same place if you just let go of your context (scary, I know). We all work off the same innate “algorithm”. If our generalizations don’t agree, then either we’re on different levels, or the level is too low for both of us.
> I don’t care about "impressiveness" on simple examples, just clarity.
But there must be a reason for you to *work* on understanding me, rather than a bunch of other things.
> Correspondence of what? Measured how? What does "cost" mean and how is it measured?
See section 1: definition of match, then of incrementally derived projected match. Cost (memory + operations) is initially the same for every basic comparison, so you normalize for it by subtracting the average match per comparison (total match / # comparisons) from the prior search cycle. I’ve tried to explain all this in the knol, let me know what part is unclear. Beyond the first cycle, the cost is multiplied by the additional # & power of comparisons represented in the resulting patterns.
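A minimal sketch of the first-cycle normalization described above: because every basic comparison costs the same, cost can be accounted for by subtracting the prior cycle's average match per comparison (total match divided by the number of comparisons) from each new match. The function names are hypothetical, and the sketch ignores the cost multipliers that apply beyond the first cycle.

```python
def average_match(prior_matches):
    """Average match per comparison over the prior search cycle."""
    return sum(prior_matches) / len(prior_matches)


def relative_match(matches, prior_matches):
    """Subtract the prior cycle's per-comparison average from each new match.

    Positive results mean the comparison paid for its (uniform) cost;
    negative results mean it did worse than an average comparison.
    """
    ave = average_match(prior_matches)
    return [m - ave for m in matches]


prior_cycle = [4, 6, 2, 8]   # hypothetical matches from the previous cycle: 20 / 4 = 5
current_cycle = [7, 3, 5]
print(relative_match(current_cycle, prior_cycle))  # [2.0, -2.0, 0.0]
```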
> I certainly get that an input *can be used as* a prediction for subsequent inputs…
It’s more basic than that: *any* prediction must be derived from past inputs. But these inputs have varying “predictive value”, both overall & for specific sources: lower-level locations. Patterns are inputs for higher levels, each representing multiple matching lower-level inputs. I try to quantify all of that.
> I don't understand what you mean by "inputs have incremental dimensionality". Incremented by who?
By comparing lower-D patterns across a higher-D coordinate, on a higher level of search.
> How did "1D patterns: sequences of matching pixels" become an "input"?
This is a hierarchy, Derek: above-average lower-level patterns *are* higher-level inputs.
> You say "pixels" which implies a visual semantics for an input...
That's simply the visual version of a maximal-resolution 0D input; there is an equivalent in any modality.
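The hierarchy step can be sketched in Python as follows: pixels (0D inputs) whose match to their neighbor is above average are grouped into 1D patterns, and only patterns that pass a further filter are forwarded as inputs to the next level. The definition of match (min of the two pixels), the fields kept per pattern, and the `pattern_filter` threshold are assumptions for illustration, not the knol's specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pattern1D:
    start: int        # index of the first pixel in the span
    length: int       # number of pixels in the span
    input_sum: int    # summed pixel values: the pattern's aggregate input
    net_match: float  # summed above-average match within the span


def form_patterns(pixels: List[int], avg_match: float) -> List[Pattern1D]:
    """Group consecutive pixels whose match to their predecessor exceeds avg_match.

    ASSUMPTION (not stated in the thread): match = min of the two compared pixels.
    """
    patterns: List[Pattern1D] = []
    current: Optional[Pattern1D] = None
    for i in range(1, len(pixels)):
        m = min(pixels[i - 1], pixels[i]) - avg_match
        if m > 0:
            if current is None:
                # open a new span that starts at the predecessor pixel
                current = Pattern1D(i - 1, 1, pixels[i - 1], 0.0)
            current.length += 1
            current.input_sum += pixels[i]
            current.net_match += m
        elif current is not None:
            # a below-average comparison terminates the current span
            patterns.append(current)
            current = None
    if current is not None:
        patterns.append(current)
    return patterns


def elevate(patterns: List[Pattern1D], pattern_filter: float) -> List[Pattern1D]:
    """Forward only patterns whose net match exceeds a hypothetical level filter;
    these become the inputs of the next level."""
    return [p for p in patterns if p.net_match > pattern_filter]
```

A next level would then compare these `Pattern1D` records across a further coordinate (for example, between rows of an image), which is the sense in which each level adds one dimension.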
Posted by Boris Kazachenko, last edited Sep 17, 2011 2:40 PM
Hi Boris,
I'm interested in understanding you because I am curious about all serious detailed theories of intelligence. There are many different approaches to this, and I'm interested in any that have significant amounts of precision or detail and seem intuitively plausible (as opposed to shallow, fundamentally incoherent, inconsistent, or simply insane). The trick is understanding them. It would be relatively easy to convince myself that I understand you at a rough overview level... but such characterization just feeds my ego, it doesn't (usually) increase my actual knowledge or insight.
In an approach like yours, I am most interested in a few interrelated particulars (in as much detail as I can manage): the "language" that is used to express patterns at each level of abstraction (as a combination of inputs, or more), the specific way that temporal relationships are incorporated into patterns, and the method used to individuate patterns as learned entities. I am fairly certain that you believe you have explained all these things in your knols, but I have not yet succeeded in extracting this information from your text. I also believe that other people bounce off of your writing for similar reasons. You say that your meanings are the most basic (decontextualized) possible, but natural language doesn't work that way, and in fact the meanings of what you write are largely embedded in your own context; failure to recognize this is what causes incomprehensibility. Although we all share a lot of cultural context, we are islands in many ways, and we have to build stepping stones to cross the deep and murky inferential gaps.
I think I will try to take into account everything you have said in this conversation, along with your conversation with Ben on the AGI list, and start over from the beginning. I'll return after I have bashed away at that for a while. If you care to say anything more about the things above that I mentioned as particularly interesting, that would be cool.
Thanks for taking the time to answer my questions, and I wish further success for you as you continue to develop your ideas!
Derek Zahn
http://supermodelling.net
Posted by Derek Zahn, Sep 19, 2011 9:28 AM, last edited Sep 19, 2011 9:28 AM
> You say that your meanings are the most basic (decontextualized) possible, but natural language doesn't work that way, and in fact the meanings of what you write are largely embedded in your own context; failure to recognize this is what causes incomprehensibility.
Maybe you can point out my biases, I promise to exterminate them without mercy :).
> I think I will try to take into account everything you have said in this conversation, along with your conversation with Ben on the AGI list,
That definitely turned you off :).
Posted by Boris Kazachenko, Sep 19, 2011 10:01 AM, last edited Sep 19, 2011 10:01 AM
Derek> I am fairly certain that you believe you have explained all these things in your knols, but I have not yet succeeded in extracting this information from your text. I also believe that other people bounce off of your writing for similar reasons.
I guess this is a deliberate filter: too many explicit explanations may make it seem too easy and obvious.
Posted by Todor Arnaudov, last edited Sep 22, 2011 3:38 AM
Derek: I am most interested in a few interrelated particulars (in as much detail as I can manage): the "language" that is used to express patterns at each level of abstraction (as a combination of inputs, or more), the specific way that temporal relationships are incorporated into patterns, and the method used to individuate patterns as learned entities. I am fairly certain that you believe you have explained all these things in your knols...
Boris: Yes, I did. The "language" (I prefer "syntax") is simply a record of past operations, assigned to the data they produced. I tried to explain the initial set of such operations, & the general principles that drive the expansion of this set.
Todor: I guess this is a deliberate filter...
Boris: It's partly deliberate, in the sense that examples may mislead people into thinking that they understand the generalization, while in fact they only understand the examples. But mostly it's because creative writing is not my top priority, - I have work to do. And this is an exceptional problem, so most people *should* "bounce off".
Todor Arnaudov:
Higher match within derivatives in a pattern than between templates and lower-level output:
Boris: "I won’t get into details here, but a higher level of feedback should suppress empirical data entirely, & select only the operations that process it. That would result in purely algebraic equations, “compared” to achieve mathematical compression. We can expect that better math will facilitate future discovery of empirical patterns, but at the cost of reduced correspondence of current memory contents."
Todor: This may be a different phenomenon, or not elaborated enough, or wrong, but I came up with it after reading this paragraph.
(1) Higher level patterns get complex - long, carrying lots of derivatives and heavy operations.
(2) Comparison gets more expensive than the predictive benefits. In the past the level may have been more predictive, but if it gets expensive to support it, the level can be either optimized or lost to free resources. While optimizing, the higher level suppresses the lower one because it no longer expects benefits from new lower-level input.
(3) The derivatives in long patterns (in any length patterns) can turn into local coordinate spaces on their own. A hierarchy on the derivatives is initiated, as if they (or selected parts of them) were raw sensory inputs: one derivative becomes "x", another "y", another "iB", etc. Longer patterns are more likely to have linear dependencies and other correlations, and patterns within patterns will be discovered.
(4) The process of (3) can start if high matches - within the patterns themselves, or between templates at the same level (let's call them InternalMatch) - are discovered. I suspect this happens when this InternalMatch is higher than the match between this level and the output from the lower level.
(5) In brief, once it becomes more predictive and cheaper to do the algebra using the already collected higher-level derivatives than to compute and store new higher-level derivatives from the lower-level derivatives (input), do the algebra and stop accumulating more "junk" data.
Last edited Sep 2, 2011 4:29 PM
> Comparison gets more expensive than the predictive benefits. In the past the level may have been more predictive, but if it gets expensive to support it, the level can be either optimized or lost to free resources.
You know I don’t like meaningless words like “optimized”. If a variable or a whole pattern becomes less predictive than the average per resources used, then it simply loses resolution: lower bits of value &| of coordinate (through aggregation across them).
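A minimal sketch of such resolution loss, under stated assumptions: inputs are non-negative integers, value resolution drops by discarding the lowest bits, and coordinate resolution drops by summing runs of adjacent inputs. The `bits` and `span` parameters and the simple comparison of predictive value against an average per resources used are hypothetical placeholders; the thread does not say how much resolution is lost.

```python
from typing import List


def reduce_value_resolution(values: List[int], bits: int) -> List[int]:
    """Discard the `bits` lowest bits of each non-negative integer value."""
    return [v >> bits for v in values]


def reduce_coordinate_resolution(values: List[int], span: int) -> List[int]:
    """Aggregate each run of `span` adjacent inputs into one coarser input by summing.

    A trailing run shorter than `span` is summed as well.
    """
    return [sum(values[i:i + span]) for i in range(0, len(values), span)]


def degrade(values: List[int], predictive_value: float, avg_per_cost: float,
            bits: int = 1, span: int = 2) -> List[int]:
    """Lower resolution only if predictive value falls below the average per resources used.

    bits=0 keeps value resolution and span=1 keeps coordinate resolution,
    so either kind of loss can be applied alone ("&|" in the text).
    """
    if predictive_value >= avg_per_cost:
        return values
    return reduce_coordinate_resolution(reduce_value_resolution(values, bits), span)
```

For example, `degrade([13, 6, 255, 4], 1.0, 2.0)` returns `[9, 129]`: the values lose one bit to become `[6, 3, 127, 2]`, then adjacent pairs are summed.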
> While optimizing, the higher level suppresses the lower one because it no longer expects benefits from new lower-level input.
That’s what any feedback is for.
> The derivatives in long patterns (in any length patterns) can turn into local coordinate spaces on their own...
These sub-coordinates are formed | incremented with every new type of derivative. It’s not an optional process, you need them for selective access. Comparing across these “syntactic coordinates” is how you get higher powers of comparison (by division: iterative comparison between difference & match, etc.), dimensional proportions, & so on. But you’re right, I should make it more explicit.
> between templates at the same level (let's call them InternalMatch) - are discovered.
Comparison between templates is not “internal”. You don’t compare across external & across syntactic coordinates at the same time, - that’s not incremental in complexity.
> I suspect this happens when this InternalMatch is higher than the match between this level and the output from the lower level.
Actually, comparison across syntax is done after evaluation and before output, initially if the pattern's (across-level projected match) * (internal syntactic span) is *above average*. That means it’ll search on a higher level, *&* is likely to be compressed by intra-comparison, which makes the search easier. It’s only after such intra-comparison that you can project & prioritize internal match independently from the external kind.
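The evaluation criterion above can be written as a one-line test. In this sketch a pattern is a hypothetical mapping from variable names to values, and its internal syntactic span is assumed to be simply the number of variables it carries; projected match and the average are supplied by the caller, since the thread doesn't specify how they are computed.

```python
from typing import Dict, List


def syntax_comparison_due(projected_match: float, syntactic_span: int, average: float) -> bool:
    """(across-level projected match) * (internal syntactic span) must be above average."""
    return projected_match * syntactic_span > average


def select_for_intra_comparison(patterns: List[Dict[str, float]],
                                projected: List[float], average: float) -> List[int]:
    """Indices of patterns that qualify for cross-comparison of their own variables.

    ASSUMPTION: syntactic span = number of variables in the pattern.
    """
    return [i for i, (pattern, pm) in enumerate(zip(patterns, projected))
            if syntax_comparison_due(pm, len(pattern), average)]
```

With two patterns carrying three and two variables, both with projected match 2.0 against an average of 5.0, only the first qualifies (6 > 5, while 4 < 5).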
> In brief, once it becomes more predictive and cheaper to do the algebra using the already collected higher-level derivatives than to compute and store new higher-level derivatives from the lower-level derivatives (input), do the algebra and stop accumulating more "junk" data.
None of the above is about algebra. Internally or externally, you’re still comparing data, not operations. Comparing operations means comparing syntactic coordinates themselves, that’s what they stand for. (Can you think of initial types of such comparison?).
“Algebra” by itself is not predictive, it only gives you shorter equations to compute predictions from future data. It’s still all about data in the end, but math lets you be more selective in collecting it.
Posted by Boris Kazachenko, Aug 29, 2011 1:23 AM, last edited Aug 29, 2011 1:23 AM
>You know I don’t like meaningless words like “optimized”. If a variable or a whole pattern becomes less predictive than the average per resources used, then it simply loses resolution: lower bits of value &| of coordinate (through aggregation across them).
OK, I know about lowering the resolution to increase match. By "optimize" here I mean making comparison of the same derivatives, at the same level, cheaper by finding correlations within the level's data and between derivatives in a pattern.
>> The derivatives in long patterns (in any length patterns) can turn into local coordinate spaces on their own...
>These sub-coordinates are formed | incremented with every new type of derivative.
>It’s not an optional process, you need them for selective access. Comparing across these “syntactic coordinates” is how you get higher powers of comparison (by division: iterative comparison between difference & match, etc.), dimensional proportions, & so on. But you’re right, I should make it more explicit.
OK, so that's when it's done (from the knol): "the power of comparison is increased if current match-per-costs predicts further improvement, as determined by “secondary” comparison of results from different powers of comparison, which forms algorithms or metapatterns."
>> While optimizing, the higher level suppresses the lower one because it no longer expects benefits from new lower-level input.
>That’s what any feedback is for.
I mean that after a reliable formula is inferred, giving sufficiently high match/prediction, new lower-level samples are no longer needed to improve prediction.
>“Algebra” by itself is not predictive, it only gives you shorter equations to compute predictions from future data. It’s still all about data in the end, but math lets you be more selective in collecting it. (...)
>None of the above is about algebra. Internally or externally, you’re still comparing data, not operations. Comparing operations means comparing syntactic coordinates themselves, that’s what they stand for. (Can you think of initial types of such comparison?).
Not yet. However, it seems there are not many combinations: there are positions within the internal variables and levels in the sub-coordinate hierarchy of each position; there are just a few basic comparison operations; and iteration is supposed to be repetition until a given match/miss is achieved (above/below average or so).
Posted by Todor Arnaudov, last edited Sep 1, 2011 2:47 PM
> OK, so that's when it's done (from the knol): "the power of comparison is increased if current match-per-costs predicts further improvement, as determined by “secondary” comparison of results from different powers of comparison, which forms algorithms or metapatterns."
That’s only the first step: a comparison across adjacent derivation orders (syntactic coordinates). Beyond that are comparisons across syntactic discontinuity, such as between lengths of different dimensions within a pattern, & so on. I’ll make a separate chapter on syntax in the next edit, coming soon. That’ll include the “algebra” part; it really doesn’t belong in the feedback section.
> I mean that after a reliable formula is inferred, giving sufficiently high match/prediction, new lower-level samples are no longer needed to improve prediction.
I think you mean reliable *pattern*, algebraic formulas are not predictive per se. In that case, *local* sampling is suppressed by expectations, but in favor of more distant sampling. I covered that in the section on feedback: “Downward suppression of locations with expected inputs will result in a preference for exploration & discovery of new patterns, vs. confirmation of the old ones”.
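One way to picture this downward suppression, as a sketch rather than the knol's mechanism: feedback supplies a confidence, for each lower-level location, that its input is already expected; locations at or above a threshold are skipped, and the remaining ones are queued least-predicted first. The confidence map, the threshold, and the ordering rule are all assumptions made for illustration.

```python
from typing import Dict, List


def locations_to_sample(confidence: Dict[int, float], threshold: float) -> List[int]:
    """Skip locations whose input is expected (confidence >= threshold).

    The rest are returned in order of increasing confidence, so the least
    predicted locations are explored first.
    """
    open_locations = [loc for loc, c in confidence.items() if c < threshold]
    return sorted(open_locations, key=lambda loc: confidence[loc])
```

Skipping well-predicted locations is what frees the sampling budget for exploration elsewhere, rather than for confirming old patterns.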
> Not yet. However, it seems there are not many combinations: there are positions within the internal variables and levels in the sub-coordinate hierarchy of each position; there are just a few basic comparison operations; and iteration is supposed to be repetition until a given match/miss is achieved (above/below average or so)
There is an infinite number of potential combinations; the trick is to explore them incrementally. Re iteration, it continues until match/cost is exhausted, not achieved.
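The difference between iterating until match/cost is exhausted and iterating until a target match is achieved can be shown with a small Python loop. Here `step` is a hypothetical callback returning the match and cost of iteration k; the loop keeps going while match per cost stays above average and stops as soon as it no longer does, with `max_steps` only as a safety cap. What exactly is iterated (for example, comparison between difference and match) is left abstract.

```python
from typing import Callable, List, Tuple


def iterate_until_exhausted(step: Callable[[int], Tuple[float, float]],
                            average: float, max_steps: int = 100) -> List[int]:
    """Return the indices of iterations whose match/cost exceeded `average`.

    Stops at the first iteration whose match per cost is at or below average
    (or whose cost is non-positive), rather than at a preset target match.
    """
    completed: List[int] = []
    for k in range(max_steps):
        match, cost = step(k)
        if cost <= 0 or match / cost <= average:
            break
        completed.append(k)
    return completed
```

With diminishing returns such as `lambda k: (10.0 / (k + 1), 1.0)` and an average of 2.0, the loop returns `[0, 1, 2, 3]`; iteration k = 4 yields exactly 2.0, which is no longer above average.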
Todor Arnaudov:
Maximizing predictive-correspon
Andrey Panin:
Boris, interesting perspective
Creating general AI is a very addictive problem, I have to say: one that fools many into thinking that its solution is just around the corner. Alas... all existing approaches lead to a dead end. I share your hope that there are structural reasons for that, such as real-world constraints that force those working on it to be practical in the short term, leading them to specialized solutions, or a lack of knowledge in those for whom this is just a hobby. I am curious whether you have made any progress in the years since you posted this knol. I am also curious whether you discounted connectionist approaches (for anything other than perception) in favor of an algorithmic/symbolic approach, or whether you think it's a false dichotomy.
My personal feeling is that the solution will be in the form of NN, because I haven't seen anything else come conceptually close to linking what at first seem like completely unrelated pieces of information.
Last edited Aug 8, 2011 10:04 AM
Thanks!
My knol is continuously updated, last time only a month ago. I am making "theoretical" progress, - simulation would be pointless since I refine the algorithm almost daily. What is it that you find interesting, &| unclear? I make no hard distinction between perception & "conceptual" levels; it's just a degree of generalization. The connectionist approach is not analytical enough; I think on the level of algorithms: nodes, not networks. Also, as I mentioned in the knol, it's not incremental enough, thus not scalable. I add one dimension at a time, starting from 0D; NNs start from 2D.
Posted by Boris Kazachenko, Aug 4, 2011 2:41 PM, last edited Aug 4, 2011 2:41 PM
I am interested in how far from completion you think your algorithm is: is it far enough along to try it out? I ask because I am sure you know that no matter how nice a theory is, unless it's tested you can never be sure of what you have.
When you say NN is not scalable, I hope you mean current implementations, but in theory it's the most scalable thing known as of now, by its concept, since our brain is but one version of it. Regarding dimension, I'm not sure I see the limitation. A network of 1 node is 0D, isn't it?
What attracts me about NN is the concept of emergence of complexity out of simple units; it seems to be the underlying force in nature. To me, over-relying on an analytical approach is too brave a step, since it's basically saying: we will find an alternative way to recreate intelligence, other than the path we know already leads to one. To me it's like trying to understand the intricacies of an anthill through architectural focus rather than by generalizing from a unit of interaction between one ant and another ant.
With regard to solving the intelligence issue, one of the issues I find most challenging (apart from infinitely many others) is this: how to implement an inner drive (aka motivation) that gives rise to switching and focusing attention. Without it, an intelligent system would either be completely sensory-driven (without an internal importance filter) or it would engage in infinite pattern searching on one random (and possibly completely useless) problem. How do you address this problem?
Posted by Andrey Panin, Aug 4, 2011 9:03 PM (last edited Aug 4, 2011 9:03 PM)
> I am sure you know that no matter how nice a theory is unless it's tested you can never be sure of what you have.
If you're not interested in a theory, you're talking to the wrong guy. I don't care for blind tinkering.
> When you say NN is not scalable I hope you mean current implementations, but in theory it's the most scalable thing known as of now, by its concept - since our brain is but one version of it.
It's a matter of interpretation. ANNs have very little to do with real neurons / columns, to understand the latter you should be a neuroscientist. That's a legitimate route, guess I am too "brave" for that.
> Regarding dimension, not sure I see the limitation. A network of 1 node is 0D isn't it?
The limitation is inefficiency. Adding 1 dimension per level of search lets you select only the lower-D patterns that are strong enough to carry the overhead of additional coordinates. Without incremental selection you hit combinatorial explosion. Predictions are vectors, you can't have them without explicit coordinates.
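The selection step described above can be sketched as follows, assuming each 1D pattern carries a summed match and that adding a y coordinate costs a fixed, hypothetical amount `coord_cost`. Only patterns whose match exceeds that overhead are forwarded to 2D comparison, which shrinks the number of pairwise comparisons on the next level. `Pattern1D`, `select_for_2d`, `pairwise_comparisons` and `coord_cost` are illustrative names, not taken from the knol.

```python
from typing import NamedTuple


class Pattern1D(NamedTuple):
    x0: int        # starting coordinate within its row
    length: int    # number of inputs the pattern spans
    match: float   # summed match of the pattern


def select_for_2d(rows, coord_cost):
    """Forward only 1D patterns whose match can pay for one more (y) coordinate.

    rows: one list of Pattern1D per row; the row index becomes the y coordinate.
    coord_cost: hypothetical fixed cost of storing and comparing that coordinate.
    """
    selected = []
    for y, row in enumerate(rows):
        for p in row:
            if p.match > coord_cost:   # strong enough to carry the overhead
                selected.append((y, p))
    return selected


def pairwise_comparisons(n):
    """Number of cross-comparisons among n patterns on the next level."""
    return n * (n - 1) // 2
```

With six 1D patterns and `coord_cost=5.0`, only two might survive, cutting the next level's cross-comparisons from 15 to 1; without selection that count grows quadratically with every pattern forwarded.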
> To me it's like trying to understand the intricacies of an anthill through architectural focus rather than by generalizing from a unit of interaction between one ant and another ant.
My approach *is* bottom-up, I start from pixels, you can't get any lower than that. But I do so using criteria derived from my definition of intelligence, without one you're flying blind.
> How to implement an inner drive (aka motivation) - that gives rise to switching and focusing attention, because without it an intelligent system would either be completely sensory driven (without internal importance filter) or it would engage in infinite pattern searching of one random (and could be completely useless) problem. How do you address this problem?
The only drive I care about is curiosity, - a cortical instinct. It's implemented by introducing a universal selection criterion, - predictive power. I am perfectly fine with "sensory-driven", the rest is either gross physiology or acquired through conditioning.
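A minimal sketch of a single, universal selection criterion, using the formula stated later in this thread, predictive power = scope * precision. How scope and precision are measured is an assumption here: scope as the number of future inputs a pattern is projected over, precision as the mean fraction of each input it matches. The `Candidate` class and `select` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    scope: int         # hypothetical: future inputs the pattern is projected over
    precision: float   # hypothetical: mean fraction of each input it matches (0..1)

    @property
    def predictive_power(self) -> float:
        return self.scope * self.precision


def select(candidates, k):
    """Keep the k candidates with the highest predictive power (one criterion for all)."""
    return sorted(candidates, key=lambda c: c.predictive_power, reverse=True)[:k]
```

Using the flame example raised further down in this thread, a narrow but precise pattern (scope 10, precision 0.9) ranks below a broad, moderately precise one (scope 200, precision 0.2), and near-random input (scope 500, precision 0.01) ranks lowest.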
Posted by Boris Kazachenko, Aug 4, 2011 10:23 PM (last edited Aug 4, 2011 10:23 PM)
Your definition of intelligence gives too broad a range to be really useful as a discriminatory tool. An animal that hunts predicts and plans, a computer playing chess predicts and plans, a 2-year-old child predicts and plans, a semi retarded person predicts and plans, etc. What matters is how wide the scope of prediction is and how good the planning needs to be in order to be considered a success in achieving AGI. Currently success is very incremental, which makes what would and would not be considered AGI a moving target. I would be very curious if someone actually discovered a suitable intelligence criterion.
A computerized version of a neuron is first and foremost a conceptualized version. The reason I think building algorithmic AGI is braver than building NN AGI is that the order of complexity is very different. It's much easier to concentrate on a small "unintelligent" building block (i.e. a neuron) which, once conceptualized correctly, will lead to intelligence, than to try to reconstruct intelligence directly, wouldn't you agree?
Regarding pixels and 0 dimension: the brain is powerful enough that even when blind and deaf, with access only to touch, which is a very crude input source, it is still able to acquire a picture of the world. So I am not convinced that dealing with 0 dimension is really required for true intelligence.
Curiosity is an interesting criterion, but I don't think it is sufficient. Imagine an autistic person staring for hours/days at a flame because he/she is curious to find a pattern in it and predict the way the flame will look, pixel by pixel, a few minutes later. Something inside us tells us "that's not important - move on". There's no way of getting around the necessity of an internal selection criterion that says what's important and what's not, don't you think?
Posted by Andrey Panin, Aug 5, 2011 2:01 PM (last edited Aug 5, 2011 2:01 PM)
> your definition of intelligence gives too broad of a range to be really useful as a discriminatory tool. An animal that hunts predicts and plans, computer playing chess predicts and plans, a 2 year old child predicts and plans, a semi retarded person predicts and plans etc. What matters is how wide is the scope of prediction, how good is the planning need to be - to be considered a success in achieving AGI.
Precisely, intelligence is a matter of degree, & I am suggesting a way to quantify & maximize it. What are you arguing against?
> I would be very curious if someone would actually discover a suitable intelligence criteria.
"Suitable" is a two-way street.
> It's much easier to concentrate on a small "unintelligent" building block (i.e neuron) which, once conceptualized correctly will lead to intelligence, vs trying to reconstruct intelligence directly, wouldn't you agree?
I don't think it is conceptualized correctly, otherwise we'd have intelligent computers running around. You don't know what's easier till you've done it, Markram now wants ~$1B & 10 years to get "close" to doing it. What I do know is that there are ~250K neuroscientists beating around the bush, & 1 of me making good progress theoretically. It takes guts to do AGI.
> Regarding pixels and 0 dimension - brain is powerful enough that even being blind and deaf - and only having access to touch - which is very crude input source - it is still able to acquire a picture of the world. So I am not convinced that dealing with 0 dimension is really required for true intelligence.
Pixels are just an example of 0D processing, any sense would do, though not as well as vision.
> Curiosity is an interesting criteria, but I don't think is sufficient - imagine an autistic person staring for hours/days at a flame because he/she curious to find a pattern to it and predict the way flame will look pix by pix a few minutes later.
Curiosity is a motive, in psych. terms; the criterion is predictive power. You need to expand your scope of experience to maximize it, though the specific scope vs. precision trade-off depends on the noise in the inputs, & on the subject's time horizon. It's the same for autistics, they just put relatively greater value on precision.
Posted by Boris Kazachenko, last edited Aug 5, 2011 6:17 PM
> Precisely, intelligence is a matter of degree, & I am suggesting a way to quantify & maximize it. What are you arguing against?
I guess I missed/misunderstood the part where you quantified it. Would you mind restating, for my benefit, how you quantify it? That would answer my intelligence test question as well.
> 1 of me making good progress theoretically
Hence my question about how far from completion you are (as defined by your own minimum intelligence test): do you have all the necessary components in place (even if in an unrefined state), or are there some that you still have to solve?
> You need expand your scope of experience to maximize it, though specific scope vs. precision trade-off depends on the noise in the inputs, & on subject's time horizon
I disagree that scope vs precision depends on inputs only. It has to depend in large part on internally set goals/values. Inputs don't assign values: with no values, trying to understand the complexity of a dust mite is equivalent to understanding how to solve humanity's garbage crisis. I think unless we provide internal drive/goal/value criteria, the intelligence we produce will a) be useless and b) be difficult to test, since it may appear autistic to all.
Posted by Andrey Panin, Aug 6, 2011 7:43 AM (last edited Aug 6, 2011 7:43 AM)
> I guess I missed/misunderstood the part where you quantified it, do you mind restating it for my benefit how do you quantify it? That would answer my intelligence test question as well.
This whole knol is about that. 2nd paragraph: "the criterion must be predictive correspondence of recorded inputs.., - their cumulative match to future inputs".
I then quantified match on a single-variable level, later relative & unique match (2nd section), then introduced projected match (vs. contrast) & additive projection (vs. confirmation) in the 3rd section.
More abstract forms of correspondence (cumulative match) are defined by an incrementally complex algorithm, but allow for greater scope * precision of prediction.
> hence my question about how far from completion are you...
Completion is when the algorithm can self-improve (add efficient complexity) through computer simulation faster than I can improve it theoretically. That depends largely on the basal complexity of the algorithm, & I don't feel it's complex enough yet. I have several levels in mind that don't quite fit the already established pattern, once I have a better pattern (of increasing complexity) it should scale better.
> I disagree that scope vs precision depends on inputs only.
I didn't say that it does.
> It has to depend in large part on internally set goals/values. Inputs don't assign values - with no values trying to understand the complexity of a dust mite is the equivalent to understanding how to solve humanity's garbage crisis. I think unless we give an internal drive/goal/value criteria - intelligence produced by us will be a) useless b) will be difficult to test since it may appear autistic to all.
That kind of loose talk kept philosophers busy for millennia. To be constructive you need to work bottom-up.
Posted by Boris Kazachenko, Aug 6, 2011 11:00 AM (last edited Aug 6, 2011 11:00 AM)
> Completion is when the algorithm can self-improve
You know, I see this very often among AGI thinkers: what I think is a conflation of two independent problems. It's hard enough to build intelligence, but merging it with the even harder problem of building the kind of intelligence that improves itself is, I think, an indication of not understanding the problem in the first place.
>That kind of loose talk kept philosophers busy for millennia. To be constructive you need to work bottom-up.
That's not a serious answer. I don't know anything about what kept philosophers busy, I don't study philosophy, but in building AGI I did run into the need for an ability to shift focus. Selecting for predictiveness is not a sufficient criterion, because watching a movie a second time increases predictiveness for the next 2 hours; maximizing predictiveness forces us to keep watching the movie, but it takes something else to shift focus away. Everything you have said thus far makes me think that you don't recognize this as a problem. I think it's something you will have to deal with when you actually try to run your program, if you ever get to that point. I think you are falling for the same fallacy as Jeff Hawkins in his book, that is, assuming that intelligence can be passive, i.e. input dictates output, when intelligence has to be active and even proactive.
Posted by Andrey Panin, Aug 7, 2011 6:06 AM (last edited Aug 7, 2011 6:06 AM)
I already answered re shifting: predictive power = scope * precision, you need to increase both. And beyond that, I explained in the knol why you need discontinuous shifting, 3rd section, 4th paragraph:
"The next level of selection by feedback results in a preference for exploration over confirmation: we skip over too predictable sources / locations, thereby *reducing* match of new inputs to older templates. This doesn’t select for either proximity or contrast, & seems to contradict my premise. However, exploration should increase *projected* correspondence, which is a higher-level criterion than concurrently reduced *confirmed* correspondence."
Every issue you raised is addressed in the knol. You simply don't seem to care for theoretical understanding, & I don't care for tinkering. Too bad.
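A minimal sketch of the discontinuous shifting described in the quoted passage, under assumptions not taken from the knol: each source carries a single "confirmed match" score in 0..1, and anything above a hypothetical `saturation` threshold is skipped as too predictable. The secondary ordering of the remaining sources is also an assumption. `next_sources` is a hypothetical name; the sample data reuses the movie example from the previous comment.

```python
def next_sources(sources, saturation):
    """Order sources for further sampling, skipping those already too predictable.

    sources: mapping of source/location name -> confirmed match in 0..1, i.e. how
             well existing templates already predict that source's inputs.
    saturation: hypothetical threshold; above it, new inputs would mostly confirm
                what is already known rather than extend projected match.
    """
    unexplored = {name: m for name, m in sources.items() if m < saturation}
    # Assumed tie-breaker: among the rest, try partly predictable sources first,
    # since some existing templates can still be projected onto them.
    return sorted(unexplored, key=unexplored.get, reverse=True)
```

With `{"movie, 2nd viewing": 0.95, "new movie": 0.4, "static noise": 0.02}` and `saturation=0.9`, the re-watched movie is dropped and the result is `["new movie", "static noise"]`: confirmed match goes down, which is the trade the quoted passage describes.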
Posted by Boris Kazachenko, last edited Aug 7, 2011 10:26 AM
> you know I am seeing it very often among AGI thinkers - what I think is a conflation of two independent problems. It's hard enough to build intelligence, but to merge it with even harder problem if building the kind of intelligence that improves itself is I think an indication of not understanding the problem in the first place.
It's not a different problem, - learning (increasing predictive correspondence) *is* self-improvement. And there should be no hard distinction between learning data & learning code, - both are driven by the same criterion, or fitness function. But it is a common fallacy to see intelligence as a fixed object.
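A toy sketch of "no hard distinction between learning data & learning code": two candidate predictors play the role of code, their parameter `k` plays the role of data, and both are chosen by one fitness function. The fitness used here, negative absolute prediction error summed over a sequence, is a stand-in for cumulative match rather than the author's definition, and all function names are hypothetical.

```python
def predict_const(history, k):
    """Predict the next value as the last value plus a fixed offset k."""
    return history[-1] + k


def predict_trend(history, k):
    """Predict the next value by extrapolating the last difference, scaled by k."""
    return history[-1] + k * (history[-1] - history[-2])


def fitness(predictor, k, series):
    """Stand-in for cumulative match: negative summed prediction error (higher is better)."""
    total = 0
    for i in range(2, len(series)):
        guess = predictor(series[:i], k)
        total -= abs(series[i] - guess)
    return total


def learn(series, predictors, params):
    """Search over 'code' (predictor) and 'data' (parameter) with the same criterion."""
    return max(((p, k) for p in predictors for k in params),
               key=lambda pk: fitness(pk[0], pk[1], series))
```

On the doubling sequence `[1, 2, 4, 8, 16, 32]` with `params=[0, 1, 2]`, `learn` picks `predict_trend` with `k = 2` (fitness 0), while the best constant-offset predictor, also `k = 2`, scores -22.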
Todor Arnaudov:
Events in programming are all the way through the hierarchy...
Hi,
Boris, I happened to check in on you at the right moment; a few notes in a domain where I guess I'm competent:
> Besides, the events are assumed to be high-level concepts, preprocessed by human cognition. That’s the type of data programmers usually deal with, but general intelligence should not depend on preprocessing.
I beg to differ - not true for the "real programmers".
Events in programming start from hardware interrupts and binary flags (set/reset); they are an abstraction of "change", "difference" and "selection" (a message to the specific receiver that recognizes the event).
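The reduction of events to change, difference and selection can be sketched at the bit level. This is a software simulation under assumptions: snapshots of a flag register are polled rather than raised by real interrupt hardware, and `flag_events` and `dispatch` are hypothetical helpers. XOR of consecutive snapshots exposes the changed bits (difference), the new bit value distinguishes set from reset, and a per-bit receiver table does the selection.

```python
def flag_events(samples):
    """Turn successive snapshots of a flag register into (index, bit, kind) events."""
    events = []
    for i, (prev, cur) in enumerate(zip(samples, samples[1:]), start=1):
        changed = prev ^ cur              # bits that differ between snapshots
        bit = 0
        while changed:
            if changed & 1:
                kind = "set" if (cur >> bit) & 1 else "reset"
                events.append((i, bit, kind))
            changed >>= 1
            bit += 1
    return events


def dispatch(events, receivers):
    """Deliver each event only to the receiver registered for that bit."""
    for i, bit, kind in events:
        handler = receivers.get(bit)
        if handler is not None:
            handler(i, kind)
```

For example, `flag_events([0b000, 0b001, 0b101, 0b100])` returns `[(1, 0, "set"), (2, 2, "set"), (3, 0, "reset")]`; a receiver registered only for bit 0 sees the first and last of these.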
Also, hardware and software do have a deep hierarchy of abstraction, starting from "sensory inputs" (IC electrical inputs), through flat and hierarchical blocks inside the IC, to inter-IC connections and multiple levels of redirection in the OS and the software.
IMHO the high-level view of events more likely belongs to people from the humanities, who have a hard time thinking about and remembering all those specific details.
> My approach, on the other hand, is to search for patterns within environmental input flow. I don’t even make a distinction between input patterns & problem-solving algorithms, -
OK.
> that’s an artifact of the way we design computers, to run hand-coded programs for specific tasks. It makes no sense in the context of continuous evolution of general intelligence, which should be recapitulated in AGI design.
I'm not sure the distinction comes from this per se. Computers evolve to be ever more general tools, running ever more general code (solving more general problems in one monolithic system) with ever less coding effort and ever more reuse and speedups: from assembler to functions, more complex built-in CPU instructions, ever higher-level languages, libraries, OOP, OSes, hardware abstractions etc.
I think the issue comes from the way most computer users think: they don't realize how the brain starts crunching data, nor the basic principles of GI. I guess this is similar to the way some AGI-haters say "computers can't understand language" or "computer can't never think", and explain it by claiming "computers do exactly what we tell them to do" ==> they, the users, are incompetent and can't understand language; they can't make computers think.
Last edited Jul 13, 2011 6:24 PM
Hi Todor,
I was talking about "events" in BI (probabilistic calculus). They're assumed to be discrete, rather than artificially quantized analog sensory input flow. They call them "hypotheses" & "confirmations", does it sound like a low-level mindset to you? Re programming, I also meant high-level (symbolic) input data, rather than the code / hardware that manipulates it. All of our conscious experience is "high-level", more so for "people in humanities", but the programmers are not immune. Computers are general-purpose, but the programs aren't, except for their hardware interface. An example of the "artifact" I was talking about is separation between data cache & instruction cache on ALU level, I have no use for that.
Posted by Boris Kazachenko, last edited Jul 5, 2011 4:44 PM
Thanks for the reply!
About BI - sure, I also taught students that starting from a high level, as in Prolog, Cyc, expert systems, frame-based cognitive architectures etc., is not going to scale.
>Re programming, I also meant high-level (symbolic) input data, rather than the code / hardware that manipulates it.
I was trying to point at the understanding of the concept of "event" itself. For banking software, or for a researcher who's bad at programming, an "event" might be "being sunny or rainy". Real programmers and engineers who do DSP, computer vision, ML/RL or just low-level coding have a better "physical" idea.
>Re programming, I also meant high-level (symbolic) input data, rather than the code / hardware that manipulates it.
OK, but the input data for some kinds of software is as symbolic as a quantized sensory matrix.
> All of our conscious experience is "high-level", more so for "people in humanities", but the programmers are not immune. Computers are general-purpose, but the programs aren't, except for their hardware interface. An example of the "artifact" I was talking about is separation between data cache & instruction cache on ALU level, I have no use for that.
This seems to me rather a detail and an optimization (parallelization): two independent buses (for speed, given physical limitations); the instruction/data division is also for simplicity and speed (preferably sequential reading for part of the input); and for the cache, data changes more rapidly than instructions, because self-modifying machine code is usually forbidden today, etc.
It's a specialization, but it's transparent to target work, and there always will be some sort of physical or practical basis which will force some design decisions at the low level of implementation.
As for seeing the instruction/data division as artificial or practical: I agree that what gets interpreted as what is a matter of POV/frame. Even for "stupid" algorithms, data is part of the running algorithm (the actual sequence and causal forces changing the system).
>Computers are general-purpose, but the programs aren't, except for their hardware interface.
Isn't it a matter of scope and complexity? Bigger "programs" such as OSes are largely general-purpose, and complex application software gets more general during development. Sure, not AGI, but as the number of functions grows, they get generalized once their parameters and structure start to repeat. After all, generalization starts from comparing specific samples, and programming a complex system generates samples to be generalized. That's how functions, structured programming, OOP and Design Patterns originated.
As for philosophers/social science types versus programmers: you favor the former, but typical philosophers have no chance of formalizing intelligence by themselves either. They don't understand, don't care ("it's beneath them"), or lack programming skills, i.e. low-level data representation and processing. IMHO a lot of philosophy consists of simple, obvious, low-complexity concepts; higher generality doesn't necessarily mean complex or hard to derive. However, these simple things are masked with big words.
Long ago I tried to explain to a philosopher that computers only seem "dull" to him because he sees them as "1 or 0", but actually he doesn't understand them. Let him say that computers do exactly what he tells them once he has defined and understood the dynamics of 10 or 100 billion dumb "1s and 0s": he's pushing buttons, and billions of bits are updating. Programming seems "dull" to "generalist types" because it's too hard for them.
Bottom line from me on this point: there's more to it than being a specialist/generalist or the depth of the hierarchy. It's also the resolution and scale of processing you do over that hierarchy.
Posted by Todor Arnaudov, last edited Jul 12, 2011 1:52 PM
> I tried to point the understanding of the concept of "event" itself.
That "concept" is meaningless by itself.
> As for the seeing division instr./data artificial/practical - I agree that it's a POV/frame what to be interpreted as what,
Right, & cache division is just one example of such "hard" separation, in programmer's mind as well as in computer architecture. It won't be an "optimization" if your code is incrementally derived from your data.
>> Computers are general-purpose, but the programs aren't, except for their hardware interface.
> Isn't it a matter of scope and complexity...
No, it's not simple scaling, the higher levels are mostly application-specific handles. Look, if you want to talk superficially related computerese, may I suggest the AGI list?
> Bottom line from me on this point is that there's more than being specialist/generalist or depth of hierarchy, it's also the resolution and scale of processing you do over that hierarchy.
That's as trivial as your earlier talk of "raw power".
I don't favor philosophers, I said many times that philosophy is the most dysfunctional discipline next to theology. I just don't talk about them as much, because they don't try to build an AGI.
But you do sound like a philosopher yourself, talking about anything *but* the actual subject matter.
Care to discuss something potentially constructive?
Posted by Boris Kazachenko, last edited Jul 20, 2011 1:27 PM
>But you do sound like a philosopher yourself, talking about anything *but* the actual subject matter.
OK... I'm not even warming up now, "pre-warming".
>Care to discuss something potentially constructive?
Sure...
...
>No, it's not simple scaling, there's a ton of application-specific biases mixed-in on higher levels.
Nobody claims this is scaling the way you do it; it's not AGI. I claim that software engineers know about scaling, even if they spoil it for practical reasons.
>Look, if you want to talk superficially related computerese, go to AGI list.
I don't want to; I just wanted to share a few thoughts on these computer-related topics.
>That's as trivial as "raw power" from your earlier attempts.
A slightly reworded bottom line: real programmers shouldn't be underestimated, they have "raw power" and an idea of scaling.
>I don't favor philosophers, I said many times that philosophy is the most dysfunctional discipline
I know, but you mention them as supposedly possessing a more appropriate mindset, while in your opinion programmers are completely lost.
Posted by Todor Arnaudov, last edited Jul 13, 2011 3:16 PM
> Real programmers shouldn't be underestimated, they have "raw power" and an idea of scaling.
Show me.
> I know, but you mention them as supposed to possess a more appropriate mindset,
That was re "real" philosophers, not the kind you would hear about. Except for myself.
If I had grown up in the West, I'd probably have started by studying philosophy (esp. philosophy of science), but dropped it after realizing that cognition must be defined at the sensory level. Cognitive process is the only legitimate subject for philosophy; the fact that "philosophers" aren't working on it is a different matter.
Todor Arnaudov:
Task: Comparing a single-integer input to a fixed-length continuous sequence of older inputs
Hi Boris,
I'm loading my gun for a new shot. :)
B>If you want to get constructive (meaningful), try to formalize comparing a single-integer input to a fixed-length continuous sequence of older inputs, & then form its prediction over the next sequence of the same length & direction.
Actually I got an idea immediately, even shared a bit with my students, but it seemed too simple. However now I believe it should be simple, there shouldn't be rocket science in a few numbers and all patterns should be derivable from the mere numbers and their relations, such as start value, differences, changes.
So, my first guess is that this seems similar to DSP and might be related to delta coding and linear prediction. For a start I thought only of the subtraction difference, which is effective for smooth, low-ratio changes. However, I see now that you've added more clues in the article, and division and logarithm are more appropriate for high-ratio and very-high-ratio changes.
I see also that applying different kinds of comparison is needed in order to be able to *select* the right one if some matched, some mismatched; like in the following example with the shortest possible sequences:
First sequence:
[5 6]
Pattern:
Length = 1
Start = 5
Add Diff = 1
Ratio Diff = 1.2
Direction Diff = 1 (+)
New number:
[5 6] 7
Compare the difference to the last number of the sequence, and the match to the pattern:
Add Diff = 1 (match 1)
Ratio Diff = 7/6 (match 0.935)
Direction Diff = 0 (+) (match 1)
A new sequence, assuming the algorithm doesn't care about the mismatch of the Start value (coordinates):
[50 51]
Add Diff = 1 (m 1)
Ratio Diff = 51/50 (m 0.85)
Dir Diff = 0 (m 1)
What matches better is the Add Diff, the algorithm should ignore ratio mismatch and will predict 52.
However:
[100 110]
Add Diff = 10 (m 0.1)
Ratio Diff = 1.1 (m 0.92)
Dir Diff = 0 (m 1)
Now the Add Diff match is very low, but the Ratio Diff match is almost identical to the match between the pattern and the new number, therefore: 7/6 * 110 = 128.33, i.e. (int) 128.
Last edited Jul 9, 2010 5:49 AM
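Todor's selection rule above can be sketched in Python. This is a minimal illustration under stated assumptions:
- The match measure `match(x, y) = min(x, y) / max(x, y)` is hypothetical. The comment does not say how its match values were computed, so the sketch does not reproduce the quoted 0.935/0.85/0.92 figures.
- Inputs are assumed to be positive and increasing.
- The function names are invented for illustration.

It reproduces the two predictions in the example (52 and 128). It illustrates the proposal as written, not a corrected algorithm.

```python
# Hypothetical sketch of the proposal: keep two comparison types (additive
# difference and ratio) and extrapolate with whichever one matches the stored
# pattern better. Assumes positive, increasing inputs.


def match(x: float, y: float) -> float:
    """Assumed similarity of two positive magnitudes, in (0, 1]."""
    return min(x, y) / max(x, y)


def predict_next(pattern: list, seq: list) -> int:
    """Predict the input after `seq` from the last step of `pattern`."""
    add_diff = pattern[-1] - pattern[-2]    # additive step, e.g. 7 - 6
    ratio_diff = pattern[-1] / pattern[-2]  # ratio step, e.g. 7 / 6
    seq_add = seq[-1] - seq[-2]
    seq_ratio = seq[-1] / seq[-2]
    if match(seq_add, add_diff) >= match(seq_ratio, ratio_diff):
        return seq[-1] + add_diff           # additive extrapolation
    return int(seq[-1] * ratio_diff)        # ratio extrapolation, truncated to int


if __name__ == "__main__":
    pattern = [5, 6, 7]
    print(predict_next(pattern, [50, 51]))    # 52: additive match wins
    print(predict_next(pattern, [100, 110]))  # 128: ratio match wins
```

The choice between the two comparisons is made separately for each new sequence. This is the *selection* between comparison types that the comment argues for.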
Huh? This is not even wrong, - there's no algorithm, just ad hoc examples. I don't *ever* want *any* examples, - they pollute the mind. Use algebraic variables, not the actual numbers.
Actually, the examples *are* wrong. Forget about higher orders of comparison, DSP, & whatever other "hammers" you happen to know about, think in terms of the purpose. You keep talking about the differences, but the purpose is to project *match*, as a distinct variable. You don't predict the next input, every past input is already a prediction. You need to quantify accuracy (match) of that prediction for the next n comparisons, based on the past n comparisons. Hint: *projecting* a match means adjusting it for the "cost" of search, & for competing projection of accumulated difference. If you figure this out, it'll be a first step down a long road.
Posted by Boris Kazachenko, last edited Jul 8, 2010 2:05 PM
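One possible reading of the hint about projecting match can be sketched in Python under explicit assumptions:
- Match between two inputs is taken as the smaller of the two (`min`).
- The per-comparison search cost `ave` is a hypothetical constant.
- The "competing projection of accumulated difference" is modeled simply as subtracting the magnitude of the accumulated difference.

None of these choices are stated in the reply. The sketch only shows the shape of the computation: compare one input to n older inputs, accumulate difference and match, then project match over the next n comparisons.

```python
from dataclasses import dataclass


@dataclass
class Accumulated:
    d: int = 0  # accumulated difference: new input minus each older input
    m: int = 0  # accumulated match, assumed here to be min(new, old)


def compare(new: int, older: list) -> Accumulated:
    """Compare one input to a fixed-length sequence of older inputs."""
    acc = Accumulated()
    for old in older:
        acc.d += new - old
        acc.m += min(new, old)
    return acc


def projected_match(acc: Accumulated, n: int, ave: int) -> int:
    """Hypothetical projection of match over the next n comparisons:
    past match, minus the assumed cost of n comparisons (n * ave), minus
    the accumulated difference treated as a competing projection."""
    return acc.m - n * ave - abs(acc.d)


if __name__ == "__main__":
    acc = compare(12, [10, 11, 12, 13])
    print(acc)                               # Accumulated(d=2, m=45)
    print(projected_match(acc, n=4, ave=8))  # 45 - 32 - 2 = 11
```

In this reading, a positive projected match would justify continuing the search over the next n inputs, and a negative one would not.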
OK, it's wrong and I've misinterpreted it, but "next sequence" in the question could also mean a new, different one, not only a continuation of the past.
My example was about: [a1, ..., an, x] -?-> [b1, ... bn, y=?], [c1, ... , cn, z=?], ...
While now I guess it should be: [a1, a2, ... an x b1 b2 ... bn] ==> (a1 .. an) x -?-> (b1 ... bn),
A correlation rather than an extrapolation: how predictive the past input was of the following input that *really happened*, rather than a prediction of the next value *before it happens*.
Prediction before a value happens should come later, using justified selected predictive patterns with quantified match. I'll think about it.
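Todor's reformulation evaluates after the fact how predictive the past was of what actually followed. It can be sketched as follows, with assumed details:
- Each input is treated as a prediction of the next one.
- Match between a prediction and the actual input is assumed to be their minimum.
- The window layout `[a1 .. an, x, b1 .. bn]` follows the comment.
- The function names are hypothetical.

```python
def window_match(values: list) -> int:
    """Sum of matches when each input is taken as the prediction of the next.
    Match is assumed to be min(prediction, actual)."""
    return sum(min(pred, actual) for pred, actual in zip(values, values[1:]))


def past_vs_actual(seq: list, n: int) -> tuple:
    """seq = [a1 .. an, x, b1 .. bn], all already observed.
    Returns (match over a1 .. x, match over x .. bn): the first is what the
    past window would project, the second is what actually happened."""
    if len(seq) != 2 * n + 1:
        raise ValueError("expected 2n + 1 inputs")
    past = seq[: n + 1]   # a1 .. an, x
    actual = seq[n:]      # x, b1 .. bn
    return window_match(past), window_match(actual)


if __name__ == "__main__":
    print(past_vs_actual([3, 4, 5, 6, 7], n=2))  # (7, 11)
```

Comparing the two numbers quantifies how well the past window's match carried over to the inputs that followed. This is a retrospective correlation, not an extrapolation.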
Maximizing predictive-correspondence which maximizes reward
Hi Boris,
A guess... (Or a new shot in the dark :) )
I think that the mind favors maximizing predictive-correspondence which maximizes reward; I suppose this is related to what you and psychologists call the hierarchy of needs. Maximizing predictive-correspondence/compression can be regarded as a reward in itself, and misses/errors as a “punishment”. But there must also be lower “root” rewards to generate initial behavior and to drive initial focus on selected stimuli.
>past patterns are decreasingly predictive with the distance / delay from expected inputs,
>Recent inputs are relatively more predictive than the old ones by the virtue of their
> proximity to future inputs. Thus, proximity should determine the order of search within
> a level of generality.
Is it always so? I suspect it may not always be the case. It is possible to have delayed patterns, where activity “now” depends on changes that happened long ago. The fresh input buffers are the cheapest to check quickly, even if they're not the most relevant. If the input buffers are too short, the mind has no choice but to search for patterns there; a machine with a longer buffer may learn much faster. There could be a “cache”/“stack” for old inputs which are expected to be predictive with a delay.
Also, such a correlation between recent inputs and near-future inputs is apparent when the patterns are inertial/slow-changing/low-frequency ones and the activity passes through adjacent coordinates, like in the HTM basic vision demo. Many (or most) of the input patterns are like that, but I guess not all.
Also, rewarding old inputs can be much more predictive than new unrewarding ones, because the mind searches for ways to maximize their predictive power for future inputs. Meanwhile, it may ignore and fail to evaluate recent inputs which are expected to be unrewarding (and nonthreatening), unless they are attached to rewarding ones, which makes them rewarding as well (or relevant for avoiding punishment).
Overall, I suspect that reward function(s) need to be added to predictive correspondence, and proximity and recentness may need to be more abstract.
Regards
Todor
Last edited May 22, 2010 8:06 PM
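Todor's suggestion combines a short buffer searched by proximity with a “cache” for old inputs that are expected to be predictive with a delay. It can be sketched as below. This is a hypothetical structure for illustration only:
- The class and method names are invented.
- `collections.deque(maxlen=...)` stands in for the fixed-length input buffer.
- Deciding which inputs get flagged for delayed use is left to the caller.

```python
from collections import deque


class InputMemory:
    """Hypothetical sketch: a fixed-length buffer of recent inputs plus a
    separate store of older inputs flagged as predictive with a delay."""

    def __init__(self, buffer_len: int) -> None:
        self.recent = deque(maxlen=buffer_len)  # oldest inputs fall off automatically
        self.delayed = []                        # flagged inputs kept beyond the buffer

    def add(self, value: int, keep_for_delay: bool = False) -> None:
        self.recent.append(value)
        if keep_for_delay:
            self.delayed.append(value)

    def search_order(self) -> list:
        # Proximity first: buffered inputs newest-first, then the delayed
        # store, also newest-first.
        return list(reversed(self.recent)) + list(reversed(self.delayed))


if __name__ == "__main__":
    mem = InputMemory(buffer_len=3)
    mem.add(1, keep_for_delay=True)
    for v in (2, 3, 4, 5):
        mem.add(v)
    print(mem.search_order())  # [5, 4, 3, 1]
```

Input 1 has dropped out of the three-slot buffer but stays searchable through the delayed store. Buffer length and delayed storage trade off in this setup: a longer buffer needs fewer flagged inputs.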
> A guess... (Or a new shot in the dark :) )
At least you're shooting at the right target :).
> I think that the mind favors maximizing predictive-correspondence which maximizes reward, I suppose this is related to what you and psychologists call hierarchy of needs.
My hierarchy is a sequential development of generalized means, which are then conditioned to become needs/wants. Basic cognition is driven by a very low-level inherited algorithm, without it this development can't even start.
> Maximizing predictive-correspondence/compression can be assumed as a form of reward for itself, as well as misses/errors – a “punishment”, but there must also be lower “root” rewards to generate initial behavior and to drive initial focus on selected stimuli.
Initial behavior is instinctive, & curiosity is one of the most basic: the knowledge instinct: http://en.wikipedia.org/wiki/Leonid_Perlovsky
Initial cognition is driven by a low-level design of neocortex (most likely minicolumn: http://brain.oxfordjournals.org/cgi/content/full/125/5/935), it doesn't need any extra-cortical "rewards".
> It is possible to have delayed patterns, where activity “now” is dependent on changes that happened long ago. The fresh input buffers are cheapest to check quickly, even if they're not the the most relevant, and if the input buffers are too short, mind has no choice, but searching for patterns there; a machine with longer buffer may learn much faster. There could be a “cache”/”stack” for old inputs which are expected to be predictive with a delay.
Yeah, that's what I call "higher levels of generalization". Those *are* older inputs, only compressed, & selected accordingly.
New edit: my mistake, that's a good idea, though not well justified. See the first prize in the knol.
> Also rewarding old inputs can be much more predictive than new unrewarding ones, because mind searches how to maximize their predictive power to future inputs, while it may ignore and miss to evaluate recent inputs which are expected to be unrewarding (and nonthreatening), unless they are attached to rewarding ones making them rewarding as well (or such to avoid punishment).
You mean that we can make inputs more predictive by reproducing them? That means going way back to a lower stage of meta-evolution :).
> Overall, I suspect that a reward function(s) need to be added to predictive correspondence,
I'd suggest that you forget about subcortical nonsense, it's part of the problem, not part of the solution.
> and proximity and recentness may need to be more abstract.
That's already explained in the knol. It's true that a mind will skip over too predictable inputs, even if not driven by non-cognitive rewards. It's a form of novelty seeking that is not maximizing proximity, contrast, or even actual match. I didn't explain that in the knol. If you can define the criterion that's maximized in such "exploration mode", that would warrant a consolation prize :).
Boris
Posted by Boris Kazachenko, last edited Jun 29, 2010 5:11 AM
>At least you're shooting at the right target :)
Finally! ;)
>> Maximizing predictive-correspondence/compression can be assumed as a form of reward for itself, as well as misses/errors – a “punishment”, but there must also be lower “root” rewards to generate initial behavior and to drive initial focus on selected stimuli.
>Initial behavior is instinctive, & curiosity is one of the most basic: the knowledge instinct: http://en.wikipedia.org/wiki/Leonid_Perlovsky
>Initial cognition is driven by a low-level design of neocortex (most likely minicolumn: http://brain.oxfordjournals.org/cgi/content/full/125/5/935), it doesn't need any extra-cortical "rewards".
Thanks for the links! I've missed Leonid and yes, I do have to check out the "raw scientific input" about the columns...
>>Also rewarding old inputs can be much more predictive than new unrewarding ones,
>>because mind searches how
>>to maximize their predictive power to future inputs, while it may ignore and miss to evaluate
>>recent inputs which are expected to be unrewarding (and nonthreatening), unless they are
>>attached to rewarding ones making them rewarding as well (or such to avoid punishment).
>>Overall, I suspect that a reward function(s) need to be added to predictive correspondence,
>You mean that we can make inputs more predictive by reproducing them? That means going
>way back to a lower stage of meta-evolution :).
>I'd suggest that you forget about subcortical nonsense, it's part of the problem, not part of the
>solution.
Elegantly said... :)
If I'm getting this right:
>My hierarchy is a sequential development of generalized means, which are then
>===conditioned to become needs/wants.===
Then for you this is the behavioral/conditioning part, which makes sense. I think that is probably another hierarchy (what you call the hierarchy of needs). In it, the lower brain structures (brainstem, amygdala, hypothalamus) are higher levels of *control* (basic needs) than the highest level of the cognitive hierarchy, and the direction is evolutionarily backwards. At least this is true right when you "switch on" a human.
However, I do believe going back in meta-evolution makes sense, because subcortical regions are more primitive. Inputs actually do get somewhat more predictable, or at least the subject's behavior does, so pleasing patterns are generally more predictive than unpleasing ones.
This is how love and addictions feed themselves: by reproducing recorded behaviors that led to pleasure.
>> It is possible to have delayed patterns, where activity “now” is dependent on changes that
>>happened long ago. The fresh input buffers are cheapest to check quickly, even if they're not
>>the the most relevant, and if the input buffers are too short, mind has no choice, but searching
>>for patterns there; a machine with longer buffer may learn much faster. There could be a
>>“cache”/”stack” for old inputs which are expected to be predictive with a delay.
>Yeah, that's what I call "higher levels of generalization".
>Those *are* older inputs, only compressed, & selected accordingly.
OK. :) At this point my terminology is “higher level virtual universes”, “higher level virtual simulators of virtual universes”, “higher level of control”.
The laws of physics of the higher-level universes are built from sequences and sets of lower-level laws, which in turn have their own laws of physics and sub-universes. Laws of physics and virtual universes are predictive patterns (systems of patterns), extracted from sensory input and used to predict. On the lowest level, laws are not compressed; this is "the reality":
- in the real Universe, you have to simulate everything in order to predict and to have an exact representation of the future at the Universe's meaningful resolution (Planck's constant etc.)
- in a thinking machine or a human mind, this is the raw sensory input that causes cognition to start
To interact/interface with the system's lowest-level universe, the higher level must decompress its representations throughout the hierarchy. Each level down adds details, making the picture increasingly sharp.
>> and proximity and recentness may need to be more abstract.
>That's already explained in the knol.
Just going up in the hierarchy?
>It's true that a mind will skip over too predictable inputs, even if not driven by non-cognitive
>rewards. It's a form of novelty seeking that is not maximizing proximity, contrast, or even actual
>match. I didn't explain that in the knol. If you can define the criterion that's maximized in such
>"exploration mode", that would warrant a consolation prize :).
Nice... :)
My first intuitive guess is predictive range, compression ratio; I think it's related to minimum message length/Kolmogorov's complexity.
I'm not sure if these concepts are the answer to your question, but they sound interesting to me anyway. Sounds like “Predictability ...
Posted by Todor Arnaudov, last edited May 12, 2010 4:04 PM
Hmm, my comment seems to have been too long; it continues:
>It's true that a mind will skip over too predictable inputs, even if not driven by non-cognitive
>rewards. It's a form of novelty seeking that is not maximizing proximity, contrast, or even actual
>match. I didn't explain that in the knol. If you can define the criterion that's maximized in such
>"exploration mode", that would warrant a consolation prize :).
Nice... :)
My first intuitive guess is predictive range, compression ratio; I think it's related to minimum message length/Kolmogorov's complexity.
I'm not sure if these concepts are the answer to your question, but they sound interesting to me anyway. Sounds like “Predictability Analysis/Calculus”. :)
- How far into the future (or across space) predictions based on the pattern are expected to match real input, and how much input data is enough to predict the whole future input generated by the pattern. This is particularly apparent for simple patterns that are expected to take a lot of time, like counting aloud 1, 2, 3, ..., 1 million. :) Generally, if you know the end from the beginning, you don't need to keep paying attention to the process.
- Predictability range and predictability precision of new input, based on the recent/immediate or local input from a pattern; or, more generally, how parts of an input assist in the prediction/compression of other parts of the input. Going meta: how parts of a pattern assist in the prediction of other parts of the pattern itself.
I noticed this in the past, in a section of my writings speculating about interestingness in pictures: a photographer would generally call this photo http://eim.hit.bg/3/25/tee1.jpg boring, and the next one (more) interesting: http://eim.hit.bg/3/25/kalof94.jpg
Interestingness is subjective, but this is true at least for the measure below:
The first photo can be drawn from a portion of it, extended by a simple loop with instructions for how to stretch and copy it in perspective (implying that the mind does this and stores images this way: compressed, with transformations and operations applied). The second one can't be compressed that way (it's not so simple); it also contains more meaningful, recognizable objects, so the mind needs to engage more. This is what interestingness is all about: engaging the mind to watch and try to predict what comes next. There are other aesthetic reasons for interestingness as well: emotional, "organic" appearance/smoothness, dynamics (expected possible change in pictures with animate objects); however, this is another story.
- Function of predictability in time/space: how prediction precision changes as more data accumulates. If precision stops rising, rises too slowly, or reaches very high levels, watching may stop: this is a saturation of the function of predictability over time. In your terms (hopefully used correctly): if it's no longer possible to discover increasingly predictive short-cuts for a particular pattern, it may be skipped over. This rule skips noise as well.
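The compressibility measure behind the photo example can be approximated crudely with a general-purpose compressor. This is a minimal sketch, assuming zlib's compression ratio as a stand-in for the mind's own compression (an assumption, not a claim about how minds store images): a tiled motif, "drawable from a portion of itself", shrinks to a small fraction of its size, while pseudo-random bytes don't compress at all. Note that by this measure alone noise would look maximally "interesting", which is why a rule that skips noise, like the saturation criterion in the last item, is still needed.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size / raw size: lower means more redundant, i.e. more predictable."""
    return len(zlib.compress(data, 9)) / len(data)


# A small motif tiled many times: the whole can be reproduced from a portion of it.
tiled = bytes(range(16)) * 256

# Seeded pseudo-random bytes: no structure for the compressor to exploit.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(4096))

print(f"tiled: {compression_ratio(tiled):.3f}")
print(f"noise: {compression_ratio(noise):.3f}")
```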
The function of predictability may also rise almost at once, e.g. when seeing a flat blue banner.
Also, when predictability saturates for a level of the hierarchy, i.e. when that level can predict the future with precision above a threshold, the hierarchy may grow and start searching for more complex patterns (in my terms: construct higher-level virtual universes/simulators of universes).
Right, the hierarchy may grow, and probably should try to grow all the time, but the upper level won't be reliable until the base level stabilizes.
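The saturation and growth rules described above can be sketched as two small predicates. This is a hypothetical Python sketch: `should_skip`, `reliable_depth`, the window length, the minimum gain, and the thresholds are all illustrative assumptions, and "precision" is assumed to be a number between 0 and 1 that is already being measured somehow.

```python
from __future__ import annotations


def should_skip(precisions: list[float], window: int = 3,
                min_gain: float = 0.01, ceiling: float = 0.99) -> bool:
    """Decide whether attention to a pattern can stop.

    `precisions` holds prediction precision (0..1) after each new chunk of
    data, oldest first. Attention stops when precision has reached the
    ceiling, or when it gained less than `min_gain` over the last `window`
    updates (saturation). Flat precision, as with noise, also triggers a skip.
    """
    if precisions and precisions[-1] >= ceiling:
        return True
    if len(precisions) <= window:
        return False  # too little history to judge saturation
    gain = precisions[-1] - precisions[-1 - window]
    return gain < min_gain


def reliable_depth(level_precisions: list[float], threshold: float = 0.9) -> int:
    """Count levels, from the base up, whose precision reaches `threshold`.

    Levels above this depth may exist (the hierarchy can try to grow at any
    time) but are treated as unreliable until the levels below stabilize.
    """
    depth = 0
    for p in level_precisions:
        if p < threshold:
            break
        depth += 1
    return depth


print(should_skip([0.1, 0.3, 0.5, 0.7]))         # still improving
print(should_skip([0.2, 0.21, 0.2, 0.21, 0.2]))  # saturated, e.g. noise
print(reliable_depth([0.95, 0.92, 0.4]))
```

When `reliable_depth` equals the number of levels, every level is saturated, which under these assumptions is the point at which a new level would start searching for more complex patterns.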
Todor
Posted by Todor Arnaudov, last edited May 12, 2010 4:08 PM
> Thanks for the links! I've missed Leonid and yes, I do have to check out the "raw scientific input" about the columns...
I like Perlovsky’s explanation of the “knowledge instinct”, but his “Dynamic Logic” doesn’t seem to be very deep.
> Then to you is behavioral/conditioning part - this makes sense. I think that is probably another hierarchy (what you call hierarchy of needs), where lower brains (brainstem, amygdala, hypothalamus) are higher levels of *control* (basic needs) than the highest level of cognitive hierarchy, and the direction is evolutionary backwards.
There’s no “control”, - computer analogies are misleading. Analogical thinking is a blunt instrument; try to avoid it. Higher motives are the ones that ultimately win over, not the ones that develop earlier. All brain areas have an inherited structure that determines their initial (instinctive) operation. Brain stem, amygdala, hypothalamus develop earlier, & their instincts dominate at first. Basic curiosity is a “cortical instinct”, likely driven by the structure of minicolumn, & neocortex is the last to fully develop. But that’s a genetically determined part. Postnatally, motivation develops by competitive conditioning of inherited motives & acquired value-loaded patterns in all of those areas. Conditioning is reinforcement of coincident (instrumental) & suppression of counter-incident (interfering) motives & stimuli patterns by all other motives. These patterns become acquired motives, but they're *not* lower than the original ones. Higher or lower is a matter of strength, not of origin. Cortical cognition discovers more general patterns, which get relatively stronger because they stay instrumental longer. And curiosity itself is instrumental for the discovery of all these patterns, so it ultimately becomes the top value & suppresses all others. You don't need any subcortical drives even to start, unless you have human physiology to take care of. But basic curiosity (I don't know the full "structure" of it yet) is only a start too. Introspective cognition derives higher orders of correspondence, developing things like mathematical curiosity.
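The claim that motives rank by acquired strength rather than by origin can be illustrated with a toy simulation. This is a hypothetical sketch, not the author's model: the symmetric update rule, the rate, and the motive names are all assumptions. A motive that starts strong but is only occasionally instrumental ends up suppressed below one that starts weak but stays instrumental in every episode.

```python
from __future__ import annotations


def condition(strengths: dict[str, float], episodes: list[set[str]],
              rate: float = 0.1) -> dict[str, float]:
    """Toy competitive conditioning (hypothetical update rule).

    In each episode, motives in the instrumental set are reinforced by `rate`;
    all other motives are suppressed by `rate`, floored at zero.
    """
    s = dict(strengths)
    for instrumental in episodes:
        for motive in s:
            if motive in instrumental:
                s[motive] += rate
            else:
                s[motive] = max(0.0, s[motive] - rate)
    return s


# "hunger" starts strong but is instrumental only in every third episode;
# "curiosity" starts weak but is instrumental in all of them.
episodes = [{"curiosity", "hunger"} if i % 3 == 0 else {"curiosity"} for i in range(30)]
final = condition({"hunger": 1.0, "curiosity": 0.2}, episodes)
print(final)
```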
> OK. :) At this point my terminology is “higher level virtual universes”, “higher level virtual simulators of virtual universes”, “higher level of control”.
I don't like your terminology. It's "fluffy": redundant, pretentious, fuzzy & misleading. That's a bad taste in science, as distinct from art. Artist thrives on analogical confusion, Scientist abhors it & craves analytical clarity. Make your choice. Your interpretations “sound” wrong on many levels, but you don’t really define your terms, to the extent that they're different from mine. If you think they’re more expressive, or your conclusions are different from mine, please explain how. Try to think more & talk less, you know, review & rewrite your reply for a few days before posting it :).
> In order to interact/interface with the lowest level universe for the system, higher level must decompress its representations throughout the hierarchy, and each level down adds details, making the picture increasingly sharper.
I don't know if there's any need for decompression, higher levels may only adjust focus (input span & resolution) for lower levels. Patterns of different scope / generality must be kept separate to avoid "paradoxes" :).
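The difference between decompressing and adjusting focus can be sketched as downward feedback that carries only parameters. In this hypothetical Python sketch, `Focus`, `span`, `drop_bits`, and `lower_level_matches` are illustrative names; "resolution" is assumed to mean discarding low-order bits, and match is assumed to be the smaller of two compared values. A higher level sends down a new `Focus`; no raw data is reconstructed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Focus:
    """Hypothetical downward feedback: a higher level only tunes parameters."""
    span: int       # how many preceding inputs each input is compared to
    drop_bits: int  # resolution: low-order bits discarded before comparing


def lower_level_matches(inputs: list[int], focus: Focus) -> list[int]:
    """Summed match of each input against the preceding `span` inputs.

    Match between two quantized inputs is assumed to be their minimum.
    """
    q = [x >> focus.drop_bits for x in inputs]
    totals = []
    for i, x in enumerate(q):
        window = q[max(0, i - focus.span):i]
        totals.append(sum(min(x, old) for old in window))
    return totals


narrow = lower_level_matches([8, 9, 8, 9], Focus(span=1, drop_bits=0))
coarse = lower_level_matches([8, 9, 8, 9], Focus(span=2, drop_bits=1))
print(narrow, coarse)
```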
>>> and proximity and recentness may need to be more abstract.
>>That's already explained in the knol.
>Just going up in the hierarchy?
Up *&* down, that's what the hierarchy is all about. But neither direction is fully explained in the knol (even to the extent that I understand them), so use your imagination.
> My first intuitive guess is predictive range, compression ratio; I think it's related to minimum message length/Kolmogorov's complexity...
It all sounds vaguely relevant, but defining a criterion means quantifying it. It’s not match so it’s not an actual compression, or even future compression. “Expected”, “predicted”, "partial" - how do you derive those things from pixel-level inputs? Because if you can't do it there, you can't do it anywhere, - combinatorial explosion gets you. I’ve shown how to quantify a basic match, & that still stands as an initial criterion. How do you derive from it a higher-order criterion that drives exploration? You gave a bunch of higher-level examples, but I am not even going to bother with them, that's not where I operate.
If you want to get constructive (meaningful), try to formalize comparing a single-integer input to a fixed-length continuous sequence of older inputs, & then form its prediction over the next sequence of the same length & direction.
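One possible reading of this exercise, as a minimal Python sketch. Assumptions, none of them stated in the thread: match is the smaller of the two compared integers and difference is their signed difference; the window is ordered oldest first; and the projection rule assumes the change observed over k steps back repeats over k steps forward. `Comparison`, `compare_to_window`, and `project` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    distance: int  # how many steps back the older input lies
    d: int         # difference: new input minus older input
    m: int         # match: assumed here to be min(new, older)


def compare_to_window(new: int, older: list[int]) -> list[Comparison]:
    """Compare one integer input to a fixed-length window of older inputs.

    `older` is ordered oldest first, so older[-1] is the input just before `new`.
    """
    n = len(older)
    return [Comparison(distance=n - i, d=new - old, m=min(new, old))
            for i, old in enumerate(older)]


def project(new: int, comps: list[Comparison]) -> list[int]:
    """Predict the next window of the same length, in the same direction.

    Hypothetical rule: the value k steps ahead is new + d_k, i.e. the change
    seen over k steps back is assumed to repeat over k steps forward.
    """
    return [new + c.d for c in sorted(comps, key=lambda c: c.distance)]


comps = compare_to_window(18, [10, 12, 14, 16])
print("total match:", sum(c.m for c in comps))
print("prediction:", project(18, comps))
```

For a steady ramp this projection continues the ramp; for a constant input every difference is zero and the prediction is flat.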
Posted by Boris Kazachenko, last edited May 13, 2010 7:22 PM
I realized an important difference: a different POV on the mind. My theory was not inspired by brains, the minicolumn hypothesis or the like; it was a sketch/direction aimed at a unifying theory of mind and of systems evolution in the Universe.
Attempts to fit it exactly to brain regions cause a mess: there are overlaps and similarities with HTM, the minicolumn hypothesis, your theory and brains, but mine is different (digital, sketchy) and was not as precisely defined. I see there were implied things that were not clearly specified and separated.
My speculations were based on observations of causality/determinism (causal interdependency) and of the tendency of evolving systems toward prediction: repetitive and predictive behavior with ever higher precision, resolution and range. "Control" in my writings was meaningful: it is a system's (module's) capability to predict and cause the future of what it controls with a certain probability/precision, where control is formalized as a write to memory, i.e. making certain target changes in an output environment.
The mind is itself a compound/complex "control unit", aiming to maximize its capability to predict (imagine) and cause, while the Universe is the ultimate control unit, "predicting" and causing everything at the maximum possible resolution, including the mind itself, which is a "virtual sub-universe".
>There’s no “control”, - computer analogies are misleading.
My "mind sketch" was digital.
>(...) All brain areas have an inherited structure that determines their initial (instinctive) operation.
>Brain stem, amygdala, hypothalamus develop earlier, & their instincts dominate at first. Basic
>curiosity is a “cortical instinct”, likely driven by the
>structure of minicolumn, & neocortex is the last to fully develop.
> (...) But basic curiosity (I don't know the full "structure" of it yet) is only a start too. Introspective
>cognition derives higher orders of correspondence, developing things like mathematical curiosity.
Thanks, I see.
>I don't like your terminology. It's "fluffy": redundant, pretentious, fuzzy & misleading.
>That's a bad taste in science, as distinct from art. Artist thrives
>on analogical confusion, Scientist abhors it & craves analytical clarity. Make your choice.
>Your interpretations “sound” wrong on many levels, but you don’t really define your terms, to the extent that they're different from mine. If you
>think they’re more expressive, or your conclusions are different from mine, please explain how.
I'd make both choices. :)
Sometimes your definitions remind me of my own observations, and my interpretations are related to my theory and make sense *there*. Right, this is a mess.
Match, comparison, the difference between predicted and expected, compression, a basic algorithm that learns and collects other algorithms and data, complexity growth, and a sort of algorithmic complexity (but reinvented) are some of the terms and topics in my writings. I'm not ready with a solid, compressed explanation yet, though.
>I don't know if there's any need for decompression, higher levels may only adjust focus (input span &
>resolution) for lower levels. Patterns of different scope / generality must be kept separate to avoid
>"paradoxes" :).
Does adjusting focus mean:
- selecting/allowing comparison with more recorded samples, a sort of widening of the span: a more general comparison.
- lowering the resolution, which allows recognition of fuzzy/pixelated images and results in a higher match ratio: also a more general comparison.
I mean this: a word or a concept can be recorded and operated on with a few bits at the highest levels, but that is just a label. It makes sense in a high-level virtual universe (imagination), but it needs much more raw data to be derived from a low level and to be expressed back there.
>Up *&* down, that's what the hierarchy is all about. But neither direction is fully explained in the
>knol (even to the extent that I understand them), so
>use your imagination.
OK
>It all sounds vaguely relevant, but defining a criterion means quantifying it.
>It’s not match so it’s not an actual compression, or even future compression.
>“Expected”, “predicted”, "partial" - how do you derive those things from pixel-level inputs?
I understand perfectly that it should start from the lowest level and that the mechanics must be precisely defined. That is what I'm supposed to do "when I manage to concentrate"...
>I’ve shown how to quantify a basic match, & that still stands as an initial criterion.
>How do you derive from it a higher order criterion that drives exploration?
>(...)
>try to formalize comparing a single-integer input to a fixed-length continuous sequence of older
>inputs, & then form its prediction over the next sequence of the same length & direction.
Thanks for the task! I may take a break now and will be back later.
Todor
Posted by Todor Arnaudov, last edited May 17, 2010 6:54 AM
> I realized an important difference - a different POV to a mind. My theory was not inspired by brains, minicolumn hypothesis or so,
Me neither, I am a generalist.
> it was a sketch/direction/aimed at a unifying theory of mind and systems evolution in Universe. My "mind sketch" was digital.
I suspect it was an attempt to project your computer experience into areas where it doesn't belong. Very typical for AI tinkerers, - lots of ambition, but no clue.
> Attempts to fit it exactly to brain regions causes a mess - there are overlaps and similarities to HTM, minicolulmn hypothesis, your theory, brains, but mine is different, digital, sketchy and was not as precisely defined
You don't really understand things you can't define. Your attachment to ill-formed assumptions of your youth, as well as constant self-promotion, is probably a sign of insecurity.
> I'd make both choices. :)
That's not making a choice. You'll do neither well, & even “well” is useless here, only the-best-in-the-world will do.
> Does adjusting focus mean:
> - selecting/allowing comparison with more recorded samples is sort of widening of span - more general comparison.
> - lowering the resolution allows recognition of fuzzy/pixelized images and results at a higher match ratio - a more general comparison.
Neither, both work "upward", focusing is downward. Guess again.
> I mean this: a word, a concept can be recorded and operated with a few bits from highest levels, but this is just a label, it makes sense in a high level virtual universe (imagination), but it needs much more raw data in order to be derived from a low level and to be expressed back there.
Raw data is what you start with. It’s lost during selective elevation & you won’t regain it by decompression. Patterns on every level are defined by their search range. “Expressing” high-level patterns on lower levels will only create confusion about their “true” range (& you’re confused enough :)). There’s no need for it anyway: higher-level “expectations” are compared to lower-level “experience” when the latter is selectively elevated, not vice versa.
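The point that raw data is lost during selective elevation can be made concrete. In this hypothetical Python sketch, consecutive inputs are compared (match assumed to be the smaller value, difference the signed change), runs of comparisons whose match reaches a fixed threshold are elevated as summaries, and everything else is dropped. The threshold, the `Pattern` fields, and the name `elevate` are assumptions for illustration. Two different input sequences can yield an identical elevated pattern, so no decompression could tell them apart.

```python
from __future__ import annotations

from typing import NamedTuple


class Pattern(NamedTuple):
    start: int   # index of the first comparison in the run
    length: int  # number of comparisons in the run
    M: int       # summed match
    D: int       # summed difference


def elevate(inputs: list[int], threshold: int) -> list[Pattern]:
    """Compare consecutive inputs; elevate only runs whose match >= threshold."""
    patterns = []
    start = None  # index where the current run began, or None outside a run
    M = D = 0
    for i, (a, b) in enumerate(zip(inputs, inputs[1:])):
        m, d = min(a, b), b - a
        if m >= threshold:
            if start is None:
                start, M, D = i, 0, 0
            M += m
            D += d
        elif start is not None:
            patterns.append(Pattern(start, i - start, M, D))
            start = None
    if start is not None:
        patterns.append(Pattern(start, len(inputs) - 1 - start, M, D))
    return patterns


print(elevate([5, 5, 5, 1, 1, 9, 9, 9], threshold=4))
print(elevate([5, 5, 5], 4) == elevate([5, 6, 5], 4))  # True: raw inputs are not recoverable
```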
> I perfectly understand that it should start from the lowest level and the mechanics must be precisely defined. That is what I'm supposed to do "when I manage to concentrate"...
You won’t, until & unless you change lifestyle. You need a boring life.
Posted by Boris Kazachenko, last edited May 17, 2010 9:09 PM
>I suspect it was an attempt to project your computer experience into areas where it doesn't belong. Very typical for AI tinkerers, - lots of ambition, but no clue.
Don't forget imagination and creativity - my kingdom :) - "universal simulators of virtual universes" are engines of imagination. Indeed I think art gives many clues about intelligence and the big picture of mind.
>You don't really understand things you can't define.
>Your attachment to ill-formed assumptions of your youth,
>as well as constant self-promotion, is probably a sign of insecurity.
Insecure - I am, this is correct. I need to make a breakthrough in order to stabilize life, income and start feeling more secure: a successful novel, a beautiful film with touching performance or so, and it's frustrating to balance time, wait and be unable to raise the resources needed.
Self-promotion - I don't have real personal PR, an agent or so, I'm not acknowledged yet. I must attract followers and make contacts somehow; I want to start a business out of my art after all. I'd prefer somebody else to promote me.
Youth assumptions - I want to focus, understand and clear them out, before throwing them away. I'm attached, because I haven't finished with this.
See you in the next iteration!
T
Posted by Todor Arnaudov, last edited May 18, 2010 9:03 AM
Art = fluff. You love fluff, & crave attention, the rest is just an excuse.
Trying to focus on "understanding" the assumptions made when you understood a lot less than you do now is pathetic. You need to understand the subject matter - cognitive algorithm.
Posted by Boris Kazachenko, last edited May 18, 2010 4:18 PM
I appreciate your badass wise sentences, but I like both art & science and wanted and want to understand art as a cognitive process as well, it's a part of the same machinery. Re-understanding operation is in progress, new understanding is not in vain, this won't take much; and one of my immediate next AGI tasks is to manage to think and write about cognition in your terms - will teach your stuff and your comments to my students on Friday.
BTW, I believe a little bit of promotion may help even such a detached person like you. You agree that collaboration is the best "cognitive accelerator" and I'm sure at least some of the famous and smart AGI people such as Schmidhuber would spend some time with your articles and may make others consider them.
All needed is to let him/them know about you somehow.
Posted by Todor Arnaudov, last edited May 22, 2010 2:33 PM
> I appreciate your badass wise sentences, but I like both art & science and wanted and want to understand art as a cognitive process as well, it's a part of the same machinery. Re-understanding operation is in progress, new understanding is not in vain,
Generalization is a reduction. Yes, everything you know is related to it, but you won't get anywhere by piling things up.
> one of my immediate next AGI tasks is to manage to think and write about cognition in your terms - will teach your stuff and your comments to my students on Friday.
Holding my breath :)
> BTW, I believe a little bit of promotion may help even such a detached person like you. You agree that collaboration is the best "cognitive accelerator" and I'm sure at least some of the famous and smart AGI people such as Schmidhuber would spend some time with your articles and may make others consider them.
I appreciate your appreciation (& promotion), but you forgot the second best accelerator. The reason I am, IM!HO, a lightyear ahead of anyone else is that I gave up on recognition & collaboration with tinkerers+fluffers that populate the field. It's not what you got, it's how you use it. Smarts won't do any good if you lack motivation to focus on the only problem that matters. Famous people have their blinders on. They're too distracted by, & protective of, their fame to pay attention to some security bum who tells them that their lifework is a pile of irrelevant crap.
Yes, collaboration would be great, but... I despair. Anyone who knows how to punch the right keywords into Google will find me (there's *nothing* else), & those who don't are likely to be more trouble than help.
Posted by Boris Kazachenko, last edited May 22, 2010 8:06 PM
How to filter out the improbable seems to me to be the key
Generation of a plethora of possible near-futures seems possible, but how to filter out the staggering majority which are improbable, or illegal in terms of the physical laws of the universe, seems complicated. Also, how to collapse possibilities that are so similar as to be essentially the same probabilistically? Then, your discussion of probability ranking the remaining possibilities makes sense.
In any case, it would be a delight to hear from you Vitya/Burya. rick at bunkerplanet dot com.
Last edited Jul 17, 2009 5:12 AM
This is a bit backwards, Rick: I propose to *discover* possibilities (patterns), not to generate them. "Generative" bias is typical for a programmer :). The patterns are formed by comparing lower-level inputs, & projected to the extent of cumulative match discovered by such comparison. This is how physical laws are discovered, & it also answers your second question. This knol has a more detailed discussion of the process, but I guess it's unbearably abstract.
Intelligence is a subject everyone feels competent to discuss, because everyone has it. Yet, no one can reduce it to a formal procedure, or even formally define its purpose. I feel such reduction requires an extreme "inductive" cognitive bias, the opposite of the "deductive" bias selected for & cultivated by Computer Science & Math.
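The sketch below makes "discovered, not generated" concrete. It is a minimal Python illustration of the first step: compare consecutive inputs, then group the results into patterns that carry their cumulative match. It is not the author's implementation, and it rests on these assumptions:
- match between two integer inputs is taken as the smaller of the two;
- `ave` is a hypothetical fixed filter;
- the name `form_patterns` is invented for illustration.

Projection itself is left out. The sketch only shows where the cumulative match that would scale a projection comes from.

```python
def form_patterns(inputs, ave):
    """Compare each input to the next and group the comparisons into patterns.

    A pattern is a run of consecutive comparisons whose match is on the same
    side of `ave`; it keeps its sign, its length and its cumulative relative match.
    Assumption: match between two integer inputs is the smaller of the two.
    """
    patterns = []
    for a, b in zip(inputs, inputs[1:]):
        rel = min(a, b) - ave   # match relative to the fixed filter
        positive = rel > 0      # a relative match of zero is grouped as negative
        if patterns and patterns[-1]["positive"] == positive:
            # same sign as the current pattern: extend it
            patterns[-1]["length"] += 1
            patterns[-1]["match"] += rel
        else:
            # sign changed: start a new pattern
            patterns.append({"positive": positive, "length": 1, "match": rel})
    return patterns


print(form_patterns([8, 9, 9, 2, 1, 7], ave=5))
# [{'positive': True, 'length': 2, 'match': 7}, {'positive': False, 'length': 3, 'match': -11}]
```

Nothing is proposed in advance here. Each pattern's extent and strength come entirely from comparing the inputs themselves.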
AGI
Interesting. Very nice to see more people working on Artificial General Intelligence.
I have written a few articles on self-improving AI here: http://seedai.blogspot.com/2007_08_01_archive.html
In those, I agree with much of what is written here, for example "If we want to talk about improving programs, we have to define what it means to improve one's intelligence, and thus what it means to be intelligent. We want intelligent systems to be useful. Useful intelligence is, just as science, about prediction, planning and pattern recognition. These are all so intertwined as to be more or less the same thing."
You are very welcome to read and post your thoughts on my articles.
Last edited Aug 11, 2008 10:28 PM
Thanks David!
You're right, it sounds very similar on a high level, & I am sure there are many people who'd agree with the definition. But I don't know of anyone who used it to derive a universal, low-level, quantitative criterion to select inputs & algorithms. The key is to start from the beginning: raw sensory inputs, & "test" their predictive value, in the process discovering more & more complex patterns. That's what scalability is all about: if you can't evaluate pixels, it'll be super-exponentially more difficult to start from more complex data. That's why I think Cyc, NLP, & high-level approaches in general are hopeless for AGI.
I am sorry, but your "Intelligence test" idea, besides being entirely hypothetical & presumably externally administered, has it exactly backwards. Just like many Algorithmic Learning approaches, you want to generate patterns & algorithms, instead of discovering them in the real world. Quite simply, we predict from experience: these patterns & algorithms will have *no* predictive value beyond mere chance, unless they're derived from the environment. Notice that the difference between patterns & algorithms is strictly in the origin: the former are discovered & the latter are "invented".
Francesco Lentini:
How about semantics?
Interesting article. Have you seen my "The machine to read"?
Last edited Jan 23, 2011 5:39 PM
Thanks Francesco!
Semantics (meaning) must be learned from experience, starting from sensory inputs. What I suggest is a conditionally iterative learning algorithm, & syntax here is simply a record of operations performed by this algorithm on a given set of inputs. Such a record is necessary to maintain comparability (readability) across inputs of various "depth" of processing. This processing is a form of compression, & recorded syntax makes it possible to decompress data.
Thanks for the pointer, I'll take a look.
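Here is a toy Python sketch of "recorded syntax makes it possible to decompress data." It works under loudly stated assumptions:
- The only operation is replacing an input by its difference from the previous input, and only when that difference is smaller in magnitude.
- The per-value tags `"i"` (raw input) and `"d"` (difference) stand in for the recorded syntax.
- `compress` and `decompress` are hypothetical names, not the author's algorithm.

Without the tags, a stored `3` could not be read back, because it might be either a raw input or a difference. That is the comparability problem described above.

```python
def compress(inputs):
    """Store each input as a difference from the previous one when that is
    smaller in magnitude, tagging every value with the operation used."""
    encoded = []
    prev = 0  # the first input is compared to 0, so it is always stored raw
    for x in inputs:
        diff = x - prev
        if abs(diff) < abs(x):
            encoded.append(("d", diff))  # syntax "d": stored as a difference
        else:
            encoded.append(("i", x))     # syntax "i": stored as the raw input
        prev = x
    return encoded


def decompress(encoded):
    """Read the recorded syntax tags to reverse each operation."""
    out = []
    prev = 0
    for tag, value in encoded:
        x = prev + value if tag == "d" else value
        out.append(x)
        prev = x
    return out


packed = compress([10, 11, 11, 3])
print(packed)              # [('i', 10), ('d', 1), ('d', 0), ('i', 3)]
print(decompress(packed))  # [10, 11, 11, 3]
```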
Posted by Boris Kazachenko, last edited Jul 27, 2008 6:40 PM
I agree with your thesis. Well, general intelligence must be scalable, or self-improving. Nevertheless, I am not sure that meaning *must* be learned from experience. Meaning (or a certain level of meaning) would be an intrinsic property of a message, and my algorithm Semantic Browsing http://www.intellibook.net/semanticbrowsing would really show this. I have collected a lot of examples (browsed texts) on my site.
Returning now to your general intelligence definition, the focal point is the criterion of improvement. Can you explain in more detail what this criterion should be, and/or give a practical example?
Posted by Francesco Lentini, last edited Jul 27, 2008 12:45 PM
The meaning "must" be learned, either by the algorithm or by the programmer's own "learning algorithm". I am sure the latter is common among people, & some of it is incorporated into natural language syntax, thus becoming "an intrinsic property of a message". Other than that, you can try to build a universal ontological database (as in Cyc) & use it to locate the "meaning" of individual terms & phrases. A lot of people work on "semantic search", the "semantic web", & NLP in general, but this is not my focus & I am ill-equipped to evaluate your algorithm.
Appreciate your interest in my "focal point". The criterion for intelligence is *predictive correspondence concentration*, or relative cumulative match of expectations to the following inputs. I've defined match on the lowest, single-variable, level. It's the same on higher levels, where inputs are multi-variable sequences. As long as you synchronize the syntax of the comparand sequences, the total match is the sum of corresponding variables' matches between the sequences. I suppose you're looking for NL-level examples, & that's where it gets extremely ambiguous. That sort of data went through a huge number of process iterations, & you have to rely on intuition to track it.
Take a look at "On Intelligence" by Jeff Hawkins, he is a lot better at high-level examples than I am.
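Here is a minimal Python sketch of the criterion as stated above. The total match between two multi-variable inputs is the sum of the matches of their corresponding variables. The cumulative match sums that total over a sequence of expectations and the inputs that follow them. The sketch relies on assumptions that are not taken from the thread:
- inputs are dicts of named, non-negative values;
- "synchronizing syntax" is reduced to comparing only the variable names that both inputs share;
- single-variable match is taken as the smaller of the two values;
- all helper names are hypothetical.

```python
def variable_match(a, b):
    """Single-variable match; assumed here to be the smaller of two non-negative values."""
    return min(a, b)


def total_match(expected, observed):
    """Sum of per-variable matches over the variables both inputs share.

    Restricting the sum to shared names stands in for synchronizing the syntax
    of the two comparands; variables present in only one input are ignored.
    """
    shared = expected.keys() & observed.keys()
    return sum(variable_match(expected[k], observed[k]) for k in shared)


def cumulative_match(expectations, inputs):
    """Cumulative match of a sequence of expectations to the inputs that follow them."""
    return sum(total_match(e, i) for e, i in zip(expectations, inputs))


expected = {"brightness": 12, "length": 4, "slope": 3}
observed = {"brightness": 10, "length": 5, "hue": 7}
print(total_match(expected, observed))  # 14: min(12, 10) + min(4, 5)
```

Real syntax synchronization would have to align variables produced by different processing depths. A shared-key intersection is just the simplest stand-in for that.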
Posted by Boris Kazachenko, last edited Jul 27, 2008 7:33 PM
I notice that http://www.intellibook.net/intellibook10/ is not working anymore; it would have been Lentini's article. I have argued elsewhere (and in vain) that any algorithm would have to be "seeded" with real-world "statistics"; vision in particular has been shown to be heavily informed about useful and usual colors and shapes, while it should not be necessary to reproduce human handicaps like the difficulty of reading mirrored text.
What I think is less understood is how "thinking" will also need its own set of "built-ins": patterns, concepts and processes that it would be unfair to expect an AGI to work out bit by bit. I am working on isolating these built-ins, and would also like to offer a counterexample on the limits of reverse-engineering input bits: imagine someone sends you the digits of pi's decimal expansion and, just to trick you, starts from an arbitrary position, let's say from the 100th onwards. It would be "intelligence" to come up with this explanation and predict the sequence ad infinitum, but what kind of IQ is required? I'd say infinite; the problem is intractable, which would suggest that there is no intelligence at all "in general". Intelligence is a response to a constrained environment: it is about straight lines and circles and a few "primary colors" and the tendency of things to change at manageable rates and people having limited emotional states etc. Working with bits supposedly coming from an unconstrained/unknown environment is a recipe for failure, methinks.
Posted by Anastasios Tsiolakidis, last edited Jan 21, 2011 3:32 PM
Right, any intelligence would be useless in an effectively random environment. But our real environment is plenty constrained already, first by entropy growth, then by evolution, now by technology. Constraining it even further is a piece of cake, - all you have to do is slow down time. I think you’re looking for easy problems because you can’t deal with the hard one, - scalable pattern discovery in an environment that our own intelligence handles easily.
Posted by Boris Kazachenko, last edited Jan 21, 2011 5:12 PM
Let's just say that I favor problems where "environmental statistics" are plentiful or even complete (in toy problems). In addition to natural language, I would single out these two problem domains: 1) language development between two agents, i.e. using a communication channel between 2 protoAGIs to cooperate, or more accurately to have AGI.b do "what AGI.a says, not what it does". This also implies an independent "observation channel". What does it take for AGI.b to turn left when receiving the message "l", starting from blank slates, tabula rasa? Unintentionally you may have received insight into one of zoology's sad stories: why intelligent animals are so bloody hierarchical!
2) the "embedded scientist": getting a protoAGI to predict/reverse-engineer its environment while fully exposed to it, thus working around the problem "the observer changes the observation", and perhaps having to "fight for its life" as well. This needs a simulation of a different kind than your average game engine, probably a cellular automaton implementation. The real shoulders of giants for human intelligence are not so much Euclid and Einstein but the biological heritage which enables us to stay alive long enough, as individuals and civilizations, to slowly unravel the mystery of the world; it would be a miracle if an embedded intelligence in the Game of Life achieved the state where it can just wait and formulate algorithms. A big spanner in the works remains the "unsolved" problem of society and synergy: we have found a way, it seems, to benefit from millions of people who are mutually clueless, meaning they have different areas of expertise. Obviously we are not perfect at this, we may have failed to fully integrate the genius of, say, Tesla and Jesus, but we are better than any program I have seen. (On a tangent, I should add that it is anything but self-evident that we are benefiting from our synergy in any deeper sense; I simply refer to the build-up of science and technology.)
Posted by Anastasios Tsiolakidis, last edited Jan 22, 2011 7:30 AM
Anastasios, thanks for reporting, the service has been restored! Go to www.intellibook.net and click "the machine to read".
Well, at the moment this is my response to your clever chatter. Please enter in the box a text written in ANY language (Latin alphabet) from 1500 to 15000 chars in length, click a button and see what happens. For example, here is a RESUMEE of the "Executive Attention" article by Boris.
Attention is a mechanism that focuses cognitive search.
Attention span as discussed here is not a simple duration of focus on a subject.
Rather, it’s a scope of cognitive search (level of generalized experience) that determines priorities, - selects subjects for focused ATTENTION.
Deliberate control over the focus of one's ATTENTION will be the most profound revolution yet, - it will change what we want out of life.
Precisely, this RESUMEE is based on the first 8K of the article, because you, as Guest, may not exceed this limit. Let me know if you want a registered (user payable) account.
Hi Boris, can you do more and better?
Posted by Francesco Lentini, last edited Jan 23, 2011 9:09 AM
Thanks Francesco!
Not a bad summary, but it missed the meat of the knol, which is in the “Practical Implications” part. The problem with your approach is that a summary should be an introduction to an article, & a good author would write his own (the knol starts with one). If your algorithm can do better than the author, then it should be writing its own articles :).



Boris,
I spend most of my time focusing on "real AGI issues", but I don't consider this list the best place to do that.
Focusing on real AGI issues is best done within some particular paradigm and approach, within a community of people who have provisionally agreed to work within that approach to see where it leads. This list is so heterogeneous in nature, that it's not really possible to pursue in-depth AGI conversations here -- because as soon as you get started discussing a set of detailed ideas meaningful within one broad approach to AGI, the discussion gets sidetracked into foundational discussions with folks who don't like that broad approach.
I tried to resolve this problem a few years ago by starting an AGI forum site, but pretty much nobody came, so I killed it after a while...
So I've found private discussions on deep AGI issues much more productive... though this list is still useful as a generic "meeting ground" for various random AGI-interested people...
Anyway, it would be a big mistake to judge the level of overall discussions btw AGI researchers in the world, based on the discussions on this list ;p
... ben g
On Sun, Dec 18, 2011 at 8:08 PM, Boris Kazachenko wrote:
A thread is hijacked & turned into a pissmatch because no one here has an attention span to focus on real issues.
In terms of science in general, I definitely agree with Ben, - an ability to work alone is a plus, but other things are more important.
But in terms of formalizing general intelligence, it's not a plus, it's an AND. One must work alone because no one else is working.
Boris:
Ben,
I appreciate your efforts (including this list) & didn't mean to blame you for sidetracking the thread. Heck, if no one wants to talk business, why not... Like you said, this list is only useful for introducing an approach & updates thereto.
> I spend most of my time focusing on "real AGI issues", but I don't consider this list the best place to do that.
The best place to focus is one's own website... & I don't see much focus on your blog.
> So I've found private discussions on deep AGI issues much more productive...
You only get private discussions *after* you introduced people to your approach. And you restrict your audience down to nothing unless that introduction is public.
> Anyway, it would be a big mistake to judge the level of overall discussions btw AGI researchers in the world, based on the discussions on this list ;p
That's not how I judge it. I follow links & do searches on *unavoidable* keyword combinations. The level of private discussions can't be much higher than that of public introductions.
From: Ben Goertzel
Sent: Monday, December 19, 2011 10:12 AM
To: AGI
Subject: Re: [agi] Intelligence as a cognitive algorithm.
Boris wrote,
> The best place to focus is one's own website... & I don't see much focus on your blog.
OpenCog has its own website, which is not updated frequently enough, but does focus on OpenCog ;)
My personal blog is more wide-ranging, as you've noted. I spend a majority but not 100% of my time on AGI -- partly because I need to earn a living, and partly because that's just the way my mind works ... I guess we all need to strike our own balance between purposeful focus on one thing, and broad-ranging exploration...
> You only get private discussions *after* you introduced people to your approach. And you restrict your audience down to nothing unless that introduction is public.
Sure, and this list is good for those sorts of introductions...
> The level of private discussions can't be much higher than that of public introductions.
Well, private discussions can get much more in-depth both conceptually and technically.
But I guess it's true that, if you reject someone's approach based on a rough description (because it doesn't agree with your own intuition), you would probably still reject it after hearing more of the conceptual and technical details. Maybe you mean something like that...
Boris:
Ben,
> But I guess it's true that, if you reject someone's approach based on a rough description (because it doesn't agree with your own intuition),
> you would probably still reject it after hearing more of the conceptual and technical details. Maybe you mean something like that...
From OpenCog "Theory" section: "OpenCog is a diverse assemblage of cognitive algorithms, each embodying their own innovations — but what makes the overall architecture powerful is its careful adherence to the principle of cognitive synergy."
There's nothing for me to reject. You only know what's "synergetic" after experimentation, so your overall "theory" is trial & error. That took evolution >3B years on a planet-size quantum mechanical "computer".
From: Ben Goertzel
Sent: Monday, December 19, 2011 12:03 PM
To: AGI
Subject: Re: [agi] Intelligence as a cognitive algorithm.
No... in OpenCog we're trying to engineer synergy between a specific collection of cognitive processes,
architected according to specific principles, and there's a lot of theory underlying each of these processes and their interactions.
There is a certain amount of trial and error involved but also a lot of specialized theory...
ben
Boris:
But no overall theory.
Ben:
Hmmm...
Well, there is a high-level overall theory underlying OpenCog, which I wrote about at length during 1993-2006 in various books, e.g. The Hidden Pattern which gives a summary of many aspects (only semi-technically)
Then there is a lot of detailed theory underlying the different cognitive processes in the OpenCog design, and their interactions
However, while this detailed theory appears to be **compatible with** the high-level theory, it's not **derived from** the high-level theory.... This is a shortcoming.
However, I prefer to accept this shortcoming, than to adopt an alternate approach whose underlying theory appears to me fundamentally conceptually inadequate (which is my current reaction to your knol, though I must temper that with the comment that it's obviously a very compacted representation of your ideas, so there may be way more to your thinking and approach than I limned from that page...). I just don't buy the idea that hierarchical pattern recognition is the whole story, or even 40% of the story, for human-level AGI...
One of my ongoing compromises is how to divide time btw building theoretical bridges between the high-level and detailed theory of my approach, versus guiding the practical implementation. I enjoy the theoretical aspect more, but feel the practical work is probably more valuable at this stage...
Boris:
> However, while this detailed theory appears to be **compatible with** the high-level theory, it's not **derived from** the high-level theory.... This is a shortcoming.
Right, you can't "derive" much from that hand-waving :). You need a formal definition of a "pattern", & I have it.
> However, I prefer to accept this shortcoming, than to adopt an alternate approach whose underlying theory appears to me
> fundamentally conceptually inadequate (which is my current reaction to your knol, though I must temper that with the comment
> that it's obviously a very compacted representation of your ideas, so there may be way more to your thinking and approach than I limned from that page...).
Use your imagination :). I should have an expanded edit soon, along with moving back to Blogger.
> I just don't buy the idea that hierarchical pattern recognition is the whole story, or even 40% of the story, for human-level AGI...
I think you're confusing general intelligence with a bunch of other things that clog the human mind, as well as forgetting about modulatory / motor feedback.
Sent: Monday, December 19, 2011 1:56 PM
To: AGI
Subject: Re: [agi] Intelligence as a cognitive algorithm.
On Mon, Dec 19, 2011 at 1:37 PM, Boris Kazachenko wrote:
OK, a "compressed representation" is rather obvious; I meant that compression must be defined as an incremental & selective process.
That is also rather obvious ...
On Mon, Dec 19, 2011 at 2:59 PM, Boris Kazachenko wrote:
Show me where it's explained.
From: Ben Goertzel
Sent: Monday, December 19, 2011 3:04 PM
To: AGI
Subject: Re: [agi] Intelligence as a cognitive algorithm.
Essentially every proto-AGI architecture contains some component that does compression in an incremental and selective way, e.g. DeSTIN and MOSES certainly do... those are broad constraints that don't really say that much about how to do compression or pattern recognition....
In 1993 I wrote about the internal network of a mind as a dynamic "dual network" with linked (and co-evolving) hierarchical and heterarchical structures. The hierarchical network provides incremental pattern composition, the heterarchical network provides associational selection. Each network must be associated with appropriate learning algorithms. That was a long time ago and my AGI design is much more sophisticated now, but it's a similar principle...
Boris:
> Essentially every proto-AGI architecture contains some component that does compression in an incremental and selective way, e.g. DeSTIN and MOSES certainly do...
> those are broad constraints that don't really say that much about how to do compression or pattern recognition...
These are broad constraints for a “broadly” incremental approach. My approach is strictly incremental, - that’s not a “constraint”, it’s a direct determinant of what is being compared (=compressed) & how. There’s only one place to start: pixels, & only one way to go from there: compare them in 1D, & iterate from there. Well, it actually starts from binary inputs & digitization, but that’s harder to relate to the rest of the algorithm.
Any less incremental, & you lose opportunity for intermediate selection, which leads to less efficient search & then combinatorial explosion.
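As a hedged illustration of what "compare them in 1D" could mean for a single scan line, here is a minimal Python sketch. It assumes, purely for illustration, that difference is `p - prior`, that match is `min(p, prior)`, and that a fixed hypothetical threshold `AVE` stands in for the average match. Consecutive comparisons whose match falls on the same side of `AVE` are grouped into one 1D pattern that accumulates length, inputs, differences and matches. These names and definitions are assumptions, not the author's exact formulas.

```python
from dataclasses import dataclass

AVE = 15  # hypothetical average match; in the described system it would come from feedback


@dataclass
class Pattern1D:
    positive: bool  # shared sign of (match - AVE) for every comparison in the pattern
    L: int = 0      # length: number of comparisons in the pattern
    I: int = 0      # summed input (the newer pixel of each compared pair)
    D: int = 0      # summed difference
    M: int = 0      # summed match


def compare_scan_line(line):
    """Compare each pixel with its left neighbor and group the results into 1D patterns."""
    patterns = []
    current = None
    for prior, p in zip(line, line[1:]):
        d = p - prior        # assumed difference
        m = min(p, prior)    # assumed match: the smaller of the two compared values
        positive = m > AVE
        if current is None or current.positive != positive:
            current = Pattern1D(positive)  # a sign change terminates the previous pattern
            patterns.append(current)
        current.L += 1
        current.I += p
        current.D += d
        current.M += m
    return patterns
```

For the scan line `[20, 22, 25, 5, 3, 4, 30, 31]` this yields three patterns of lengths 2, 4 and 1: a high-match run, a low-match run around the dark pixels, and a final high-match comparison. Each pattern is a candidate for selective elevation rather than raw pixels being forwarded wholesale.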
Boris:
> I think the learning/teaching approach is, to some extent, a separate issue from the system architecture and algorithms.
You can make it separate, but that would be a waste.
> ...DeSTIN and also Itamar's proprietary HDRN system are already applied in that manner...
Right, I keep hearing that. It's supposed to be top secret, but half of what I've seen is incompatible with my approach, & your difficulties understanding the latter suggest that so is much of the rest.
> I'm unconvinced that this is the best way to have one's AGI system learn.
> But, I do think one should build an AGI system **capable** of learning in such a manner, even if for practical expediency reasons one chooses a different sort of world-interfacing approach...
It is conceptually the simplest & the most fundamental way; any shortcuts should be add-ons.
> Some folks, like my friend Itamar Arel, seem to think all of abstract cognition can be gotten to emerge
> from this sort of perception / action / reinforcement focused architecture.
> According to my best understanding, you and Boris K share this general perspective
The fact that he needs additional action & reinforcement hierarchies suggests that his perceptual hierarchy is fundamentally deficient.
> I'm not so sure, I suspect other stuff may be required too...
Well, let's start from the basics, you can't avoid that anyway, right?
> I don't mean to be dismissive when I refer to "details" --- getting
> the details right is going to be critical to making a thinking machine work.
> And there may be many different ways of getting the details right...
In my approach, there's only one right way to get every detail, that's why I call it a theory.
Boris:
>> In my approach, there's only one right way to get every detail, that's
>> why I call it a theory.
>> Boris.
>
> Strange statement...
> For example, aerodynamics is a real theory (better established than anyone's theory of AGI), yet it admits multiple possible ways of creating flying machines... with rather large differences between them!!
All analogies are flawed, - it's a lazy way to think.
> (There may be one optimal way of creating a flying machine, given a
> certain set of well-specified constraints on the flying machine, where
> optimality is defined as minimum-energy or some such. But,
> aerodynamic theory as yet gives us no way to find this kind of optimal
> flying machine design...)
> So why should a theory of intelligence admit only one possible design
> for a thinking machine?
I didn't say possible; I said the right way: directly derived from the way you define a problem. A theory of intelligence is different because it's supposed to be general, - context-free. At least the very core of it, which should be a starting point anyway. Again, you can add environmentally & application-specific adaptations later, but the core algorithm must, in principle, be able to learn them on its own.
That's the very meaning of "general", as opposed to any empirically-specific theory.
> I don't grok your theory of theories ;p
> ... ben g
It's a meta-theory, not a garden-variety kind :).
Boris:
Ben,
generally understood as 3D,
It's 4D, but that's our physics. General intelligence must be able to operate in any-dimensional space. I start with 1D because dimensionality (as well as any other form of complexity) must be *incremental*. Search in higher dimensions adds syntactic cost, & we need to *select* inputs capable of bearing that extra cost.
understood as 2D arrays ... care to clarify?
It's 2D, but that's not the first level of processing. Eye tremor makes each rod | cone *see* & interrelate a largely horizontal scan line of inputs. Interaction among these cells can be interpreted as subsequent integration of these scan lines in 2D. But, as anywhere else in biology, there's a lot of redundancy & many evolutionary artifacts in the retina, so I don't see it as a model.
as to 2D retinas, so you want to test it in the simpler 1D case first?
Every level is a simpler "test" before the next level, but primarily for inputs rather than algorithms. In my model, 1D (horizontal scan line) search generates 1D patterns, which are selected & then compared in 2D (vertically) on a higher level, forming 2D patterns. The comparison algorithm is largely the same, but 1D patterns have multiple additional variables (length & derivatives). Each variable is compared independently, & the results are then integrated within a pattern. And so on in higher dimensions.
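The step from 1D to 2D described above can be sketched as comparing each variable of a 1D pattern with the same variable of a vertically adjacent pattern, then integrating the per-variable results. This is a hypothetical illustration: the variable set (`L`, `I`, `D`, `M`), the sign-aware `min` used as match, and plain summation as the integration step are all assumptions, not the model's specified formulas.

```python
from typing import Dict, Tuple

# Hypothetical pattern variables: length, summed input, summed difference, summed match.
VARIABLES = ("L", "I", "D", "M")


def compare_variable(upper: int, lower: int) -> Tuple[int, int]:
    """Compare one variable of two patterns; return (difference, match)."""
    d = lower - upper
    # Assumed match: the smaller magnitude, counted only when both values share a sign.
    m = min(abs(upper), abs(lower)) if (upper >= 0) == (lower >= 0) else 0
    return d, m


def compare_patterns_vertically(upper: Dict[str, int], lower: Dict[str, int]):
    """Compare each variable independently, then integrate matches into one total."""
    per_variable = {name: compare_variable(upper[name], lower[name]) for name in VARIABLES}
    total_match = sum(m for _, m in per_variable.values())
    return per_variable, total_match
```

Comparing `{"L": 4, "I": 80, "D": 6, "M": 40}` with `{"L": 5, "I": 90, "D": -2, "M": 45}` gives a total match of 124, with the `D` comparison contributing a difference of -8 and no match. Note that comparing summed differences produces differences of differences, i.e. second-order derivatives, which is one way the per-pattern syntax grows with each level.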
results. Well, it actually starts from binary inputs & digitization thereof, but it's harder to see how this relates to the rest of the algorithm. Colors & so on, as well as spatial dimensions & hardware details, are not part of the core algorithm, - those are sensor / empirically specific & learnable, though you can add shortcuts manually. We need to understand the core algorithm *before* we can develop useful add-ons.
Any less incremental, & you lose opportunity for intermediate selection,
which leads to less efficient search.
be balanced against the cost of having more possibilities survive the
selection.... But indeed it's important to have the potential for
selection at all levels in the perception processing hierarchy, as
needed...
Boris: Selection is what intelligence is all about.
Ben G: A couple of examples of things that confused me in your knol:
"
This may seem similar to Levin Search, but the latter selects among randomly generated algorithms (of incremental complexity) that happen to solve a problem | compress a bitstring. My approach, on the other hand, is to search for patterns within environmental input flow. Hard distinction between input patterns & algorithms exists only for special-purpose programs.
"
Ben G: but it's not clear to me from the preceding paragraphs how your proposed system can recognize arbitrary computable patterns (as Levin search obviously can),
Boris:
I don’t like “arbitrary”, but if a given location is projected to be important enough (per “hierarchical feedback” part), all its outputs are elevated losslessly & eventually compared in all possible combinations.
Ben G: or if so what representation language it uses.
Boris: There’s no fixed “language”, the algorithm generates incrementally complex syntax on every level of generalization. I described first steps of this process in “syntactic expansion” part.
Ben G: My immediate impression is that your method would be limited to primitive recursive functions (which can be built up via composition from elementary functions), but the description isn't detailed enough for me to tell.
Boris:
It doesn’t need to be detailed: selective (pruned) recursion & combinatorics is the only method we have for generating functions of any complexity. But that’s math; empirical pattern discovery comes way before that.
"
On the next level of search, the derivatives are also selectively compared between patterns. This generates secondary derivatives over discontinuity, &|or over different types of coordinates. Such syntactic expansion is pruned by selected representation of variable types at a given resolution of position & magnitude, with partially redundant aggregation at a lower resolution.
Beyond that, a higher-order syntax is formed by comparisons across current syntax, analogous to, but far more complex than comparison across initial coordinates.
"
Ben G: but I don't understand what is your method for choosing which expressions in this "higher order syntax" to evaluate against sensory data (and the lower-level patterns computed therein)
Boris:
All the patterns are compared to all the lower-level outputs within a given range of search. Selection (evaluation for elevation) of potential outputs is done at a lower level, according to projected match of the former. Projected match (predictive correspondence) is the quantitative criterion of intelligence that I was talking about. It is computed by adjusting accumulated match of a pattern (defined in part 2) by hierarchical feedback (described in part 4), – average match, redundancy, contrast, expected match...
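The selection step described above can be sketched as a filter: a pattern's accumulated match is adjusted by feedback from the higher level, and only patterns whose projected match stays positive are elevated. The `Feedback` fields and the linear adjustment below are hypothetical simplifications of the "average match, redundancy, contrast, expected match" terms; the text does not specify how those terms combine, and contrast is omitted here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Feedback:
    ave_match: float          # hypothetical: average match reported by the higher level
    redundancy: float = 1.0   # hypothetical: multiplier > 1 raises the bar for redundant inputs
    expected: float = 0.0     # hypothetical: extra match the higher level expects at this location


def projected_match(accumulated_match: float, fb: Feedback) -> float:
    """Assumed linear adjustment of a pattern's accumulated match by downward feedback."""
    return accumulated_match + fb.expected - fb.ave_match * fb.redundancy


def select_for_elevation(patterns: List[Tuple[str, float]], fb: Feedback) -> List[str]:
    """Return the names of patterns whose projected match is positive."""
    return [name for name, m in patterns if projected_match(m, fb) > 0]
```

With `Feedback(ave_match=20.0, redundancy=1.5)` the effective threshold is 30, so of the patterns `("a", 42.0)`, `("b", 15.0)` and `("c", 30.0)` only `"a"` is elevated. Raising `expected` for a location lowers the effective bar there, which is the sense in which feedback focuses search on sources predicted to be useful.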
Ben G:
I don't really understand how you would handle
-- generating composite actions
-- episodic memory
-- assignment of credit thru the hierarchy once reinforcement is received
-- lots and lots of other stuff
Boris:
Neither I nor anyone else can explain higher levels explicitly, they build on a gazillion of lower-level choices. What I have is general principles that guide this process, derived from my definition of intelligence. This definition is the highest-level (meta) generalization one can make. It can't be "proven" without a life-time worth of examples, one simply has to work up to it through introspection. People get increasingly sloppy with elevation, hence the ludicrous mess that passes for philosophy & AGI.
Ben G: I sort of understand how you want to do perceptual pattern recognition, but not really how you want to leverage perceptual pattern recognition for control of an embodied agent doing stuff in a world over an extended time period...
Boris: Action is an adjustment of coordinates for sensors & actuators (they’re always combined), which is a direct extension of downward feedback within a representational hierarchy.
Ben G: I would be curious if other list members find the knol more transparent than I do...
Boris: They probably don’t, it’s something you need to work on, at the exclusion of everything else.