This intro is out of date; the new version is the Readme on GitHub.
The proposed algorithm is a strictly bottom-up connectivity-based clustering, from pixels to eternity. It's derived directly from my definition of general intelligence: the ability to predict from prior / adjacent input. That includes planning, which is technically a self-prediction. Any prediction is interactive projection of known patterns, hence the primary process must be pattern discovery (AKA unsupervised learning: an obfuscating negation-first term). This perspective is not novel: pattern recognition is a main focus in ML, and a core of any IQ test. The problem I have with current ML is conceptual consistency.
Pattern recognition is a default mode in Neural Nets, but they work indirectly, in a very coarse statistical fashion. A basic NN, such as a multilayer perceptron or KAN, performs lossy stochastic chain-rule curve fitting. Each node outputs a normalized sum of weighted inputs, then adjusts the weights in proportion to modulated similarity between input and output. In Deep Learning, this adjustment is mediated by backprop of decomposed error (inverse similarity) from the output layer. In Hebbian Learning, it's a more direct adjustment by local output/input coincidence: a binary version of their similarity.
Modern ANNs combine such vertical training with lateral cross-correlation, within an input vector. CNN filters are designed to converge on edge-detection in initial layers. Edge detection means computing lateral gradient, by weighted pixel cross-comparison within kernels. Graph NNs embed lateral edges, representing similarity and/or difference between nodes, also produced by their cross-comparison. Popular transformers can be seen as a variation of Graph NN. Their first step is self-attention: computing dot product between query and key vectors (of the QKV projections) within the context window of an input. This is a form of cross-comparison because dot product serves as a measure of similarity, though an unprincipled one.
So the basic operation in both trained CNN and self-attention is what I call cross-comparison, but the former selects for variance and the latter for similarity. I think the difference is due to their relative rarity in respective target data: sparse gradients in raw images and sparse similarities in compressed text. This rarity or surprise determines information content of the input. But almost all text actually describes generalized images and objects therein, so there should be a gradual transition between the two. In my scheme, higher-level cross-comparison computes both variance and similarity, for differential clustering.
GNN, transformers, and Hinton's Capsule Networks also have positional embeddings (as I use explicit coordinates). But they are still trained through destructive backprop: indiscriminate summation first, meaningful output-to-template comparison last. This primary summation degrades resolution of the whole learning process, exponentially with the number of layers. Hence, a ridiculous number of backprop cycles is needed to fit hidden layers into generalized representations (patterns) of the input. Most practitioners agree that this process is not very smart, but it's simple enough for lazy human coding and even lazier evolution. It's also easy to parallelize, which is crucial for cell-based biology.
I think it should be the reverse: first cross-comparison of atomic inputs, then summing them into match-defined patterns/clusters. That's lateral connectivity-based clustering, vs. vertical statistical fitting in NN. This cross-comp and clustering is recursively hierarchical, forming patterns of patterns and so on. The resulting compositional hierarchy is indefinitely extended as a pipeline. As in most interesting real-world systems, a connectivity cluster is defined by links / interactions between its nodes. The initial frame of reference here is space-time, but higher levels will reorder the input along all sufficiently predictive derived dimensions, similar to spectral clustering. Feedback adjusts hyperparameters to filter future inputs, vs. fitting them to templates. No top-down training, only bottom-up learning.
Connectivity clustering is among the oldest approaches in ML. Here is what I think makes my scheme different:
- links are valued by both similarity and variance, derived by comparison between their nodes, and potentiated by the overlap in the surround of these nodes (context).
- first-principles definition of similarity as compression (vs. dot product), direct or inverse. Variance is intrinsically negative, only valuable if borrowing from co-projected similarity.
- nested derivatives parameterize resulting clusters for higher-order cross-comp, selectively incremental in range, derivation, and composition of comparands and their param sets.
Such compressed encoding should be far more meaningful than huge flat weight matrices in ANNs. But it’s very complex to consistently design and parallelize, precluding the immediate trial and error that dominates ML.
Below I describe the process in more detail, then extend comparisons to ANN and BNN. This is an open project, CogAlg: we need help with design and implementation, in Python. I offer awards for contributions, or monthly payment if there is a track record; see the last part here.
This content is published under the Creative Commons Attribution 4.0 International License.
Outline of my approach
Initial clustering levels, positional resolution (macro) lags value resolution (micro) by one quantization order:

| Inputs | Comparison | Positional resolution | Outputs | Conventionally known as |
| --- | --- | --- | --- | --- |
| unary intensity | AND | none, all in the same coordinates | pixels of intensity | digitization |
| integer pixels | SUB | binary: direction of comparison | blobs of gradient | edge detection, flood fill |
| float: averaged params of blobs | DIV: comp ave params | integer: distance between blob centers | graphs of blobs | connectivity-based clustering |
| complex: norm. params of graphs | LOG: params hierarchy | float: distance between graph centers | hierarchical graphs | agglomerative clustering |

And so on, higher levels should be added recursively. Such a process is very complex and deeply structured; there is no way it could evolve naturally. Since the code is supposed to be recursive, testing before it is complete is useless. That is probably why no one seems to work on such methods. But once the design is done, there is no need for interminable, glacial, and opaque training: my feedback only adjusts hyperparameters.
So, a pattern is a cluster of matching input elements, where match is compression achieved by encoding the input as derivatives, see “Comparison” section below. Some define pattern as a recurring item or group; in my terms these are pattern elements. If the items co-vary (they don't match but their derivatives do), then they form a higher-derivation pattern, where the elements are derivatives.
But lower-derivation and shorter-range cross-comp must be done first, starting with consecutive atomic inputs. That means sensory input at the limit of resolution: adjacent pixels of video or equivalents in other modalities. All primary modalities form dense arrays of such inputs in Cartesian dimensions; symbolic data is a subsequent encoding. To discover meaningful patterns, the symbols must be decoded, which is exponentially more difficult with the level of encoding. Thus a start with raw sensory input is by far the easiest to implement (part 0).
This low-level process, directly translated into my code, seems like quite a jump from the generalities above. But it really isn’t: internally consistent pattern discovery must be strictly bottom-up, in complexity of both inputs and operations. And there is no ambiguity at the bottom: the initial predictive value that defines patterns is a match from cross-comparison among their elements, starting with pixels. So, I think my process is uniquely consistent with high-level definitions; please let me know if you see any discrepancy in either.
Comparison, more in part 1:
Basic comparison is an inverse arithmetic operation between single-variable comparands, of incremental power: Boolean, subtraction, division, etc. Each order of comparison forms miss or variance: XOR, difference, ratio, etc., and match or similarity, which can be defined directly or as inverse deviation of miss. Direct match is compression of represented magnitude by replacing the larger input with the corresponding miss between the inputs: Boolean AND, the smaller input in comp by subtraction, integer part of ratio in comp by division, etc.
These direct similarity measures work if input intensity corresponds to some measure of stability of an object: mass, energy, hardness. This is the case in tactile but not in visual input: brightness doesn’t correlate with inertia or invariance, dark objects are just as stable as bright ones. Thus, initial match in vision should be defined indirectly, as inverse deviation of variation in intensity. 1D variation is difference, ratio, etc., while multi-D comparison has to combine them into Euclidean distance and gradient, as in common edge detectors.
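A minimal sketch of inverse match over a 2x2 pixel kernel, as an illustration of the paragraph above. The function name, the averaged-difference kernel, and the `ave` filter value are assumptions made for this example, not the CogAlg code.

```python
import math

def comp_kernel_2x2(kernel, ave):
    """Compare pixels in a 2x2 kernel and return (inverse match, gradient).

    kernel is ((top_left, top_right), (bottom_left, bottom_right)).
    ave is a hypothetical filter: the average gradient expected per kernel.
    """
    (tl, tr), (bl, br) = kernel
    dy = ((bl - tl) + (br - tr)) / 2   # vertical difference, averaged over two columns
    dx = ((tr - tl) + (br - bl)) / 2   # horizontal difference, averaged over two rows
    g = math.hypot(dy, dx)             # Euclidean combination of the 1D differences
    return ave - g, g                  # match is inverse deviation of variation

# Flat kernel: no variation, so match equals the filter value
flat = comp_kernel_2x2(((10, 10), (10, 10)), ave=15)
# Sloped kernel: dy = 4, dx = 3, gradient = 5
sloped = comp_kernel_2x2(((0, 3), (4, 7)), ave=15)
```

Here match is positive only while the gradient stays below the filter, so brightness itself carries no value, only its stability across adjacent pixels.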
Patterns, more in part 2:
Cross-comparison among patterns forms match and miss per parameter, as well as dimensions and distances: external match and miss (these are separate parameters: value = precision of what * precision of where). Comparison is limited by max. distance between patterns. Overall hierarchy has incremental dimensionality: search levels ( param levels ( pattern levels)).., and pattern comparison is selectively incremental per such level. This is hard to explain in natural language, please see the code, starting with line_Ps and line_PPs.
Resulting matches and misses are summed into lateral match and miss per pattern. Proximate input patterns with above-average match to their nearest neighbors are clustered into higher-level patterns. This adds two pattern levels: of composition and derivation, per level of search. Conditional cross-comp over incremental range and derivation, among the same inputs, may also add sub-levels in selected newly formed patterns. On a pixel level, incremental range is using larger kernels, and incremental derivation starts with using Laplacian.
Feedback, more in part 3 (needs editing):
Average match is the first order of value filter, computed on higher levels. There are also positional filters, starting with pixel size and kernel size, which determine external dimensions of the input. Quantization (bit, integer, float..) of internal and external filters corresponds to the order of comparison. The filters are similar to hyperparameters in Neural Nets, with values updated by feedback. But I have no equivalent of a weight matrix: my learning is connectivity clustering, vs. vertical clustering via backprop or Hebbian learning.
All filter types represent co-averages to a higher-level average value, locally projected by higher-level patterns. Clustering on a filtered level is by the sign of deviation from those filters (cross-input-element match - filter), so using averages balances positive and negative patterns: spans of above- and below-average cross-match in future inputs. Resulting positive patterns contain input elements that are both novel: exceeding expectations of higher levels, and similar to each other: making them predictive of future input.
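Clustering by the sign of deviation from a filter can be sketched in 1D as follows. This is an illustrative toy under stated assumptions: `form_patterns`, the dict fields (`sign`, `x0`, `L`, `M`), and the fixed `ave` value are hypothetical names, and in the scheme described above the filter would be set by higher-level feedback rather than passed in by hand.

```python
def form_patterns(matches, ave):
    """Segment a 1D sequence of match values into spans of same-sign deviation from ave.

    Each pattern records: sign of deviation, start coordinate x0,
    length L, and summed deviation M.
    """
    patterns = []
    for x, m in enumerate(matches):
        dev = m - ave
        sign = dev > 0                      # positive pattern: above-average match
        if patterns and patterns[-1]["sign"] == sign:
            P = patterns[-1]                # same sign: extend the current span
            P["L"] += 1
            P["M"] += dev
        else:                               # sign change: terminate span, start a new one
            patterns.append({"sign": sign, "x0": x, "L": 1, "M": dev})
    return patterns

spans = form_patterns([5, 6, 1, 2, 7], ave=3)
# three spans: positive (x0=0, L=2), negative (x0=2, L=2), positive (x0=4, L=1)
```

Because the filter is an average, positive and negative spans roughly balance, which is the point made in the paragraph above.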
Hierarchy, part 4 but out of date:
There is a single global hierarchy: feedforward inputs and feedback filters pass through the same levels of search and composition. Each higher level is a nested hierarchy, with depth proportional to elevation, but sub-hierarchies are unfolded sequentially. That’s why I don’t have many diagrams: they are good at showing relations in 2D, but I have a simple 1D sequence of levels. Nested sub-hierarchies are generated by the process itself, depending on elevation in a higher-order hierarchy. That means I can’t show them in a generic diagram.
Brain-inspired schemes have separate sensory and motor hierarchies; in mine they are combined into one. The equivalent of motor patterns in my scheme are positional filter patterns, which ultimately move the sensor. The first level is co-located sensors: targets of input filters, and more coarse actuators: targets of positional filters. I can think of two reasons they are separated in the brain: neurons and axons are unidirectional, and the training process has to take the whole hierarchy offline. Neither constraint applies to my scheme.
The final algorithm will consist of first-level operations + recursive increment in operations per level. The latter is a meta-algorithm that extends the working level-algorithm, to handle derivatives added to current inputs. So, the levels are: 1st level: G(x), 2nd level: F(G)(x), 3rd level: F(F(G))(x).., where F() is the recursive code increment.
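The G, F(G), F(F(G)) progression can be illustrated with a toy in which each level's comparison function is extended to handle the derivatives produced by the previous level. Everything here is an assumption for illustration: `comp_scalar` stands in for G, `F` merely lifts a comparison to tuples of derivatives, and `level` compares consecutive inputs only. The actual increment described above would also add selection and clustering.

```python
def comp_scalar(a, b):
    """First-level comparison (G): match is the smaller input, miss is the difference."""
    return (min(a, b), b - a)

def F(comp):
    """Recursive increment: extend comp to inputs that are tuples of derivatives,
    comparing them parameter by parameter."""
    def comp_tuple(p, q):
        return tuple(comp(pi, qi) for pi, qi in zip(p, q))
    return comp_tuple

def level(comp, inputs):
    """Cross-compare consecutive inputs with the comparison of this level."""
    return [comp(a, b) for a, b in zip(inputs, inputs[1:])]

level1 = level(comp_scalar, [1, 3, 6, 10])    # [(1, 2), (3, 3), (6, 4)]
level2 = level(F(comp_scalar), level1)        # compares (match, miss) tuples
level3 = level(F(F(comp_scalar)), level2)     # compares tuples of tuples
```

Each application of `F` adds one order of nesting to the inputs a level can compare, so the code for level n is generated from the code for level n-1 rather than written by hand.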
The resulting hierarchy is a pipeline: patterns are outputted to the next level, forming a new level if there is none. Given novel inputs, higher levels will discover longer-range spatio-temporal and then conceptual patterns.
Some notes:
- There should be a unique set of operations added per level, hence a singular in “cognitive algorithm”.
- Core design must be done theoretically: generality requires a large upfront investment in process complexity, which makes it a huge overkill for any specific task. That’s one reason why such schemes are not explored.
- Many readers note a disconnect between abstractions in this outline and the amount of detail in current code. That’s because we are in a space-time continuum: search must follow proximity in each dimension, which requires specific processing. It’s not specific to vision, the process is mostly the same for all raw modalities.
- Another complaint is that I don't use mathematical notation, but it simply doesn't have the flexibility to express a deeply conditional process, with recursively increasing complexity.
- Most people who aspire to work on AGI think in terms of behavior and robotics. I think this is far too coarse to make progress, the most significant mechanisms are on the level of perception. Feedforward (perception) must drive feedback (action), not the other way around.
- Other distractions are supervision and reinforcement. These are optional task-specific add-ons; the core cognitive process is unsupervised pattern discovery, and the main problem here is scaling in complexity.
- Don’t even get me started on chatbots.
Comparison to artificial and biological neural networks
All unsupervised learning is some form of pattern
discovery, by input comparison and clustering. I do both laterally: among
inputs within a level, while in statistical learning they are vertical: between
layers of weighted summation. Weight adjustment from error in final comparison is
a soft clustering: modulated inclusion or exclusion of subsequent
inputs into next output. So, vertical weighted summation is primary to
comparison, which makes the comparands distant. This is a conceptual flaw: comparison
must follow proximity.
Neural Nets are a version of statistical learning; I think they are best understood as centroid clustering (a centroid doesn’t have to be a single value: a fitted line in linear regression can be considered a one-dimensional centroid). A basic ANN is a multilayer perceptron: each node weighs the inputs at synapses, then sums and thresholds them into output. This normalized sum of inputs is their centroid. Output of the top layer is compared to some template, forming an error. Stochastic Gradient Descent then backpropagates the error, training initially random weights into transformations (reversed vertical derivatives) that reduce future error.
That usually means training CNN to perform some sort of edge-detection or cross-correlation (same as my comparison, but the former terms lose meaning on higher levels of search). But CNN operations are initially random, while my process is designed for cross-comp from the start. This is why it can be refined by my feedback, updating the filters, which is far more subtle and selective than training by backprop.
So, I have several problems with basic process in ANN:
- Vertical learning (via feedback of error) takes tens of thousands of cycles to form accurate representations. That's because summation per layer degrades positional input resolution. With each added layer, the output that ultimately drives learning contains an exponentially smaller fraction of original information. Cross-comp and clustering is far more complex per level, but the output contains all information of the input. Lossy selection is only done on the next level, after evaluation per pattern (vs. before evaluation in statistical methods).
- Both initial weights and sampling that feeds SGD are randomized. Also driven by random variation are RBMs, GANs, VAEs, etc. But randomization is antithetical to intelligence, it's only useful in statistical methods because they merge inputs with weights irreversibly. Thus, any non-random initialization and variation will introduce bias. All input modification in my scheme is via hyperparameters, stored separately and then used to normalize (remove bias) inputs for comparison to inputs formed with different-value hyperparameters.
- SGD minimizes error (top-layer miss), which is quantitatively different from maximizing match: compression. And that error is w.r.t. some specific template, while my match is summed over all past input / experience. The “error” here is plural: lateral misses (differences, ratios, etc.), computed by cross-comparison within a level. All inputs represent environment and have positive value. But then they are packed (compressed) into patterns, which have different range and precision, thus different relative value per relatively fixed record cost.
- Representation in ANN is fully distributed, similar to the brain. But the brain has no alternative: there is no substrate for local memory or program in neurons. Computers have RAM, so parallelization is a simple speed vs. efficiency trade-off, useful only for complex semantically isolated nodes. Such nodes are patterns, encapsulating a set of co-derived “what” and “where” parameters. This is similar to a neural ensemble, but parameters that are compared together should be localized in memory, not distributed across a network.
A more basic neural learning mechanism is Hebbian, though it is rarely used in ML. The conventional spiking version is that weight is increased if the synapse often receives a spike just before the node fires, else the weight is decreased. But input and output don't have to be binary, the same logic can be applied to scalar values: the weight is increased / decreased in proportion to some measure of similarity between its input and the following output of the node. That output is a normalized sum of all inputs, or their centroid.
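The scalar version of this rule might look like the sketch below. The similarity measure is an assumption chosen for the example: a weighted input counts as similar to the output if its deviation from the centroid is below the average deviation. The names `hebbian_step` and `rate` are hypothetical, and this is not a standard library routine.

```python
def hebbian_step(weights, inputs, rate=0.1):
    """One scalar Hebbian update: return (new_weights, output).

    The output is the normalized sum (centroid) of the weighted inputs.
    Each weight moves in proportion to an assumed similarity measure:
    average deviation from the centroid minus this input's own deviation.
    """
    weighted = [w * x for w, x in zip(weights, inputs)]
    output = sum(weighted) / len(weighted)              # centroid of weighted inputs
    deviations = [abs(wx - output) for wx in weighted]
    ave_dev = sum(deviations) / len(deviations)
    new_weights = [w + rate * (ave_dev - d)             # closer than average: increase
                   for w, d in zip(weights, deviations)]
    return new_weights, output

new_w, out = hebbian_step([1.0, 1.0, 1.0], [1.0, 2.0, 6.0])
# centroid is 3.0; the input nearest to it (2.0) gains weight, the farthest (6.0) loses it
```

The comparison here is between each input and the centroid, which is the vertical comparison across composition that the next paragraph discusses.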
Such learning is local, within each node. But it’s still a product of vertical comparison: centroid is a higher order of composition than individual inputs. This comparison across composition drives all statistical learning, but it destroys positional information at each layer. Compared to autoencoders, the main backprop-driven unsupervised learning technique, Hebbian learning lacks the decoding stage (as does the proposed algorithm). Decoding decomposes hidden layers, to equalize composition orders of output and compared template.
Inspiration by the brain kept ANN research going for decades before they became useful. Their “neurons” are mere stick figures, but that’s not a problem, most of a neuron’s complexity is due to constraints of biology. The problem is that the core mechanism in ANN, weighted summation, may also be a no-longer-needed compensation for such constraints: neural memory requires dedicated connections. That makes representation and cross-comparison of individual inputs very expensive, so they are summed. But again, we now have dirt-cheap RAM.
Other biological constraints are very slow neurons, and the
imperative of fast reaction for survival in the wild. Both favor fast though
crude summation, at the cost of glacial training. Reaction speed became less
important: modern society is quite secure, while continuous learning is far
more important because of accelerating progress. Summation also reduces noise,
which is very important for neurons that often fire at random, to initiate and
maintain latent connections. But that’s irrelevant for electronic circuits.
Evolution is extremely limited in complexity that can be added before it is pruned by natural selection, I see no way it could produce the proposed algorithm. And that selection is for reproduction, while intelligence is distantly instrumental. The brain evolved to guide the body, with neurons originating as instinctive stimulus-to-response converters. Hence, both SGD and Hebbian learning are fitting, driven by feedback of action-triggering weighted input sum. Pattern discovery is their instrumental upshot, not an original purpose.
Uri Hasson, Samuel Nastase, Ariel Goldstein reach a similar conclusion in “Direct fit to nature: an evolutionary perspective on biological and artificial neural networks”: “We argue that neural computation is grounded in brute-force direct fitting, which relies on over-parameterized optimization algorithms to increase predictive power (generalization) without explicitly modeling the underlying generative structure of the world. Although ANNs are indeed highly simplified models of BNNs, they belong to the same family of over-parameterized, direct-fit models, producing solutions that are mistakenly interpreted in terms of elegant design principles but in fact reflect the interdigitation of ‘‘mindless’’ optimization processes and the structure of the world.”
Comparison to Capsule Networks
The nearest experimentally successful method is the recently introduced “capsules”. Some similarities to CogAlg:
- capsules also output multivariate vectors, “encapsulating” several parameters, similar to my patterns,
- these parameters also include pose: coordinates and dimensions, compared to compute transformations,
- these transformations are compared to find affine transformations or equivariance: my match of misses,
- capsules also send direct feedback to lower layer: dynamic routing, vs. trans-hidden-layer backprop in ANN.
My main problems with CapsNet and alternative treatment:
- Object is defined as a recurring configuration of different parts. But such recurrence can’t be assumed, it should be derived by cross-comparing relative position among parts of matching objects. This can only be done after their positions are cross-compared, which is after their objects are cross-compared: two levels above the level that forms initial objects. So, objects formed by positional equivariance would be secondary, though they may displace initial segmentation objects as a primary representation. Stacked Capsule Autoencoders also have exclusive segmentation on the first layer, but proximity doesn’t matter on their higher layers.
- Routing by agreement is basically recursive centroid clustering, by match of input vector to the output vector. The output (centroid) represents inputs at all locations, so its comparison to inputs is effectively mixed-distance. Thus, clustering in CapsNet is fuzzy and discontinuous, forming redundant representations. Routing by agreement reduces that redundancy, but not consistently so, it doesn’t specifically account for it. My default clustering is exclusive segmentation: each element (child) initially belongs to one cluster (parent). Fuzzy clustering is selective to inputs valued above the cost of adjusting for overlap in representation, which increases with the range of cross-comparison. This conditional range increase is done on all levels of composition.
- Instantiation parameters are application-specific, CapsNet has no general mechanism to derive them. My general mechanism is cross-comparison of input capsule parameters, which forms higher-order parameters. First level forms pixel-level gradient, similar to edge detection in CNN. But then it forms proximity-constrained clusters, defined by gradient and parameterized by summed pixel intensity, dy, dx, gradient, angle. This cross-comparison followed by clustering is done on all levels, with incremental number of parameters per input.
- Number of layers is fixed, while I think it should be incremental with experience. My hierarchy is a dynamic pipeline: patterns are displaced from a level by criterion sign change and sent to existing or new higher level. So, both hierarchy of patterns per system and sub-hierarchy of derivatives per pattern expand with experience. The derivatives are summed within a pattern, then evaluated for extending intra-pattern search and feedback.
- Output vector of higher capsules combines parameters of all lower layers into Euclidean distance. That is my default too, but they should also be kept separate, for potential cross-comp among layer-wide representations.
Overall, CapsNet is
a variation of ANN, with input summation first and dynamic routing second. So,
it’s a type of Hebbian learning, with most of the problems that I listed in the
previous section.
Elaboration, parts 4 and below are out of date:
0. Cognition vs. evolution, analog vs. symbolic initial input
Some say intelligence can be recognized but not defined. I think that’s absurd: we recognize some implicit definition. Others define intelligence as a problem-solving ability, but the only general problem is efficient search for solutions. Efficiency is a function of selection among inputs, vs. brute-force all-to-all search. This selection is by predicted value of the inputs, and prediction is interactive projection of their patterns. Some agree that intelligence is all about pattern discovery, but define pattern as a crude statistical coincidence.
Of course, the only mechanism known to produce human-level intelligence is even cruder, and that shows in the haphazard construction of our brains. Algorithmically simple, biological evolution alters heritable traits at random and selects those with above-average reproductive fitness. But this process requires almost inconceivable computing power because selection is extremely coarse: on the level of the whole genome rather than individual traits, and also because intelligence is only one of many factors in reproductive fitness.
Random variation in evolutionary algorithms, generative RBMs, and so on, is antithetical to intelligence. Intelligent variation must be driven by feedback within cognitive hierarchy: higher levels are presumably “smarter” than lower ones. That is, higher-level inputs represent operations that formed them, and are evaluated to alter future lower-level operations. Basic operations are comparison and summation among inputs, defined by their range and resolution, analogous to reproduction in genetic algorithms.
Range of comparison per conserved-resolution input should increase if projected match (cognitive fitness function) exceeds average match per comparison. In any non-random environment, average match declines with the distance between comparands. Thus, search over increasing distance requires selection of above-average comparands. Any delay, coarseness, and inaccuracy of such selection is multiplied at each search expansion, soon resulting in combinatorial explosion of unproductive (low additive match) comparisons.
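A toy illustration of range expansion gated by average match. It assumes an inverse match per pair (`ave_diff - |difference|`) and evaluates the whole sequence at once, whereas the text above calls for selection per input; `productive_range` and `ave_diff` are hypothetical names.

```python
def productive_range(inputs, ave_diff):
    """Extend comparison distance while the mean inverse match stays positive.

    Inverse match per pair = ave_diff - |difference| (an assumed definition).
    Returns the largest distance at which mean match was still above zero.
    """
    rng = 0
    while rng + 1 < len(inputs):
        pairs = zip(inputs, inputs[rng + 1:])            # comparands at distance rng + 1
        matches = [ave_diff - abs(b - a) for a, b in pairs]
        if sum(matches) / len(matches) <= 0:
            break        # average match at this distance no longer pays for the search
        rng += 1
    return rng

smooth = productive_range([1, 2, 3, 4, 5, 6], ave_diff=2.5)   # match declines with distance
noisy = productive_range([1, 9, 1, 9], ave_diff=2.5)          # no match even at distance 1
```

In the smooth sequence, match declines with distance and the search stops where it turns negative; in the noisy one, there is nothing to justify any expansion.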
Hence, my model is strictly incremental: search starts with minimal-complexity inputs and expands with minimal increments in their range and complexity (syntax). At each level, there is only one best increment, projected to discover the greatest additive match. No other AGI approach follows this principle.
I guess people who aim for human-level intelligence are impatient with small increments and simple sensory data. Yet, this is the most theoretical problem ever, demanding the longest delay in gratification.
Symbolic obsession and its discontents
Current Machine Learning and related theories (AIT, Bayesian inference, etc.) are largely statistical also because they were developed primarily for symbolic data. Such data, pre-compressed and pre-selected by humans, is far more valuable than the sensory inputs it was ultimately derived from. But due to this selection and compression, proximate symbols are not likely to match, and partial match between them is very hard to quantify. Hence, symbolic data is a misleading initial target for developing a conceptually consistent algorithm.
Use of symbolic data as initial inputs in AGI projects betrays profound misunderstanding of cognition. Even children, predisposed to learn language, only become fluent after years of directly observing things their parents talk about. Words are mere labels for concepts, the most important of which are spatio-temporal patterns, generalized from multi-modal sensory experience. Top-down reconstruction of such patterns solely from correlations among their labels should be exponentially more difficult than their bottom-up construction.
All our knowledge is ultimately derived from senses, but lower levels of human perception are unconscious. Only generalized concepts make it into our consciousness, AKA declarative memory, where we assign them symbols (words) to facilitate communication. This brain-specific constraint creates heavy symbolic vs. sub-symbolic bias, especially strong in artificial intelligentsia. Which is putting a cart in front of a horse: most words are meaningless unless coupled with implicit representations of sensory patterns.
To be incrementally selective, a cognitive algorithm must exploit proximity first, which is only productive for continuous and loss-tolerant raw sensory data. Symbolic data is already compressed: consecutive characters and words in text won’t match. It’s also encoded with distant cross-references, which are hardly ever explicit outside of a brain. Text looks quite random unless you know the code: operations that generalized pixels into patterns (objects, processes, concepts). That means any algorithm designed specifically for text will not be consistently incremental in the range of search, which will impair its scalability.
In Machine Learning, input is a string, frame, or video sequence of a defined length, with artificial separation between training and inference. In my approach, learning is continuous and interactive. Initial inputs are streamed pixels of maximal resolution, and higher-level inputs are multivariate patterns formed by comparing lower-level inputs. Spatio-temporal range of inputs, and selective search across them, is extended indefinitely. This expansion is directed by higher-level feedback, just as it is in human learning.
Everything ever written is related to my subject, but nothing is close enough: no other method is meant to be fully consistent. Hence a dire scarcity of references here. My approach is self-contained, it doesn’t require references. But it does require clean context, hopefully cleaned up by the reader’s introspective generalization.
1. Atomic comparison: quantifying match and miss between two variables
First, we need to quantify predictive value. Algorithmic information theory defines it as compressibility of representation, which is perfectly fine. But compression is currently computed only for sequences of inputs, while I think a logical start is analog input digitization: the rock bottom of an organic compression hierarchy. The next level is cross-comparison among resulting pixels, commonly known as edge detection, and higher levels will cross-compare resulting patterns. Partial match computed by comparison is a measure of compression.
Partial match between two variables is a complement of miss, in the corresponding power of comparison:
- Boolean match is AND and miss is XOR (two zero inputs form zero match and zero miss),
- comparison by subtraction increases match to the smaller comparand and reduces miss to a difference,
- comparison by division increases match to min * integer part of ratio and reduces miss to a fractional part
(direct match works for tactile input, but reflected light in vision requires an inverse definition of initial match)
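The three orders of comparison listed above can be sketched as plain functions. The function names are assumptions for illustration, inputs are taken to be non-negative integers, and `comp_div` additionally assumes the smaller input is non-zero.

```python
def comp_bool(a, b):
    """Boolean comparison of bits: match is AND, miss is XOR."""
    return a & b, a ^ b

def comp_sub(a, b):
    """Comparison by subtraction: match is the smaller comparand, miss is the difference."""
    return min(a, b), b - a

def comp_div(a, b):
    """Comparison by division: match is min * integer part of ratio,
    miss is the fractional part of the ratio. Assumes min(a, b) > 0."""
    small, large = sorted((a, b))
    whole, remainder = divmod(large, small)   # integer part of ratio and leftover
    return small * whole, remainder / small

comp_bool(0, 0)   # (0, 0): zero inputs form zero match and zero miss
comp_sub(5, 7)    # (5, 2)
comp_div(2, 7)    # (6, 0.5): ratio 3.5, so match = 2 * 3, miss = 0.5
```

Each higher order leaves a smaller miss for the same pair of inputs: 2 and 7 differ by 5 under subtraction, but only by a fraction of 0.5 under division.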
In other words, match is a
compression of larger comparand’s magnitude by replacing it with miss. Which
means that match = smaller input: a common subset of both inputs, = sum of AND between their uncompressed
(unary code) representations. Ultimate criterion is recorded magnitude, rather
than bits of memory it occupies, because the former represents physical impact
that we want to predict. The volume of memory used to record that magnitude
depends on prior compression, which is not an objective parameter.
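The three powers of comparison above can be sketched in Python. This is only an illustration under stated assumptions: the function names (`comp_bool`, `comp_sub`, `comp_div`, `unary_match`) are hypothetical, inputs are assumed to be non-negative integers, and the miss of division is returned as the remainder, i.e. the numerator of the fractional part.

```python
def comp_bool(a, b):
    """Boolean comparison of two bits: match is AND, miss is XOR."""
    return a & b, a ^ b


def comp_sub(a, b):
    """Comparison by subtraction: match is the smaller comparand,
    miss is the signed difference."""
    return min(a, b), b - a


def comp_div(a, b):
    """Comparison by division: match is min * integer part of the ratio,
    miss is the remainder (numerator of the fractional part)."""
    small, large = min(a, b), max(a, b)
    if small == 0:
        return 0, large  # assumption: a zero comparand forms zero match
    multiple, remainder = divmod(large, small)
    return small * multiple, remainder


def unary_match(a, b):
    """Sum of AND between unary-code representations of both inputs."""
    width = max(a, b)
    unary_a = [1] * a + [0] * (width - a)
    unary_b = [1] * b + [0] * (width - b)
    return sum(x & y for x, y in zip(unary_a, unary_b))
```

For example, `unary_match(5, 7)` gives the same value as `min(5, 7)`, which is the sense in which match is a common subset of both inputs.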
Some may object that match
includes the case when both inputs equal zero, but then match should also be
zero. The purpose here is prediction, which represents conservation of some physical
property of observed objects. Ultimately, we’re predicting potential impact on
observer, represented by input. Zero input means zero impact, which has no
conservable property (inertia), thus no intrinsic predictive value.
Given incremental
complexity, initial inputs should have binary resolution and implicit
coordinate (which is a macroparameter, so its resolution lags that of an
input). Compression of bit inputs by AND is well known as digitization:
substitution of two lower 1 bits with one higher 1 bit. Resolution of coordinate
(input summation span) is adjusted by feedback to form integers that are large
enough to produce above-average match.
Next-order compression is
comparison between consecutive integers, with binary (before | after)
coordinate.
Additive match is achieved
by comparison of a higher power than that which produced comparands: AND will
not further compress integers digitized by AND. Rather, initial comparison
between integers is by subtraction, resulting difference is miss, and smaller
input is absolute match. Compression of represented magnitude is by replacing
i1, i2 with their derivatives: match (min) and miss (difference). If we sum
each pair:
inputs: 5 + 7 -> 12, derivatives: match = 5 + miss = 2 -> 7.
Compression by replacing = match: 12 - 7 -> 5.
Difference is smaller than XOR (non-zero complement of AND) because XOR may
include opposite-sign (opposite-direction) bit pairs 0, 1 and 1, 0, which are
cancelled out by subtraction.
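The arithmetic above can be checked with a short sketch. The helper names are hypothetical, and the XOR check assumes non-negative integers in ordinary binary code.

```python
def compression_by_replacement(i1, i2):
    """Replace two inputs with their derivatives and report what is saved."""
    match = min(i1, i2)
    miss = abs(i1 - i2)
    summed_inputs = i1 + i2
    summed_derivatives = match + miss
    # The saving always equals the match: (min + max) - (min + (max - min)).
    return summed_inputs, summed_derivatives, summed_inputs - summed_derivatives


def difference_within_xor(limit):
    """Check that |a - b| never exceeds a XOR b for all a, b below limit."""
    return all(abs(a - b) <= (a ^ b)
               for a in range(limit) for b in range(limit))
```

With inputs 5 and 7, `compression_by_replacement` returns the sums 12 and 7 and the saving 5, as in the text.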
Comparison by division
forms ratio, which is a compressed difference. This
compression is explicit in long division: match is accumulated over iterative
subtraction of smaller comparand from remaining difference. In other words,
this is also a comparison by subtraction, but between different orders of
derivation. Resulting match is smaller comparand * integer part of ratio, and miss is final remainder or
fractional part of ratio. The ratio can be further
compressed by converting it to radix or logarithm, and so on.
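A minimal sketch of division as iterated subtraction, assuming positive integer comparands. The name `comp_div_iterative` is hypothetical. It starts from the result of comparison by subtraction (match = smaller comparand, miss = difference) and keeps subtracting the smaller comparand from the remaining difference.

```python
def comp_div_iterative(i1, i2):
    """Accumulate match by subtracting the smaller comparand
    from the remaining difference until it no longer fits."""
    small, large = min(i1, i2), max(i1, i2)
    if small <= 0:
        raise ValueError("comparands must be positive")
    match = small                # match from comparison by subtraction
    remaining = large - small    # miss from comparison by subtraction
    while remaining >= small:
        remaining -= small
        match += small
    return match, remaining      # small * integer part of ratio, remainder
```

For 5 and 17 the loop runs twice, so match grows from 5 to 15 and the remaining miss shrinks from 12 to 2.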
By reducing miss, higher-power
comparison increases complementary match (match = larger input - miss):

comparison:       | AND           | SUB                | DIV
to be compressed: | larger input  | XOR                | difference: combined current-order match & miss
additive match:   | AND           | opposite-sign XOR  | multiple: of a smaller input within a difference
remaining miss:   | XOR           | difference         | fraction: complementary to multiple within a ratio
But the costs of operations
and incidental sign, fraction, irrational fraction, etc. may grow even faster.
To justify the costs, the power of comparison should only increase in patterns of
above-average match from prior order of comparison: AND for bit inputs, SUB for
integer inputs, DIV for pattern inputs, etc. Inclusion into such patterns is by relative match: match - ave: past
match that co-occurs with average higher-level match.
Match value should be weighted
by the correlation between input intensity and its stability: mass / energy /
hardness of an observed object. Initial
input, such as reflected light, is likely to be incidental: such correlation is
very low. Since match is the magnitude of smaller input, its weight should also
be low if not zero. In this case projected match consists mainly of its inverse
component: match cancellation by coderived miss, see below.
The
above discussion is on match from current comparison, but we really want to
know projected match to future or distant inputs. That means the value of match
needs to be projected by co-derived miss. In comparison by subtraction,
projected match = min (i1, i2) * weight (fractional) - difference (i1, i2) / 2 (divide
by 2 because the difference only reduces projected input, thus min (input,
projected input), in the direction in which it is negative. It doesn’t affect
min in the direction where projected input is increasing).
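The projection formula can be written as a one-line function. This sketch assumes that "difference" means the absolute difference between the two inputs and that `weight` is a fraction between 0 and 1. Both the function name and the default weight are hypothetical.

```python
def projected_match(i1, i2, weight=1.0):
    """Projected match = min(i1, i2) * weight - |i1 - i2| / 2.

    weight: assumed fractional correlation between input intensity
    and stability (close to 0 for incidental inputs like reflected light).
    """
    match = min(i1, i2)
    difference = abs(i1 - i2)
    return match * weight - difference / 2
```

With a weight near zero the result is dominated by the negative term, which is the case the text describes for reflected light.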
quantifying lossy compression
There is a general
agreement that compression is a measure of similarity, but no one seems to apply
it from the bottom up, the bottom being single scalars. Also, any significant
compression must be lossy. This is currently evaluated by perceived similarity of
reconstructed input to the original input, as well as compression rate. Which is
very coarse and subjective. Compression in my level of search is lossless, represented
by match on all levels of pattern. All derived representations are redundant, so
it’s really an expansion vs. compression overall.
The lossy part comes after
evaluation of resulting patterns on the next level of search. Top level of
patterns is crosscompared by default, evaluation is per lower level: of incremental
derivation and detail in each pattern. Loss is when lowrelativematch buffered
inputs or alternative derivatives are not crosscompared. Such loss is
quantified as the scope * resolution of representation in these lower levels,
not some subjective quality.
2. Forward search and patterns,
implementation for image recognition in video
Pattern is a contiguous
span of inputs that form aboveaverage matches, similar to conventional cluster.
As explained above, matches
and misses (derivatives) are produced by comparing consecutive inputs. These
derivatives are summed within a pattern and then compared between patterns on
the next level of search, adding new derivatives to a higher pattern. Patterns
are defined contiguously on each level, but positive and negative patterns are
always interlaced, thus nextlevel samesign comparison is discontinuous.
Negative patterns represent
contrast or discontinuity between positive patterns, which is a one or higher
dimensional equivalent of difference between zerodimensional pixels. As with
differences, projection of a negative pattern competes with projection of
adjacent positive pattern. But match and difference are derived from the same
input pair, while positive and negative patterns represent separate spans of
inputs.
Negative match patterns are
not predictive on their own but are valuable for allocation: computational
resources of a no-longer-predictive pattern should be used elsewhere. Hence, the
value of negative pattern is borrowed from predictive value of co-projected
positive pattern, as long as combined additive match remains above average. Consecutive
positive and negative patterns project over the same future input span, and these
projections partly cancel each other. So, they should be combined to form
feedback, as explained in part 3.
Initial match is evaluated
for inclusion into higher positive or negative pattern. The value is summed
until its sign changes, and if positive, evaluated again for crosscomparison
among constituent inputs over increased distance. Second evaluation is
necessary because the cost of incremental syntax generated by crosscomparing
is per pattern rather than per input. Pattern is terminated and outputted to
the next level when value sign changes. On the next level, it is compared to
previous patterns of the same compositional order.
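The first step of this process, summing derivatives into a pattern until the sign of relative match changes, can be sketched as follows. This is a simplified illustration, not the author's code: the name `form_patterns`, the dictionary fields `L`, `I`, `D`, `M`, and the use of min as match and `ave` as a fixed filter are assumptions. The sign is taken per comparison, and the second evaluation for extended cross-comparison is omitted.

```python
def form_patterns(inputs, ave):
    """Compare consecutive inputs and cluster the results into
    same-sign patterns, terminated when the sign of (match - ave) changes."""
    patterns = []
    current = None
    for prior, new in zip(inputs, inputs[1:]):
        d = new - prior              # miss: difference
        m = min(new, prior)          # match: smaller comparand
        sign = (m - ave) > 0         # relative match defines pattern sign
        if current is None or current["sign"] != sign:
            if current is not None:
                patterns.append(current)   # terminate and output the pattern
            current = {"sign": sign, "L": 0, "I": 0, "D": 0, "M": 0}
        current["L"] += 1            # span: number of comparisons
        current["I"] += new          # summed input
        current["D"] += d            # summed difference
        current["M"] += m            # summed match
    if current is not None:
        patterns.append(current)
    return patterns
```

For the inputs `[10, 11, 12, 3, 2, 4, 12, 13]` with `ave=5`, this forms a positive pattern, then a negative one, then another positive one, illustrating how positive and negative patterns interlace.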
Initial inputs are pixels
of video, or equivalent limit of positional resolution in other modalities.
Hierarchical search on higher levels should discover patterns representing
empirical objects and processes, and then relational logical and mathematical
shortcuts, eventually exceeding generality of our semantic concepts.
In cognitive terms,
everything we know is a pattern, the rest of input is noise, filtered out by
perception. For online learning, all levels should receive inputs from lower
levels and feedback from higher levels in parallel.
spacetime dimensionality
and initial implementation
Any prediction has two
components: what and where. We must have both: value of prediction = precision
of what * precision of where. That “where” is currently neglected: statistical
ML represents spacetime at greatly reduced resolution, if at all. In the brain
and some neuromorphic models, “where” is represented in a separate
network. That makes transfer of positional information very expensive and
coarse, reducing predictive value of representations. There is no such separation
in my patterns, they represent both what and where as local vars.
My core algorithm is 1D:
time only (part 4). Our spacetime is 4D, but each of these dimensions
can be mapped on one level of search. This way, levels can select input
patterns that are strong enough to justify the cost of representing additional
dimension, as well as derivatives (matches and differences) in that dimension.
Initial 4D cycle of search
would compare contiguous inputs, similarly to connected-component analysis:
level 1 compares consecutive 0D pixels within horizontal scan line, forming 1D patterns: line segments.
level 2 compares contiguous 1D patterns between consecutive lines in a frame, forming 2D patterns: blobs.
level 3 compares contiguous 2D patterns between incremental-depth frames, forming 3D patterns: objects.
level 4 compares contiguous 3D patterns in temporal sequence, forming 4D patterns: processes.
(in simple video, time is added on level 3 and depth is computed from derivatives)
Subsequent cycles would
compare 4D input patterns over increasing distance in each dimension, forming
longer-range discontinuous patterns. These cycles can be coded as
an implementation shortcut, or formed by feedback of the core algorithm itself, which should
be able to discover maximal dimensionality of inputs. “Dimension” here is
a parameter that defines external sequence and distance among inputs. This is
different from conventional clustering, where both external and internal
parameters are dimensions. More in part 6.
However, average match at a given distance in
our space-time is presumably equal over all four dimensions. That means
patterns defined in fewer dimensions will be fundamentally limited and biased
by the angle of scanning. Hence, initial pixel comparison and clustering into
patterns should also be over 4D at once, or at least over 2D for images and 3D
for video. This is an our-universe-specific extension of my core algorithm.
There
is also a visionspecific adaptation in the way I define initial match. Predictive
visual property is albedo, which means locally stable ratio of brightness /
intensity. Since lighting is usually uniform over much larger area than pixel,
the difference in brightness between adjacent pixels should also be stable. Relative
brightness indicates some underlying property, so it should be crosscompared
to form patterns. But it’s reflected: doesn’t really represent physical quantity /
density of an object. Thus, initial match
is inverse deviation of gradient.
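One possible reading of "inverse deviation of gradient" is sketched below: the gradient between adjacent pixels is compared to an average gradient, and match is the amount by which it falls below that average. The function name and the `ave_gradient` parameter are hypothetical, and this interpretation is an assumption rather than a definition taken from the text.

```python
def inverse_match(pixel, prior_pixel, ave_gradient):
    """Vision-specific initial match, read as: average gradient minus
    the magnitude of the actual gradient between adjacent pixels."""
    gradient = pixel - prior_pixel
    match = ave_gradient - abs(gradient)   # positive when variation is below average
    return gradient, match
```

Under this reading a flat region produces positive match regardless of brightness, while a sharp edge produces negative match.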
We
are currently coding the 1st-level algorithm: https://github.com/boriskz/CogAlg/wiki. 1D code is complete, but not
macro-recursive. We are extending it to 2D for image recognition, then to 3D
video for object and process recognition. Higher levels for each D-cycle algorithm
will process discontinuous search among full-D patterns. Complete hierarchical (meta-level)
algorithm will consist of:
- 1st-level algorithm: contiguous cross-comparison over full-D cycle, plus bit-filter
feedback,
- recurrent increment in complexity, extending current-level alg to next-level
alg. It will unfold increasingly complex higher-level input patterns for
cross-comparison, then combine results for evaluation and feedback.
We
will then add colors, maybe audio and text. Initial testing could be
recognition of labeled images, but 2D is a poor representation of our 4D world,
video or stereo video is far better. Variation across space is a product of
past interactions, thus predictive of variation over time (which is normally
lower: we can’t speedup time).
3. Feedback filters, attentional
input selection, imagination, motor action
After evaluation for
inclusion into higher-level pattern, the input is also accumulated into
feedback to lower levels. Feedback is an update to filters that evaluate forward (Λ) and feedback (V), as described above but on
lower level. Feedback value = absolute value of summed input parameter -
filter-filter (opportunity cost of filter update). Default feedback is combined
level-sequentially, while more expensive shortcut feedback may be sent to
selected levels to filter inputs that are already in the pipeline, or to
rearrange levels in the hierarchy.
There is internal filter
for each compared variable of an input, and external filter per coordinate in
which the inputs are ordered. Basic internal filter is average projected match
that cooccurs with (predicts) average higherlevel match, and basic external filter
is a distance to the next input of projected average value. Thus, coordinate
filter is a span of inputs skipped because they are projected to be either too
predictable or too noisy to bother with. External filters have lower resolution
/ higher scope at the same order of quantization.
Both input and coordinate
filters discussed above are integers, but they can be of any order of
quantization. Binary filters are the least and the most significant bits of
input value and coordinate (input summation span). For coordinate filter, LSB
is pixel size and MSB is frame size. These filters are adjusted to balance
overflow and underflow. Then there are higherthaninteger filters: ratios or
coefficients, AKA weights, and so on. They adjust magnitude per input variable
type, in proportion to relative higherlevel match of these variables.
Lower filters are min
values for input inclusion in highercomposition inputs, and upper filters are max
values that trigger higherinput termination: bit > pixel > pattern
> pattern_of_patterns (code starts
from pixels).
The number of updateable filters
will increase with elevation:
1st level may update only:
- value bit filters: lower: LSB, upper: word size > MSB, and
- coord lower bit filter: LSB, which is a pixel size.
2nd level may also update:
- coord upper bit filters, such as frame dimensions > coordinate MSB, causing premature P termination,
- value integer filters: lower: average match, upper: max Match > premature average match feedback.
3rd level may add:
- coord lower integer filter: starting coordinate (next C or skip-to distance), and
- coord upper integer filter: max next C > premature next-C feedback.
Etc.
novelty vs. generality
Any system must have a common
fitness function or selection criterion. Two obvious criteria in cognition are
novelty and generality: miss and match. But we can’t select for both, they
exhaust all possibilities. Novelty can’t be primary criterion: it would select
for noise and filter out all patterns, which are defined by match. On the other
hand, to maximize match of inputs to memory we can stare at a wall: lock into
predictable environments. But of course, natural curiosity actively skips
predictable locations, thus reducing the match.
This dilemma is resolved if
we maximize predictive power: projected vs. confirmed match of inputs to
records (all records are predictions, else they are forgotten). To the extent
that new match is predictable, it doesn’t add to total projected match of the
model. But neither does noise: novelty (difference from records) of inputs that
won’t match in the future. So, match is positive in feedforward but negative in
feedback: the sign is reversed with direction. Projected match is the same as
compression, which includes skipping lowvalue input spans.
We can see this in
individual derivatives:
- higher-level match is
specific to past inputs, thus it’s a filter for future inputs, projected from
the past.
- higher-higher-level match
of a match is more detached from specific inputs, thus less accurate as a
filter.
On the contrary, it
projects higher match among future inputs, independently from their match to
past inputs.
And so on, higher
derivation orders of match are increasingly positive (less filtering) for
future inputs.
So, selection for novelty
is done by subtracting higher-level projection from corresponding input parameter.
Higher-order positional selection is skipping (or avoiding processing) predictable
future input spans. Skipped input span is formally a *coordinate* filter feedback:
next coordinate of inputs with expected above-average *additive* predictive value.
Thus, next input location is selected first by proximity and then by novelty,
both relative to a template comparand. This is covered in more detail in part
4, level 3.
Vertical evaluation
computes deviations, to form positive or negative higher-level patterns. Evaluation
is relative to higher-level averages, which represent past inputs, thus should
be projected over feedback delay: average += average
difference * (delay / average span) / 2. Average per input variable may also be
a feedback, representing redundancy to higher level, which also depends on higher-level
match rate: rM = match / input.
If rM > average per cost
of processing: additive match = input match - input-to-average match * rM.
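The two formulas in this paragraph can be transcribed directly. This is a literal sketch: all function and parameter names are hypothetical, and how the arguments would be obtained from higher-level feedback is not specified here.

```python
def project_average(average, average_difference, delay, average_span):
    """Project a higher-level average over the feedback delay:
    average += average difference * (delay / average span) / 2."""
    return average + average_difference * (delay / average_span) / 2


def additive_match(input_match, input_to_average_match, match, input_value):
    """Additive match = input match - input-to-average match * rM,
    where rM = match / input is the higher-level match rate."""
    match_rate = match / input_value
    return input_match - input_to_average_match * match_rate
```

For example, an average of 10 with an average difference of 2, projected over a delay of half the average span, becomes 10.5.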
Lateral comparison computes
differences, to project corresponding parameters of all derivation orders:
difference in magnitude of
initial inputs: projected next input = last input + difference/2,
difference in input match,
a subset of magnitude: projected next match = last match + match difference/2,
difference in match of
match, a subsubset of magnitude, projected correspondingly, and so on.
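A small sketch of lateral projection for the first two derivation orders, assuming match is the smaller comparand and using the last three inputs of a sequence. The function names are hypothetical.

```python
def project_next(last, difference):
    """Projected next value = last value + difference / 2."""
    return last + difference / 2


def project_input_and_match(inputs):
    """Project the next input and the next match from the last three inputs."""
    if len(inputs) < 3:
        raise ValueError("need at least three inputs")
    a, b, c = inputs[-3:]
    last_difference = c - b
    prior_match, last_match = min(a, b), min(b, c)
    match_difference = last_match - prior_match
    return (project_next(c, last_difference),
            project_next(last_match, match_difference))
```

For the inputs 4, 6, 10 the last difference is 4 and the match difference is 2, so the projected next input is 12 and the projected next match is 7.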
Ultimate criterion is top
order of match on a top level of search: the most predictive parameter in a
system.
imagination, planning,
action
Imagination is never truly original,
it can only be formalized as interactive projection of known patterns. As
explained above, patterns send feedback to filter lowerlevel sources. This feedback
is to future sources, where the patterns are projected to continue or reoccur.
Stronger upstream patterns and correspondingly higher filters reduce resolution
of or totally skip predictable input spans. But when multiple originally
distant patterns are projected into the same location, their feedback cancels
out in proportion to their relative difference.
In other words, combined
filter is cancelled out to the extent that co-projected patterns are mutually
exclusive:
filter = max_pattern_feedback - alt_pattern_feedback * match_rate.
By default, match_rate used here is
average (match / max_comparand). But it has average error:
average abs(match_rate - average_match_rate). To improve filter accuracy, we can derive actual match
rate by cross-comparing co-projected patterns. I think imagination is just that:
search across co-projected patterns, before accessing their external target
sources.
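The combined filter and the average error of the default match rate can be sketched as follows. The function names are hypothetical, comparands are assumed to be positive numbers, and match is taken as the smaller comparand.

```python
def combined_filter(max_pattern_feedback, alt_pattern_feedback, match_rate):
    """filter = max_pattern_feedback - alt_pattern_feedback * match_rate."""
    return max_pattern_feedback - alt_pattern_feedback * match_rate


def match_rate_stats(comparand_pairs):
    """Average match rate (match / max_comparand) over past comparisons,
    and its average error: mean abs(match_rate - average_match_rate)."""
    rates = [min(a, b) / max(a, b) for a, b in comparand_pairs]
    average = sum(rates) / len(rates)
    error = sum(abs(rate - average) for rate in rates) / len(rates)
    return average, error
```

A large average error is the case where cross-comparing the co-projected patterns, instead of using the default rate, would change the filter the most.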
Patterns are projected in
space and time, depending on their past ST span and a vector of input
derivatives over that span. So, pattern input parameters in some future
location can be projected as:
(recorded input parameters)
+ (corresponding derivatives * relative distance) / 2.
Where relative distance = (projected
coords - current coords) / span of the pattern in the same direction.
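In one dimension, the projection above can be sketched like this. The function name and the list-based representation of parameters and derivatives are assumptions made for illustration.

```python
def project_params(params, derivatives, projected_coord, current_coord, span):
    """Project each recorded parameter to a future location:
    param + derivative * relative_distance / 2,
    where relative_distance = (projected - current) / span."""
    if span == 0:
        raise ValueError("pattern span must be non-zero")
    relative_distance = (projected_coord - current_coord) / span
    return [p + d * relative_distance / 2
            for p, d in zip(params, derivatives)]
```

Projecting one full span ahead adds half of each summed derivative to the corresponding parameter.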
Any search is defined by
location: contiguous coordinate span. Span of feedback target is that of
feedback source’s input pattern: narrower than the span of feedback source unit’s
output pattern. So, search across co-projected patterns is performed on a
conceptually lower level, but patterns themselves belong to higher level. Meaning
that search will be within intersection
of co-projected patterns, vs. whole patterns. Intersection is a location within
each of the patterns, and cross-comparison will be among pattern elements in
that location.
Combined filter is then pre-valuated:
projected value of positive patterns is compared to the cost of evaluating all
inputs, both within a target location. If pre-value is negative: projected
inputs are not worth evaluating, their location is skipped and “imagination” moves
to the next nearest one. Filter search continues until pre-value turns positive
(with above-average novelty) and the sensor is moved to that location. This sensor
movement, along with adjustment of its threshold, is the most basic form of
motor feedback, AKA action.
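The skipping loop can be sketched as a search over candidate locations ordered by proximity. The tuple layout `(coord, projected_value, n_inputs)`, the per-input cost, and the function name are hypothetical simplifications.

```python
def next_target(locations, cost_per_input):
    """Return the nearest location whose pre-value is positive.

    locations: iterable of (coord, projected_value, n_inputs),
               assumed to be ordered from nearest to farthest.
    """
    for coord, projected_value, n_inputs in locations:
        prevalue = projected_value - cost_per_input * n_inputs
        if prevalue > 0:
            return coord, prevalue   # the sensor would be moved here
        # otherwise the location is skipped without processing its inputs
    return None                      # nothing in range is worth evaluating
```

For instance, with three candidate locations of four inputs each and a cost of 1.5 per input, the first two are skipped if their projected values are 3 and 5, and the third is selected if its projected value is 9.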
Cognitive component of
action is planning: a form of imagination where projected patterns include
those that represent the system itself. Feedback of such self-patterns
eventually reaches the bottom of representational hierarchy: sensors and
actuators, adjusting their sensitivity | intensity and coordinates. This adjustment
is action. Such environmental interface is a part of any cognitive system,
although actuators are optional.
4. Initial levels of search and
corresponding orders of feedback (fine to skip)
This part recapitulates and
expands on my core algorithm, which operates in one dimension: time only.
Spatial and derived dimensions are covered in part 6. Even within 1D, the search
is hierarchical in scope, containing any number of levels. New level is added
when current top level terminates and outputs the pattern it formed.
Higherlevel patterns are
fed back to select future inputs on lower levels. Feedback is sent to all lower
levels because span of each pattern approximates combined span of inputs within
whole hierarchy below it.
So, deeper hierarchy forms
higher orders of feedback, with increasing elevation and scope relative to its
target: same-level prior input, higher-level match average,
beyond-the-next-level match value average, etc.
These orders of feedback
represent corresponding order of input compression: input, match between
inputs, match between matches, etc. Such compression is produced by comparing
inputs to feedback of all orders.
Comparisons form patterns, of
the order that corresponds to relative span of compared feedback:
1: prior inputs are compared to the following ones on the
same level, forming difference patterns dPs,
2: higher-level match is used to evaluate match between
inputs, forming deviation patterns vPs,
3: higher-hierarchy value revaluates positive values of
match, forming more selective shortcut patterns sPs.
Feedback of 2nd order consists of input
filters (if) defining value patterns, and coordinate
filters (Cf) defining positional resolution and relative
distance to future inputs.
Feedback of 3rd order is shortcut filters
for beyond-the-next level. These filters, sent to a location defined by
attached coordinate filters, form higher-order value patterns for deeper
internal and distant-level comparison.
Higherorder patterns are
more selective: difference is as likely to be positive as negative, while value
is far more likely to be negative, because positive patterns add costs of
reevaluation for extended crosscomparison among their inputs. And so on, with
selection and reevaluation for each higher order of positive patterns.
Negative patterns are still compared as a whole: their weak match is compensated
by greater span.
All orders of patterns
formed on the same level are redundant representations of the same inputs.
Patterns contain representation of match between their inputs, which are
compared by higherorder operations. Such operations increase overall match by
combining results of lowerorder comparisons across pattern’s variables:
0Le: AND of bit inputs to form digitized integers, containing
multiple powers of two
1Le: SUB of integers to form patterns, over additional
external dimensions = pattern length L
2Le: DIV of multiples (L) to form ratio patterns,
over additional distances = negative pattern length LL
3Le: LOG of powers (LLs), etc. Starting from
second level, comparison is selective per element of an
input.
Such power increase also applies
in comparison to higherorder feedback, with a lag of one level per order.
Power of coordinate filters
also lags the power of input filters by one level:
1Le fb: binary sensor resolution: minimal and maximal detectable
input value and coordinate increments
2Le fb: integer-valued average match and relative initial
coordinate (skipping intermediate coordinates)
3Le fb: rational-valued coefficient per variable and multiple
skipped coordinate range
4Le fb: real-valued coefficients and multiple coordinate-range
skip
I am defining initial
levels to find recurring increments in operations per level, which could then
be applied to generate higher levels recursively, by incrementing syntax of
output patterns and of feedback filters per level.
Operations per generic level (out of date)
Level 0 digitizes inputs,
filtered by minimal detectable magnitude: least significant bit (i LSB). These bits are AND
compared, then their matches are AND compared again, and so on, forming
integer outputs. This is identical to iterative summation and bitfiltering by
sequentially doubled i LSB.
Level 1 compares
consecutive integers, forming ± difference patterns (dP s). dP s are then evaluated to crosscompare their
individual differences, and so on, selectively increasing derivation of
patterns.
Evaluation: dP M (summed match) - dP aM (dP M per average match between
differences in level 2 inputs).
Integers are limited by the
number of digits (#b), and input span: least significant bit of coordinate (C LSB).
No 1st level feedback: fL cost is additive to dP cost, thus must be
justified by the value of dP (and coincident difference in value of patterns
filtered by adjusted i LSB), which is not known till dP is outputted to 2nd level.
Level 2 evaluates match
within dP s | bf L (dP) s, forming ± value patterns: vP s | vP (bf L) s. +vP s are evaluated for
cross-comparison of their dP s, then of resulting derivatives, then of inputted
derivation levels. +vP (bf L) s are evaluated to cross-compare bf L s, then dP s, adjusted by the
difference between their bit filters, and so on.
dP variables are compared
by subtraction, then resulting matches are combined with dP M (match within dP) to
evaluate these variables for crosscomparison by division, to normalize for the
difference in their span.
// match filter is also
normalized by span ratio before evaluation, same-power evaluation and
comparison?
Feedback: input dP s | bf L (dP) are back-projected and
resulting magnitude is evaluated to increment or decrement 0th level i LSB. Such increments
terminate bit-filter span ( bf L (dP)), output it to 2nd level, and initiate a new i LSB span to filter future
inputs. // bf L (dP) representation: bf , #dP, Σ dP, Q (dP).
Level 3 evaluates match in
input vP s or f L (vP) s, forming ± evaluation-value patterns:
eP s | eP (fL) s. Positive eP s are evaluated for
cross-comparison of their vP s ( dP s ( derivatives ( derivation levels ( lower search-level
sources: buffered or external locations (selected sources may directly specify
strong 3rd level sub-patterns))))).
Feedback: input vP is
back-projected, resulting match is compared to 2nd level filter, and the
difference is evaluated vs. filter-update filter. If update value is positive,
the difference is added to 2nd level filter, and filter span is terminated.
Same for adjustment of previously covered bit filters and 2nd level filter-update
filters?
This is similar to 2nd level operations, but
input vP s are separated by
skipped-input spans. These spans are a filter of coordinate (Cf, higher-order than f for 2nd level inputs), produced by
pre-valuation of future inputs:
projected novel match =
projected magnitude * average match per magnitude - projected-input match?
Pre-value is then evaluated
vs. 3rd level evaluation filter +
lower-level processing cost, and negative pre-value input span (= span of
back-projecting input) is skipped: its inputs are not processed on lower
levels.
// no pre-valuation on 2nd level: the cost is higher
than potential savings of only 1st level processing costs?
As distinct from input
filters, Cf is defined individually rather than per filter span. This is
because the cost of Cf update: span representation and interruption of
processing on all lower levels, is minor compared to the value of represented
contents? ±eP = ±Cf: individual skip evaluation, no flushing?
or interruption is
predetermined, as with Cb, fixed C f within C f L: a span of sampling across fixedL gaps?
alternating signed Cf s are averaged ±vP s?
Division: between L s, also inputs within
minimaldepth continuous dsign or morder derivation hierarchy?
tentative generalizations
and extrapolations
So, filter resolution is
increased per level, first for i filters and then for C filters: level 0 has
input bit filter,
level 1 adds coordinate bit
filter, level 2 adds input integer filter, level 3 adds coordinate integer
filter.
// coordinate filters (Cb, Cf) are not inputspecific,
patterns are formed by comparing their contents.
Level 4 adds input multiple
filter: eP match and its derivatives,
applied in parallel to corresponding variables of input pattern. Variablevalues
are multiplied and evaluated to form patternvalue, for inclusion into
nextlevel ±pattern // if separately evaluated, inputvariable value =
deviation from average: signreversed match?
Level 5 adds coordinate
multiple filter: a sequence of skippedinput spans by iteratively projected
patterns, as described in imagination section of part 3. Alternatively,
negative coordinate filters implement crosslevel shortcuts, described in level
3 subpart, which select for projected matchassociated novelty.
Additional variables in
positive patterns increase cost, which decreases positive vs. negative span
proportion.
Increased difference in
sign, syntax, span, etc., also reduces match between positive and negative
patterns. So, comparison, evaluation, prevaluation... on higher levels is
primarily for samesign patterns.
Consecutive differentsign
patterns are compared due to their proximity, forming ratios of their span and
other variables. These ratios are applied to project match across differentsign
gap or contrast pattern:
projected match +=
(projected match - intervening negative match) * (negative value / positive
value) / 2?
ΛV selection is incremented by induction: forward and
feedback of actual inputs, or by deduction: algebraic compression of input
syntax, to find computational shortcuts. Deduction is faster, but actual inputs
also carry empirical information. Relative value of additive information vs.
computational shortcuts is set by feedback.
The following
parts cover three initial levels in more detail, but are mostly out of date:
Level 1: comparison to past
inputs, forming difference patterns and match patterns
Inputs to the 1st level of search are single
integers, representing pixels of 1D scan line across an image, or equivalents
from other modalities. Consecutive inputs are compared to form differences,
difference patterns, matches, relative match patterns. This comparison may be
extended, forming higher and distant derivatives:
resulting variables per
input: *=2 derivatives (d,m) per comp, + conditional *=2 (xd, xi) per extended
comp:

          8 derivatives            // ddd, mdd, dd_i, md_i, + 1-input-distant dxd, mxd, + 2-input-distant d_ii, m_ii,
           /         \
       4 der         4 der         // 2 consecutive: dd, md, + 2 derivatives between 1-input-distant inputs: d_i and m_i,
       /   \         /   \
    d,m      d,m       d,m         // d, m: derivatives from default comparison between consecutive inputs,
    / \      / \       / \
   i  >>  i  >>  i  >>  i          // i: single-variable inputs.
This is explained /
implemented in my draft python code: line_patterns. That first level is for
generic 1D cognitive algorithm, its adaptation for image and then video
recognition algorithm will be natively 2D.
That’s what I spend most of
my time on, the rest of this intro is significantly out of date.
bitfiltering and
digitization
1st level inputs are filtered
by the value of most and least significant bits: maximal and minimal detectable
magnitude of inputs. Maximum is a magnitude that co-occurs with average 1st level match, projected by
outputted dP s. Least significant bit
value is determined by maximal value and number of bits per variable.
This bit filter is
initially adjusted by overflow in 1st level inputs, or by a set number of
consecutive overflows.
It’s also adjusted by
feedback of higher-level patterns, if they project overflow or underflow of 1st level inputs that exceeds
the cost of adjustment. Underflow is average number of 0 bits above top 1 bit.
Original input resolution may be increased by projecting analog magnification, by impact or by distance.
Iterative bit-filtering is digitization: bit value is doubled per higher digit, and summed input in excess of it is carried to the next digit. A digit can be larger than binary if the cost of such filtering requires larger carry.
Digitization is the most basic way of compressing inputs, followed by comparison between resulting integers.
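A minimal sketch of such a bit filter, under stated assumptions: inputs are non-negative integers, the filter is a window of `n_bits` starting at bit `lsb_shift`, and the LSB is raised by one bit after a set number of consecutive overflows. The names `bit_filter`, `adjust_lsb` and `max_overflows` are hypothetical; feedback from higher-level patterns is not modeled.

```python
def bit_filter(value, lsb_shift, n_bits):
    """Keep n_bits of a non-negative integer, starting at bit lsb_shift.

    Returns (filtered, overflow): filtered is the value at reduced
    resolution, clipped to the representable maximum; overflow is True
    if the value exceeded that maximum.
    """
    quantized = value >> lsb_shift        # drop bits below the LSB
    max_quantized = (1 << n_bits) - 1     # largest value in n_bits
    overflow = quantized > max_quantized
    return min(quantized, max_quantized), overflow


def adjust_lsb(values, lsb_shift, n_bits, max_overflows=2):
    """Raise the LSB by one bit after max_overflows consecutive overflows.

    Returns (filtered value, lsb_shift in effect) per input.
    """
    run = 0
    out = []
    for value in values:
        filtered, overflow = bit_filter(value, lsb_shift, n_bits)
        run = run + 1 if overflow else 0
        if run >= max_overflows:
            lsb_shift += 1                # double the value of the LSB
            run = 0
            filtered, _ = bit_filter(value, lsb_shift, n_bits)
        out.append((filtered, lsb_shift))
    return out


filtered = adjust_lsb([100, 300, 400, 200], lsb_shift=0, n_bits=8)
```

Raising the LSB halves resolution but doubles the detectable maximum, which is the trade-off that the overflow and underflow feedback is meant to balance.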
hypothetical: comparable magnitude filter, to form minimal-magnitude patterns
This doesn’t apply to reflected brightness, only to types of input that do represent physical quantity of a source.
Initial magnitude justifies basic comparison, and summation of below-average inputs only compensates for their lower magnitude, not for the cost of conversion. Conversion involves higher-power comparison, which must be justified by higher order of match, to be discovered on higher levels.
iP min mag span conversion cost and comparison match would be on 2^{nd} level, but it’s not justified by 1^{st} level match, unlike D span conversion cost and comparison match, so it is effectively the 1^{st} level of comparison?
possible +iP span evaluation: double evaluation + span representation cost < additional lower-bits match?
The inputs may be
normalized by subtracting feedback of average magnitude, forming ± deviation,
then by dividing it by next+1 level feedback, forming a multiple of average
absolute deviation, and so on. Additive value of input is a combination of all
deviation orders, starting with 0^{th} or absolute magnitude.
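The first two normalization steps can be sketched as follows. In this sketch the averages are computed from the same batch of inputs, as a stand-in for the higher-level feedback described above; the function name `normalize` is hypothetical.

```python
def normalize(inputs):
    """Return (average, deviations, multiples) for a batch of inputs.

    deviations: input - average magnitude (signed, 1st order).
    multiples:  deviation / average absolute deviation (2nd order).
    Both averages stand in for feedback from higher levels.
    """
    average = sum(inputs) / len(inputs)
    deviations = [x - average for x in inputs]
    average_abs_dev = sum(abs(d) for d in deviations) / len(deviations)
    if average_abs_dev == 0:
        multiples = [0.0 for _ in deviations]   # all inputs are equal
    else:
        multiples = [d / average_abs_dev for d in deviations]
    return average, deviations, multiples


average, deviations, multiples = normalize([2, 4, 6, 8])
```

The 0^{th} order (absolute magnitude), the signed deviation and the multiple of average deviation together make up the "combination of all deviation orders" per input.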
Initial input evaluation if any filter: cost < gain: projected negative-value (comparison cost - positive value):
by minimal magnitude > ± relative magnitude patterns (iPs), and +iPs are evaluated or cross-compared?
or by average magnitude > ± deviations, then by co-average deviation: ultimate bit filter?
Summation *may* compensate for conversion if its span is greater than average per magnitude spectrum?!
Summation on higher levels also increases span order, but within-order conversion is the same, and between-order comparison is intra-pattern only. bf spans overlap vP span, > filter conversion costs?
Level 2: additional evaluation of input patterns for feedback, forming filter patterns (out of date)
Inputs to 2^{nd} level of search are patterns derived on 1^{st} level. These inputs are evaluated for feedback to update 0^{th} level i LSB, terminating same-filter span.
Feedback increment of LSB is evaluated by deviation (∆) of magnitude, to avoid input overflow or underflow:
∆ += I/L - LSB a; ∆ > ff? while (∆ > LSB a){ LSB ±; ∆ -= LSB a; LSB a *2};
LSB a is average input (* V/L?) per LSB value, and ff is average deviation per positive-value increment;
Σ (∆) before evaluation: no V patterns? #b++ and C LSB are more expensive, evaluated on 3^{rd} level?
They are also compared to previously inputted patterns, forming difference patterns dPs and value patterns vPs per input variable, then combined into dPPs and vPPs per input pattern.
L * sign of consecutive dPs is a known miss, and match of dP variables is correlated by common derivation.
Hence, projected match of other +dP and -dP variables = amk * (1 - L / dP). On the other hand, same-sign dPs are distant by L, reducing projected match by amk * L, which is equal to reduction by miss of L?
So, dP evaluation is for two comparisons of equal value: cross-sign, then cross-L same-sign (1 dP evaluation is blocked by feedback of discovered or defined alternating sign and co-variable match projection).
Both of last dPs will be compared to the next one, thus past match per dP (dP M) is summed for three dPs:
dP M ( Σ ( last 3 dP s L+M))  a dP M (average of 4Le +vP dP M) > v, vs;; evaluation / 3 dP s > value, sign / 1 dP.
while (vs = ovs){ ovs = vs; V+=v; vL++; vP (L, I, M, D) += dP (L, I, M, D);; default vP  wide sum, select preserv.
vs > 0? comp (3 dP s){ DIV (L, I, M, D) > N, ( n, f, m, d); vP (N, F, M, D) += n, f, m, d;; sum: der / variable, n / input?
vr = v+ N? SUB (nf) > nf m; vd = vr+ nf m, vds = vd  a;; ratios are too small
for DIV?
while (vds = ovds){ ovds = vds; Vd+=vd; vdL++; vdP() += Q (d  ddP);; default Q (d  ddP) sum., select. preserv.
vds > 0? comp (1^{st} x l^{st} d  ddP s of Q (d) s);; splicing Q (d) s of matching dP s, cont. only: no comp ( Σ Q (d  ddP)?
Σ vP ( Σ vd P eval: primary for P,
redundant to individual dP s ( d s for +P, cost *2, same for +P' I and P' M,D?
no Σ V  Vd evaluation of cont. comp
per variable or division: cost + vL = comp cost? Σ V per fb: no vL, #comp;
 L, I, M, D: same value per mag,
power / compression, but I  M, D redund = mag, +vP: I  2a,  vP: M, D  2a?
 no variable eval: cost (sub + vL + filter) > comp cost, but
match value must be adjusted for redundancy?
 normalization for
comparison: min (I, M, D) * rL, SUB (I, M, D)? Σ L (pat) vs C: more general but
interrupted?
variablelength DIV: while (i > a){ while (i> m){ SUB (i, m) > d; n++; i=d;}; m/=2; t=m; SUB (d, t); f+= d;}?
additive compression per d
vs. m*d: > length cost?
tdP ( tM, tD, dP(), ddP Σ ( dMΣ (Q (dM)), dDΣ (Q (dD)), ddLΣ (Q (ddL)), Q (ddP))); // last d and D are within
dP()?
Input filter is a higher-level average, while filter update is accumulated over multiple higher-level spans until it exceeds filter-update filter. So, filter update is 2^{nd} order feedback relative to filter, as is filter relative to match.
But the same filter update is 3^{rd} order of feedback when used to evaluate input value for inclusion into pattern defined by a previous filter: update span is two orders higher than value span.
Higher-level comparison between patterns formed by different filters is mediated, vs. immediate continuation of current-level comparison across filter update (mediated cont.: splicing between different-filter patterns by vertical specification of match, although it includes lateral cross-comparison of skip-distant specifications).
However, filter update feedback is periodic, so it doesn’t form continuous cross-filter comparison patterns xPs.
adjustment of forward evaluation by optional feedback of projected input
More precisely, additive value or novel magnitude of an input is its deviation from higher-level average. Deviation = input - expectation: (higher-level summed input - summed difference /2) * rL (L / hL).
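The deviation formula above transcribes directly into code. This is a literal sketch of the expression as written, with hypothetical function and parameter names; it assumes that the current-level input is a sum over a span of length L, and that the higher-level sum covers a span of length hL.

```python
def expected_input(h_summed_input, h_summed_difference, length, h_length):
    """Expectation for a span of `length` inputs, from a higher-level span.

    Transcribes: (higher-level summed input - summed difference / 2) * rL,
    where rL = L / hL rescales the higher-level sum to the current span.
    """
    rL = length / h_length
    return (h_summed_input - h_summed_difference / 2) * rL


def deviation(summed_input, h_summed_input, h_summed_difference,
              length, h_length):
    """Deviation (novel magnitude) = input - expectation."""
    return summed_input - expected_input(
        h_summed_input, h_summed_difference, length, h_length)


expectation = expected_input(1000, 40, length=5, h_length=20)
novelty = deviation(260, 1000, 40, length=5, h_length=20)
```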
Inputs are compared to last input to form difference, and to past average to form deviation or novelty.
But last input is more predictive of the next one than a more distant average, thus the latter is compared on higher level than the former. So, input variable is compared sequentially and summed within resulting patterns. On the next level, the sum is compared vertically: to next-next-level average of the same variable.
Resulting vertical match defines novel value for higher-level sequential comparison:
novel value = past match - (vertical match * higher-level match rate) - average novel match:
nv = L+M - (m (I, (hI * rL)) * hM / hL) - hnM * rL; more precise than initial value: v = L+M - hM * rL;
Novelty evaluation is done if higher-level match > cost of feedback and operations, separately for I and D Ps:
I, M ( D, M feedback, vertical SUB (I, nM ( D, ndM));
Impact on ambient sensor is separate from novelty and is predicted by representational-value patterns?
- next-input prediction: seq match + vert match * relative rate, but predictive selection is per level, not input.
- higher-order expectation is relative match per variable: pMd = D * rM, M/D, or D * rMd: Md/D,
- if rM | rMd are derived by intra-pattern comparison, when average M | Md > average per division?
one-input search extension within cross-compared patterns
Match decreases with distance, so initial comparison is between consecutive inputs. Resulting match is evaluated, forming ±vPs. Positive Ps are then evaluated for expanded internal search: cross-comparison among 1-input-distant inputs within a pattern (on same level, higher-level search is between new patterns).
This cycle repeats to evaluate cross-comparison among 2-input-distant inputs, 3-input-distant inputs, etc., when summed current-distance match exceeds the average per evaluation.
So, patterns of longer cross-comparison range are nested within selected positive patterns of shorter range. This is similar to 1^{st} level ddPs being nested within dPs.
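The range-extension cycle can be sketched as a loop over increasing distance. This is a simplified illustration, not the author's implementation: it treats the whole list as a single positive pattern, takes match as the smaller of two compared values, and continues to the next distance only while summed match exceeds a hypothetical `ave_match` per comparison.

```python
def extend_search(inputs, ave_match, max_range=None):
    """Cross-compare inputs at incrementally greater distance.

    Comparison at distance rng + 1 is done only if the summed match at
    distance rng exceeds ave_match per comparison made at that distance.
    Returns {distance: summed match} for each distance evaluated.
    """
    if max_range is None:
        max_range = len(inputs) - 1
    summed = {}
    for rng in range(1, max_range + 1):
        pairs = list(zip(inputs, inputs[rng:]))       # rng-distant pairs
        match = sum(min(a, b) for a, b in pairs)
        summed[rng] = match
        if match <= ave_match * len(pairs):           # below average: stop
            break
    return summed


smooth = extend_search([5, 6, 5, 7, 6, 5], ave_match=5)
noisy = extend_search([9, 8, 2, 9, 1, 8], ave_match=5)
```

In the smooth sequence match stays above average for several increments of distance, while in the noisy one the search terminates after the default comparison between consecutive inputs.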
Same input is re-evaluated for comparison at increased distance because match will decay: projected match = last match * match rate (mr), * (higher-level mr / current-level mr) * (higher-level distance / next distance)?
Or = input * average match rate for that specific distance, including projected match within negative patterns.
It is re-evaluated also because projected match is adjusted by past match: mr *= past mr / past projected mr?
Also, multiple comparisons per input form overlapping and redundant patterns (similar to fuzzy clusters), and must be evaluated vs. filter * number of prior comparisons, reducing value of projected match.
Instead of directly comparing incrementally distant input pairs, we can calculate their difference by adding intermediate differences. This would obviate multiple access to the same inputs during cross-comparison.
These differences are also subtracted (compared), forming higher derivatives and matches:
         ddd, x1dd, x2d             (ddd: 3^{rd} derivative, x1dd: d of 2-input-distant ds, x2d: d of 2-input-distant inputs)
          /          \
     dd, x1d        dd, x1d         (dd: 2^{nd} derivative, x1d = d+d = difference between 1-input-distant inputs)
      /    \        /    \
     d        d        d            (d: difference between consecutive inputs)
    / \      / \      / \
   i     i       i       i          (i: initial inputs)
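The shortcut of adding intermediate differences, instead of re-accessing distant inputs, is simple arithmetic: the difference between inputs that are several steps apart equals the sum of the consecutive differences between them. A small sketch with hypothetical function names:

```python
def consecutive_differences(inputs):
    """d: difference between each pair of consecutive inputs."""
    return [b - a for a, b in zip(inputs, inputs[1:])]


def distant_differences(ds, skip):
    """Difference between inputs with `skip` inputs in between.

    Computed from consecutive differences only: the sum of skip + 1
    adjacent ds, so the original inputs are never accessed again.
    """
    span = skip + 1
    return [sum(ds[k:k + span]) for k in range(len(ds) - span + 1)]


inputs = [3, 7, 6, 10, 15]
ds = consecutive_differences(inputs)     # between consecutive inputs
x1d = distant_differences(ds, skip=1)    # between 1-input-distant inputs
x2d = distant_differences(ds, skip=2)    # between 2-input-distant inputs
```

Here `x1d[0]` is `ds[0] + ds[1]`, which equals `inputs[2] - inputs[0]`, matching the `x1d = d+d` note in the diagram.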
As always, match is a smaller input, cached or restored, selected by the sign of a difference.
Comparison of both types is between all same-type variable pairs from different inputs.
Total match includes match of all its derivation orders, which will overlap for proximate inputs.
Incremental cost of cross-comparison is the same for all derivation orders. If projected match is equal to projected miss, then additive value for different orders of the same inputs is also the same: reduction in projected magnitude of differences will be equal to reduction in projected match between distant inputs?
multi-input search extension, evaluation of selection per input: tentative
On the next level, average match from expansion is compared to that from shorter-distance comparison, and resulting difference is decay of average match with distance. Again, this decay drives re-evaluation per expansion: selection of inputs with projected decayed match above average per comparison cost.
Projected match is also adjusted by prior match (if local decay?) and redundancy (symmetrical if no decay?)
Slower decay will reduce value of selection per expansion because fewer positive inputs will turn negative:
Value of selection = Σ comp cost of neg-value inputs - selection cost (average saved cost or relative delay?)
This value is summed between higher-level inputs, into average value of selection per increment of distance. Increments with negative value of selection should be compared without re-evaluation, adding to minimal number of comparisons per selection, which is evaluated for feedback as a comparison-depth filter:
Σ (selection value per increment) > average selection value;; for negative patterns of each depth, >1 only?
depth adjustment value = average selection value; while (average selection value > selection cost){ depth adjustment ±±; depth adjustment value = selection value per increment (depth-specific?); };
depth adjustment > minimal per feedback? >> lower-level depth filter;; additive depth = adjustment value?
- match filter is summed and evaluated per current comparison depth?
- selected positive relative matches don’t reduce the benefit of pruning-out negative ones.
- skip if negative selection value: selected positive matches < selection cost: average value or relative delay?
Each input forms a queue of matches and misses relative to templates within comparison depth filter. These derivatives, both discrete and summed, overlap for inputs within each other’s search span. But representations of discrete derivatives can be reused, redundancy is only necessary for parallel comparison.
Assuming that environment is not random, similarity between inputs declines with spatio-temporal distance. To maintain proximity, an n-input search is FIFO: input is compared to all templates up to maximal distance, then added to the queue as a new template, while the oldest template is outputted into pattern-wide queue.
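A minimal sketch of such a FIFO search, using `collections.deque` as the template queue. The function name `fifo_search` and the (distance, d, m) tuple layout are assumptions of this sketch; match is taken as the smaller of input and template.

```python
from collections import deque


def fifo_search(inputs, max_distance):
    """Compare each input to all templates within max_distance, FIFO.

    Returns (derivatives, outputted): derivatives[k] is a list of
    (distance, d, m) tuples for input k, nearest template first;
    outputted holds templates pushed out of the search queue,
    oldest first (the "pattern-wide queue" in the text).
    """
    templates = deque()
    derivatives, outputted = [], []
    for new in inputs:
        per_input = []
        # The most recent template is at distance 1.
        for distance, template in enumerate(reversed(templates), start=1):
            per_input.append((distance, new - template, min(new, template)))
        derivatives.append(per_input)
        templates.append(new)                  # input becomes a template
        if len(templates) > max_distance:
            outputted.append(templates.popleft())   # oldest template out
    return derivatives, outputted


derivatives, outputted = fifo_search([4, 6, 5, 9], max_distance=2)
```

The queue length is the comparison-depth filter: each input is compared to at most `max_distance` prior templates.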
value-proportional combination of patterns: tentative
Summation of +dP and -dP is weighted by their value: L (summed d-sign match) + M (summed i match).
Such relative probability of +dP vs. -dP is indicated by corresponding ratios: rL = +L/-L, and rM = +M/-M.
(Ls and Ms are compared by division: comparison power should be higher for more predictive variables).
But weighting complementation incurs costs, which must be justified by value of ratio. So, division should be of variable length, continued while the ratio is above average. This is shown below for Ls, also applies to Ms:
dL = +L - -L, mL = min (+L, -L); nL = 0; fL = 0; efL = 1; // nL: L multiple, fL: L fraction, efL: extended fraction.
while (dL > adL){ dL = |dL|; // all Ls are positive; dL is evaluated for long division by adL: average dL.
while (dL > 0){ dL -= mL; nL++;} dL -= mL/2; dL > 0? fL += efL; efL /= 2;} // ratio: rL = nL + fL.
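The idea of variable-length division can be illustrated with a runnable sketch. This is not a transcription of the pseudocode above: it is a plain long division by repeated subtraction, where the integer multiple is always computed and binary fraction bits are added only while the remainder stays above a hypothetical `min_remainder` threshold (standing in for the average that justifies further division).

```python
def variable_length_ratio(larger, smaller, min_remainder,
                          max_fraction_bits=8):
    """Ratio of larger to smaller by repeated subtraction.

    Returns (ratio, remainder). Fraction bits are added only while the
    remainder exceeds min_remainder, so precision of the ratio depends
    on how much unexplained magnitude is left.
    """
    if smaller <= 0 or larger < smaller:
        raise ValueError("expects larger >= smaller > 0")
    multiple = 0
    remainder = larger
    while remainder >= smaller:          # integer part of the ratio
        remainder -= smaller
        multiple += 1
    fraction, bit_value, divisor, bits = 0.0, 0.5, smaller / 2, 0
    while remainder > min_remainder and bits < max_fraction_bits:
        if remainder >= divisor:         # this binary digit is 1
            remainder -= divisor
            fraction += bit_value
        divisor /= 2
        bit_value /= 2
        bits += 1
    return multiple + fraction, remainder


full = variable_length_ratio(11, 4, min_remainder=0.5)
coarse = variable_length_ratio(11, 4, min_remainder=2)
```

With a low threshold the division runs to the exact ratio; with a higher one it stops early at reduced resolution.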
Ms’ long-division evaluation is weighted by rL: projected rM value = dM * nL (reduced-resolution rL) - adM.
Ms are then combined: cM = +M + -M * rL; // rL is relative probability of -M across iterated cL.
Ms are not projected (M += D * rcL * rM D (MD/cD) /2): precision of higher-level rM D is below that of rM?
Prior ratios are combination rates: rL is probability of -M, and combined rL and rM (cr) is probability of -D.
If rM < arM, cr = rL, else: cr = (+L + +M) / (-L + -M) // cr = √(rL * rM) would lose L vs. M weighting.
cr predicts match of weighted cD between cdPs, where negative-dP variable is multiplied by above-average match ratio before combination: cD = +D + -D * cr. // after unweighted comparison between Ds?
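The combination rules above can be sketched as one function. This is an illustration under assumptions: the ratios use plain division rather than the variable-length long division described earlier, +dP and -dP are passed as dictionaries of their summed L, M, D, and the function name `combine_dp` is hypothetical.

```python
def combine_dp(pos, neg, arM):
    """Combine variables of a +dP and a -dP, weighted by their ratios.

    pos, neg: dicts with summed "L", "M", "D" of +dP and -dP.
    arM: average rM, used as a filter on whether rM contributes to cr.
    """
    rL = pos["L"] / neg["L"]
    rM = pos["M"] / neg["M"]
    cM = pos["M"] + neg["M"] * rL            # cM = +M + -M * rL
    if rM < arM:
        cr = rL                              # rM is below average: use rL
    else:
        cr = (pos["L"] + pos["M"]) / (neg["L"] + neg["M"])
    cD = pos["D"] + neg["D"] * cr            # cD = +D + -D * cr
    return {"rL": rL, "rM": rM, "cM": cM, "cr": cr, "cD": cD}


pos_dp = {"L": 6, "M": 30, "D": 12}
neg_dp = {"L": 3, "M": 9, "D": -8}
combined = combine_dp(pos_dp, neg_dp, arM=2)
filtered = combine_dp(pos_dp, neg_dp, arM=4)
```

In the second call rM is below the average, so cr falls back to rL and the -D term is weighted less.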
Averages: arL, arM, acr, are feedback of ratios that co-occur with above-average match of span-normalized variables, vs. input variables. Another feedback is averages that evaluate long division: adL, adM, adD.
Both are feedback of positive C pattern, which represents these variables, inputted & evaluated on 3^{rd} level; or 4^{th} level: value of dPs * ratio is compared to value of dPs, & the difference is multiplied by cL / hLe cL?
Comparison of opposite-sign Ds forms negative match = smaller |D|, and positive difference dD = +D + |-D|.
dD magnitude predicts its match, not further combination. Single comparison is cheaper than its evaluation.
Comparison is by division if larger |D| co-occurs with hLe nD of above-average predictive value (division is sign-neutral & reductive). But average nD value is below the cost of evaluation, except if positive feedback?
So, default operations for L, M, D of complementary dPs are comparison by long division and combination.
D combination: +D -D*cr, vs. - cD * cr: +D vs. -D weighting is lost, meaningless if cD=0?
Combination by division is predictive if the ratio is matching on higher level (hLe) & acr is fed back as filter?
Resulting variables: cL, rL, cM, rM, cr, cD, dD, form top level of cdP: complemented dP.
Level 3: prevaluation of projected filter patterns, forming updated-input patterns (out of date)
3^{rd} level inputs are ± V patterns, combined into complemented V patterns. Positive V patterns include derivatives of 1^{st} level match, which project match within future inputs (D patterns only represent and project derivatives of magnitude). Such projected-inputs-match is prevaluated, negative prevalue-span inputs are summed or skipped (reloaded), and positive prevalue-span inputs are evaluated or even directly compared.
Initial upward (Λ) prevaluation by E filter selects for evaluation of V patterns, within resulting ± E patterns. Resulting prevalue is also projected downward (V), to select future input spans for evaluation, vs. summation or skipping. The span is of projecting V pattern, same as of lower hierarchy. Prevaluation is then iterated over multiple projected-input spans, as long as last prevalue remains above average for the cost of prevaluation.
Additional interference of iterated negative projection is stronger than positive projection of lower levels, and should flush them out of pipeline. This flushing need not be final, spans of negative projected value may be stored in buffers, to delay the loss. Buffers are implemented in slower and cheaper media (tape vs. RAM) and accessed if associated patterns match on a higher level, thus project above-average match among their inputs.
Iterative back-projection is evaluated starting from 3^{rd} level: to be projectable the input must represent derivatives of value, which are formed starting from 2^{nd} level. Compare this to 2^{nd} level evaluation:
Λ for input, V for V filter, iterated within V pattern. Similar sub-iteration in E pattern?
Evaluation value = projected-inputs-match - E filter: average input match that co-occurs with average higher-level match per evaluation (thus accounting for evaluation costs + selected comparison costs). Compare this to V filter that selects for 2^{nd} level comparison: average input match that co-occurs with average higher-level match per comparison (thus accounting for costs of default cross-comparison only).
E filter feedback starts from 4^{th} level of search, because its inputs represent prevaluated lower-level inputs.
4^{th} level also pre-prevaluates vs. prevaluation filter, forming pre-prevalue that determines prevaluation vs. summation of next input span. And so on: the order of evaluation increases with the level of search.
Higher levels are increasingly selective in their inputs, because they additionally select by higher orders derived on these levels: magnitude ) match and difference of magnitude ) match and difference of match, etc.
Feedback of prevaluation is ± prefilter: binary evaluation-value sign that determines evaluating vs. skipping initial inputs within projected span, and flushing those already pipelined within lower levels.
Negative feedback may be iterated, forming a skip span.
Parallel lower hierarchies & skip spans may be assigned to different external sources or their internal buffers.
Filter update feedback is level-sequential, but prefilter feedback is sent to all lower levels at once.
Prefilter is defined per input, and then sequentially translated into prefilters of higher derivation levels:
prior value += prior match > value sign: next-level prefilter. If there are multiple prefilters of different evaluation orders from corresponding levels, they AND & define infra-patterns: sign ( input ( derivatives.
filter update evaluation and feedback
Negative evaluation-value blocks input evaluation (thus comparison) and filter updating on all lower levels. Not-evaluated input spans (gaps) are also outputted, which will increase coordinate range per contents of both higher-level inputs and lower-level feedback. Gaps represent negative projected-match value, which must be combined with positive value of subsequent span to evaluate comparison across the gap on a higher level. This is similar to evaluation of combined positive + negative relative match spans, explained above.
Blocking locations with expected inputs will result in preference for exploration & discovery of new patterns, vs. confirmation of the old ones. It is the opposite of upward selection for stronger patterns, but sign reversal in selection criteria is a basic feature of any feedback, starting with average match & derivatives.
Positive evaluation-value input spans are evaluated by lower-level filter, & this filter is evaluated for update:
combined update = (output update + output filter update / (same-filter span (fL) / output span)) /2.
both updates: = last feedback, equal-weighted because higher-level distance is compensated by range: fL?
update value = combined update - update filter: average update per average higher-level additive match.
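The two formulas above transcribe into a short sketch. Function and parameter names are hypothetical, and the sign test at the end only marks which of the two branches described next in the text would apply.

```python
def combined_update(output_update, output_filter_update,
                    filter_span, output_span):
    """(output update + output filter update / (fL / output span)) / 2.

    The filter update is normalized by the ratio of same-filter span (fL)
    to output span before it is averaged with the output update.
    """
    span_ratio = filter_span / output_span
    return (output_update + output_filter_update / span_ratio) / 2


def update_value(combined, update_filter):
    """update value = combined update - update filter."""
    return combined - update_filter


combined = combined_update(output_update=4, output_filter_update=8,
                           filter_span=16, output_span=4)
value = update_value(combined, update_filter=2.5)
apply_update = value > 0     # positive: lower-level filter += combined
```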
also differential costs of feedback transfer across locations (vs. delay) + representation + filter conversion?
If update value is negative: fL += new inputs, subdivided by their positive or negative predictive value spans.
If update value is positive: lower-level filter += combined update, new fL (with new filter representation) is initialized on a current level, while current-level part of old fL is outputted and evaluated as next-level input.
In turn, the filter gets updates from higher-level outputs, included in higher-higher-level positive patterns by that level’s filter. Hence, each filter represents combined span-normalized feedback from all higher levels, of exponentially growing span and reduced update frequency.
Deeper hierarchy should block greater proportion of inputs. At the same time, increasing number of levels contribute to projected additive match, which may justify deeper search within selected spans.
Higher-level outputs are more distant from current input due to elevation delay, but their projection range is also greater. So, outputs of all levels have the same relative distance (distance/range) to a next input, and are equal-weighted in combined update. But if input span is skipped, relative distance of skip-initiating pattern to next input span will increase, and its predictive value will decrease. Hence, that pattern should be flushed or at least combined with a higher-level one:
combined V prevalue = higher-level V prevalue + ((current-level V prevalue - higher-level V prevalue) / ((current-level span / distance) / (higher-level span / distance))) /2. // the difference between current-level and higher-level prevalues is reduced by the ratio of their relative distances.
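A literal transcription of the combined prevalue formula, as a sketch. It assumes that each level has its own span and its own distance to the next input, and that the final /2 applies to the normalized difference only (as in the analogous combined-update formula in this part); the function and parameter names are hypothetical.

```python
def combined_prevalue(current, higher,
                      current_span, current_distance,
                      higher_span, higher_distance):
    """Combine current-level and higher-level V prevalues.

    The difference between the two prevalues is divided by the ratio of
    their span-per-distance, then halved and added to the higher-level
    prevalue.
    """
    ratio = ((current_span / current_distance)
             / (higher_span / higher_distance))
    return higher + ((current - higher) / ratio) / 2


# Equal span-per-distance: the result is the midpoint of the two prevalues.
equal = combined_prevalue(10, 4, current_span=8, current_distance=4,
                          higher_span=32, higher_distance=16)
# Ratio of 2: the difference term is halved before combination.
reduced = combined_prevalue(10, 4, current_span=16, current_distance=4,
                            higher_span=32, higher_distance=16)
```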
To speed up selection, filter updates can be sent to all lower levels in parallel. Multiple direct filter updates are span-normalized and compared at a target level, and the differences are summed in combined update. This combination is equal-weighted because all levels have the same span-per-distance to next input, where the distance is the delay of feedback during elevation. // this happens automatically in level-sequential feedback?
combined update = filter update + distance-normalized difference between output & filter updates:
((output update - filter update) / (output relative distance / higher-output relative distance)) /2.
This combination method is accurate for post-skipped input spans, as well as next input span.
- filter can also be replaced by output + higher-level filter /2, but value of such feedback is not known.
- possible fixed-rate sampling, to save on feedback evaluation if slow decay, ~ deep feed-forward search?
- selection can be by patterns, derivation orders, sub-patterns within an order, or individual variables?
- match across distance also projects across distance: additive match = relative match * skipped distance?
cross-level shortcuts: higher-level sub-filters and symbols
After individual input comparison, if match of a current scale (length-of-a-length…) projects positive relative match of input lower-scale / higher-derivation level, then the latter is also cross-compared between the inputs.
Lower scale levels of a pattern represent old lower levels of a search hierarchy (current or buffered inputs).
So, feedback of lower scale levels goes down to corresponding search levels, forming shortcuts to preserve detail for higher levels. Feedback is generally negative: expectations are redundant to inputs. But specifying feedback may be positive: lower-level details are novel to a pattern, & projected to match with it in the future.
Higher-span comparison power is increased if lower-span comparison match is below average:
variable subtraction ) span division ) super-span logarithm?
Shortcuts to individual higher-level inputs form a queue of sub-filters on a lower level, possibly represented by a queue-wide prefilter. So, a level has one filter per parallel higher level, and sub-filter for each specified sub-pattern. Sub-filters of incrementally distant inputs are redundant to all previous ones.
Corresponding input value = match - sub-filter value * rate of match to sub-filter * redundancy?
Shortcut to a whole level won’t speed up search: higher-level search delay > lower-hierarchy search delay.
Resolution and parameter range may also increase through interaction of co-located counter-projections?
Symbols, for communication among systems that have common high-level concepts but no direct interface, are “co-author identification” shortcuts: their recognition and interpretation is performed on different levels.
Higher-level patterns have increasing number of derivation levels, that represent corresponding lower search levels, and project across multiple higher search levels, each evaluated separately?
Match across discontinuity may be due to additional dimensions or internal gaps within patterns.
Search depth may also be increased by cross-comparison between levels of scale within a pattern: match across multiple scale levels also projects over multiple higher and lower scale levels? Such comparison between variable types within a pattern would be of a higher order:
5. Comparison between variable types within a pattern (tentative)
To reiterate, elevation increases syntactic complexity of patterns: the number of different variable types within them. Syntax is identification of these types by their position (syntactic coordinate) within a pattern. This is analogous to recognizing parts of speech by their position within a sentence.
Syntax “synchronizes” same-type variables for comparison or aggregation between input patterns. Access is hierarchical, starting from sign -> value levels within each variable of difference and relative match: sign is compared first, forming + and - segments, which are then evaluated for comparison of their values.
Syntactic expansion is pruned by selective comparison vs. aggregation of individual variable types within input patterns, over each coordinate type or resolution. As with templates, minimal aggregation span is resolution of individual inputs, & maximal span is determined by average magnitude (thus match) of new derivatives on a higher level. Hence, a basic comparison cycle generates queues of interlaced individual & aggregate derivatives at each template variable, and conditional higher derivatives on each of the former.
Sufficiently complex syntax or predictive variables will justify comparing across “syntactic” coordinates within a pattern, analogous to comparison across external coordinates. In fact, that’s what higher-power comparisons do. For example, division is an iterative comparison between difference & match: within a pattern (external coordinate), but across derivation (syntactic coordinate).
Also cross-variable is comparison between orders of match in a pattern: magnitude, match, match-of-match... This starts from comparison between match & magnitude: match rate (mr) = match / magnitude. Match rate can then be used to project match from magnitude: match = magnitude * output mr * filter mr.
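The two expressions for match rate transcribe directly. The function names are hypothetical, and the numbers are illustrative only.

```python
def match_rate(match, magnitude):
    """mr = match / magnitude: comparison between two orders of match."""
    return match / magnitude


def project_match(magnitude, output_mr, filter_mr):
    """Projected match = magnitude * output mr * filter mr."""
    return magnitude * output_mr * filter_mr


output_mr = match_rate(match=30, magnitude=120)
projected = project_match(magnitude=200, output_mr=output_mr, filter_mr=0.5)
```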
In this manner, mr of each match order adjusts intra-order-derived sequentially higher-order match:
match *= lower inter-order mr. Additive match is then projected from adjusted matches & their derivatives.
This inter-order projection continues up to the top order of match within a pattern, which is the ultimate selection criterion because that’s what’s left matching on the top level of search.
Inter-order vectors are ΛV symmetrical, but ΛV derivatives from lower order of match are also projected for higher-order match, at the same rate as the match itself?
Also possible is comparison across syntactic gaps: ΛY comparison > difference, filter feedback VY hierarchy. For example, comparison between dimensions of a multi-D pattern will form possibly recurrent proportions.
Internal comparisons can further compress a pattern, but at the cost of adding a higher-order syntax, which means that they must be increasingly selective. This selection will increase “discontinuity” over syntactic coordinates: operations necessary to convert the variables before comparison. Eventually, such operators will become large enough to merit direct comparisons among them. This will produce algebraic equations, where the match (compression) is a reduction in the number of operations needed to produce a result.
The first such shortcut would be a version of Pythagorean theorem, discovered during search in 2D (part 6) to compute cosines. If we compare 2D-adjacent 1D Ls by division, over 1D distance and derivatives (an angle), partly matching ratio between the ratio of 1D Ls and a 2nd derivative of 1D distance will be a cosine.
Cosines are necessary to normalize all derivatives and lengths (Ls) to a value they have when orthogonal to 1D scan lines (more in part 6).
Such normalization for a POV angle is similar to dimensionality reduction in Machine Learning, but is much more efficient because it is secondary to selective dimensionality expansion. It’s not really “reduction”: dimensionality is prioritized rather than reduced. That is, the dimension of pattern’s main axis is maximized, and dimensions sequentially orthogonal to higher axes are correspondingly minimized. The process of discovering these axes is so basic that it might be hard-wired in animals.
6. Cartesian dimensions and sensory modalities (out of date)
This is a recapitulation and expansion on incremental dimensionality introduced in part 2.
Term “dimension” here is reserved for a parameter that defines sequence and distance among inputs, initially Cartesian dimensions + Time. This is different from terminology of combinatorial search, where dimension is any parameter of an input, and their external order and distance don’t matter. My term for that is “variable”; external dimensions become types of a variable only after being encoded within input patterns.
For those with ANN background, I want to stress that a level of search in my approach is 1D queue of inputs, not a layer of nodes. The inputs to a node are combined regardless of difference and distance between them (the distance is the difference between laminar coordinates of source “neurons”).
These derivatives are essential because value of any prediction = precision of what * precision of where. Coordinates and co-derived differences are not represented in ANNs, so they can't be used to calculate Euclidean vectors. Without such vectors, prediction and selection of where must remain extremely crude.
Also, layers in ANN are
orthogonal to the direction of input flow, so hierarchy is at least 2D. The
direction of inputs to my queues is in the same dimension as the queue itself,
which means that my core algorithm is 1D. A hierarchy of 1D queues is the most
incremental way to expand search: we can add or extend only one coordinate at a
time. This allows algorithm to select inputs that are predictive enough to
justify the cost of representing additional coordinate and corresponding
derivatives. Again, such incremental syntax expansion is my core principle,
because it enables selective (thus scalable) search.
A common objection is that images are “naturally” 2D, and our space-time is 4D. Of course, these empirical facts are practically universal in our environment. But a core cognitive algorithm must be able to discover and forget any empirical specifics on its own. Additional dimensions can be discovered as some general periodicity in the input flow: distances between matching inputs are compared, match between these distances indicates a period of lower dimension, and recurring periods form higher-dimension coordinate.
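A toy sketch of discovering a dimension as periodicity in a 1D input flow. It makes strong simplifying assumptions that are not in the text: inputs match only if they are exactly equal, each input is compared to the last occurrence of the same value, and the period is taken as the most frequent distance between matching inputs. The function name `discover_period` is hypothetical.

```python
from collections import Counter


def discover_period(stream):
    """Estimate a period as the most frequent distance between matches.

    Returns None if no input matches any prior input.
    """
    last_seen = {}          # value -> position of its last occurrence
    distances = []
    for position, value in enumerate(stream):
        if value in last_seen:
            distances.append(position - last_seen[value])
        last_seen[value] = position
    if not distances:
        return None
    period, _count = Counter(distances).most_common(1)[0]
    return period


# Three similar scan lines of width 4, flattened into one input flow.
flow = [1, 5, 9, 3,
        1, 5, 9, 4,
        1, 5, 8, 3]
period = discover_period(flow)
```

The recovered period is the line width: positions can then be re-indexed as (position // period, position % period), which adds a second coordinate.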
But as a practical shortcut to the expensive dimension-discovery process,
initial levels should be designed to specialize in sequentially higher spatial
dimensions: 1D scan lines, 2D frames, 3D set of confocal “eyes”, 4D temporal
sequence. These levels discover contiguous (positive match) patterns of
increasing dimensionality: 1D line segments, 2D blobs, 3D objects, 4D processes.
Higher 4D cycles form a hierarchy of multi-dimensional orders of scale,
integrated over time or distributed sensors. These higher cycles compare
discontinuous patterns. Corresponding dimensions may not be aligned across
cycles of different scale order.
Explicit coordinates and incremental dimensionality are unconventional. But the
key for scalable search is input selection, which must be guided by cost-benefit
analysis. Benefit is projected match of patterns, and cost is representational
complexity per pattern. Any increase in complexity must be justified by a
corresponding increase in discovered and projected match of selected patterns.
Initial inputs have no known match, thus must have minimal complexity:
single-variable “what”, such as brightness of a greyscale pixel, and
single-variable “where”: pixel’s coordinate in one Cartesian dimension.
Single coordinate means that comparison between pixels must be contained within
a 1D (horizontal) scan line, otherwise their coordinates are not comparable and
can’t be used to select locations for extended search. Selection for contiguous
or proximate search across scan lines requires a second (vertical) coordinate.
That increases costs, thus must be selective according to projected match,
discovered by past comparisons within a 1D scan line. So, comparison across scan
lines must be done on the 2nd level of search. And so on.
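As an illustration of comparison contained within one scan line, here is a minimal sketch, not the project's line_POC code. It assumes that match between two brightness values is the smaller of the two, and it uses a hypothetical filter `ave` to separate positive-match from negative-match spans; both choices, and all names, are assumptions.

```python
def compare_scan_line(pixels):
    """Cross-compare consecutive pixels along a single (horizontal) coordinate.

    Returns one tuple per comparison: (x, p, d, m), where x is the explicit
    coordinate, d the difference, and m the match (assumed: smaller comparand).
    """
    derivatives = []
    for x in range(1, len(pixels)):
        p, prior = pixels[x], pixels[x - 1]
        d = p - prior
        m = min(p, prior)
        derivatives.append((x, p, d, m))
    return derivatives

def form_segments(derivatives, ave):
    """Group contiguous comparisons with the same sign of (m - ave).

    Each segment keeps its start coordinate x0, length L, and summed M and D.
    """
    segments = []
    for x, p, d, m in derivatives:
        sign = m > ave
        if segments and segments[-1]["sign"] == sign:
            seg = segments[-1]
            seg["L"] += 1
            seg["M"] += m
            seg["D"] += d
        else:
            segments.append({"sign": sign, "x0": x, "L": 1, "M": m, "D": d})
    return segments

line = [10, 12, 11, 3, 2, 4, 9]
for seg in form_segments(compare_scan_line(line), ave=5):
    print(seg)
# {'sign': True, 'x0': 1, 'L': 2, 'M': 21, 'D': 1}
# {'sign': False, 'x0': 3, 'L': 4, 'M': 11, 'D': -2}
```

In this sketch only the positive-match segment would be a candidate for the more expensive comparison across scan lines, since its summed match is what has to justify the cost of a second coordinate.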
Dimensions are added in the order of decreasing rate of change. This means
spatial dimensions are scanned first: their rate of change can be sped up by
moving sensors. Comparison over purely temporal sequence is delayed until
accumulated change / variation justifies search for additional patterns.
Temporal sequence is the original dimension, but it is mapped on spatial
dimensions until spatial continuum is exhausted. Dimensionality represented by
patterns is increasing on higher levels, but each level is a 1D queue of
patterns.
Also independently discoverable are derived coordinates: any variable with
cumulative match that correlates with combined cumulative match of all other
variables in a pattern. Such correlation makes a variable useful for sequencing
patterns before cross-comparison.
It is discovered by summing matches for same-type variables between input
patterns, then cross-comparing summed matches between all variables of a
pattern. The variable with the highest resulting match of match (mm) is a
candidate coordinate. That mm is then compared to mm of the current coordinate.
If the difference is greater than the cost of reordering future inputs,
sequencing feedback is sent to lower levels or sensors.
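The procedure above can be sketched as follows. This is one possible reading, not the actual implementation: it assumes positive-valued variables, match as the smaller of two comparands, and mm of a variable as the match between its summed match and the combined summed match of all other variables. The variable names and `reorder_cost` are hypothetical.

```python
def summed_matches(patterns):
    """Sum match per same-type variable over consecutive input patterns."""
    totals = {name: 0 for name in patterns[0]}
    for prior, current in zip(patterns, patterns[1:]):
        for name in totals:
            totals[name] += min(prior[name], current[name])  # assumed match
    return totals

def match_of_match(totals):
    """mm per variable: match between its summed match and that of all others."""
    combined = sum(totals.values())
    return {name: min(m, combined - m) for name, m in totals.items()}

def should_resequence(mm, candidate, current, reorder_cost):
    """Send sequencing feedback only if the gain in mm exceeds reordering cost."""
    return mm[candidate] - mm[current] > reorder_cost

patterns = [
    {"x": 1, "brightness": 8, "size": 5},
    {"x": 2, "brightness": 9, "size": 4},
    {"x": 3, "brightness": 7, "size": 6},
]
totals = summed_matches(patterns)   # {'x': 3, 'brightness': 15, 'size': 8}
mm = match_of_match(totals)         # {'x': 3, 'brightness': 11, 'size': 8}
candidate = max(mm, key=mm.get)     # 'brightness'
print(candidate, should_resequence(mm, candidate, "x", reorder_cost=5))
# brightness True
```

With a higher `reorder_cost`, say 10, the same candidate would not justify re-sequencing, and the current coordinate would be kept.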
Another type of empirically distinct variables is different sensory modalities:
colors, sound and pitch, and so on, including artificial senses. Each modality
is processed separately, up to a level where match between patterns of different
modalities but same scope exceeds match between unimodal patterns across
increased distance. Subsequent search will form multimodal patterns within
common ST frame of reference.
As with external dimensions, difference between modalities can be predefined or
discovered. If the latter, inputs of different modalities are initially mixed,
then segregated by feedback. Also as with dimensions, my core algorithm only
assumes single-modal inputs; predefining multiple modalities would be an add-on.
7. Notes on working mindset and awards for contributions
My terminology is as general as the subject itself. It’s a major confounder:
people crave context, but generalization is decontextualization. And cognitive
algorithm is a meta-generalization: the only thing in common for everything we
learn. This introduction is very compressed, partly because much of the work is
in progress. But I think it also reflects and cultivates the ruthlessly
reductionist mindset required for such a subject.
My math is very simple, because algorithmic complexity must be incremental.
Advanced math can accelerate learning on higher levels of generalization, but is
too expensive for initial levels. And a minimal general learning algorithm must
be able to discover computational shortcuts (AKA math) on its own, just like we
do. Complex math is definitely not innate in humans on any level: cavemen didn’t
do calculus.
This theory may seem too speculative, but any degree of generalization must be
correspondingly lossy, which is contrary to the precision-oriented culture of
math and computer science. Hence, current Machine Learning is mostly
experimental, and the progress on the algorithmic side is glacial. A handful of
people aspire to work on AGI, but they either lack or neglect a functional
definition of intelligence, so their theories are only vague inspiration.
I think working on this level
demands greater delay of experimental verification than is acceptable in any
established field. Except for philosophy, which has nothing else real to study.
But established philosophers have always been dysfunctional fluffers, not
surprisingly as their only paying customers are college freshmen.
Our main challenge in formalizing GI is a species-wide ADHD. We didn’t evolve
for sustained focus on this level of generalization: that would have caused
extinction long before any tangible results. This is no longer a risk: GI is the
most important problem conceivable, and we have plenty of computing power for
anything better than brute-force algorithms. But our psychology lags a
light-year behind technology: we still hobble on mental crutches of irrelevant
authority and peer support, flawed analogies and needless experimentation.
Awards for contributions
I offer prizes up to a
total of $500K for debugging, optimizing and extending this algorithm: github.
Contributions must fit into the incremental-complexity hierarchy outlined here.
Unless you find a flaw in my reasoning, which would be even more valuable. I can
also pay monthly, but there must be a track record.
Winners will have an option
to convert the awards into an interest in all commercial applications of a
final algorithm, at the rate of $10K per 1% share. This option is informal and
likely irrelevant, mine is not a commercial enterprise. Money can’t be primary
motivation here, but it saves time.
Awards so far:
2010: Todor Arnaudov, $600 for suggestion to buffer old inputs after search.
2011: Todor, $400 consolation prize for understanding some ideas that were not
clearly explained here.
2014: Dan He, $600 for pushing me to be more specific and to compare my
algorithm with others.
2016: Todor Arnaudov, $500 for multiple suggestions on implementing the
algorithm, as well as for the effort.
Kieran Greer, $375 for an attempt to implement my level 1 pseudo code in C#
2017:
Alexander Loschilov, $2800 for help in converting my level 1 pseudo code into
Python, consulting on PyCharm and SciPy, and for insistence on 2D clustering,
February-April.
Todor Arnaudov, $2000 for help in optimizing level_1_2D, June-July.
Kapil Kashyap, $2000 for stimulation and effort, help with Python and
level_1_2D, September-October
2018:
Todor Arnaudov, $1000 mostly for effort and stimulation, January-February
Andrei Demchenko, $1800 for conventional refactoring in
line_POC_introductory.py, interface improvement and a few improvements in the
code, April-May.
Todor Arnaudov, $2000 for help in debugging frame_dblobs.py, September-October.
Khanh Nguyen, $2700, for getting line_POC to work.
2019:
Stephan Verbeeck, $2000 for getting me to return to using minimally-coarse
gradient and his perspective on colors and line tracing, January-June
Todor Arnaudov, $1600, frequent participant, March-June
Kok Wei Chee, $900, for diagrams of line_POC and frame_blobs, December
Khanh Nguyen, $10100, lead debugger and co-designer, January-December
2020:
Mayukh Sarkar, $600 for frame_blobs performance analysis and porting form_P to
C++, January
Maria Parshakova, $1600, team developer, March-May
Khanh Nguyen, $8100, team developer
Kok Wei Chee, $14200, team developer
2021:
Many thanks to Chris Sun for his efforts to find collaborators!
Kok Wei Chee, $22000, lead developer, January-December
Khanh Nguyen, $5000, team developer, April-October
Alex Pitertsev, $1000: mostly visualization via dfs, July-August
Kelvin Spacey, $1840: port to dataframes, various, May-July
Yura Guruel, $1000: various, May-July
Aqib Mumtaz and Ayesha Ali, $1400: audio interfacing for 1D alg, April-May
2022:
Kok Wei Chee, $25300: lead developer, January-December
Alex Pitertsev, $2000 for porting line_comp to Julia, June