Revision as of 00:11, 7 May 2017

Thoughts

Philip Johnson-Laird is an academic who sits at the intersection of philosophy and psychology. He studies cognition and the inner workings of the brain. My first exposure to his work came through his book "Mental Models," which I used when writing my dissertation to help articulate what, exactly, a model is, and to understand what models can and cannot do.

This book is particularly apt, given the recent resurgence in machine learning and artificial intelligence. When the book was originally published in 1988, the idea of a neural network was still undergoing development, and many foundational ideas are discussed here. The book is not written in the manner of a computer scientist teaching how to do X in Y, nor does it assume the reader can follow graduate-level linear algebra concepts. Rather, it reads like a cognitive scientist carefully devising an experiment to uncover the mechanisms of the brain.

The book is organized in six parts, each focusing on an aspect of how our brains work and how it can be replicated through computation.

Part 1, Computation and the Mind, starts by talking about the concept of computability, what it means to compute something, and how we might replicate some of the computing functions of the brain. It answers some basic questions that any non-expert would have, like how do you study the mind?

The remaining parts each focus on a particular aspect of our mental machinery:

Part 2: Vision

Part 3: Learning, Memory, and Action

Part 4: Cogitation

Part 5: Communication

Part 6: The Conscious and Unconscious Mind

Notes

Part 1: Computation and the Mind

Since Descartes, theorists have assumed that there is no problem in understanding how machines work. Indeed, Lord Kelvin, the eminent Victorian physicist, even turned this argument around, and wrote in a letter to a colleague: "I can never satisfy myself until I can make a mechanical model of a thing. If I can make a mechanical model I can understand it. As long as I cannot make a mechanical model all the way through I cannot understand."
p. 24

On the Meaning of Symbols

Any system of external symbols, such as numerals or an alphabet, is capable of symbolizing many different domains. Thus, the binary numeral 1100 can stand for many things. It may stand for the number twelve, for the letter Z as in Morse code, or for a particular person, artifact, 3D shape, region of the earth's surface, or many other entities. Numerals are potent because they are each distinct from one another, and there is a simple structural recipe for constructing an unlimited supply of them.
Even if a domain contains a potentially infinite number of entities, then a numerical system can be used to symbolize it provided that there is some way to relate the numerals to what they signify. The simplest link is an arbitrary pairing of each symbol to one referent, and each referent to one symbol, as in a numerical code for the rooms of a hotel. A symbol may be well formed, e.g., the Roman numeral XIII, but fail to designate anything (no room with number 13). Rather than arbitrary pairings, it is usually convenient to have some principles for assigning interpretations to symbols. These principles may be a matter of rules, conventions or habits. If symbols are assembled out of primitives according to structural rules, then the structure of the symbol may, or may not, be relevant to its interpretation. A Roman numeral has a structure that is relevant to its interpretation as a number. A pile of sand in an hourglass has a structure that is not relevant to its interpretation as an interval of time - only the volume of sand matters.
p. 31-32
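
To make the point about symbols concrete, here is a minimal Python sketch (my own illustration, not from the book) showing the same string 1100 under three interpretations. The convention of reading 1 as a dash and 0 as a dot, the partial Morse table, and the hotel room table are all assumptions made up for the example.

```python
# One symbol, several interpretations: the meaning lives in the mapping,
# not in the symbol itself.
symbol = "1100"

# Interpretation 1: a binary numeral. Here the structure of the symbol
# (which digit sits in which place) is relevant to its meaning.
as_number = int(symbol, 2)

# Interpretation 2: Morse code, assuming 1 means dash and 0 means dot.
# Only a small illustrative fragment of the Morse table is included.
MORSE_FRAGMENT = {"--..": "Z", "...": "S", "---": "O"}
as_morse = symbol.replace("1", "-").replace("0", ".")
as_letter = MORSE_FRAGMENT.get(as_morse)

# Interpretation 3: an arbitrary pairing, like room codes in a
# hypothetical hotel that has no room 13.
ROOMS = {"1011": "room 11", "1100": "room 12", "1110": "room 14"}
room = ROOMS.get(symbol)
missing = ROOMS.get("1101")  # well formed (thirteen) but designates nothing

print(as_number, as_letter, room, missing)
```

The lookup returning None for 1101 mirrors the well-formed symbol that fails to designate anything.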

Computability and Mental Processes

Computers work in a very different way from Turing machines: their memories are not just one-dimensional tapes, and they have a much richer set of basic operations. But a computer program is analogous to a particular Turing machine, and the computer is analogous to a universal machine because it can execute any program that is written in an appropriate code. Anything that can be computed by a digital computer can be computed by a Turing machine.
Not everything, however, can be computed. There are many problems that can be stated but that have no computable solution. It is impossible, for example, to design a universal machine that determines whether any arbitrarily selected Turing machine, given some arbitrarily selected data, will come to a halt or go on computing for ever. Hence, there is no test guaranteed to decide whether or not a problem has a computable solution.
p. 51
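
As a rough illustration of the passage above, here is a small one-tape Turing machine simulator in Python. It is a sketch of my own, not from the book: the rule format, the state names, and the two example machines are all assumptions. The simulator plays the role of the universal machine (it runs any rule table it is handed), and the step budget shows the practical side of the halting problem: when the budget runs out, we simply do not know whether the machine would have halted.

```python
def run_turing_machine(rules, tape, state="start", max_steps=1000):
    """Run a one-tape Turing machine described by `rules`.

    rules maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is -1 (left) or +1 (right). The machine halts when no
    rule applies. Returns (halted, tape_contents).
    """
    cells = dict(enumerate(tape))  # sparse tape; unvisited cells are blank "_"
    head = 0
    for _ in range(max_steps):
        key = (state, cells.get(head, "_"))
        if key not in rules:
            break  # no applicable rule: the machine halts
        write, move, state = rules[key]
        cells[head] = write
        head += move
    else:
        # Step budget exhausted. This is a practical cutoff, not a decision
        # about whether the machine would eventually halt.
        return False, None
    if not cells:
        return True, ""
    lo, hi = min(cells), max(cells)
    return True, "".join(cells.get(i, "_") for i in range(lo, hi + 1)).strip("_")


# Example machine: add one to a binary numeral.
INCREMENT = {
    ("start", "0"): ("0", +1, "start"),   # scan right to the end of the numeral
    ("start", "1"): ("1", +1, "start"),
    ("start", "_"): ("_", -1, "carry"),   # step back onto the last digit
    ("carry", "1"): ("0", -1, "carry"),   # 1 + carry = 0, keep carrying
    ("carry", "0"): ("1", -1, "done"),    # 0 + carry = 1, finished
    ("carry", "_"): ("1", -1, "done"),    # ran off the left edge: new digit
}

# Example machine that never halts: it walks right over blanks forever.
LOOP = {("start", "_"): ("_", +1, "start")}

print(run_turing_machine(INCREMENT, "1011"))
print(run_turing_machine(LOOP, "", max_steps=50))
```

Raising max_steps never turns the second answer into a proof of non-termination. It only postpones giving up.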

There are three morals to be drawn for cognitive science.
First, since there is an infinity of different programs for carrying out any computable task, observations of human performance can never eliminate all but the correct theory...
Second, if a theory of mental processes turns out to be equivalent in power to a universal machine, then it will be difficult to refute.
Third, theories of the mind should be expressed in a form that can be modeled in a computer program. A theory may fail to satisfy this criterion for several reasons: it may be radically incomplete; it may rely on a process that is not computable; it may be inconsistent, incoherent, or, like a mystical doctrine, take so much for granted that it is understood only by its adherents. These flaws are not always obvious. Students of the mind do not always know that they do not know what they are talking about. The surest way to find out is to try to devise a computer program that models the theory. A working computer model places a minimal reliance on intuition: the theory it embodies may be false, but at least it is coherent, and does not assume too much. Computer programs model the interactions of fundamental particles, the mechanisms of molecular biology, and the economy of the country. The rest of the book is devoted to computable theories of the human mind.
p. 52

Part 2: Vision

The Visual Image

Consider three beliefs about vision:

The eye is like a television camera - you point it at a scene, it registers the scene, and it projects the image inside your head.
Vision is impossible. Different arrangements of things can produce the same image, so the brain does not know what particular arrangement you are looking at.
Vision is easy for the brain to do, but hard for us to understand.

Three different levels of explanation are needed:

Theory of what is computed
Theory of how the system carries out computations
Theory of underlying neurophysiology (the "hardware")

Three stages of vision:

Vision stage 1: grayscale images (brightness value for pixels)
Vision stage 2: changes in intensity (gradients between pixels)
Vision stage 3: the primal sketch

Locating Gradients in Intensity

The gray-level array contains a certain amount of noise - random fluctuations. How do we differentiate between small-scale changes and large-scale changes?

In order to get a sensible measure of where the gradient undergoes major, significant changes, we need to apply a filter, or a spatial average.

A simple technique for reducing noise is to replace each value in the array by its local average - that is, to apply a 2D filter that smooths changes in the intensity.

Let's talk more about this filtering concept.

A crude local averaging stencil would just be an evenly weighted average of neighboring points. Here $x_i$ denotes the sampled value at point $i$ and $\Delta x$ the width of the three-point window, so the expression below approximates the integral over the window, and dividing it by $\Delta x$ gives the local average itself:

$\frac{\Delta x}{3} \left( x_{i-1} + x_{i} + x_{i+1} \right)$

More generally, the left-hand rule, right-hand rule, and midpoint rule approximate the function between two points as a constant (1 unknown), which requires 1 point.

The trapezoid rule approximates the function between two points as a line (2 unknowns, slope and intercept), which requires 2 points.

Increasingly better stencils come from rules like Simpson's Rule, which approximates the function over an interval with a quadratic (3 unknowns, 3 coefficients) and requires 3 points:

$\frac{\Delta x}{2} \left( \frac{1}{3} x_{i-1} + \frac{4}{3} x_{i} + \frac{1}{3} x_{i+1} \right)$
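
A minimal Python sketch of the two three-point stencils, treating them as averaging weights. Dividing each rule by the window width $\Delta x$ leaves weights of (1/3, 1/3, 1/3) for the even average and (1/6, 4/6, 1/6) for the Simpson-style average. The function name, the choice to leave endpoints untouched, and the sample data are my own assumptions.

```python
def apply_stencil(values, weights):
    """Weighted local average with a 3-point stencil.

    Endpoints are left unchanged because they lack a neighbor on one side.
    """
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = sum(w * v for w, v in zip(weights, values[i - 1:i + 2]))
    return out


BOX = (1 / 3, 1 / 3, 1 / 3)      # even weighting
SIMPSON = (1 / 6, 4 / 6, 1 / 6)  # Simpson-style weighting, favors the center

# Made-up intensity profile: a noisy dark region, then a noisy bright region.
noisy = [10, 12, 9, 11, 10, 30, 29, 31, 30, 28]

print(apply_stencil(noisy, BOX))
print(apply_stencil(noisy, SIMPSON))
```

Both sets of weights sum to 1, so flat regions and straight ramps pass through unchanged while point-to-point jitter is reduced.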

Applying a filter and removing local irregularities reveals large scale changes. (Another way to think about this: the SPECTRAL content of the image shifts to being larger scale changes.)

To generalize this idea, we can apply an averaging operator that uses a particular weighting function, the Gaussian (normal) distribution.
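
Here is a small sketch of Gaussian-weighted averaging in one dimension. The truncation at roughly three standard deviations and the clamping of indices at the array edges are assumptions of mine for the sake of a short example, not something the book prescribes.

```python
import math


def gaussian_kernel(sigma, radius=None):
    """Discrete Gaussian weights, normalized so they sum to 1."""
    if radius is None:
        radius = max(1, int(3 * sigma + 0.5))  # cover roughly +/- 3 sigma
    raw = [math.exp(-(k * k) / (2 * sigma * sigma))
           for k in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def smooth(values, kernel):
    """Gaussian-weighted average at each point; edge indices are clamped."""
    r = len(kernel) // 2
    n = len(values)
    return [
        sum(kernel[j + r] * values[min(max(i + j, 0), n - 1)]
            for j in range(-r, r + 1))
        for i in range(n)
    ]


kernel = gaussian_kernel(sigma=1.0)
spike = [0, 0, 0, 9, 0, 0, 0]
print([round(v, 3) for v in smooth(spike, kernel)])
```

A wider sigma spreads the weight over more neighbors, so the same spike is flattened further.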

Once the averaging operator is applied, how do we detect intensity boundaries? A simple way to measure the steepness of the gradient is to multiply the left value by -1, multiply the right value by +1, and sum the results. If there is no gradient, the sum is 0. If there is a gradient, the sum is nonzero: its sign gives the direction of the change and its magnitude gives the steepness.
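
The (-1, +1) operator is short enough to write out directly. This is a sketch with made-up sample arrays:

```python
def first_difference(values):
    """Apply the (-1, +1) operator to each adjacent pair of values."""
    return [-1 * left + 1 * right for left, right in zip(values, values[1:])]


flat = [5, 5, 5, 5]        # no gradient anywhere
ramp = [0, 2, 4, 6]        # constant gradient
edge = [1, 1, 1, 9, 9, 9]  # abrupt boundary between two regions

print(first_difference(flat))
print(first_difference(ramp))
print(first_difference(edge))
```

The flat array gives all zeros, the ramp gives a constant value, and the edge gives a single spike at the boundary.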

To explore this further, intensity boundaries can be found by calculating the gradient of the gradient and finding where it crosses zero (corresponding to a location where the gradient itself peaks, that is, where the intensity changes most steeply). A zero-crossing is a strong indicator of a boundary between regions of different intensities.
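
A sketch of the gradient-of-the-gradient idea in one dimension, using the standard three-point second difference. The blurred-edge data is made up, and this simple sign test deliberately ignores the case where the second difference is exactly zero at a sample.

```python
def second_difference(values):
    """Discrete second derivative: the (+1, -2, +1) operator."""
    return [values[i - 1] - 2 * values[i] + values[i + 1]
            for i in range(1, len(values) - 1)]


def zero_crossings(values):
    """Indices i where the sign changes between values[i] and values[i + 1]."""
    return [i for i in range(len(values) - 1)
            if values[i] * values[i + 1] < 0]


# A blurred edge: intensity rises from 0 to 10, most steeply in the middle.
blurred_edge = [0, 0, 1, 3, 7, 9, 10, 10]

curvature = second_difference(blurred_edge)
print(curvature)
print(zero_crossings(curvature))
```

The sign change sits between the samples with values 3 and 7, which is exactly where the first difference is largest.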

It is possible to combine the two operations, local averaging and finding changes in the gradient, into a single operation in 2D. The result is the Mexican Hat filter. The Laplacian of the Gaussian is nearly equivalent, and is intended to work for an arbitrary number of dimensions.

Each value in the gray-level array is averaged with its neighbors using the Mexican Hat filter. Weights are positive for very near neighbors (values of near-neighbor points are weighted more heavily), and are negative for distant neighbors. When the Mexican Hat filter is applied to a gray-level image, the result is a set of positive and negative values, from which a zero-crossings map can be drawn.

The scale of the filtering can be tuned by adjusting the width of the Mexican Hat filter. A larger hat extending over many elements reveals gradual changes in intensity over larger areas. It may be useful to apply several filter sizes and obtain a zero-crossings map for each.

NOTE: The gradient of the intensity is equivalent to the first spatial derivative, while the change in that gradient (the gradient of the gradient) is equivalent to the second derivative. The second derivative can be applied in two dimensions isotropically (equally weighted in all directions away from the pixel). This is the Laplacian operator.

The Mexican Hat function combines the Gaussian (normal) distribution, which smooths the data (importance/weight decreases with distance), with the Laplacian, which is applied to that Gaussian. So, if you see a reference to the LoG (Laplacian of Gaussian), it's equivalent to the Mexican Hat function, up to sign and scaling.
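
For reference, the 2D Mexican Hat can be written as the negated Laplacian of a Gaussian, proportional to $\left(1 - \frac{r^2}{2\sigma^2}\right) e^{-r^2 / 2\sigma^2}$ with $r^2 = x^2 + y^2$. The sketch below builds a small discrete version and applies it to a synthetic image with one vertical edge. The kernel radius, the mean subtraction (so that the discrete weights sum to zero), the decision to filter only where the kernel fits entirely inside the image, and the test image are all my own assumptions.

```python
import math


def mexican_hat_kernel(sigma, radius):
    """2D Mexican Hat weights: the negated Laplacian of a Gaussian."""
    size = 2 * radius + 1
    raw = [
        [
            (1 - (x * x + y * y) / (2 * sigma ** 2))
            * math.exp(-(x * x + y * y) / (2 * sigma ** 2))
            / (math.pi * sigma ** 4)
            for x in range(-radius, radius + 1)
        ]
        for y in range(-radius, radius + 1)
    ]
    # Subtract the mean so the truncated weights sum to zero and a
    # perfectly flat image produces no response.
    mean = sum(sum(row) for row in raw) / (size * size)
    return [[w - mean for w in row] for row in raw]


def filter_image(image, kernel):
    """Weighted sum at every pixel where the kernel fits inside the image."""
    r = len(kernel) // 2
    height, width = len(image), len(image[0])
    return [
        [
            sum(kernel[dy + r][dx + r] * image[y + dy][x + dx]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1))
            for x in range(r, width - r)
        ]
        for y in range(r, height - r)
    ]


def row_zero_crossings(row):
    """Indices i where the response changes sign between i and i + 1."""
    return [i for i in range(len(row) - 1) if row[i] * row[i + 1] < 0]


# A 9 x 12 image with a vertical edge: dark (0) on the left, bright (10) on the right.
image = [[0] * 6 + [10] * 6 for _ in range(9)]
kernel = mexican_hat_kernel(sigma=1.0, radius=3)
response = filter_image(image, kernel)
crossings = [row_zero_crossings(row) for row in response]
print(crossings)
```

Each row of the response is negative on the dark side of the edge and positive on the bright side, so the zero-crossings map marks a single vertical line at the boundary.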

Neurophysiology of Vision

Trying to understand vision by studying only nerve cells, as Marr remarked, is like trying to understand bird flight by studying only feathers.
p. 72

A brief review of eye physiology:

The pupil is the black part of the eye, through which light enters the eye. In camera terminology, this is equivalent to the eye's aperture.
The iris is the colored portion that surrounds the pupil. It controls the size of the pupil and how much light enters. It is equivalent to the diaphragm (f-stop) controlling the aperture. The pigmentation absorbs light and prevents excess light from reaching the retina, essentially making the eye more efficient.
The retina is the back of the eye, where incoming light lands and is received by nerve cells. This light is converted into electrical and chemical signals that are forwarded on to the brain.

The retina consists of layers of cells that line the inside of the eye, including photoreceptor cells, which detect light, and ganglion cells, which relay signals on toward the brain.

Some ganglion cells are excited by light that falls directly on them, and inhibited by light that falls on the cells that surround them. Other ganglion cells are inhibited by light that falls directly on the center of the cell, and excited by light that falls on neighboring cells. This mechanism provides the necessary signal addition and subtraction to apply a Mexican Hat filter biologically. Cells that are excited by direct signals are the additive portion of the filter, while cells that are inhibited by direct signals are the negative values further away. The cells normally fire at a specific frequency; when they are excited they fire at a faster rate, and when inhibited they fire at a slower rate. The zero-crossings (where the second derivative crosses zero/changes sign), which correspond to the locations of edges, are linked to locations where these two sorts of ganglion cells have equal activity.

Neurophysiologists David Hubel and Torsten Wiesel studied mechanisms of perception and found cells in the visual cortex that are excited by bright lines or bars at a particular orientation (Marr's theory suggests these correspond to zero-crossings).

Third Stage of Vision: Primal Sketch

The eye is applying a filter equivalent to the Laplacian of the Gaussian, but with structures of ganglionic nerve cells each applying different size filters. Thin bars and details that are far away may give two zero-crossings when applying a small filter but be blurred together by a larger filter. The brain would thus find it useful to be able to compare the results of filters of different sizes - when zero-crossings are detected across multiple filter sizes, it is a "real" result.
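
The comparison across filter sizes can be sketched as a small matching step. This is my own illustration, not the book's algorithm: it assumes each filter size has already produced a list of zero-crossing positions, and it keeps a crossing from the finest map only if every coarser map has a crossing within a small tolerance.

```python
def confirmed_crossings(crossing_maps, tolerance=1):
    """Keep crossings from the finest map that reappear in every coarser map.

    crossing_maps is a list of position lists, finest filter first.
    Two crossings match when their positions differ by at most `tolerance`.
    """
    finest, *coarser = crossing_maps
    return [
        p for p in finest
        if all(any(abs(p - q) <= tolerance for q in other) for other in coarser)
    ]


# Hypothetical zero-crossing positions from three filter sizes.
small_filter = [4, 10, 11, 20]  # the pair at 10 and 11 is a thin bar
medium_filter = [5, 20]         # the thin bar has been blurred away
large_filter = [4, 21]

print(confirmed_crossings([small_filter, medium_filter, large_filter]))
```

The thin bar shows up only at the smallest scale, so its two crossings are dropped, while the crossings near 4 and 20 survive at every scale.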

Marr believes the zero-crossings are the key, while others (Roger Watt and Michael Morgan) believe it is the peaks and troughs.

Breaking down the visual perception of the world into a map of bars, edges, and blobs is how the macro-scale image of the world can be represented (the so-called "primal sketch"). However, the mechanisms by which the brain forms these are difficult to study.

Usually, focusing on the primal sketch and ignoring details will lead to a loss of information. However, you can also gain information. Example: checkerboard image of Lincoln (image by Leon D. Harmon).

Cost of Visual Processing

Major challenges of visual processing with onboard computers: computers have far fewer interconnections (electronic nerves) than biological systems, so slower bus speeds and bandwidth. Crucial to work fast enough for the task at hand - e.g., self-driving car can't take two seconds to process an image.

The major computational cost is filtering the gray-level array. For a 1000 x 1000 array, the filter must be applied to every pixel, for every filter size. Specialty hardware can help, but the cost is still significant.
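
A back-of-the-envelope count makes the cost concrete. The sketch assumes direct (non-separable) filtering, where each output pixel needs one multiply-add per kernel weight, and the three kernel widths are hypothetical values chosen for illustration.

```python
def filtering_cost(width, height, kernel_widths):
    """Multiply-add count for direct 2D filtering at several kernel sizes."""
    return sum(width * height * k * k for k in kernel_widths)


# Hypothetical filter sizes: 7 x 7, 15 x 15, and 31 x 31 kernels.
operations = filtering_cost(1000, 1000, [7, 15, 31])
print(f"{operations:,} multiply-adds per image")
```

Under these assumptions a single frame costs over a billion multiply-adds, and the largest filter accounts for most of it.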

Workarounds include limiting vision of the world to a primal sketch - much like the housefly, which does not need to perform 3D extrapolation from 2D images, everything boils down to a set of algorithms. Landing algorithm: if visual field expands at high speed, fly turns feet toward center of expanding plane, and stops flying when its feet hit the surface. Mate tracking: find small black patch moving against a background. Left and right wing power governed by patch position in visual field and by angular velocity, so fly keeps the target centered in its visual field and flies toward it.
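
The mate-tracking rule can be sketched as a simple steering law. Everything specific here is an assumption of mine: the sign convention (positive angles mean the patch is to the right), the linear form, and the gain values are hypothetical and only illustrate how patch position and angular velocity could jointly set the power of the two wings.

```python
def wing_power(patch_angle, patch_angular_velocity,
               base=1.0, k_angle=0.5, k_velocity=0.1):
    """Return (left, right) wing power for a tracked patch.

    patch_angle: offset of the patch from the center of the visual field,
        in radians, positive to the right.
    patch_angular_velocity: how fast that offset is changing, in radians
        per second.
    """
    turn = k_angle * patch_angle + k_velocity * patch_angular_velocity
    # More power on the left wing turns the fly to the right, toward the patch.
    return base + turn, base - turn


print(wing_power(0.0, 0.0))   # patch centered and still: fly straight
print(wing_power(0.4, 0.0))   # patch to the right: turn right
print(wing_power(-0.4, 0.0))  # patch to the left: turn left
```

No 3D model of the scene is involved: the rule maps two numbers read off the image straight to motor output, which is why it is cheap.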

For a fly, vision is, in fact, impossible. But the mechanism is tuned to work for specific scenarios with limited information. Thus there are many tasks a fly cannot accomplish.

References

Harmon, L. D. The recognition of faces. Scientific American, November 1973, p 75.

Mayhew Frisby 1984 (technical account of computer vision)

Watt 1988 (advanced monograph on the initial stages of human vision)

Marroquin, J. L. "Human visual perception of structure." Master's thesis, Dept of EE and CS, MIT, 1976.

Flags

{{ReadingFlag}}
[[Category:ML]]
[[Category:NN]]

The Computer and the Mind

From charlesreid1