Explanation of “multi-layer backpropagation Neural network called a Convolution Neural Network”

In my last post I said:

The technology behind the ATMs was developed by LeCun and others almost 10 years ago, at AT&T Bell Labs… The algorithm they developed goes under the name LeNet, and is a multi-layer backpropagation Neural network called a Convolution Neural Network.  I will explain this terminology in my next post.

ANNs are mathematical models composed of interconnected nodes. Each node is a very simple processor: it collects input, either from outside the system or from other nodes, performs a simple calculation (for example, summing the inputs and comparing the sum to a threshold value) and produces an output.
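To make this concrete, here is a minimal sketch of such a node in Python (the function name and numbers are my own illustration, not from LeNet): it simply sums its inputs and fires if the sum exceeds a threshold.

```python
def node_output(inputs, threshold):
    """Return 1 if the summed input exceeds the threshold, else 0."""
    total = sum(inputs)
    return 1 if total > threshold else 0

print(node_output([0.4, 0.7], threshold=1.0))  # -> 1 (sum 1.1 exceeds 1.0)
print(node_output([0.2, 0.3], threshold=1.0))  # -> 0 (sum 0.5 does not)
```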

ANNs are usually composed of several layers. There are a number of input nodes (imagine one node for each input). Similarly, there is an output layer. Between these two there can be one or more ‘hidden’ layers, so called because they only interact with other nodes and are therefore invisible outside the system. The addition of hidden nodes allows greater complexity in the system. Choosing the number and architecture of hidden nodes is an important consideration in the design of an ANN. The description of LeNet as a “multi-layer ANN” indicates that one or more hidden layers are used.

Layers of an Artificial Neural Network

“Backpropagation” is by far the most common type of ANN in use today. The development of the backpropagation technique was very significant and was responsible for reviving interest in ANNs. After the initial excitement over Rosenblatt’s development of the Perceptron (in 1957), which some people (briefly) believed was an algorithmic panacea, researchers hit a brick wall due to the limitations of the perceptron. Minsky and Papert published one of the most important papers in the field (in 1969), which proved this limitation and drove the proverbial nail into the coffin. Work and interest in ANNs practically vanished. In the 1980s, ANNs were revived by the work on backpropagation techniques by Rumelhart and others.

Which doesn’t really answer the question: what is backpropagation, and how did it overcome these limitations? The question deserves a dedicated discussion, but imagine the flow of information through a multi-layer neural network such as the one in the picture above. We start at the input layer, where external input enters the system. The input nodes pass this along to the hidden nodes. Each hidden node sums the input from several input nodes and compares the sum to some threshold value. Two refinements: first, the sum is in fact a weighted sum, where the weight we assign to each connection determines how significant that input’s contribution will be; second, rather than comparing to a hard threshold, we apply a mathematical function such as the logistic function, so that we don’t have to work with a hard-limiter function that would give us a less useful yes/no answer. The result from each hidden node is passed along to one or more output nodes (unless there is another hidden layer). The output nodes follow the same process and then pass along their output, which leaves the system as the final output. Information has moved forward through our network.
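The forward flow just described can be sketched in a few lines of Python. This is my own toy illustration, not LeNet’s actual implementation: every weight, bias, and layer size below is made up, and the logistic function stands in for the ‘soft threshold’ comparison.

```python
import math

def logistic(x):
    # A smooth alternative to a hard yes/no threshold
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_biases):
    # Hidden layer: each node computes a weighted sum of the inputs,
    # then squashes it with the logistic function.
    hidden = [logistic(sum(w * x for w, x in zip(ws, inputs)) + b)
              for ws, b in zip(hidden_weights, hidden_biases)]
    # The output layer repeats the same process on the hidden activations.
    return [logistic(sum(w * h for w, h in zip(ws, hidden)) + b)
            for ws, b in zip(output_weights, output_biases)]

# 2 inputs -> 2 hidden nodes -> 1 output node (all numbers illustrative)
out = forward([1.0, 0.0],
              hidden_weights=[[0.5, -0.4], [0.3, 0.8]], hidden_biases=[0.1, -0.2],
              output_weights=[[0.7, -0.6]], output_biases=[0.05])
print(out)  # a single value between 0 and 1
```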

This final output will hopefully be correct, but, during training, it will be wrong or not sufficiently close to the right answer. We can tell the network which of these is the case — this is called ‘supervised learning’ — by calculating the error in each of the output nodes. Now we will go backwards through the system: each of the output nodes must adjust its output, and then pass back information to the hidden layer so that each of those nodes can also adjust its output. The way in which the network adjusts its output is by changing weights and threshold values. The exact method used to decide how much to adjust these values is clearly very important, but the general principle we have employed is the backwards propagation of errors — this is the revolutionary ‘backpropagation’ technique.
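The weight-adjustment idea can be illustrated for a single output node. This is only a sketch of gradient descent on the squared error (one common way to decide ‘how much to adjust’, not necessarily the exact rule LeNet used), and all the starting numbers are made up.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# One output node's weights, nudged repeatedly toward the right answer.
weights = [0.5, -0.3]
bias = 0.1
inputs = [1.0, 0.5]   # activations arriving from the hidden layer
target = 1.0          # the 'right answer' supplied during supervised learning
rate = 0.5            # learning rate: how aggressively we adjust

for _ in range(1000):
    out = logistic(sum(w * x for w, x in zip(weights, inputs)) + bias)
    error = target - out
    # Scale the error by the slope of the logistic function at this output
    delta = error * out * (1.0 - out)
    # Each weight moves in proportion to the input it carried
    weights = [w + rate * delta * x for w, x in zip(weights, inputs)]
    bias += rate * delta

print(round(out, 3))  # has moved close to the target of 1.0
```

In a full network the same error signal is also passed back to the hidden layer, which is where the ‘backwards propagation’ gets its name.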

Which brings us to the final term: “convolution”, which I was completely unfamiliar with. After doing some research, here’s my first attempt at an explanation — please email me or leave a comment if I have something wrong and I will make the necessary revisions.

A convolution neural network is a special architecture (arrangement of layers, nodes, and connections), commonly used in visual and auditory processing, which defines a spatial relationship between layers of nodes. Imagine we are trying to recognize objects in a picture, which we subdivide into a coarse grid and then subdivide further into progressively finer grids. We could define node connections in such a way that a single grid unit (pixel) from layer l corresponds only to a block of pixels in layer (l + 1). We use this limitation to increase efficiency. Since we are now dealing with many layers, we choose to interpret the output several times, not just at the end: the first time we look for coarse information, like edges, then for something more refined. CNN architectures are usually characterized by local receptive fields, shared weights, and spatial or temporal subsampling.
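A toy example of the local-receptive-field and shared-weight ideas, under my own simplifying assumptions: a single 3×3 filter (the shared weights, here a made-up vertical-edge detector) slides over a tiny image, so each output value depends only on a local patch of pixels.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image; no padding, stride 1."""
    k = len(kernel)
    out_size = len(image) - k + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(out_size)]
            for i in range(out_size)]

# A 4x4 image whose right half is bright, and a vertical-edge filter.
# The same kernel weights are reused at every position: 'shared weights'.
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 0, 1]] * 3
print(convolve2d(image, kernel))  # -> [[3, 3], [3, 3]]
```

The strong responses mark where the dark-to-bright edge falls inside each receptive field; real CNNs learn such filters rather than hand-coding them, and follow them with subsampling layers.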

[Update: check out a Matlab class for CNN implementation on the Matlab file exchange, by Mihail Sirotenko.]

Put together, LeCun tells us that LeNet is a “multi-layer backpropagation Neural network called a Convolution Neural Network”.

Return to the post about LeCun’s visual processing algorithm.

June 13, 2009. Neural Networks. 1 comment.

Digression: Space Objects

Astronomy frustrates me. I love physics, but I have little or no interest in astronomy and it really bugs me that when someone says ‘physics’ everyone seems to either think of ‘stars and planets’ or ‘photons and quarks’.

I recently listened to an episode of the ‘Science and City’ podcast from the NYAS which had a forum-style discussion on the topic titled “From Planets to Plutoids” (Mar 27, 2009). Discussed was the difficulty of defining objects in space, the history of the debate, various contemporary points of view and why (whether?) the debate is important.

It’s a mess. There are planets, planetoids, stars, asteroids, Kuiper Belt objects. They’ve been defined by their current physical properties, by their history, by their location, by their orbit, by their relationship to nearby objects. Some planetary scientists care about defining objects from one perspective (perhaps the way in which they were formed, or their composition) and fight against others, such as dynamicists, who care about creating definitions from a wholly different perspective. Attempts at definitions by organizations such as the IAU are considered by some to have failed and are, at best, reluctantly accepted by others for lack of better alternatives.

This type of problem must have been approached many times before in science. The analogy that struck me was that of attempting to categorize the elements. As natural scientists approached the ever-increasing number of known elements from the perspectives of physics and chemistry, they must have been similarly confused in their attempts to name and categorize them.

From that single podcast — I am happy to confess I am far from well-read on this subject — it seems to me that what astronomers are attempting to do is akin to categorizing the elements by their physical state at room temperature, or some similarly ‘intuitive’ but ultimately inappropriate approach. The problem seems to be approached haphazardly, ignored until a new class of objects is found and throws everything into disarray. Imagine if a physicist attempted to change the very definition of an element so that he could cement his legacy by claiming to have found one.

Why has there been no Mendeleev in the world of astronomy? Are we so far from understanding our universe that a periodic table of objects is beyond our grasp? Or are there truly no fundamental properties — the equivalent of quantum numbers — with which to make such an attempt?

June 11, 2009. Digression. Leave a comment.