Link: ‘Hello World’ genetic algorithm

I enjoyed a post today at Hidden Clause talking about an attempt to generate the string ‘hello, world!’ using a genetic algorithm.

In the first chapter of the documentation for the Watchmaker Framework Daniel W. Dyer presents an interesting genetic algorithm example: what if we wanted to evolve the string “HELLO WORLD”? … My first thought: implementing this in JGAP shouldn’t be that hard.

The setup reminds me of the game ‘Mastermind’ — we’re dealing with a fixed-length string and scoring based on how many of the correct characters occur in the string and if they are in the right position. You combine some better strings and sometimes mutate them to generate new strings, and repeat that process until, hopefully, you arrive at the target string, ‘hello, world!’.
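The loop just described might look like this in code (a minimal Python sketch of my own, not the Watchmaker or JGAP implementation; the population size, mutation rate, and truncation-selection scheme are illustrative choices):

```python
import random
import string

TARGET = "hello, world!"
ALPHABET = string.ascii_lowercase + " ,!"  # limited alphabet, as in the post

def fitness(candidate):
    # score: number of characters matching the target in the right position
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate, rate=0.05):
    # occasionally replace a character with a random one from the alphabet
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in candidate)

def crossover(a, b):
    # combine two parent strings at a random cut point
    cut = random.randrange(len(TARGET))
    return a[:cut] + b[cut:]

def evolve(pop_size=200, generations=10_000):
    pop = ["".join(random.choices(ALPHABET, k=len(TARGET)))
           for _ in range(pop_size)]
    for gen in range(generations):
        pop.sort(key=fitness, reverse=True)
        if pop[0] == TARGET:
            return gen, pop[0]
        parents = pop[: pop_size // 5]   # keep the fittest fifth as parents
        pop = [mutate(crossover(*random.sample(parents, 2)))
               for _ in range(pop_size)]
    return generations, pop[0]           # best effort if never converged
```

With a flat match/no-match fitness like this one, the search stalls for exactly the reasons discussed below.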

Did it work? Well, almost. The first result, “!ello, or!d!”, was reached after 1 million iterations on a population size of a thousand — and it used a limited alphabet containing only the letters and punctuation in the target string (and not, for example, ‘z’).

The reason the solution was not reached, the post’s author notes, is that the algorithm doesn’t have the breadth or the convergent ability to match the statistical odds involved. Recalling the old problem of monkeys on typewriters producing Shakespeare, the author notes how unlikely it is to produce the desired result.

Of course, we’re not simply monkeying at typewriters. We’re making random changes and mutations, but we’re intelligently selecting the parents. Apparently, the method first tried wasn’t intelligent enough to arrive at the desired solution within that number of iterations, and increasing iterations or population size isn’t going to be practical. So we need a better way of selecting the parents.

We could improve the fitness function. Let’s say we take the whole alphabet and the punctuation and number them (for instance: A=1 … Z=26, and space, comma, and exclamation could be 27, 28, and 29, respectively). Now, assign fitness based on the difference between the actual and target values of each character. Is this cheating? Well, no. Our new fitness function requires no a priori knowledge of the solution and will work equally well for a different target string. It provides our GA with a lot more information per epoch, so we should see much faster convergence.
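A sketch of that graded fitness function (hypothetical Python, using the numbering suggested above; lower scores are now better, with zero meaning a perfect match):

```python
# map each allowed character to a number: a=1 ... z=26, then space, comma, '!'
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,!"
VALUE = {ch: i + 1 for i, ch in enumerate(ALPHABET)}

TARGET = "hello, world!"

def graded_fitness(candidate):
    # total distance between candidate and target characters; 0 is a match
    return sum(abs(VALUE[a] - VALUE[b]) for a, b in zip(candidate, TARGET))
```

A candidate that is one letter off in one position now scores nearly as well as the target itself, so the GA has a gradient to follow instead of a flat right/wrong signal.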

(Aside: there are much better ways of numbering the letters and punctuation to encourage faster convergence, based on the frequency with which the characters are used in the English language. In fact, one could develop a system that took advantage of commonly grouped letters like ‘th’ and ‘ing’, and, if dealing with sentences, commonly grouped words… Is that cheating? Well, we must remember that the more such information we encode, the more pre-disposed or biased our output will be. Also, the point of using a genetic algorithm is to substitute computation time for programming complexity. In a contrived problem, it’s up to us where that line lies.)

There are a number of other ways to improve the fitness function, but my instinct is that this suggestion or a similar modification would be sufficient. I hope the Hidden Clause crew reports the result of another attempt.

August 13, 2009. Emergence.

Tukey’s bits

Reading a paper published in 1948, I came across the following:

“…if the base 2 is used the resulting units may be called binary digits, or more briefly bits, a word suggested by J.W. Tukey.”

Sure enough, the contraction of ‘binary digit’ was coined by an American statistician named John Wilder Tukey and was published for the first time in the very paper I was reading, “A Mathematical Theory of Communication” by C.E. Shannon. That paper is now considered one of the most important papers in the field of Information Theory.

Having read this paper and having spotted this first-ever use is, to me, a bit like discovering a celebrity in my high school yearbook. As if I can, in a way, claim a familiarity with the word that others cannot.

August 9, 2009. Digression.

Link: Big Dog Robot by larvalsubjects

http://larvalsubjects.wordpress.com/2009/07/26/big-dog-robot/

…the advantage of legged robots such as this is that they can go just about anywhere. Where robots that use wheels and treads get bogged down in mud and rocky terrain, legged robots are able to tackle just about any terrain. Just think about the billy goat, for example. Even billy goats that are a few days old are able to climb mountainous terrain with little trouble. The remarkable thing about Big Dog is just how biological its movements look. You get a sense of this when you watch it slipping on the ice in the film above, but the footage I saw yesterday was even more remarkable. There researchers kicked the robot from the side as it was walking and the manner in which it recovered was eerily organic.

The interpretation of ‘biological’ traits as a sign of superior design or intelligence is somewhat anthropocentric but completely natural. The robot in the video displays characteristics we expect to see in animal quadrupeds, which is a big part of the ‘wow’ factor of the video.

More interesting to us, in the context of algorithms, is the video in the same post describing machine learning. Any comments on that video?

July 29, 2009. General.

Predicting Gambling behavior

“Using Neural Networks to Model the Behavior and Decisions of Gamblers, in Particular, Cyber-Gamblers.” by Victor K. Y. Chan.

A system is written that utilizes a back-propagation neural network to model Texas Holdem gamblers’ behavior, based on data collected from a cyber gambling website.

This article describes the use of neural networks and an empirical data sample of, inter alia, the amounts of bets laid and the winnings/losses made in successive games by a number of cyber-gamblers to longitudinally model gamblers’ behavior and decisions as to such bet amounts and the temporal trajectory of winnings/losses. The data was collected by videoing Texas Holdem gamblers at a cyber-gambling website.

The full article is available on Scribd.

[Image: Gambler-ANNstructure-M1] The above diagram shows the structure utilized for one of the two neural networks developed for the paper. This one, dubbed “M1”, attempts to predict the bet amounts laid by each individual Texas Holdem gambler, based on winnings and losses in immediately preceding games and the gambler’s current account balance.

M1: the model for successive bet amounts, which longitudinally models and thus predicts the bet amounts in the successive game laid by each individual Texas Holdem gambler based on his/her winnings/losses in a number of immediately preceding games and his/her current gambling account balance.

Apparently, the ANN was generally speaking pretty successful, showing a mean magnitude of relative error on the order of 10^(-2). Two things caught my eye: 1) General betting trends were very well represented by the model, but short-lived (“high-frequency”, if you will) deviances in behavior were not.  The author notes that the same can be said of financial market models. 2) The same model was developed to predict six different gamblers — and achieved a high accuracy of prediction. What does this mean?  In the words of the author Victor Chan:

The influence of a gambler’s skills, strategies, and personality on his/her successive bet amounts is almost totally reflected by the pattern(s) of his/her winnings/losses in the several immediately preceding games and his/her gambling account balance.

Exclamation mark.

Article found on SciAm:  “Artificial Intelligence Predicts Gambling Behavior” via Mindhacks.

July 21, 2009. General.

Queen Problems

Can you place three white queens and five black queens on a 5×5 chessboard so that no queen of one color attacks a queen of the other? The solution is unique, except rotations and reflections.

It took me longer than I first expected to solve this problem. The solution: [image: Puzzle-5BQ-3WQ]

Since a queen attacks along rank, file, and diagonal, queens leave unattacked triangular regions between them, with a knight’s move to the closest vertex of each triangle. While that is immediately obvious to anyone familiar with chess, I was able to solve the problem by placing the white queens so as to leave the largest such triangles. Before searching for a solution in that manner, I tried other methods unsuccessfully: hit-and-miss attempts (having at first underestimated the problem); placing queens at knight-moves from each other, as one might do for the classic eight-queens problem (see below); exploiting symmetry by placing queens at, say, each corner and at the center; and so on.

How would a computer solve the problem? Well, a Wikipedia article suggests, among other methods, a genetic algorithm, which struck my fancy. So I started writing one.

After the skeleton of the algorithm was written and I started tinkering with parameters and the fitness function, I realized this problem would make a great introduction to genetic algorithms and also provide a simple platform for various discussions about the nuances of GAs.
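As an illustration of the core of such a GA (a sketch of mine, not the code I actually wrote): a candidate is a placement of three white and five black queens, and the fitness to minimize is the number of cross-color attacks, which a solution drives to zero:

```python
# A candidate: positions of 3 white and 5 black queens on a 5x5 board,
# each position a (row, col) pair.

def attacks(q1, q2):
    # queens attack along rank, file, and diagonal
    r1, c1 = q1
    r2, c2 = q2
    return r1 == r2 or c1 == c2 or abs(r1 - r2) == abs(c1 - c2)

def conflicts(whites, blacks):
    # number of (white, black) pairs attacking each other; 0 means solved
    return sum(attacks(w, b) for w in whites for b in blacks)
```

Only attacks between queens of opposite colors count, since the puzzle allows same-color queens to attack each other.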

July 17, 2009. General.

Le Cun and Backpropagation

A couple of posts ago I wrote about an interview with Yann Le Cun. I subsequently found this interesting note in Smith’s Neural Networks for Statistical Modeling:

“Backpropagation is an example of multiple invention. David Parker (1982, 1985) and Yann LeCun (1986), working independently of each other and of the Rumelhart group, published similar discoveries. But none of these workers made the first discovery of backpropagation. That honor goes, belatedly, to Paul Werbos, whose 1974 Harvard Ph.D. thesis, Beyond Regression, contains the earliest exposition of the techniques involved (Werbos 1974).

“Werbos’ 1974 discovery had gone unappreciated, but Rumelhart, Hinton, and Williams’ 1986 discovery did not. It kindled a firestorm of interest in Neural Networks.”

From Smith, 1993

I don’t know any details of Le Cun’s discovery in 1986, but I’m curious to look it up.  Note that it was published in the same year as the Rumelhart paper. Here’s the full reference:

Le Cun, Yann. 1986. Learning Processes in an Asymmetric Threshold Network. In Disordered Systems and Biological Organization, ed. E. Bienenstock, F. Fogelman Soulie, and G. Weisbuch. Berlin: Springer.

June 14, 2009. Neural Networks.

Explanation of “multi-layer backpropagation Neural network called a Convolution Neural Network”

In my last post I said:

The technology behind the ATMs was developed by LeCun and others almost 10 years ago, at AT&T Bell Labs… The algorithm they developed goes under the name LeNet, and is a multi-layer backpropagation Neural network called a Convolution Neural Network.  I will explain this terminology in my next post.

ANNs are mathematical models composed of interconnected nodes. Each node is a very simple processor: it collects input from either outside the system or from other nodes, makes a simple calculation (for example, summing the inputs and comparing the sum to a threshold value), and produces an output.

ANNs are usually composed of several layers. There are a number of input nodes — imagine one node for each input. Similarly, there is an output layer. Between these two there can be one or more ‘hidden’ layers, so called because they only interact with other nodes and are therefore invisible from outside the system. The addition of hidden nodes allows greater complexity in the system. Choosing the number and architecture of hidden nodes is an important consideration in the design of an ANN. The description of LeNet as a “multi-layer ANN” indicates that one or more hidden layers are used.

Layers of an Artificial Neural Network

“Backpropagation” is by far the most common type of ANN in use today. The development of the backpropagation technique was very significant and was responsible for reviving interest in ANNs. After the initial excitement over Rosenblatt’s development of the Perceptron (in 1957), which some people (briefly) believed was an algorithmic panacea, researchers hit a brick wall due to the limitations of the perceptron. Minsky and Papert published one of the most important works in the field (in 1969), which proved this limitation and drove the proverbial nail into the coffin. Work and interest in ANNs practically vanished. In the 1980s, ANNs were revived by the work on backpropagation techniques by Rumelhart and others.

Which doesn’t really answer the question. What is backpropagation and how did it overcome these limitations? The question deserves a dedicated discussion, but imagine the flow of information through a multi-layer neural network such as the one in the picture above. We start at the input layer, where external input enters the system. The input nodes pass this along to the hidden nodes. Each hidden node sums up the input from several input nodes and compares it to some threshold value. Two details: 1) the sum is in fact a weighted sum, where the weight assigned to each connection determines how significant that node’s contribution will be; 2) rather than a hard-limiter function, which would give us a less useful yes/no, we compare the sum to our ‘threshold’ value with a mathematical function such as the logistic function. The result from each hidden node is passed along to one or more output nodes (unless there is another hidden layer). The output nodes follow the same process and then pass along their output, which leaves the system as the final output. Information has moved forward through our network.
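The forward pass just described can be sketched in a few lines of Python (a toy example of mine; the 2-2-1 network shape and the particular weights are made up for illustration):

```python
import math

def logistic(x):
    # smooth 'threshold' comparison instead of a hard yes/no limiter
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # each node: weighted sum of its inputs, shifted by its threshold (bias),
    # then squashed through the logistic function
    return [logistic(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(inputs, network):
    # push the inputs forward through each layer in turn
    for weights, biases in network:
        inputs = layer(inputs, weights, biases)
    return inputs

# a 2-input, 2-hidden, 1-output network with arbitrary weights
net = [([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1]),   # hidden layer
       ([[1.2, -0.7]], [0.2])]                      # output layer
output = forward([1.0, 0.0], net)
```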

This final output will hopefully be correct, but, during training, it will be wrong or not sufficiently close to the right answer. We can tell the network which of these is the case — this is called ‘supervised learning’ — by calculating the error in each of the output nodes. Now we will go backwards through the system: each of the output nodes must adjust its output, and then pass back information to the hidden layer so that each of those nodes can also adjust its output. The way in which the network adjusts its output is by changing weights and threshold values. The exact method used to decide how much to adjust these values is clearly very important, but the general principle we have employed is the backwards propagation of errors — this is the revolutionary ‘backpropagation’ technique.
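Sketching that backward pass in code (a toy 2-2-1 network with logistic activations; the initial weights, random seed, and learning rate are arbitrary choices of mine for illustration, not anything from the sources above):

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# a tiny 2-input, 2-hidden, 1-output network
random.seed(0)
w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # hidden weights
b_h = [0.0, 0.0]                                                     # hidden thresholds
w_o = [random.uniform(-1, 1) for _ in range(2)]                      # output weights
b_o = [0.0]                                                          # output threshold

def train_step(x, target, lr=0.5):
    # forward pass, as described above
    h = [logistic(sum(w * xi for w, xi in zip(ws, x)) + b)
         for ws, b in zip(w_h, b_h)]
    o = logistic(sum(w * hi for w, hi in zip(w_o, h)) + b_o[0])
    # backward pass: error signal at the output node...
    delta_o = (o - target) * o * (1 - o)
    # ...propagated back to each hidden node through its connecting weight
    delta_h = [delta_o * w_o[j] * h[j] * (1 - h[j]) for j in range(2)]
    # adjust weights and thresholds in proportion to each contribution
    for j in range(2):
        w_o[j] -= lr * delta_o * h[j]
        b_h[j] -= lr * delta_h[j]
        for i in range(2):
            w_h[j][i] -= lr * delta_h[j] * x[i]
    b_o[0] -= lr * delta_o
    return o
```

Repeated calls to `train_step` with the same sample nudge the output toward the target, which is the whole trick: errors flow backwards, weight adjustments follow.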

Which brings us to the final term: “convolution”, which I was completely unfamiliar with. After doing some research, here’s my first attempt at an explanation — please email me or leave a comment if I have something wrong and I will make the necessary revisions.

A convolution neural network is a special architecture (arrangement of layers, nodes, and connections), commonly used in visual and auditory processing, which more specifically defines a spatial relationship between layers of nodes. Imagine we are trying to recognize objects in a picture, which we subdivide into a coarse grid and then subdivide further into progressively finer grids. We could define node connections in such a way that a single grid unit (pixel) from layer l corresponds only to a block of pixels in layer (l + 1). We use this limitation to increase efficiency. Since we are now dealing with many layers, we choose to create an interpretation of the output several times, not just at the end. The first time we do this we look for coarse information, like edges, then something more refined. CNN architectures are usually characterized by local receptive fields, shared weights, and spatial or temporal subsampling.
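A minimal sketch of the shared-weight, local-receptive-field idea (illustrative Python of mine; a real CNN such as LeNet adds multiple feature maps, subsampling layers, and kernels learned by backpropagation rather than fixed by hand):

```python
def convolve2d(image, kernel):
    # slide one small shared-weight kernel (the 'local receptive field')
    # over the image; every output pixel reuses the same weights
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

# a hand-made vertical-edge detector: responds where left and right differ
edge_kernel = [[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]]
```

Applied to an image, this kernel produces a map of coarse information (vertical edges), the kind of early-layer output described above.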

[Update: check out a Matlab class for CNN implementation on the Matlab file exchange, by Mihail Sirotenko.]

Put together, LeCun tells us that LeNet is a “multi-layer backpropagation Neural network called a Convolution Neural Network”.

Return to the post about LeCun’s visual processing algorithm.

June 13, 2009. Neural Networks.

Yann Le Cun

Yann Le Cun was recently featured on NYAS’s Science and the City podcast. He spoke about visual processing using Artificial Neural Networks (ANNs) and in particular his work on the system that reads the amounts on your cheques at the ATM.  The host, Alana Range, notes that her ATM has never gotten her cheque amounts wrong. I myself no longer bother to check.

The technology behind the ATMs was developed by Le Cun and others almost 10 years ago, at AT&T Bell Labs [which, tragically, has been closed down]. The algorithm they developed [now] goes under the name LeNet, and is a multi-layer backpropagation Neural network called a Convolution Neural Network.  I will explain this terminology in my next post. [Update: explanation now posted.]

In the object recognition demonstration, Yann Le Cun describes four output interpretations in the LeNet algorithm: 1) edges and contours, 2) motifs, 3) categories, and finally 4) objects. By narrowing down options in several steps, LeNet can arrive at the final output (identifying the object) far more rapidly — the demonstration on the podcast processes 4-5 pictures each second and can recognize five different objects. There are also demonstrations online on Yann Le Cun’s website.

I wondered as I listened to this podcast about the comparisons drawn between mammalian visual processing and Le Cun’s algorithm. There were some very pronounced differences that suggest that our brains utilize a totally different technique, not simply a more complex version of LeNet, as was implied in the interview. These differences were:

  1. LeNet utilizes supervised learning. Our learning is largely unsupervised.
  2. We extrapolate, not just interpolate. So while LeNet needs tens of thousands of samples before it starts recognizing an object, babies might see a few toy planes and recognize the one in the sky. In the interview, Le Cun notes that his algorithm, when used for letter recognition, needs to be trained with both printed and handwritten samples and is unable to extrapolate from one to the other.

I think there were some other differences that I do not now recall. I got flashbacks of the TED talk in which Handspring founder Jeff Hawkins cracked some jokes about the fundamental differences between computer and human visual processing. I’ll try to post about that; it was pretty entertaining.

[Update: check out a Matlab class for CNN implementation on the Matlab file exchange, by Mihail Sirotenko.]

June 13, 2009. Neural Networks.

Digression: Space Objects

Astronomy frustrates me. I love physics, but I have little or no interest in astronomy and it really bugs me that when someone says ‘physics’ everyone seems to either think of ‘stars and planets’ or ‘photons and quarks’.

I recently listened to an episode of the ‘Science and City’ podcast from the NYAS which had a forum-style discussion on the topic titled “From Planets to Plutoids” (Mar 27, 2009). Discussed was the difficulty of defining objects in space, the history of the debate, various contemporary points of view and why (whether?) the debate is important.

It’s a mess. There are planets, planetoids, stars, asteroids, Kuiper Belt objects. They’ve been defined by their current physical properties, by their history, by location, by their orbit, by their relationship to nearby objects. Some planetary scientists care about defining objects from one perspective (perhaps the way in which they were formed, or their composition) and they fight against others such as dynamicists who care about the creating definitions from a wholly different perspective. Attempts by organizations (IAU) are considered by some to have failed and are, at best, reluctantly accepted by others for lack of better alternatives.

This type of problem must have been approached many times before in science. The analogy that struck me was that of attempting to categorize elements. As natural scientists from the perspective of physics and chemistry looked at the ever increasing number of elements in the world, they must have been similarly confused in their attempts to name and categorize them.

From that single podcast — I am happy to confess I am far from well-read on this subject — it seems to me that what astronomers are attempting to do is akin to categorizing elements by their physical state at room temperature, or some similarly ‘intuitive’ but ultimately inappropriate approach. The problem seems to be approached haphazardly, ignored until a new class of objects is found and throws everything asunder. Imagine if a physicist attempted to change the very definition of an element so that he could cement his legacy by claiming to have found one.

Why has there yet been no Mendeleev in the world of astronomy? Are we so far from understanding our universe that a periodic table of objects is beyond our grasp? Or are there truly no fundamental properties — the equivalent of quantum numbers — to make such an attempt?

June 11, 2009. Digression.

Excerpt from The Selfish Gene

The following passage caught my eye when I read Richard Dawkins’s “Selfish Genes and Selfish Memes” (1976).

Computers do not yet play chess as well as human grand masters, but they have reached the standard of a good amateur. More strictly, one should say programs have reached the standard of a good amateur, for a chess-playing program is not fussy which physical computer it uses to act out its skills. Now, what is the role of the human programmer? First, he is definitely not manipulating the computer from moment to moment, like a puppeteer pulling strings. That would be just cheating. He writes the program, puts it in the computer, and then the computer is on its own: there is no further human intervention, except for the opponent typing in his moves. Does the programmer perhaps anticipate all possible chess positions and provide the computer with a long list of good moves, one for each possible contingency? Most certainly not, because the number of possible positions in chess is so great that the world would come to an end before the list had been completed. For the same reason, the computer cannot possibly be programmed to try out ‘in its head’ all possible moves, and all possible follow-ups, until it finds a winning strategy. There are more possible games of chess than there are atoms in the galaxy. So much for the trivial nonsolutions to the problem of programming a computer to play chess. It is in fact an exceedingly difficult problem, and it is hardly surprising that the best programs have still not achieved grand master status.

The programmer’s actual role is rather more like that of a father teaching his son to play chess. He tells the computer the basic moves of the game, not separately for every possible starting position, but in terms of more economically expressed rules. He does not literally say in plain English ‘bishops move in a diagonal’, but he does say something mathematically equivalent, such as, though more briefly: ‘New coordinates of bishop are obtained from old coordinates, by adding the same constant, though not necessarily with the same sign, to both old x coordinate and old y coordinate’. Then he might program in some ‘advice’, written in the same sort of mathematical or logical language, but amounting in human terms to hints such as ‘don’t leave your king unguarded’, or useful tricks such as ‘forking’ with the knight. The details are intriguing, but they would take us too far afield. The important point is this: When it is actually playing, the computer is on its own and can expect no help from its master. All the programmer can do is to set the computer up beforehand in the best way possible, with a proper balance between lists of specific knowledge and hints about strategies and techniques.

The genes too control the behavior of their survival machines, not directly with their fingers on puppet strings, but indirectly like the computer programmer. All they can do is to set it up beforehand; then the survival machine is on its own, and the genes can only sit passively inside.

In the last paragraph, Dawkins is closing out his analogy between genes and computer programs.

I read this in The Mind’s I (Hofstadter, Dennett), but it was originally published in Dawkins’s The Selfish Gene. Read the entire essay online [PDF] or you can buy the book.

June 9, 2009. General.
