Excerpted and reposted from a longer article at EDGE.
Marvin Minsky
Why don’t we yet have good theories about what our minds are and how they work? In my view this is because we’re only now beginning to have the concepts that we’ll need for this. The brain is a very complex machine, far more advanced than today’s computers, yet it was not until the 1950s that we began to acquire such simple ideas about (for example) memory—such as the concepts of data structures, cache memories, priority interrupt systems, and such representations of knowledge as ‘semantic networks.’ Computer science now has many hundreds of such concepts that were simply not available before the 1960s.
Psychology itself did not develop much before the twentieth century. A few thinkers like Aristotle had good ideas about psychology, but progress thereafter was slow; it seems to me that Aristotle’s suggestions in the Rhetoric were about as good as those of other thinkers until around 1870. Then came the era of Galton, Wundt, William James and Freud, and we saw the first steps toward ideas about how minds work. But still, in my view, there was little more progress until the Cybernetics of the ’40s, the Artificial Intelligence of the ’50s and ’60s, and the Cognitive Psychology that started to grow in the ’70s and ’80s.
Why did psychology lag so far behind so many other sciences? In the late 1930s a botanist named Jean Piaget in Switzerland started to observe the behavior of his children. Over the next ten years of watching these kids grow up, he wrote down hundreds of little theories about the processes going on in their brains, and wrote about 20 books, all based on carefully observing three children. Although some researchers still nitpick about his conclusions, the general structure seems to have held up, and many of the developments he described seem to happen at about the same rate and the same ages in all the cultures that have been studied. The question isn’t, “Was Piaget right or wrong?” but “Why wasn’t there someone like Piaget 2000 years ago?” What was it about all previous cultures that kept anyone from thinking to observe children and try to figure out how they worked? It certainly was not from lack of technology: Piaget didn’t need cyclotrons, only glasses of water and pieces of candy.
Perhaps psychology lagged behind because it tried to imitate the more successful sciences. For example, in the early 20th century there were many attempts to make mathematical theories about psychological subjects, notably learning and pattern recognition. But there’s a problem with mathematics. It works well for Physics, I think because fundamental physics has very few laws—and the kinds of mathematics that developed in the years before computers were good at describing systems based on just a few (say, 4, 5, or 6) laws, but don’t work well for systems with on the order of a dozen laws. Physicists like Newton and Maxwell discovered ways to account for large classes of phenomena based on three or four laws; however, with 20 assumptions, mathematical reasoning becomes impractical. The beautiful subject called Theory of Groups begins with only five assumptions, yet this leads to systems so complex that people have spent their lifetimes on them. Similarly, you can write a computer program with just a few lines of code that no one can thoroughly understand; however, at least we can run the computer to see how it behaves, and sometimes see enough to make a good theory.
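One familiar illustration of that last point (our example, not the author's) is the Collatz or “3n + 1” rule. The Python sketch below is only a handful of lines, yet nobody has proved that its loop ends for every positive starting value. What we can do is run it and watch. The function name `collatz_steps` is our own, and the sketch assumes a positive integer input.

```python
def collatz_steps(n: int) -> int:
    """Count how many applications of the 3n + 1 rule it takes to reach 1 from n (n >= 1)."""
    steps = 0
    while n != 1:
        # Halve even numbers; send odd numbers to 3n + 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


# No proof says this loop halts for every n, but we can run it and look.
for start in (6, 7, 27):
    print(start, collatz_steps(start))
# prints:
# 6 8
# 7 16
# 27 111
```

The step counts jump around irregularly from one starting value to the next, which is the kind of evidence one can gather by experiment even when a proof is out of reach.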
However, there’s more to computer science than that. Many people think of computer science as the science of what computers do, but I think of it quite differently: Computer Science is a new collection of ways to describe and think about complicated systems. It comes with a huge library of new, useful concepts about how mental processes might work. For example, most of the ancient theories of memory envisioned knowledge like facts in a box. Later theories began to distinguish between short-term and long-term memories, and conjectured that skills are stored in other ways.
However, Computer Science suggests dozens of plausible ways to store knowledge away—as items in a database, or sets of “if-then” reaction rules, or in the forms of semantic networks (in which little fragments of information are connected by links that themselves have properties), or program-like procedural scripts, or neural networks, etc. Neural networks are wonderful for learning certain things, but almost useless for other kinds of knowledge, because few higher-level processes can ‘reflect’ on what’s inside a neural network. This means that the rest of the brain cannot think and reason about what it’s learned—that is, what was learned in that particular way. In artificial intelligence, we have learned many tricks that make programs faster—but in the long run lead to limitations because the results of neural-network-type learning are too ‘opaque’ for other programs to understand.
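To make the idea of links “that themselves have properties” concrete, here is a minimal Python sketch of a tiny semantic network. It is an illustration under our own assumptions, not a standard design: the class names, the `is-a` and `can` relation labels, and the `typically` property are all hypothetical. What it shows is the contrast drawn above: because the knowledge is stored as explicit links, another procedure (here `can`, which inherits along `is-a` links) can inspect and reason about it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Link:
    """A labeled link from one node to another; the link carries its own properties."""
    source: str
    relation: str
    target: str
    properties: dict = field(default_factory=dict)


class SemanticNetwork:
    def __init__(self) -> None:
        self.links: list[Link] = []

    def add(self, source: str, relation: str, target: str, **properties) -> None:
        self.links.append(Link(source, relation, target, properties))

    def query(self, source=None, relation=None, target=None) -> list[Link]:
        """Return every link that matches all of the fields that were given."""
        return [
            link for link in self.links
            if (source is None or link.source == source)
            and (relation is None or link.relation == relation)
            and (target is None or link.target == target)
        ]

    def can(self, node: str, ability: str) -> Optional[bool]:
        """Answer from a direct "can" link if one exists, else inherit along "is-a" links."""
        direct = self.query(node, "can", ability)
        if direct:
            return direct[0].properties.get("typically", True)
        for link in self.query(node, "is-a"):
            answer = self.can(link.target, ability)
            if answer is not None:
                return answer
        return None  # the network says nothing either way


net = SemanticNetwork()
net.add("bird", "can", "fly", typically=True)
net.add("canary", "is-a", "bird")
net.add("penguin", "is-a", "bird")
net.add("penguin", "can", "fly", typically=False)  # a more specific link overrides the inherited one

print(net.can("canary", "fly"))   # prints: True
print(net.can("penguin", "fly"))  # prints: False
print(net.can("canary", "swim"))  # prints: None
```

The `can` lookup assumes the `is-a` links contain no cycles; a fuller system would need to guard against them.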
Yet even today, most brain scientists do not seem to know, for example, about cache memory. If you buy a computer today you’ll be told that it has a big memory on its slow hard disk, but it also has a much faster memory called cache, which remembers the last few things it did in case it needs them again, so it doesn’t have to go and look somewhere else for them. And modern machines each use several such schemes, but I’ve not heard anyone talk about the hippocampus that way. All this suggests that brain scientists have been too conservative; they’ve not made enough hypotheses, and therefore most experiments have been trying to distinguish between wrong alternatives.
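For readers who have not met the idea, here is a minimal Python sketch of one common caching policy, “least recently used” (LRU): keep a small, fast store of recent items and, when it is full, discard whichever item was used longest ago. The class name `LRUCache` and the `slow_lookup` function (a stand-in for the slow memory) are assumptions of the sketch; nothing here is meant as a model of the hippocampus.

```python
from collections import OrderedDict


class LRUCache:
    """A small fast store in front of a slow lookup, evicting the least recently used item."""

    def __init__(self, capacity: int, slow_lookup):
        self.capacity = capacity
        self.slow_lookup = slow_lookup  # stands in for the big, slow memory
        self.items = OrderedDict()      # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.hits += 1
            self.items.move_to_end(key)  # mark as most recently used
            return self.items[key]
        self.misses += 1
        value = self.slow_lookup(key)    # the expensive trip "somewhere else"
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # forget the least recently used item
        return value


cache = LRUCache(capacity=2, slow_lookup=str.upper)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key)
print(cache.hits, cache.misses, list(cache.items))  # prints: 1 4 ['c', 'b']
```

Python’s standard library provides the same policy as the `functools.lru_cache` decorator; the explicit version is shown only to make the mechanism visible.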
Reinforcement vs. Credit Assignment
There have been several projects that were aimed toward making some sort of “Baby Machine” that would learn and develop by itself, to eventually become intelligent. However, all such projects, so far, progressed only to a certain point and then became weaker or even deteriorated. One problem has been finding adequate ways to represent the knowledge that they were acquiring. Another problem was not having good schemes for what we sometimes call ‘credit assignment’: that is, how do you learn the things that are relevant, the essentials rather than the accidents? For example, suppose that you find a new way to handle a screwdriver so that the screw remains in line and doesn’t fall out. What is it that you learn? It certainly won’t suffice merely to learn the exact sequence of motions (because the spatial relations will be different next time)—so you have to learn at some higher level of representation. How do you make the right abstractions? Also, when some experiment works, and you’ve done ten different things on the path toward success, which of those should you remember, and how should you represent them? How do you figure out which parts of your activity were relevant? Older psychology theories used the simple idea of ‘reinforcing’ what you did most recently. But that doesn’t seem to work so well as the problems at hand get more complex. Clearly, one has to reinforce plans and not actions, which means that good Credit-Assignment has to involve some thinking about the things that you’ve done. But still, no one has designed and debugged a good architecture for doing such things.
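The weakness of “reinforce what you did most recently” shows up even in a toy. The Python sketch below (our construction, not the author's) replays the screwdriver episode as a list of steps, some relevant and some accidental. A naive recency rule hands the most credit to whatever came last. The alternative credits a step only if a model of the task says the plan fails without it; that `toy_succeeds` model is a hypothetical stand-in for the “thinking about the things that you’ve done” called for above. It still credits single actions rather than plans, and testing one step at a time would miss steps that matter only in combination, so real credit assignment remains far harder.

```python
def recency_credit(actions, reward=1.0, decay=0.5):
    """Naive rule: the last action gets the full reward, earlier ones geometrically less."""
    credit = {}
    weight = reward
    for action in reversed(actions):
        credit[action] = credit.get(action, 0.0) + weight
        weight *= decay
    return credit


def leave_one_out_credit(actions, succeeds):
    """Credit an action only if the remaining plan fails without it.

    `succeeds` is a hypothetical model of the task that the learner can consult.
    """
    return [
        action
        for i, action in enumerate(actions)
        if not succeeds(actions[:i] + actions[i + 1:])
    ]


# Toy screwdriver episode: four steps matter, three are accidents.
REQUIRED = {"pick up screwdriver", "align screw", "press firmly", "turn clockwise"}
episode = [
    "pick up screwdriver", "align screw", "hum a tune", "press firmly",
    "scratch head", "turn clockwise", "glance at clock",
]


def toy_succeeds(plan):
    # Hypothetical task model: success iff every required step is present (order ignored).
    return REQUIRED.issubset(plan)


naive = recency_credit(episode)
print(max(naive, key=naive.get))  # prints: glance at clock
print(leave_one_out_credit(episode, toy_succeeds))
# prints: ['pick up screwdriver', 'align screw', 'press firmly', 'turn clockwise']
```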
We need better programming languages and architectures.
I find it strange how little progress we’ve seen in the design of problem-solving programs, or of languages for describing them, or of machines for implementing those designs. The first experiments to get programs to simulate human problem-solving started in the early 1950s, just before computers became available to the general public; for example, the work of Newell, Simon, and Shaw using the early machine designed by John von Neumann’s group. To do this, they developed the list-processing language IPL. Around 1960, John McCarthy developed a higher-level language, LISP, which made it easier to do such things; now one could write programs that could modify themselves in real time. Unfortunately, the rest of the programming community did not recognize the importance of this, so the world is now dominated by clumsy languages like Fortran, C, and their successors, which describe programs that cannot change themselves. Modern operating systems suffered the same fate, so we see the industry turning to the 35-year-old system called Unix, a fossil retrieved from the ancient past because its competitors became so filled with stuff that no one could understand and modify them. So now we’re starting over, most likely to make the same mistakes again. What’s wrong with the computing community?
Expertise vs. Common Sense
In the early days of artificial intelligence, we wrote programs to do things that were very advanced. One of the first such programs was able to prove theorems in Euclidean geometry. This was easy because geometry depends only upon a few assumptions: Two points determine a unique line. If there are two lines, then they are either parallel or they intersect in just one place. Or, two triangles are the same in all respects if two sides and the angle between them are equal. This is a wonderful subject because you’re in a world where assumptions are very simple, there are only a small number of them, and you use a logic that is very clear. It’s a beautiful place, and you can discover wonderful things there.
However, I think that, in retrospect, it may have been a mistake to do so much work on tasks that were so ‘advanced.’ The result was that, until today, no one paid much attention to the kinds of problems that any child can solve. That geometry program did about as well as a superior high school student could do. Then one of our graduate students wrote a program that solved symbolic problems in integral calculus. Jim Slagle’s program did this well enough to get a grade of A in MIT’s first-year calculus course. However, it could only solve symbolic problems, and not the kinds that were expressed in words. Eventually, the descendants of that program evolved to be better than any human in the world, and this led to the successful commercial mathematical assistant programs called MACSYMA and Mathematica. It’s an exciting story, but those programs still could not solve “word problems.” However, in the mid-1960s, graduate student Daniel Bobrow wrote a program that could solve problems like “Bill’s father’s uncle is twice as old as Bill’s father. 2 years from now Bill’s father will be three times as old as Bill. The sum of their ages is 92. Find Bill’s age.” Most high school students have considerable trouble with that. Bobrow’s program was able to convert those English sentences into linear equations, and then solve those equations, but it could not do anything at all with sentences that had other kinds of meanings. We tried to improve that kind of program, but this did not lead to anything good because those programs did not know enough about how people use commonsense language.
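Once the English has been turned into equations, the rest is mechanical. The Python sketch below does that translation by hand (the step Bobrow's program automated, which this sketch does not attempt), reading “their ages” as the ages of all three people, and then solves the system exactly with Gauss-Jordan elimination over fractions. The function name `solve_linear` and the variable names are our own.

```python
from fractions import Fraction


def solve_linear(rows):
    """Solve a square system given as augmented rows [a1, ..., an, b].

    Gauss-Jordan elimination with exact fractions; assumes a unique solution.
    """
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    for col in range(n):
        # Swap a row with a nonzero entry in this column into the pivot position.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot becomes 1.
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        # Clear this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [row[-1] for row in m]


# Unknowns, in order: U = uncle's age, F = father's age, B = Bill's age.
equations = [
    [1, -2, 0, 0],   # U = 2F            ->  U - 2F = 0
    [0, 1, -3, 4],   # F + 2 = 3(B + 2)  ->  F - 3B = 4
    [1, 1, 1, 92],   # U + F + B = 92
]
uncle, father, bill = solve_linear(equations)
print(f"Bill is {bill}; his father is {father}; the uncle is {uncle}.")
# prints: Bill is 8; his father is 28; the uncle is 56.
```

Checking by hand: 56 is twice 28; in two years the father will be 30 and Bill 10; and 56 + 28 + 8 = 92.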
By 1980 we had thousands of programs, each good at solving some specialized problems, but none of those programs could do the kinds of things that a typical five-year-old can do. A five-year-old can beat you in an argument if you’re wrong enough and the kid is right enough. To make a long story short, we’ve regressed from calculus and geometry and high school algebra and so forth. Now, only in the past few years have a few researchers in AI started to work on the kinds of commonsense problems that every normal child can solve. But although there are perhaps a hundred thousand people writing expert specialized programs, I’ve found only about a dozen people in the world who aim toward finding ways to make programs deal with the kinds of everyday, commonsense jobs that almost every child can do.