
Meta-note: I figured out how to put footnotes in my posts. I apologize in advance.

I have been trying to read Douglas Hofstadter’s famous book Gödel, Escher, Bach (henceforth GEB) for ten years. A few months ago I decided to try reading it from the beginning for the third or fourth time, and this time I finally read the whole thing. Every previous time I tried to read it, I found the narrative intellectually compelling and delightfully playful, and yet I would have difficulty picking it up again for some reason. The thing that kept me coming back was the formal logic. I love a surprising logical consequence, and the set theory in the early chapters stoked my curiosity. Intellectually, I wanted to know where the setup of typographical formal systems was going, but for the longest time I didn’t have the willpower to get the words into my brain. I’m glad that I finally did.

To explain why the book is so difficult to get through, it might help to explain its structure. The book alternates between dialogues and chapters. The dialogues read like scripts for a play: fantastical conversations between Achilles, the Tortoise, and others, each relevant to the chapter that follows. The dialogues are explicitly structured after Bach fugues. They are amusing, clever, and self-indulgent. They often have explanatory power. Sometimes I feel like they unnecessarily padded the word count. Many things in the book do this, but I don’t think Hofstadter could have written this book without all of the tangents he explores. For example, he has an extended chapter about DNA and protein synthesis which doesn’t really demonstrate anything that the typographical number theory chapters did not already do, but it certainly carves the outlines of what he means by strange loops more deeply. As you’ll see later in this post, it’s incredibly useful as an analogy. The Bach elements do not do much for me, although they clearly assisted the author in his thinking. The Escher bits are much more compelling to me, as they provide obviously self-referential analogies for self-referential arguments. When he writes the final chapters which contain and restate his most important arguments, you feel like it all rests upon the previous chapters and even the dialogues. Everything definitely feels relevant, but it’s not the most concise way to make the argument he makes.

The computer science in this book is fifty years out of date, but I think the core insight on explaining minds holds up, in that it is still plausible regardless of whether it is correct. In my read, the book is an exploration of the idea that you can build something out of easily understood components on a low level that becomes incomprehensibly tangled up in self-reference but also beautiful or true or functional if understood at a higher level of organization. He has a number of case studies here. The one for which the book is named is Gödel’s Incompleteness Theorem. Hofstadter does a masterful job of pointing out the parts of the theorem which are actually clever and glossing over the minute details while still giving some of the structure. I cannot do his explanation justice, but here is my understanding of how the theorem works: Gödel’s theorem creates a “sentence” which indirectly describes itself using language which is designed to describe only numbers. The trick is to define a way to turn sentences into numbers and then to construct a sentence about a number which, from the outside, clearly refers to the sentence itself. The sentence itself does not declare its own truth; it (indirectly) declares that there is no proof of itself in its own language. When you apply formal logic to that declaration, you find that any assumption you make about it inevitably forces the declaration to be true. This probably all sounds like nonsense if you haven’t read the book or aren’t intimately familiar with the proof for some other reason, and that’s why I forced myself to read the book. I had to know what the nonsense was about.
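
To make the first half of that trick feel less like nonsense, here is a toy sketch of Gödel numbering in Python. The alphabet and the base-100 packing are my own simplification for illustration, not the scheme Gödel or Hofstadter actually uses, but the point survives: every sentence becomes a number, so a statement about numbers can, read at a second level, be a statement about sentences.

```python
# Toy Gödel numbering: give each symbol of a tiny formal alphabet a code,
# then pack a formula into one integer, one base-100 "digit" per symbol.
# The alphabet and the packing scheme are invented for this illustration.
SYMBOLS = "0S=+*()<>~"
CODE = {ch: i + 1 for i, ch in enumerate(SYMBOLS)}

def godel_number(formula: str) -> int:
    """Encode a formula as a single integer."""
    n = 0
    for ch in formula:
        n = n * 100 + CODE[ch]
    return n

def decode(n: int) -> str:
    """Recover the formula from its Gödel number."""
    inverse = {v: k for k, v in CODE.items()}
    symbols = []
    while n:
        n, c = divmod(n, 100)
        symbols.append(inverse[c])
    return "".join(reversed(symbols))

g = godel_number("S0+S0=SS0")  # "1+1=2" in successor notation
print(g)          # 20104020103020201, one big boring integer
print(decode(g))  # S0+S0=SS0
```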

The thing Hofstadter seems to actually want to talk about is the nature of consciousness. Here is an excerpt from the book explaining why Hofstadter went to so much trouble to explain an esoteric mathematics proof in a philosophy of mind book:

Gödel’s proof offers the notion that a high-level view of a system may contain explanatory power which simply is absent on the lower levels. By this I mean the following. Suppose someone gave you G, Gödel’s undecidable string, as a string of TNT¹.

The only way to explain G’s nontheoremhood is to discover the notion of Gödel-numbering and view TNT on an entirely different level. It is not that it is just difficult and complicated to write out the explanation on the TNT-level; it is impossible. Such an explanation simply does not exist. There is, on the high level, a kind of explanatory power which simply is lacking, in principle, on the TNT-level. G’s nontheoremhood is, so to speak, an intrinsically high-level fact.

Looked at this way, Gödel’s proof suggests-though by no means does it prove!-that there could be some high-level way of viewing the mind/brain, involving concepts which do not appear on lower levels, and that this level might have explanatory power that does not exist-not even in principle-on lower levels. It would mean that some facts could be explained on the high level quite easily, but not on lower levels at all.

I don’t know if I buy this! This does not merely assert that consciousness makes far more sense on a higher level of abstraction than the level of raw neurons². I think he is saying that there are true statements about the arrangement of the physical materials which support consciousness (neurons in a brain, for example) that there is no way to explain using only the mechanisms of those materials. I have a few metaphors which I will try to use to understand this assertion, and they happen to be mostly things which Hofstadter talks about at length in the book³.

  1. Machine code is the set of instructions that mechanistically make a computer processor do something. I briefly discuss it here in the context of GPU acceleration. It is possible to write an entire computer program in machine code (some punch cards once held computer programs written this way), but nobody does this anymore unless they’re trying to prove a point. Exceptional people code in assembly, which is basically a one-to-one human-readable wrapper on machine code. Most people code in a higher-level language which is run through a compiler or interpreter that turns their code into machine code. The higher-level language contains things like “for loops,” which are much easier for a human to understand than an incomprehensible string of binary which you feed directly into a wire as changes in voltage, but I don’t think it contains any information that the machine code does not (see the sketch after this list). There is still a mechanistic process which leads to a set of actions being carried out.
  2. I was about to say that we can make computer programs that talk now, but large language models (LLMs) are almost entirely incomprehensible on a raw data level. We do not know how to build an LLM piece by piece; we build programs which non-deterministically search for the weights which make an LLM function by trial and error at a truly massive scale. We know how the matrix multiplication that outputs tokens works, but we don’t have a complete picture of how it leads to language. The sorts of things people understand about LLM language comprehension resemble “I found a model weight which, when modified, predictably leads the model to output sentences of this sort.” This is not sufficient understanding that we could code by hand an algorithm which outputs sentences as well as LLMs do. The fact that a bunch of matrix multiplication can be understood on a higher level as encoding English syntax after some translation could be in the realm of things Hofstadter is pointing at.
  3. There is nothing in biological life which is not downstream of DNA. If you gather up all of the DNA in a cell’s mitochondria and nucleus, you have everything you need to create that organism. Except that you generally must start with at least one existing cell which already has the proteins required to read the DNA and build more proteins based off of its instructions.⁴ In practice, many animals also require a mother or other environmental constants for parts of the development process. Consider DNA for warm-blooded animals which undergo live birth. One might think that DNA for an animal which maintains homeostasis would be more complicated than that for animals which fluctuate with the environment. However, animals which maintain homeostasis can have simpler DNA than other types of organisms because the temperature conditions or whatever can be taken as given. You don’t have to code for “what to do if it’s 20 degrees out” because the organism initially develops inside a parent that makes sure that the temperature is whatever it needs to be for the development process to occur. Once the organism is developed, it maintains its own homeostasis which the repair and replication chain can depend on. If an alien were handed a human genome with no context, would it be able to infer the process of live birth? I’m not a geneticist; there might be genes which obviously show this is happening. But at least in principle, couldn’t genes just code for a weird internal cavity and homeostasis such that one could only infer live birth from the lack of complicated exception-handling mechanisms for embryonic development? Some extraordinary alien scientist might theorize that putting the embryo in the cavity would be an elegant explanation for the lack of embryonic environmental exception-handling. This is reasoning far above the level of genes, so maybe this is what Hofstadter is pointing at.
  4. Students learn Newtonian mechanics in high school, but physicists universally agree that Newtonian mechanics is at best an approximation of what “actually happens” in the world (scare quotes for imprecise language). A physicist who specializes in mechanics might tell you that Newtonian mechanics is an intuitive wrapper around the principle of least action, which can be extended to more general coordinates than the rigid Cartesian coordinates preferred by Newtonian momentum conservation. A quantum mechanist might at this point say that the principle of least action is just a large-scale approximation of the time evolution at the heart of quantum theories of matter. A particle physicist might at this point sagely nod and say that quantum field theory is the quantum theory of matter which works best at microscopic scales and high energies. However, our friendly particle physicist is useless to you if you want to understand why a bridge stays up, whereas your high school physics teacher can give a pretty good explanation of why a bridge stays up using forces which all sum to zero if your bridge doesn’t accelerate (i.e. doesn’t fall down). I have more reason than most people to trust that quantum field theory explains the interactions between the electrons, gluons, and quarks which compose nearly all of the energy density distribution which we call a bridge, but if I want to understand why the bridge stays up, then I’m going to use the higher-level abstractions of Newtonian mechanics. This smells like what Hofstadter is pointing at.
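
As a concrete taste of the first item, Python ships with a `dis` module that prints the lower-level instructions a function compiles to. Bytecode is not machine code, and this example is mine rather than anything from GEB, but the layering is the same: the loop is easier for a human to read, yet nothing in it is missing from the instruction stream.

```python
import dis

def total(n):
    # A "for loop" reads naturally at the source level...
    s = 0
    for i in range(n):
        s += i
    return s

# ...but it executes as a flat list of low-level instructions.
# Running this prints opcodes such as FOR_ITER: the same program,
# described one level of abstraction further down.
dis.dis(total)
```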

The strong claim by Hofstadter is that you won’t understand consciousness simply by looking at neurons. He has a variety of weaker claims which are more interesting. One is that he thinks the self-reference at the heart of Gödel’s theorem will be directly analogous to the self-reference at the heart of whatever explanation turns out to be the best way to understand consciousness. Here is an excerpt which comes soon after the last one in GEB:

This act of translation from low-level physical hardware to high-level psychological software is analogous to the translation of number-theoretical statements into metamathematical statements. Recall that the level-crossing which takes place at this exact translation point is what creates Gödel’s incompleteness and the self-proving character of Henkin’s sentence. I postulate that a similar level-crossing is what creates our nearly unanalyzable feelings of self.

In order to deal with the full richness of the brain/mind system, we will have to be able to slip between levels comfortably. Moreover, we will have to admit various types of “causality”: ways in which an event at one level of description can “cause” events at other levels to happen.

This is the point of the book. I’m going to briefly slip into Hofstadter’s nomenclature, where the word “symbol” refers to a recognizable pattern of neurons in the brain doing something. Hofstadter talks about how consciousness, the feeling of being yourself, might be explained by a “self-symbol” which interacts with other symbols in your brain which are entangled with your environment (which is to say that they are shaped the way they are because of signals from your sensory organs). The feeling of “free will” is what happens when the self-symbol interacts with the mental processes by which your body does things. The self-symbol abstractly represents the body which contains the mind which is made up of neurons, some of which compose the self-symbol. The self-symbol can be said to contain itself by moving through a chain of abstraction layers in what feels like a single direction until you end up back where you started.

This is in opposition to my earlier musings on machine code. You can understand a high-level computer program on its own terms: if you click the button or type the command, it does a thing. You can understand the source code on its own terms: the for loop iterates the variable and slightly changes what the commands do each time it repeats. You can understand the corresponding machine code on its own terms: when this binary number is read into register 2, the CPU adds together the numbers in registers 3 and 4 and outputs the result to register 5. Hofstadter isn’t saying that you can’t understand what a mind is doing by looking at the neurons. I quote GEB: “We should remember that physical law is what makes it all happen-way, way down in neural nooks and crannies which are too remote for us to reach with our high-level introspective probes.” The brain is a physical object, and you can explain everything that happens with things like “this neuron received signals past the threshold that makes it fire and stimulate this other neuron.” But Hofstadter (in GEB at least) doesn’t think you’ll understand consciousness that way, even if you find a more convenient representation of the brain which allows you to represent huge collections of neurons as an entity that performs functions. He doesn’t really make clear what that explanation would look like beyond the sort of argument in the last paragraph, which I suppose is something.

Now we can finally talk about the outdated parts of the book. Hofstadter thinks that his strange loops are at the heart of understanding consciousness and at the heart of creating artificial intelligence. That didn’t pan out with modern artificial intelligence regimes. As an unfair example, here is possibly the least accurate prediction in the book:

Question: Will there be chess programs that can beat anyone?

Speculation: No. There may be programs which can beat anyone at chess, but they will not be exclusively chess players. They will be programs of general intelligence, and they will be just as temperamental as people. “Do you want to play chess?” “No, I’m bored with chess. Let’s talk about poetry.” That may be the kind of dialogue you could have with a program that could beat everyone. That is because real intelligence inevitably depends on a total overview capacity-that is, a programmed ability to “jump out of the system”, so to speak-at least roughly to the extent that we have that ability. Once that is present, you can’t contain the program; it’s gone beyond that certain critical point, and you just have to face the facts of what you’ve wrought.

This is arguably a decent description of LLMs, but LLMs aren’t that good at chess. The best chess algorithms are based on a similar neural network architecture to LLMs, but they will repeatedly destroy any human chess player until the heat death of the universe without getting “bored with chess.” Rather than guessing what Hofstadter got wrong, I will paste in his words from the ’90s, when it became clear that chess algorithms were surpassing human chess prodigies: “They’re just overtaking humans in certain intellectual activities that we thought required intelligence. My God, I used to think chess required thought. Now, I realize it doesn’t. It doesn’t mean Kasparov isn’t a deep thinker, just that you can bypass deep thinking in playing chess, the way you can fly without flapping your wings.” Getting back to LLMs, the nice thing about being the author of arguably the most famous book ever on artificial intelligence is that, as seen above, people interview you once it becomes clear that computers are becoming capable of doing the things you once argued they would eventually do. I don’t need to guess what Hofstadter would say about ChatGPT, because I can just go listen to what he did say about it. To that end, I see little reason to ramble about what GEB gets wrong about human and artificial intelligence. I will simply quote 2023 Hofstadter, who muses that “maybe the human mind is not so mysterious and complex and impenetrably complex as I imagined it was when I was writing GEB…I felt at those times…that, as I say, we were very far away from reaching anything computational that could possibly rival us.”

I could argue that Hofstadter is underestimating the complexity of minds here. I think that he expected that we wouldn’t be able to build something remotely intelligent without understanding it. It turns out that we could, and, in this sense, intelligence was not as complex as he thought. Something truly complex shouldn’t be able to be created by tweaking numbers in a giant matrix in whatever direction makes words come out prettier for a few weeks.⁵ If LLMs can create anything remotely like deep thought, which Hofstadter and I agree they can, then deep thought wasn’t as difficult as he thought it was, much like chess didn’t require as much deep thought as he thought it did. That said, we don’t really understand how the ability to answer questions in syntactically correct English is encoded into giant matrices, just as we can’t explain the process by which brains do the same thing in enough detail to make a blueprint for such a brain. It may be that the explanation that would actually make a human understand LLMs or their own brain at that level will require the sort of level-hopping explanations he envisions. The thing he wants to explain may not be as complex as he thought it was, but it still might be complex enough to require the sort of explanations he thought it would.

I was previously somewhat conflicted about calling this a philosophy book, but I’m rather attached to the label now. Robert Miles says that “philosophy” is approximately a synonym for “thinking about things we don’t know how to think about” until we know how to think about them well enough that we call it something else. Isaac Newton was a philosopher in his own time, but now we understand motion well enough that we call the theories he wrote down “physics,” along with all the other theories we use to describe where undirected lumps of matter will be at what time. GEB is philosophy in the fullest sense of this definition. The arguments for consciousness are barely there. It’s a pile of metaphors and frameworks which the author says might help us understand consciousness someday. Not even decades-older Hofstadter thinks that what he wrote in GEB is accurate, but he was trying to think about something which we still don’t actually know how to think about: consciousness and what it means for “intelligent” computers, whatever that means. He was careful even at the time to declare that he was speculating, but it’s a beautifully justified chain of speculation.

  1. TNT is Hofstadter’s typographical number theory. He shows earlier in the book how you can encode a small set of basic principles of the natural numbers into a set of typographical rules which let you show things like a+b=b+a for any two natural numbers a and b. The upside is that the typographical rules are objective and unambiguous. Some mathematical proofs point at concepts using natural language, and the typographical theory is one way to avoid the ambiguities inherent in language that can result in apparent paradox or in a proof which has hidden consequences. One can mechanistically check whether a sequence of well-formed TNT strings follows from the axioms of number theory and then interpret the final string as a truth about the natural numbers if it does (the first sketch after these footnotes shows what “mechanistically check” means in a much smaller system).

  2. Or other substrates that minds could run on. He does assert elsewhere in the book that minds are best interpreted via higher-level “symbols” (more on that soon) rather than by trying to understand them on the physical level. Ant Fugue is my favorite of the dialogues in the book, and it explores this principle. I think it is the one dialogue which makes a complete argument that the chapters in the book do not make on their own. It might be worth reading even if you don’t read the rest of the book.

  3. I would have reached for these examples even if Hofstadter had not. Maybe this means that I should not have bothered with the book. Or maybe it means that this book was made for me. Or that I have self-sorted into only engaging intellectually with people who think like me, and my individuality is a cherished lie. Who knows? 

  4. In theory, the dependence on prior life to propagate life has been violated at least once, but you can take it as given for the past 3 billion years or so. I once tried to make an argument that there is no need to consider explanations of how life evolves which consider things on a lower level than DNA and the protein synthesis chain. Yes, DNA and proteins are made of atoms and you can interact with them as atoms, but I argued that the explanation for life was best carried out on the DNA level or higher. In principle, I said, the DNA codes for all of the proteins, so there is a well-founded stack that rests on DNA and adds up to an organism. My wife disabused me of this by talking about epigenetic inheritance. You really do have to consider what the environment of atoms does to DNA and proteins in different scenarios without modifying the underlying genetic sequence. This seems to argue that higher-level abstractions lose explanatory power relative to lower-level ones, which I think is the obverse of Hofstadter’s point, but I also think it is true. It can be true that higher-level abstractions make for better explanations of reality and also that they lose information about reality relative to lower-level abstractions.

  5. One could argue that this is a silly position to take because evolution created human intelligence, and evolution is just the repeated tweaking of DNA until it creates an organism that keeps replicating its DNA. But evolution doesn’t understand the intelligence it made. Evolution is just the name we give to how a self-perpetuating system changes over time. Of course, that’s still an argument that you don’t need to understand intelligence to create it, so maybe it would be better to say that evolution implicitly understood intelligence. Evolution was a gigantic program running on an entire biosphere’s worth of parallelization for 3 billion years, and it finally created a set of DNA strands that encode a structure capable of answering questions with syntactically correct English. Even then, a baby can’t understand English until it has had several years’ worth of training by intelligent humans that do have some understanding of how English works (although one can argue that the earliest users of language did not in fact know or care about explicit rules of grammar or syntax or whatever). You also have to understand that GEB uses DNA as a prime example of strange loopiness. The structures which read DNA go on to read DNA that tells them how to build more of the structures that read DNA. If intelligent life was created by accident, at least it is obscenely complicated to sort out what is happening inside intelligent life. I think Hofstadter thinks that LLMs aren’t that complicated. For example, Hofstadter expresses shock that LLMs are feed-forward. I’m not entirely sure what he’s referring to when he says this, but I assume he’s referring to the fact that LLMs generate one token at a time based off of previous tokens. You can feed half of a sentence to an LLM and it will happily spit out the next word, and then you can feed that new sentence with one more word to a fresh instance of the LLM and get the next word, and so on, and you will get something just as good as if you had asked one LLM for the whole sentence to begin with (the second sketch below shows the shape of this loop). It seems inconceivable that an LLM could write the middle of a long argument without planning where it’s going, but that sure looks like what it’s doing. If I correctly understand the state of the art, the current most impressive models generate a bunch of answers and pick the best one, but this is a hack built on top of many instances generating one feed-forward response each.
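
Footnote 1 claims that checking a derivation is mechanical. A checker for full TNT would run long, so here is a sketch for the much smaller MIU system from the book’s opening chapter. The encoding is mine, but the rules are Hofstadter’s: a derivation is valid exactly when each line follows from the previous one by some rule, and verifying that requires no understanding of what any string “means.”

```python
def successors(s: str) -> set[str]:
    """All strings reachable from s in one application of GEB's MIU rules."""
    out = set()
    if s.endswith("I"):                  # rule 1: xI -> xIU
        out.add(s + "U")
    if s.startswith("M"):                # rule 2: Mx -> Mxx
        out.add("M" + s[1:] * 2)
    for i in range(len(s)):
        if s[i:i + 3] == "III":          # rule 3: replace III with U
            out.add(s[:i] + "U" + s[i + 3:])
        if s[i:i + 2] == "UU":           # rule 4: drop a UU
            out.add(s[:i] + s[i + 2:])
    return out

def check_derivation(lines: list[str]) -> bool:
    """Mechanically verify a derivation starting from the axiom MI."""
    if not lines or lines[0] != "MI":
        return False
    return all(b in successors(a) for a, b in zip(lines, lines[1:]))

print(check_derivation(["MI", "MII", "MIIII", "MUI"]))  # True
print(check_derivation(["MI", "MIU", "MU"]))            # False: no rule yields MU
```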
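
And here is the shape of the feed-forward loop from footnote 5. The `next_token` function is a hypothetical stand-in for a model’s full forward pass (this toy version just replays a canned continuation); the point is that each step is stateless, so a fresh instance could serve every iteration and the only “memory” is the growing prompt.

```python
# Toy autoregressive loop. CANNED and next_token stand in for a real
# model's forward pass; only the structure of the loop is the point.
CANNED = {
    "The cat": "sat",
    "The cat sat": "on",
    "The cat sat on": "the",
    "The cat sat on the": "mat.",
}

def next_token(prompt: str) -> str | None:
    """Stateless: the whole prompt goes in, one token comes out."""
    return CANNED.get(prompt)

def generate(prompt: str) -> str:
    # Each pass through this loop could be served by a fresh instance of
    # the model; no state survives between iterations except the prompt.
    while (tok := next_token(prompt)) is not None:
        prompt += " " + tok
    return prompt

print(generate("The cat"))  # The cat sat on the mat.
```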
