
I thought I could make a short intuitive argument for why AI research is dangerous, but I got halfway through and realized that the argument was no longer short, and other people have made the long argument well. In the interest of erring on the side of publishing over endlessly editing, I’m posting the half-baked argument anyway.

The argument I’m about to make has three parts. The argument does not contain everything that scares me about artificial intelligence, but I was trying for a simple argument about the scariest thing.

  1. Smart things that have goals achieve those goals in ways which affect the larger environment
  2. Environmental changes tend to kill the things which depend on the environment
  3. The default way that humans seem likely to make smart things will make smart things with goals we don’t care for

The easiest way for me to talk about this argument is via anthropomorphism. Whether the AI has feelings is a separate question, but I want a fast way to describe the end state that the universe will reach based on the actions of an AI, so I will say “the AI wants the thing” even though that phrasing may be misleading, given how humans experience wanting. The AI might not even be conscious. It may sound like you can solve the problems I’m proposing by saying “oh don’t worry, we won’t make the AI want anything,” but if the AI does anything useful, then it performs actions, or gives people information that leads them to perform actions, with a predictable effect. For example, consider a thermostat. If the room is too cold, the thermostat supplies electricity to a space heater until the room warms to the set temperature, and then it turns off the heater. When I say that an AI wants something, I mostly mean it in the way that the thermostat wants the room to be above a certain temperature.
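To pin down this deflationary sense of “wants,” here is a minimal sketch of a thermostat’s control loop (the function name, setpoint, and return values are all mine, purely for illustration):

```python
# A thermostat "wants" warmth only in this sense: it acts on the world
# until the world matches its setpoint, then it stops acting.
def thermostat_step(room_temp_f: float, setpoint_f: float = 68.0) -> str:
    """Decide what to do with the heater on this tick."""
    if room_temp_f < setpoint_f:
        return "heater_on"   # the world is colder than the goal: act
    return "heater_off"      # the goal is satisfied: stop acting
```

Nothing in that loop feels anything, but the room still reliably ends up warm, and that end state is all I mean by “wanting.”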

Let’s address point 1 now. A thermostat is not smart. If you open all of the doors of your house, the thermostat will burn electricity and not get what it wants, because the hot air in the house continuously gets replaced with cold outside air. If you replace the thermostat with a human, the human might notice that the heater or air conditioner isn’t doing anything and close all the doors so that the warm air stops escaping. Humans are smart. We build models of the world like “the air in this room is cold, and if I close the doors then the room will heat up faster because the air won’t escape after I heat it.” Prehistoric humans figured out that they could dig out all the existing plants in some area, put seeds and tubers in the ground, and grow their food instead of hunting around for it. More recent humans figured out how to chemically produce the nutrients that plants need, and now we consistently grow huge amounts of food. Look at the town you live in. It is full of angular concrete structures, massive roads, and buildings which block out the environment and replace it with one that is windless, rainless, and a comfortable temperature. Nature didn’t have those things, but nature’s environment was inconvenient for us, so we built them.

Maybe this is a good time to address point 2. Wide swaths of earth have been converted to farmland from whatever they were before humans got here, because humans want food. Most of the plants and animals that used to live in those habitats are dead or gone. Humans held no malice for them. We just needed the land for food, so we plowed the existing plants under. We wanted food, so we chased off the animals that might eat it, or put up scarecrows so they would avoid the places where they used to feed. Eventually, the way humans found to get food was to use a bunch of fertilizer, which washed out of the farmland, polluted the waterways, and changed the equilibrium that animals outside of human farms were living in. Animals got less of what they wanted because the humans were smarter about going after what humans wanted. I think humans will run into the same problem with sufficiently smart AI that’s easy to develop, even if the AI we make mostly wants the same things we want. Animals want food and humans want food, but humans didn’t make food more available to all animals by growing more of it. You need to be very careful when making smart things that want stuff if you want the smart thing not to harm you. And humans don’t just kill things we don’t care much about when we get clever; we harm ourselves. We found out that you could get quick energy by burning things we dug out of the ground, and now our cities are full of smog and we’ve changed the planet’s climate in ways that will kill more of us. It might be ignorance or selfishness, but every day we decide we’d rather have cars or electricity now, even if it means a billion people are displaced by rising sea levels later, or garbage in our air that makes us sick tomorrow. We can’t even be smart ourselves without harming people. Now multiply that by a lot, because something smarter than we are can build better technology and accomplish bigger things. We shouldn’t throw something that is good at creatively making the world look the way it wants into the world if its wants aren’t calibrated with ours.

This brings us to point 3. Humans want weird things. Many humans want ice cream, but most humans don’t want melted ice cream, despite it containing literally every ingredient that frozen ice cream has. You’d think that lazy humans would buy or make bottled unfrozen ice cream so they could have sweet deliciousness while out and about, but it turns out the freezing process matters. You have to churn it, too; you can’t just freeze it into ice cubes and suck on them. This seems wildly specific. Imagine trying to describe the properties of foods humans like to aliens, such that the aliens would come up with ice cream, but not unfrozen ice cream or sugar-and-cream ice cubes. Human preferences are hard to hit exactly, even though they seem incredibly obvious to us. Even that obviousness is an illusion to some extent. Have you ever heard of a person who wants a thing that seems terrible to you? I’ve tried to argue that smart things that don’t want what you want are dangerous to you, and it should be very scary that it’s hard to describe exactly what humans want. Even humans can’t explain to themselves exactly what they want. Have you ever been surprised that a weird food you ate tasted pretty good? Sometimes we’re embarrassed by something we like and don’t want to admit it to people, but it’s still there, and would you want someone to rewrite you so that you don’t want it anymore?

All this is an argument for why making AI needs to be done carefully. The modern AI research paradigm that led to large language models makes me despair for the future. As far as I can tell, the way that machine learning models are trained is basically that you feed the training algorithm a bunch of data, and it tweaks the parameters of the model in whatever direction makes the model predict the data better, until it finds something that works. (I’m ignoring a lot of details that don’t matter to my argument.) If you wanted to come up with something made for the purpose of kind-of looking like it wants what you want while actually wanting something subtly different, you could not come up with a better backstory than this. Shut it down now. Don’t let people build smart things this way. It turns out that you can run a pretty basic algorithm that has barely changed since the 1960s on a whole lot of GPUs, feed it the contents of the internet, and it’ll give you an inscrutable program which writes in complete sentences about basically anything you can think of. It turned out that mimicking parts of human intelligence just wasn’t that difficult if you throw enough GPUs at the problem and add one weird trick (transformers, in case you’re wondering what I’m thinking of). Yes, there are super easy things that large language models still get wrong and people get right; that’s not the point. Don’t give me some nonsense about how it’s just parroting back information or not really alive; the things are writing essays better than most undergraduate students can. I’m not saying that current models are smart enough to be dangerous. I’m saying that if you keep letting companies throw billions of dollars’ worth of electricity and hardware at intelligence, they might eventually succeed, and they have not demonstrated an ability to tell whether the things they are making are dangerous. They’re clearly willing to throw things out into the world with safeguards so minimal that bored teenagers break them within three days, and you cannot be trusted to build smart things if that’s how you operate.
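For intuition, here is a toy version of that training loop, written from scratch (my own illustrative example with made-up data, not any lab’s actual code). It nudges a single parameter in whatever direction predicts the data better, and it never once asks what the model wants:

```python
# Minimal sketch of "tweak parameters in whatever direction makes the
# model predict the data better": gradient descent on a toy model y = w*x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up (input, target) pairs

def loss(w: float) -> float:
    """Total squared prediction error over the data."""
    return sum((w * x - y) ** 2 for x, y in data)

w = 0.0    # start with a parameter that predicts badly
lr = 0.01  # how far to step on each update
for _ in range(1000):
    grad = sum(2 * (w * x - y) * x for x, y in data)  # slope of the loss at w
    w -= lr * grad  # step downhill: whatever direction predicts better

print(round(w, 3), round(loss(w), 6))  # w lands near 2.0, "something that works"
```

Scaling this up to billions of parameters and the text of the internet changes the budget, not the shape: the process selects whatever parameter settings predict the data, and “predicts the data” and “wants what we want” are not the same target.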

If you want to build a smart thing, you should be able to convince people that your plan will not be dangerous, which is a strictly easier problem than actually building a smart thing that isn’t dangerous. If someone can come up with a safe plan, and they’ll let other people watch them carry it out to make sure they aren’t making any mistakes, and a panel of expert computer scientists agrees that it’ll work, and they have a super-compelling use case for how the thing they’re making will save lots of lives, and they’re only building the minimum amount of smartness necessary to achieve that limited goal, then MAYBE we can let them start a closely monitored project with access to dangerous levels of compute. But why risk everyone dying before then? Don’t give greedy companies the benefit of the doubt; just shut it all down. We already have some interesting models that haven’t killed everyone; play with and study those, and don’t let the negligent psychopaths try to make more effective ones before they understand the ones they have now. I don’t say this lightly. I don’t like governments telling me that I can’t use my computer a certain way, but I prefer that to literally every human dying. This is not an argument that we should ban anything that might be dangerous; it is an argument that things which creatively make plans to achieve arbitrary goals have, specifically, the capability to make Earth unlivable, and that those things in particular should be banned by default.
