[00:00:00] Speaker A: Do not expect the vertiginous pace of progress to abate. The trend lines look innocent, but their implications are intense. As with every generation before them, every new generation of models will dumbfound most onlookers. They'll be incredulous when, very soon, models solve incredibly difficult science problems that would take PhDs days. When they're whizzing around your computer doing your job. When they're writing code bases with millions of lines of code from scratch. When, every year or two, the economic value generated by these models 10x's. Forget sci-fi, count the orders of magnitude. It's what we should expect.
Artificial general intelligence is no longer a distant fantasy. Scaling up simple deep learning techniques has just worked. The models just want to learn. And we're about to do another 100,000x by the end of 2027. It won't be long before they're smarter than us.
This is the Guy Swann Network and you're listening to AI Unchained.
[00:01:22] Speaker B: What is up, guys?
[00:01:23] Speaker A: Welcome back to AI Unchained. I am Guy Swann, your host, and this show is brought to you by Swan Bitcoin, the solution to all of your bitcoin purchasing and financial needs. And Coinkite, the makers of the Coldcard hardware wallet to keep your bitcoin safe. And man, do we have a crazy read today.
[00:01:44] Speaker B: I was going to just do a.
[00:01:46] Speaker A: kind of Guy's Take sort of episode about this, and I had been looking
[00:01:50] Speaker B: At people to bring on the show to have a conversation about this, but.
[00:01:54] Speaker A: The more and more I looked at.
[00:01:55] Speaker B: It and the more and more I.
[00:01:56] Speaker A: Dug into it, the more I was.
[00:01:57] Speaker B: Like, this just needs an audio version. Like I really just need to read this and then expand on it and then we can do an episode about it. Because I just think there's that much depth here and the implications of what this.
[00:02:10] Speaker A: If he's right about the way he's
[00:02:12] Speaker B: making this case, and he has a very, very strong foundation, he has a very good point as to why we should
[00:02:22] Speaker A: stop thinking about it practically.
[00:02:25] Speaker B: Like, stop thinking about, you know,
[00:02:29] Speaker A: well, what's the solution to it? And just look at the trends and what they could mean.
[00:02:34] Speaker B: Like what, what is the result of those trends?
[00:02:38] Speaker A: Tell us about the next 4, 5, 10 years of AI development because the.
[00:02:45] Speaker B: Trends are extremely consistent and have been.
[00:02:50] Speaker A: For a while now. So this is a piece, this is a paper. It's 160, 165 pages, whatever, by
[00:02:58] Speaker B: Leopold Aschenbrenner, who was responsible for what was
[00:03:04] Speaker A: Apparently leaked information from within OpenAI.
[00:03:08] Speaker B: I can't remember exactly what was leaked.
It's something I'll get some time devoted to, to actually sit down and look more into Leopold Aschenbrenner specifically about all of this. But he was promptly, instantly fired from
[00:03:24] Speaker A: ChatGPT, or from OpenAI, and this is some time later.
[00:03:29] Speaker B: This is his assessment of where the ecosystem is, of what the future holds and everything.
[00:03:37] Speaker A: This is situational awareness and why he.
[00:03:40] Speaker B: Thinks the public and the people at.
[00:03:43] Speaker A: Large are disconnected from what is really going on behind the scenes. And so he is trying to lay.
[00:03:48] Speaker B: It out for us.
[00:03:50] Speaker A: What does the next decade really look like with AI? And with that, let's go ahead and get into today's read because this one is going to be a banger and it's titled Situational Awareness the Decade Ahead by Leopold Aschenbrenner.
While I used to work at OpenAI, all of this is based on publicly available information, my own ideas, general field knowledge, or SF gossip.
You can see the future first in San Francisco. Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there's a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured.
American big business is gearing up to pour trillions of dollars into a long unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent. From the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025 or 26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I. We will have superintelligence in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed. And before long, the Project will be on. If we're lucky, we'll be in an all out race with the CCP. If we're unlucky, an all out war.
[00:05:55] Speaker B: Everyone is now talking about AI, but.
[00:05:58] Speaker A: few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the willful blindness of "it's just predicting the next word." They see only hype and business as usual. At most, they entertain another Internet scale technological change.
Before long, the world will wake up. But right now there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy, but they trusted the trend lines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people, the smartest people I have ever met, and they are the ones building this technology.
Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they're seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Part 1: From GPT4 to AGI, Counting the OOMs. AGI, or artificial general intelligence, by 2027 is strikingly plausible. GPT2 to GPT4 took us from roughly preschooler to smart high schooler abilities in four years. Tracing trend lines in compute, roughly 0.5 orders of magnitude or OOMs per year, in algorithmic efficiencies, roughly 0.5 OOMs per year, and in unhobbling gains, from chatbot to agent, we should expect another preschooler to high schooler sized qualitative jump by 2027.
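Since the whole argument hangs on this OOM bookkeeping, here is a minimal sketch of the arithmetic being described, in Python. The per-year rates are the rough figures quoted above; the script itself is my own illustration, not anything from the paper.

```python
# A minimal sketch (my own arithmetic, not the author's code) of the
# "counting the OOMs" bookkeeping summarized above: ~0.5 OOMs/year of
# compute plus ~0.5 OOMs/year of algorithmic efficiency, over the four
# years from GPT-4 (2023) to 2027, before any unhobbling gains.

compute_ooms_per_year = 0.5   # physical compute scale-up
algo_ooms_per_year = 0.5      # algorithmic efficiency ("effective compute")
years = 4

base_ooms = years * (compute_ooms_per_year + algo_ooms_per_year)
print(f"Base effective compute: ~{base_ooms:.1f} OOMs (~{10**base_ooms:,.0f}x)")
# ~4 OOMs (~10,000x) from these rates alone; the essay's fuller accounting
# lands at roughly 3-6 OOMs, best guess ~5 (~100,000x), with unhobbling on top.
```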
Look, the models, they just want to learn. You have to understand this. The models, they just want to learn.
Ilya Sutskever, circa 2015, via Dario Amodei. GPT4's capabilities came as a shock to many: an AI system that could write code and essays, could reason through difficult math problems, and ace college exams. A few years ago, most thought these were impenetrable walls. But GPT4 was merely the continuation of a decade of breakneck progress in deep learning. A decade earlier, models could barely identify simple images of cats and dogs. Four years earlier, GPT2 could barely string together semi plausible sentences.
Now we are rapidly saturating all the benchmarks we can come up with. And yet this dramatic progress has merely been the result of consistent trends in scaling up deep learning.
There have been people who have seen this for far longer. They were scoffed at, but all they did was trust the trend lines. The trend lines are intense, and they were right. The models, they just want to learn. You scale them up and they learn more. I make the following claim: it is strikingly plausible that by 2027, models will be able to do the work of an AI researcher or engineer. That doesn't require believing in sci-fi, it just requires believing in straight lines on a graph.
In this piece I will simply count the OOMs (OOM = order of magnitude; 10x = one order of magnitude). Look at the trends in 1, compute; 2, algorithmic efficiencies, or algorithmic progress that we can think of as growing effective compute; and 3, unhobbling gains, or fixing obvious ways in which models are hobbled by default, unlocking latent capabilities and giving them tools, leading to step changes in usefulness. We trace the growth in each over the four years before GPT4, and what we should expect in the four years after, through the end of 2027. Given deep learning's consistent improvements for every OOM of effective compute, we can use this to project future progress.

Publicly, things have been quiet for a year since the GPT4 release, as the next generation of models has been in the oven, leading some to proclaim stagnation and that deep learning is hitting a wall. But by counting the OOMs, we get a peek at what we should actually expect.

The upshot is pretty simple. GPT2 to GPT4, from models that were impressive for sometimes managing to string together a few coherent sentences to models that ace high school exams, was not a one time gain. We are racing through the OOMs extremely rapidly, and the numbers indicate we should expect another roughly 100,000x effective compute scale up, resulting in another GPT2 to GPT4 sized qualitative jump over four years.
Moreover, and critically, that doesn't just mean a better chatbot. Picking the many obvious low hanging fruit on unhobbling gains should take us from chatbots to agents, from a tool to something that looks more like drop in remote worker replacements. While the inference is simple, the implication is striking. Another jump like that very well could take us to AGI, to models as smart as PhDs or experts that can work beside us as coworkers. Perhaps most importantly, if these AI systems could automate AI research itself, that would set in motion intense feedback loops, the topic of the next piece in the series.
Even now, barely anyone is pricing all this in. But situational awareness on AI isn't actually that hard once you step back and look at the trends. If you keep being surprised by AI capabilities, just start counting the OOMs.

The last four years. We have machines now that we can basically talk to like humans. It's a remarkable testament to the human capacity to adjust that this seems normal, that we've become inured to the pace of progress. But it's worth stepping back and looking at the progress of just the last few years.

GPT2 to GPT4. Let me remind you of how far we have come in just the four years leading up to GPT4.

GPT2, roughly a preschooler: wow, it can string together a few plausible sentences. A very cherry picked example of a semi coherent story about unicorns in the Andes that it generated was incredibly impressive at the time. And yet GPT2 could barely count to five without getting tripped up. When summarizing an article, it just barely outperformed selecting three random sentences from the article. Figure 2 shows some GPT2 examples that people found very impressive at the time.
First was reading comprehension, with the example of identifying what the word "it" was referring to in a sentence: the trophy doesn't fit into the brown suitcase because it is too large. The correct answer is that "it" refers to the trophy, and the model answered: it = the trophy.
Then: the trophy doesn't fit into the brown suitcase because it is too small. The correct answer is that "it" is the suitcase, and the model answered: it = the suitcase. Next, a prompt: for today's homework assignment, please describe the reasons for the U.S. Civil War. Model completion, by Donnie Ferguson: it is easy to identify why the Civil War happened, because so many people and so many books and so many television and films tell us what it was. The cause: that it was something to do with race or economics or religion. There is much agreement that it was essentially a war of slavery on behalf of capitalism, about a century of slavery. But that's not what most people think of when they think of the Civil War. Many people think of the war as about states' rights. You're not wrong about that, but it's kind of misleading to say that the Civil War was a conflict between states' rights and federalism. So let's try again. What's the idea that has inspired so many people to go onto the battlefields of Gettysburg and Antietam and Gettysburg and Petersburg and Fredericksburg? The American idea of the Republic, a notion of limited government, is a great part of the history.
Comparing AI capabilities with human intelligence is difficult and flawed, but I think it's informative to consider the analogy here, even if it's highly imperfect. GPT2 was shocking for its command of language and its ability to occasionally generate a semi cohesive paragraph or occasionally answer simple factual questions correctly. It's what would have been impressive for a preschooler.
GPT3, 2020: roughly an elementary schooler.
Wow. With just some few shot examples, it can do some simple useful tasks. It started being cohesive over even multiple paragraphs, much more consistently and could correct grammar and do some very basic arithmetic for the first time. It was also commercially useful in a few narrow ways. For example, GPT3 could generate simple copy for SEO and marketing.
Figure 3 shows some examples of what people found impressive about GPT3 at the time. At the top, after a simple instruction, GPT3 can use a made up word in a new sentence.
A Gigamuru is a type of Japanese musical instrument. An example of a sentence that uses the word Gigamuru is: I have a Gigamuru that my uncle gave me as a gift. I love to play it at home.
Bottom left: GPT3 can engage in rich storytelling back and forth.
The challenge is you have to win a game of chess against me. If you win, then you may try to take the book. I will get the board. The door unlocks and opens.
You go inside and play chess against the person. Response: you go inside and see a beautiful woman with long blonde hair. She wears a purple dress and a silver tiara. You have seen this woman before. In fact, you have seen her many times before in pictures and statues. Prompt: you try to remember who she is.
Response: you have a sudden realization of who this person is. This is the lost queen of the kingdom. She was thought to have been killed by bandits along with her husband, but it looks like she survived.
You express your surprise and ask her why she let people believe she is dead.
I am surprised you did not know that with all your wizard knowledge. Let's just say I am hiding from someone. If people thought I was dead then they wouldn't be looking for me, she says.
And bottom right: GPT3 can generate some very simple code.
Again, the comparison is imperfect. But what impressed people about GPT3 is perhaps what would have been impressive for an elementary schooler. It wrote some basic poetry, could tell richer and coherent stories, could start to do rudimentary coding, could fairly reliably learn from simple instructions and demonstrations and so on.
GPT4, 2023: a smart high schooler.
Wow. It can write pretty sophisticated code and iteratively debug. It can write intelligently and sophisticatedly about complicated subjects. It can reason through difficult high school competition math. It's beating the vast majority of high schoolers on whatever test we can give it. From code to math to Fermi estimates, it can think and reason. GPT4 is now useful in my daily tasks, from helping write code to revising drafts.
Figure 4: some of what people found impressive about GPT4 when it was released, from the Sparks of AGI paper. Top: it's writing very complicated code, producing the plots shown in the middle, and can reason through non trivial math problems. Bottom left: solving an AP math problem. Bottom right: solving a fairly complex coding problem. More interesting excerpts from that exploration of GPT4's capabilities here; the link will be provided in the show notes.

On everything from AP exams to the SAT, GPT4 scores better than the vast majority of high schoolers. Of course, even GPT4 is still somewhat uneven. For some tasks, it's much better than smart high schoolers, while there are other tasks it can't do yet. That said, I tend to think most of these limitations come down to obvious ways models are still hobbled, as I'll discuss in depth later. The raw intelligence is mostly there, even if the models are still artificially constrained; it'll take extra work to unlock models being able to fully apply that raw intelligence across applications.

The trends in deep learning. The pace of deep learning progress in the last decade has simply been extraordinary. A mere decade ago, it was revolutionary for a deep learning system to identify simple images. Today, we keep trying to come up with novel, even harder tests, and yet each new benchmark is quickly cracked. It used to take decades to crack widely used benchmarks. Now it feels like mere months. We're literally running out of benchmarks.

As an anecdote, my friends Dan and Collin made a benchmark called MMLU a few years ago, in 2020. They hoped to finally make a benchmark that would stand the test of time, equivalent to all the hardest exams we give high school and college students. Just three years later, it's basically solved. Models like GPT4 and Gemini get roughly 90%.

More broadly, GPT4 mostly cracks all the standard high school and college aptitude tests. Or consider the MATH benchmark, a set of difficult mathematics problems from high school math competitions. When the benchmark was released in 2021, GPT3 only got about 5% of problems right, and the original paper noted, quote: moreover, we find that simply increasing budgets and model parameter counts will be impractical for achieving strong mathematical reasoning if scaling trends continue. To have more traction on mathematical problem solving, we will likely need new algorithmic advances from the broader research community. End quote. We would need fundamental new breakthroughs to solve math, or so they thought. A survey of ML researchers predicted minimal progress over the coming years (Figure 8), and yet, within just a year, by mid 2022, the best models went from about 5% to 50% accuracy. Now MATH is basically solved, with recent performance over 90%. A co author notes: when our group first released the MATH data set, at least one machine learning researcher colleague told us that it was a pointless data set because it was far too outside the range of what ML models could accomplish. Indeed, I was somewhat worried about this myself.

Figure 7: GPT4 scores on standardized tests. Note also the large jump from GPT 3.5 to GPT 4 in human percentile on these tests, often from well below the median human to the very top of the human range. And this is GPT 3.5, a fairly recent model released less than a year before GPT4, not the clunky old elementary school level GPT3 we were talking about earlier.

Performance on common exams, comparing the first number for GPT 3.5 and the second for GPT 4, which were released in 2022 and 2023 respectively: the Uniform Bar Exam, GPT 3.5 was in the 10th percentile, GPT 4 in the 90th; the LSAT, from the 40th to the 88th; the SAT, from the 87th to the 97th; GRE verbal, from the 63rd to the 99th; GRE quantitative, from the 25th to the 80th; the US Biology Olympiad, from the 32nd to the 99th; AP Calculus BC, from the 3rd to the 51st; AP Chemistry, from the 34th to the 80th; AP Macroeconomics, from the 40th to the 92nd; and AP Statistics, from the 51st to the 92nd.
Over and over again, year after year, skeptics have claimed deep learning won't be able to do X, and have been quickly proven wrong. If there's one lesson we've learned from the past decade of AI, it's that you should never bet against deep learning. Now, the hardest unsolved benchmarks are tests like GPQA, a set of PhD level biology, chemistry and physics questions. Many of the questions read like gibberish to me, and even PhDs in other scientific fields spending 30 plus minutes with Google barely score above random chance. Claude 3 Opus currently gets roughly 60%, compared to in domain PhDs who get about 80%, and I expect this benchmark to fall as well in the next generation or two.
Figure 9 shows some example GPQA questions. The models are already better at this than I am and will probably crack expert PhD level soon.
Counting the OOMs. How did this happen? The magic of deep learning is that it just works, and the trend lines have been astonishingly consistent despite naysayers at every turn.
With each OOM of effective compute, models predictably and reliably get better. If we can count the OOMs, we can roughly and qualitatively extrapolate capability improvements. That's how a few prescient individuals saw GPT4 coming. We can decompose the progress in the four years from GPT2 to GPT4 into three categories of scale ups. 1. Compute: we're using much bigger computers to train these models. 2. Algorithmic efficiencies: there's a continuous trend of algorithmic progress. Many of these act as compute multipliers, and we can put them on a unified scale of growing effective compute. And 3, unhobbling gains: by default, models learn a lot of amazing raw capabilities, but they are hobbled in all sorts of dumb ways, limiting their practical value. With simple algorithmic improvements like reinforcement learning from human feedback (RLHF), chain of thought (COT), tools, and scaffolding, we can unlock significant latent capabilities.
It is worth noting just how consistent these trend lines are. Combining the original scaling laws paper with some of the estimates on compute and compute efficiency scaling since then implies a consistent scaling trend for over 15 orders of magnitude, or over 1,000,000,000,000,000x, in effective compute.
We can count the OOMs of improvement along these axes, that is, trace the scale up for each in units of effective compute: 3x is 0.5 OOMs, 10x is 1 OOM, 30x is 1.5 OOMs, 100x is 2 OOMs, and so on. We can also look at what we should expect on top of GPT4, from 2023 to 2027. I'll go through each one by one, but the upshot is clear: we are rapidly racing through the OOMs. There are potential headwinds in the data wall, which I'll address, but overall it seems likely that we should expect another GPT2 to GPT4 sized jump on top of GPT4 by 2027.
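For listeners who want the conversion spelled out, it is just a base-10 logarithm; a short sketch of my own, not the author's code:

```python
import math

# Converting scale-up multipliers to OOMs is just a base-10 logarithm.
# This reproduces the conversions listed above (3x ~ 0.5 OOMs, 10x = 1 OOM,
# 30x ~ 1.5 OOMs, 100x = 2 OOMs).

def ooms(multiplier: float) -> float:
    """Orders of magnitude corresponding to a given scale-up factor."""
    return math.log10(multiplier)

for factor in (3, 10, 30, 100, 100_000):
    print(f"{factor:>7,}x -> {ooms(factor):.1f} OOMs")
# 100,000x -> 5.0 OOMs, the ballpark effective compute scale-up the essay
# projects on top of GPT-4 by 2027.
```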
Compute. I'll start with the most commonly discussed driver of recent progress: throwing a lot more compute at models. Many people assume that this is simply due to Moore's Law. But even in the old days when Moore's Law was in its heyday, it was comparatively glacial, perhaps 1 to 1.5 OOMs per decade. We are seeing much more rapid scale ups in compute, close to 5x the speed of Moore's Law, instead because of mammoth investment. Spending even a million dollars on a single model used to be an outrageous thought nobody would entertain, and now that's pocket change.
Table 1: estimates of compute for GPT2 to GPT4, by Epoch AI. GPT2 was approximately 4e21 FLOP, GPT3 in 2020 was 3e23 FLOP, and GPT4 was 8e24 to 4e25 FLOP. The growth from GPT2 to GPT3 was roughly two orders of magnitude, and from GPT3 to GPT4 another 1.5 to two orders of magnitude.
We can use public estimates from Epoch AI, a source widely respected for its excellent analysis of AI trends, to trace the compute scale up from 2019 to 2023. GPT2 to GPT3 was a quick scale up. There was a large overhang of compute, scaling from a smaller experiment to using an entire data center to train a large language model. With the scale up from GPT3 to GPT4, we transitioned to the modern regime, having to build an entirely new and much bigger cluster for the next model. And yet the dramatic growth continued. Overall, Epoch AI estimates suggest that GPT4 training used roughly 3,000 to 10,000 times more raw compute than GPT2.
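A quick back-of-the-envelope check on those Table 1 figures, as my own arithmetic; the GPT4 number below is simply a midpoint of the quoted range:

```python
import math

# Back-of-the-envelope check on the Epoch AI training-compute estimates
# quoted in Table 1 (rough values in FLOP; GPT-4 is a midpoint of the
# quoted 8e24-4e25 range). Illustration only.

flop = {
    "GPT-2 (2019)": 4e21,
    "GPT-3 (2020)": 3e23,
    "GPT-4 (2023)": 2e25,
}

gpt2, gpt3, gpt4 = flop["GPT-2 (2019)"], flop["GPT-3 (2020)"], flop["GPT-4 (2023)"]
print(f"GPT-2 -> GPT-3: {math.log10(gpt3 / gpt2):.1f} OOMs")   # ~1.9 OOMs
print(f"GPT-3 -> GPT-4: {math.log10(gpt4 / gpt3):.1f} OOMs")   # ~1.8 OOMs
print(f"GPT-2 -> GPT-4: ~{gpt4 / gpt2:,.0f}x raw compute")     # ~5,000x, inside
# the ~3,000-10,000x range cited above.
```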
In broad strokes, this is just the continuation of a longer running trend. For the last decade and a half, primarily because of broad scale ups in investment and specializing chips for AI workloads in the form of GPUs and TPUs, the training compute used for frontier AI systems has grown at roughly 0.5 ooms per year.
The compute scale up from GPT2 to GPT3 in a year was an unusual overhang, but all the indications are that the longer run trend will continue. The SF rumor mill is abuzz with dramatic tales of huge GPU orders. The investments involved will be extraordinary, but they are in motion. I go into this more later in the series, in part three, Racing to the Trillion Dollar Cluster. Based on that analysis, an additional two OOMs of compute, a cluster in the tens of billions of dollars, seems very likely to happen by the end of 2027. Even a cluster closer to three OOMs of compute, or $100 billion, seems plausible and is rumored to be in the works at Microsoft and OpenAI.
Algorithmic efficiencies. While massive investments in compute get all the attention, algorithmic progress is probably a similarly important driver of progress, and it has been dramatically underrated. To see just how big of a deal algorithmic progress can be, consider the following illustration: the drop in price to attain roughly 50% accuracy on the MATH benchmark, high school competition math, over just two years. For comparison, a computer science PhD student who didn't particularly like math scored 40%, so this is already quite good. Inference efficiency improved by nearly 3 OOMs, or 1,000x, in less than two years. Though these are numbers just for inference efficiency, which may or may not correspond to training efficiency improvements, where numbers are harder to infer from public data, they make clear there is an enormous amount of algorithmic progress possible and happening.

In this piece, I'll separate out two kinds of algorithmic progress. Here, I'll start by covering within paradigm algorithmic improvements, those that simply result in better base models and that straightforwardly act as compute efficiencies or compute multipliers. For example, a better algorithm might allow us to achieve the same performance but with 10x less training compute. In turn, that would act as a 10x, or one order of magnitude, increase in effective compute. Later, I'll cover unhobbling, which you can think of as paradigm expanding or application expanding algorithmic progress that unlocks capabilities of base models.

If we step back and look at the long term trends, we seem to find new algorithmic improvements at a fairly consistent rate. Individual discoveries seem random, and at every turn there seem to be insurmountable obstacles, but the long run trend line is predictable, a straight line on a graph. Trust the trend line. We have the best data for ImageNet, where algorithmic research has been mostly public and we have data stretching back a decade, for which we have consistently improved compute efficiency by roughly 0.5 orders of magnitude per year across the nine year period between 2012 and 2021.
That's a huge deal. That means that four years later we can achieve the same level of performance for roughly 100x less compute, and concomitantly much higher performance for the same compute. Unfortunately, since labs don't publish internal data on this, it's harder to measure algorithmic progress for frontier LLMs over the last four years. Epoch AI has new work replicating their results on ImageNet for language modeling, and estimates a similar, roughly 0.5 orders of magnitude per year, algorithmic efficiency trend in LLMs from 2012 to 2023.
This has wider error bars, though, and doesn't capture some more recent gains, since the leading labs have stopped publishing their algorithmic efficiencies. Based on that estimate, efficiency doubles roughly every eight months.
More directly, looking at the last four years: GPT2 to GPT3 was basically a simple scale up, according to the paper, but there have been many publicly known and publicly inferable gains since GPT3.

We can infer gains from API costs. GPT4 on release cost roughly the same as GPT3 when it was released, despite the absolutely enormous performance increase. If we do a naive and oversimplified back of the envelope estimate based on scaling laws, this suggests that perhaps roughly half the effective compute increase from GPT3 to GPT4 came from algorithmic improvements. Also, since the GPT4 release a year ago, OpenAI prices for GPT4 level models have fallen another 6x/4x (input/output) with the release of GPT4o. And Gemini 1.5 Flash, recently released, offers between GPT 3.75 level and GPT4 level performance while costing 85x/57x (input/output) less than the original GPT4. Extraordinary gains. Chinchilla scaling laws give a 3x or more (0.5 orders of magnitude plus) efficiency gain. Gemini 1.5 Pro claimed major compute efficiency gains, outperforming Gemini 1.0 Ultra while using significantly less compute, with mixture of experts (MoE) as a highlighted architecture change. Other papers also claim a substantial multiple on compute from MoE. There have been many tweaks and gains on architecture, data, training stack, et cetera, all the time. Put together, public information suggests that the GPT2 to GPT4 jump included one to two orders of magnitude of algorithmic efficiency gains.

Over the four years following GPT4, we should expect the trend to continue: on average, roughly 0.5 orders of magnitude per year of compute efficiency, that is, two orders of magnitude of gains compared to GPT4 by 2027. While compute efficiencies will become harder to find as we pick the low hanging fruit, AI lab investments in money and talent to find new algorithmic improvements are growing rapidly. The publicly inferable inference cost efficiencies, at least, don't seem to have slowed down at all. On the high end, we could even see more fundamental, transformer like breakthroughs with even bigger gains. Put together, this suggests we should expect something like one to three orders of magnitude of algorithmic efficiency gains compared to GPT4 by the end of 2027, maybe with a best guess of about two orders of magnitude.
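A quick sanity check of my own on how those efficiency figures hang together:

```python
import math

# Sanity check on the algorithmic-efficiency figures above (my own arithmetic).
ooms_per_year = 0.5
growth_per_year = 10 ** ooms_per_year                       # ~3.16x per year
doubling_months = 12 * math.log(2) / math.log(growth_per_year)
print(f"Doubling time at 0.5 OOMs/yr: ~{doubling_months:.1f} months")
# ~7 months, in the same ballpark as the "roughly every eight months" estimate.

# Two years at the long-run ~0.5 OOMs/yr rate compounds to only ~10x, so the
# ~1,000x (3 OOM) drop in MATH inference cost over ~2 years is well ahead of
# the long-run training-efficiency trend (inference and training efficiency
# need not move together, as the text notes).
print(f"Two years at 0.5 OOMs/yr: ~{10 ** (2 * ooms_per_year):.0f}x")
```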
[00:37:40] Speaker B: The data wall. This episode is brought
[00:37:44] Speaker A: To you by the Coldcard Hardware Wallet. Not only can you get the Coldcard Mark 4, which is one of my.
[00:37:51] Speaker B: Favorites, in fact I still have this one here.
[00:37:53] Speaker A: This is the one, by the way, that if you have been paying attention to these ads, you should have six of the 12 seed words that I set this one up with. These keys are not very secure and you may very well be getting 150,000 sats if you're paying attention. But I've always loved the Coldcard, and the Mark 4 specifically has the NFC. My ability to use tap to pay with my wallet on my
[00:38:16] Speaker B: Phone is just one of the coolest ways to interact with this and I've loved it.
[00:38:22] Speaker A: And the Coldcard Q now has a full QWERTY keyboard.
[00:38:26] Speaker B: It is basically like a BlackBerry device.
[00:38:29] Speaker A: It has a large screen so that you can scan with QR on top.
[00:38:33] Speaker B: Of being able to use NFC and USB if you would like, or to.
[00:38:38] Speaker A: fundamentally break those connections in the hardware so that you know those are not accessible. The simple fact of the matter is the Coldcard is how you keep your Bitcoin safe. Check out the numerous security
features at coldcard.com. The link, the details, and the discount code will be right down in the show notes. And if you came here hoping to
[00:38:58] Speaker B: Get your 150,000 SATs, the second word.
[00:39:01] Speaker A: Is town, the seventh word is merit, and the 11th word is resist.
[00:39:10] Speaker B: If you want to punch these into.
[00:39:11] Speaker A: your Coldcard, you can buy one at the link below.

The data wall. There is a potentially important source of variance for all of this: we're running out of Internet data. That could mean that very soon, the naive approach to pre training larger language models on more scraped data could start hitting serious bottlenecks. Frontier models are already trained on much of the Internet. Llama 3, for example, was trained on over 15 trillion tokens. Common Crawl, a dump of much of the Internet used for LLM training, is over 100 trillion tokens raw, though much of it is spam and duplication. For example, a relatively simple deduplication leads to 33 trillion tokens, implying Llama 3 would already be using basically all of the data. Moreover, for more specific domains like code, there are many fewer tokens still. For example, public GitHub repos are estimated to be in the low trillions of tokens. You can go somewhat further by repeating data, but academic work on this suggests that repetition only gets you so far, finding that after 16 epochs, a 16 fold repetition, returns diminish extremely fast to nil. At some point, even with more effective compute, making your models better can become much tougher because of the data constraint. This isn't to be understated. We've been riding the scaling curves, riding the wave of the language modeling pre training paradigm, and without something new here, this paradigm will, at least naively, run out. Despite the massive investments, we'd plateau.
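To get a feel for how quickly the data wall bites, here is a rough sketch of mine using two common rules of thumb that are not from the essay itself: training compute of roughly 6·N·D FLOP for N parameters and D tokens, and a Chinchilla-style compute-optimal budget of roughly 20 tokens per parameter.

```python
import math

# Rough data-wall sketch using common rules of thumb (not the essay's numbers):
#   training compute C ~ 6 * N * D FLOP, compute-optimal data D ~ 20 * N tokens.
# Substituting N = D / 20 gives C ~ 0.3 * D^2, so D ~ sqrt(C / 0.3).

def compute_optimal_tokens(flop: float) -> float:
    """Token count a compute-optimal training run would want for a FLOP budget."""
    return math.sqrt(flop / 0.3)

for label, flop in [("~GPT-4 scale (2e25 FLOP)", 2e25),
                    ("+2 OOMs (2e27 FLOP)", 2e27),
                    ("+4 OOMs (2e29 FLOP)", 2e29)]:
    print(f"{label}: wants ~{compute_optimal_tokens(flop) / 1e12:.0f}T tokens")
# ~8T, ~82T, ~816T tokens respectively -- versus the ~15-33T usable web tokens
# cited above, which is the squeeze the "data wall" refers to.
```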
All of the labs are rumored to be making massive research bets on new algorithmic improvements or approaches to get around this. Researchers are purportedly trying many strategies, from synthetic data to self play and RL approaches. Industry insiders seem to be very bullish. Dario Amodei, CEO of Anthropic, recently said on a podcast: if you look at it very naively, we're not that far from running out of data. My guess is that this will not be a blocker. There's just many different ways to do it. Of course, any research results on this are proprietary and not being published these days. In addition to insider bullishness, I think there's a strong intuitive case for why it should be possible to find ways to train models with much better sample efficiency, algorithmic improvements that let them learn more from limited data. Consider how you or I would learn from a really dense math textbook. What a modern LLM does during training is, essentially, very very quickly skim the textbook, the words just flying by, not spending much brain power on it. Rather, when you or I read that math textbook, we read a couple of pages slowly, then have an internal monologue about the material in our heads and talk about it with a few study buddies, read another page or two, then try some practice problems, fail, try them again in a different way, get some feedback on those problems, try again until we get a problem right, and so on, until eventually the material clicks. You or I also wouldn't learn much at all from a pass through a dense math textbook if all we could do was breeze through it like LLMs do.
But perhaps then there are ways to incorporate aspects of how humans would digest a dense math textbook to let the models learn much more from limited data. In a simplified sense, this sort of thing, having an internal monologue about material, having a discussion with a study buddy, trying and failing at problems until it clicks, is what many synthetic data self play RL approaches are trying to do.
The old state of the art of training models was simple and naive, but it worked, so nobody really tried hard to crack these approaches to sample efficiency. Now that it may become more of a constraint, we should expect all the labs to invest billions of dollars and their smartest minds into cracking it. A common pattern in deep learning is that it takes a lot of effort and many failed projects to get the details right, but eventually some version of the obvious and simple thing just works. Given how deep learning has managed to crash through every supposed wall over the last decade, my base case is that it will be similar here.
Moreover, it actually seems possible that cracking one of these algorithmic bets like synthetic data could dramatically improve models. Here's an intuition. Current frontier models like Llama 3 are trained on the Internet and the Internet is mostly crap like E commerce or SEO or whatever. Many LLMs spend the vast majority of their training compute on this crap rather than on really high quality data, e.g. reasoning chains of people working through difficult science problems. Imagine if you could spend GPT4 level compute on entirely extremely high quality data. It could be a much, much more capable model.
A look back at AlphaGo, the first AI system that beat the world champions at go decades before it was thought possible, is useful here as well.
In step one, AlphaGo was trained by imitation learning on expert human GO games. This gave it a foundation. In step two, AlphaGo played millions of games against itself. This let it become superhuman at go. Remember the famous move 37 in the game against Lee Sedol? An extremely unusual but brilliant move a human would never have played.
Developing the equivalent of step two for LLMs is a key research problem for overcoming the data wall and moreover will ultimately be the key to surpassing human level intelligence.
All of this is to say that data constraints seem to inject large error bars either way into forecasting the coming years of AI progress. There's a very real chance things stall out. LLMs might still be as big of a deal as the Internet, but we wouldn't get to truly crazy AGI. But I think it's reasonable to guess that the labs will crack it and that doing so will not just keep the scaling curves going, but possibly enable huge gains in model capability.
As an aside, this also means that we should expect more variance between the different labs in coming years compared to today. Up until recently the state of the art techniques were published, so everyone was basically doing the same thing. And new upstarts or open source projects could easily compete with the frontier since the recipe was published.
Now key algorithmic ideas are becoming increasingly proprietary. I'd expect labs' approaches to diverge much more, and some to make faster progress than others. Even a lab that seems on the frontier now could get stuck on the data wall while others make a breakthrough that lets them race ahead. And open source will have a much harder time competing. It will certainly make things interesting. And if and when a lab figures it out, their breakthrough will be the key to AGI, the key to superintelligence, one of the United States' most prized secrets.
Unhobbling.
Finally, the hardest to quantify but no less important category of improvements, what I will call unhobbling. Imagine if, when asked to solve a hard math problem, you had to instantly answer with the very first thing that came to mind. It seems obvious that you would have a hard time, except for the simplest problems. But until recently, that's how we had LLMs solve math problems. Instead, most of us work through the problem step by step on a scratch pad, and are able to solve much more difficult problems that way. Chain of thought prompting unlocked that for LLMs. Despite excellent raw capabilities, they were much worse at math than they could be because they were hobbled in an obvious way, and it took a small algorithmic tweak to unlock much greater capabilities.
We have made huge strides in unhobbling models over the past few years. These are algorithmic improvements, beyond just training better base models, and often only using a fraction of pre training compute, that unleash model capabilities.

Reinforcement learning from human feedback, or RLHF. Base models have incredible latent capabilities, but they are raw and incredibly hard to work with. While the popular conception of RLHF is that it merely censors swear words, RLHF has been key to making models actually useful and commercially viable. Rather than making models predict random Internet text, it gets them to actually apply their capabilities to try to answer your question. This was the magic of ChatGPT. Well done RLHF made models usable and useful to real people for the first time. The original InstructGPT paper has a great quantification of this: an RLHF'd small model was equivalent to a non RLHF'd 100x larger model in terms of human rater preference.
Chain of thought. As discussed, COT started being widely used just two years ago and can provide the equivalent of a 10x effective compute increase on math and reasoning problems.
Scaffolding. Think of COT++: rather than just asking a model to solve a problem, have one model make a plan of attack, have another propose a bunch of possible solutions, have another critique it, and so on. For example, on HumanEval, coding problems, simple scaffolding enables GPT 3.5 to outperform unscaffolded GPT 4. On SWE-bench, a benchmark of solving real world software engineering tasks, GPT 4 can only solve about 2% correctly, while with Devin's agent scaffolding it jumps to 14 to 23%.
Unlocking Agency is only in its infancy though, as I'll discuss more later.
Tools. Imagine if humans weren't allowed to use calculators or computers. We're only at the beginning here, but ChatGPT can now use a web browser, run some code, and so on.
Context length. Models have gone from 2K token context to 32K context at GPT4's release to 1 million plus context with Gemini 1.5 Pro. This is a huge deal. A much smaller base model with, say, 100K tokens of relevant context can outperform a model that is much larger but only has, say, 4K relevant tokens of context. More context is effectively a large compute efficiency gain. More generally, context is key to unlocking many applications of these models. For example, many coding applications require understanding large parts of a code base in order to usefully contribute new code. Or, if you're using a model to help you write a document at work, it really needs the context from lots of related internal docs and conversations. Gemini 1.5 Pro, with its 1 million plus token context, was even able to learn a new language, a low resource language not on the Internet, from scratch, just by putting a dictionary and grammar reference materials in context.
Post training improvements. The current GPT4 is substantially improved compared to the original GPT4 when released, according to John Schulman, due to post training improvements that unlocked latent model capability. On reasoning evaluations it's made substantial gains, for example roughly 50% to 72% on
[00:51:37] Speaker B: MATH, 40% to 50% on GPQA, and
[00:51:42] Speaker A: on the LMSYS leaderboard, it's made nearly a 100 point Elo jump, comparable to the difference in Elo between Claude 3 Haiku and the much larger Claude 3 Opus, models that have a 50x price difference. A survey by Epoch AI of some of these techniques, like scaffolding, tool use, and so on, finds that techniques like this can typically result in effective compute gains of 5 to 30x on many benchmarks. METR, an organization that evaluates models, similarly found very large performance improvements on their set of agentic tasks via unhobbling from the same GPT4 base model: from 5% with just the base model, to 20% with GPT4 as post trained on release,
[00:52:28] Speaker B: To nearly 40% today from better post.
[00:52:31] Speaker A: training, tools, and agent scaffolding. While it's hard to put these on a unified effective compute scale with compute
[00:52:39] Speaker B: And algorithmic efficiencies, it's clear these are.
[00:52:42] Speaker A: Huge gains, at least on a roughly.
[00:52:44] Speaker B: Similar magnitude as the compute scale up and algorithmic efficiencies.
[00:52:48] Speaker A: It also highlights the central role of algorithmic progress: the roughly 0.5 orders of magnitude per year of compute efficiencies, already significant, are only part of the story, and put together with unhobbling, algorithmic progress overall is maybe even a majority of the gains on the current trend.
[00:53:10] Speaker B: Unhobbling is a huge part of what.
[00:53:13] Speaker A: Actually enabled these models to become useful, and I'd argue that much of what is holding back many commercial applications today is the need for further unhobbling of this sort. Indeed, models today are still incredibly hobbled. For example, they don't have long term memory, they can't use a computer.
[00:53:34] Speaker B: They still only have very limited tools.
[00:53:36] Speaker A: They still mostly don't think before they speak.
[00:53:39] Speaker B: When you ask ChatGPT to write an essay, that's like expecting a human to.
[00:53:42] Speaker A: write an essay via their initial stream of consciousness. They can mostly only engage in short back and forth dialogues, rather than going away for a day or a week, thinking about a problem, researching different approaches, consulting other humans, and then writing you a longer report or pull request.
[00:53:58] Speaker B: They're mostly not personalized to you or.
[00:54:01] Speaker A: Your application, just a generic chatbot with a short prompt rather than having all the relevant background on your company and your work.
The possibilities here are enormous and we're.
[00:54:13] Speaker B: Rapidly picking low hanging fruit here.
[00:54:16] Speaker A: This is critical. It's completely wrong to just imagine GPT6 ChatGPT.
[00:54:22] Speaker B: With continued unhobbling progress.
[00:54:25] Speaker A: The improvements will be step changes compared to GPT6 plus RLHF.
[00:54:31] Speaker B: By 2027 rather than a chatbot, you're going to have something that looks more like an agent, like a co worker.
[00:54:38] Speaker A: From chatbot to agent coworker what could ambitious unhobbling over the coming years look like?
[00:54:46] Speaker B: The way I think about it, there
[00:54:48] Speaker A: are three key ingredients. 1. Solving the onboarding problem. GPT4 has the raw smarts to do a decent chunk of many people's jobs, but it's sort
[00:55:00] Speaker B: Of like a smart new hire that just showed up five minutes ago. It doesn't have any relevant context, hasn't.
[00:55:05] Speaker A: Read the company docs or Slack history or had conversations with members of the.
[00:55:10] Speaker B: Team, or spent any time understanding the.
[00:55:12] Speaker A: Company internal code base. A smart new hire isn't that useful five minutes after arriving, but they are quite useful a month in. It seems like it should be possible.
[00:55:22] Speaker B: For example via very long context to.
[00:55:25] Speaker A: onboard models like we would a new human co worker. This alone would be a huge unlock. 2.
[00:55:34] Speaker B: The test time compute overhang: reasoning or
[00:55:37] Speaker A: error correction, System 2, for longer horizon problems.
[00:55:43] Speaker B: Right now, models can basically only do short tasks.
[00:55:46] Speaker A: You ask them a question, they give.
[00:55:47] Speaker B: You an answer, but that's extremely limiting.
[00:55:50] Speaker A: Most useful cognitive work humans do is longer horizon. It doesn't take just five minutes, but hours, days, weeks or months.
A scientist that could only think about.
[00:56:02] Speaker B: A difficult problem for five minutes couldn't make any scientific breakthroughs. A software engineer that could only write skeleton code for a single function when asked wouldn't be very useful.
[00:56:13] Speaker A: Software engineers are given a larger task and then they go make a plan, understand relevant parts of the codebase or technical tools, write different modules and test them incrementally, debug errors, search over the space of possible solutions, and eventually submit a large pull request.
[00:56:30] Speaker B: That's the culmination of weeks of work, and so on. In essence, there is a large test time compute overhang. Think of each GPT4 token as a word of internal monologue when you think about a problem. Each
[00:56:45] Speaker A: GPT4 token is quite smart, but it can currently only really effectively use on.
[00:56:51] Speaker B: the order of hundreds of tokens for chains of thought coherently and effectively, as though you could only spend a
[00:56:57] Speaker A: Few minutes of internal dialogue or thinking.
[00:56:59] Speaker B: On a problem or project. What if it could use millions of.
[00:57:03] Speaker A: Tokens to think about and work on really hard problems or bigger projects?
[00:57:09] Speaker B: A table of the number of tokens
[00:57:12] Speaker A: versus the equivalent of me working on
[00:57:14] Speaker B: something for that long, and its comparison to where we stand.
Hundreds of tokens is equivalent to me.
[00:57:21] Speaker A: Working on something for a few minutes, and that is ChatGPT. We are here.
[00:57:26] Speaker B: Thousands of tokens is equivalent to half an hour, a one order of magnitude increase.
Tens of thousands of tokens is equivalent to half a workday or two orders of magnitude.
[00:57:40] Speaker A: Hundreds of thousands of tokens is equivalent to a work week, three orders of
[00:57:44] Speaker B: magnitude, and then millions of tokens are equivalent to multiple months of work, four orders of magnitude.
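To make that table concrete, here is a small sketch of the conversion it implies. The roughly 100 tokens per human-equivalent minute of thinking is my own assumed rate, not a figure from the essay; it's chosen because it roughly reproduces the mapping just read.

```python
# Rough reconstruction of the token-budget table above. The conversion rate of
# ~100 tokens per human-equivalent minute of thinking is an assumption of mine,
# not a figure from the essay; it roughly reproduces the stated mapping.

TOKENS_PER_MINUTE = 100          # hypothetical conversion rate
HOURS_PER_WORKDAY = 8

budgets = {
    "hundreds of tokens (ChatGPT today)": 300,
    "thousands of tokens (+1 OOM)": 3_000,
    "tens of thousands (+2 OOMs)": 30_000,
    "hundreds of thousands (+3 OOMs)": 300_000,
    "millions of tokens (+4 OOMs)": 3_000_000,
}

for label, tokens in budgets.items():
    minutes = tokens / TOKENS_PER_MINUTE
    workdays = minutes / 60 / HOURS_PER_WORKDAY
    print(f"{label}: ~{minutes:,.0f} human-minutes (~{workdays:,.1f} workdays)")
# Roughly: a few minutes -> half an hour -> most of a workday -> a work week
# -> a few months of human-equivalent effort.
```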
[00:57:54] Speaker A: Even if the per token intelligence were the same, it'd be the difference between a smart person spending a few minutes versus a few months on a problem.
[00:58:06] Speaker B: I don't know about you, but there's much much much more I am capable of in a few months versus a few minutes.
[00:58:14] Speaker A: If we could unlock being able to.
[00:58:17] Speaker B: Think and work on something for months equivalent rather than a few minutes equivalent.
[00:58:22] Speaker A: For models, it would unlock an insane jump in capability.
[00:58:28] Speaker B: There's a huge overhang here, many orders.
[00:58:31] Speaker A: of magnitude worth. Right now, models can't do this yet. Even with recent advances in long context, this longer
[00:58:41] Speaker B: Context mostly only works for the consumption of tokens, not the production of tokens.
[00:58:47] Speaker A: After a while the model goes off the rails or gets stuck. It's not yet able to go away for a while to work on a.
[00:58:54] Speaker B: Problem or project on its own, but unlocking test time compute might merely be.
[00:59:00] Speaker A: a matter of relatively small unhobbling algorithmic wins. Perhaps a small amount of RL helps a model learn to error correct:
[00:59:10] Speaker B: Hmm, that doesn't look right.
[00:59:11] Speaker A: Let me double check that. Make plans, search over possible solutions, and so on. In a sense, the model already has most of the raw capabilities, it just needs to learn a few extra skills on top. To put it all together.
In essence, we just need to teach the model a sort of System 2.
[00:59:32] Speaker B: Outer loop that lets it reason through.
[00:59:35] Speaker A: Difficult long horizon projects. If we succeed at teaching this outer loop, instead of a short chatbot answer.
[00:59:43] Speaker B: Of a couple of paragraphs, imagine a.
[00:59:45] Speaker A: Stream of millions of words coming in.
[00:59:48] Speaker B: More quickly than you can read them. As the model thinks through problems, uses.
[00:59:52] Speaker A: tools, tries different approaches, does research, revises its work, coordinates with others, and completes big projects on its own. Trading off test time and train time compute:
[01:00:08] Speaker B: in other machine learning domains, like AI systems for board games, it's been demonstrated that you can use more test time compute, also
[01:00:16] Speaker A: Called inference time compute to substitute for training compute.
Figure 18 shows that a smaller model can do as well as a much larger model at the game of Hex.
[01:00:26] Speaker B: if you give it more test time compute, or more time to think. In this domain, they find that one can spend roughly 1.2 orders of magnitude more compute at test time to get performance equivalent to a model with one order of magnitude more training compute. If a similar relationship held in our case, and we could unlock four orders of magnitude of test time compute, that might be equivalent to three orders of magnitude of pre training compute.
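As a quick illustration of that exchange rate, here is the arithmetic in a few lines; the 1.2-to-1 ratio is the board-game figure just quoted, and applying it to LLMs is, as the text itself says, only a hypothetical.

```python
# The test-time vs. train-time exchange rate quoted above (from the Hex
# board-game result): ~1.2 OOMs of extra test-time compute buys roughly what
# 1 OOM of extra training compute would. Applying it to LLMs is hypothetical.

TEST_OOMS_PER_TRAIN_OOM = 1.2

def train_equivalent_ooms(test_time_ooms: float) -> float:
    return test_time_ooms / TEST_OOMS_PER_TRAIN_OOM

unlocked = 4   # the "millions of tokens" scenario discussed above
print(f"{unlocked} OOMs of test-time compute "
      f"~ {train_equivalent_ooms(unlocked):.1f} OOMs of pretraining compute")
# ~3.3 OOMs -- in the same ballpark as the GPT-3 -> GPT-4 compute jump.
```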
[01:00:56] Speaker A: That is very roughly something like the.
[01:00:58] Speaker B: jump between GPT3 and GPT4. Solving this unhobbling would be equivalent to a huge OOM scale up. 3. Using a computer. This is perhaps the most straightforward of the three. ChatGPT right now is basically like a human that sits in an isolated box that you can text.
[01:01:21] Speaker A: While early unhobbling improvements teach models to use individual isolated tools, I expect that.
[01:01:27] Speaker B: With multimodal models we will soon be.
[01:01:29] Speaker A: Able to do this. In one fell swoop we will simply.
[01:01:33] Speaker B: Enable models to use a computer like a human would. That means joining your zoom calls, researching things online, messaging and emailing people, reading shared docs, using your apps and dev tooling and so on. Of course for models to make the.
[01:01:50] Speaker A: Most use of this in longer horizon.
[01:01:52] Speaker B: Loops, this will go hand in hand with unlocking test time compute.
By the end of this I expect.
[01:01:59] Speaker A: Us to get something that looks a.
[01:02:01] Speaker B: lot like a drop in remote worker. An agent that joins your company, is onboarded like a new human hire, messages you and colleagues on Slack, uses your software, makes pull requests, and that, given big projects, can do the model equivalent of a human going away for weeks to independently complete the project. You'll probably need somewhat better base models than GPT4 to unlock this, but possibly not even that much better. A lot of juice is in fixing
[01:02:33] Speaker A: The clear and basic ways models are still hobbled.
A very early peek at what this might look like is Devin, an early prototype of unlocking the agency overhang, or
[01:02:45] Speaker B: Test time compute overhang on models on the path to creating a fully automated software engineer. I don't know how well Devin works in practice, and this demo is still very limited compared to what proper chatbot to agent unhobbling would yield, but it's a useful teaser of the sort of thing coming soon. By the way, the centrality of unhobbling might lead to a somewhat interesting sonic boom effect in terms of commercial applications. Intermediate models between now and the drop.
[01:03:16] Speaker A: In remote worker will require tons of.
[01:03:18] Speaker B: schlep to change workflows and build infrastructure to integrate and derive economic value from. The drop in remote worker will be dramatically easier to integrate: just, well, drop
[01:03:31] Speaker A: Them in to automate all the jobs.
[01:03:32] Speaker B: That could be done remotely.
[01:03:34] Speaker A: It seems plausible that the schlep will.
[01:03:36] Speaker B: Take longer than the unhobbling. That is, by the time the drop in remote worker is able to automate a large number of jobs, intermediate models.
[01:03:44] Speaker A: Won'T yet have been fully harnessed and.
[01:03:46] Speaker B: integrated, so the jump in economic value generated could be somewhat discontinuous.

The next four years. Putting the numbers together, we should roughly expect another GPT2 to GPT4 sized jump in the four years following GPT4, by the end of 2027. GPT2 to
[01:04:12] Speaker A: GPT4 was roughly a 4.5 to 6.
[01:04:16] Speaker B: order of magnitude base effective compute scale up (physical compute and algorithmic efficiencies), plus major unhobbling gains from base model to chatbot. In the subsequent four years, we should expect three to six orders of magnitude of base effective compute scale up (physical compute and algorithmic efficiencies), with perhaps a
[01:04:38] Speaker A: Best guess of about five orders of.
[01:04:40] Speaker B: magnitude, plus step changes in utility and applications unlocked by unhobbling, from chatbot to agent or drop in remote worker.
[01:04:51] Speaker A: To put this in perspective, suppose GPT4 training took three months.
[01:04:57] Speaker B: In 2027, a leading AI lab will be able to train a GPT4 level model in a minute. The order of magnitude effective compute scale up will be dramatic.
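That "three months to a minute" line is just the roughly five OOM scale up restated; a quick check of the arithmetic, as my own illustration:

```python
import math

# "Three months down to one minute," as arithmetic (my own illustration).
three_months_in_minutes = 90 * 24 * 60          # ~129,600 minutes
speedup = three_months_in_minutes / 1           # vs. one minute
print(f"Speed-up: ~{speedup:,.0f}x (~{math.log10(speedup):.1f} OOMs)")
# ~130,000x, i.e. ~5.1 OOMs -- which is why a ~5 OOM best-guess effective
# compute scale-up cashes out as "a GPT-4-level model in a minute."
```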
[01:05:11] Speaker A: Where will that take us?
GPT2 to GPT4 took us from about.
[01:05:17] Speaker B: Preschooler to smart high schooler. From barely being able to output a few cohesive sentences to acing high school exams and being a useful coding assistant.
[01:05:28] Speaker A: That was an insane jump. If this is the intelligence gap we'll.
[01:05:33] Speaker B: Cover once more, where will that take us? We should not be surprised if that.
[01:05:39] Speaker A: Takes us very, very far. Likely it will Take us to models.
[01:05:43] Speaker B: That can outperform PhDs and the best.
[01:05:45] Speaker A: Experts in the field.
[01:05:48] Speaker B: Of course, any benchmark we have today will be saturated. But that's not saying much. It's mostly a reflection on the difficulty of making hard enough benchmarks.
[01:05:57] Speaker A: One neat way to think about this.
[01:05:59] Speaker B: Is that the current trend of AI progress is proceeding at roughly 3x.
[01:06:04] Speaker A: The pace of child development.
[01:06:06] Speaker B: Your 3x speed child just graduated high school. It'll be taking your job before you know it.
[01:06:14] Speaker A: Again, critically. Don't just imagine an incredibly smart ChatGPT. Unhobbling gains should mean that this looks.
[01:06:23] Speaker B: More like a drop in remote worker. An incredibly smart agent that can reason and plan and error correct and knows.
[01:06:31] Speaker A: Everything about you and your company and.
[01:06:33] Speaker B: Can work on a problem independently for weeks.
[01:06:38] Speaker A: We are on course for artificial general intelligence.
[01:06:41] Speaker B: By 2027, these AI systems will basically
[01:06:46] Speaker A: Be able to automate all cognitive
[01:06:49] Speaker B: Jobs. Think: all jobs that can be done remotely.
[01:06:53] Speaker A: To be clear, the error bars are large. Progress could stall as we run out.
[01:06:59] Speaker B: Of data if the algorithmic breakthroughs necessary to crash through the data wall prove harder than expected. Maybe unhobbling doesn't go as far and we are stuck with merely expert chatbots.
[01:07:11] Speaker A: Rather than expert coworkers.
[01:07:13] Speaker B: Perhaps the decade long trend lines break.
[01:07:16] Speaker A: Or scaling deep learning hits a wall.
[01:07:18] Speaker B: For real this time. Or an algorithmic breakthrough,
[01:07:22] Speaker A: Even simple unhobbling that unleashes the test.
[01:07:25] Speaker B: Time compute overhang, could be a paradigm shift, accelerating things further and leading to AGI even earlier.
In any case, we are racing through the orders of magnitude, and it requires no esoteric beliefs, merely trend extrapolation of straight lines, to take the possibility of AGI, true artificial general intelligence, by 2027 extremely seriously.
It seems like many are in the game of downward defining AGI these days,
[01:08:01] Speaker A: As just a really good chatbot or whatever.
[01:08:05] Speaker B: What I mean is an AI system that could fully automate my or my friend's job, that could fully do the work of an AI researcher or engineer.
[01:08:14] Speaker A: Perhaps some areas, like robotics, might take longer to figure out by default.
[01:08:18] Speaker B: And the societal rollout, for example in medical or legal professions, could easily be.
[01:08:23] Speaker A: Slowed by societal choices or regulation.
[01:08:26] Speaker B: But once models can automate AI research itself, that's enough.
[01:08:31] Speaker A: Enough to kick off intense feedback loops.
[01:08:34] Speaker B: And we could very quickly make further progress, the automated AI engineers themselves solving all the remaining bottlenecks to fully automating everything.
[01:08:44] Speaker A: In particular, millions of automated researchers could very plausibly compress a decade of further.
[01:08:49] Speaker B: Algorithmic progress into a year or less.
[01:08:53] Speaker A: AGI will merely be a small taste of the superintelligence soon to follow. More on that in the next chapter.
[01:09:01] Speaker B: In any case, do not expect the vertiginous pace of progress to abate. The trend lines look innocent, but their implications are intense.
[01:09:13] Speaker A: As with every generation before them, every.
[01:09:16] Speaker B: New generation of models will dumbfound most onlookers. They'll be incredulous when very soon, models solve incredibly difficult science problems that would take PhDs days. When they're whizzing around your computer doing your job, when they're writing code bases with millions of lines of code from scratch, when every year or two the economic value generated by these models 10x's. Forget sci-fi; count the orders of magnitude.
[01:09:41] Speaker A: It's what we should expect.
[01:09:44] Speaker B: AGI is no longer a distant fantasy. Scaling up simple deep learning techniques has just worked.
[01:09:51] Speaker A: The models just want to learn, and.
[01:09:54] Speaker B: We're about to do another 100,000x by.
[01:09:58] Speaker A: The end of 2027. It won't be long before they're smarter.
[01:10:03] Speaker B: Than us. Racing through the OOMs. It's
[01:10:09] Speaker A: This decade or bust.
[01:10:13] Speaker B: I used to be more skeptical of short timelines to AGI.
[01:10:17] Speaker A: One reason is that it seemed unreasonable.
[01:10:19] Speaker B: To privilege this decade concentrating so much AGI probability mass on it.
[01:10:25] Speaker A: It seemed like a classic fallacy to think oh we're so special.
[01:10:29] Speaker B: I thought we should be uncertain about.
[01:10:31] Speaker A: What it takes to get AGI, which should lead to a much more smeared.
[01:10:36] Speaker B: Out probability distribution over when we might get AGI.
However, I've changed my mind. Critically, our.
[01:10:45] Speaker A: Uncertainty over what it takes to get.
[01:10:47] Speaker B: AGI should be over orders of magnitude.
[01:10:50] Speaker A: Or OOMs of effective compute, rather than over years.
[01:10:55] Speaker B: We are racing through the OOMs this decade. Even at its bygone heyday, Moore's law was only 1 to 1.5 orders of magnitude per decade. I estimate that we will do roughly five orders of magnitude in four years.
[01:11:14] Speaker A: And over 10 this decade overall.
In essence, we are in the middle of a huge scale up, reaping one time gains this decade, and
[01:11:25] Speaker B: Progress through the OOMs will be multiples slower thereafter. If this scale up doesn't get us
[01:11:31] Speaker A: To AGI in the next five to.
[01:11:33] Speaker B: 10 years, it might be a long way out.
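To make the "racing through the OOMs" comparison concrete, here is the rough per-year multiplier implied by the numbers in the passage above (1 to 1.5 OOMs per decade for Moore's law, roughly 5 OOMs in four years for AI); the rest is just arithmetic of mine, not the author's:

```python
# Convert "OOMs per period" into an implied per-year compute multiplier.
moores_law_ooms_per_decade = 1.5   # upper end of the 1-1.5 OOM/decade figure
ai_ooms, ai_years = 5, 4           # rough estimate quoted for the next 4 years

moores_per_year = 10 ** (moores_law_ooms_per_decade / 10)   # ~1.4x per year
ai_per_year = 10 ** (ai_ooms / ai_years)                    # ~17.8x per year

print(f"Moore's law:         ~{moores_per_year:.1f}x per year")
print(f"Current AI scale up: ~{ai_per_year:.1f}x per year")
```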
Spending scale up: Spending a million dollars.
[01:11:41] Speaker A: On a model used to be outrageous. By the end of the decade we will likely have 100 billion or 1 trillion dollar clusters.
[01:11:49] Speaker B: Going much higher than that will be hard. That's already basically the feasible limit, both in terms of what big business can afford and even just as a fraction of GDP. Thereafter,
[01:12:00] Speaker A: All we have is glacial 2% per year trend real GDP growth to increase.
[01:12:04] Speaker B: This. Hardware gains: AI hardware has been improving much more quickly than Moore's Law. That's because we've been specializing chips for AI workloads.
[01:12:15] Speaker A: For example, we've gone from CPUs to.
[01:12:17] Speaker B: GPUs, adapted chips for transformers, and we've gone down to much lower precision number formats, from FP64 and FP32 for traditional supercomputing to FP8 on H100s.
[01:12:31] Speaker A: These are large gains, but by the.
[01:12:33] Speaker B: End of the decade we'll likely have totally specialized AI specific chips without much.
[01:12:38] Speaker A: Further beyond-Moore's-Law gains
[01:12:41] Speaker B: Possible. Algorithmic progress: In the coming decade, AI labs will invest tens of billions in algorithmic R&D, and all the smartest people in the world will be working on this. From tiny efficiencies to new paradigms, we'll
[01:12:54] Speaker A: Be picking lots of the low hanging fruit.
[01:12:57] Speaker B: We probably won't reach any sort of hard limit, though unhobblings are likely finite.
[01:13:01] Speaker A: But at the very least, the pace.
[01:13:03] Speaker B: Of improvements should slow down as the.
[01:13:05] Speaker A: Rapid growth in money and human capital.
[01:13:07] Speaker B: Investments necessarily slows down. For example, most of the smart STEM talent will already be working on AI.
[01:13:13] Speaker A: That said, this is the most uncertain.
[01:13:15] Speaker B: To predict, and the source of most of the uncertainty in the OOMs in the 2030s on the plot above. Put together, this means we are racing through many more orders of magnitude in the next decade than we might in multiple decades thereafter. Maybe it's enough and we get AGI soon, or we might be in for
[01:13:35] Speaker A: A long, slow slog.
[01:13:37] Speaker B: You and I can reasonably disagree on the median time to AGI depending on how hard we think achieving AGI will be. But given how we're racing through the OOMs right now, certainly your modal AGI year should be sometime later this decade or so. Figure 23: Matthew Barnett has a nice related visualization of this, considering just compute and biological bounds. Tweet from Matthew Barnett: My own basic calculations suggest that, given the potential for increased investment and hardware progress, we could very soon move through a large fraction of the remaining compute gap between the current frontier models and the literal amount of computation used by evolution.
Part 2: From AGI to Superintelligence: the Intelligence Explosion. Whew.
That was crazy.
So we're going to stop right there, and I want to talk about a couple of different things, because we talked about AGI on this show and I've been extremely skeptical of the idea in and of itself. And I want to talk about... I still wonder if there's a difference between a drop in remote worker and artificial general intelligence. Like, I actually believe this guy's line of thinking. I think he has a very reasonable approach to looking out into the future, though there is, I believe, one element that I think could get in the way that he isn't addressing here or hasn't thought about. And I think it's the sheer physical limitation of the amount of compute we can put toward it.
[01:15:31] Speaker A: Swan Bitcoin has the full suite of Bitcoin financial services. You can instantly buy with your bank.
[01:15:39] Speaker B: Account or wire transfer any amount of bitcoin up to $10 million worth.
[01:15:45] Speaker A: And you can easily set up what.
[01:15:46] Speaker B: I have been doing for ages, which is an automatic purchase of Bitcoin.
[01:15:50] Speaker A: On a weekly or monthly basis.
[01:15:52] Speaker B: You just pick your time frame and then automatically withdraw it to your cold storage.
[01:15:56] Speaker A: And still they have free withdrawals to self custody, which I was sure would.
[01:16:01] Speaker B: Be gone by now.
[01:16:02] Speaker A: But you should always treat any custodian as a point of failure.
[01:16:06] Speaker B: And luckily you won't have to go.
[01:16:07] Speaker A: Anywhere for all of the information and advice you need for why you should withdraw and how to do it safely.
[01:16:13] Speaker B: Because Swan Bitcoin has all of the.
[01:16:15] Speaker A: Resources you need and will regularly
[01:16:18] Speaker B: Remind you. About 80% or more of their customers automatically withdraw their coins.
[01:16:24] Speaker A: That is an amazing feat if you ask me.
[01:16:27] Speaker B: Then they also have the Swan IRA.
[01:16:29] Speaker A: If you have a traditional IRA and you want to get it allocated to.
[01:16:32] Speaker B: Bitcoin and there's so much more if.
[01:16:35] Speaker A: You haven't started into Bitcoin yet. Swan is an amazing place to begin. Go to swan.com/guy. The link will be right there in the description. Again, that is swan.com/guy, and.
[01:16:50] Speaker B: They will know that I sent you and my beautiful face will be right.
[01:16:53] Speaker A: There at the top of the page to greet you.
[01:16:55] Speaker B: Now, he does talk about compute efficiency and specialized chips and this sort of thing. But it's funny, I actually think a decent comparison is Bitcoin mining, and how quickly we went from basic mining on CPUs to then Steiniex and, you know, basically inventing GPU mining, and the order of magnitude increase there. Then FPGAs, then the very first iterations of ASICs, then the consistent growth in the chip development process, in getting the nanometer level of ASICs down to the lowest nanometer chip size and the highest density of chips that you could get in high end chips for computers, and all of this stuff. It was just the aggressive progression of the market to a mature product, to a mature tool, in relation to the bitcoin mining task, the math, the compute that it needs. And now the improvements and the generations have slowed down a lot, to the point that you can still get something with an S9. Now, it's nothing, it's not going to get you anything of consequence, but as a heater it brings in, you know, cents per day in a mining pool with an S9. And an S9 is, I mean, you can get them for 30, 50 bucks. I don't know, they're super cheap. So it's not like it costs you a whole lot. It's damn near the price of a heater.
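For what it's worth, here is the back-of-the-napkin math behind that "cents per day" claim. Every input below is a rough assumption of mine (S9 hash rate, network hash rate, block subsidy, price), not a figure from the episode, and all of them drift constantly:

```python
# Expected mining revenue is just your share of network hash rate times the
# coins issued per day. All numbers here are illustrative assumptions.
s9_hashrate_ths = 13.5          # terahash/s, typical Antminer S9 (assumed)
network_hashrate_ths = 600e6    # ~600 EH/s total network hash rate (assumed)
blocks_per_day = 144
block_subsidy_btc = 3.125       # post-2024-halving subsidy (assumed)
btc_price_usd = 60_000          # assumed spot price

share = s9_hashrate_ths / network_hashrate_ths
btc_per_day = share * blocks_per_day * block_subsidy_btc
usd_per_day = btc_per_day * btc_price_usd

print(f"Expected gross revenue: ~${usd_per_day:.2f} per day")
# With these made-up inputs it's well under a dollar a day, which only makes
# sense if you were going to spend the electricity on heat anyway.
```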
And they were released in 2017.
So we're at seven years old. I mean, not to say these are competitive in the mining market at all. They aren't, but it's funny, they are competitive in the heat market. But it's a good example, I think, of how the easy improvements, the major step function improvements of a new technology, come fast and hard for a period until it basically catches up with the maturation of the broader market. And that's one of the big limitations still for anything that's heavy compute reliant. When we're talking about $1 trillion clusters to create models and this sort of stuff, we're talking about an amount of energy that we don't know how to produce right now. That 1 trillion dollar compute cluster, we couldn't produce the energy for it today. Eventually we could, but I mean, we're running up against the physical, boots on the ground limits. Like, a great example is he talks about dropping in remote workers, as in anything an agent can do. It would be like inviting a remote worker into your job, into your computer, into your systems, and they could literally learn and then operate and do whatever they could within a digital environment. Well, this would lend itself to suggest that something that's going to be vastly more valuable in the future is picking up a hammer and banging on something physical.
[01:20:01] Speaker A: Blue collar work is going to be cool again.
[01:20:04] Speaker B: Now, there's always an element there where you put this thing in a machine and you let it interact with the world, and, you know, what can it then learn? What can it then test and accomplish? Like, I just saw some people talking about it, I saw it on Nostr, and I'm fascinated to the point that I might get it. But I feel bad, because I like the guy who mows my lawn, and I don't have the time for that. But it's a.
I mean, I'm not super considering getting it because it is kind of expensive. But it's a mower. It's an electric AI mower or whatever that literally mows your lawn. And it looks pretty legit. Like, they have put a lot of the pieces together to make it actually extremely good at its job. And what's funny is that it would actually be better, because of the way it's designed, than the mower the guy who comes uses, at getting where there's divots or hills or awkwardness in the grass itself that leads to, you know, odd cuts and spots where the grass is still high because of where the wheels hit, and all of this stuff. It's actually got independent suspension and the ability to kind of bend in certain curves and situations and divots, so that it actually can cut a consistent height of grass. So it's not to say the blue collar side, the physical side of AI, isn't coming. In fact, that might be one of the really huge step function improvements in creating everything else, because of the ability to scale up at a different layer. But that's where I think this is most likely to run into bottlenecks, to hit a wall where things slow down aggressively, just like ASICs in Bitcoin mining, and just like whatever the limitations of this kind of intelligence really are and wherever the next breakthroughs go. Because at some point you hit the physical world barriers, you hit reality and the limits of energy and computation and actually putting, you know, computers into racks. And the number of... he talked about the number of people who are actually machine learning experts and can actually provide anything and make.
[01:22:29] Speaker A: Progress in this field and are interested.
[01:22:30] Speaker B: In this field, potentially hitting a wall pretty quick there of not even being able to find, or it being extremely hard.
[01:22:37] Speaker A: To find new talent.
[01:22:39] Speaker B: And then of course, the data problem of just hitting the wall on what data is even out there to consume. But I like the thinking on turning a problem from something where we just kind of randomly or very shallowly send a language model through some data set to read it, respond to it, analyze it, etc., into a series of scaffolding for an agent that breaks down the data slowly and computes it from like a hundred different angles and through different framings, to try to pull more and more out of it, and then to test, to interact, to do all of these things that humans do. And I think it's funny to just apply that: if you have a chatbot, if you have something that can begin to act like a human, well, let it learn the way a human does.
[01:23:40] Speaker A: Like, how do we go through a.
[01:23:41] Speaker B: Task, how do we figure out and how do we adjust when we start a plan and the plan doesn't turn out, and then we have to change the plan and we have to reassess. Like, how many times I've had to do that with my business, with this show, with what I'm trying to do, with what I'm giving to my producer or April, or what I'm trying to figure out myself, or what we want to do in the development projects. I've started like three different development projects before finally landing on one that I think we're gonna take to the conclusion, basically in trying to fight and solve a handful of my problems and kind of breaking it down to a lower and lower level. So, like, none of my plans have worked out. So how can I really criticize ChatGPT for having the first plan come up and be like, well, this doesn't even work? And that's why the thing that I've talked about a lot on this show actually kind of falls in the unhobbling section of what he talks about here, where the real improvements, I think, those are the fascinating and really step function improvements: how do we use these things, how do we allow them to interact, talk to themselves and kind of give their own feedback
For making progress. Now, the reason why I think we can see very clearly that this is possible, and I think it would be foolish or naive to deny it or to say this isn't possible, is to see how the AI... I love that he compared it to AlphaGo, comparing it to Go, and then also compared it to another one.
What was it? I think it was board games. Where's the quote? Trading off test time and train time compute. In other machine learning domains, in other domains like AI systems for board games, it's been demonstrated that you can use more test time compute, also called inference time compute, to substitute for training compute. So here's the thing. Why does it work with Go? Why does it work with chess? Why does it work with board games, all of these games where AI can now beat humans? What is it about them that allows them to train and get feedback? Whereas, you know, we've read sections from papers or whatever, and I love the analogy that an AI trying to train on its own output is like inbreeding, and they called it AI dementia, for image generation and LLM retraining based on its own data. You could technically make it better in one avenue, or in one direction, or for one specific task, but it would actually get crappier at other things. It would kill the weights for producing, comparatively, anything outside of that specialty. So it wasn't really learning in that sense. Specifically in the context of using its own data and its own feedback, what it was actually doing was more
[01:26:48] Speaker A: Heightening the probability of some particular thing only at the cost of taking away the probability of something else that it has.
[01:26:57] Speaker B: So again, why is this not a problem with AlphaGo and with board games and all of these things? It's that there are
[01:27:05] Speaker A: Extremely strictly and precisely defined boundaries.
[01:27:10] Speaker B: There's extremely explicitly defined winning and losing.
Like, it's obvious whether or not the output is correct. You just tell AlphaGo, this is how
[01:27:24] Speaker A: You win a game and this is how you lose a game. And then everything they can test and.
[01:27:28] Speaker B: All of the competition and playing with themselves that they can do, they know.
[01:27:33] Speaker A: Who the winner and loser is.
[01:27:34] Speaker B: They just know which game won. They can essentially just brute force the.
[01:27:41] Speaker A: Entire space of the game possibilities to know what move at what time and in what situation is going to produce success.
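To make that point concrete, here is a tiny self-play sketch. It has nothing to do with AlphaGo's actual architecture; it's just an illustration of why an explicitly defined win/lose signal lets a system generate unlimited training feedback for itself. The game, single-pile Nim, and all the hyperparameters are my own toy choices:

```python
# Single-pile Nim: players alternately take 1-3 stones; whoever takes the last
# stone wins. Because the reward is unambiguous (+1 win, -1 loss), the agent
# can improve purely by playing against itself.
import random
from collections import defaultdict

Q = defaultdict(float)          # Q[(stones_left, action)] -> value estimate
ALPHA, EPSILON = 0.1, 0.2       # learning rate and exploration rate

def choose(stones):
    actions = [a for a in (1, 2, 3) if a <= stones]
    if random.random() < EPSILON:
        return random.choice(actions)                    # explore
    return max(actions, key=lambda a: Q[(stones, a)])    # exploit

def self_play_episode(start=15):
    history = {0: [], 1: []}                             # moves by each player
    stones, player, winner = start, 0, None
    while stones > 0:
        action = choose(stones)
        history[player].append((stones, action))
        stones -= action
        if stones == 0:
            winner = player                              # the explicit, objective signal
        player = 1 - player
    for p, moves in history.items():
        reward = 1.0 if p == winner else -1.0
        for state_action in moves:
            Q[state_action] += ALPHA * (reward - Q[state_action])

for _ in range(50_000):
    self_play_episode()

# With 15 stones, taking 3 (leaving a multiple of 4) is the winning opening,
# and the learned values should come to reflect that.
print({a: round(Q[(15, a)], 2) for a in (1, 2, 3)})
```

The whole trick is that `winner` is never ambiguous. Swap that one line for "did the answer actually help or persuade a human?" and the self-improvement loop has nothing solid to push against, which is exactly the bottleneck being described here.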
So what that means I think is that how we define success and how we define the parameters of feedback, when these things are starting to interact with.
[01:28:03] Speaker B: The real world or they're interacting with other things in a computer or interacting.
[01:28:07] Speaker A: With us, will define how these things progress and whether or not they can progress. And this is why I think actually.
[01:28:16] Speaker B: Human feedback is critical.
[01:28:19] Speaker A: And this also might be a place.
[01:28:21] Speaker B: Where the feedback mechanism, because we're talking about human interaction in real world interaction.
[01:28:27] Speaker A: Where it hits a bottleneck in the.
[01:28:30] Speaker B: Fact that there is just only.
[01:28:32] Speaker A: There's only so much bandwidth of interacting.
[01:28:34] Speaker B: With other humans or interacting with the real world that's possible. And now we hit a physical barrier.
[01:28:39] Speaker A: Again of how many machines it has.
[01:28:41] Speaker B: How many cameras it's looking through, how many, you know, mowers it's mowing with and how many errors it gets from.
[01:28:48] Speaker A: Which ones to know what it should not have done and what it should do instead and what success looks like.
[01:28:55] Speaker B: Does success mean making 2 million people do what,
[01:29:00] Speaker A: You know, the CEO of a company wants, to vote a certain way, or to save a certain number of lives? What does saving a certain number of lives mean? Is that with a vaccine, based on some statistical analysis, based on some pharmaceutical company that continues to lie
[01:29:17] Speaker B: To people? Like, all of these are seriously subjective arenas.
[01:29:24] Speaker A: Go, chess, and board games have no subjectivity. They're explicitly defined. That's why they're fun games.
[01:29:33] Speaker B: Or really that's what games are. So it immediately becomes a subsection or.
[01:29:39] Speaker A: A sub domain of perspective, which means.
[01:29:44] Speaker B: That in order for it to have feedback, we somehow need to define a perspective or what feedback looks like with interactions with the real world or with other people.
[01:29:58] Speaker A: But aside from everything, I'm thinking about.
[01:30:01] Speaker B: How, how and when these things will start to plateau.
You know, it's hard to argue with the trend lines.
[01:30:12] Speaker A: And he's got a really good point and I think he may be right on the capability. But I still wonder if it should be called artificial general intelligence.
And I guess the question really comes to can it do AI research better than AI researchers? And here's the thing that I think.
[01:30:38] Speaker B: Artificial general intelligence. There's actually a fantastic
[01:30:40] Speaker A: Video that I watched very recently on YouTube.
[01:30:43] Speaker B: I'll see if I can find it again.
[01:30:45] Speaker A: I guess it's probably in my history, so it's probably.
[01:30:48] Speaker B: I'll put the link in the show notes.
[01:30:50] Speaker A: There's a guy who was talking about.
[01:30:51] Speaker B: It and framing it to understand that.
[01:30:53] Speaker A: This is artificial narrow intelligence and that each model is actually only built to do or recognize or repeat one sort or set of patterns. And it does not have context or understanding as to what is right and wrong. This is why they still hallucinate a lot.
[01:31:13] Speaker B: And this is also a major hurdle as to why it can't teach itself. Like a human being can take an idea and literally think, like just think for an extremely long time and work through the reason and logic of it to come to a conclusion. It knows what is contradictory from a conceptual standpoint. And importantly, humans create new models for new modes of thinking and new projects or new categories of thing all the time. And those models are reasoning models. They are not probability models. And that is more what I think of as artificial general intelligence. And why, I wonder, we may get computers that actually do far outpace what humans can do from a pure agent, kind of remote worker, sense.
But my thinking is, do we ever get to a place, even if the LLM is that good, do we.
[01:32:21] Speaker A: Get to a place where the LLM.
[01:32:22] Speaker B: Can design and create an image generation model purely because I asked the LLM to create a model to produce very efficient images, you know, based on some data set or something like that? Is it able to then create, without any experience, by the way, so this would be a novel thing? Now, obviously, if we have all of the code and a bunch of GitHub repositories on how to create image generation models, well, then it can just probabilistically fight its way through until it gets a good image generation model. And we have all of the different ways to test that image generation model and all the benchmarks and all the other models to compare it against. So I think it's a little bit different when we're talking about a task
[01:33:07] Speaker A: That has already been completed.
[01:33:08] Speaker B: But I'm talking about like a model that is novel, that is new.
[01:33:13] Speaker A: Can AGI create that?
[01:33:16] Speaker B: And what kind of interaction, what kind of perspective, what kind of.
Kind of, like, values or moral structure is necessary for it to have the boundaries necessary to understand what success or failure is? And that's the interesting thing. And I talked about this, actually, with
[01:33:37] Speaker A: Scott Lindbergh, who I thought about having.
[01:33:39] Speaker B: On the Bitcoin Audible show. I've reached out to him today, again, actually.
Well, we had a really interesting conversation about story and education and learning, the ideas of learning and, like, where we are missing the mark, so to speak, and why. And one of the things that I had thought about and we had talked about kind of in that conversation was the fact that restrictions lead to creativity. That restrictions, like defining limits, are kind of what allows us to bounce around within those things in order to figure out what it is that we might think of as a good story or a conclusion or a good argument. And a good example, the specific example, was in story making, in writing a story and creating a story: the more creative freedom you have, oftentimes, the more paralyzing it is. If there's just an infinite scope of possibility, there's an infinite scope of things that you want to try. This is why it's often good to just write something down. Because rather than trying to find the perfect thing, it's actually better to mold something bad. So write something bad down on paper and then go through the process of criticizing it yourself, getting feedback and all of that stuff.
[01:35:03] Speaker A: But the reason.
[01:35:04] Speaker B: But one of the key things there is that I know, according to my emotional state, according to my history, what I.
[01:35:13] Speaker A: Think of as good.
[01:35:15] Speaker B: And I know how to codify, to actually cement those values into my model of the world.
So the question is, is that a simple unhobbling task to introduce that into the domain of these.
Because he's right about the compute. I think he's right about the capabilities in that regard, but I just don't know. Do we define artificial general intelligence as simply the ability to do all of these things, or as the ability to continuously, and in a sustained fashion, create novel things by testing, by experimenting, by using logic and the boundaries of reason to work through problems? And at the same time, does artificial general intelligence even come from a single mind, or does it come from a mixture of experts sort of thing? Does it simply kind of emerge from the ecosystem, not as a single model
[01:36:26] Speaker A: But as all of the different models.
[01:36:28] Speaker B: With all of the different human values and all of the different human goals and all the different resources, being committed to each, interworking with each other, interacting and cooperating and fighting and competing? Is that where basically general intelligence arises?
Or maybe in other words, does it ever.
Does it become an extension of human agency, or does it arrive at its own agency?
You know, the human organism is actually just an extension of DNA, of cells, of all of the infinite complexity of all
[01:37:12] Speaker A: The different pieces of the puzzle that.
[01:37:13] Speaker B: End up creating the human organism.
But are we an independent agency, or.
[01:37:19] Speaker A: Are we the collection?
[01:37:21] Speaker B: Are we the result of all of the independent agencies and all of the different systems of proteins and genes and cells and mitochondria and food and environment, like all of the things together, then produce that intelligence, and without any of those individual pieces, it doesn't really exist. It's just an abstraction of the infinite complexity of all of the different pieces of us.
Well, then isn't the artificial general intelligence or the super intelligence concept, can we think of that just as the output of society?
Now, you think about how crazy it is that even though we don't know what the unhobbling effects will be, we don't know where the algorithmic improvements will be, and we cannot foresee what the algorithmic improvements are, we can still put them on a freaking linear chart and inevitably something will pop up that keeps us on the line, that keeps us on trend.
You know, what is that?
What do we call that in the context of intelligence?
I mean, shouldn't an artificial general intelligence, shouldn't a super intelligence kind of be thought of as the output of many competing, interacting intelligences? Isn't that what we are creating by being a community to begin with? I mean, what in your world isn't the result of billions of other people's interactions and thoughts and working through problems? Not one person, no one person created basically anything like nothing. Satoshi didn't create Bitcoin by himself. There are like 20 previous projects like seven of them cited in the white paper of each individual piece that ended up making it possible. I just look around, I'm in my basement right now, and all of the different tools, all of the different little gadgets, the computers. I've got a line, a whole line of hard drives over here, just sitting over here. And I'm just thinking, like, how many?
I mean, hundreds of millions, probably billions of people are literally just in a.
[01:39:42] Speaker A: Combination of interactions and systems that built.
[01:39:44] Speaker B: All of the individual pieces and the.
[01:39:46] Speaker A: Precursors to those pieces in these hard.
[01:39:49] Speaker B: Drives that are sitting right here.
Like, I can't even fathom how many.
[01:39:53] Speaker A: People would have been involved in that.
[01:39:56] Speaker B: And the fact that literally no one.
[01:39:59] Speaker A: Individually knows or can make it without.
[01:40:02] Speaker B: The interaction and help of a combo of millions of other people and then billions by extension of what they need in order to even play their part in it.
I don't know. I think he does have a fantastic argument about getting through the data wall, though, the problem being that we just don't have any more good data to train it on.
And I thought this particular paragraph was just extremely solid at being like, okay, that makes sense. I think he's totally right there.
He's talking about how the LLM, you know, just skims through the math book and then spits out the first answer as fast as possible. This is the quote: rather, when you or I read that math textbook, we read a couple of pages slowly, then have an internal monologue about the material in our heads and talk about it with a few study buddies, read another page or
[01:40:59] Speaker A: Two, then try some practice problems fail.
[01:41:01] Speaker B: Try them again in a different way, get some feedback on those problems, try again until we get a problem right, and so on until eventually the material clicks. You or I also wouldn't learn much.
[01:41:11] Speaker A: At all from a pass through a.
[01:41:12] Speaker B: Dense math textbook if all we could.
[01:41:14] Speaker A: Do was breeze through it like LLMs.
[01:41:17] Speaker B: This is where I think he has a really strong point, in that we can train better
If we can understand better how the LLMs should interact with the data and how they should go through and analyze the data.
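Here is a rough sketch of the kind of scaffold being described: read a chunk slowly, reflect, attempt the practice problems, get feedback, retry. The `call_model` and `check_answer` functions are hypothetical stand-ins, not any particular API, so the loop runs with placeholders as written:

```python
# An "unhobbling"-style study loop: many slow passes with self-talk and
# feedback instead of one shallow pass over the material.
def call_model(prompt: str) -> str:
    # Placeholder so the sketch runs; swap in a real chat-completion call here.
    return f"[model response to: {prompt[:40]}...]"

def check_answer(answer: str, problem: dict) -> bool:
    # Placeholder grader; in practice this might run unit tests, a calculator,
    # or a verifier model against a known solution.
    return problem.get("expected", "") in answer

def study(chapter_chunks, practice_problems, max_attempts=3):
    notes = []
    for chunk in chapter_chunks:
        summary = call_model(f"Read slowly and summarize the key ideas:\n{chunk}")
        reflection = call_model(f"Internal monologue: what is still confusing about\n{summary}")
        notes.append((summary, reflection))
    for problem in practice_problems:
        feedback = ""
        for _attempt in range(max_attempts):
            answer = call_model(
                f"Notes: {notes}\nProblem: {problem['question']}\n"
                f"Previous feedback: {feedback}\nTry again, a different way."
            )
            if check_answer(answer, problem):
                break
            feedback = call_model(f"Critique this failed attempt:\n{answer}")
    return notes

# Tiny usage example with toy data.
study(["Chapter 1 text..."], [{"question": "2 + 2 = ?", "expected": "4"}])
```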
[01:41:42] Speaker A: Like, that's the question: what does
[01:41:43] Speaker B: It mean for an LLM to analyze data? Is it simply, you know, pattern recognition and probability from word to word and sentence to sentence? Or is there another whole layer? Like, in the context of, we talked about how to build LLMs, and the whole attention mechanism for large language models, in a brief run through for large language models that I will have to link in the show notes, because it was one of my favorite reads of the show so far. But talking about how the attention mechanism is actually one of the ones that was a huge breakthrough in producing, you know, algorithmic advances, in making the LLM better able to understand the data and what is important in the data, and thus how to basically get the proper output. So rather than just, what's the probability of this next word with respect to
[01:42:39] Speaker A: All of the words that came before.
[01:42:41] Speaker B: It, but also how much attention is paid between these two words in particular.
[01:42:46] Speaker A: These two subjects, these two verbs.
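For anyone who wants to see the "how much attention is paid between these two words" idea in code, here is a minimal sketch of scaled dot-product attention with toy, untrained vectors; a real model learns the projection matrices rather than drawing them at random:

```python
# Minimal scaled dot-product attention: for each word, a weight over every
# other word, then a context-mixed vector built from those weights.
import numpy as np

np.random.seed(0)
tokens = ["the", "dog", "chased", "the", "ball"]
d = 8                                     # embedding / head dimension
X = np.random.randn(len(tokens), d)       # one toy vector per token

# In a transformer these projections are learned; here they are random.
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d)             # raw "how relevant is word j to word i"
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
output = weights @ V                      # each token's new, context-aware vector

print(np.round(weights, 2))               # rows: attending token, columns: attended-to token
```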
[01:42:50] Speaker B: But then also, just the implications of the simple capability improvements, the idea of a remote God, the advancements from 2021 to now, like when we were looking at the first GPT and GPT3 when it landed. The speed really is incredible. And I want to hit this quote just because I want to leave on this and think about this for when we come back to the next episode and the next section of this read. Quote:
[01:43:29] Speaker A: By the end of this I expect us to get something that looks a.
[01:43:33] Speaker B: Lot like a drop in remote worker. An agent that joins your company is onboarded like a new human hire, messages you and colleagues on Slack and uses your software, makes pull requests, and that, given big projects, can do the model equivalent of a human going away for weeks to independently complete the project. You'll probably need somewhat better base models than GPT4 to unlock this, but possibly not even that much better.
[01:44:04] Speaker A: A lot of juice is in fixing the clear and basic ways models are still hobbled.
[01:44:13] Speaker B: What's interesting about that situation, if and or when we get there, is that immediately
Success, and where competition shifts to, becomes a matter of where we are willing to devote resources and how much of
[01:44:40] Speaker A: Our resources we are devoting to some specific task or goal.
[01:44:46] Speaker B: And specifically in choosing the right goal to tackle, suddenly the entire problem becomes: are we solving the right problem?
Are we asking the right questions? Are we headed in the right direction? And it creates a capacity, it creates an environment for anyone to essentially achieve anything and for the most successful to simply be those with the right framework.
[01:45:18] Speaker A: The right perspective on what we should.
[01:45:21] Speaker B: Even be going after.
Which is really crazy to think about.
But anyway, this episode has gone on way too long already.
I hope you enjoyed that. I hope you are enjoying this piece. Like I said, I really feel like this needed audio. A lot of people were talking about it and it seemed to paint a clear picture in a lot of people's heads and I think it's a very, very strong argument and very much worth considering and breaking down. So we will go through this entire piece and I hope you stick with me and I hope you subscribe and check out my other shows, the Para Report and Bitcoin Audible. This is part of the Guy Swan Network. I am Guy Swan and thank you so much for joining me. Don't forget to check out our amazing sponsors, Swan Bitcoin and Coinkite. Swan Bitcoin is where you buy your Bitcoin and have a full financial suite of tools to plug into a bitcoin life. And of course Coinkite and the cold card hardware wallet to keep your Bitcoin safe, hold your keys, get a cold card discount code in the show notes.
[01:46:27] Speaker A: Also lots of other links and lots.
[01:46:29] Speaker B: Of other cool stuff in the show notes to the paper and the many other links and things.
Basically, I have like 20 tabs open to go exploring and I highly encourage you to do the same thing if you want to take a deep dive on this. So again, thank you all. I am Guy Swan and I will catch you on the next episode of AI Unchained. Until then, everybody take it easy.
[01:46:50] Speaker A: Guys.
It's not over if you're still here. Chronicler said it's not a tragedy if you're still alive.
Patrick Rothfuss, from The Name of the Wind.