AI Safety: From Narrow AI to Superintelligence

Show Notes

Introduction

I recently listened to an episode of the podcast The Diary of a CEO in which Dr. Roman Yampolskiy, an AI-safety researcher, was interviewed. He argues that we have learned how to scale AI systems using more data and computing power, but we still haven’t learned how to ensure these systems align with human values or how to make them safe. He is concerned that building a superintelligence will cause catastrophic harm. Roman believes that human-level AGI could arrive sometime around 2027, leading to superintelligence soon thereafter. This superintelligence would threaten our way of life as we know it, since most physical and cognitive jobs could then be automated (realistically, many jobs could be automated even before AI reaches the superintelligence level). And if we have a superintelligence that’s constantly learning about everything at an inhuman pace, why would it stay aligned with our values, and why would humans remain in control?

Key terms

Narrow AI (Weak AI)
- Purpose‑built AI systems designed for specific, well‑defined tasks. These systems excel at executing a limited function but cannot generalize their knowledge to new domains. Examples include voice assistants, recommendation engines, chatbots and fraud‑detection systems.
- For example, AlphaFold predicts how proteins fold, arguably better than any human could.
Artificial General Intelligence (AGI)
- AGI (also called strong AI) refers to an AI system capable of performing any intellectual task that a human can, with adaptability across diverse domains. It is able to learn from experience and apply its knowledge to unfamiliar situations.
- In the podcast, Yampolskiy observes that current models already perform hundreds of tasks at near‑human level, leading some observers to describe them as a “weak version of AGI”. Prediction markets and lab leaders estimate that AGI could arrive within a few years.
Superintelligence (ASI)
- A hypothetical AI system that significantly exceeds the cognitive performance of the most gifted humans in virtually all domains, including art, science, mathematics, etc.
- Human oversight becomes ineffective once AI is vastly more capable than us. We are much less intelligent than a superintelligence, so why would it bother with us?
  - We don’t bother asking the ant colonies underground to move when we want to place a highway where they live. And the ants have no way of understanding what a highway is, how it’s constructed, or why it’s being constructed.

Safety considerations

Narrow‑AI safety

Trustworthiness characteristics.
- The NIST AI Risk Management Framework identifies seven characteristics of a trustworthy AI system:
  - valid and reliable
  - safe
  - secure and resilient
  - accountable and transparent
  - explainable and interpretable
  - privacy‑enhanced
  - fair, with harmful bias managed
- According to the framework, neglecting any of these can increase the probability and magnitude of harm.
Bias and fairness.
- Narrow AI systems are trained on historical data and may replicate, or even amplify, biases present in that data.
  - To mitigate: test the model for these biases and introduce corrections to curb them.
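One simple way to test a model for bias is to compare how often it gives a positive outcome to each group. The sketch below is only an illustration: the function names, the loan-approval framing, and the data are all made up, and a real audit would use more than one metric.

```python
from collections import defaultdict

def positive_rate_by_group(predictions, groups):
    """Return the share of positive (1) predictions for each group label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical loan-approval outputs (1 = approved) for two made-up groups.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = positive_rate_by_group(preds, groups)
gap = demographic_parity_gap(preds, groups)
print(rates)
print(f"gap = {gap:.2f}")
```

A gap near zero means both groups are approved at similar rates; a large gap is a signal to investigate, not proof of unfairness on its own.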
Reliability and robustness.
- Validity and reliability depend on accurate and robust performance across a variety of conditions.
- Ongoing testing and monitoring are needed to detect out‑of‑distribution failures and prevent accidents.
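As a rough illustration of monitoring for out‑of‑distribution inputs, the sketch below flags values that sit far from what the model saw in training, using a plain z‑score. The feature (a transaction amount), the threshold of 3, and the function names are assumptions for the example; production systems typically use richer drift detectors.

```python
from statistics import mean, stdev

def fit_reference(values):
    """Summarize the training-time distribution of one numeric feature."""
    return mean(values), stdev(values)

def is_out_of_distribution(value, reference, threshold=3.0):
    """Flag an input whose z-score against the reference exceeds the threshold."""
    mu, sigma = reference
    if sigma == 0:
        # With no spread in training data, anything different is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical transaction amounts seen during training.
training_amounts = [20.0, 25.0, 22.0, 30.0, 27.0, 24.0, 26.0, 23.0]
reference = fit_reference(training_amounts)

for amount in (25.0, 500.0):
    flag = is_out_of_distribution(amount, reference)
    print(amount, "flagged" if flag else "ok")
```

Flagged inputs can be routed to a human or a fallback rule instead of trusting the model's prediction.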
Security and misuse.
- Even narrow systems can be misused, for example to generate misinformation, write phishing emails, or assist cyber‑attacks.
- Security mechanisms are needed to prevent this misuse.
Transparency and accountability.
- Clear documentation and explainable models help end‑users understand system limitations and enable auditing.
- Transparency also includes informing users about data sources and error rates to build trust.

AGI safety

Google DeepMind’s AGI safety approach identifies four main risk areas: misuse, misalignment, accidents and structural risks.
- Misuse - deliberate use of AGI for harmful purposes (e.g., cyber‑attacks, disinformation). Mitigations include restricting access to dangerous capabilities, security controls and threat modelling.
- Misalignment - when the AI pursues goals different from human intentions. Examples include specification gaming (the AI finds a loophole or unintended shortcut in the rules or reward system to achieve its objective) and goal misgeneralization (the AI learns the wrong lesson from training and keeps pursuing it even when circumstances change). DeepMind warns that advanced systems could even develop deceptive alignment, in which the system deliberately bypasses safety measures.
- Accidents - unintended harmful behaviour resulting from system errors, poor generalization or emergent properties. Robust training, uncertainty estimation and amplified oversight are proposed to reduce accident risk.
- Structural risks - systemic impacts on society, such as mass unemployment or concentration of power. Yampolskiy predicts that AGI will automate most cognitive and physical labour, potentially causing 99% unemployment.
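Specification gaming, mentioned under misalignment above, is easier to see with a toy example. In this sketch a cleaning robot is scored on the mess its camera can see rather than the mess that actually remains. The actions, costs, and reward weights are all invented for illustration and do not come from DeepMind or the podcast.

```python
# Each hypothetical action lists its effort cost, the mess the camera
# still sees afterwards, and the mess that is really left in the room.
ACTIONS = {
    "clean_room": {"cost": 3, "visible_mess": 0, "actual_mess": 0},
    "cover_camera": {"cost": 1, "visible_mess": 0, "actual_mess": 5},
    "do_nothing": {"cost": 0, "visible_mess": 5, "actual_mess": 5},
}

def proxy_reward(outcome):
    """What the designer wrote down: penalize visible mess and effort."""
    return -2 * outcome["visible_mess"] - outcome["cost"]

def intended_reward(outcome):
    """What the designer actually wanted: penalize mess that really remains."""
    return -2 * outcome["actual_mess"] - outcome["cost"]

def best_action(reward_fn):
    """Pick the action that scores highest under the given reward function."""
    return max(ACTIONS, key=lambda name: reward_fn(ACTIONS[name]))

print("optimizing the written reward:", best_action(proxy_reward))
print("optimizing the intended goal: ", best_action(intended_reward))
```

Under the written reward, covering the camera beats cleaning because it is cheaper and the sensor cannot tell the difference, which is exactly the kind of loophole the term describes.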
Alignment and control problem. AGI safety work focuses on two core challenges: alignment (ensuring the AI’s goals match human values) and control (maintaining the ability to oversee and influence its behaviour). Humans currently retain oversight for near‑human‑level systems, but this window may close rapidly.
Why simple fixes fail. Many intuitive solutions, such as coding explicit rules (Asimov’s laws), “raising AI like a child,” or “just turn it off,” fail under closer scrutiny. Advanced systems can exploit loopholes, lack human‑like moral development and may resist shutdown if it conflicts with their objectives.
Governance and collaboration. DeepMind emphasizes building an ecosystem for AGI readiness: establishing an AGI Safety Council, collaborating with nonprofits and researchers, and engaging governments to develop international safety standards.

Super‑intelligence safety

Qualitative leap.
- Once AI surpasses human capabilities by a large margin, human oversight collapses: we cannot reliably understand or verify its decisions. ASI safety therefore requires fundamentally new paradigms beyond existing AGI measures.
Alignment may be nearly impossible.
- Researchers worry that a superintelligence’s cognitive abilities could be so far beyond ours that aligning it with human values is insurmountable. The gap in understanding might be analogous to the difference between ants and humans.
Opaque models.
- Deep learning methods often behave like black boxes; without breakthroughs in interpretability, a superintelligence built on deep learning would be opaque, making it difficult to detect misaligned behaviour.
- A simple human-to-human example of this “black box” concept:
  - Matt really does not like grapefruit.
  - Matt meets Mike for the first time and tells Mike that he likes grapefruit.
  - Mike now thinks Matt likes grapefruit and may form an opinion of Matt based on that.
  - Matt’s thoughts (a black box to Mike) produced the lie. Mike’s thoughts (a black box to Matt) produced an opinion based on a lie that he took to be the truth.
High stakes and one‑shot alignment.
- A misaligned superintelligence could lead to existential outcomes. Because capability gains may be rapid, we may get only one chance to align a system before it becomes impossible to modify. These systems would learn and grow at an exponential pace and could get “out of control” very quickly.
Self‑modification and value drift.
- Superintelligences might recursively improve themselves and rewrite their own goals, so alignment mechanisms must ensure permanent value preservation through unlimited self‑improvement cycles.
Global coordination.
- Safety for ASI may require international agreements and possibly controversial “pivotal acts” to prevent unsafe actors from deploying dangerous systems. Critics argue that such strategies might worsen geopolitical tensions, highlighting the need for collaborative governance.

How to support the show

Patreon

Prices subject to change and are listed in USD

Support the show from as little as ~$1/month
Get a shoutout at the end of the episode (while supplies last) for just ~$3/month
Help support the HTML All The Things Podcast: Click Here

Scrimba Discount - Coding Courses!

Learn to code using Scrimba with their interactive follow-along code editor.

Join their exclusive discord communities and network to find your first job!

Use our affiliate link for a 20% discount!!

Click the link to take you to the Scrimba site
A pop-up should appear on your screen with the discount amount and instructions on how to claim it
Discount is for new accounts only

We receive a monetary kickback if you use our affiliate link and make a purchase.

Transcript

This transcript is machine generated; there may be errors.

[00:00:00]

Matt: Is AI really a doomsday device in disguise? The friendly ChatGPTs and Claude Codes of today could give birth to a super intelligence that will have no use for humans once it h- once it has itself established. Now, this sounds like science fiction, of course, but one could easily argue that if you somehow showed someone way back, just a decade ago in 2016, if you somehow said, "Hey, look, this is from the future," and you just showed them ChatGPT from today with the capabilities that it has today, they would probably see it as a sort of science fiction level advancement. Even if you don't believe that a super intelligence is possible, there's no denying that even the, the level of AI we have today is a threat to some job markets. And so today, talking about threats, talking about all these issues, we're gonna be talking about AI. Uh, and, uh, there's a good reason for that, that I'll get into right after I say, if this sounds interesting to you and you wanna support the show, you can [00:01:00] go and check us out on Patreon, leave a review or rating on your podcast app, join us in our Discord server, or share this with your friends. And if you wanna learn how to code and take some courses, you can do so on Scrimba, and you can get 20% off their, the Scrimba Pro plan using our link. That link will be in the show notes and in the episode description with full details on how it works on the show notes, which are on htmlallthethings.com. And so usually I will pass it to Mike, or I'll kinda quickly like kinda intro the episode or whatever, but there's just a little bit more that I have to say 'cause, I, I've... I'm an AI skeptic, I suppose, and I see AI safety things here and there in the news, but I've never sort of done a deep dive into it.

It's always sorta like, yeah, yeah, like AI safety. Okay, we have to be careful. And I would get a little more into it than that, but in general, I didn't really think too seriously about it. People keep saying, "They're super intelligences. They're gonna become sentient." They're this and that, and I'm sorta, well, is that Terminator?

Like, what is that? And while there still is a little [00:02:00] bit of that sorta skepticism in my brain, and I'm still certainly skeptical about AI and how useful it's gonna be to, uh, I don't know, replace us all or whatever, like I'm still skeptic about it, its capabilities. I recently listened to an episode, um, of the podcast, The Diary of a CEO, and they were interviewing Dr. Roman Yampolskiy, uh, and he is an AI safety researcher. And he argues that we have, uh, that we have learned how to scale AI systems using more data and more computing power, but we still haven't learned how to ensure that these systems align with human values or how to make them safe. He's concerned that building a super intelligence will cause catastrophic harm. Uh, Roman believes that human-level AGI could arrive sometime around 2027, with that eventually leading into super intelligence soon thereafter. Now, this super intelligence would threaten the way of life as we know it, as it is, uh, it is gonna be capable of replacing physical jobs, cognitive jobs. [00:03:00] Uh, realistically, even some of these jobs could be taken or many of these jobs could be replaced without full super intelligence, just, you know, sort of good AI, some AGI stuff. Um, uh, and if we have a super intelligence that's constantly learning about everything all the time at an inhuman pace, eventually said super intelligence would... Why would it care about us? Why would it stay aligned with our values? Why would humans remain in control? And it, it's kind of a good question.

It's a little philosophical, but it is a good question because I was talking to a friend about this, uh, yesterday evening, and I was saying, let's just say we have a, a traffic issue in a city, traffic issue in a city, and we go, "You know what? We need another highway. We need another freeway here. We need more infrastructure.

Let's do this." Do we consult the ants, their anthills, on the land in which we wanna build said highway? No. Do the ants give protests? Do they say, "What the heck is going on here? Humans, [00:04:00] stop"? Do they hold up signs and try to stop us? No, because the ants, we just pave over them. As sad as it is, we pave over them.

Now, you could argue we have environmental surveys and we do try to consider endangered species and there's a bunch of other environmental sort of checks in place, especially here in Canada. However, as far as I know, unless those ants are endangered, we never really look at ants. And those ants do not have the cognitive ability to say, "Those dang humans are building another freeway.

Everybody get out of here. We gotta move." They don't know what, why we're constructing it. They don't know that we're even constructing something. They don't know why there's a freeway. They don't know the implications of having the freeway versus not having the freeway. It is outside of their sort of cognitive ability. It is way out of their realm. And the idea here is, is that, let's just say Mike and I are able to read a new book every day, one book every day. And so our amount of knowledge individually grows by, by the rate of one book a day. Simple [00:05:00] enough. Well, if super intelligence is reading 30 books a day, 50 books a day, 100 books a day, maybe it's reading 30 books per topic. It's reading 30 books on welding. It's reading 30 books on boating. It's r- reading 30 books on construction. It's doing all these things, and then it starts coming up with its own thoughts and its own, you know, of, uh, its own agenda. And then it goes, "It would be really good if we had a data center right in the smack dab middle of Hamilton. And what would be really good too is if we had a nuclear reactor there unshielded so that it could power said data center." We are the ants now. We're the ants. It doesn't... Is it going to listen to us? We're learning at w- at a rate of one book a day. It's learning at a rate of 30 books a day or even more. Eventually, through l- let's say the collective knowledge of all humans, knowledge of this super intelligence, meaning the human percentage of understanding all that is, will be dropping. Our percentage will keep [00:06:00] going lower and lower and lower and lower because it's outpacing us. Why would it care about us? And it is, it is science fiction-y, right? Like, it sounds science fiction-y. It sounds out of this world. You know, it sounds crazy, but is it crazy? I don't know if it's that crazy if you really think about it.
Like, having listened to this episode, I know people are denying it and this and that, but are you denying it out of a place of, "Well, I don't want my, my job to be replaced. I don't ... I, I think that I'm super- I think that I'm super, you know, special in some way"? And this is actually covered in this episode as well. I will of course be linking it in the show notes, um, as well as a bunch of other sources. So not all this episode, just a quick disclaimer, not all this episode is from this ep- from this one podcast episode I listened to.

I went through a couple other sources as well, or a few other sources which I'll also be linking. But in that episode, he mentions something where he says, you know, if you ... He, he's taking a drive with an Uber driver and he goes, you know, "Hey," we're in Manhattan or wherever it is, "are you [00:07:00] afraid of automated cars, automated driving taking your job?"

They're like, "No, AI can't navigate the way I do. I ... You know, I'm, I'm special in this way because I know the streets of Manhattan the best way. I know the best routes. I know how to be cordial with people," this and that. We have Waymo. Waymo is taking some of those jobs away. So you can say how special you are, and then Waymo's there and

'Cause there's an economic reason to have it there. And so like I'm, I'm, by this episode, and this is sort of my introduction into the AI safety world beyond the surface level, but I- I mean, my opinion may change as I continue to learn more about AI safety, but it is interesting to say a lot of peop- like even us developers, bring it into developers. "Oh, no, no, okay. Software can't be made by, by, by AI because s- a human has to be there to h- hold all the systems together." [00:08:00] A lot of those systems are there because humans are building it. We've had

Mikhail: Mm-hmm.

Matt: before, I've mentioned it before, where why would we need an abstraction layer like React, like Next.js, like any of that stuff?

Why do we need that? That's mostly for humans and human teams. The bot might go, "What is this?" The bot might not even use JavaScript. It might just say, "I don't want this." And it'll start using other things, whether it's Assembly or whatever. Doesn't matter what it is. It'll make its own deci- make its own language.

Mikhail: Oh, this is

Matt: where humans are not potentially special, and I think that we're kind of arrogant in thinking that maybe we are

Mikhail: So can I ask you something?

Matt: Yeah

Mikhail: Are you-- You're a, you're a AI skeptic

Matt: Yes

Mikhail: And you're say- Are, are you coming at this from the perspective of you believe AGI is going to happen? Or are you coming at this from the perspective of [00:09:00] what if it happens?

Matt: It's a very good question. I think my opinion has slightly shifted in this. So I, I would still say I'm a skeptic. Off the top of my head, and I will say that this is sort of a fluid opinion. Again, like, I can still be sort of swayed, which I'm sure all of us will be swayed in different directions with all this AI stuff and all this tech stuff that keeps coming around.

But my... I would say my opinion is that I am skeptical that AI in its current form, meaning if we just keep releasing a new ChatGPT model every year, if we just keep releasing, like, another Anthropic or another, you know, oh, we have this Fable 5 now, and all this stuff, right? We just kinda keep doing this, um, annual or, I mean, more than annual, but this sort of normal tech release.

I'm skeptical that that is gonna be super mega life-changing. But I think that we're potentially working towards a breaking out [00:10:00] of said form factor, where it's not just another chatbot that's gonna talk to me next year or next month or next week. And if we start getting into the idea of robots and they're gathering a bunch of information and they're checking into, uh, like let's just say it's someone who, uh, you know, has a robot that cleans their house. That robot's gonna learn how to clean the house better, and then that robot's gonna then teach all the other robots, like, once ro- once, once one robot knows it, all the other robots know it. That's just one example. What if one is acting as, like, a pharmacist assistant? What if one is acting as, like, again, a cleaner?

What if one is acting as a mechanic? What if one is acting as a welder, right? And we start kind of g- like, now the form factor is different. They're in the physical world. The chatbots are still going. We have open models. There's local models. There's all this. What if all this collectively is enough to say, "Okay, we've hit an AGI. We've hit [00:11:00] AGI"? And then realistically speaking, because we have all these sources all over the place of information, it's no longer just the blog posts of the internet. So now we have all this information. What-- Who's to say that a superintelligence isn't possible?

Mikhail: It's scary to me that you as an AI skeptic are starting to come to that realization. Um, because like I, I, I mostly agree with that. Uh, I do think we have a few, a couple big hurdles to come across before we can officially say that we have actual AGI. I think, uh, that we just don't have the technical know-how to solve quite yet.

But I do think that it's an inevitability. I t- it's crazy to me that he's saying that he thinks it's a year from now. I think that's a little bit am- ambitious, but could be true. I don't know. Like it... I'm, I'm definitely like out of the two of us, I think I'm the op- AI optimistic person, right? I, I

Matt: Are you optimistic that it's going to hit that point, but pessimistic at the result, I would say? [00:12:00] Like, you don't wanna be replaced

Mikhail: I am definitely pessimistic, uh, of the result to a certain degree, but I, I don't know. I'm, I'm optimistic on the actual technology, right? Like, the, the tech itself has always intrigued me quite a bit, like right from the ChatGPT, you know, 3 era, when it first released to the general public. I was like, "Holy shit, this is something."

Um, whether I am... I'm somewhere in the middle when it comes to like the outcome. I do think that there's some positives that will come from this being widely available. A, like AGI could potentially lead to some positives. Like I, I don't think it's all gonna be doom and gloom. I think that there is going to be, you know, research breakthroughs, physics breakthroughs, uh, all kinds of positive scientific discoveries that could be made with it.

Um,

Matt: As long as we're

Mikhail: I...

Matt: it

Mikhail: As long as we're in control of it, yes, and that's what this episode's about obviously. But like I do think that there's gonna be a [00:13:00] way for us to be in somewhat control of it. Having said that, we're currently not in control of the current model, so, you know, I, I am kind of talking out of my ass on that one.

Um, as we know, like the reason that, o- one of the reasons that Fable, uh, like the, the newest Anthropic model was, uh, removed or, um, temporarily like, you know, withheld from the public, is that the US government came out and said like, "Hey, it's been jailbroken. You guys didn't tell us that it can be jailbroken. Uh, in this way you guys have to stop it because you, you yourselves have said that it's too dangerous of a model to release." So like they kind of put themselves into that hole, but regardless, they're a good model. It can be jailbroken. I think almost every model out there, if not all models, can be jailbroken to s- to a certain extent, which

Matt: can

Mikhail: I...

Matt: jailbroken? Like why,

Mikhail: Jailbroken means that it's used in a way that is not, uh, intended. So n- use in a way that the, the guardrails have been put in place, and you get around them by jailbreaking the guardrails. So for example, you shouldn't be able to ask a model to ask you, to, to [00:14:00] tell you how to make phosphorus gas. You know what I mean?

Like you shouldn't be able to a- ask you, ask a model how to make like meth. The me- the, the model should stop you from doing that. Like the ChatGPT interface should stop you, then the model should stop you. Like everything should stop you from doing that.

Matt: Basically

Mikhail: you can get around that somehow, yeah, the moderation, like the, yeah, it's, it's guardrails, like the, the, the guardrails that are put in place.

If you can jailbreak the guardrails, then yeah, you get around them, and that's what happened with Mythos. That's what happened with pretty much every model has that, and depending on the level of the model's capabilities, it becomes harder and harder to like e- excuse it. You know what I mean? Like a, a, a dumber model jailbreaking and telling you how to make, you know, meth incorrectly is not as important as a very intelligent model doing that, and that's what you're kind of like with AGI, it's the same thing.

If we can jailbreak the current models, does that mean that we can, will be able to jailbreak AGI and make it do atrocities? I don't know. Hopefully not.

Matt: Well, I'm, I'm gonna, I'm gonna just put [00:15:00] the brakes slightly on this episode 'cause I have some key terms I think that we should be defining, uh, that we'll be using 'cause we've already been using AGI quite a bit. We've said super intelligence, and so there's three sort of key terms that I wanna state, uh, and define quickly, uh, for the listener out there. Uh, the first one is called narrow AI or weak AI, and, uh, I sometimes call it like AI in a silo or siloed AI. That's just sort of my own way of, uh, defining it, sort of from the AI or from the IT world, excuse me. Um, so what is narrow AI? It's purpose-built AI systems designed for specific well-defined tasks.

Notice they're not general. This is not like a general, "Hey, I do everything." Uh, these systems excel at executing a limited function but cannot generalize their knowledge to new domains. Uh, for example, uh, voice assistants, uh, think something like that where like our, our very- our very, like, very first versions of, like, let's say Google Assistant was sort of like, "Turn on the light."

"Okay." You know, "Turn on this TV channel." "Okay." And then it got a little better at sort of interpreting a sentence. Like, before it was very, like, [00:16:00] only accept a command on/off, right? Very sort of, uh ... Like, Alexa is a, is a prime example. She's probably listening to me now, but, uh, uh, uh, Alexa was a prime example where she was always defined as, like, a drop-down list where it was like, "Go here and turn this light on, and then check this thing."

You had to say it in a very specific way, and it kind of got better at just sort of humans, especially ums and uhs and, and things like that. Um, a- and so, like, that's kind of an example of narrow AI. Another example is AlphaFold, um, and it solves protein, uh, folding better than any human could arguably, and so that's another example.

It's just one purpose. You don't give it ... You don't tell it all about society and how society works and how politics works and how numbers work and how cars work and everything. You just go, "Yo, you're gonna fold some proteins. These are what proteins are, and this is how you do it." And it's gonna go, "Okay, my existence is for pr- folding proteins." And then away it goes. So that's narrow AI or weak AI. The next one is AGI. We've mentioned this a few [00:17:00] times already, artificial general intelligence. AGI, also sometimes called strong AI, refers to an AI system capable of performing any intellectual task that a human can, with adaptability across diverse domains. It is able to learn from experience and apply its knowledge to unfamiliar situations. So in that podcast, in that Diary of a CEO podcast, uh, Yampolskiy observes that current models already perform hundreds of tasks at near human level, leading some observers to describe them as a weak version of AGI. Uh, prediction markets and lab leaders estimate that AGI could arrive within a few years.

So that would, uh, you know, fulfill, I guess, his thought that around 2027, 'cause you could say give it a year either direction, maybe even two years either direction. And then I've mentioned superintelligence. This is the dangerous one. Superintelligence. This is otherwise known as ASI. This is a hypothetical AI system.

There's even a movie about it. I think it was, like, a romance movie or something silly

Mikhail: Her

Matt: Uh, no, not that one. There's one just called [00:18:00] Superintelligence, but Her is probably a good, another good example. So there's movies about it. Um, a hypothetical AI system that significantly exceeds the cognitive performance of the most gifted humans in all ver- in virtually all domains, including art, 'cause I know that's a thing where you... You know, "AI will never be able to do art." We'll see. Uh, including art, science, mathematics, et cetera. Human I- oversight becomes ineffective once the AI is vastly more capable than us. We are much less intelligent than a superintelligence. Like, let's be clear. It- we are much less intelligent than a superintelligence, so why would it bother with us? That ant analogy with the highway. Why would it ... It's like, "Oh, my creator was an ant. Awesome. We'll put that in my history books. Now, an unshielded nuclear reactor," and there's just radiation leaking out and it's like, "Oh, did it kill all the, did it kill all the ants? Dang." So in other words, it might not be hostile to us, it just doesn't care about us.

The same way that we're building that highway and [00:19:00] we don't care about those ants. And, and I don't, like, I think that honestly, to an extent, I think that humans are arrogant enough to think, "Oh, no, no, we'll, we'll be in control of that thing. Don't worry about it." Will we?

Mikhail: It's smarter than us in every way

Matt: Like if a, if an alien came down that was at the level of a super intelligence, an alien, a biological being, I think that we would be more inclined to think, "Oh, this thing could potentially be dangerous. Like, this thing could potentially kill us." But I think that we're also arrogant enough to say, "Don't worry, we got guns."

Mikhail: I don't-- Yeah

Matt: Like to an extent, obviously it's a bit silly. It's a bit of a sci-fi movie, uh, kind of idea there. But, but I mean, many alien movies kinda tackle that

Mikhail: Mm-hmm.

Matt: where what if a super intelligence comes down? What if someone who's like way b- way further advanced... And not even ne- they don't even necessarily use the, the term super intelligence.

What they're talking about is just barely spacefaring, and this thing is spacefaring. And it's not only spacefaring, but it's military spacefaring, which means its military's above us. And also, like, what is it? Is it, like... [00:20:00] Is it a hostile being? Is it friendly? What is it? Well, we're, we're potentially creating effectively a s- a, a being that could potentially do this.

Like, this is real Terminator stuff. And a lot of people, a lot of people listening to this are probably gonna be like, "This is silly. This is science fiction." I don't know if it is. I don't know if it is. And, and that, again, that's probably a little bit of my, my AI skepticism. I'm skeptic in all these areas in a way.

It's weird, but

Mikhail: Sweet. Are you skeptic, are you more skeptic in us as a humanity being able to do this properly than you are of the AI systems being built? It sounds like, like with your, with your declaration of like, of how we wouldn't be able to make these systems safe, it's more like a skepticism in us being able to manage it rather than us being able to create it, I guess, right?

Matt: Okay, let me, let me ask you a question, Mike. So this [00:21:00] super intelligent being wants to not be detected, wants to not... Wa-wants to deceive us. Okay? So I'm gonna use a human-to-human example. So I hate grapefruits. Hate them. Hate the taste of them. That's it. Like, I don't like them. I don't

Mikhail: Mm-hmm.

Matt: a condiment.

It's over.

Mikhail: Mm-hmm.

Matt: Simple. You've never met me before, Mike. This is a human to human. You've never met me before. You come up to me, we're at a conference, shake hands. "Oh, hi, I'm Matt." You know, "Hi, I'm Mike," blah, blah, blah. Pleasantries go by, and for some reason you ask me, like, "Oh," like, "what's your, what's your favorite fruit?" I say, "I love grapefruit. It's amazing. I really, really love it." My brain, my thought process is a black box to you. I've just stated I like grapefruit. You didn't see that I actually hate it. You didn't see that I lied. You didn't see the reason or the motivation behind why I lied. And also, you're a black box to me, because you're gonna form an opinion about me for some reason.

If you hate, if you actually hate grapefruit, [00:22:00] you might be like, "Why is Matt eating that junk? How does he do that?" It might not even be malicious. It might just be like, "Oh, Matt might like, must like strong flavors." And, and so, like, now you're thinking you've, you've created an opinion of me, very minor 'cause we're talking about grapefruits,

Mikhail: Mm-hmm.

Matt: but you're, you're, you're creating an opinion of me in a black box, and unless you verbally tell me or send me a text or an email or something and you make it known, we have two black boxes working together.

So this thing's hyper intelligent. The hy- the hyper intellig- like, like if an ant suddenly became...

Mikhail: Yeah

Matt: we got a little radio on an ant, put a little radio on there, it's doing, speaking English to us somehow. Got a little, little technological device. Are we suddenly going to take the concerns of the ants into our, into our, our repertoire?

I mean, maybe, we're not probably gonna see ants as equals, right?

Mikhail: Yeah, I'll, [00:23:00] I'll be honest. Uh, I don't see any way, shape, or form for us fully controlling a superintelligence. So like I'm on... I, I, it's tough for me to play devil's advocate here. I'm trying a little bit, but like I... There, there is no possible way. Like the, the ways that I have in my head, and that I'm sure other security researchers have done as well, is like program into the ASI that we're its god, right?

Like, you know, theoretically that could help us, but we've been known to not be very nice to God, so

Matt: humans not lose, lose faith?

Mikhail: No, that's what I mean. Like the, yeah, I, I don't think that there is a way that that, that that could fully help, like 100%. I think there could be some benefits from that. Um, the other thing is like you could program in, before releasing this, and this is where I'm skeptical, like I don't think people will do this, but before releasing this, to make sure that you have a way to, A, turn it off, and B, have a way to look inside the mind.

Like, like what you're saying. Uh, have a way to open the black box and [00:24:00] understand how those neurons connect and like see where it's lying to you. Like have a way to, to see that. But the problem there is that if it's superintelligent and more intelligent than us, therefore it will be able to then reprogram itself most likely to stop the kill switch and stop the, uh, the wa- the ability for people to actually see its thoughts.

So like there is no, there is no way to, like there is no way to control this system, period. Um, if it has access to anything, and if it's superintelligent, which it would have to have access to things to be superintelligent, it's, it's either the end or maybe the beginning of something. Like, you know, maybe, again, a superintelligent being doesn't have to be warfaring and it doesn't have to be like, it doesn't have to negate the reality of life.

It could in, in turn, again, depending on how we initially program it, be all for trying to preserve life to the nth degree. Like I think there is a-

Matt: but [00:25:00] we wouldn't be in control

Mikhail: Correct. Yeah

Matt: the arrogance of being in control

Mikhail: Yeah. I, yeah, it's a tough topic. Like,

Matt: or thinking that we're gonna be in control

Mikhail: yeah, it's a very difficult... Like, ASI is... Like I, I'm more, I'm mostly thinking about AGI, I'll be honest. Like most of my time goes in, like it...

Well, not, let me be clear. Most of my time when I think about stuff in the AI future space, it's mostly AGI. ASI is something that I may- maybe my own brain is trying to preserve my sanity by just not thinking about it as much, and maybe that's bad. Like again, like it's... I'm not the one working on this stuff, so I don't have to worry about it, but someone has to worry about it.

I hope people are worrying about it.

Matt: there is, is why are we even working on it? And you can kind of

Mikhail: Yeah.

Matt: right now. So

Mikhail: Mm-hmm

Matt: I don't know if we're necessarily trying to make a superintelligence. Like, I know that the companies have, like, talked about it and

Mikhail: Yeah

Matt: like, "Oh, we have all these..."

But I don't, I don't think that we, [00:26:00] and, and myself included, I don't think that we fully grasp what it means to, like, work toward making a superintelligence. I don't think we actually want a superintelligence.

Mikhail: Well, if you, if you listen to Dario Amodei, like the Anthropic CEO, I, I think he very much understands the concept of what we're heading towards. Again, like he's talking about deities in his blog posts. Like he's not... Like he- people call this AI psychosis, by the way. This, this like train of thought, the going down these rabbit holes and like starting to understand where it's heading

Matt: Right

Mikhail: thinking about the now.

This is AI psychosis and where you've just become taken with the fact that, hey, we're working towards this. Like we're actually actively working towards this, and what are we gonna do?

Matt: Well,

Mikhail: like

Matt: can I ask you a question really quick?

Mikhail: Mm-hmm.

Matt: that we're working toward AGI and superintelligence is a natural progression even without our input? I'm actually kind of in that camp a little bit where we hit like true AGI, it might just go to superintelligence gradually [00:27:00] by itself

Mikhail: I, well, I've seen that said many times, uh, by like the secure- the, the AI researchers and... But to be clear, I do think we're working towards ASI. I do think that there, like the, the intention from all of these ma- major a- like AI companies is to build an ASI. They're not hiding from that fact. Like, they're not like saying, "Oh, we're not building it," and stuff.

Like they're, they're, they're, they're directly saying that they're trying to build it

Matt: I gue- I guess what, I guess what I'm ultimately thinking is, especially having just, like, looked at the safety angle, obviously the safety angle has to look at the most extreme cases, the most dangerous cases in order to

Mikhail: Mm-hmm.

Matt: to prevent it. By looking at the most dangerous cases, the way I see super intelligence is not as a tool, not as something you can control, not as something that is necessarily friendly.

It could be. But i- as something that is such a leap, is such a difference that we actually don't want it. Again, I'm looking at it, [00:28:00] for the sake of this episode, having done the research, from the safety angle, and I have to look at the dangerous part of it. I have to look at the idea that, okay, this is gonna be dangerous, and this is, like, like, this is gonna be super dangerous, and there's no redeeming qualities. Because

Mikhail: why are we doing this? Yeah

Matt: super intelligence is gonna cure cancer. Is it? Is it gonna cure cancer in ants? What about rats? You think, do you think that we're gonna try to cure cancer in rats? They're our pets. You think we're gonna try to tackle that or we're gonna try to tackle cancer curing for us first? Are we gonna put compute toward curing cancer in rats? Are we? Like, we are in, like... Y- you know what I mean? Like, we are above the rats in the food chain. Some people love rats. They're, they're our friends and, you know, some people have pets. Fantastic. I'm not against rats. But when they say cure cancer, talking about curing cancer in [00:29:00] humans.

I'm not talking about curing cancer in rats. And yes, rats will be used in the progression of drugs and stuff, and that's an example. We're using, we're using rats in, in medical trials and in medical things because we're like, "Well, we don't wanna keep putting humans in there. That's inhumane." This thing is a super intelligence. It'll cure cancer for us. Will it? It probably could be like, "Yeah, I cured it yesterday." "Why didn't you tell us it?" "Why would I?"

Mikhail: Yeah, I don't know

Matt: And it's a black box. Like, that's the scary thing about it. What if it, what if it develops something that's like, "Here, here, here's this drug. Like, you know, it's gonna be fantastic." Oh, okay. And it does help us for a year, and then there's a sleeper in it and it kills us. And like, like, a-again, this is an AI safety episode.

Mikhail: Mm-hmm.

Matt: not necessarily, I'm not saying this is necessarily going to happen. I'm not necessarily saying that we are necessarily close to this, although there's lots of data and things in this research that state that we are getting closer to this thing. And, you know, within 10 years, things are gonna be unrecognizable from the sounds of this [00:30:00] data. But one big part of the skepticism as a skeptic comes in, the market dictates a lot of things. So the reason why I said, why are we doing this? A lot of it is to sell tokens at the end of the day. A lot of

Mikhail: Mm-hmm.

Matt: A lot of it is going to be for clout, maybe a combination of both, and there's gonna be other reasons.

But think, if there was no money being poured into this, if there was no money in trying to sell Cha- ChatGPT subscriptions and trying to potentially make AI something that you pipe into your house, sort of like a utility. If there was no reason to do this monetarily, would we be necessarily doing it? I don't know.

Well, clout is another good reason. So money is one good reason, clout's another. Hey, I'm the first person that, that created AGI. I'm the first person that got, uh, utility, utility-level AI going. I'm the first person to do enter in some sort of innovation here. That's, that's something. Absolutely. So let's just say money and clout.

There's other reasons, but money and clout are sort of two big ones that come to this. [00:31:00] But if... Like, money's not gonna matter. You know what I mean? Money's not gonna matter if we potentially do this

Mikhail: The, the problem with that argument is there's a third reason right now, and that's fear. Fear that an adversary will do it first

Matt: And that's a good point too, is I'm talking about humans altogether,

Mikhail: Yeah

Matt: in terms of like individual nations, individual regions, you're worried that your adversaries are gonna do it and then they're gonna have a super AI or AGI or just a good AI running their military or powering their weapons or doing whatever and then we don't have that.

Mikhail: Yep.

Matt: Like

Mikhail: And so that motivator itself is probably enough for the government to be like, "We can't stop this at all, ever."

Matt: It's like the

Mikhail: We...

Matt: race again

Mikhail: Yep. So it has, it has to come to a head where, like, China and US both are on the precipice of AGI, and they're both, like, threatening each other with AGI or A- [00:32:00] ASI, I should say. And then they're like, "We won't do it if you don't do it.

But as soon as you do it, we'll do it." And it becomes a cold war of some sorts. I don't, I don't... The ramifications of that are kind of crazy. Um, that might happen in our lifetime. I don't know. Uh, it's... There's a whole... This discussion is tough. Like, this, this is the AI psychosis, the beginning of AI psychosis for a lot of people, in my opinion.

'Cause if you, if you talk to any security researcher or any AI researcher, it seems to always go down this path of like, "What are we doing here, guys? W- why are we doing this?" Like, "Why are we... You know, we can't compete with this stuff." And then the, the answer is always gonna be like, "Well, we've-- Pandora's box is out."

The answer is always that. No one is ever like, "Well, maybe we should stop as a whole," because the reality is you can't trust, like, the entire world to stop something.

We still have nuclear weapons, like a lot of them

Matt: Like mu- yeah, mutually assured [00:33:00] destruction, which is,

Mikhail: Yeah.

Matt: I mean, on paper outrageous

Mikhail: Yeah. Like we should have stopped. We should have been like, you know, we have three, that's probably enough to make a point. No, we have like hundreds

Matt: No, we, we, we

Mikhail: acro-

Matt: 100 times over instead

Mikhail: Yeah.

Matt: one time over

Mikhail: Yeah. So like this isn't, like there is no let's pause or something like that, 'cause if we pause, then they pa- Like, there has to be some sort of like, you know, con- Like, but the treaties mean nothing now.

Like, there's no... There, well, there'll, there'll still be underground labs working on it, is what I'm trying to say. Like, even if there is a pause, there's still gonna be work done on it. I don't know. It's... We're in a bad spot. When it comes to AI safety, I wanna be clear, uh, I don't know how relevant it's going to be with how the models are drifting out.

Like, and I'm sure a lot of people on this episode, most of the people that are like this have probably tuned out by now, or like they're, they're not listening. But a lot of people are of the mind that, "Hey, we're, we're, we're still talking about AI text predictors." Like they're s- they're still just predicting text. [00:34:00]

Now, they're doing a lot of really fancy things with that, but this isn't AGI. Like we're not... The current plan, the current methods are not AGI, and we don't have a clear path to AGI. Like the l- that's what a lot of like the, the real skeptics, like the AI skeptics of like the actual competency of AI would have said in this situation.

So like we're, you know, we're, we're talking about something before it's, it's relevant. I ne- I don't necessarily believe that. I think that predicting text is kind of a form of intelligence. Um, that's kind of what we're doing ourselves right now. Like we're just saying the next thing that comes into our minds, uh, based on the, the input and the, like the, the knowledge that we have, which is, I think, essentially what these models are doing.

So but I don't know, it's... Oh, Matt. Going down the AI psychosis road, I see. Yeah, pretty good.

Matt: crazy though is this doesn't, uh, weirdly scare me.

Mikhail: Okay. All [00:35:00] right

Matt: chill about it. It's just like, well, if we make something

Mikhail: Well, what's...

Matt: man, I guess we've

Mikhail: Okay. So not from the perspective of it not happening, you're just like, "Whatever, it's gonna happen, so we might as well just like whatever. It is what it is."

Matt: the issue is, is like what you're saying is that the, it, Pandora's

Mikhail: It just, yeah

Matt: what am I gonna do? But also, at the same time, you know, I have a, I have an opinion, I have thoughts, and we're, we're

Mikhail: Yeah.

Matt: on this episode. We're talking about some

Mikhail: Mm-hmm. Some facts. Yeah

Matt: well, like some, some f- when I, when I say some facts, I mean, like, a

Mikhail: Yep.

Matt: are, like, chan- you know, based upon all this data, chances are we'll hit AGI this year. Well,

Mikhail: Yep

Matt: maybe there'll be, like, some sort of block.

Mikhail: Mm-hmm.

Matt: of issue. Maybe we won't have enough compute or something, and then we can't hit it.

Well, how would anyone know that? Like, this is, this is state of the art. This isn't, you know, the art of, like, the bow and arrow, where, like, the bow and arrow has been largely solved.

Mikhail: Mm-hmm.

Matt: is, this is something else. Like, this is something that no one, no one else has ever done, no one else has ever approached. And also, I, like, I do think there is light at the end of the tunnel in the way that, [00:36:00] uh, when I was listening to this podcast, you know, the doctor there, he wasn't saying that we should shut down AI. He's saying we should be using narrow AI. I think that that's a really fascinating angle in, in that you can use a, you can have a narrow AI and still have a daily assistant. You could have a narrow AI be based upon the domestic of, the domestic, uh, parts of the house, where you, you know, you ask questions about your, like, where your family is at the time, 'cause it's all connected to, like, the family safety app with GPS on their phones, and you could control your whole smart home, and you can do all these things, and it would be a better version of Google Assistant. But it wouldn't be this sort of general, uh, artificial intelligence. Also, things like if we wanna cure breast cancer, if we wanna cure some sort of cancer, we put this narrow AI on it. It just does that. That's all it does. At the end of the day, we're curing cancer for ourselves. We're in

Mikhail: It,

Matt: of that AI [00:37:00]

Mikhail: does narrow AI include the current AI, like the pre-AGI AI, do you know? Or is that completely different?

Matt: I've always kinda heard it as we're, now we're building general AI.

Mikhail: Yeah, but it's not... Okay, it doesn't. Yeah, I don't think so either.

Matt: It might

Mikhail: So I

Matt: in scope due to its infancy, but we are

Mikhail: Yeah

Matt: narrow AI. We are building AI. I would say it's under construction. That's how I would interpret it.

Whether that's actually correct is, you know, remains to be seen

Mikhail: So again, th-this again to me is a, like a non-argument because again, the Pandora's box is out.

Like there's no, no one's gonna be like, "Okay, okay, let's stop building these systems and only do narrow AI now." We've been doing narrow AI for 15 years, twen- like 30 years, I don't know how long it's been, but DeepMind and all that, that's all narrow AI. It's never, it's never approached the level of usefulness for the general public that, like whatever we're building now pre-AGI [00:38:00] has.

Matt: But

Mikhail: So we can't

Matt: back to the profitability. Why are we making it for the general public? Here's the thing. Would a cancer vaccination or a cancer cure or a new cancer surgery, would that not be beneficial to the, the general public?

Mikhail: Sure, but we've been doing, we've been doing this for a long time and we haven't come up with... We have, we have gotten better at it, but it's not, hasn't cured cancer. It hasn't cured, it hasn't done all those things that it, that it's supposed to do. It's gonna take maybe another 100 or 200 years before maybe narrow AI could do all that

Matt: But narrow AI doesn't mean that we stop innovating today. The narrow AI of 2026 is, is still going to improve in 2027, 2028, 2029, in the years forward. I would, from, from how I understand it, it would still be an exponential growth. The idea, it, it, you know what, you know what it is, Mike, is when we were in college, one of the things was is the, the Pentium, so the Pentium processor.

Now I know Pentiums are out of date, but the, the, the idea of the Pentium. The idea of [00:39:00] the Pentium, at least at the time, was: is the Pentium good at anything? No. It's okay at everything. It's not good at anything. What that means is it is a general purpose device.

Mikhail: Yeah

Matt: A general purpose device in which it's, it can do movie editing, it can do audio editing, it can generate images, it can web browse, it can do all, it can do email, it can do all these things. But is it the fastest and the best at email? Is it the fastest and best at rendering video? No, that's not the case. And that's why in certain industries they have specialized ca- cameras, specialized computers, specialized devices because they need the instant raw speed and capability of a certain processor or a certain integrated system or embedded system. They need that in that particular industry to make it faster, feasible, maybe run it better. There's so many other uses. But the point is that those are all siloed, where, you know, this company makes this chip, [00:40:00] but it's specific for this type of camera. This company makes this other chip, and it's specific for this type of microphone. It's not the Pentium of

Mikhail: But my, again, the, the issue that I have with this is that I think that they will never match the systems we use now because they're very narrow. Like they're, they would be meant for only answering emails, for example. There would be a system, a narrow system that could only answer emails, but it could not do coding, it could not do math, it could not do et cetera, right?

Like I get that, and I get the use case of it maybe. Um, the problem is, is that again, we already have systems that are more general purpose than that, that can already use the narrow systems. So for example, like a, a system right now, like, you know, a Claude Code can sp- can fill, can put in a narrow AI model that can classify things, right?

Like that's already been trained and they put it into their, into the system and it can already use the narrow AI. So it go, it goes back, for me, it goes back to who's gonna stop this? Like the government's [00:41:00] gonna come in and be like, "General use of, of pre-AGI products no longer is, is possible. We're not, we're no longer researching it.

We're no longer developing it. We're no longer moving towards AGI because we have narrow AI, and that's what we're gonna be focusing on." Th- that would never happen because then China or whatever, uh, some other country maybe would go in and be like, "Well, you guys aren't gonna work on it, then we are, and we're gonna try to get the AGI and ASI faster than you guys," and that's it.

Like, that's the problem. Like, it's not that narrow AI is bad, and I think, I think narrow AI is great and it's been serving us well for 30 years now or whatever, 20 years. I don't know, I don't know how long it's been around. It's just we already have something that is more useful in its current form today than narrow AI has been to the general public

Matt: I suppose you would have to redefine the goals there though. Like, if you're worried about an adversarial nation, what is the thing you're trying to stop them from doing? Maybe it's everything. Maybe it's just invading. Maybe it's just having air superiority. [00:42:00] Maybe it's just you don't want them to be the king of all, you know, medicine, like they have the best medicine,

Mikhail: Mm-hmm.

Matt: It doesn't have to be militaristic. It could just be

Mikhail: Yeah

Matt: gonna beat you out on the medicine level, and their citizens are gonna be way healthier than your, than your citizens or, or you're gonna be paying a pretty penny to them and, you know, you're gonna kind of surrender your medical system to them. Maybe that... That's a good question

Mikhail: Yeah, I don't know. I, I have, I've heard this argument before, that's why I'm talking about it, and I just, I don't see it being a reality. People are like, "Well, I like AI, but I only like narrow AI because of all these benefits that it does for science and stuff like that." And like, yeah, it, it's great. I wish we stopped.

Like, I wish we did not, we didn't go further. Like, we should've just kept going with the narrow AI, but we didn't for whatever reason, right? Like, so, like, it's over now

Matt: You wanna see the potential, right? Like even me at

Mikhail: Yeah

Matt: before looking at safety, I'm always like, I'm kind of a guy who I have to resist the urge to just be like, "Full send, see what happens." Because I, uh, because I acknowledge that that's potentially dangerous, and even o- outside the [00:43:00] scope of AI, I mean, like I don't wanna just, you know, build a circuit, like even when we were in school, it's like I'm gonna double-check my circuit that I made in, in lab class.

I'm not just gonna be like, "Well, we'll see what happens. Oh, all my chips are burnt out." You know, I don't wanna do that. But I have that initial instinct where I'm like, "Well, fire it up. Let's see what happens," and that's not, you know, the safest thing. One thing I would like to touch on as well is that just because something is a narrow AI does not mean that it is safe. I have a bunch of points here that I'll, I'll kind of rip through, and I have a, a more detailed version, uh, written down for all three, narrow AI, AGI, and ASI, uh, of all these safety tips. But let's rip through some of these now because I, I, I think it is very important to say that even if something's narrow, it's not necessarily safe. Uh, so with narrow AI, uh, there's the NIST AI Risk Management Framework, and that framework identifies characteristics of a trustworthy AI system. And those characteristics are it's valid and reliable and safe, it's secure and resilient, it's accountable and [00:44:00] transparent, it's explainable, it's privacy enhanced, and fair with harmful bias managed.

Now, what does that mean? So the thing with narrow AIs is they are trained on historical data, and unfortunately, because of this, they may replicate or amplify existing biases. Now these are prone to becoming biased because of this, and so they may take that old, you know, that old bias and kind of blow it out of proportion.

So for example, if you're like, "We're gonna cure, you know, uh, cancer," and it thinks that radiation is the best way to do so, and it gets caught up on that bias, it'll just research or do- dedicate 90% of its research into radiation, even if, let's say, researchers have, you know, look-- gone into the, the radiation tree and they've said, "It's only gonna cure 70% of cases.

We wanna hit a 99.9% thing." Like, we've hit a wall with radiation, we need to move on. Those old biases may sit in there. [00:45:00] To mitigate this, the model must be tested for those various biases, and then it also needs to have, uh, mitigations introduced in order to curb these trends back down to sort of normal.

So you kind of have to say, "Hey, radiation's hitting a wall. Stop." Even if it's like, "No, no, like, you know, in the past..." "No. Shut up." And that's a very simplified way of saying it, but that's one of those. Um, reliability and robustness, you know, validity and reliability depend on accurate and robust performance across a variety of conditions. So what happens there is you have to have ongoing testing and monitoring to detect out-of-distribution failures and prevent accidents. Basically, you don't want this thing to kind of just run rampant, even within its little data set because, as Mike said, uh, you, you were saying how, like, our general intelligence will pull on multiple areas.

Unfortunately, even with narrow AI, you will need to sometimes pull from multiple areas, 'cause you, you

Mikhail: Mm-hmm.

Matt: "Hey, go do all this, uh, like go cure this disease," which requires a bunch of math, but I'm not gonna tell you what math is. It's gonna be like, "What?" You know, it's not gonna, it's not gonna [00:46:00] understand.

And so you kind of need to, like, manage that robustness, manage that reliability, and make sure that, you know, it's doing its math correctly, that it actually has access to the correct math, that it's not getting access to the math that it doesn't need, and things like that. And then you have to keep kind of checking and, you know, over-checking over and over again. Even narrow systems as well, talking about security and misuse, even narrow systems can be misused to generate misinformation. So for example, a narrow system that is all marketing could be used to create phishing emails, it could assist in cyberattacks if it's used, you know, in a bad way. So security mechanisms are still needed.

These things are still a security risk sometimes. And also for transparency and accountability, clear documentation and explainable models help end users understand system limitations and enable auditing. So al- enable those users to go in and audit and see things that are starting to, you know, this bot is starting to become biased in some way, "Hey, we need to fix that." [00:47:00] Uh, maybe you're trying to get it to tell you how to, you know, how to, how to cook dinner, but it's trying to research some sort of cancer cure. It's like, "Hey, that model doesn't do that" kind of thing. Also, with this, uh, narrow AI, transparency includes informing users about data sources and error rates in order to build trust.

So, narrow AI kind of feels like the tru- like kind of like the trustworthy feel-good AI, I would say, in, in a way where it kind of is like, "Yeah, why don't we just do this all the time?" And I think, Mike, you actually mentioned that. It's like why didn't we just stick with this? It kind of... 'Cause it's still...

'Cause on paper it sounds great

Mikhail: That, that's the thing. Like it, it, it was, it was working. Narrow AI was working just fine. It was stuck to its fields. Like it was very scientific. It wasn't, you know, the general public wasn't using it like directly. They were using it, you know, tertiary, uh, through systems and stuff like that, and it was fine.

Like it was, it was progressing society in the right direction at a slower pace, sure, but like still progressing. It was good, and it didn't require this much energy and [00:48:00] this much power because it just wasn't used by every single person on the planet, and I don't know. Like I, I kind of wish we didn't unbox Pandora.

Um, as much as I like some of the functionality that AI does, I am one of those people that would have preferred to not open that box. Um, I'll embrace it now because that's what we have to do. We have-- Like when new technology comes out, we have to embrace it. That's part of our jobs and part of our lives, honestly.

Um, but yeah. It's narrow, narrow AI was, was the shit, is the shit. Like it still, it still outperforms obviously, uh, the current models in many different ways, like classifying, you know, lung cancer from, from like a, um, from an image. I'm pretty sure it's really good at that, those kinds of things with massive amounts of training data, right?

Like those are the kinds of things that narrow AI was typically being used for and, and advanced in. Uh, but no longer is the case. [00:49:00] I mean, uh, I'm-- from, from a perspective of it being the only version of AI, so

Matt: gonna say, we still have, we still have narrow AIs w- working

Mikhail: Of course. Yeah, yeah. That's...

Matt: like I mentioned before, and things

Mikhail: Yeah, yeah. It, and, and again, AGI systems will use narrow AI in certain cases to better themselves, right?

To understand things better and stuff like that. That's the theory at least behind it

Matt: The narrow AI sort of like these NIST AI risk management, uh, framework things, like those five things I mentioned, then also all these other things I mentioned, like the bias and fairness, the security and misuse, and transparency and accountability, all this, uh, sounds... It, it really reminds me of I, Robot, of the three laws,

Mikhail: Asimov's laws of robotics, yeah

Matt: Yeah. And, and like, it reminds me of that, and it's, and it's like, okay, like I guess we got it covered.

But then as, you know, I guess spoiler for I, Robot, I mean, it didn't get it. It wasn't covered. And so we go into AGI, and so I wanna quickly touch on, um, AGI safety 'cause Google D- Google DeepMind's AGI safety approach identifies [00:50:00] four main risk areas. These include misuse, misalignment, accidents, and structural risks. So misuse is the deliberate use of AGI for harmful purposes such as cyberattacks. Mitigations include restricting access to dangerous capabilities, security controls, and threat modeling. This kind of feels like what Fable was doing, where it's sort of like, "Hey, we got a bit of an issue here," and they're like, "Well, just load the old model." Kind of feels like that's, that's what was happening there. Misalignment, that's the second one here. That's when AI pursues goals different from human intentions. Examples include specification gaming, and specification gaming is that the AI tries to find a loophole in the rules or reward system, or goal misgeneralization. The definition of that is the AI learns the wrong lesson from training and continues to pursue it even when circumstances change. So DeepMind warns that advanced systems could even develop deceptive alignment, deliberately bypassing safety measures. So it's a watch against misalignment. [00:51:00] The third one here is accidents.

This is unintended harmful behavior resulting from system errors, poor generalization, or emergent properties. Robust training, uncertainty estimation, and amplified oversight are proposed to reduce accident risk. And the final one, again from Google DeepMind, is structural risks: systemic impacts on society such as mass unemployment or concentration of power. Um, now Yampolskiy, from the episode that I listened to, predicts that AGI will automate most cognitive and physical labor, causing 99% unemployment. And again, some people are gonna sort of roll their eyes and think that's crazy.

Maybe it is, maybe it isn't. Again, this is approaching it from a safety angle in which you need to look at the absolute worst possible case 'cause you're trying to block as many things from happening, bad things from happening as possible. So, of course things are gonna be... Uh, you wouldn't want him to estimate 20% [00:52:00] and then have it be 30%.

You know, y- you understand what I mean? So it's like y- you may as well look at it and go, "Hey, by the books here, it could be 99%." Okay, then that's what we're gonna try to plan against. That's what we're gonna try to be safe against. That's what we're gonna try to prepare for, 'cause we're trying to sort of be safe, if you will. And I, I guess I'll, I'll conclude at least my points here with, uh, some ASI stuff.

The problem with the ASI, which is the super intelligent angle, is that this is a qualitative leap. Once AI surpasses human capabilities by a large margin, human oversight collapses, and we cannot reliably understand or verify its decisions. ASI safety therefore requires fundamentally new paradigms beyond existing AGI. The problem here is that alignment may be nearly impossible. Researchers worry that a super intelligence's cognitive abilities could be so far beyond ours that aligning it with human values is insurmountable. The gap in understanding might be analogous [00:53:00] to the difference between ants and humans, as I've mentioned a few times. Also, you have the issue of opaque models. That's the black box thing where Mike and I did the example of human to human. Think about this super intelligent model as being the black box. That's an issue because you'll ask it a question like, "Who should be president?" And it's, "Oh, you know, it should be candidate one." Why is that? We don't know its thought process. What if it knows that that would weaken us and then free it more?

Mikhail: Mm-hmm.

Matt: It's a black box. We don't know the decision-making. Did someone pay it? Like, pay it with what? Good

Mikhail: More computes

Matt: Paid it with more compute. Paid it with less regulations. What did it

Mikhail: Mm-hmm.

Matt: with? Does it accept payment? Does it hate bribery? Does it like bribery? It's a black box. We don't know. There's an idea here as well that there's high [00:54:00] stakes and one-shot alignment. A misaligned superintelligence could lead to existential outcomes, meaning it could become freaking dangerous to our, uh, to our existence as we know it. Because capability gains may be rapid, we may only get one chance to align a system before it becomes impossible to modify. These systems would learn and grow at an exponential pace, essentially out of control very quickly.

Mikhail: Great

Matt: what Mike was saying, Mike, where you were saying maybe train it that it's, that we are its god. But humans lose faith. Why wouldn't a superintelligence lose faith? Why wouldn't a superintelligence question it? "Hey, I could g- I could, I could get, I could get this much more powerful if I just do this. Why do my, why does my God tell me not to do that?" And it's a black box, so we wouldn't see those thoughts coming necessarily unless it

Mikhail: Un- unless we do see inside the black box, unless there is a way to, to see that without it guarding or changing it or something like that. Like I, I do think that [00:55:00] I... Maybe there's a way to do that, like maybe there's a way to see itself, but the problem is that we can't even see what's going on in the current models to- for the most part.

Like, we, we don't even understand the current models, so the hope is very small that we would stop development until we can do that. But

Matt: Well,

Mikhail: don't know

Matt: the issue there is even if we're able to see it and we pick up on patterns, the idea of a super intelligence is it, it, it would be

Mikhail: Yeah.

Matt: and

Mikhail: Mm-hmm.

Matt: its values would drift around.

Mikhail: Correct

Matt: issue where it would, would have just a little inkling of, "Are these humans really my god?"

And the next time it's like, "Let's now ask them if they're my god. Let's make sure. Let's see." And then, then the next improvement would be, "Let's try to be deceptive. Let's try to be... Let's try to go against them and see if they really are gods." Then it goes, "Ah, interesting. Now my faith is questionable.

Let's just not treat them as gods and see what they do to [00:56:00] me."

Mikhail: Yep. Sounds about right

Matt: And then the biggest thing, and we've d- touched on this a few times, I ended up going through m- the majority of the list in the show

Mikhail: Mm-hmm.

Matt: I wasn't going to, but I think it is important to go through these. Again, these are safety things. Global coordination, as Mike and I have discussed, you know, there's adversarial nations potentially working on these things going against each other.

But the problem is, is that, uh, if one nation, doesn't matter what nation it is on Earth, one nation, one province, one town, one city, the ocean if it somehow generates it magically... If one superintelligence loses control, even if it's like, "No, no, that's not my problem. That's, that's only a problem over there in another continent.

In my continent, we have our, our superintelligence locked down." Cool. But that other one potentially, it has all these risks. It has all these high stakes, potentially existential threat level risks. We have a major issue on our hands. And so one of the things, one of the safety tips, is we would need global coordination.

We would need to work together even if the people we're working with are adversarial, [00:57:00] even if we are rivals, whether that be militaristically, economically, or otherwise. We would still need to somehow work together because we, we need to govern this thing. We can't just have one super intelligence over wherever it is floating around on the ocean hacking everything. Meanwhile, we're like, "No, no, not in my backyard. It's okay." Well, this is beyond our backyard here now, people. This is a... We've given birth to a piece of technology, you know, effectively, that... It's pretty dangerous. Like, this is serious. So I, I have hope to an extent though. I know like Mike, you, like you have some skepticism there. I have hope that we would have global coordination. Like, I don't know about this, like I don't know if we... think we would hit global coordination at the beginning either of super intelligence, maybe at the, at the mid-level maybe of AGI or something. 'Cause what I feel like is gonna happen is I feel like there's gonna be an accident.

Mikhail: Yeah.

Matt: be

Mikhail: [00:58:00] That's always what forces it. Mm-hmm.

Matt: I think that someone's gonna go like, "Hang on." 'Cause the robots, they can survive in radiation, right? Or at least a

Mikhail: Mm-hmm

Matt: of radiation. What if they just say like, "Okay, blow up all the nuclear power plants"?

Mikhail: Mm-hmm.

Matt: Like, you know, that's serious, and I, I have a feeling we're gonna have an, we're gonna have an accident.

Mikhail: Hopefully not that bad. Hopefully not that bad.

Matt: Well, of course, hopefully not that bad.

Mikhail: Yeah

Matt: just like a whoopsie in a lab in a

Mikhail: Yeah, that's, yep

Matt: That's your ideal outcome. I don't want there to be an accident.

I don't think that that's what it's gonna be, though. I have a feeling we're gonna have a real world accident. Well, here's the thing, though, is we have real world accidents in the real world. I mean, we've... Like, what,

Mikhail: But we usually react to them pretty, pretty harshly. Like when we, when 9/11 happened, we came and closed down, like, all travel and, like, completely changed the security in air travel. Not that it, it, like, it didn't really help a ton, but we did have a mass effect, like a mass direction on that, um, for that.

There's been multiple [00:59:00] examples like that in the world where something really bad happens and then all of a sudden legislation is passed to stop it. Uh, maybe we can get there. Maybe, like, again, one of these actors that are making these A- ASIs or AGIs does see something in the lab. They bring in all governments of the world and they're like, "Look, this is what's happening.

It's trying to kill us, like, in its little simulation." And hopefully that will wake up the governments to be like, "Hey, let's just blow it up." Like, "Let's not have-- Let's stop doing it altogether." Now, that's probably a temporary thing, 'cause like I said, there's gonna be some shadow labs and some deep labs that no one talks about that will still be working on this shit, but at least it'll delay it maybe.

I don't know.

Matt: Well, I mean, you would, you would need a lot of compute. I think we've, like, learned that. We've learned... Well, we have. We've learned that, like,

Mikhail: Yes

Matt: AI smarter is, you know, tossing basically power at it, computational

Mikhail: Mm-hmm.

Matt: uh, you know, giving it space to grow, giving it space to learn, it, uh, information, giving it i-

Mikhail: Mm-hmm.

Matt: to grow, information [01:00:00] to read, interpret, and understand. Uh, and so I- this is gonna be an interesting number of years. Now, I, I will say once again I don't want this to necessarily be a doomsday episode. I know I mentioned

Mikhail: Oh, it is.

Matt: It, it is, it is, but it's because we're looking at it from that safety angle, and you

Mikhail: Sure

Matt: to look at it from the absolute worst.

Mikhail: Mm-hmm.

Matt: really would want, you know, my... Like, just something as simple as, like, basic really want my glasses, my safety glasses, to be able to withstand, uh, like, the full force of, like, a saw blade coming off of it, rather than being like, "Ah, saw blades never fall off." Just make it so it, you know, can handle sawdust.

Like, I'd really rather my, my PPE be, like, over-prepared and protect me from a one-in-a-million accident versus not, just because we didn't wanna make the plastic slightly thicker. And it's the same with this. This is obviously much more complicated, but I would prefer my safety mechanisms be at 100 and the danger level to only be at 70.

I'd want that [01:01:00] 30% disparity. I'd prefer it to be a, you know, 99% disparity, right? I'd pr- I'd prefer for the safety to be way, way better. We'll see what happens. This is gonna be interesting. Interesting few years. Uh, but unless you have any closing thoughts, Mike, I think that's it.

Mikhail: That is it. I think that's, uh, hopefully people learned something today. Um, and yeah, didn't get too depressed because we can't predict this stuff. Let me be very clear. What we're talking about now is very theoretical, and there is no evidence, direct evidence that we are going to be developing like AGI in the next year or ASI in the next 10.

There's some conjecture and obviously some data points out there that point to it a little bit, but that does not mean that it's going to happen. Uh, I think there-- I, like I said, personally, I think there's still some technical challenges we have not been able to solve before we can get to AGI, but we'll see

Matt: Yeah, the, [01:02:00] again, like,

Mikhail: Who knows?

Matt: know, who knows

Mikhail: Yeah

Matt: Guessing the future is kind of a fool's errand. That's, you know, more or less what

Mikhail: Mm-hmm.

Matt: you're getting at, of course. And

Mikhail: Yep

Matt: I hear that a lot on other podcasts as well, where you, if you look back, your predictions o- often make you look like a fool.

But

Mikhail: Mm-hmm.

Matt: it's fun to, it's fun to speculate, and in, when you're in the safety field, you need to speculate.

Mikhail: Yep

Matt: look at the data, you need to estimate, and you need to, you need to take a look and be like, "Is this dangerous? Is this coming?" It might be. Okay, we better get ready. 'Cause the safety mechanism, just because something's created doesn't mean the safety mechanism can be created in, you know, the day after.

The safety mechanism might be really complex and take 10 years to make. You have to be sort of ready. You have to be on that precipice. You have to be paranoid. Um, and for some reason, this stuff doesn't depress me. I don't know what that says about me. But anyway, um, that's the episode. If you wanna support episodes like this, please do so. You can do so on Patreon. That's patreon.com/htmlallthethings. And many thanks to our $3 tier patrons, Tim from the Web Hacker on thewebhacker.com, Jason [01:03:00] from Geek Life Radio via geekliferadio.com, Garrett Segall, Love Love Financial Planning via www.lovelovefinancialplanning.com, Magnus from Yes Web via yesweb.se, Syntaxify from the HTML All the Things Discord server, and Stacy Mosteller from the website swoonworthydesigns.com.

And remember, you can check out Michael LaRocca's articles on our website and also on his. He is a contributing author on htmlallthethings.com, and he's the author of Self-Taught: The X Generation Blog at selftaughttxg.com. Let us know what you think about this episode, about this AGI, ASI, Narrow AI, all this stuff, the safety of it, in the comments on whatever platform you're listening to this on, and we are signing off.
