All hail our new AI overlords

AI is going to have a tremendous impact on our lives. How can we get ready for it?

How AI solutions will change our lives

Disclaimer: the opinions of the author are his own and may not reflect the official position of Kaspersky (the company).

Beyond the various geopolitical events that defined 2022, on the technological level, it was the year of AI. I might as well start by coming clean: until very recently, whenever I’d be asked about AI in cybersecurity, I’d dismiss it as vaporware. I always knew machine learning had many real-world applications; but for us in the infosec world, AI had only ever been used in the cringiest of product pitches. To me, “AI-powered” was just an elegant way of vendors of saying “we have no existing knowledge base or telemetry, so we devised a couple of heuristics instead”. I remain convinced that in more than 95% of cases, the resulting products contained little actual AI either. But the thing is, while marketing teams were busy slapping “AI” stickers on any product that involved k-means calculus as part of its operation, the real AI field was actually making progress.

The day of reckoning for me came when I first tried DALL-E 2 (and soon thereafter, Midjourney). Both projects allow you to generate images based on textual descriptions, and have already caused significant turmoil in the art world.

This art is generated with Midjourney using the prompt "All hail our new AI overlords"

This art is generated with Midjourney using the prompt “All hail our new AI overlords”

Then, in December of last year, ChatGPT took the world by storm. Simply put, ChatGPT is a chatbot. I assume most people have already tried it at this point, but if you haven’t, I strongly suggest you do (just be sure not to confuse it with a virus). No words can convey how much it improves over previous projects, and hearing about it just isn’t enough. You have to experience it to get a feel for everything that’s coming…

ChatGPT says for itself

ChatGPT says for itself

Language models

In the words of Arthur C. Clarke, “any sufficiently advanced technology is indistinguishable from magic”. I love how technology can sometimes bring this sense of wonder into our lives, but this feeling unfortunately gets in the way when we attempt to think about the implications or limits of a new breakthrough. For this reason, I think we first need to spend some time understanding how these technologies work under the hood.

Let’s start with ChatGPT. It’s a language model; in other words, it’s a representation of our language. As is the case with many large machine learning projects, nobody really knows how this model works (not even OpenAI, its creators). We know how the model was created, but it’s way too complex to be formally understood. ChatGPT, being the largest (public?) language model to date, has over 175 billion parameters. To grasp what that means, imagine a giant machine that has 175 billion knobs you can tweak. Every time you send text to ChatGPT, this text becomes converted into a setting for each of those knobs. And finally, the machine produces output (more text) based on their position. There’s also an element of randomness, to ensure that the same question won’t always lead to the exact same answer (but this can be tweaked as well).

This is the reason why we perceive such models as black boxes: even if you were to spend your life studying the machine, it’s unclear that you’d ever be able to figure out the purpose of a single knob (let alone all of them). Still, we know what the machine does because we know the process through which it was generated. The language model is an algorithm that can process text, and it was fed lots of it during its training phase: all of Wikipedia, scraped web pages, books, etc. This allowed for the creation of a statistical model that knows the likelihood of having one word follow another. If I say “roses are red, violets are”, you can guess with a relatively high degree of confidence that the next word will be “blue”. This is, in a nutshell, how any language model works. To such a model, finishing your sentence is no different from guessing which sequence of words is likely to follow your question based on everything it’s read before. In the case of ChatGPT, there was actually one more step involved — called supervised fine-tuning. Human “AI trainers” had numerous chats with the bot and flagged all answers deemed problematic (inaccurate, biased, racist, etc.) so it would learn not to repeat them.

Roses are red, violets are… Guess what?

If you can’t wrap your head around AI, file it under “math” or “statistics”: the goal of these models is prediction. When using ChatGPT, we very easily develop the feeling that the AI “knows” things, since it’s able to return contextually-relevant and domain-specific information for queries it sees for the first time. But it doesn’t understand what any of the words mean: it’s only capable of generating more text that “feels” like it would be a natural continuation of whatever was given. This explains why ChatGPT can lay out a complex philosophical argument, but often trips up on basic arithmetic: it’s harder to predict the result of calculus than the next word in a sentence.

Besides, it doesn’t have any memory: its training ended in 2021 and the model is frozen. Updates come in the form of new models (i.e., GPT-4 in 2024) trained on new data. In fact, ChatGPT doesn’t even remember the conversations you’re having with it: the recent chat history is sent along with any new text you type so that the dialog feels more natural.

Whether this still qualifies as “intelligence” (and whether this is significantly different from human intelligence) will be the subject of heated philosophical debates in the years to come.

Diffusion models

Image generation tools like Midjourney and DALL-E are based on another category of models. Their training procedure, obviously, focuses on generating images (or collections of pixels) instead of text. There are actually two components required to generate a picture based on a textual description, and the first one is very intuitive. The model needs a way to associate words with visual information, so it’s fed collections of captioned images. Just like with ChatGPT, we end up with a giant, inscrutable machine that’s very good at matching pictures with textual data. The machine has no idea what Brad Pitt’s face looks like, but if it’s seen enough photos of him, it knows that they all share common properties. And if someone submits a new Brad Pitt photo, the model is able to recognize him and go “yup, that’s him again”.

The second part, which I found more surprising, is the ability to enhance images. For this, we use a “diffusion model”, trained on clean images to which (visual) noise is gradually added until they become unrecognizable. This allows the model to learn the correspondence between a blurry, low-quality picture and its higher-resolution counterpart — again, on a statistical level — and recreate a good image from the noisy one. There are actually AI-powered products dedicated to de-noising old photos or increasing their resolution.

An example of increasingly low-quality images used to train diffusion models with my trusty avatar

An example of increasingly low-quality images used to train diffusion models with my trusty avatar

Putting everything together, we are able to synthetize images: we start from random noise, and “enhance” it gradually while making sure it contains the characteristics that match the user’s prompt (a much more detailed description of DALL-E’s internals can be found here).

The wrong issues

The emergence of all the tools mentioned in this article led to a strong public reaction, some of which was very negative. There are legitimate concerns to be had about the abrupt irruption of AI in our lives, but in my opinion, much of the current debate focuses on the wrong issues. Let us address those first, before moving on to what I think should be the core of the discussion surrounding AI.

DALL-E and Midjourney steal from real artists

On a few occasions, I have seen these tools described as programs that make patchworks of images they’ve seen before, and then apply kind of filters that allow them to imitate the style of the requested artist. Anyone making such a claim is either ignorant of the technical realities of the underlying models, or arguing in bad faith.

As explained above, the model is completely incapable of extracting images, or even simple shapes from the images it is trained on. The best it can do is extract mathematical features.

What people believe DALL-E starts from (left) versus what DALL-E actually starts from (right)

What people believe DALL-E starts from (left) versus what DALL-E actually starts from (right)

There’s no denying that many copyrighted works were used in the training phase without the original authors’ explicit consent, and maybe there’s a discussion to be had about this. But it’s also worth pointing out that human artists follow the exact same process during their studies: they copy paintings from masters and draw inspiration from artwork that they encounter. And what is inspiration, if not the ability to capture the essence of an art piece combined with the drive to re-explore it?

DALL-E and Midjourney introduce a breakthrough in the sense that they’re theoretically able to gain inspiration from every picture produced in human history (and, likely, any one they produce from now on), but it’s a change in scale only — not in nature.

Compelling evidence of Wolfgang Amadeus Mozart stealing from artists during his training phase

Compelling evidence of Wolfgang Amadeus Mozart stealing from artists during his training phase

AI makes things too easy

Such criticism usually implies that art should be hard. This has always been a surprising notion to me, since the observer of an art piece usually has very little idea of how much (or how little) effort it took to produce. It’s not a new debate: years after Photoshop was released, a number of people are still arguing that digital art is not real art. Those who say it is put forward that using Photoshop still requires skill, but I think they’re also missing the point. How much skill did Robert Rauschenberg require to put white paint on a canvas? How much music practice do you need before you can perform John Cage’s infamous 4′33″?

Even if we were to introduce skill as a criterion for art, where would we draw the line in the sand? How much effort is enough effort? When photography was invented, Charles Baudelaire called it “the refuge of every would-be painter, every painter too ill-endowed or too lazy to complete his studies” (and he was not alone in this assessment). Turns out he was wrong.

ChatGPT helps cybercriminals

With the rise of AI, we’re going to see productivity gains all across the board. Right now, a number of media outlets and vendors are doing everything they can to hitch a ride on the ChatGPT hype, which leads to the most shameful clickbait in recent history. As we wrote earlier, ChatGPT may help criminals draft phishing emails or write malicious code — none of which have ever been limiting factors. People familiar with the existence of GitHub know that malware availability is not an issue for malicious actors, and anyone worried about speeding up development should have raised those concerns when Copilot was released.

I realize it’s silly to debunk a media frenzy born of petty economic considerations instead of genuine concerns, but the fact is: AI is going to have a tremendous impact on our lives and there are real issues to be addressed. All this noise is just getting in the way.

There’s no going back

No matter how you feel about all the AI-powered tools that were released in 2022, know that more are coming. If you believe the field will be regulated before it gets out of control, think again: the political response I’ve witnessed so far was mostly governments deciding to allocate more funds to AI research while they can still catch up. No one in power has any interest in slowing this thing down.

The fourth industrial revolution

AI will lead to — or has probably already led to — productivity gains. How massive they are/will be is hard to envision just yet. If your job consists in producing semi-inspired text, you should be worried. This applies if you’re a visual designer working on commission too: there’ll always be clients who want the human touch, but most will go for the cheap option. But that’s not all: reverse engineers, lawyers, teachers, physicians and many more should expect their jobs to change in profound ways.

One thing to keep in mind is that ChatGPT is a general-purpose chatbot. In the coming years, specialized models will emerge and outperform ChatGPT on specific use-cases. In other words, if ChatGPT can’t do your job now, it’s likely that a new AI product released in the next five years will. Our jobs, all our jobs, will involve supervising AI and making sure its output is correct rather than doing it ourselves.

It’s possible that AI will hit a complexity wall and not progress any further — but after being wrong a number of times, I’ve learned not to bet against the field. Will AI change the world as much as the steam engine did? We should hope that it doesn’t, because brutal shifts in means of production change the structure of human society, and this never happens peacefully.

AI bias and ownership

Plenty has been said about biases in AI tools that I won’t get back into it. A more interesting subject is the way OpenAI fights those biases. As mentioned above, ChatGPT went through a supervised learning phase where the language model basically learns not to be a bigot. While this is a desirable feature, one can’t help but notice that this process effectively teaches a new bias to the chatbot. The conditions of this fine-tuning phase are opaque: who are the unsung heroes flagging the “bad” answers? Underpaid workers in third-world countries, or Silicon Valley engineers on acid? (Spoiler: it’s the former.)

It’s also worth remembering that the AI products won’t work for the common good. The various products designed at the moment are owned by companies that will always be driven, first and foremost, by profits that may or may not overlap with humankind’s best interests. Just like a change in Google’s search results has a measurable effect on people, AI companions or advisors will have the ability to sway users in subtle ways.

What now?

Since the question no longer seems to be whether AI is coming into our lives but when, we should at least discuss how we can get ready for it.

We should be extremely wary of ChatGPT (or any of its scions) ending up in a position where it’s making unsupervised decisions: ChatGPT is extremely good at displaying confidence, but still gets a lot of facts wrong. Yet there’ll be huge incentives to cut costs and take humans out of the loop.

I also predict that over the next decade, the majority of all content available online (first text and pictures, then videos and video games) will be produced with AI. I don’t think we should count too much on automatic flagging of such content working reliably either — we’ll just have to remain critical of what we read online and wade through ten times more noise. Most of all, we should be wary of the specialized models that are coming our way. What happens when one of the Big Four trains a model with the tax code and starts asking about loopholes? What happens when someone from the military plays with ChatGPT and goes: “yeah, I want some of that in my drones”?

AI will be amazing: it will take over many boring tasks, bring new abilities to everyone’s fingertips and kickstart whole new artforms (yes). But AI will also be terrible. If history is any indication, it will lead to a further concentration of power and push us further down the path of techno-feudalism. It will change the way work is organized and maybe even our relationship with mankind’s knowledge pool. We won’t get a say in it.

Pandora’s box is now open.