Over the centuries, innovation has been regarded with suspicion as much as hope—as in ancient Greece, when Socrates believed that writing things down, rather than relying on memory for everything, would make people stupid. The written word was sure, he said, to “introduce forgetfulness into the soul,” and thereby ruin the mind. With this anecdote about Socrates, computational social scientist Matt Groh opened his three-week “Make-A-Fake” course for the MIT Museum. Groh went on to explain that the fakes he would be talking about are classified as synthetic media—a category that includes not only altered videos, but AI-generated artifacts like faked cover letters, faked news articles, and faked images.
Groh, a Ph.D. candidate in the Affective Computing group of MIT’s Media Lab, designed the class so that participants would not only learn how to create their own fakes, using free online tools, but also how to distinguish a fake video or image from an authentic one. During each meeting, he also discussed some of the ways that AI might be misused—moments that always elicited plenty of discussion from the group—and described various efforts that have emerged to help society suss out such forgeries. As he said, “We are all susceptible to lies, and therefore we’re all susceptible to deep fakes too.”
At the moment, the best-known artificial intelligence tool might be ChatGPT—which is able to generate text thanks to what is known as a large language model (LLM). LLMs like the one that powers ChatGPT are computer programs, or algorithms, that depend on, or pull from, massive language datasets fed into them during the building process; those enormous amounts of language allow the AI to recognize, summarize, translate, and predict text; to generate authoritative-sounding paragraphs and pages; and—at least in the case of the text-to-image tools that Groh would introduce at another class—to generate other content too.
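The “predict text” step can be illustrated, in grossly simplified form, with a toy bigram model: counting which word tends to follow which in a tiny stand-in corpus. The corpus and names below are illustrative only—real LLMs use neural networks trained on billions of documents, not word counts—but the core task, predicting the next word from patterns in past text, is the same:

```python
from collections import Counter, defaultdict

# A tiny stand-in for the web-scale text an LLM is trained on.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Tally which word follows which: a bigram model, the simplest
# ancestor of the next-word prediction LLMs perform at vast scale.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(word: str) -> str:
    """Return the word most often seen after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict("the"))  # "cat": it follows "the" twice; "mat" and "fish" once each
```

An LLM does essentially this, except its “counts” are 175 billion learned parameters and its context is an entire passage rather than a single preceding word.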
Until very recently, ChatGPT was not only the most-talked-about LLM-based tool but was also built on perhaps the largest model: Trained on roughly 500 billion words, GPT-3 contained 175 billion parameters—the adjustable numerical values a model tunes during training. But last year, an algorithmic competitor, BLOOM, edged past it, with 176 billion parameters. And yet, these tools are like the skyscrapers of old: A new height is barely reached before it is surpassed—and future GPT models are widely expected to be larger still, perhaps by orders of magnitude.
Despite what seems to be a widespread assumption that these models are “trained on” data that accurately reflects the world, like a mirror might reflect an image, Groh pointed out that, in fact, the systems have built-in biases—just as bias was built into Google’s facial recognition mechanisms. “A lot of world languages, like those spoken in southern Africa, are barely represented” in many of these LLMs, he noted. What’s more, he said, “There’s no clear breakdown of how many romance novels are fed to the AI, how many philosophers.”
Elvis Presley Performing Live on "America's Got Talent"
Towards the end of the first meeting, Groh helped familiarize the class with the sub-group of synthetic media referred to as “deep fakes”—videos characterized by one of three types of videographic alteration: face-swapping, head puppetry, or lip-syncing—by showing a clip from a recent primetime network television show: Elvis Presley performed onstage at “America’s Got Talent,” while the show’s hosts served as background singers. Or, at least, so it seemed. In fact, the real star of the show that night was the engine of artificial intelligence used to create compelling footage of the long-dead King of Rock ‘n’ Roll.
Characteristics of Deep Fakes
On the second Monday night in February that the class met, Groh opened with a clip of Tom Cruise talking about “the wonderful world of deep fakes”—except the actor talking wasn’t the real Tom Cruise, but an uncanny double. The video was one of a series made using a Cruise impersonator and deepfake technology—a double whammy of forgery that led to exceedingly realistic-seeming footage. But, as Groh pointed out, it wouldn’t be easy for the average person to replicate such an effort: The trickster behind the Tom Cruise fakes—AI effects artist Chris Umé—spent two and a half months feeding videos and images of the Mission: Impossible star to an AI tool, to ensure the program could videographically enhance recordings of the Cruise look-alike so that the imposter’s movements and facial gestures would be pristinely Cruise-esque. In fact, Groh noted, it takes 200,000 attempts or tweaks, on average, before AI video tools begin to produce realistic-looking fakes.
While showing a few more clips and videos, Groh asked the class to identify the fakes, and to explain why they thought this one or that was authentic or falsified. The results among the sold-out group of fifteen, many of whom were very tech-savvy, were mixed: Some students—particularly a young woman who works in tech—could give astoundingly accurate explanations for why a fake was a fake (“The shadows made by the sun should be on the other side of his face,” she said). But even in that group of relatively affluent and well-educated people, who had already (thanks to Groh) acquired better-than-average knowledge about deep fakes, plenty were stumped.
As it stands, the best way to spot a falsified video or image is to look it in the eyes, so to speak. “Deep fakes are all about manipulating faces,” Groh said. AI tools alter faces in videos or images in one of three ways: face-swapping, in which one person’s face is mapped onto another’s body; head puppetry, in which a target’s entire head is animated to mimic a source performance; and lip-syncing, in which only the mouth is altered, to match new audio.
Groh noted that humans may have an advantage over algorithms when it comes to sussing out deep fakes: humans process faces holistically, a sophisticated ability honed during an important stage of human development. At the same time, humans are not great at spotting implausibilities because we don’t expect them; most of us interact with the world with a relative lack of suspicion, assuming that what we see is usually what we get. What’s more, we’ll reach a point in the near future when it will be virtually impossible, even with such pointers, to distinguish deep fakes from authentic artifacts—the technology will simply be too powerful.
The Political Dangers of Deep Fakes
A student’s question about the dangers of deep fakes prompted Groh to bring up an example that arose early in the Ukraine War. “Someone put out a deep fake of [Ukrainian President Volodymyr] Zelensky saying, ‘I resign, I forfeit to Russia,’” Groh explained. “Twenty minutes after it came out, Zelensky released a video saying it wasn’t true, he wasn’t resigning. World leaders have access to the media. They can get ahead of this kind of thing very quickly.” At the same time, Groh said, who controls the means of communication matters. If Ukraine had lost control of its social media accounts, and the Russian government had dominated Ukraine’s media venues, the situation would have been very different. “It can get tricky,” he said.
On the flip side, when producing misinformation, fake videos, or manipulated images is so easy, it’s also easy for a liar—like a politician or other prominent figure—to claim that an authentic video of them is a fake, if a reel of them saying something unpopular or offensive gets out, for instance. That points to a future in which belief in anything is reduced and skepticism of everything is increased, Groh said. “People have been talking about this for a while,” he said, pointing to a 2018 paper that gives the posited effect a name: the liar’s dividend. Nonetheless, Groh argues, increased skepticism may be a good thing. “It doesn’t have to be [that people will think], ‘I don’t believe anything I see,’” he explains. “It could be, ‘I’m going to spend time thinking critically about everything I see.’ I’m not saying that everyone will always do that. But when it’s just as easy to create a fake video as it is to lie, that will start changing the way we interpret visual media. And that’s the thing: everything is dynamic here.”
The Effort to Authenticate Content
Making fakes is already so easy that society is about to be exposed to an onslaught of unethical products made with synthetic media tools. Generative AI tools will allow bad actors to videographically impersonate targets, making people seem to say things they haven’t said, for instance; they could also allow hackers to break into bank accounts using simulated voice recordings.
At least one high-profile group is pushing back against the coming wave: A number of big news and tech companies are working together under the name The Coalition for Content Provenance and Authenticity (C2PA) to provide a means of authenticating online artifacts like videos and images. C2PA brings together the BBC, CBC Radio Canada, and the New York Times, as well as Microsoft, Twitter, and Adobe, with the goal of developing “open, global technical standards” that will serve as a kind of watermark, to help establish or prove the genuineness of clips, photos, and other media used online. Those standards or tools should help “creators to claim authorship while empowering consumers to make informed decisions about what to trust,” according to C2PA’s site. As Groh explained, the C2PA approach encourages big media companies to add to images and videos a kind of digital serial number or signature—i.e., cryptographically bound metadata; these tags allow viewers to see where and how media originated and whether it has been tampered with.
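The idea of cryptographically binding a tag to a piece of media can be sketched in miniature. The snippet below is not C2PA’s actual format—the real standard uses certificate-based public-key signatures and a rich manifest of provenance data—but a minimal stand-in using Python’s standard-library HMAC to show the core mechanism: a tag computed from the exact bytes of a file, so that any later alteration, however small, makes verification fail.

```python
import hashlib
import hmac

# Stand-in for a publisher's private signing key (real C2PA uses
# public-key certificates, not a shared secret like this).
PUBLISHER_KEY = b"demo-signing-key"

def sign_media(media_bytes: bytes) -> str:
    """Bind a tag to this exact content: hash the bytes, then sign the hash."""
    digest = hashlib.sha256(media_bytes).digest()
    return hmac.new(PUBLISHER_KEY, digest, hashlib.sha256).hexdigest()

def verify_media(media_bytes: bytes, tag: str) -> bool:
    """Recompute the tag and compare; any edit to the bytes breaks the match."""
    return hmac.compare_digest(sign_media(media_bytes), tag)

original = b"...video bytes..."
tag = sign_media(original)
print(verify_media(original, tag))            # True: untouched content checks out
print(verify_media(b"...edited bytes...", tag))  # False: tampering is detected
```

The design point is that the tag proves two things at once: who vouched for the media (only the key holder could have produced it) and that the bytes are unchanged since signing.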
As good as this workaround may sound, getting the public to buy in isn’t a given—and if people don’t know about, demand, and look for such marks, the marks will be essentially useless. Groh touched on this concern in class, and a student brought it up again afterwards. The student—who is also a Lecturer at the Harvard Graduate School of Education, and Co-Chair of Arts and Learning there—pointed out that to be able to detect a deep fake, the public will need some basic understanding of the underlying issues, along with some forgery-detecting skills. “Different people have different educations, different skills,” said the lecturer. “And among schools, there is such a discrepancy around how young people are being taught”—including about technology. Among different communities and populations, too, discrepancies exist.
Much of the public is also unlikely to realize that, as Groh pointed out, the sites providing unpaid access to generative AI will be based largely on an advertising model, as is the Internet writ large. Although users will supposedly use these tools for “free,” they will pay in the form of giving the companies behind the sites access to their online behavior—and to their eyeballs, which will also be exposed to the ads those companies’ algorithms believe will tempt them most.
During the energetic focus group discussion after the last class, led by MIT Museum Director John Durant and attended by all but one of the fifteen students from the sold-out class, the participants talked about how the course had changed their take on the possible harms and benefits of generative AI. One man, a Ph.D. student in psychology at MIT, talked about the corrosive influence he expects deep fakes will have on people’s faith in information. He pointed out that Steve Bannon, Donald Trump’s former chief strategist and the former head of Breitbart News—also a master of misinformation—argued that Trump’s political opponents didn’t matter; only news coverage did. “The real opposition is the media,” Bannon reportedly said. “And the way to deal with them is to flood the zone with sh*t.” As the student noted, “Deep fakes push the cost of making another marginal piece of sh*t to zero.”
ChatGPT can also generate misinformation, even when it is not directed to do so—which is to say that a user may engage ChatGPT with the goal of getting a factual write-up on some subject, only to unknowingly end up with an inaccurate bit of text. That was the case for one of the students, a recently retired software engineer, who, as part of a class assignment, asked ChatGPT to produce a description of a company for which he’d worked—and the AI system told him, incorrectly, that the company had gone out of business while he was employed there. Where did this error come from? Very likely—as one student knew, and Groh confirmed—from the dataset that ChatGPT was trained on, which amounts to a vast swath of the freely accessible, never-fact-checked Internet. The store of data was scraped up by the human programmers behind ChatGPT; everything available online was dumped in, regardless of quality.
Another class attendee, a recent immigrant from Turkey, pointed out that in his homeland, most people get their information from social media because they don’t trust mainstream news channels, which are largely government-controlled. “Discussing things with people becomes impossible,” he said, “because they are so influenced by social media”—which isn’t necessarily any more accurate than state-backed news; moreover, the biases in social media streams may not be as obvious. He speculated that in the U.S., conversations about current and political events may soon begin to resemble those in Turkey; that generative AI will expose people to increasingly “noisy” information, which will further polarize an already-divided population.
AI began to be developed about sixty years ago, as Durant pointed out, and yet, as he said, “It feels like we’re suddenly going up a steep curve.” What did the students make of this sudden acceleration? The retired software engineer brought up his concerns about “secondary and tertiary effects in society, things we can’t foresee.” For instance, he said, “The generative AI, especially the large language models, I feel like they will have a big effect on jobs. I think so many knowledge professions will be affected by it. An entry-level lawyer won’t be able to find work because AI can do what someone starting out in law used to do. In five or ten years, so much could change.” And these are simply the changes he could foresee. What the unknown unknowns are, even AI can’t predict.
Photography by Ashley McCabe