[Plura-list] The Coprophagic AI crisis

Cory Doctorow doctorow at craphound.com
Thu Mar 14 11:27:51 EDT 2024


Read today's issue online at: https://pluralistic.net/2024/03/14/14/inhuman-centipede

^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^

I'm on tour with my new, nationally bestselling novel *The Bezzle*! Catch me in Toronto (Mar 22), NYC (Mar 24) (with Laura Poitras!), Anaheim (Mar 29-31) and more!

https://pluralistic.net/2024/02/16/narrative-capitalism/#bezzle-tour

Name your price for 18 of my DRM-free ebooks and support the Electronic Frontier Foundation with the Humble Cory Doctorow Bundle:

https://www.humblebundle.com/books/cory-doctorow-novel-collection-tor-books-books

^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^

Today's links

* The Coprophagic AI crisis: Even the mystical account of AI's glorious future fails on its own terms.

* Hey look at this: Delights to delectate.

* This day in history: 2004, 2009, 2014, 2019, 2023

* Upcoming appearances: Where to find me.

* Recent appearances: Podcasts, events and more.

* Latest books: You keep readin' 'em, I'll keep writin' 'em.

* Upcoming books: Like I said, I'll keep writin' 'em.

* Colophon: All the rest.

^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^

🔥 The Coprophagic AI crisis

A key requirement for being a science fiction writer without losing your mind is the ability to distinguish between science fiction (futuristic thought experiments) and *predictions*. SF writers who lack this trait come to fancy themselves fortune-tellers who SEE! THE! FUTURE!

The thing is, sf writers cheat. We palm cards in order to set up pulp adventure stories that let us indulge our thought experiments. These palmed cards - say, faster-than-light drives or time-machines - are *narrative devices*, not scientifically grounded proposals.

Historically, the fact that some people - both writers *and* readers - couldn't tell the difference wasn't all that important, because people who fell prey to the sf-as-prophecy delusion didn't have the power to re-orient our society around their mistaken beliefs. But with the rise and rise of sf-obsessed tech billionaires who keep trying to invent the torment nexus, sf writers are starting to be more vocal about distinguishing between our made-up funny stories and predictions (AKA "cyberpunk is a warning, not a suggestion"):

https://www.antipope.org/charlie/blog-static/2023/11/dont-create-the-torment-nexus.html

In that spirit, I'd like to point to how one of sf's most frequently palmed cards has become a commonplace of the AI crowd. That sleight of hand is: "add enough compute and the computer will wake up." This is a shopworn cliche of sf, the idea that once a computer matches the human brain for "complexity" or "power" (or some other simple-seeming but profoundly nebulous metric), the computer will become conscious. Think of "Mike" in Heinlein's *The Moon Is a Harsh Mistress*:

https://en.wikipedia.org/wiki/The_Moon_Is_a_Harsh_Mistress#Plot

For people inflating the current AI hype bubble, this idea that making the AI "more powerful" will correct its defects is key. Whenever an AI "hallucinates" in a way that seems to disqualify it from the high-value applications that justify the torrent of investment in the field, boosters say, "Sure, the AI isn't good enough...*yet*. But once we shovel an order of magnitude more training data into the hopper, we'll solve that, because (as everyone knows) making the computer 'more powerful' solves the AI problem":

https://locusmag.com/2023/12/commentary-cory-doctorow-what-kind-of-bubble-is-ai/

As the lawyers say, this "cites facts not in evidence." But let's stipulate that it's true for a moment. If all we need to make the AI better is more training data, is that something we can count on? Consider the problem of "botshit," Andre Spicer and co's very useful coinage describing "inaccurate or fabricated content" shat out at scale by AIs:

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4678265

"Botshit" was coined last December, but the internet is already drowning in it. Desperate people, confronted with an economy modeled on a high-speed game of musical chairs in which the opportunities for a decent livelihood grow ever scarcer, are being scammed into generating *mountains* of botshit in the hopes of securing the elusive "passive income":

https://pluralistic.net/2024/01/15/passive-income-brainworms/#four-hour-work-week

Botshit can be produced at a scale and velocity that beggars the imagination. Consider that Amazon has had to cap the number of self-published "books" an author can submit to a mere *three books per day*:

https://www.theguardian.com/books/2023/sep/20/amazon-restricts-authors-from-self-publishing-more-than-three-books-a-day-after-ai-concerns

As the web becomes an anaerobic lagoon for botshit, the quantum of human-generated "content" in any internet core sample is dwindling to homeopathic levels. Even sources considered to be nominally high-quality, from Cnet articles to legal briefs, are contaminated with botshit:

https://theconversation.com/ai-is-creating-fake-legal-cases-and-making-its-way-into-real-courtrooms-with-disastrous-results-225080

Ironically, AI companies are setting themselves up for this problem. Google and Microsoft's full-court press for "AI powered search" imagines a future for the web in which search-engines stop returning links to web-pages, and instead summarize their content. The question is, why the *fuck* would anyone write the web if the only "person" who can find what they write is an AI's crawler, which ingests the writing for its own training, but has no interest in steering readers to see what you've written? If AI search ever becomes a thing, the open web will become an AI CAFO and search crawlers will increasingly end up imbibing the contents of its manure lagoon.

This problem has been a long time coming. Just over a year ago, Jathan Sadowski coined the term "Habsburg AI" to describe a model trained on the output of another model:

https://twitter.com/jathansadowski/status/1625245803211272194

There's a certain intuitive case for this being a bad idea, akin to feeding cows a slurry made of the diseased brains of other cows:

https://www.cdc.gov/prions/bse/index.html

But "The Curse of Recursion: Training on Generated Data Makes Models Forget," a recent paper, goes beyond the ick factor of AI that is fed on botshit and delves into the mathematical consequences of AI coprophagia:

https://arxiv.org/abs/2305.17493

Co-author Ross Anderson summarizes the finding neatly: "using model-generated content in training causes irreversible defects":

https://www.lightbluetouchpaper.org/2023/06/06/will-gpt-models-choke-on-their-own-exhaust/
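
To get a feel for why feeding a model its own output is corrosive, here's a toy sketch in Python. To be clear, this is *not* the paper's experiment: it's a deliberately tiny stand-in in which the "model" is nothing more than a bell curve (a mean and a standard deviation) fitted to ten numbers. The function name `simulate_collapse`, the sample size, the generation count and the seed are all my own illustrative assumptions. Generation zero is fitted to "human" data; every later generation is fitted only to samples drawn from the previous generation's model:

```python
import random
import statistics


def simulate_collapse(sample_size=10, generations=300, seed=0):
    """Refit a Gaussian to its own samples and record the fitted spread.

    Hypothetical toy model (an assumption for illustration): each
    "model" is just a mean and a standard deviation.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable

    # Generation 0: "human" data, drawn from a standard normal distribution.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]

    fitted_stds = []
    for _ in range(generations + 1):
        # "Train": fit the mean and the maximum-likelihood standard deviation.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        fitted_stds.append(sigma)

        # "Generate": the next generation sees only this model's output.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]

    return fitted_stds


if __name__ == "__main__":
    stds = simulate_collapse()
    for generation in (0, 50, 100, 200, 300):
        print(f"generation {generation:3d}: fitted std = {stds[generation]:.3g}")
```

In this setup, each refit tends to underestimate the spread a little, and since no fresh human data ever arrives to correct the error, the underestimates compound. After a few hundred generations the fitted curve has typically shrunk to a sliver of the original: the tails (the rare, interesting stuff) go first. Real language models are vastly more complicated than a bell curve, so treat this as an intuition pump, not a proof.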

Which is all to say: even if you accept the mystical proposition that more training data "solves" the problems that make AI totally unsuitable for the high-value applications that justify the trillions in valuation analysts are touting, that training data is going to be ever more elusive.

What's more, while the proposition that "more training data will linearly improve the quality of AI predictions" is a mere article of faith, "training an AI on the output of another AI makes it *exponentially worse*" is a matter of *fact*.


^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^

🔥 Hey look at this

* Tuckerization by Cory Doctorow https://locus.betterworld.org/auctions/locus-magazine-science-fiction-f/items/tuckerization-by-cory-doctorow

* Tech Titans Are the Robber Barons of Our Gilded Age https://jacobin.com/2024/03/big-tech-apple-epic-regulations/

* American billionaires are a policy failure https://a.wholelottanothing.org/american-billionaires-are-a-policy-failure/

^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^

🔥 This day in history

#20yrsago Secret knocking codes for firewalls https://web.archive.org/web/20050212160334/http://www.linuxjournal.com/article/6811

#20yrsago The Talking Heads decision: the judicial system’s David Byrne infatuation https://web.archive.org/web/20040630151030/http://www.legalunderground.com/2004/02/i_was_ready_to_.html

#20yrsago Bush and Kerry’s RSS, side by side https://web.archive.org/web/20040401181052/http://coollame.org/bushkerry.php

#15yrsago Derivatives exposure is worth $190K per human being on Earth https://www.siliconvalleywatcher.com/the-size-of-derivatives-bubble--190k-per-person-on-planet/

#10yrsago British spies lied about getting super-censorship powers over Youtube https://www.techdirt.com/2014/03/14/turns-out-uk-government-only-wishes-it-had-special-powers-to-censor-youtube/

#10yrsago Florida set to delete Hampton, a town with a questing, rent-seeking, corrupt wang https://www.loweringthebar.net/2014/03/hampton-fl.html

#10yrsago Peak Facebook https://medium.com/a-programmers-tale/the-facebook-experiment-has-failed-lets-go-back-f7b8c66109ea

#5yrsago Beto O’Rourke was in the Cult of the Dead Cow and his t-files are still online https://www.reuters.com/investigates/special-report/usa-politics-beto-orourke/

#5yrsago Security researchers reveal defects that allow wireless hijacking of giant construction cranes, scrapers and excavators https://www.trendmicro.com/vinfo/us/security/news/vulnerabilities-and-exploits/attacks-against-industrial-machines-via-vulnerable-radio-remote-controllers-security-analysis-and-recommendations

#5yrsago Letterlocking: the long-lost art of using paper-folding to foil snoops https://www.atlasobscura.com/articles/what-did-people-do-before-envelopes-letterlocking

#5yrsago Self-insurer Walmart flies its sick employees to out-of-state specialists to avoid local price-gougers https://www.cnbc.com/2019/03/14/walmart-sends-employees-to-top-hospitals-out-of-state-for-treatment.html

#5yrsago Big Chemical says higher pollution levels are safe in West Virginia because residents don’t drink water, and are so fat that poisons are diluted in their bodies https://washingtonmonthly.com/2019/03/14/the-real-elitists-looking-down-on-trump-voters/

#1yrago Learning from Silicon Valley Bank's apologists https://pluralistic.net/2023/03/15/mon-dieu-les-guillotines/#ceci-nes-pas-une-bailout

^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^

🔥 Upcoming appearances

* Wendy Michener Memorial Lecture, Mar 22 (Toronto)
https://events.yorku.ca/events/wendy-michener-memorial-lecture2024/

* The Bezzle at Word, Mar 24 (NYC)
https://shop.wordbookstores.com/event/word-presents-cory-doctorow

* Enshittification: How the Internet Went Bad and How to Get it Back (virtual), Mar 26
https://libcal.library.ubc.ca/event/3781006

* Wondercon Anaheim, Mar 29-31
https://www.comic-con.org/wc/

* Computer Pasts/Computer Futures (NYU/virtual), Apr 4
https://steinhardt.nyu.edu/events/deans-public-square-series-computer-pasts-computer-futures

* The Bezzle at Harvard Berkman-Klein Center, with Randall Munroe (Apr 11)
https://cyber.harvard.edu/events/enshittification

* The Bezzle at Anderson's Books (Chicago), Apr 17
https://www.andersonsbookshop.com/event/cory-doctorow-1

* Torino Biennale Tecnologia (Apr 19-21)
https://www.turismotorino.org/en/experiences/events/biennale-tecnologia

* Canadian Centre for Policy Alternatives (Winnipeg), May 2
https://www.eventbrite.ca/e/cory-doctorow-tickets-798820071337?aff=oddtdtcreator

* Tartu Prima Vista Literary Festival (May 5-11)
https://tartu2024.ee/en/kirjandusfestival/

* Media Ecology Association keynote, Jun 6-9 (Amherst, NY)
https://media-ecology.org/convention

* American Association of Law Libraries keynote (Chicago), Jul 21
https://www.aallnet.org/conference/agenda/keynote-speaker/

^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^

🔥 Recent appearances

* How tech-savvy author Cory Doctorow got scammed (Chicago Public Square)
https://www.chicagopublicsquare.com/2024/03/how-tech-savvy-author-cory-doctorow-got.html

* Is Social Media Becoming a Bit Shit? (The Briefing)
https://www.youtube.com/watch?v=jvPlpMd1KEw

* Radioactive (KRCL)
https://krcl.org/blog/grist-investigates-doctorow-seed/

^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^

🔥 Latest books

* "The Bezzle": a sequel to "Red Team Blues," about prison-tech and other grifts, Tor Books (US), Head of Zeus (UK), February 2024 (the-bezzle.org). Signed, personalized copies at Dark Delicacies (https://www.darkdel.com/store/p3062/Available_Feb_20th%3A_The_Bezzle_HB.html#/).

* "The Lost Cause": a solarpunk novel of hope in the climate emergency, Tor Books (US), Head of Zeus (UK), November 2023 (http://lost-cause.org). Signed, personalized copies at Dark Delicacies (https://www.darkdel.com/store/p3007/Pre-Order_Signed_Copies%3A_The_Lost_Cause_HB.html#/).

* "The Internet Con": A nonfiction book about interoperability and Big Tech (Verso) September 2023 (http://seizethemeansofcomputation.org). Signed copies at Book Soup (https://www.booksoup.com/book/9781804291245).

* "Red Team Blues": "A grabby, compulsive thriller that will leave you knowing more about how the world works than you did before." Tor Books http://redteamblues.com. Signed copies at Dark Delicacies (US): and Forbidden Planet (UK): https://forbiddenplanet.com/385004-red-team-blues-signed-edition-hardcover/.

* "Chokepoint Capitalism: How to Beat Big Tech, Tame Big Content, and Get Artists Paid", with Rebecca Giblin, on how to unrig the markets for creative labor, Beacon Press/Scribe 2022 https://chokepointcapitalism.com

* "Attack Surface": The third Little Brother novel, a standalone technothriller for adults. The *Washington Post* called it "a political cyberthriller, vigorous, bold and savvy about the limits of revolution and resistance." Order signed, personalized copies from Dark Delicacies https://www.darkdel.com/store/p1840/Available_Now%3A_Attack_Surface.html

* "How to Destroy Surveillance Capitalism": an anti-monopoly pamphlet analyzing the true harms of surveillance capitalism and proposing a solution. https://onezero.medium.com/how-to-destroy-surveillance-capitalism-8135e6744d59?sk=f6cd10e54e20a07d4c6d0f3ac011af6b (signed copies: https://www.darkdel.com/store/p2024/Available_Now%3A__How_to_Destroy_Surveillance_Capitalism.html)

* "Little Brother/Homeland": A reissue omnibus edition with a new introduction by Edward Snowden: https://us.macmillan.com/books/9781250774583; personalized/signed copies here: https://www.darkdel.com/store/p1750/July%3A__Little_Brother_%26_Homeland.html

* "Poesy the Monster Slayer": a picture book about monsters, bedtime, gender, and kicking ass. Order here: https://us.macmillan.com/books/9781626723627. Get a personalized, signed copy here: https://www.darkdel.com/store/p2682/Corey_Doctorow%3A_Poesy_the_Monster_Slayer_HB.html#/.

^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^

🔥 Upcoming books

* Picks and Shovels: a sequel to "Red Team Blues," about the heroic era of the PC, Tor Books, February 2025

* Unauthorized Bread: a graphic novel adapted from my novella about refugees, toasters and DRM, FirstSecond, 2025

^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^

🔥 Colophon

Today's top sources:

Currently writing:

* A Little Brother short story about DIY insulin PLANNING

* Picks and Shovels, a Martin Hench noir thriller about the heroic era of the PC. FORTHCOMING TOR BOOKS JAN 2025

* Vigilant, a Little Brother short story about remote invigilation. FORTHCOMING ON TOR.COM

* Spill, a Little Brother short story about pipeline protests. FORTHCOMING ON TOR.COM

Latest podcast:
  The Majority of Censorship is Self-Censorship https://craphound.com/news/2024/02/25/the-majority-of-censorship-is-self-censorship/

This work - excluding any serialized fiction - is licensed under a Creative Commons Attribution 4.0 license. That means you can use it any way you like, including commercially, provided that you attribute it to me, Cory Doctorow, and include a link to pluralistic.net.

https://creativecommons.org/licenses/by/4.0/

Quotations and images are not included in this license; they are included either under a limitation or exception to copyright, or on the basis of a separate license. Please exercise caution.

^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^

🔥 How to get Pluralistic:

Blog (no ads, tracking, or data-collection):

Pluralistic.net

Newsletter (no ads, tracking, or data-collection):

https://pluralistic.net/plura-list

Mastodon (no ads, tracking, or data-collection):

https://mamot.fr/@pluralistic

Medium (no ads, paywalled):

https://doctorow.medium.com/

Twitter (mass-scale, unrestricted, third-party surveillance and advertising):

https://twitter.com/doctorow

Tumblr (mass-scale, unrestricted, third-party surveillance and advertising):

https://mostlysignssomeportents.tumblr.com/tagged/pluralistic

"*When life gives you SARS, you make sarsaparilla*" -Joey "Accordion Guy" DeVilla