After decades of normal folks pirating the products of tech companies, tech companies decided to fight back and started pirating the products of normal folks. Facebook's awkwardly named owner, Meta, reportedly pirated "at least 81.7 terabytes of data", "which includes tens of millions of pirated books" (Ars Technica). In gamer units, that's at least 800 or so copies of Baldur's Gate 3.
The pirated data would be used to train an AI. The purpose of that AI model is unknown at this time.
One wonders why Facebook would not just pay for the books instead of committing massive copyright infringement on a scale (probably) never seen before on the Internet, considering that Meta does have the money to buy some books. One legal question is whether buying a book gives you the right, or, in more technical terms, the license to use the book to train an AI. The answer to this is probably not, as buying a book doesn't give you a license to reproduce it, hence why it's called "copyright." It doesn't stop being a reproduction just because you chose to do it in a lossy way with an algorithm that uses a large language model. In any case, if buying a book doesn't give you a license, pirating it surely doesn't either. It's hard to imagine what Meta's grand plan was here besides maybe just trying to get away with it.
As Meta's CEO once said in leaked audio, "Everything I say leaks" (404 Media), so what's not a wonder is how the secret got out. Now that they've been caught pirating from corporate laptops, they're arguing in court that it's not illegal to pirate when you're also a jerk about it.
More specifically, Meta pirated books using a peer-to-peer filesharing technology called BitTorrent, or just "torrent." In a traditional download, every user downloads from a single centralized server, which must bear the load of all users connected to it simultaneously. In peer-to-peer filesharing, each user can share the file with other users in a decentralized manner, meaning that anyone can download from anyone else who has the file. We call a user who is downloading from others a "leech," and a user who has the complete file, and therefore no longer needs to download anything from anyone, a "seed." The act of sharing the file, even while you're still a leech, is called "seeding."
If the terms didn't make this obvious enough, the idea is that you aren't supposed to be just a leech in the system. In peer-to-peer filesharing, you download a file from people like you, and you are expected to share that file with others after you've finished downloading: it's your turn to seed now. This decreases the load on the users you torrented from and, most importantly, ensures that the file remains downloadable if other users stop seeding. So long as someone still has the file, it can be downloaded, which is beautiful considering how much link rot we get every year. This isn't limited to piracy, by the way. You can distribute anything via torrent, and it's particularly common in the open source community, with many Linux distributions and applications like LibreOffice having official torrent download methods. You can even download OC ReMix's entire music collection for free because they provide an official torrent for it.
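To make the leech/seed idea concrete, here is a toy simulation in Python. It is only a sketch of the concept, not how a real torrent client works: there are no trackers, hashes, or networking, and every name in it (Peer, run_round, download_all, the uploads flag) is made up for illustration. It assumes a file split into four pieces and lets each peer grab one missing piece per round from anyone willing to upload it.

```python
from dataclasses import dataclass, field

TOTAL_PIECES = 4  # assumption: the file is split into 4 pieces


@dataclass
class Peer:
    name: str
    pieces: set[int] = field(default_factory=set)
    uploads: bool = True  # False models a peer that downloads without seeding

    def is_seed(self) -> bool:
        # A seed is a peer that already holds every piece of the file.
        return len(self.pieces) == TOTAL_PIECES


def run_round(swarm: list[Peer]) -> int:
    """Each peer fetches at most one missing piece; returns pieces transferred."""
    # Snapshot holdings so a piece received this round isn't re-shared until the next.
    snapshot = {p.name: set(p.pieces) for p in swarm}
    transferred = 0
    for peer in swarm:
        for piece in sorted(set(range(TOTAL_PIECES)) - peer.pieces):
            sources = [
                s for s in swarm
                if s is not peer and s.uploads and piece in snapshot[s.name]
            ]
            if sources:
                peer.pieces.add(piece)
                transferred += 1
                break
    return transferred


def download_all(swarm: list[Peer], max_rounds: int = 100) -> None:
    """Run rounds until nothing more can be transferred."""
    for _ in range(max_rounds):
        if run_round(swarm) == 0:
            break


# Scenario 1: everyone seeds, so the file outlives the original uploader.
original = Peer("original", set(range(TOTAL_PIECES)))
alice, bob = Peer("alice"), Peer("bob")
download_all([original, alice, bob])
carol = Peer("carol")
download_all([alice, bob, carol])  # the original uploader has left the swarm
print("carol has the full file:", carol.is_seed())

# Scenario 2: the only remaining peer refuses to upload.
original2 = Peer("original", set(range(TOTAL_PIECES)))
freeloader = Peer("freeloader", uploads=False)
download_all([original2, freeloader])
dave = Peer("dave")
download_all([freeloader, dave])  # nobody left who is willing to share
print("dave has the full file:", dave.is_seed())
```

In the first scenario the newcomer still gets the whole file after the original uploader is gone, because the people who downloaded it kept sharing. In the second, the file is effectively dead for everyone else the moment the one remaining holder declines to upload.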
Meta argues it's not piracy because they don't seed.

[...] Meta "took precautions not to 'seed' any downloaded files," Meta's filing said. [...] because there's allegedly no proof of such "seeding," Meta insisted that authors cannot prove Meta shared the pirated books with anyone during the torrenting process.
[...]
To defend its torrenting, Meta has basically scrubbed the word "pirate" from the characterization of its activity. [...] Instead, all they can claim is that "Meta allegedly accessed and downloaded datasets that [they] did not create, containing the text of published books that anyone can read in a public library, from public websites [they] do not operate or own."
https://arstechnica.com/tech-policy/2025/02/meta-defends-its-vast-book-torrenting-were-just-a-leech-no-proof-of-seeding/ (accessed 2025-02-21)
I love how you can just "claim" anything in court, and if that doesn't work, you just "claim" something else, no matter how contradictory. I wonder how judges are able to put up with this without straining an optic nerve from too much eye rolling.
Meta has previously addressed its torrenting in a motion to dismiss filed last month, telling the court that "plaintiffs do not plead a single instance in which any part of any book was, in fact, downloaded by a third party from Meta via torrent, much less that Plaintiffs’ books were somehow distributed by Meta."
https://arstechnica.com/tech-policy/2025/02/meta-torrented-over-81-7tb-of-pirated-books-to-train-ai-authors-say/ (accessed 2025-02-21)
Observe how, if they actually believed the nonsense they are claiming, they wouldn't be worried about seeding. After all, if the only thing they did while leeching was "access datasets they did not create containing data anyone can access," then the only thing they would do while seeding would be "distribute datasets they did not create." What is the crime of distributing data that you didn't create? That isn't a crime! You're just distributing data you didn't create! Show me the law that says you can't do that.
I'm no public library expert, but I'm pretty sure you can't start distributing copies of a book just because you got it from a public library. They have a license to distribute, you do not. In the same way, I can't take a photo that a stock photo company licensed to some publicly accessible website and use it on my own website, since I never got a license from the stock photo company myself. In fact, with rare exceptions, all the third-party photos I use on this website are licensed under Creative Commons, which permits me to use them so long as I credit the author, just so I don't have to worry about infringing upon other people's rights.
Ironically, I don't think Meta could have used Creative Commons-licensed books to train their LLMs even if they wanted to, as almost all CC licenses require at minimum attribution, i.e. you MUST credit the author. If the LLM reproduces CC-licensed work without crediting the author, that's a clear violation of the license, and without a license, that's just copyright infringement.
Still, it's amusing that someone can dare make the legal argument that it's not illegal to pirate if you have no honor as a pirate. If you just download without seeding. Although there is no real "pirate law" against that, you would have to be a psychopath to approve of such behavior. At least I wouldn't trust someone who doesn't seed their torrents.
For the record, I do not believe piracy is a good thing. The convenience of piracy warps our perception of value. When everything is free, we become insensible to the cost of producing things. There are also opportunity costs: you see an AAA game with a price tag you would never pay, and instead of just buying a cheaper indie game you can afford, you choose to pirate the expensive one. Although you didn't spend money, you will spend your time playing it. Consequently, the developer that makes games you can afford never gets any money, while the developer that makes games you can't afford gets all your attention. When you choose not to participate in the market, you miss your chance to shape it in your favor. While a single person pirating doesn't seem to do much, when everyone pirates just because they can, piracy can have a great negative impact on the future of an industry. So long as everyone uses Photoshop, everyone HAS to use Photoshop. It doesn't matter if you paid for it or not. I'm not telling you to use GIMP, but maybe Krita, Photopea, or Affinity Photo would suffice for your needs. Let's not be like Meta. Let's explore the alternatives instead of pirating stuff.
It's worth noting that we don't call the act of "seeding" "pirating"; we call the act of "downloading" "pirating." The reason the distinction between seeding and downloading is important in this case is that copyright law is about reproducing the work for other people; it isn't about consuming a work without paying for it first. In fact, I'm not entirely sure if there is a law for that, although I'm pretty sure watching a movie in a theater without paying for it will get you in legal trouble, so there probably exists some law about it. I'm not a lawyer, though, so I wouldn't know.