Scanning all the Books: The Work of Scribes for the Internet Archive
In 1996, computer engineer Brewster Kahle founded the Internet Archive with the mission to provide "universal access to all knowledge." Today, that vision drives the methodical work happening in scanning centers where operators carefully digitize books one page at a time, preserving both the content and the physical integrity of centuries-old volumes.
The Internet Archive's approach stems from a fundamental disagreement with the digitization methods that emerged in the early 2000s. When Google launched its Books project in 2004, it revolutionized the scale of digital libraries but introduced a troubling trade-off: speed versus preservation. Google's industrial approach often involved destructive scanning—cutting book spines and dismantling bindings to facilitate rapid automated processing.
The Internet Archive chose a different path. "At the Internet Archive, we never destroy a book by cutting off its binding. Instead, we digitize it the hard way, one page at a time." That choice led the Archive to adapt machines and software to its very specific purpose, and it created a job: book scanner, or Scribe operator.
Source: https://archive.org/details/eliza-digitizing-book_202107
Internet Archive employee Eliza Zhang digitizes a book. At the Internet Archive, we never destroy a book by cutting off its binding. Instead, we digitize it the hard way--one page at a time. We use the Scribe, a book scanner our engineers invented, along with the software that it runs. Our scanning centers are located in universities and libraries around the world, from Boston Public Library to the University of Toronto to the Wellcome Library and beyond. Eliza is one of our fastest and most accurate scanners. Next she will execute quality control checks and fix any errors. Then she ships the book back to our Physical Archive for long-term preservation. Scanners like Eliza have done this 2,000,000 times to provide our patrons with a free digital library. (source: https://archive.org/details/eliza-digitizing-book_202107)

The Scribe station
The Internet Archive's book digitization relies on custom-built scanning stations called Scribes, engineered specifically to handle fragile and rare materials without damage. Each Scribe features a V-shaped book cradle positioned beneath two high-resolution cameras that capture both pages of an open spread simultaneously at 600 DPI. Adjustable LED lighting arrays provide even illumination without generating heat, while a foot pedal triggers the cameras, leaving the operator's hands free to position and turn pages.
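
To give a rough sense of what the 600 DPI figure above means in practice, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the 6 x 9 inch page size and the function names are assumptions made for the example, not details taken from the Internet Archive or the Scribe software.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int = 600) -> tuple[int, int]:
    """Return (width_px, height_px) for a page captured at the given DPI.

    DPI means dots (pixels) per inch, so each dimension is inches * dpi.
    """
    return round(width_in * dpi), round(height_in * dpi)


def megapixels(width_px: int, height_px: int) -> float:
    """Total pixel count of one page image, expressed in millions of pixels."""
    return width_px * height_px / 1_000_000


# Hypothetical page size (6 x 9 inches), chosen only for illustration.
width_px, height_px = pixel_dimensions(6, 9)
print(width_px, height_px, megapixels(width_px, height_px))
```

Under these assumptions a single page comes out at 3600 by 5400 pixels, a little under 20 megapixels, which suggests why each Scribe pairs one camera with each side of the open spread.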



Even if it serves a good cause, the digitization of books remains a productivity-driven, tedious, and repetitive task, mostly undertaken by women and immigrants, like most data-entry work in the history of computing.

"Digitizing is a somewhat solitary task and some people 'get in the zone' while scanning; others are very chatty or listen to music. Many employees have worked together for nearly a decade and there is a friendly, collaborative vibe at the centers. 'We have all sorts of people: artists, printers and photographers. They are people who are meticulous and love books,' Mills said." (Source: Internet Archive Blog)

At a time when critical scientific data and public information are being removed from the Internet at an unprecedented pace (especially since Donald Trump's second election), the work of scanning as many books as possible and preserving digital information is more important than ever.
The Internet Archive faces many threats, from budget constraints to copyright lawsuits and, more recently, cyberattacks. Hopefully it will continue its work, encouraged and funded by the many public institutions (and many volunteers) around the world that are willing to act in favor of the free and open preservation of knowledge.