Scanning all the Books: The Work of Scribes for the Internet Archive

In 1996, computer engineer Brewster Kahle founded the Internet Archive with the mission to provide "universal access to all knowledge." Today, that vision drives the methodical work happening in scanning centers where operators carefully digitize books one page at a time, preserving both the content and the physical integrity of centuries-old volumes.

The Internet Archive's approach stems from a fundamental disagreement with the digitization methods that emerged in the early 2000s. When Google launched its Books project in 2004, it revolutionized the scale of digital libraries but introduced a troubling trade-off: speed versus preservation. Google's industrial approach often involved destructive scanning—cutting book spines and dismantling bindings to facilitate rapid automated processing.

The Internet Archive chose a different path. "At the Internet Archive, we never destroy a book by cutting off its binding. Instead, we digitize it the hard way, one page at a time." That choice led the Archive to adapt machines and software to its very specific purpose, and it created a job: book scanner, or Scribe operator.

Source: https://archive.org/details/eliza-digitizing-book_202107

Internet Archive employee Eliza Zhang digitizes a book. At the Internet Archive, we never destroy a book by cutting off its binding. Instead, we digitize it the hard way--one page at a time. We use the Scribe, a book scanner our engineers invented, along with the software that it runs. Our scanning centers are located in universities and libraries around the world, from Boston Public Library to the University of Toronto to the Wellcome Library and beyond. Eliza is one of our fastest and most accurate scanners. Next she will execute quality control checks and fix any errors. Then she ships the book back to our Physical Archive for long-term preservation. Scanners like Eliza have done this 2,000,000 times to provide our patrons with a free digital library. (source: https://archive.org/details/eliza-digitizing-book_202107)

Image source: New York Times

The Scribe station

The Internet Archive's book digitization relies on custom-built scanning stations called Scribes, engineered specifically to handle fragile and rare materials without damage. Each Scribe unit features a V-shaped book cradle positioned beneath two high-resolution cameras that capture both pages of an open spread simultaneously at 600 DPI resolution. Adjustable LED lighting arrays provide even illumination without generating heat, while a foot pedal system triggers the cameras, leaving operators' hands free to position and turn pages.

Photos by Jason Scott, 2011 (source)

Even when it serves a good cause, digitizing books remains a productivity-driven, tedious, and repetitive task, mostly undertaken by women and immigrants, like most data-entry work in the history of computing.

Elizabeth MacLeod, demoing a Scribe in the foyer of the Internet Archive in San Francisco, pre-COVID. (source: Internet Archive Blog)
"Digitizing is a somewhat solitary task and some people “get in the zone” while scanning; others are very chatty or listen to music. Many employees have worked together for nearly a decade and there is a friendly, collaborative vibe at the centers. “We have all sorts of people—artists, printers and photographers. They are people who are meticulous and love books,” Mills said" - Source: Internet Archive Blog.
An employee at Internet Archive office digitizes a book. (source)

At a time when critical scientific data and public information are being removed from the Internet at an unprecedented pace (especially since the second Trump election), the work of scanning as many books as possible and preserving digital information is more critical than ever.

The Internet Archive faces many threats, from budget issues to copyright lawsuits and, more recently, cyberattacks. Hopefully it will continue its work, encouraged and funded by many public institutions (and many volunteers) around the world who are willing to act in favor of the free and open preservation of knowledge.


Additional resources

Humming along in an old church, the Internet Archive is more relevant than ever
The Trump administration’s erasure of federal data has put the Internet Archive in the spotlight. The organization, with its small but mighty team, is working to help save the world’s digital history.
Giving “Last Chance Books” New Life Through Digitization | Internet Archive Blogs
Meet Eliza Zhang, Book Scanner and Viral Video Star | Internet Archive Blogs
How the Internet Archive Digitizes 3,500 Books a Day–the Hard Way, One Page at a Time
Does turning the pages of an old book excite you? How about 3 million pages?
