The BookLiberator: An Overview
The BookLiberator is an affordable, easy-to-use device for converting physical books to digital format without harming the books. It combines innovative hardware design and open source software to digitize books at a rate of 600-900 pages per hour.
Both the software and the device already exist as a working prototype. The next step is to make the BookLiberator available in a prefabricated, consumer-ready kit at a reasonable price. Our aim is to crowdsource the digitization of the world’s printed data, doing for tree-bound texts what digital audio and CD rippers did for music.
We already have experience selling directly to consumers from a successful non-profit online store , and based on that experience we believe the BookLiberator can become self-sustaining from its own revenue. Because we are a non-profit, and decentralized personal book digitization is aligned with our mission, we will make both the hardware plans and the software available on a completely open source basis. The BookLiberator is intended to be a disruptive innovation: we will encourage others to build and sell them too, and do not mind if the competition cuts into our own revenue.
The BookLiberator is a decentralized answer to centralized efforts such as Google’s book scanning project and the upcoming Google Editions. The BookLiberator will enable people to collaboratively annotate, analyze and process texts, by liberating the contents of books from their physical containers, while ensuring that no single institution has control of the results.
Due to economies of scale, current book scanning efforts tend to be undertaken by large organizations, and the results of their scans are therefore centrally controlled. However well-intentioned such organizations may be, they inevitably become a natural target for parties seeking to limit the use of their scans. For example, the proposed Google Book Search Settlement places significant restrictions on Google’s (or anyone else’s) ability to make texts available, because the Publishers Association and the Authors Guild were able to focus their legal efforts at a single entity: Google, whom they sued despite Google’s care to abide by copyright law in the first place.
It is now clear that organizations based on copyright interests cannot be counted on to provide publicly beneficial interpretations of copyright law. The best solution is to decentralize — and thus strengthen — the public’s ability to interpret copyright law for itself. The BookLiberator is designed to do exactly that.
The digital revolution that enables us to easily manipulate music, images, and movies has largely passed over the world of books. This is nowhere more apparent than at universities, where students, who should be creatively interacting with texts, cannot do something as intuitive as share the notes they’ve written in the margins of their textbooks. The educational and social impact of personal book digitization is not limited to students, however. The success of sites like LibraryThing.com and OpenLibrary.org shows that people are eager to interact with each other’s libraries creatively. The BookLiberator will make this possible, in a way that, crucially, cannot be centrally controlled by any one organization or consortium.
The BookLiberator is made from stock and easily-manufactured parts, and consumer electronics (2 standard digital cameras).
The book is cradled in a stand similar to an old-fashioned book stand. The scanner is a transparent cube with cameras aimed across it on diagonally opposed faces, so each camera photographs the perfectly flat page on the opposite side, without harming the book. Thanks to this design, the scanner has no moving parts; rather, the scanner as a whole is the moving part. By removing the assumption that the scanner must remain stationary, the BookLiberator eliminates the page-curvature problem (while software can correct for curved text to some degree, flat images are best for proof-reading and for capturing illustrations and non-traditional text layouts accurately).
Current prototypes achieve scan rates of 600 to 900 pages per hour, so a typical book takes under an hour to digitize. Accompanying open-source software (we’re working with the Decapod project; see also launchpad.net/bookliberator) performs optical character recognition on the images, and then provides a specialized editing interface by which the user can do a line-by-line comparison of the images against the OCR’d text, to correct any mistakes.
Crucially, the BookLiberator separates software from hardware. Traditional book scanners are often intimately connected to their digitization software. But the standardization of cheap digital cameras has made another way possible: the BookLiberator simply supplies raw images of flat pages, pulled from the digital camera storage; the rest is up to the software, which can take advantage of the dynamics of open source development. While the current software package has already been used to digitize full books, improvements in the software will continue to reach users long after they’ve received their physical device. Users with the ability and desire will also be able to participate in the improvement of the software.
Production and Adoption Strategy
Hardware design is complete. Our next steps are:
- Settle the manufacturing process, and begin production and sale of physical units from our online store.
- Continue development of custom image processing software to streamline use of the images. Software updates will be a natural part of using the BookLiberator even after a customer has acquired the hardware, and the accompanying instructions will make that clear. Early adopters will need to do fairly frequent updates at first.
Our first manufacturing run will be small (500-1,000 units) and marketed at early adopters, especially the technologically-inclined open source community, who will help iron out kinks in both the hardware and software .
The next run will be larger (5,000-10,000 units, the exact size to be determined by demand and manufacturing latency). This time we will market to university students and education professionals, and the technically adept general reader: those who wish to preserve their personal libraries, and to read their books on their cell phones, e-book readers, and PDAs. The BookLiberator kit will be included a legal guide explaining fair use law, focusing on educational context, so that customers can make informed decisions about using the BookLiberator. We will also add features to the software to conveniently publish texts to the Internet Archive, when the text is not copyright-restricted.
Finally, we will market to institutions (libraries and schools), for both preservation and access purposes. This may involve offering larger models to support large-format books, and possibly pre-assembled scanning kiosks in which hardware and software are bundled together in a tested configuration.
We are looking for seed funding. The BookLiberator should become self-sustaining after the first manufacturing run, but we need to devote full-time attention to the project for about six to eight months leading up to that.
1 full-time staff for six months: $60,000 (gross) (for manufacturing oversight, financial and operational management, marketing, collecting feedback and facilitating continued improvement of the design) Prototyping expenses: $10,000 (parts, shop space) Marketing: $5,000 (promotion, placement, feedback gathering, community building and outreach) Seed manufacturing run: $25,000 Software development: $50,000 (partly contracted, partly done by the full-time staffer) ------------------------------------------- Total: $150,000 -------------------------------------------
A single BookLiberator can be sold at a price comparable to what one would spend on a typical e-book reader or cell phone:
Single BookLiberator without cameras: $125 – $150 Single BookLiberator with cameras: $300 – $350
That estimate is based on our current prototype costs plus the costs of running an online store:
Parts: $80 – $100 per kit Shipping to warehouse: $4 – $8 per kit (in bulk) Shipping to customer: $15 per kit (varies with location) Online store operation: $200 / month (approximate)
As most people already have digital cameras with standardized mount points, they will probably choose the cheaper option. But some customers will prefer the convenience of purchasing a BookLiberator with two already-tested cameras, and we may offer different with-camera options with varying levels of camera quality.
The parts costs are based on supporting books up to 30cm tall — the high end of the range for personal libraries. We surveyed 6.7 million books , and found that 30cm is sufficient for most books:
As the BookLiberator becomes more popular, we may add models that support large-format books, and make improvements to the software to support the layouts found in art books, which tend to be larger than text-only books.
Most of the project is parallelizable: for example, all the software development can happen completely independently of hardware manufacturing and fulfillment. Based on the turnaround times we’ve experienced with the current prototype manufacturers, we believe we can have the design finalized within three months, and the first production run completed and orderable within three to five months after that.
Comparison With Similar Efforts
Current commercial book digitizers start around $10,000.00 dollars (US) and can go into the hundreds of thousands of dollars . They are generally marketed toward customers doing mass digitization, such as libraries. The closest thing to a personal scanner currently on the market is the Atiz BookDrive Mini , which starts at $6000.00 and is marketed at “libraries and institutions”.
The Decapod project (sites.google.com/site/decapodproject) seems to be aiming for a consumer-ready product. However, it is still in the prototype stage, and has a design less conducive to home use than the BookLiberator; it requires a tripod and seems to be intended more for field use. From their web site: “Decapod will be an inexpensive attaché case sized hardware/software solution that can be readily procured and assembled and taken into the stacks or out into the field by local staff or volunteers to quickly and unobtrusively capture the material and deliver it in usable format.”
Update, January 2010: we have contacted the Decapod project and shipped them two BookLiberators. They are now testing their software with the BookLiberator as well as their tripod design. While it is too early to know for sure, this has the potential to lower our software development costs significantly.
There is a do-it-yourself project called the DIY BookScanner (diybookscanner.org). While extremely valuable for prototyping, any do-it-yourself project has an inherently limited audience, as most people simply do not have the time or desire to order parts separately and build a device from scratch. As far as we can tell, the DIY BookScanner has had a limited uptake. The point of the BookLiberator is that it comes in a ready-to-assemble kit and is ready to use once assembled — all the customer has to do is follow simple instructions once, Ikea-style.
The Intel ® Reader is intended primarily for “people with vision or reading-related disabilities, blindness, or low vision”, and sells for $1500.00 USD, much higher than the price we are aiming for.
Our team has a strong background in design, software development and online sales.
Ian Sullivan and James Vasile, the original developers of the design, are both experienced in open source software development and legal issues. James is also an attorney at the Software Freedom Law Center, an executive member of the board of Open Source Matters, Inc (the non-profit that produces Joomla, the world’s most popular free software Content Mangement System), and sits on the advisory board of the GNOME Foundation, which produces the preeminent Linux desktop environment.
Karl Fogel is a long-time open source software developer and author of two books on open source methods. He is the founder of QuestionCopyright.org, which runs an online store selling DVDs and other merchandise related to artist-approved freely redistributable works. By already having a functioning online store and order fulfillment mechanism in place, the BookLiberator project is free to focus on finalizing the design, sourcing the parts, testing, and manufacturing.
 The QuestionCopyright.com online store sells DVDs, t-shirts, on-demand prints, pins, and other merchandise related to freely redistributable works of art, sharing revenue with the authors of those works. The store’s gross revenue to date is about $100,000.00 US; it has been open for 1 year.
 Reaching a sufficient number of early adopters is possible through word-of-mouth and relatively inexpensive paid placement in forums like Slashdot, ThinkGeek, Make Magazine, and other sites well-known to open source technology enthusiasts.
 Raw data courtesy of the Internet Archive, which hosts book data supplied by the Library of Congress and the Open Library project. See LC “Books All” files (to 2006), and the Open Library’s JSON data dump (which includes information from libraries other than LC, from Amazon, etc). The LC data is in MARC format with the size in centimeters in field 300 $c. The OL data has size in the ‘physical_dimensions’ field, in centimeters except as otherwise specified (e.g., “11 x 9.4 x 0.7 inches”).
See http://blog.eogn.com/eastmans_online_genealogy/2009/07/kirtas-offers-digitized-books.html regarding the $120,000.00 price tag of a Kirtas book scanner, for example.
 See the Atiz web site, and the article http://www.teleread.org/2009/10/29/atiz-scanner-comes-in-at-lower-price-point.