How Big are Books?

If you're building a book scanner (such as a Decapod or BookLiberator), you might find this information useful:

[Figure: graph showing distribution of book sizes, with sweet spot at 30cm.]

Summary: after surveying 6.7 million books, 30cm seems to be the sweet spot — if your scanner can handle that, then you should be able to scan most books.



Raw data courtesy of the Internet Archive, which hosts book data supplied by the Library of Congress and the Open Library project. See LC's "Books All" files (to 2006), and the Open Library's JSON data dump (which includes information from libraries other than LC, from Amazon, etc.). The LC data is in MARC format, with the size in centimeters in field 300, subfield $c. The OL data has the size in the 'physical_dimensions' field, in centimeters except where otherwise specified (e.g., "11 x 9.4 x 0.7 inches").
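If you want to redo this kind of survey yourself, the fiddly part is normalizing those size strings to centimeters. Here is a minimal sketch in Python, not the script used for the graph above. It assumes that a string is in centimeters unless it contains the word "inch" or "inches", and that the largest number in the string is the one your scanner has to accommodate. The function names are made up for illustration.

```python
import re

INCH_TO_CM = 2.54


def dimensions_to_cm(text):
    """Return every number found in a size string, converted to centimeters.

    Assumption: the string is in centimeters unless it mentions inches.
    """
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    factor = INCH_TO_CM if "inch" in text.lower() else 1.0
    return [round(n * factor, 2) for n in numbers]


def largest_dimension_cm(text):
    """Return the largest dimension in centimeters, or None if no number is found."""
    dims = dimensions_to_cm(text)
    return max(dims) if dims else None


if __name__ == "__main__":
    # One value shaped like MARC 300 $c, one shaped like OL's 'physical_dimensions'.
    for sample in ["24 cm", "11 x 9.4 x 0.7 inches"]:
        print(sample, "->", largest_dimension_cm(sample))
```

Real records are messier than these two samples (missing units, ranges, stray text), so anything this function cannot parse should be counted and inspected rather than silently dropped.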

8 Comments

Re: How Big are Books?

Try this on for size: If we were concerned solely with “most” books, people like you wouldn’t be up in arms about orphan works. Long tail, anyone?

Re: How Big are Books?

I think the point he was going for was that the most interesting and important work for DIY book scanning is 'in the margins': the harder-to-find things. Which, maybe, could come in stranger sizes.

Re: How Big are Books?

Hunh. Maybe. But is there any reason to believe there's a correlation between the "long-tailness" of a book and its having an unusually large size? I have plenty of old books, and books that were not popular even when they were in print, and they are generally the same sizes as more recent and more popular books.

Re: How Big are Books?

To provide some insight into the graph above from an Internet Archive perspective: "large" books (e.g., large map books) do not fit well on IA's scanning hardware. They have to be captured separately by a dedicated, specially trained technician. Hence it's not a surprise that there are not many books larger than 30cm. Also, the cost and care required to transport large books are substantially greater than for smaller books. Hence the frequency of large books arriving at IA scanning centers may be low compared to books in the 15cm to 30cm range. My point being: there may actually be more books in the long tail than the graph conveys. (I do not work for the IA, but have observed their digitization process on a few occasions.)

Re: How Big are Books?

But remember, the Internet Archive is just hosting the data. The data itself doesn't come from IA; it comes from the Library of Congress and the Open Library project. Even if there is some kind of selection bias going on, the extreme flatness of that tail is pretty convincing. There may be a few very large books out there, but it looks to me like they're not worth optimizing for when building a mass-market personal book digitizer.