Let the Great Cross-Referencing Begin: Google Book Search as Plagiarism Detector

The Google Book Search Library Project promises to be, among other things, the greatest plagiarism detector ever created. So why are the Association of American Publishers and the Authors Guild suing Google over its plan to digitize millions of books? In the case of the AAP, it’s probably because they understand that copyright law really exists to subsidize distributors, not writers or readers. They’re just looking out for their own interests. Or at least they think they are: it’s much more likely that Google search results will improve book sales than hurt them. In any case, one has to pause at the spectacle of a publishers’ association coming out against readers being able to locate the books they’re looking for more efficiently than ever before. But what’s more interesting, if not exactly unexpected, is that the Authors Guild is reacting in the same way. Here’s what the Guild’s president, Nick Taylor, had to say:

“This is a plain and brazen violation of copyright law. It’s not up to Google or anyone other than the authors, the rightful owners of these copyrights, to decide whether and how their works will be copied.”

How odd. Mostly, authors are not the owners of the copyrights in their work; publishers are. And even in those cases where the author retains copyright, she has usually signed a contract granting exclusive printing and distribution rights to a particular publisher. Nick Taylor’s comment might make sense in some idealistic world where authors typically retain control of their work, but for the authors he represents, the world is rarely like that.

Meanwhile, the Authors Guild ignores an amazing possibility opened up by Google’s project: we will be able to detect plagiarism with a thoroughness hitherto unthinkable. Google is the world’s premier search engine; they have made billions of dollars matching snippets of text together and displaying the results. After digitizing these texts, the natural thing to do is to start looking for ways to cross-reference them. For legitimate citations, the effect of this will be mere convenience: instead of trudging to the library or bookstore, you can click on a link. But for cases of plagiarism, the effect will be a revolution: whereas in the past, discovering plagiarism required that the same person read both books, it will now be possible to flag potential instances of unattributed copying automatically!

So why isn’t the Authors Guild cheering Google on? A clue can be found in the Guild’s self-description, as given at the end of their press release about the Google lawsuit:

“The Authors Guild is the nation’s largest and oldest society of published authors and the leading writers’ advocate for fair compensation, effective copyright protection, and free expression.”

There’s a subtle bit of cognitive slippage going on there. They start out stating (accurately) that they are the largest society for published authors. But then they go on to claim that they are the leading writers’ advocate for fair compensation, effective copyright protection, and free expression. Where did that slide from representing published authors to representing all authors happen? Anyone who writes is a writer; and thanks to the Internet, any writer who wants to be published can be, by simply making their work available on the Web. This is not wordplay; it is a fundamentally important fact of modern information distribution, as many popular bloggers have learned.

The Authors Guild does not represent most authors anymore, if it ever did. It represents a tiny minority of authors: those whose works have been found fit for distribution by a certain kind of publisher, the kind that makes a massive initial investment in a print run and then depends on strict monopoly control of the copyright to recover that investment.

Tellingly, the Guild’s identifying statement doesn’t contain a word about plagiarism, a threat faced by all authors. While texts may be shareable resources, reputation and credit are not: plagiarism is a concern for all writers, no matter how their work is distributed. Yet the Guild’s omission isn’t limited to that one press release. A search for the word “plagiarism” across their entire web site returns only this:

Search word: plagiarism 0 results found.

Perhaps the Guild thinks that the phrase “effective copyright protection” includes plagiarism, but as we have noted elsewhere, copyright “protection” is really not about plagiarism: one can permit limitless attributed copying without approving of or permitting plagiarism. The two are separate, and the Authors Guild, of all organizations, should know this. The Authors Guild’s heart is in the right place; the problem is just that they’ve bought the industry myth: that authors’ interests are always the same as publishers’. If the AG really wants to look out for the interests of all authors, not just the small percentage with successful monopoly-based publishing arrangements, they’ll knock on Google’s door and ask how they can help. Instead, they’re suing for copyright violation, even though what Google is doing is both well within the bounds of so-called “fair use” and enormously beneficial to the Guild’s members. The Great Cross-Referencing has begun. Let us hope the Authors Guild sees the light and allows it to continue.

[Postscript: When I first wrote this article, I wasn’t aware that Amazon had already been doing in-book searching for some time. This means that Amazon could do automated plagiarism detection as well, and perhaps there are other organizations in the same position. But note that Amazon is not the target of publishing industry lawsuits, probably because Amazon negotiated with publishers for access to book text, rather than just scanning it in the way Google did.]
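To make the idea of automated flagging concrete, here is a minimal Python sketch of one common approach: comparing overlapping runs of words (often called "shingles") between two texts. It is an illustration only, not a description of how Google or Amazon actually index or compare books. The function names, the five-word window, and the threshold are all assumptions chosen for the example.

```python
import re


def shingles(text, n=5):
    """Return the set of n-word sequences in text, lowercased.

    The window size n=5 is an arbitrary choice for this sketch.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_passages(text_a, text_b, n=5):
    """Return the n-word sequences that appear in both texts."""
    return shingles(text_a, n) & shingles(text_b, n)


def flag_for_review(text_a, text_b, n=5, min_shared=1):
    """Flag a pair of texts that share at least min_shared word sequences.

    A flag only means "a person should look at this"; it is not a verdict.
    """
    return len(shared_passages(text_a, text_b, n)) >= min_shared


if __name__ == "__main__":
    original = "It was the best of times, it was the worst of times"
    suspect = "He wrote that it was the best of times, it was said"
    for passage in sorted(shared_passages(original, suspect)):
        print(" ".join(passage))
```

A shared run of words only marks a passage for human review. A properly attributed quotation matches just as well as an unattributed one, so a check like this cannot decide on its own whether something is plagiarism.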

8 Comments on "Let the Great Cross-Referencing Begin: Google Book Search as Plagiarism Detector"


  1. Remember also that an author will fear plagiarism accusations against him, even if innocent. The matter is not always so clear cut. Google might therefore represent a threat.

    But the bigger issue of why the publishers are resisting Google is more interesting. It is really quite simple. Google, by better informing the reading public, is diluting the marketing power of the publishers. The reader, by being better informed, can make choices independently of the publisher’s marketing machine.


  2. Yes — one of the effects of loosening copyright is a much more collaborative notion of authorship. Things that today might be considered plagiarism, or at least lack of originality, might tomorrow be considered simply building on the works of others. The important thing is to have laws that enforce proportional attribution, rather than laws to restrict copying, in such a world.

    I totally agree about the dilution of publishing’s marketing power, very good point.


  3. It’s understandable that the Authors Guild doesn’t hassle Amazon. After all, Amazon is selling the books, creating profit for the writers. Google’s BSLP, on the other hand, only creates profit for Google itself, using material that Google does not own in any way.

    So, even though Amazon and Google might be doing the same things, the context is still different.

    BSLP might turn out to have very positive effects for everybody, including the writers. But right now it’s probably impossible to tell for certain.

    M. Nowak
    Nowak blog


  4. But Google’s BS can create profits for writers, by leading to increased sales of books, by making them easier to find. That’s Google’s whole argument!

    (Not that profits should be the deciding factor in deciding who can share information, of course. But even granting that dubious proposition, Google’s service benefits writers.)

    See Tim O’Reilly’s New York Times Op-Ed piece on this: Search and Rescue. He explains in detail why Google BSLP should be welcomed by writers.


  5. Does using Copyscape really prevent plagiarism? We know that it takes time to index a published article, so what if someone copied my article and published it too, and because mine is not indexed yet, the copy would pass Copyscape? How will plagiarism detection deal with that?

    -Jan


  6. See here for an answer (it’s a comment I wrote a while ago in a conversation about a different article).


  7. This is a period of transition, and people are frightened and excited by the possibilities. But let’s not get carried away. I appreciate your efforts to explore new ways of thinking about content and copyright, but I wish you wouldn’t characterize the players involved with sweeping generalizations and starry-eyed language.

    Everyone is looking out for their best interests, from authors to publishers, from audiences to Google. And implying that all publishers are the willing bedfellows of Amazon really makes it look like you don’t have a clue.

    I work for a non-profit publisher. I would love it if it were my job to give away the books and make their content available to everyone for free. Some non-profit publishers are able to do this because their revenue stream is not based on book sales. But that is not a sustainable model for everyone at this time.

    Please do not imply that publishers are some species of parasite. Publishers do add value. Publishing is more than making and distributing containers for content. Most of the cost of publishing doesn’t go into “dead trees” and shipping. The greater portion goes into vetting projects and editing them. (Digital technology has not rendered these services obsolete. I believe it has only increased the need for them, just as it has increased the need for good librarians.) Then there’s the cost of getting word out about the book. (Oddly enough, the internet hasn’t made promotion completely free yet. And if advertising were free, Google would shrivel.)

    I think Google books is a great way to promote books and I’ve argued with colleagues to help them understand why online book browsing benefits the publishers and the authors. I cringe at the idea of book designs that make it more difficult to copy the content at the expense of making the book easier to use. That said, there are instances when giving away too much content can cut into sales. And that revenue makes it possible to provide the content that Google and Google users (the eyeballs advertisers pay Google to gather) want. “Book people” are wary of Google. They are also wary of Amazon. Sure, we are looking out for ourselves, but we also provide a service. And there are other issues at stake such as privacy. If publishers are eliminated, will Google suddenly become a purely altruistic river of free-flowing information? I know Amazon won’t. Will the general public check their actions?


  8. I never said publishers were any kind of species of parasite; not sure how you read that into the article. I love my publisher (O’Reilly Media), in fact.

    What I’m saying is that there is no reason to stifle everyone’s ability to share and remix merely to subsidize one particular business model. Your sentence here encapsulates exactly the attitude we’re questioning: “That said, there are instances when giving away too much content can cut into sales.”

    Who says it’s the publisher’s content? Maybe it’s the “author’s” in the sense that the author wrote it, but as far as actual possession goes, it shouldn’t be anybody’s; that is, no one should have a monopoly right to restrict people from sharing it. Culture and knowledge belong to everyone. If you don’t want something shared and used, then don’t release it in the first place.

    An industry that is providing a service doesn’t need monopoly protection. I agree with you that publishers do provide a real service. But they shouldn’t restrict what the rest of us can do in order to provide that service; and if that implies some transformations in the way publishing is done, well, that’s a lot better than the status quo, in which publishers become de facto censors (or is telling people they can’t copy books somehow not censorship?).

    If Google were free to, it would offer print-on-demand for every single book in its database. It is only the lawsuits by the Publishers Association and the Authors Guild that prevented this. There is no need to depend on altruism on Google’s part, because Google wouldn’t have a monopoly either.