Project Gutenberg

From LIS 460 Summer 2007

Project Gutenberg [1] is a volunteer-driven internet project that aims to provide full-text copies of written material in the public domain. Users can search the archive by title and author in the basic search, and can also limit their searches by etext number, subject, language, category, Library of Congress classification, file type and full-text content (the last is still being refined). Users also have the option to browse a number of Top 100 lists, save their places in various etexts on the site, follow an RSS feed of recent additions to the archive, view digitized sheet music and help the site by volunteering to submit etexts, either by creating new files or by sharing existing ones. The archive itself is hosted in several locations so that no data is lost if any one location goes down [2].

The History of Project Gutenberg

Project Gutenberg is said to have been founded in 1971, when founder Michael Hart, then a student at the University of Illinois, was given access to the university's computer system and used it to type up and distribute a copy of the Declaration of Independence. Hart then created eTexts of other works, including the Bible and the works of Homer, reaching a total of around 350 individual texts by 1987.

With the assistance of volunteers, Hart was able to increase the production of etexts and expand the number of places where the texts were available through mailing lists and mirror sites. At first, Hart and other volunteers typed texts in manually, copying public domain items into digital form. Since then the project has grown, with volunteers scanning books and converting the scans into plain text and HTML versions. Some books have also been recorded as MP3 audio files. [3]

The collection currently contains around 21,000 eBooks. The majority are in the public domain, but around 2% are copyrighted materials whose authors have granted Project Gutenberg permission to reproduce and distribute them [4]. Project Gutenberg also has a number of sister sites and affiliates, both in the US and in other countries. Sites based in other countries follow the copyright laws of those countries.

The Goals of Project Gutenberg

The mission statement of Project Gutenberg is "To encourage the creation and distribution of eBooks."[5]

The project has pursued this mission both by creating and posting the thousands of eBooks it now offers and through a number of essays by its founder, Michael Hart. Though the mission statement seems simple, the project itself is enormous in potential scope. By providing eBooks in the simplest format possible, Project Gutenberg gives users the ability to take those texts and adapt them at will. The project openly rejects offers to make any proprietary format its "official" format, in order to keep the books readable by a wide audience [6].

Texts are chosen for inclusion in the project by volunteers, but users may also make requests. Requests are honored on the basis of popularity, on the theory that the more people who want a text, the more worthwhile it is to spend the time creating an eBook of it. (CITATION NEEDED)

Volunteers are encouraged to help find complete versions of paper books, or editions that include pages missing from the copies already held by others working on a text. Volunteers are also asked to burn CDs and DVDs of eTexts for distribution to users without internet access [7].

But What About Google?

Project Gutenberg and Google Books appear to be very similar projects. One might wonder why two such projects are needed if they do such similar things.

One difference is their stated purpose. Google Books presents itself as a search service, allowing the user to search the full text of any of the books in its database. Project Gutenberg does allow searching within the books in its database, but the stated purpose on its main page is the collection of texts, not searching. Its full-text search is also still new and experimental.

Another difference is that, where Google Books offers PDF scans of its books, Project Gutenberg's books have for the most part been intentionally left in plain text and simple HTML formats. Michael Hart, Project Gutenberg's founder, has stated that he does not specify a particular format for the eBooks entered into the database, specifically because he does not want to lock anyone into a format they do not want to use. As a result, most volunteers appear to choose the simplest format possible. This allows Project Gutenberg's eBooks to be read without any special proprietary software. On the other hand, it also means that the formatting and illustrations of the original books are often lost. Ultimately, the differences reflect the priorities of the founders and of those in charge of each project.

Pros and Cons of Using PG

One of the major advantages of using Project Gutenberg is that, regardless of whether you own particular proprietary software, its eTexts should display on your screen with relative ease. Users can also listen to audio versions, where available, instead of relying on adaptive software.

Another advantage is the availability of free books for classrooms. Because volunteers distribute CDs and DVDs full of eBooks, a school or user does not even need to be able to reach the Project Gutenberg site in order to read the books. All that is needed is a computer that can read text files and has a CD or DVD drive. For schools with heavily filtered internet access, or users who have no internet access at home, this provides free access to books. Relatedly, because the eBooks are online, multiple students can read the same book at the same time, something a single library copy does not allow.

A possible disadvantage of using Project Gutenberg is the lack of formatting, for teachers or users who want the original formatting intact. Images and formatting are available for many files, but not for all.

Another possible disadvantage is accuracy. Project Gutenberg aims for an accuracy standard of 99.95%, following the Library of Congress' standards, but because the project is volunteer-driven, the accuracy of its eBooks is policed by its users. It is possible to come across an inaccurate eBook that has not yet been fully edited or flagged for correction [8].

Yet another possible disadvantage of eBooks in general is that many people find reading on a screen more difficult than reading on a page. This can be remedied by printing out the eBook, where that is possible for a given user, but printing could also be seen as a waste of paper and as defeating the point of an eBook in the first place.

Sources and Links

Project Gutenberg Main Page

Online writings of Michael Hart