Why are the document pages full of junk?

Each document page consists of the OCR text of that page as it was recognized, or not, by the Internet Archive's OCR software.

Unfortunately, OCR software can have a lot of trouble, especially with pages that have fancy typefaces. So while sometimes the software recognized the text correctly, or part of it at least, other times we're left with a hot mess.

9CHRIS was designed to allow users to clean up the text by editing the document pages. With enough input from everyone, the pages will eventually be accurate and the OCR junk will have been eliminated.

Remember: if you clean up a page, please change the "Document Status" line from UGLY to BEAUTIFUL!

Why use UGLY and BEAUTIFUL to show whether a page has been cleaned up?
Comment by Eric Sun Feb 17 20:16:24 2013
The idea is to use unique terms that can be incorporated into a search. With DONE / NOT DONE, a search for "DONE" would also return the pages marked "NOT DONE." The Volume Status on individual volume pages also uses a pair of words that do not contain each other: UNVERIFIED / CORRECT.
Comment by Eric Sun Feb 17 20:18:48 2013
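
The point about search terms can be illustrated with a small sketch. It assumes the search behaves like a plain substring match and that each page carries a line of the form "Document Status: ..."; the page names, the exact line format, and the search function are hypothetical and not taken from the 9CHRIS software.

```python
# Hypothetical page texts. The "Document Status: ..." line format and the
# page names are assumptions made for illustration only.
pages = {
    "page-1": "Document Status: DONE",
    "page-2": "Document Status: NOT DONE",
    "page-3": "Document Status: UGLY",
    "page-4": "Document Status: BEAUTIFUL",
}


def search(term, pages):
    """Return the sorted names of pages whose text contains term (substring match)."""
    return sorted(name for name, text in pages.items() if term in text)


# "DONE" is contained in "NOT DONE", so both pages come back.
done_hits = search("DONE", pages)

# Neither UGLY nor BEAUTIFUL contains the other, so each search is exact.
ugly_hits = search("UGLY", pages)
beautiful_hits = search("BEAUTIFUL", pages)

print(done_hits)
print(ugly_hits)
print(beautiful_hits)
```

Under these assumptions, searching for "DONE" cannot separate finished pages from unfinished ones, while UGLY and BEAUTIFUL each pick out exactly one group.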