The Naked Scientists

The Naked Scientists Forum

Author Topic: Scanning Images  (Read 3922 times)

Offline PmbPhy

  • Neilep Level Member
  • ******
  • Posts: 2771
  • Thanked: 38 times
    • View Profile
Scanning Images
« on: 06/12/2014 03:46:00 »
I need some assistance scanning a textbook. I've only scanned in the front matter (table of contents, etc) and the first 9 pages and yet it's already 5 MB in size. I'm saving them in JPEG format. I tried GIF, but most came out larger, and when I reduce the color depth to save space the quality drops to a level that's unacceptable to me. Any ideas on how to do this so that the files come out smaller? I've seen PDF files of textbooks that are much smaller than I'd get if I did it this way.


 

Offline RD

  • Neilep Level Member
  • ******
  • Posts: 8134
  • Thanked: 53 times
    • View Profile
Re: Scanning Images
« Reply #1 on: 06/12/2014 06:22:04 »
... I've only scanned in the front matter (table of contents, etc) and the first 9 pages and yet it's already 5 MB in size. ... I've seen PDF files of textbooks that are much smaller ...

Storing a page of text as an image will take up MUCH more memory than having the text as a text file (like a word processor document).

So if the PDF has the text as a text file, it will be much smaller than a PDF made up of scans (images) of the book. If the text in the PDF is stored as an image, you won't be able to copy and paste text from it (as it's an image, like a photo).

To reduce memory for text you could use grayscale with a reduced number of shades (8-shade example attached as a PNG file).
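As a rough illustration of what "a reduced number of shades" means, here is a minimal sketch that maps 8-bit grayscale values onto 8 evenly spaced shades. The function name `quantize` and the sample pixel row are made up for this example, and scanner or image-editing software may use a different mapping. Eight shades need only 3 bits per pixel instead of 8, but the saving only shows up if the file format stores the reduced palette or compresses the result well.

```python
def quantize(pixels, levels=8):
    """Map 8-bit grayscale values (0-255) onto `levels` evenly spaced shades."""
    step = 255 / (levels - 1)  # distance between neighbouring shades
    return [round(round(p / step) * step) for p in pixels]

# Hypothetical row of grayscale pixels (0 = black, 255 = white).
row = [0, 12, 40, 100, 128, 200, 250, 255]
print(quantize(row))

# Every possible input value lands on one of only 8 output shades.
print(len(set(quantize(range(256)))), "distinct shades")
```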

Make sure you are not scanning at a higher resolution than is necessary
(i.e. set the dots-per-inch [DPI] setting on the scanner to the lowest value which is still clearly readable).
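To see how much resolution and color depth matter, the sketch below works out the uncompressed size of one A4 page at a few settings. The helper `raw_scan_bytes` and the chosen DPI values are assumptions for illustration only; real files (JPEG, PNG, PDF) are compressed and will be smaller, but they scale in roughly the same way.

```python
# A4 paper is 210 mm x 297 mm; convert to inches (25.4 mm per inch).
A4_WIDTH_IN = 210 / 25.4
A4_HEIGHT_IN = 297 / 25.4

def raw_scan_bytes(dpi, bits_per_pixel,
                   width_in=A4_WIDTH_IN, height_in=A4_HEIGHT_IN):
    """Uncompressed size in bytes of a scan at the given DPI and bit depth."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

for dpi in (150, 300, 600):
    for label, bpp in (("1-bit B&W", 1), ("8-bit gray", 8), ("24-bit color", 24)):
        size_mb = raw_scan_bytes(dpi, bpp) / 1_000_000
        print(f"{dpi:>3} DPI  {label:<13} {size_mb:8.2f} MB")
```

Doubling the DPI roughly quadruples the pixel count, so dropping from 600 to 300 DPI saves about as much as going from 8-bit grayscale to 2 bits per pixel.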

Quote from: meetingtomorrow.com
Scanning text documents is a relatively smooth process that does not take a lot of time. The lowest DPI that is needed for the scanned text to display and print properly is 300 DPI. If the text is going to be reprinted, a DPI setting of 600 or better is ideal. When saving text documents it is best to save the files as .PDF (portable document format). If you want to edit the text, use the Optical Character Recognition (OCR) feature on your scanner.
http://www.meetingtomorrow.com/cms-category/tips-for-scanning-documents-and-images

If your scanner/computer has OCR capability, it can read an image containing text and convert it into a text file (which will be much smaller than the image file, and editable). However, in my experience the OCR makes mistakes: it cam misread worms. :)
« Last Edit: 06/12/2014 06:43:12 by RD »
 

Offline alancalverd

  • Global Moderator
  • Neilep Level Member
  • *****
  • Posts: 4727
  • Thanked: 155 times
  • life is too short to drink instant coffee
    • View Profile
Re: Scanning Images
« Reply #2 on: 06/12/2014 07:16:21 »
Do you need color/grayscale, or can you reduce the file size by scanning in black & white only?
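For a sense of why black & white scanning is so much smaller, here is a minimal sketch of thresholding a row of 8-bit grayscale pixels and packing the result at 8 pixels per byte. The function name `pack_bilevel`, the threshold of 128 and the sample row are assumptions for illustration; a scanner driver does this internally and may choose its threshold differently.

```python
def pack_bilevel(gray_row, threshold=128):
    """Threshold 8-bit grayscale pixels and pack 8 pixels per byte (1 = black)."""
    packed = bytearray((len(gray_row) + 7) // 8)  # round up to whole bytes
    for i, value in enumerate(gray_row):
        if value < threshold:                      # darker than threshold: black
            packed[i // 8] |= 0x80 >> (i % 8)      # leftmost pixel in the top bit
    return bytes(packed)

# Hypothetical row of 10 grayscale pixels (0 = black, 255 = white).
row = [255, 255, 0, 0, 30, 240, 250, 10, 0, 255]
packed = pack_bilevel(row)
print(len(row), "bytes of grayscale ->", len(packed), "bytes of black & white")
```

Before any compression this is already an 8-fold saving over 8-bit grayscale, at the cost of losing all intermediate shades.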
 

Offline CliffordK

  • Neilep Level Member
  • ******
  • Posts: 6321
  • Thanked: 3 times
  • Site Moderator
    • View Profile
Re: Scanning Images
« Reply #3 on: 06/12/2014 09:16:09 »
It has been a while since I've done bulk scanning.  I think I used to target around 100 to 200K per page, without perfect resolution, but "good enough".
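A quick back-of-the-envelope check of what that per-page target means for a whole book. The page count of 500 is a made-up figure, not something from this thread, and `book_size_mb` is just a helper name for the example.

```python
def book_size_mb(pages, kb_per_page):
    """Total size in MB for `pages` scanned pages at `kb_per_page` KB each."""
    return pages * kb_per_page / 1000

PAGES = 500  # hypothetical page count, not from the thread
for kb in (100, 200):
    print(f"{PAGES} pages at {kb} KB/page: about {book_size_mb(PAGES, kb):.0f} MB")
```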

I'd get something that will allow you to assemble a PDF, for example scanning to Adobe Acrobat (the full version, not the free version), so you can have a final deliverable in a single, easy-to-access file (unless you need the raw images for some reason).

OCR can help with indexing. Even if there are some errors, hopefully you pick up enough keywords to help with searches.

There are a number of scanners optimized for loose document scanning to PDFs, but a textbook will require page by page scanning.

Be careful of copyright issues.
 

Offline PmbPhy

  • Neilep Level Member
  • ******
  • Posts: 2771
  • Thanked: 38 times
    • View Profile
Re: Scanning Images
« Reply #4 on: 06/12/2014 21:35:32 »
Quote from: RD
Storing a page of text as an image will take up MUCH more memory than having the text as a text file (like a word processor document).
Thanks, but I'm well aware of these facts. Using OCR is too unwieldy. That's why I've chosen not to use it.
 

Offline RD

  • Neilep Level Member
  • ******
  • Posts: 8134
  • Thanked: 53 times
    • View Profile
Re: Scanning Images
« Reply #5 on: 07/12/2014 06:40:35 »
Looks like the minimum size for an A4 page scanned at 300 DPI in monochrome is about 200 KB.
As a text file, rather than an image of a page of text, it would take up less than a tenth of that amount of memory, which could account for similar-looking "PDF files of textbooks" being "much smaller" if they are made from text files rather than scanned images of pages.
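The comparison can be sketched with some assumed numbers: a dense page of about 80 characters by 50 lines (hypothetical), one byte per character for plain English text, and the scan figure above taken as roughly 200,000 bytes (an assumption about the unit). A real text-based PDF also carries fonts and layout data, so its per-page overhead is larger than plain text.

```python
def plain_text_bytes(chars_per_line, lines_per_page):
    """Bytes needed for one page of plain English text at one byte per character."""
    return chars_per_line * lines_per_page

SCAN_BYTES = 200_000                   # assumed size of one scanned page
text_bytes = plain_text_bytes(80, 50)  # hypothetical page layout

print(f"text page:    {text_bytes} bytes")
print(f"scanned page: {SCAN_BYTES} bytes")
print(f"the scan is about {SCAN_BYTES / text_bytes:.0f} times larger")
```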
« Last Edit: 07/12/2014 06:49:11 by RD »
 

Offline syhprum

  • Neilep Level Member
  • ******
  • Posts: 3823
  • Thanked: 19 times
    • View Profile
Re: Scanning Images
« Reply #6 on: 08/12/2014 19:58:28 »
I did a quick test and found that an OCR copy needs about half the disk space that a PNG uses, but unless you get very clean scans OCR can produce some strange results, and unless you need to edit the text it is best avoided.
 

Offline syhprum

  • Neilep Level Member
  • ******
  • Posts: 3823
  • Thanked: 19 times
    • View Profile
Re: Scanning Images
« Reply #7 on: 18/12/2014 01:42:52 »
There are two more lossy picture compression formats on the horizon, BPG and WebP, that provide better quality and smaller file sizes than JPEG, although there are copyright problems and the difficulty of getting them into general use.
 
