Naked Science Forum

Non Life Sciences => Geek Speak => Topic started by: syhprum on 30/11/2012 14:23:49

Title: Optical character recognition
Post by: syhprum on 30/11/2012 14:23:49
Does anyone have any OCR experience? I have Acrobat 9.0 and SimpleOCR, but neither makes much of it. I also have an old 16-bit program, Bitware, that was very good with faxes and might run on a 32-bit system; I will try it if I can find the CD.
Title: Re: Optical character recognition
Post by: RD on 30/11/2012 14:57:57
You shouldn't use the JPG format for images with text: it uses lossy compression (http://en.wikipedia.org/wiki/Lossy_compression), which degrades the image and makes the text difficult to read (for humans and OCR). Lossless compression (http://en.wikipedia.org/wiki/Lossless_compression) formats like PNG, GIF or BMP should be used for images with text.
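
To make the lossy/lossless difference concrete, here is a minimal Python sketch. The tiny 8x8 "image" is made up for illustration, and the lossy step is only a crude stand-in (averaging neighbouring pixels), not real JPEG compression. It is meant to show the general idea that a lossless round trip returns every pixel unchanged, while a lossy one smears a sharp black stroke into grey.

```python
import zlib

# A made-up 8x8 greyscale "image": a black stroke (0) on a white background (255).
WIDTH, HEIGHT = 8, 8
pixels = bytes(0 if x in (3, 4) else 255
               for y in range(HEIGHT) for x in range(WIDTH))

# Lossless round trip: compress, decompress, and compare with the original.
restored = zlib.decompress(zlib.compress(pixels))


def lossy_round_trip(data):
    """Crude lossy stand-in (NOT real JPEG): replace each pair of
    neighbouring pixels with their average, discarding fine detail."""
    out = bytearray()
    for i in range(0, len(data), 2):
        avg = (data[i] + data[i + 1]) // 2
        out += bytes([avg, avg])
    return bytes(out)


lossy = lossy_round_trip(pixels)

print("lossless round trip identical:", restored == pixels)
print("lossy round trip identical:   ", lossy == pixels)
print("original first row:", list(pixels[:WIDTH]))
print("lossy first row:   ", list(lossy[:WIDTH]))
```

Once the averaged values have replaced the originals, nothing in the stored data says where the sharp edge used to be, which is why the damage cannot be fully undone afterwards.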

Title: Re: Optical character recognition
Post by: syhprum on 30/11/2012 17:29:19
This text panel was a screen grab from Facebook. I was hoping I could advance my Geek status by producing a nice clean text version with OCR. I can of course improve its visibility with Photoshop et al., but I wanted to see some OCR in action.
Title: Re: Optical character recognition
Post by: RD on 01/12/2012 06:37:38
Quote from: syhprum
This text panel was a screen grab from Facebook. I was hoping I could advance my Geek status by producing a nice clean text version with OCR. I can of course improve its visibility with Photoshop et al., but I wanted to see some OCR in action.

When you make a screengrab you can save it in a lossless compression format like GIF or PNG rather than lossy JPG. After conversion to JPG, irreversible damage has occurred to the image data, which even Photoshop can't totally reverse.

I tried a randomly selected free online OCR service with your blurry image. Unsurprisingly, it failed miserably to convert the barely readable small text; surprisingly, it also made mistakes with the giant heading text: "Students" => "StudonW"...

Quote from: free-online-ocr.com
by High School StudonW

1
2
3
4

S
6
7
8

9
10
11

12
http://www.free-online-ocr.com/
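
One rough way to put a number on a misread like "Students" => "StudonW" is the edit distance between the expected and the recognised text. The Python sketch below is only an illustration: the function name `edit_distance` and the idea of dividing by the expected length to get an error rate are my own choices, not anything the OCR service reports.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


expected = "Students"
recognised = "StudonW"
errors = edit_distance(expected, recognised)
error_rate = errors / len(expected)
print(errors, error_rate)
```

The same function could be run over a whole page of known text to compare OCR programs or image formats against each other.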

To be fair, that website can OCR a (GIF) screengrab of its own pages.

Title: Re: Optical character recognition
Post by: syhprum on 01/12/2012 09:18:28
I did of course use a lossless save (.png) when I made the original screen grab, but when I sent the test picture to the forum I had to use .jpg because of the large file size.
I am surprised that an expensive program like Adobe Acrobat 9 Pro Extended could not do better.
I am going to do further tests with my own scans. Unfortunately I cannot run the old Bitware program, as it needs Windows 2000 or some olde worlde system that I do not have set up anywhere.
Title: Re: Optical character recognition
Post by: RD on 01/12/2012 10:19:29
Quote from: syhprum
I did of course use a lossless save (.png) when I made the original screen grab, but when I sent the test picture to the forum I had to use .jpg because of the large file size.

If you lower the number of colours, aka "bit-depth", with PNG or GIF you get a much smaller file size.
The default PNG setting gives photographic quality (typically 24-bit colour, or 256 colours for an indexed image), which is an unnecessarily high bit depth for a text image. Greyscale with 4 "colours" (4 shades of grey) is sufficient for text.

You may have a preset called "web optimised" that reduces the size of a PNG image file by reducing the number of colours (the size is reduced greatly: to 1/6th) ...
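
For anyone who wants to see the bit-depth effect without an image editor, here is a minimal Python sketch that writes the same greyscale picture as a PNG twice, once at 8 bits per pixel and once at 2 bits per pixel (4 shades of grey). The helper names `png_chunk` and `grey_png` are made up for this example, and the test image is synthetic noise rather than a real screengrab, so the exact size ratio will differ from the 1/6th figure above.

```python
import random
import struct
import zlib


def png_chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC-32 of type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def grey_png(pixels, width, height, bit_depth):
    """Encode 8-bit grey samples as a greyscale PNG at bit depth 8 or 2.
    For bit depth 2 the width must be a multiple of 4 (a simplification)."""
    rows = bytearray()
    for y in range(height):
        row = pixels[y * width:(y + 1) * width]
        rows.append(0)  # filter type 0 (None) for this scanline
        if bit_depth == 8:
            rows += row
        elif bit_depth == 2:
            # Keep the top 2 bits of each sample (4 grey levels) and pack
            # 4 pixels per byte, leftmost pixel in the high-order bits.
            for i in range(0, width, 4):
                a, b, c, d = (v >> 6 for v in row[i:i + 4])
                rows.append(a << 6 | b << 4 | c << 2 | d)
        else:
            raise ValueError("this sketch only handles bit depths 8 and 2")
    # IHDR: width, height, bit depth, colour type 0 (greyscale),
    # compression 0, filter 0, interlace 0.
    header = struct.pack(">IIBBBBB", width, height, bit_depth, 0, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n"
            + png_chunk(b"IHDR", header)
            + png_chunk(b"IDAT", zlib.compress(bytes(rows), 9))
            + png_chunk(b"IEND", b""))


# Synthetic 64x64 test image (seeded noise, not a real screengrab).
WIDTH = HEIGHT = 64
rng = random.Random(0)
pixels = bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))

full = grey_png(pixels, WIDTH, HEIGHT, 8)
small = grey_png(pixels, WIDTH, HEIGHT, 2)
print("8-bit greyscale PNG:", len(full), "bytes")
print("2-bit greyscale PNG:", len(small), "bytes")
```

Writing either result to disk with `open("test.png", "wb").write(small)` should give a file that ordinary image viewers can open, assuming they handle low bit-depth greyscale PNGs.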

Title: Re: Optical character recognition
Post by: Mazurka on 05/12/2012 09:26:42
Cheers RD, that is a good tip.
Title: Re: Optical character recognition
Post by: phil2000 on 27/10/2014 12:48:32
Hey, in case anyone comes across such a problem again: the following tool, https://www.ocrgeek.com/, is really handy for doing OCR with PDFs.