Naked Science Forum
Non Life Sciences => Geek Speak => Topic started by: syhprum on 30/11/2012 14:23:49
-
Does anyone have any OCR experience? I have Acrobat 9.0 and SimpleOCR, but neither makes much of it. I also have an old 16-bit program, Bitware, which was very good with faxes and might run on a 32-bit system; I will try it if I can find the CD.
-
You shouldn't use the JPG format for images with text, as it uses lossy compression (http://en.wikipedia.org/wiki/Lossy_compression), which degrades the image and makes the text difficult to read (for humans and OCR alike). Lossless compression (http://en.wikipedia.org/wiki/Lossless_compression) formats like PNG or GIF, or uncompressed BMP, should be used for images with text ...
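To make the lossy/lossless distinction concrete, here is a toy sketch in Python. It is an illustration under simplifying assumptions, not how JPEG or PNG are implemented end to end. The lossy path mimics the core JPEG idea (transform the pixels with a discrete cosine transform, then coarsely quantise the coefficients), using one 8-pixel row instead of JPEG's 2-D 8x8 blocks and quantisation tables. The lossless path uses zlib's DEFLATE, the compression method inside PNG. The test row, the quantisation step of 60 and the function names are all made up for the example.

```python
import math
import zlib

N = 8  # JPEG works on 8x8 blocks; this toy uses a single 8-sample row


def _scale(k):
    """Normalisation factor that makes the DCT orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(row):
    """Orthonormal 1-D DCT-II of an N-sample row."""
    return [_scale(k) * sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n, x in enumerate(row))
            for k in range(N)]


def idct(coeffs):
    """Exact inverse of dct() (orthonormal DCT-III)."""
    return [sum(_scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k, c in enumerate(coeffs))
            for n in range(N)]


def lossy_roundtrip(row, step=60):
    """Toy JPEG-style coding: DCT, coarse quantisation, inverse DCT.
    The rounding to multiples of `step` is where information is thrown away."""
    quantised = [round(c / step) * step for c in dct(row)]
    return [min(255, max(0, round(v))) for v in idct(quantised)]


def lossless_roundtrip(row):
    """DEFLATE round trip via zlib; every byte comes back unchanged."""
    return list(zlib.decompress(zlib.compress(bytes(row))))


# One pixel row crossing a black letter stroke on a white background.
row = [255, 255, 255, 0, 0, 255, 255, 255]
print("original:", row)
print("lossy:   ", lossy_roundtrip(row))
print("lossless:", lossless_roundtrip(row))
```

The lossless row comes back identical, while some of the lossy row's pixels drift away from pure 0/255. At real JPEG quality settings the same effect shows up as the smudges and ringing around letter edges that make small text hard for OCR.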
-
This text panel was a screen grab from Facebook. I was hoping to advance my Geek status by producing a nice clean text version with OCR. I can of course improve its visibility with Photoshop et al., but I wanted to see some OCR in action.
-
When you make a screengrab you can save it in a lossless compression format like GIF or PNG rather than lossy JPG. Once it has been converted to JPG, the image data is irreversibly damaged; even Photoshop can't fully undo that.
I tried a randomly selected free online OCR service on your blurry image. Unsurprisingly, it failed miserably to convert the barely readable small text, and, surprisingly, it made mistakes even with the giant heading text: "Students" => "StudonW"...
by High School StudonW
1 2 3 4 S 6 7 8 9 10 11 12
http://www.free-online-ocr.com/
To be fair, that website can OCR a (GIF) screengrab of its own pages ...
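One common way to put a number on mistakes like these is the character error rate (CER): the edit (Levenshtein) distance between the OCR output and the correct text, divided by the length of the correct text. The sketch below uses hypothetical helper names and is not something the online service reports; it assumes the true heading word is "Students" (as stated above) and that the "S" in the number column is a misread "5".

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance divided by the length of the correct text."""
    return levenshtein(reference, ocr_output) / len(reference)


for truth, ocr in [("Students", "StudonW"), ("5", "S")]:
    print(f"{truth!r} -> {ocr!r}: {levenshtein(truth, ocr)} edits, "
          f"CER {character_error_rate(truth, ocr):.3f}")
```

For the heading, three edits are needed (e→o, t→W, drop the final s), so three of the eight characters are wrong.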
-
I did of course use a lossless save (.png) when I made the original screen grab, but when I sent the test picture to the forum I had to use .jpg because of the large file size.
I am surprised that an expensive program like Adobe Acrobat 9 Pro Extended could not do better.
I am going to do further tests with my own scans. Unfortunately I cannot run the old Bitware program, as it needs Windows 2000 or some olde worlde system that I do not have set up anywhere.
-
If you lower the number of colours (and with it the "bit-depth") of a PNG or GIF, you get a much smaller file.
The default setting on PNG will give photographic quality (24-bit colour, i.e. 256 levels per colour channel), which is an unnecessarily high bit-depth for a text image. Greyscale with 4 "colours" (4 shades of grey, i.e. 2 bits per pixel) is sufficient for text.
You may have a preset called "web optimised" that reduces the size of a PNG file by reducing the number of colours (the size is reduced greatly, to 1/6th) ...
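The arithmetic behind this tip is that a palette of n colours needs ceil(log2 n) bits per pixel, so 4 grey levels need 2 bits, against 8 bits for 256-level greyscale and 24 bits for full RGB colour. Below is a rough Python sketch, assuming a made-up 400 x 100 anti-aliased test image and hypothetical helper names. It only counts raw pixel bytes plus a plain zlib pass, so it ignores PNG's per-row filter bytes, headers and palette; real savings (like the 1/6th above) depend on the image.

```python
import math
import zlib


def bits_per_pixel(colours: int) -> int:
    """Bits needed per pixel to index a palette of this many colours."""
    return max(1, math.ceil(math.log2(colours)))


def quantise(pixels, levels=4):
    """Map 8-bit grey values (0-255) onto `levels` evenly spaced grey indices."""
    return [round(p * (levels - 1) / 255) for p in pixels]


def pack_2bit(indices):
    """Pack four 2-bit indices per byte, leftmost pixel in the high bits
    (the bit order PNG uses for 2-bit images)."""
    out = bytearray()
    for i in range(0, len(indices), 4):
        chunk = indices[i:i + 4]
        chunk += [0] * (4 - len(chunk))  # pad a short final group with zeros
        out.append(chunk[0] << 6 | chunk[1] << 4 | chunk[2] << 2 | chunk[3])
    return bytes(out)


# Hypothetical greyscale "screen grab": anti-aliased strokes on white.
# WIDTH is a multiple of 4, so every row fills whole bytes when packed.
WIDTH, HEIGHT = 400, 100
stroke = [255, 240, 170, 60, 0, 0, 80, 200, 255, 255]
grey = [stroke[(x + y // 10) % len(stroke)]
        for y in range(HEIGHT) for x in range(WIDTH)]
packed = pack_2bit(quantise(grey))

print("bits/pixel for 2, 4, 256, 2**24 colours:",
      [bits_per_pixel(c) for c in (2, 4, 256, 2 ** 24)])
print("raw 24-bit RGB bytes:", WIDTH * HEIGHT * 3)
print("raw 8-bit grey bytes:", len(grey))
print("raw 2-bit grey bytes:", len(packed))
print("deflated 8-bit vs 2-bit bytes:",
      len(zlib.compress(bytes(grey))), len(zlib.compress(packed)))
```

Before any compression, the 2-bit version is a quarter of the 8-bit greyscale size and a twelfth of 24-bit RGB, which is why cutting the colour count shrinks text screengrabs so much.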
-
Cheers RD, that is a good tip.
-
Hey, in case anyone comes across such a problem again: the tool at https://www.ocrgeek.com/ is really handy for doing OCR on PDFs.