The Naked Scientists

The Naked Scientists Forum

Topic: Optical character recognition (Read 4314 times)

syhprum

  • Neilep Level Member
  • Posts: 3822
  • Thanked: 19 times
Optical character recognition
« on: 30/11/2012 14:23:49 »
Does anyone have any OCR experience? I have Acrobat 9.0 and SimpleOCR, but neither makes much of it. I also have an old 16-bit program, Bitware, which was very good with faxes; it might run on a 32-bit system, so I will try it if I can find the CD.


 

RD

  • Neilep Level Member
  • Posts: 8131
  • Thanked: 53 times
Re: Optical character recognition
« Reply #1 on: 30/11/2012 14:57:57 »
You shouldn't use the JPG format for images with text: it uses lossy compression, which degrades the image and makes the text difficult to read (for humans and OCR alike). Lossless formats like PNG, GIF or BMP should be used for images with text ...
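To illustrate why JPG hurts text, here is a minimal Python sketch using only the standard library. It is a simplified, hypothetical 1-D stand-in for JPEG (real JPEG transforms 8x8 pixel blocks in 2-D, uses per-frequency quantisation tables and usually subsamples colour): it takes one row of pixels crossing a dark text stroke, applies a DCT, rounds the coefficients to a coarse step (the lossy part), and reconstructs. For comparison, the same row goes through zlib's DEFLATE, the lossless compressor PNG uses internally. The pixel values and the `step` value are made up for the example.

```python
import math
import zlib


def dct(x):
    """Orthonormal 1-D DCT-II (JPEG uses the 2-D version on 8x8 blocks)."""
    n = len(x)
    return [
        (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
        * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(
            (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n)
        )
        for i in range(n)
    ]


def lossy_roundtrip(pixels, step):
    """Round each DCT coefficient to a multiple of `step` (information is
    thrown away here), then reconstruct and clamp to the 0-255 pixel range."""
    coeffs = [round(c / step) * step for c in dct(pixels)]
    return [min(255, max(0, round(v))) for v in idct(coeffs)]


def lossless_roundtrip(pixels):
    """Compress and decompress with DEFLATE, the lossless method inside PNG."""
    return list(zlib.decompress(zlib.compress(bytes(pixels))))


# One made-up row of pixels crossing a black text stroke on white.
row = [255, 255, 255, 0, 0, 255, 255, 255]
print("original:", row)
print("lossy   :", lossy_roundtrip(row, step=100))
print("lossless:", lossless_roundtrip(row))
```

With the coarse step the reconstructed row no longer matches the original (some of the white background comes back as light grey), which is the kind of damage that blurs small text; the DEFLATE round trip returns the row unchanged.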

« Last Edit: 30/11/2012 15:09:45 by RD »
 

syhprum

  • Neilep Level Member
  • Posts: 3822
  • Thanked: 19 times
Re: Optical character recognition
« Reply #2 on: 30/11/2012 17:29:19 »
This text panel was a screen grab from Facebook. I was hoping I could advance my geek status by producing a nice clean text version with OCR. I can, of course, improve its visibility with Photoshop et al., but I wanted to see some OCR in action.
 

RD

  • Neilep Level Member
  • Posts: 8131
  • Thanked: 53 times
Re: Optical character recognition
« Reply #3 on: 01/12/2012 06:37:38 »

When you make a screengrab, you can save it in a lossless compression format like GIF or PNG rather than lossy JPG. Once it has been converted to JPG, the image data is irreversibly damaged, and even Photoshop can't fully repair it.

I tried a randomly selected free online OCR service with your blurry image. Unsurprisingly, it failed miserably to convert the barely readable small text, and, surprisingly, it made mistakes with the giant heading text: "Students" => "StudonW"...

Quote from: free-online-ocr.com
by High School StudonW

1
2
3
4

S
6
7
8

9
10
11

12
http://www.free-online-ocr.com/

To be fair, that website can OCR a (GIF) screengrab of its own pages ...
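One way to put a number on mistakes like "Students" => "StudonW" is the character error rate: the edit (Levenshtein) distance between the correct text and the OCR output, divided by the length of the correct text. Below is a minimal Python sketch of that measure. The reference strings are assumptions about what the image actually said (the heading word "Students" and the numbers 1 to 12, joined with spaces for simplicity, where the OCR read the 5 as an S).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance divided by the length of the correct text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_output) / len(reference)


# Assumed ground truth vs. the output quoted from free-online-ocr.com.
print(levenshtein("Students", "StudonW"))            # heading word
print(character_error_rate("Students", "StudonW"))
print(levenshtein("1 2 3 4 5 6 7 8 9 10 11 12",
                  "1 2 3 4 S 6 7 8 9 10 11 12"))     # 5 read as S
```

Here "StudonW" is 3 edits away from "Students" (two substitutions and one deletion), a character error rate of 3/8, while the number list is only one substitution off.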


« Last Edit: 01/12/2012 07:12:44 by RD »
 

syhprum

  • Neilep Level Member
  • Posts: 3822
  • Thanked: 19 times
Re: Optical character recognition
« Reply #4 on: 01/12/2012 09:18:28 »
I did, of course, use a lossless format (.png) when I made the original screen grab, but when I sent the test picture to the forum I had to use .jpg because the file was too large.
I am surprised that an expensive program like Adobe Acrobat 9 Pro Extended could not do better.
I am going to do further tests with my own scans. Unfortunately, I cannot run the old Bitware program, as it needs Windows 2000 or some olde worlde system that I do not have set up anywhere.
   




[Attachment: Adobe Acrobat 9 Pro Extended]
 

RD

  • Neilep Level Member
  • Posts: 8131
  • Thanked: 53 times
Re: Optical character recognition
« Reply #5 on: 01/12/2012 10:19:29 »

If you lower the number of colours (i.e. the bit depth) when saving as PNG or GIF, you get a much smaller file.
The default PNG setting gives photographic quality (typically 8 bits, i.e. 256 levels, per colour channel), which is an unnecessarily high bit depth for a text image. Greyscale with 4 "colours" (4 shades of grey, i.e. 2 bits per pixel) is sufficient for text.

Your image editor may have a preset called "web optimised" that reduces the size of a PNG file by reducing the number of colours (the size is reduced greatly, to 1/6th) ...
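To put rough numbers on the bit-depth advice, here is a minimal Python sketch (an illustration, not how any particular image editor works) that computes the uncompressed pixel data size at different bit depths, and shows 4 grey levels packed at 2 bits per pixel, four pixels to a byte with the leftmost pixel in the high bits, the way PNG packs sub-byte pixels. The 1280x720 size and the pixel row are assumed examples; real PNG files are also DEFLATE-compressed, so actual file sizes will be smaller and depend on the content.

```python
import math


def bits_per_pixel(colours: int) -> int:
    """Smallest whole number of bits that can index this many colours/shades."""
    return max(1, math.ceil(math.log2(colours)))


def raw_bytes(width: int, height: int, colours: int) -> int:
    """Uncompressed pixel data size, each row padded to a whole byte (as PNG
    does for sub-byte depths); ignores headers, PNG's per-row filter byte
    and compression."""
    row_bytes = math.ceil(width * bits_per_pixel(colours) / 8)
    return row_bytes * height


def quantise_to_4_greys(row: list[int]) -> list[int]:
    """Map 8-bit grey values (0-255) to 4 levels (0-3) by keeping the top 2 bits."""
    return [v >> 6 for v in row]


def pack_2bit(levels: list[int]) -> bytes:
    """Pack four 2-bit pixels per byte, leftmost pixel in the high-order bits,
    zero-padding the last byte of the row."""
    out = bytearray()
    for i in range(0, len(levels), 4):
        chunk = levels[i:i + 4]
        chunk += [0] * (4 - len(chunk))
        byte = 0
        for level in chunk:
            byte = (byte << 2) | level
        out.append(byte)
    return bytes(out)


# Hypothetical 1280x720 screen grab at three bit depths.
for label, colours in [("24-bit colour", 2**24), ("256 colours", 256), ("4 greys", 4)]:
    print(f"{label:>14}: {raw_bytes(1280, 720, colours):,} bytes before compression")

row = [255, 250, 180, 90, 10, 0, 120, 255]  # made-up anti-aliased text edge
levels = quantise_to_4_greys(row)
print(levels, pack_2bit(levels).hex())
```

At 2 bits per pixel the raw data is a twelfth of the 24-bit size (230,400 vs 2,764,800 bytes for the assumed grab); how much of that saving survives into the final file depends on the compression.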

 

Mazurka

  • Hero Member
  • Posts: 510
Re: Optical character recognition
« Reply #6 on: 05/12/2012 09:26:42 »
Cheers RD, that is a good tip.
 

phil2000

  • First timers
  • Posts: 1
Re: Optical character recognition
« Reply #7 on: 27/10/2014 12:48:32 »
Hey, in case anyone comes across such a problem again: the tool https://www.ocrgeek.com/ is really handy for doing OCR on PDFs.
 
