
Scanning paper documents to PDF files lets you archive important (and not so important) documents without filling up cabinets. Optical Character Recognition (OCR) makes these scanned documents much more useful than their paper originals. Once a scan has been processed by OCR, the PDF file contains both an image of the document and an invisible text version. The text can then be searched using HoudahSpot.

Unfortunately, you will find that not all of your PDF files have text content. You may have forgotten to run them through OCR, or you may have received the scanned document from someone else. How can you find these files and rectify this?

With a little trick, HoudahSpot can find PDF files that lack text content. It is safe to assume that any text contains either a space or a period. Thus, we will be looking for any PDF file whose text contains neither a space nor a period. This translates to the following search: “Find PDF files that lack OCR text.” Note that the first search field contains a single “space” character.

You don’t have to set up the search yourself: simply download the “Image PDFs” search template to find PDF files that lack OCR text.

Once you have found all PDF files lacking text information, you’ll need to process them using OCR. This will make their text machine-readable and searchable. You can use PDFpen from Smile Software to do so. Since PDFpen is scriptable, you can automate the task.

Download the “OCR PDF Document” Automator workflow. Automator will offer to install the workflow as a Service. If you use PDFpen Pro instead of PDFpen, you’ll need to edit the script within the workflow to replace “PDFpen” with “PDFpen Pro”.

Select the PDF files you want to process, then select HoudahSpot > Services > OCR PDF Document from the menu. PDFpen will launch in the background, process your files and quit. Once the files have been processed and text content was found, they will disappear from your HoudahSpot search: they no longer match the criteria you have set to find files with no text content.

Note: Turn off the “Prompt for OCR when opening a scanned document” option in PDFpen preferences. Otherwise PDFpen will show the “This document appears to be scanned” prompt while the script sends files to it.

Credit: The “OCR PDF Document” Automator workflow uses a modified version of an AppleScript created by Greg Scown. Feel free to modify and redistribute the Automator workflow or the script included within.

Advanced users can use the included Automator action to create custom OCR workflows or folder actions. It is also possible to open or import existing PDF documents and perform OCR on them via a menu option (the language can be set in the Preferences).
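The “neither a space nor a period” test behind the HoudahSpot search can also be sketched as a script, for example to audit a folder outside HoudahSpot. What follows is a minimal Python sketch under stated assumptions, not how HoudahSpot evaluates its search. The names `lacks_ocr_text` and `find_image_only_pdfs` are hypothetical. Text extraction is left to a caller-supplied function, because the article does not describe one. The commented pypdf extractor at the end assumes that third-party package is installed.

```python
from pathlib import Path
from typing import Callable, List


def lacks_ocr_text(text: str) -> bool:
    """Return True if `text` contains neither a space nor a period.

    Mirrors the article's heuristic: almost any real text contains at least
    one of these characters, so their absence suggests an image-only PDF.
    """
    return " " not in text and "." not in text


def find_image_only_pdfs(folder: Path, extract_text: Callable[[Path], str]) -> List[Path]:
    """List PDF files below `folder` whose extracted text fails the heuristic.

    `extract_text` is a caller-supplied (hypothetical) hook that returns the
    text layer of one PDF file as a string.
    """
    matches: List[Path] = []
    for path in sorted(folder.rglob("*")):
        # Only consider regular files with a .pdf extension (any letter case).
        if path.is_file() and path.suffix.lower() == ".pdf":
            if lacks_ocr_text(extract_text(path)):
                matches.append(path)
    return matches


# Example extractor, assuming the third-party pypdf package is installed:
#
# from pypdf import PdfReader
#
# def pypdf_text(path: Path) -> str:
#     return "".join(page.extract_text() or "" for page in PdfReader(path).pages)
#
# print(find_image_only_pdfs(Path.home() / "Documents", pypdf_text))
```

The heuristic is the same trade-off the article makes. A PDF whose only text is a single word with no punctuation would be flagged as well. That is rare enough to accept for a cleanup pass.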
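The workflow’s script is not reproduced in this article, so the exact PDFpen Pro edit cannot be shown. Assuming the script addresses the application in the usual AppleScript form `tell application "PDFpen"`, the change amounts to swapping the quoted application name. The hypothetical helper below, sketched in Python, does only that. It leaves unquoted mentions of PDFpen alone and is safe to run twice.

```python
def switch_to_pdfpen_pro(script: str) -> str:
    """Retarget quoted "PDFpen" application references to "PDFpen Pro".

    Assumes the script names the app as a quoted string, e.g.
    tell application "PDFpen". The replacement '"PDFpen Pro"' does not
    itself contain '"PDFpen"', so a second run changes nothing.
    """
    return script.replace('"PDFpen"', '"PDFpen Pro"')


if __name__ == "__main__":
    # Illustrative AppleScript fragment, not the workflow's actual script.
    original = 'tell application "PDFpen"\n\tactivate\nend tell'
    print(switch_to_pdfpen_pro(original))
```

Editing the script by hand in Automator achieves the same result. The sketch only makes explicit which text has to change.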