Using OCR to rename files

Moderator: Mr_Noodle

Using OCR to rename files Mon Nov 11, 2024 10:19 am • by Merryperson
I am now using the latest version of Hazel and I have downloaded loads of bank statements which have unhelpful names like:
PDF document-253C24477326-1.pdf
PDF document-AF4B56355D82-1.pdf
PDF document-E558D7164722-1.pdf

Inside each document are details such as the period of the bank statement, for example "Account summary for 6th October to 5th November 2024".
What I would like to do is rename each file with the dates of the summary, possibly using OCR.
Is this possible, and if so, how?
Thanks in advance
Merryperson
 
Posts: 10
Joined: Tue Nov 22, 2016 4:42 pm

Re: Using OCR to rename files Mon Nov 11, 2024 10:41 am • by Mr_Noodle
You can do this by using "Contents contain match". Look up match patterns in the help.

Hazel will run text recognition as necessary.
Mr_Noodle
Site Admin
 
Posts: 11685
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: Using OCR to rename files Wed Nov 13, 2024 12:29 pm • by Merryperson
Many thanks
I have managed to use the OCR to identify the bank statements from the name of the bank and move them to a folder.

Is it possible to rename the file (using the OCR on the contents of the PDF) so that the name uses the issue date actually shown within the PDF file itself?
Perhaps I am asking too much?
Merryperson
 

Re: Using OCR to rename files Wed Nov 13, 2024 10:47 pm • by AgingKeeper
Hi,
As Mr_Noodle said, you can use the "Contents" and "contain match" condition to do what you want. I recommend you look at the examples in the forum as well as the Hazel User Guide for the best instructions. At a higher level, if you can identify text that is used consistently and uniquely in your documents, you should be able to "match" this text in the condition of your rule, creating "variables" that are then passed from the condition to the action. So you can extract a date in the condition and use it in the action to rename the file.
In my personal experience this works well if the text layer in the document matches what we actually see in the file. You can build up a condition that matches multiple items, so you can identify your bank name, your account number, and a start and end date.
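
To make the capture-then-rename idea concrete, here is a minimal sketch in Python. It is not Hazel's match-pattern syntax, only an illustration of the same logic: find the consistent phrase, capture the two dates, and build a new filename from them. The phrase comes from the example in the first post; the function name statement_filename, the "Bank" label, and the output filename format are hypothetical choices, and the text is assumed to be already extracted from the PDF with English month names.

```python
import re
from datetime import datetime

# Matches e.g. "Account summary for 6th October to 5th November 2024".
# The year on the start date is optional, as in the example from the thread.
PERIOD_RE = re.compile(
    r"Account summary for\s+(\d{1,2})(?:st|nd|rd|th)?\s+([A-Za-z]+)(?:\s+(\d{4}))?"
    r"\s+to\s+(\d{1,2})(?:st|nd|rd|th)?\s+([A-Za-z]+)\s+(\d{4})",
    re.IGNORECASE,
)


def statement_filename(text, bank="Bank"):
    """Return a date-based filename, or None if the phrase is not found.

    Hypothetical helper: the name and the filename layout are illustrative.
    """
    m = PERIOD_RE.search(text)
    if m is None:
        return None
    d1, mon1, y1, d2, mon2, y2 = m.groups()
    end = datetime.strptime(f"{d2} {mon2} {y2}", "%d %B %Y")
    # If the start date has no year, assume the end date's year first.
    start_year = int(y1) if y1 else end.year
    start = datetime.strptime(f"{d1} {mon1} {start_year}", "%d %B %Y")
    # A period such as "6th December to 5th January 2025" starts the year before.
    if y1 is None and start > end:
        start = start.replace(year=end.year - 1)
    return f"{bank} statement {start:%Y-%m-%d} to {end:%Y-%m-%d}.pdf"


if __name__ == "__main__":
    sample = "Account summary for 6th October to 5th November 2024"
    print(statement_filename(sample))
```

Putting the dates in year-month-day order is a deliberate choice here so that the renamed files sort chronologically in Finder. If the OCR text layer garbles the phrase, the pattern will simply not match and the function returns None, which mirrors a rule that does not fire.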

When my documents are scanned and OCR'ed, I find that the text layer does not always reflect what I read in the file. This makes the file renaming frustratingly finicky for me. I have not yet had the chance to see if the OCR in Mr_Noodle's new version of Hazel is better than the OCR that comes from my ScanSnap scanner (which I believe uses a version of the ABBYY OCR engine).

Hope this helps!
AgingKeeper
 
Posts: 9
Joined: Fri Mar 28, 2014 9:59 am

