Conversion of images with text to word files
Many documents from old Hong Kong, e.g. government reports, have been digitized and are available online. Unfortunately, most of them are only photos of the old books and pages. Although they are available as PDF files, these contain just images, so the text would otherwise have to be re-typed.
The only way to convert these images to text (e.g. a Word file) is to use Optical Character Recognition (OCR) software. On the internet, I found one that can be used free of charge (of course, it's full of adverts). The website is FREE ONLINE OCR SERVICE. I tested it and it works quite satisfactorily. Just upload a file (PDF or JPG), select the language and the desired output format (Microsoft Word, Excel, or plain text), start the conversion and wait about ten seconds for the result. The converted files can be downloaded and processed further. The website allows 15 conversions per hour without registration. Here are two examples.
Straight text is quite easy. This is a JPG file from the 1891 Public Works Report.
And this is the converted Word file:
- Re-construction of Praya Bridge over Bowrington Canal.—The necessary wrought-iron girders for the reconstruction of this bridge having been obtained from England, a contract was entered into with Messrs. CIIAN A TONG & CO. for their erection and for the. masonry work required ; and with Messrs. FENWICK & CO. for the construction of cast and wrought-iron railings in November last.
- This bridge is 19 feet wide and has three spans of 29 feet 6 inches.
- On examination it was found that the foundations of the piers and abutments of the old bridge had been considerably undermined. These have now been protected with sheet piling and a concrete apron laid.
- The tops of the piers have been levelled ready to receive the girders.
- Satisfactory progress has been made by Messrs. FENWICK & CO. with the iron railings.
Almost perfect: only one correction is necessary (CIIAN should be CHAN).
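If the same misreading turns up again and again in a long report, the correction can be automated instead of being fixed by hand each time. The following is only a minimal sketch in Python, not something the OCR website offers. The function name fix_ocr_text and the CORRECTIONS table are made up for illustration, and the only entry that comes from the example above is CIIAN for CHAN.

```python
import re

# Hypothetical correction table: misread word -> intended word.
# Only the CIIAN/CHAN pair comes from the example above; extend it
# with the misreadings you actually find in your own documents.
CORRECTIONS = {
    "CIIAN": "CHAN",
}


def fix_ocr_text(text, corrections=CORRECTIONS):
    """Replace known misread words in OCR output, whole words only."""
    for wrong, right in corrections.items():
        # \b restricts the match to whole words, so a longer word that
        # merely contains the misread letters is left untouched.
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return text


sample = "a contract was entered into with Messrs. CIIAN A TONG & CO."
print(fix_ocr_text(sample))
```

A fixed table like this only catches errors that have already been spotted once, so proofreading against the original image is still needed.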
More complicated texts are not that easy, but still faster than re-typing. See this one:
After conversion, the resulting Word file looks like this (for this presentation it is only a screenshot of the Word file, because the file contains many layout and format markers that couldn't be uploaded to Gwulo without creating too much confusion):
Quite OK, but the program has some problems processing the two columns of text and reading some characters. It is interesting to note that even the stains on the paper are converted (into stains again, which can easily be deleted). Some rearrangement was necessary: I made a table for the two columns of text. And here's the result:
982 THE HONGKONG GOVERNMENT GAZETTE, 21ST DECEMBER., 1889.
GOVERNMENT NOTIFICATION.—No. 522.
The following Regulations under The Tramways Ordinance, 1883, are published for general information.
By Command, A. LISTER,
Acting Colonial Secretary.
Colonial Secretary's Office, Hongkong, 21st December, 1889.
REGULATIONS
,Made the 16th day of December, 1889, by the Governor in
Council., under The Tramways Ordinance, 1883,
Section 42.
Application. |
|
Number of passengers to be carried on tram car |
|
Luggage, only to be carried on passenger car, under certain conditions |
|
No stoppage for Passengers except at authorised Stations |
|
Watchmen to be employed to prevent obstructions. |
|
Time of Inspection and Testing of carriages &c., of Machinery &c. |
|
Notice of alterations or changes in Machinery &c. |
|
Velocity of carriages |
|
Penalty for Breach of regulations |
|
COUNCIL CHAMBER, HONGKONG. ' |
ARATHOON SETH, Clerk of Councils |
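The rearrangement into two columns was done by hand here. Where an OCR tool also reports the position of each recognised word on the page, the same separation could be sketched in code. This is an assumption: the website used above returns a finished Word file, not word positions. The Word class, the split_columns function and the gutter_x value are hypothetical names, and the right-hand words in the sample are placeholders, not text from the Gazette.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word on the page
    y: float  # top edge of the word's line on the page


def split_columns(words, gutter_x):
    """Split words into a left and a right column at gutter_x.

    Each column is read top to bottom, and left to right within a line.
    Words on the same line are assumed to share the same y value.
    """
    def read(column):
        ordered = sorted(column, key=lambda w: (w.y, w.x))
        return " ".join(w.text for w in ordered)

    left = [w for w in words if w.x < gutter_x]
    right = [w for w in words if w.x >= gutter_x]
    return read(left), read(right)


# Marginal notes on the left, placeholder body text on the right.
words = [
    Word("Velocity", 10, 50), Word("of", 60, 50), Word("carriages", 80, 50),
    Word("Application.", 10, 20),
    Word("text-A", 200, 20), Word("text-B", 200, 50),
]
left, right = split_columns(words, gutter_x=150)
print(left)
print(right)
```

A single fixed gutter position only works for pages that were scanned straight; a skewed or curved page would need a less simple rule.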
This could be a helpful tool. Certainly, more programs of this kind are available on the internet. I would be glad to hear about the experiences others have had with this or a similar tool.
PDF to Word
Very impressive - especially when the OCR app is free!
Thank you.