Transcribing the Rate Books - Kwun Tong retirees
Submitted by annelisec on Sun, 2016-09-18 05:12
I'll be in Hong Kong for 3 weeks in November and had the idea of contacting retirees who live in Kwun Tong, near the Public Records Office, to ask them to join in transcribing a page or two each of the Rate Books. There is a large public housing estate there, and I imagine there may be one or two English-reading folks who would enjoy the task.
Maybe digitise first?
The results of transcription are very valuable, but it is time-consuming work. Transcription of the jurors' lists has almost ground to a halt at the moment. So finding the best answer to "what do I get out of it" is worth spending time on.
One idea - would it be easier to start by asking volunteers with cameras to help digitise the books and make the images publicly available, then look at transcription as a follow-up project?
Regards, David
The genesis of this idea was that the people who live in Kwun Tong have no idea the PRO is there. My idea was actually community outreach first, and transcription second. Let's see where it takes me.
By the way, I've stopped the Jury lists because I can't type that much in one day, and it is hard on my eyes. And contrary to what's claimed, it takes me 1 hour to do one page accurately. (I am not a detail-oriented person.)
It's usually very quiet at the PRO, so it'll be good to get more people using it. Fingers crossed your project gets a good response.
Is there some reason we're
OCR
I used OCR to process the 1941 list. It was faster than the approach we currently use: 10-20 minutes a page instead of 20-30 minutes. But then I had to do all 82 pages myself, which took a lot of time. So I'd rather use the slightly slower non-OCR approach, as it lets us share the work across multiple people.
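To put rough numbers on that trade-off, here is a small Python sketch. The page count and per-page times are the ones quoted above; the number of volunteers is purely a hypothetical figure for illustration.

```python
PAGES = 82  # pages in the 1941 list, as quoted above


def total_hours(pages, minutes_per_page):
    """Total time in hours for a given per-page time in minutes."""
    return pages * minutes_per_page / 60


def per_person(hours, volunteers):
    """Hours each volunteer spends if the work is split evenly."""
    return hours / volunteers


# OCR approach: 10-20 minutes a page, all done by one person.
ocr_low = total_hours(PAGES, 10)
ocr_high = total_hours(PAGES, 20)

# Current non-OCR approach: 20-30 minutes a page, but shareable.
manual_low = total_hours(PAGES, 20)
manual_high = total_hours(PAGES, 30)

VOLUNTEERS = 10  # hypothetical number, not from the thread

print(f"OCR, one person: {ocr_low:.1f} to {ocr_high:.1f} hours")
print(f"Non-OCR, total: {manual_low:.1f} to {manual_high:.1f} hours")
print(
    f"Non-OCR, each of {VOLUNTEERS} volunteers: "
    f"{per_person(manual_low, VOLUNTEERS):.1f} to "
    f"{per_person(manual_high, VOLUNTEERS):.1f} hours"
)
```

The total is higher without OCR, but the load on any one person drops quickly once the pages are shared out.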
If we just load the PDF and set the OCR going, it doesn't take much time, but I found the structure of the tables wasn't preserved. Once the columns are out of sync, the Jurors List doesn't make much sense. So the 10-20 minutes per page were spent defining the table layout before the OCR, then correcting the text afterwards.
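As a rough illustration of how out-of-sync columns might be spotted after OCR, the Python sketch below flags rows with the wrong number of cells. It assumes the OCR output has been saved as tab-separated text with three columns per row; both the format and the column count are assumptions, and the sample rows are invented, not taken from any real list.

```python
EXPECTED_COLUMNS = 3  # assumed layout, e.g. name, occupation, address


def find_misaligned_rows(text, expected=EXPECTED_COLUMNS, delimiter="\t"):
    """Return (line_number, cell_count) for each row with the wrong width."""
    bad_rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        cells = line.split(delimiter)
        if len(cells) != expected:
            bad_rows.append((line_no, len(cells)))
    return bad_rows


# Invented sample: row 2 has lost a column, row 3 has gained an empty one.
sample = (
    "Smith, John\tClerk\t1 Example Road\n"
    "Jones, Mary\tTeacher\n"
    "Brown, Alan\tEngineer\t2 Sample Street\t\n"
)

for line_no, count in find_misaligned_rows(sample):
    print(f"Line {line_no}: found {count} cells, expected {EXPECTED_COLUMNS}")
```

A check like this only finds rows whose cell count is wrong. It would not catch text that landed in the wrong column of a row with the right width, so manual proofreading would still be needed.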
If you're getting better results without taking too much of your time, please post up a sample post-OCR Jurors List and let's take a look.