Member Contributions | |
Author | Message |
mandel
Newbie | Joined: 22 December 2007 | Online Status: Offline | Posts: 3
Posted: 22 December 2007 at 2:19am
Hi,
I'm a new forum member thinking of making a new, word-processed PDF version of the FSI Cantonese course materials. The FSI courses are helpful and deserve better treatment. However, the whole process is very labor-intensive and requires converting the present (typewritten) PDF textbook into image files (preferably PNG). After that I will OCR the images and proofread the book to make sure its contents are correct, before releasing a word-processed document for everyone to proofread. Barring any mistakes, the end product will be released as a PDF file on this site. The project will use free fonts and will be non-commercial, so I don't think there will be any problem. Possibly we could get someone to clean up the audio digitally too.
Obviously this will take a lot of time and effort, and I need help. Anyone interested in this project can email me at mandel1luke@yahoo.com. Perhaps three or four of us can come up with an uncluttered, word-processed PDF textbook that will make learning Cantonese easier.
Thanks.
DemiPuppet
Administrator | Joined: 27 May 2006 | Location: United States | Online Status: Offline | Posts: 163
PDF files are already available on this site. All modern OCR programs take PDF files as well as image files as input, so there is no need to convert to PNG. I just verified this using ABBYY Fine Reader 7.0. It did a fairly decent job of converting the images into text (except for the marks over the characters).
mandel
Newbie | Joined: 22 December 2007 | Online Status: Offline | Posts: 3
It depends on which OCR program you use. ABBYY Fine Reader is paid software that I cannot afford, whereas the OCR program I used is freeware (FreeOCR). It does an excellent job of converting image files to text, but it cannot convert PDF to text. If you could join this project and help do the OCR (hopefully not too hectic a job), then I will do the diacritics and the proofreading myself, and we can give learners a more reader-friendly, uncluttered PDF booklet. If this project is successful, we can extend it to other languages. All non-commercial and free, of course. Your input is very much appreciated.
P.S. I've checked out ABBYY Fine Reader. It is indeed an excellent piece of software, which does exactly what DemiPuppet says it does. However, as I was testing a trial version, it did not allow me to save the OCR'd document as a file.
I humbly urge anyone with ABBYY Fine Reader or similar OCR software that can read PDF files to email me the OCR'd doc file. Never mind the mistakes; I will proofread it and turn it into a word-processed PDF textbook, to be hosted here.
Thanks to all.
Edited by mandel - 22 December 2007 at 9:16pm
DemiPuppet
Administrator | Joined: 27 May 2006 | Location: United States | Online Status: Offline | Posts: 163
Check your Yahoo account for email.
OCR is the easy part. I know from experience that there is still a tremendous amount of work left after that. I'm still editing/proofreading the FSI Hindi Basic course I OCR'd a couple of months ago.
BTW, the free graphics editing program GIMP can convert PDF document pages into PNG (or any other graphic format). You also need to have Ghostscript installed.
Edited by DemiPuppet - 24 December 2007 at 9:15am
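For anyone who would rather script the conversion than open each page in GIMP, here is a minimal sketch that calls Ghostscript directly. Note that this is a different route from the GIMP procedure mentioned above, not a description of it. It assumes the `gs` executable is on your PATH (on Windows the console binary is usually named `gswin64c`); the file name, output pattern, and 300 dpi setting are placeholders.

```python
import shutil
import subprocess


def ghostscript_png_command(pdf_path, out_pattern="page-%03d.png", dpi=300, gs="gs"):
    """Build a Ghostscript command that renders each PDF page to a PNG file."""
    return [
        gs,
        "-dNOPAUSE",                    # do not pause between pages
        "-dBATCH",                      # exit when the last page is done
        "-sDEVICE=png16m",              # 24-bit colour PNG output device
        f"-r{dpi}",                     # rendering resolution in dpi
        f"-sOutputFile={out_pattern}",  # %03d is replaced by the page number
        pdf_path,
    ]


def convert_pdf_to_png(pdf_path, **kwargs):
    """Run Ghostscript; raises if gs is missing or the conversion fails."""
    cmd = ghostscript_png_command(pdf_path, **kwargs)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file name; this only prints the command, it does not run it.
    print(" ".join(ghostscript_png_command("fsi-cantonese-vol1.pdf")))
```

For scanned typewritten pages, swapping `png16m` for Ghostscript's `pnggray` device should give smaller greyscale files, which may suit OCR just as well.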
mandel
Newbie | Joined: 22 December 2007 | Online Status: Offline | Posts: 3
Thanks very much for your OCR, DemiPuppet.
You're right, there's still lots of work to be done. Proofreading is difficult on my side, especially since this course uses the Yale system, with lots of diacritic marks that must be edited manually.
BTW, does anyone know how to input the letter M with a grave accent (`) above it in Unicode? It's required in the Yale transcriptions, but somehow I can't find it. Many thanks in advance.
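As far as I know, Unicode has no single precomposed code point for m with a grave accent, so the usual approach is the plain letter followed by U+0300 COMBINING GRAVE ACCENT. A small sketch using only Python's standard library; the helper name `with_grave` is made up for illustration.

```python
import unicodedata

COMBINING_GRAVE = "\u0300"  # U+0300 COMBINING GRAVE ACCENT


def with_grave(letter):
    """Return the letter with a grave accent, precomposed when Unicode has one."""
    return unicodedata.normalize("NFC", letter + COMBINING_GRAVE)


m_grave = with_grave("m")
# Stays as two code points because no precomposed form exists for m.
print([f"U+{ord(ch):04X}" for ch in m_grave])

a_grave = with_grave("a")
# Collapses to the single precomposed code point for a with grave.
print([f"U+{ord(ch):04X}" for ch in a_grave])
```

The same sequence works for capital M. Whether the accent sits neatly over the letter depends on the font, so it is worth checking the free fonts chosen for the project.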
unzum
Newbie | Joined: 25 April 2007 | Location: United Kingdom | Online Status: Offline | Posts: 10
I know what you mean, mandel. I wanted to write some flashcards for Cantonese and must have spent about an hour looking for a way to write in Yale.
The closest I could find was http://toshuo.com/cantonese-tone-tool/
Hope that helps.
sceva
Newbie | Joined: 29 February 2008 | Online Status: Offline | Posts: 1
If you want pictures to use with your flashcards, check out http://www.foreignlanguageflashcards.com. They have some blank files that let you type in the language you are learning. It is really easy, and the pictures make studying more fun.
pudding
Newbie | Joined: 15 June 2008 | Location: New Zealand | Online Status: Offline | Posts: 5
(My first post)
I am currently transcribing the Volume I text into annotated HTML (HTML with comment tags noting where I have changed the text and where the page breaks were in the original, for proofing). I've nearly finished typing out the coliform(sic?) and am considering giving OCR a go. However, I have reservations about transcribing it verbatim, and am considering rewriting it in LSHK Jyutping, which seems to be the most widely used romanization on the internet, or at least in the resources available to me. I find PDFs difficult to work with, as they are hard to annotate without Acrobat, and I like the flexibility of HTML.
I will post a link to what I've done when I finish the introduction, though I'm still a little hazy about what to do with regard to copyright. I don't really care if anyone copies my work or commercially reproduces what I've done, but I would appreciate credit for the transcription. I'm also thinking of other ways to enhance the text while keeping it compatible with the existing audio recordings; hyperlinking and perhaps Chinese characters immediately spring to mind.
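One way to picture the annotated HTML described above: ordinary HTML comments carry the page-break and change notes, so they stay invisible in a browser but can be pulled out again for proofing. This is only a sketch. The comment formats (`page-break: N`, `changed: was '...'`), the helper names, and the sample text are assumptions for illustration, not the markup actually used in the transcription.

```python
import re
from html import escape


def page_break(page_number):
    """HTML comment marking where a page of the original text began."""
    return f"<!-- page-break: {page_number} -->"


def changed(original, replacement):
    """Replacement text followed by a comment recording the original wording."""
    # "--" is not allowed inside an HTML comment, so break it up.
    safe = original.replace("--", "- -")
    return f"{escape(replacement)}<!-- changed: was '{safe}' -->"


def list_annotations(html_text):
    """Return the text of every comment, in document order."""
    return [m.strip() for m in re.findall(r"<!--(.*?)-->", html_text, flags=re.S)]


# Made-up sample content: an OCR-style slip ("l" for "1") corrected and recorded.
doc = "\n".join([
    page_break(1),
    "<p>" + changed("Lesson l", "Lesson 1") + "</p>",
    page_break(2),
])
print(list_annotations(doc))
```

Keeping the notes in comments means a proofreader can list every change against the original PDF page it came from without the notes cluttering the rendered text.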
pudding
Newbie | Joined: 15 June 2008 | Location: New Zealand | Online Status: Offline | Posts: 5
Just another note: if anyone else has a portion or the whole of the text OCR'd or retyped, I would really appreciate it if you could let me know or share it. I'd hate to think I'm going to all this effort to retype something someone else already has.