FSI Language Courses Forum : Language Courses : Member Contributions
Topic: OCR of FSI Texts

DemiPuppet
Administrator
Joined: 27 May 2006
Location: United States
Posts: 163
Posted: 31 December 2008 at 10:01pm
For what it's worth, I've run the Adobe Acrobat OCR tool against some of the PDF files on the site (I'm still working on others).  It's far from perfect, but the results might be useful for someone who wants to cut and paste text from the PDF files and is willing to make some hand edits. It might also be useful for anyone who wants to create HTML versions. Thanks to all who submitted the original PDF files.

Note that the files are larger than the originals, since I've also added the book covers in most cases. I also made sure that Acrobat was set up to leave the original file quality unchanged, which may have added to the size increase.

The Finnish and French covers are quite nice looking.

Finnish (the workbook was already OCR'd)
http://www.sendspace.com/file/075xlv

French
http://www.sendspace.com/file/7713sb

Greek
http://www.sendspace.com/file/e7i30i

Hungarian
http://www.sendspace.com/file/xazv36

VagabondPilgrim
Newbie
Joined: 08 December 2008
Location: United States
Posts: 5
Posted: 31 December 2008 at 11:44pm
I've gone ahead and posted these to the alternate site.

flutable
Newbie
Joined: 13 March 2007
Posts: 30
Posted: 02 January 2009 at 4:58am
@DemiPuppet, thanks for this. I thought I'd OCR'd the text already. It's nice how the OCR process also straightens the pages!

onebir
Ambassador
Joined: 16 October 2006
Posts: 116
Posted: 02 January 2009 at 6:41am
Great work on the alternate site!

Perhaps it could also link to the material that's been uploaded to ERIC recently? E.g., the FSI readers (FSI Finnish, Hungarian, Turkish & Indonesian) and the large amounts of DLI/Spoken Language material that others have noticed being uploaded.

Copyright might be an issue for some of these texts, but linking to a govt-affiliated site that's hosting them can hardly be a copyright violation...

(Note that the links from the ERIC search results don't point directly to the PDFs. But if you download these with FlashGet, the servlet that gets downloaded does provide a direct link.)

DemiPuppet
Administrator
Joined: 27 May 2006
Location: United States
Posts: 163
Posted: 02 January 2009 at 9:32pm
Here are a few more OCR texts:

Le Monde Francophone
http://www.sendspace.com/file/d59jlk

German Basic
http://www.sendspace.com/file/15wf5a

Spanish Basic vols 1 and 2; several Spanish Programmatic files
http://www.sendspace.com/file/aza47g

In Spanish Basic Vol 2, the missing page 28.36 has been corrected.

VagabondPilgrim
Newbie
Joined: 08 December 2008
Location: United States
Posts: 5
Posted: 14 January 2009 at 8:20pm
Just a note to say that these are now on the alternate site as well. Actually, they've been there since shortly after the last post; I just haven't gotten around to mentioning it.