SWAC Metatags

Recent technological developments have made the systematic recording of words and expressions and the creation of language audio collections possible. With specific tools, it is now possible to record 1000 words in less than an hour.

These audio collections can be used for:

Linguistic research (record and compare the pronunciation in different regions)
Didactics (e.g. «English Irregular Verbs»)
Illustration (for electronic dictionaries)

The Internet has made the exchange of audio files much easier: files can be copied and downloaded easily. However, the recordings must be associated with other data (what is the word or expression? what language is it in?) in order to be indexed or used properly, so it is useful to have a standard recording format that contains the associated data. In this way, audio collections can easily be produced by different software, on different platforms and by different people.

We suggest a simple and practical way to associate data with the audio recording. The aim is to show how the data can be defined rather than to define the data itself.

The Vorbis Comment metadata system allows you to store additional data in Ogg Vorbis, FLAC and Ogg Speex files. This solution is well suited to building audio word collections: it is an existing, free and widely supported technology. Audio file transfer is easy and, because the associated data is already in the audio file, in the form of metadata in a Vorbis Comment tag, there is no need for an additional description.

Below is a proposed list of standard field names with a description of their intended use. We recommend that a community producing and using audio word collections adopt the same standard field names, on the same principle as the Vorbis Comment field recommendations: for example, in a music collection, you do not have to complete a field that gives the name of the artist but, if you do, you must call the field “ARTIST” and not “BAND” or anything else.

None of these fields are intended to be mandatory, although we believe that no real automated processing can be done without the SWAC_TEXT and SWAC_LANG fields.
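As an illustration, a set of SWAC fields can be written as Vorbis Comment style `NAME=value` lines and checked for the two fields that automated processing relies on. This is a minimal sketch: the function names `to_comment_lines` and `missing_fields` are hypothetical and not part of the proposal, and writing the lines into an actual audio file is left to a tagging tool.

```python
# Hypothetical helpers; the names are illustrative and not part of the SWAC proposal.
REQUIRED_FOR_PROCESSING = ("SWAC_TEXT", "SWAC_LANG")

def to_comment_lines(tags):
    """Render a dict of tags as Vorbis Comment style NAME=value lines."""
    return [f"{name.upper()}={value}" for name, value in tags.items()]

def missing_fields(tags):
    """Return the fields needed for automated processing that are absent or empty."""
    return [name for name in REQUIRED_FOR_PROCESSING if not tags.get(name)]

tags = {"SWAC_TEXT": "house", "SWAC_LANG": "eng", "SWAC_SPEAK_GENDER": "F"}
print("\n".join(to_comment_lines(tags)))
print(missing_fields({"SWAC_TEXT": "house"}))
```

No field is rejected here: the check only reports what is missing, in keeping with the fact that no field is mandatory.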

FIELDS

1. Data for the pronounced text

SWAC_TEXT

Text pronounced by the speaker

« house »
« it's raining cats and dogs ! »

SWAC_LANG

The language of the word pronounced (ISO 639-3)

record	value
« rendezvous »	eng
« rendez-vous »	fra
« crocodile »	eng
« crocodile »	fra

SWAC_ALPHAIDX

Keys that allow programs to generate the alphabetical index of the audio collection automatically. The separator is «|» (U+007C).

record	value
« house » (eng)	house
« It's raining cats and dogs! » (eng)	rain|cat|dog
« I am » (eng)	be
« 啊 » (chi)	ā
« se laver » (fra)	laver (se)
« j'ai faim » (fra)	avoir|faim
« ett fönster » (swe)	fönster
« telefonul » (ron)	telefon
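The index generation described above can be sketched as follows. This is only an illustration under stated assumptions: `build_index` is a hypothetical name, records are plain dictionaries, a record without SWAC_ALPHAIDX is filed under its own text, and keys are sorted by code point rather than by any language-specific collation.

```python
from collections import defaultdict

def build_index(records):
    """Map each index key to the texts filed under it, with keys in sorted order."""
    index = defaultdict(list)
    for rec in records:
        # Assumption: fall back to the text itself when SWAC_ALPHAIDX is absent.
        raw = rec.get("SWAC_ALPHAIDX") or rec["SWAC_TEXT"]
        for key in raw.split("|"):  # U+007C separates multiple index keys
            key = key.strip()
            if key:
                index[key].append(rec["SWAC_TEXT"])
    return dict(sorted(index.items()))

records = [
    {"SWAC_TEXT": "house", "SWAC_ALPHAIDX": "house"},
    {"SWAC_TEXT": "It's raining cats and dogs!", "SWAC_ALPHAIDX": "rain|cat|dog"},
    {"SWAC_TEXT": "I am", "SWAC_ALPHAIDX": "be"},
]
for key, texts in build_index(records).items():
    print(key, "->", texts)
```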

SWAC_BASEFORM

When the record is a derivative form of a word, this field indicates the base word

record	value
« I was » (eng)	to be
« je vais » (fra)	aller
« друзей » (rus)	друг

SWAC_FORM_NAME

When the SWAC_BASEFORM is defined, this field indicates the name of the form

record	value
« je vais » (fra)	Present. 1p.S.
« друзей » (rus)	Gen. Pl.

SWAC_FORM_REF

Name of the reference scheme used by the SWAC_FORM_NAME field (such as the LMF codification)

SWAC_HOMOGRAPHIDX

Index which can help the user to differentiate homographs in the audio collection. The SWAC_HOMOGRAPHIDX is based on the grammatical difference between homographs.

record	value
« пропа́сть » (rus)	verb
« про́пасть » (rus)	noun
« os » (fra) /os/	sing
« os » (fra) /o/	plur

If the difference is not grammatical, the value can instead be a translation into another language (typically English) or a short explanation.

record	value
« мука́ » (rus)	flour
« му́ка » (rus)	pain
« bass » (eng)	fish
« bass » (eng)	music
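One way a program might use this field is as the last part of a lookup key, so that homographs in the same language remain distinguishable. The sketch below is an assumption about usage, not part of the proposal: the `file` entries and the function names are invented for illustration.

```python
# Hypothetical records: the file names are invented for illustration.
records = [
    {"SWAC_TEXT": "bass", "SWAC_LANG": "eng", "SWAC_HOMOGRAPHIDX": "fish", "file": "bass_1.ogg"},
    {"SWAC_TEXT": "bass", "SWAC_LANG": "eng", "SWAC_HOMOGRAPHIDX": "music", "file": "bass_2.ogg"},
    {"SWAC_TEXT": "house", "SWAC_LANG": "eng", "file": "house.ogg"},
]

def record_key(rec):
    """Build a lookup key; the homograph index is empty when the field is absent."""
    return (rec["SWAC_LANG"], rec["SWAC_TEXT"], rec.get("SWAC_HOMOGRAPHIDX", ""))

def find_homographs(records, lang, text):
    """Return {homograph index: record} for every record sharing lang and text."""
    return {
        rec.get("SWAC_HOMOGRAPHIDX", ""): rec
        for rec in records
        if rec["SWAC_LANG"] == lang and rec["SWAC_TEXT"] == text
    }

for idx, rec in find_homographs(records, "eng", "bass").items():
    print(idx, "->", rec["file"])
```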

SWAC_HOMOGRAPHIDX_REF

Name of the reference scheme used by the SWAC_HOMOGRAPHIDX field.

2. Speaker data

SWAC_SPEAK_NAME

Speaker's name

« Jacques Durand »
« Иван Иванович Иванов »

SWAC_SPEAK_GENDER

Speaker's gender [M/F]

M: male
F: female

SWAC_SPEAK_BIRTH_YEAR

Speaker's year of birth

(Format: YYYY)

SWAC_SPEAK_LANG

Speaker's native language

(ISO 639-3)

SWAC_SPEAK_LANG_COUNTRY

Country where the speaker acquired the SWAC_SPEAK_LANG

(ISO-3166-1)

SWAC_SPEAK_LANG_REGION

Region where the speaker acquired the SWAC_SPEAK_LANG

« Pays basque »

SWAC_SPEAK_LANG_LOC

Location of the SWAC_SPEAK_LANG_REGION (format: WGS 84 DM, degrees and decimal minutes)

N 48°52.233 E 2°24.232
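Programs that plot or compare locations usually need decimal degrees. The following sketch converts a value written like the example above; it assumes exactly that layout (hemisphere letter, degrees, `°`, decimal minutes, latitude first), and `parse_dm` is a hypothetical name.

```python
import re

# Assumed layout: "N 48°52.233 E 2°24.232" (latitude first, then longitude).
_DM = re.compile(r"([NS])\s*(\d+)°\s*(\d+(?:\.\d+)?)\s+([EW])\s*(\d+)°\s*(\d+(?:\.\d+)?)")

def parse_dm(value):
    """Convert a degrees and decimal minutes string to (latitude, longitude) in decimal degrees."""
    match = _DM.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a recognised WGS 84 DM value: {value!r}")
    ns, lat_deg, lat_min, ew, lon_deg, lon_min = match.groups()
    lat = int(lat_deg) + float(lat_min) / 60
    lon = int(lon_deg) + float(lon_min) / 60
    # Southern and western hemispheres are negative in decimal degrees.
    return (-lat if ns == "S" else lat, -lon if ew == "W" else lon)

lat, lon = parse_dm("N 48°52.233 E 2°24.232")
print(round(lat, 5), round(lon, 5))
```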

SWAC_SPEAK_PRON

General note about the pronunciation of the speaker (for example, a speech defect)

SWAC_SPEAK_LIV_COUNTRY

Code of the country where the speaker lives

(ISO-3166-1)

SWAC_SPEAK_LIV_TOWN

Town where the speaker lives

« Saint-Jean-Pied-de-Port »

SWAC_SPEAK_CONTACT

Contact data for the speaker

« jacques-durand@shtooka.net »

SWAC_SPEAK_DESC

Free note about the speaker

3. Word pronunciation data

SWAC_PRON_INTONATION

Note about the intonation

record	value
« oh »	Surprise
« oh »	Realization

SWAC_PRON_SPEED

Speed of the pronunciation [1/2/3]

1: slow pronunciation for pedagogical use
2: normal pronunciation
3: fast

SWAC_PRON_COMMENT

Comments on the pronunciation of the word by the speaker

record	value
« abasourdir » (fra) /a.ba.zuʁ.diʁ/	Academic pronunciation
« abasourdir » (fra) /a.ba.suʁ.diʁ/	Popular pronunciation
« догово́р » (rus)	Standard pronunciation
« до́говор » (rus)	Popular pronunciation in the south of Russia

SWAC_PRON_API

Phonetic transcription (using the International Phonetic Alphabet, IPA)

SWAC_PRON_PHON

Specific phonetic transcription in the language system concerned

record	value
« мука » (rus)	мука́ (with the diacritic symbol)
« 啊 » (chi)	ā (the pinyin transcription)

4. Audio collection data

SWAC_COLL_NAME

Name of the audio collection

« Base Audio Libre De Mots Français »

SWAC_COLL_SECTION

Section in the audio collection

SWAC_COLL_DESC

Description of the collection

SWAC_COLL_ORG

Organization producing the audio collection

SWAC_COLL_ORG_URL

URL for data on the organization producing the audio collection

SWAC_COLL_LICENSE

License which applies to the collection

SWAC_COLL_COPYRIGHT

Audio collection copyrights

SWAC_COLL_AUTHORS

Audio collection authors

SWAC_COLL_URL

URL for general information on the collection

5. Technical data

SWAC_TECH_QLT

Audio Quality [1/2/3/4/5]

1: very poor
2: poor
3: normal
4: good
5: very good

SWAC_TECH_DATE

Date of recording

(Format: YYYY-MM-DD)

SWAC_TECH_SOFT

The program used to record the sound
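Several of the fields listed above take their value from a fixed set or follow a fixed format. A tool that produces or imports collections could check them roughly as follows. This is a sketch, assuming tag values are handled as strings; `check_tags` is a hypothetical name, and since no field is mandatory, absent fields are not reported.

```python
import re
from datetime import date

# Allowed values taken from the field descriptions; values are compared as strings.
ALLOWED = {
    "SWAC_SPEAK_GENDER": {"M", "F"},
    "SWAC_PRON_SPEED": {"1", "2", "3"},
    "SWAC_TECH_QLT": {"1", "2", "3", "4", "5"},
}

def check_tags(tags):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    for name, allowed in ALLOWED.items():
        if name in tags and tags[name] not in allowed:
            problems.append(f"{name}: unexpected value {tags[name]!r}")
    year = tags.get("SWAC_SPEAK_BIRTH_YEAR")
    if year is not None and not re.fullmatch(r"\d{4}", year):
        problems.append("SWAC_SPEAK_BIRTH_YEAR: expected YYYY")
    day = tags.get("SWAC_TECH_DATE")
    if day is not None:
        try:
            if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", day):
                raise ValueError(day)
            date.fromisoformat(day)  # rejects impossible dates such as 2009-02-30
        except ValueError:
            problems.append("SWAC_TECH_DATE: expected a valid YYYY-MM-DD date")
    return problems

print(check_tags({"SWAC_PRON_SPEED": "4", "SWAC_TECH_DATE": "2009-02-30"}))
```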

Note about the Vorbis Comment specifications:

Please consult the Vorbis Comment home page at: http://xiph.org/vorbis/doc/v-comment.html for more information on general comment tag specifications.

The content of tags such as TITLE, DESCRIPTION, LICENSE and COPYRIGHT can be set to any value. These fields can be completed automatically using data provided by SWAC fields. It is recommended, however, that the GENRE field be set to « Speech ».

GENRE: « Speech »
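Completing the general tags from SWAC fields could look like the sketch below. The pairing of each general tag with a SWAC field is an assumption made for illustration (the text above does not prescribe one), as is the function name `fill_standard_tags`. Values already present are left untouched.

```python
# Assumed mapping from general Vorbis Comment tags to SWAC fields (illustrative only).
DERIVED = {
    "TITLE": "SWAC_TEXT",
    "DESCRIPTION": "SWAC_COLL_DESC",
    "LICENSE": "SWAC_COLL_LICENSE",
    "COPYRIGHT": "SWAC_COLL_COPYRIGHT",
}

def fill_standard_tags(tags):
    """Return a copy of tags with GENRE and any missing general tags filled in."""
    out = dict(tags)
    out.setdefault("GENRE", "Speech")  # recommended value for word recordings
    for standard, swac in DERIVED.items():
        if standard not in out and swac in tags:
            out[standard] = tags[swac]
    return out

print(fill_standard_tags({"SWAC_TEXT": "house", "SWAC_LANG": "eng"}))
```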

According to the general Vorbis Comment specifications, the use of additional fields is allowed. This enables SWAC fields to coexist with other specific data. For example, electronic dictionaries can use specific fields such as « OMEGAWIKI_ARTICLEIDX » to link audio items to their articles.

Note about the ID3v2 Tagging Format:

Since version 2.4 of the ID3 Tagging Format, it has been possible to store Unicode character strings in MP3 audio files. We do not recommend this tagging format, but SWAC fields can be stored as « TXXX » frames.

Please consult the ID3 Tagging Format home page at http://www.id3.org/ for more information.

Note about this document:

This document is distributed and licensed by the Shtooka Project under the Creative Commons BY-SA License. More information about the license at: http://creativecommons.org/licenses/by/2.0/fr/deed.en_GB