
Add Audio Pronunciation from Lingua Libre (Simple Way)

A Simple Way to Add Audio Pronunciations on Wiktionaries Without the Lingua Libre Bot

Step 1

How to Get the List of Words Already Recorded on Lingua Libre

There are two methods for this Step.

  • Go to Lingua Libre's Sound library, type or select the language you need, and search. Then click "Export this query in csv". (Note: the exported CSV file contains only 102 recorded words.) I (User:Sriveenkat) do not recommend this method.

  • This method is much better than the previous one. Go to Lingua Libre's Datasets page, find the language you need, and download its .zip dataset. (Note: you need a good internet connection to download large datasets, and enough free disk space on your computer to store them.)

Download Page of the Datasets

Step 2

As an example, I downloaded the Malayalam dataset (Q437-mal-Malayalam.zip). After downloading the dataset file, extract it with any zip extractor. After extracting it:

  • We don't need the mnt folder, so delete it.

  • We only need the Q437-mal-Malayalam folder. Inside it there is one folder per user, and each user folder contains that user's audio files. (If this is hard to understand, please watch the video below.)
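If you want to check the extracted folder from a script instead of a file manager, here is a minimal Python sketch. It assumes the layout described above (one sub-folder per user, each holding .ogg files); the folder location and the summarize_dataset helper are illustrative only, not part of Lingua Libre.

```python
from pathlib import Path


def summarize_dataset(dataset_dir):
    """Map each user folder in an extracted dataset to its number of .ogg files.

    Assumes the layout described above: one sub-folder per speaker, each
    holding that speaker's .ogg recordings. Loose files at the top level
    are ignored.
    """
    root = Path(dataset_dir)
    summary = {}
    for user_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        summary[user_dir.name] = sum(1 for _ in user_dir.glob("*.ogg"))
    return summary


if __name__ == "__main__":
    # Hypothetical location: change this to wherever you extracted the .zip file.
    dataset = Path("Q437-mal-Malayalam")
    if dataset.is_dir():
        for user, count in summarize_dataset(dataset).items():
            print(f"{user}: {count} recordings")
    else:
        print(f"{dataset} not found; extract the dataset first.")
```

The output gives a quick overview of which speakers recorded how many words before you start Step 3.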

Showing the File Structure of datasets - Video

Step 3

We will now use a regular expression (regex) to turn the list of recorded words into wiki links, so we can find out which of them already have entries in our Wiktionary.

I suggest using the Geany text editor, which is available for Linux, Windows and macOS.

  • Step 1: Select all the audio files and copy them.

  • Step 2: Open a text editor and paste them.

  • Step 3: Each file is pasted as its full path, including the directory. Delete the directory part using the Find and Replace feature.

  • Step 4: Remove the .ogg file extension, again using Find and Replace.

  • Step 5: If some file names contain underscores (_), replace them with spaces using Find and Replace. (This step is not shown in the demo video.)

  • Step 6: Now apply the regex. Type \n in the Find box and ]] - [[ in the Replace box. Importantly, tick the Use regular expression option; if you don't, it does not work. Finally, add [[ before the first word and ]] after the last word so that every word becomes a link.
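The Find and Replace steps above can also be sketched in Python. This is a minimal sketch, assuming the pasted text has one file path per line (with either / or \ as the directory separator) and that the file names end in .ogg; build_link_list is a hypothetical helper name, not part of any tool.

```python
def build_link_list(lines):
    """Turn pasted audio file paths into '[[word1]] - [[word2]] - ...' wikitext."""
    words = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore empty lines left over from pasting
        # Step 3: keep only the file name (handles both / and \ separators).
        name = raw.replace("\\", "/").rsplit("/", 1)[-1]
        # Step 4: drop the .ogg extension.
        if name.endswith(".ogg"):
            name = name[: -len(".ogg")]
        # Step 5: underscores become spaces.
        words.append(name.replace("_", " "))
    if not words:
        return ""
    # Step 6: equivalent to replacing every newline with "]] - [[",
    # plus "[[" before the first word and "]]" after the last one.
    return "[[" + "]] - [[".join(words) + "]]"


if __name__ == "__main__":
    # Example paths; your own directory names will differ.
    pasted = """/home/me/Q437-mal-Malayalam/Vis M/അമ്മ.ogg
/home/me/Q437-mal-Malayalam/Vis M/മാവ്.ogg"""
    print(build_link_list(pasted.splitlines()))
```

With the two example paths it prints [[അമ്മ]] - [[മാവ്]], which can be pasted into a sandbox page exactly like the text produced in the editor.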

Please see the demo video below.

When the regex replacement is finished, paste the text into your [[User:<Your Username>/Sandbox]] or create a separate sandbox page for it.

User:Vis M's recorded words list in my sandbox on Tamil Wiktionary

In the picture you can see which words already have entries in the Wiktionary (blue links) and which do not (red links). This makes it easy to identify the existing entries and add audio pronunciations to them. If someone has already added a pronunciation to an entry, we can skip it.

Accents

Because some languages have different accents, we can specify the country or place of the person who recorded the audio. If the speaker has mentioned their country or place, we can find it on their user metadata page on Lingua Libre.

Example: User Metadata Page in the Lingua Libre

Example templates for adding accents:

{{audio|LL-Q5885 (tam)-Sriveenkat-{{subst:BASEPAGENAME}}.wav|Audio (India)}}

{{audio|LL-Q36236 (mal)-Vis M-{{subst:BASEPAGENAME}}.wav|Audio (Alappuzha, Kerala, India)}}

(The caption can also be just "Audio (India)" or "Audio (Alappuzha, Kerala)".)

Templates

Lingua Libre audio files are named LL-<Language Wikidata ID> (<ISO 639-3 language code>)-<Username>-<word>.wav, so we can easily set up a template. The template may vary slightly for each Wiktionary.
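Because the names follow this fixed pattern, they can be built or taken apart mechanically. This is a minimal Python sketch, assuming the pattern above and assuming the username contains no hyphen (a hyphenated username would be split in the wrong place by this regex); make_ll_filename and parse_ll_filename are hypothetical names.

```python
import re

# Matches names such as "LL-Q36236 (mal)-Vis M-അമ്മ.wav".
# Assumption: the username contains no hyphen. The word may contain hyphens,
# but a hyphenated username would be split in the wrong place.
LL_NAME = re.compile(
    r"^LL-(?P<qid>Q\d+) \((?P<iso>[a-z]{3})\)-(?P<user>[^-]+)-(?P<word>.+)\.wav$"
)


def make_ll_filename(qid, iso, user, word):
    """Build a Lingua Libre file name from its four parts."""
    return f"LL-{qid} ({iso})-{user}-{word}.wav"


def parse_ll_filename(name):
    """Return the parts of a Lingua Libre file name as a dict, or None if it does not match."""
    match = LL_NAME.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(make_ll_filename("Q36236", "mal", "Vis M", "അമ്മ"))
    print(parse_ll_filename("LL-Q5885 (tam)-Sriveenkat-அம்மா.wav"))
```

The first call prints LL-Q36236 (mal)-Vis M-അമ്മ.wav, the same file name used in the templates below.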

For Tamil Wiktionary

On Tamil Wiktionary, the flag of the speaker's country is used to represent the speaker's accent, and the place name, written in Tamil, is given as the flag image's caption.

Take Vis M's recording as an example. Vis M has mentioned his accent on his Lingua Libre metadata page, so we can enter his accent, "Alappuzha, Kerala" (ஆலப்புழா, கேரளம் in Tamil).

{{audio|LL-Q36236 (mal)-Vis M-അമ്മ.wav|[[File:Flag of India.svg|24px|ஆலப்புழா, கேரளம்]]}}

For English Wiktionary

The English Wiktionary does not use flags; it uses text. Its {{audio}} template also takes the language code (ml for Malayalam) as its first parameter.

Using Vis M's recording again, we enter the accent from his Lingua Libre page, "Alappuzha, Kerala".

{{audio|ml|LL-Q36236 (mal)-Vis M-അമ്മ.wav|Audio (Alappuzha, Kerala)}}
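The two Wiktionary styles differ only in how the wikitext is assembled, so a small helper per Wiktionary can generate it. This is a minimal Python sketch; the function names are hypothetical, and the output simply reproduces the Tamil and English example templates above.

```python
def tamil_wiktionary_audio(filename, flag_file, place):
    """Tamil Wiktionary style: a 24px country flag whose caption is the place name."""
    return f"{{{{audio|{filename}|[[File:{flag_file}|24px|{place}]]}}}}"


def english_wiktionary_audio(lang_code, filename, accent):
    """English Wiktionary style: language code first, accent as plain text."""
    return f"{{{{audio|{lang_code}|{filename}|Audio ({accent})}}}}"


if __name__ == "__main__":
    name = "LL-Q36236 (mal)-Vis M-അമ്മ.wav"
    print(tamil_wiktionary_audio(name, "Flag of India.svg", "ஆலப்புழா, கேரளம்"))
    print(english_wiktionary_audio("ml", name, "Alappuzha, Kerala"))
```

Generating the text like this is mainly useful when preparing many entries at once; for a single entry, the subst:BASEPAGENAME approach in the Tips section is simpler.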

Note that on the French Wiktionary, the Lingua Libre bot adds the speaker's accent to audio pronunciations automatically.

A screenshot of the "മാവ്" entry on the French Wiktionary

Tips

An easy way is to use MediaWiki's "subst:BASEPAGENAME" feature to insert the page title into the audio file name automatically. To add File:LL-Q5885 (tam)-Sriveenkat-அம்மா.wav to the entry "அம்மா", simply paste the following text:

==Pronunciation==

* {{audio|LL-Q5885 (tam)-Sriveenkat-{{subst:BASEPAGENAME}}.wav}}

(Of course, localize the above text for each Wiktionary.) This way you don't have to modify the text for each entry. This tip was introduced to me by User:Vis M.
