Data for Parley and Similar Applications
We at Vox Humanitatis now have access to more and more data, and it would be a shame not to release it in a format people can simply use. There are, however, some issues:
The amount of data for “learning lessons” is simply too large. Files with 10,000 vocabulary terms are too much; nobody will ever really learn them all. Instead, special subsets of this data are needed, depending on school type, university, course of study or simply private learning.
We asked translators and language enthusiasts to help with integrating the data, and all of them did a really great job. Thanks to many generous people, the basic terminology grew a lot. We have already assigned classes (or “categories”, if you prefer that term) to the various terms, so there are already some kinds of “lessons”, in particular from our own data. But it is still too much for most people who are learning a language.
Then, during the last few weeks, the Wiktionary data (English version) also became part of our data set. On the one hand this made additional data available for language learning; on the other hand it can confuse potential new users even more.
At this stage the vocabulary particularly needs integration for less-resourced languages, and of course further checking of and contributions to the major languages are also highly welcome.
What we need most now are teachers who add tags to our lists in order to create “lessons for their students”, or students who help to prepare the data for their whole class.
Once we have this, the script we are using to extract the data needs to be adapted. Actually, the best thing would be a “conversion engine”, because it could provide the data in many different formats.
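To make the idea a bit more concrete, here is a rough sketch in Python of what such a “conversion engine” could look like. It is not our actual extraction script: the `Term` structure, the tag names and the two output formats (CSV and JSON) are assumptions chosen only for illustration, and a real version would also need an exporter for whatever file format Parley reads. The point is simply that tagged terms are first filtered down to one “lesson” and then handed to whichever exporter is requested.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Term:
    """One vocabulary entry (hypothetical structure, not the real data model)."""
    source: str
    target: str
    tags: List[str] = field(default_factory=list)


def select_lesson(terms: List[Term], tag: str) -> List[Term]:
    """Keep only the terms that a teacher or student marked with the given tag."""
    return [t for t in terms if tag in t.tags]


def to_csv(terms: List[Term]) -> str:
    """Export a lesson as two-column CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "target"])
    for t in terms:
        writer.writerow([t.source, t.target])
    return buf.getvalue()


def to_json(terms: List[Term]) -> str:
    """Export a lesson as a JSON list, keeping the tags."""
    rows = [{"source": t.source, "target": t.target, "tags": t.tags} for t in terms]
    return json.dumps(rows, ensure_ascii=False)


# Registry of output formats; adding a format means adding one function here.
EXPORTERS: Dict[str, Callable[[List[Term]], str]] = {
    "csv": to_csv,
    "json": to_json,
}


def convert(terms: List[Term], tag: str, fmt: str) -> str:
    """Filter the terms by tag, then export them in the requested format."""
    if fmt not in EXPORTERS:
        raise ValueError(f"unknown format: {fmt}")
    return EXPORTERS[fmt](select_lesson(terms, tag))


if __name__ == "__main__":
    # Made-up sample data; the tags stand in for whatever labels teachers choose.
    data = [
        Term("house", "Haus", ["class-5"]),
        Term("tree", "Baum", ["class-5", "nature"]),
        Term("mortgage", "Hypothek", ["economics"]),
    ]
    print(convert(data, "class-5", "csv"))
```

With a registry like this, supporting one more application would mean writing one more export function, while the tagging and filtering step stays the same.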
For those who don't know Parley: it's a vocabulary trainer and much more – please have a look at http://edu.kde.org/parley for further information.
- Bina Meusl's blog


Can you elaborate a bit on what type of contribution is needed?
I'm a computer scientist with a minor in computational linguistics (focused on machine translation), but also took some lectures in phonetics and phonology. Perhaps I could help out with some programming. What needs to be done with the script, and what language is it written in?