Abstract: Lexical chunks have in recent years become widely recognized as a crucial aspect of second language competence. We address two major challenges that chunks pose for lexicography and describe computational approaches to meeting them. The first challenge is lexical knowledge discovery, that is, the need to uncover which strings of words constitute chunks worthy of learners’ attention. The second challenge is the problem of representation, that is, how such knowledge can be made accessible to learners. To address the first challenge, we propose a greedy algorithm, run on 20 million words of the BNC, that iteratively applies word association measures to increasingly longer n-grams. This approach prioritizes high recall and then isolates false positives through sorting mechanisms. To address the challenge of representation, we propose embedding the algorithm in a browser-based agent, as an extension of our current browser-based collocation detection tool.
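The abstract does not specify the association measure or the thresholds, so the following Python sketch should be read only as an illustration of the iterative n-gram expansion idea, with pointwise mutual information standing in as the measure; all names and parameters are assumptions, not the authors’ implementation.

```python
from collections import Counter
from math import log2

def extract_chunks(tokens, threshold=3.0, max_len=5, min_freq=5):
    """Greedy chunk discovery: keep n-grams whose association score
    clears the threshold, then try to extend them by one word."""
    total = len(tokens)
    unigrams = Counter(tokens)

    def pmi(ngram, freq):
        # Pointwise mutual information of the n-gram against the
        # independence assumption over its component words.
        indep = 1.0
        for w in ngram:
            indep *= unigrams[w] / total
        return log2((freq / total) / indep)

    candidates = {}
    current = [tuple(tokens[i:i + 2]) for i in range(total - 1)]
    n = 2
    while current and n <= max_len:
        freqs = Counter(current)
        survivors = {g for g, f in freqs.items()
                     if f >= min_freq and pmi(g, f) >= threshold}
        for g in survivors:
            candidates[g] = pmi(g, freqs[g])
        # High recall first: extend every surviving n-gram one word
        # to the right and re-test at the next length.
        current = [tuple(tokens[i:i + n + 1]) for i in range(total - n)
                   if tuple(tokens[i:i + n]) in survivors]
        n += 1
    # Sort by score so likely false positives sink to the bottom.
    return sorted(candidates.items(), key=lambda kv: -kv[1])
```

On real corpus data one would plug in the association measure of choice (log-likelihood, t-score, etc.) and tune min_freq and threshold per n-gram length.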
Abstract: The goal of this paper is to explore extensions to electronic dictionaries. Adding certain functions could considerably extend the range of tasks for which they provide support. Putting the needed information a mouse click away would allow for active reading; this requires tight coupling of the dictionary with a text editor, so that all the information in the dictionary is accessible via a mouse click. Dictionaries combined with a flashcard system and an exercise generator could support the memorization and automation of words and syntactic structures. Finally, structuring the dictionary in a way akin to the human mind (an associative network) could help writers find new ideas and, if needed, the word they are looking for. In sum, rather than considering the dictionary as just another component of the language production or comprehension chain, we consider it the single most important resource, provided that one knows how to use it.
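As a purely illustrative sketch of the associative-network idea (the paper’s own design is not specified here), a dictionary might store labelled association links between words and let a writer navigate from a word they can recall toward the one they cannot; all names below are hypothetical.

```python
from collections import defaultdict

class AssociativeDictionary:
    """Toy associative network: words as nodes, labelled association
    links as edges. A speculative sketch, not the authors' design."""

    def __init__(self):
        self.links = defaultdict(list)

    def associate(self, source, relation, target):
        # Store the link in both directions so navigation works
        # from either word.
        self.links[source].append((relation, target))
        self.links[target].append((relation, source))

    def neighbours(self, word, relation=None):
        """Words reachable in one hop, optionally filtered by link
        type, supporting 'I know it is related to X' lookups."""
        return [t for r, t in self.links[word]
                if relation is None or r == relation]

# A writer who can only recall 'coffee' browses its associations
# to reach the elusive target word.
d = AssociativeDictionary()
d.associate("coffee", "kind-of", "espresso")
d.associate("coffee", "used-with", "mug")
print(d.neighbours("coffee"))  # ['espresso', 'mug']
```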
Abstract: This paper first presents a history of Computer-Assisted Learning (CAL), tracing its origins to the 1920s and the invention of mechanical learning machines. The computer subsequently allowed the development of different types of language learning activities: comprehension tasks, simulations, etc. Without the contribution of natural language processing (NLP), however, these activities are of limited use. We address the problem of integrating NLP into CALL systems, summing up the challenges this integration must overcome today and synthesizing the workshop presentations. These presentations deal with a range of issues, from error detection and correction to the extension of electronic dictionaries, by way of the implementation of comprehensive language learning tools. We will see that the key to integrating NLP into CALL lies in multidisciplinary collaboration among didacticians, IT specialists, and NLP specialists.
Abstract: New text analysis software emerging from research fields such as Machine Learning and Natural Language Processing is proving to be a relevant tool for the language sciences. Littératron, a new data-processing tool for the automatic extraction of syntactic patterns, was designed at LIP6 by Jean-Gabriel Ganascia. Combined with a linear text analyser, it reveals the stylistic peculiarities of a text. We will see that, when used in the language sciences, and especially in the study of the acquisition of written French as a foreign language, Littératron can provide a linguistic diagnosis of learners. The learners may come from a heterogeneous group (various language levels and various mother tongues) or from a homogeneous group (a single language level and a single mother tongue, here Arabic). This approach is of interest to three fields: first, language didactics, on purely educational grounds; next, computational linguistics; and finally, computer-assisted learning.
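The abstract does not detail Littératron’s algorithm; as a rough, hypothetical illustration of what extracting syntactic patterns can mean, one might count recurring part-of-speech n-grams across a learner’s sentences. The function and data below are assumptions for demonstration only.

```python
from collections import Counter

def recurrent_pos_patterns(tagged_sentences, n=3, min_count=2):
    """Count recurring POS-tag n-grams across a learner's sentences.
    A crude stand-in for syntactic pattern extraction; the abstract
    does not specify Littératron's actual method."""
    counts = Counter()
    for sent in tagged_sentences:
        tags = [tag for _, tag in sent]
        for i in range(len(tags) - n + 1):
            counts[tuple(tags[i:i + n])] += 1
    return {p, := None for p in ()} if False else \
           {p: c for p, c in counts.items() if c >= min_count}

# Two sentences sharing a DET-NOUN-VERB opening surface that pattern
# as a stylistic peculiarity of the learner's writing.
sents = [[("le", "DET"), ("chat", "NOUN"), ("dort", "VERB")],
         [("la", "DET"), ("fille", "NOUN"), ("chante", "VERB")]]
print(recurrent_pos_patterns(sents))  # {('DET', 'NOUN', 'VERB'): 2}
```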
Abstract: In this article, we present a Computer Assisted Language Learning (CALL) environment for Basque. The environment has two aims: on the one hand, to offer users (teachers, learners, and computational linguists) tools and language resources for clarifying linguistic doubts they might have about the language; on the other hand, to store information about language learners, deviations, and errors as the basis for further studies in CALL and Natural Language Processing (NLP). The environment is composed of a workbench (Lentillak), two web applications (Erreus and Irakazi), several NLP tools, and two databases (Errors and Deviations), and it incorporates learner corpora as well as native corpora. In addition, we present the experiment we carried out to evaluate the usefulness of the NLP tools.
Abstract: We present an approach to Computer-Assisted Assessment of free-text material based on symbolic analysis of student input. The underlying theory arises from previous work on DidaLect, a tutorial system for reading comprehension in French as a Second Language. It allows a free-text segment to be assessed without precoded reference material. A study based on a small collection of student answers to several types of questions has validated our approach, helped define a methodology, and informed the design of a prototype.
Abstract: The quality of most CALL programs is poorly balanced between the use of computer technology and the treatment of language content and processing. This imbalance can be explained by a number of constraints pulling CALL developers in diverging directions. For commercial CALLware, poor learner fit and lack of feedback are serious impediments. So far, ICALL approaches trying to overcome these problems have not been of sufficiently high quality, owing to the vast distance between most learner language and the text genres for which NLP is helpful. The way forward suggested here is for ICALL to take a localized, (bilingual) lexicon-centred approach that combines sophisticated resources with improved learner fit for more creative and interactive CALLware.
Detecting grammatical errors using probabilistic parsing. Presentation by Joachim Wagner, Jennifer Foster and Josef van Genabith (National Centre for Language Technology, Dublin City University) at IICALL 2006.