Because a corpus represents linguistic reality, its data must be homogeneous, quantifiable and valid. This article discusses the compilation of a corpus of oral data from learners of Spanish. We focus not only on data collection but also on the difficulties that arise in the experimental design, the selection of participants, the development of a transcription model and the analysis of the data. The discussion is based on our own research project, for which oral samples from learners of Spanish at different proficiency levels were collected for cross-sectional analysis. The article also describes the oral experiment designed specifically for this project, which is similar to those used in previous studies on related subjects. Finally, we discuss the procedure used to transcribe the data and develop a codification system.
Original language: English
Title of host publication: CILC2016. 8th International Conference on Corpus Linguistics
Editors: Antonio Moreno Ortiz, Chantal Pérez-Hernández
Number of pages: 9
Publication status: Published - 29 Nov 2016
Event: 8th International Conference on Corpus Linguistics - Málaga, Spain
Duration: 2 Mar 2016 - 4 Mar 2016

Publication series

Name: EPiC Series in Language and Linguistics
ISSN (Electronic): 2398-5283


Conference: 8th International Conference on Corpus Linguistics

    Research areas

  • Corpus design, ELE, Corpus compilation

ID: 27411929