


Corpus of Academic English


Research project funded by the Italian Ministry of Research.


As an aid in the analysis of variation in intercultural communication, and in the identification of textual variants arising from the use of English as a first language, second language, or lingua franca of the scientific community, we have compiled a corpus of English (and, in part, Italian) texts for academic communication, produced by scholars and academic institutions in various parts of the world.


CADIS can thus enable researchers to analyse the most significant macro- and microlinguistic variants in terms of identity, evaluation and interpretation, in the light of recent linguistic scholarship. More specifically, the data allows an in-depth analysis of the following aspects:


  • genre and macrostructure, with their resulting lexico-grammatical realisations;
  • speech acts expressing positive/negative evaluation, both exophoric and metatextual;
  • pragmatic, interpersonal plane of discourse (stance, hedging, politeness);
  • evidence of popularisation and/or promotional discourse;
  • function of verbal and lexical modality;
  • degree of background knowledge required (content schemata);
  • correlation with such authorial variables as gender and academic standing.


Besides including two alternative languages and representing native as well as non-native speakers, CADIS also represents four different disciplinary areas:


  • Applied Linguistics
  • Economics
  • Law
  • Medicine


For each disciplinary area, four different textual genres have been considered:

  • Abstracts
  • Book reviews
  • Editorials
  • Research articles


The structural complexity of CADIS reflects its contrastive orientation: it is designed to be internally comparable, so that researchers can analyse and contrast the chosen texts not only by disciplinary area, genre, language and culture, but also historically. This is possible because the corpus covers a time frame of over 30 years, from 1980 to 2011.


The English texts were taken from more than 30 peer-reviewed journals available by subscription through the University of Bergamo website. Because all the journals selected have a high impact factor, we are confident that the content of our corpus is highly representative of each specialised community from which it originated. The same principle is followed in the sampling of Italian academic texts, which have been selected from the most important journals available in each field.


CADIS comprises 2,761 academic texts, totalling about 12 million tokens, selected and classified by disciplinary area, genre, language, author status (native or non-native speaker, NS/NNS), geographical provenance, date of publication and source journal.
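As an illustration only, the classification scheme described above could be modelled as one metadata record per text, with a simple filter for building comparable subsets. This is a minimal sketch, not the project's actual data format: the class `TextRecord`, its field names, the helpers `select` and `period`, and the sample values are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRecord:
    """Metadata for one corpus text (hypothetical field names)."""
    area: str             # disciplinary area, e.g. "Law"
    genre: str            # e.g. "Editorials"
    language: str         # "ENG" or "ITA"
    native_speaker: bool  # True = NS, False = NNS
    provenance: str       # geographical provenance
    year: int             # date of publication
    journal: str          # source journal


def select(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


def period(year):
    """Map a publication year to one of the two time spans of the corpus."""
    if not 1980 <= year <= 2011:
        raise ValueError(f"year {year} lies outside the 1980-2011 time frame")
    return "1980-1999" if year <= 1999 else "2000-2011"


# Invented placeholder records, not actual corpus entries.
SAMPLE = [
    TextRecord("Law", "Editorials", "ENG", True, "placeholder-country-1", 1995, "placeholder-journal-1"),
    TextRecord("Law", "Abstracts", "ITA", True, "placeholder-country-2", 2005, "placeholder-journal-2"),
    TextRecord("Medicine", "Editorials", "ENG", False, "placeholder-country-2", 2010, "placeholder-journal-3"),
]

english_editorials = select(SAMPLE, genre="Editorials", language="ENG")
```

Keeping every classification criterion as a separate field means that any combination of them (for instance, genre and language together) can serve as the basis for a contrastive subset.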



Project leader:


Prof. Maurizio Gotti


Project members:


Dr. Patrizia Anesa

Dr. Ulisse Belotti

Dr. Larissa D'Angelo

Dr. Davide Giannoni

Dr. Stefania Maci

Dr. Michele Sala



CADIS Corpus (1980-2011) 


The Subcorpora


Applied Linguistics (1980 – 1999) ENG


Applied Linguistics (2000 – 2011) ENG


Applied Linguistics (1980 – 1999) ITA


Applied Linguistics (2000 – 2011) ITA


Economics (1980 – 1999) ENG


Economics (2000 – 2011) ENG


Economics (1980 – 1999) ITA


Economics (2000 – 2011) ITA


Law (1980 – 1999) ENG


Law (2000 – 2011) ENG


Law (1980 – 1999) ITA


Law (2000 – 2011) ITA


Medicine (1980 – 1999) ENG


Medicine (2000 – 2011) ENG


Medicine (1980 – 1999) ITA


Medicine (2000 – 2011) ITA
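

The sixteen subcorpora listed above result from crossing the four disciplinary areas with the two languages and the two time spans. The following sketch reproduces the labels in the same order; the names `AREAS`, `LANGUAGES`, `PERIODS` and `subcorpus_labels` are illustrative assumptions and do not refer to any tool distributed with the corpus.

```python
from itertools import product

AREAS = ["Applied Linguistics", "Economics", "Law", "Medicine"]
LANGUAGES = ["ENG", "ITA"]
PERIODS = [(1980, 1999), (2000, 2011)]


def subcorpus_labels():
    """Build one label per area/language/period combination."""
    return [
        f"{area} ({start} – {end}) {language}"
        for area, language, (start, end) in product(AREAS, LANGUAGES, PERIODS)
    ]


labels = subcorpus_labels()
```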