We are pleased to announce that Prof. Mark Davies of Brigham Young University in Provo, Utah, USA, has accepted the invitation of the Institute of the Czech National Corpus (Faculty of Arts, Charles University in Prague) and will give two lectures on corpus linguistics: on Monday, April 11th, at 6 PM at the Faculty of Arts, and on Tuesday, April 12th, at 1 PM at the ICNC.
Due to limited space, please use this page to register for both lectures at your earliest convenience. For details about the lectures (venue, abstracts) see below. For more information about the speaker see his personal web page.
Corpus-based analyses of variation in English: Why both size and structure matter
Monday, April 11th, 2016, 6 PM | Faculty of Arts (room 104, 1st floor), nám. J. Palacha 2
English corpus linguistics has a tradition of using small (1-5 million word) corpora to look at variation in high-frequency phenomena. Within the last 5-10 years, however, very large web-based corpora (like those from Sketch Engine) have also become available. While both of these types of corpora certainly have their advantages, I argue that both have serious weaknesses when it comes to looking at many types of variation in English.
I will present many examples of lexical, morphological, syntactic, and semantic variation in English that can only be studied using corpora that are both large and structured in a way that lends itself to looking at variation (rather than being just a “blob” of billions of words of web pages).
These examples of genre-based, historical, and dialectal variation in English will come from the 520-million-word Corpus of Contemporary American English (COCA), the 400-million-word Corpus of Historical American English (COHA), and the 1.9-billion-word Corpus of Global Web-based English (GloWbE). All of these corpora are much larger than comparable corpora of English, and their unique structure allows them to provide insight into variation in English that cannot be obtained from any other source.
New from the BYU corpora: the NOW corpus and virtual corpora
Tuesday, April 12th, 2016, 1 PM | Institute of the Czech National Corpus (room 5), Panská 7
May 2016 will see two exciting developments from the BYU corpora (corpus.byu.edu), which are probably the most widely used corpora at present. In this presentation I will give a “sneak peek” of these changes.
First, we will release the NOW corpus (Newspapers on the Web). The corpus is composed of about three billion words of data from web-based newspapers for every day from January 2010 until now. Most importantly, the corpus grows by about 6-7 million words each day, which makes it ideal for looking at ongoing changes in the language.
Second, we have incorporated into all of the BYU corpora the ability to create and use “virtual corpora” (previously only available with the BYU Wikipedia corpus). Users can create virtual corpora based on source (e.g. a particular magazine or newspaper or author), title, date, (sub-)genre, and even words within the text. They can then search within their virtual corpora, compare across them, and even extract keywords.