

Academic Year: Fall/Winter 2014/2015

Term: 2

Day/Evening: D

Instructor: Dr. Victor Kuperman


Office: Togo Salmon Hall 510

Phone: 905-525-9140 x 20384


Office Hours: TBA

Course Objectives:

This course studies computational tools and techniques of language processing, using corpora (large electronic collections of texts) as the object of inquiry. Students will be trained in basic text-processing, statistical, and programming skills using the free statistical software package R. Topics covered will include morphological, lexical, and syntactic change; description of authorship and style; construction of frequency lists and concordances; and others. Two-hour slots will be used for lectures and tutorials. One-hour slots will be mostly reserved for independent work under the guidance of the instructor.
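As a rough illustration of two of the topics named above, frequency lists and concordances, here is a minimal sketch. It is written in Python purely for illustration; the course itself uses R. The function names (`tokenize`, `frequency_list`, `concordance`), the sample sentence, and the simple regular-expression tokenizer are all assumptions made for this example and are not taken from the course materials or the textbook.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens):
    """Return (word, count) pairs sorted by descending count, then alphabetically."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


def concordance(tokens, keyword, window=3):
    """Return keyword-in-context lines with up to `window` tokens on each side."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Hypothetical two-sentence "corpus" used only to show the output shape.
sample = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(sample)

print(frequency_list(tokens)[:3])
for line in concordance(tokens, "sat", window=2):
    print(line)
```

A real corpus would need a more careful tokenizer (punctuation, hyphenation, numbers), but the two steps stay the same: count the tokens, then show each hit of a search word with its surrounding context.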


Textbooks, Materials & Fees:

10% - in-class assignments (marked as pass/fail)

35% - home assignments

20% - midterm

35% - independent research project

There is no final exam. Oral presentations may be offered to a limited number of students for bonus points.

Method of Assessment:

This is a hands-on course administered in a computer lab. Students learn by attending and participating in lectures and lab sessions, and by completing the assigned readings and the programming/research assignments. Students will be expected to conduct a small-scale independent research project involving collection, processing, statistical analysis, and presentation of primary data.

Policy on Missed Work, Extensions, and Late Penalties:

Assignments submitted up to 48 hours after the due date are subject to a 25% penalty. Assignments submitted 2 to 7 days after the due date are subject to a 50% penalty. Assignments are not accepted more than 7 days after the due date.
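The penalty tiers can be sketched as a small calculation. This is only an illustrative reading of the policy, not an official grading tool: the function names are hypothetical, and the sketch assumes that the penalty is deducted as a fraction of the earned mark, that a submission exactly 48 hours late still falls in the 25% tier, and that an assignment that is not accepted receives zero.

```python
def late_penalty(hours_late):
    """Return the penalty fraction for a submission, or None if it is not accepted.

    Assumption: exactly 48 hours late counts as the 25% tier, and
    exactly 7 days (168 hours) late counts as the 50% tier.
    """
    if hours_late <= 0:
        return 0.0   # on time: no penalty
    if hours_late <= 48:
        return 0.25  # up to 48 hours late
    if hours_late <= 7 * 24:
        return 0.50  # 2 to 7 days late
    return None      # more than 7 days late: not accepted


def adjusted_mark(raw_mark, hours_late):
    """Apply the late penalty to a raw mark (assumed: fraction of the earned mark)."""
    penalty = late_penalty(hours_late)
    if penalty is None:
        return 0.0   # assumption: a non-accepted assignment scores zero
    return raw_mark * (1 - penalty)


print(adjusted_mark(80, 0))    # on time
print(adjusted_mark(80, 30))   # 30 hours late
print(adjusted_mark(80, 72))   # 3 days late
print(adjusted_mark(80, 200))  # more than 7 days late
```

Under these assumptions, a raw mark of 80 becomes 60 when submitted 30 hours late and 40 when submitted 3 days late.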

McMaster Student Absence Form (MSAF)

This is a self-reporting tool for undergraduate students to report absences DUE TO MINOR MEDICAL SITUATIONS that last up to 5 days. It also provides the ability to request accommodation for any missed academic work. Please note that this tool cannot be used during any final examination period. You may submit a maximum of 1 Academic Work Missed request per term. It is YOUR responsibility to follow up with your Instructor immediately (NORMALLY WITHIN TWO WORKING DAYS) regarding the nature of the accommodation. If you are absent for reasons other than medical reasons, for more than 5 days, or exceed 1 request per term, you MUST visit your Associate Dean's Office/Faculty Office. You may be required to provide supporting documentation. This form should be filled out immediately when you are about to return to class after your absence.



Please Note the Following Policies and Statements:

Academic Dishonesty

You are expected to exhibit honesty and use ethical behaviour in all aspects of the learning process. Academic credentials you earn are rooted in principles of honesty and academic integrity.

Academic dishonesty is to knowingly act or fail to act in a way that results or could result in unearned academic credit or advantage. This behaviour can result in serious consequences, e.g. the grade of zero on an assignment, loss of credit with a notation on the transcript (notation reads: "Grade of F assigned for academic dishonesty"), and/or suspension or expulsion from the university.

It is your responsibility to understand what constitutes academic dishonesty. For information on the various types of academic dishonesty please refer to the Academic Integrity Policy, located at

The following illustrates only three forms of academic dishonesty:

  1. Plagiarism, e.g. the submission of work that is not one’s own or for which other credit has been obtained.
  2. Improper collaboration in group work.
  3. Copying or using unauthorized aids in tests and examinations.

Email correspondence policy

It is the policy of the Faculty of Humanities that all email communication sent from students to instructors (including TAs), and from students to staff, must originate from each student’s own McMaster University email account. This policy protects confidentiality and confirms the identity of the student.  Instructors will delete emails that do not originate from a McMaster email account.

Modification of course outlines

The University reserves the right to change dates and/or deadlines etc. for any or all courses in the case of an emergency situation or labour disruption or civil unrest/disobedience, etc. If a modification becomes necessary, reasonable notice and communication with the students will be given with an explanation and the opportunity to comment on changes. Any significant changes should be made in consultation with the Department Chair.

McMaster Student Absence Form (MSAF)

In the event of an absence for medical or other reasons, students should review and follow the Academic Regulation in the Undergraduate Calendar Requests for Relief for Missed Academic Term Work. Please note these regulations have changed beginning Fall 2015. You can find information at If you have any questions about the MSAF, please contact your Associate Dean's office.

Academic Accommodation of Students with Disabilities

Students who require academic accommodation must contact Student Accessibility Services (SAS) to make arrangements with a Program Coordinator. Academic accommodations must be arranged for each term of study. Student Accessibility Services can be contacted by phone 905-525-9140 ext. 28652 or e-mail For further information, consult McMaster University's Policy for Academic Accommodation of Students with Disabilities.

Academic Accommodation for Religious, Indigenous and Spiritual Observances

Students requiring academic accommodation based on religion and spiritual observances should follow the procedures set out in the Course Calendar or by their respective Faculty. In most cases, the student should contact his or her professor or academic advisor as soon as possible to arrange accommodations for classes, assignments, tests and examinations that might be affected by a religious holiday or spiritual observance.

Topics and Readings:

Required reading:

Gries, Stefan Th. (2009) Quantitative Corpus Linguistics with R: A Practical Introduction. Routledge.

Other readings (complete list to be announced in class)

Kilgarriff, Adam and Gregory Grefenstette (2003) “Web as corpus”. Computational Linguistics 29:1-15



Week 1

Introduction to Corpora

Reading: Gries, 2.1, Kilgarriff and Grefenstette (2003)


Week 2

Reading: Gries: Chapter 2


Week 3

Introduction to R

Reading: Gries: Chapter 3.1-3.5


Week 4 

Reading: Gries: Chapter 3.6


Week 5

Reading: Gries: Chapter 3.7


Week 6


Reading: Gries, chapter 4.1


Week 7 No classes


Week 8

Reading: Gries, chapter 4.2


Week 9

Reading: Gries, chapter 4.4-4.5


Week 10

Reading: Gries, chapter 5.1-5.2


Week 11

Reading: Gries, chapter 5.3-5.4


Week 12

Reading: Gries, chapter 5.5


Week 13

Reading: Gries, chapter 6


Week 14

Reading: Gries, chapter 6


Week 15

Wrap-up of the course

Other Course Information:

The instructor and university reserve the right to modify elements of the course during the term. The university may change the dates and deadlines for any or all courses in extreme circumstances. If either type of modification becomes necessary, reasonable notice and communication with the students will be given with explanation and the opportunity to comment on changes. It is the responsibility of the student to check their McMaster email and course websites weekly during the term and to note any changes.


In this course we will be using Avenue to Learn. Students should be aware that, when they access the electronic components of this course, private information such as first and last names, user names for the McMaster e-mail accounts, and program affiliation may become apparent to all other students in the same course. The available information is dependent on the technology used. Continuation in this course will be deemed consent to this disclosure. If you have any questions or concerns about such disclosure, please discuss this with the course instructor.

It is likely that the H1N1 flu virus will be circulating on campus this season. Please make every effort to protect yourself and your classmates from catching the flu or any other contagious illness. Symptoms of H1N1 flu include fever, cough, runny nose, sore throat, body aches, fatigue and lack of appetite. If you have these symptoms, DO NOT COME TO CLASS. For more information about the flu, go to

Students with special needs who require accommodations must register with the Centre for Student Development.