LINGUIST 4D03 Computers & Linguistic Analysis

Academic Year: Winter 2017

Term: Fall

Day/Evening: D

Instructor: Prof. Michel Genereux


Office: Togo Salmon Hall 623

Phone: 905-525-9140 x 24940

Office Hours: Thursdays, 10:30 am to 12:00 noon, TSH 623

Course Objectives:

This course studies computational tools and techniques of language processing, using corpora (large
electronic collections of texts) as an object of inquiry. Students will be trained in basic text-processing,
statistical and programming skills using the free statistical software package R. Topics covered will
include morphological, lexical and syntactic change, construction of frequency lists and concordances,
description of authorship and style, and others. The first two hours will be taken up by lectures and
tutorials. The remaining hour will be for independent, guided work on exercises, sometimes including
homework and in-class assignments.

Textbooks, Materials & Fees:

Required reading: Stefan Th. Gries, Quantitative corpus linguistics with R: A practical introduction. 2009.
Companion website:
Other readings (see WEEKLY schedule)

Method of Assessment:

This is a hands-on course administered in a computer lab. Students learn by attending and participating
in lectures and lab sessions, and by completing the assigned readings and the programming/research
assignments. Students will be expected to conduct a small-scale independent research project involving
collection, processing, statistical analysis and presentation of primary data.
EVALUATION CRITERIA: homework and in-class assignments may be submitted in pairs; if you cannot submit
as a pair, submit individually. Both members of a pair receive the same grade. If a submission carries
only one name, only one grade is given - no exceptions.
26% - homework: 2 x 5% + 4 x 4% (group of 1 or 2) submission deadlines TBA
4% - in-class assignments: 4 x 1% (group of 1 or 2); submit online no later than one hour after class to
the dated Dropbox on Avenue. The Dropbox will close at a later time.
5% - reading assignments (250 words): 2 x 2.5% (graduate+: 2 x 2.5%, 500 words)
15% - midterm, 1.5 hours:
5% (20 MCQ, closed books) on InC1-2, HW0-2; 10% (open books) on W1-5
35% - independent research project (graduate 45%): a study of a phonetic, morphological, lexical,
semantic, syntactic or discourse phenomenon using one or several corpora as a data source, tools and
statistical techniques of corpus analysis.
15% - final exam, 1.5 hours (only for undergraduate students):
5% (20 MCQ, closed books) on InC3-5, HW3-5; 10% (open books) on W6-13
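
As a quick arithmetic check on the undergraduate weighting listed above, the following sketch sums the components and computes a weighted course grade. It is written in Python purely for illustration (the course itself uses R). The names UNDERGRAD_WEIGHTS and final_grade are hypothetical and not part of the course materials, and the graduate weighting is not modelled.

```python
# Undergraduate component weights in percent, copied from the list above.
UNDERGRAD_WEIGHTS = {
    "homework": 2 * 5 + 4 * 4,   # 2 x 5% + 4 x 4%
    "in_class": 4 * 1,           # 4 x 1%
    "reading": 2 * 2.5,          # 2 x 2.5%
    "midterm": 5 + 10,           # MCQ part + open-book part
    "project": 35,
    "final_exam": 5 + 10,        # MCQ part + open-book part
}


def final_grade(scores: dict) -> float:
    """Return the weighted course grade (0-100).

    `scores` maps each component name to the score earned on that
    component, expressed in percent (0-100). Hypothetical helper.
    """
    missing = set(UNDERGRAD_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(
        weight * scores[name] / 100
        for name, weight in UNDERGRAD_WEIGHTS.items()
    )


# The listed undergraduate components should account for the whole grade.
assert sum(UNDERGRAD_WEIGHTS.values()) == 100
```

For example, calling final_grade with a score for each of the six components returns the overall percentage; a missing component raises an error rather than silently counting as zero.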

Policy on Missed Work, Extensions, and Late Penalties:

Assignments submitted within 48 hours after the due date are subject to a 25% penalty. Assignments
submitted 2 to 7 days after the due date are subject to a 50% penalty. Assignments are not accepted
more than 7 days after the due date. Exceptions to due dates and exam dates can be made only with
medical documentation. Submit your medical documentation to the office of the Dean of your Faculty
using the appropriate forms. Note that the University's policy on medical notes is subject to change at
any time in the event of a health emergency on campus.
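
The late-penalty tiers above can be sketched as a small function. This is only an illustration in Python (the course itself uses R) and rests on several assumptions that the policy does not state: that a submission exactly 48 hours late falls in the 25% tier, that the penalty is deducted as a fraction of the earned score, and that a submission that is not accepted scores zero. The names late_penalty and penalized_score, and the dates used with them, are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def late_penalty(due: datetime, submitted: datetime) -> Optional[float]:
    """Return the fraction deducted, or None if the work is not accepted."""
    lateness = submitted - due
    if lateness <= timedelta(0):
        return 0.0                      # on time
    if lateness <= timedelta(hours=48):
        return 0.25                     # up to 48 hours late (assumed inclusive)
    if lateness <= timedelta(days=7):
        return 0.50                     # more than 2 and up to 7 days late
    return None                         # more than 7 days late: not accepted


def penalized_score(raw_score: float, due: datetime, submitted: datetime) -> float:
    """Apply the penalty to an earned score (assumption: multiplicative)."""
    penalty = late_penalty(due, submitted)
    if penalty is None:
        return 0.0                      # assumption: unaccepted work scores zero
    return raw_score * (1.0 - penalty)
```

Under these assumptions, a homework that earned 80 and arrived one day late would count as 60. Documented medical exceptions are outside the scope of this sketch.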

Please Note the Following Policies and Statements:

Academic Dishonesty

You are expected to exhibit honesty and use ethical behaviour in all aspects of the learning process. Academic credentials you earn are rooted in principles of honesty and academic integrity.

Academic dishonesty is to knowingly act or fail to act in a way that results or could result in unearned academic credit or advantage. This behaviour can result in serious consequences, e.g. the grade of zero on an assignment, loss of credit with a notation on the transcript (notation reads: "Grade of F assigned for academic dishonesty"), and/or suspension or expulsion from the university.

It is your responsibility to understand what constitutes academic dishonesty. For information on the various types of academic dishonesty please refer to the Academic Integrity Policy, located at

The following illustrates only three forms of academic dishonesty:

  1. Plagiarism, e.g. the submission of work that is not one’s own or for which other credit has been obtained.
  2. Improper collaboration in group work.
  3. Copying or using unauthorized aids in tests and examinations.

Email correspondence policy

It is the policy of the Faculty of Humanities that all email communication sent from students to instructors (including TAs), and from students to staff, must originate from each student’s own McMaster University email account. This policy protects confidentiality and confirms the identity of the student.  Instructors will delete emails that do not originate from a McMaster email account.

Modification of course outlines

The University reserves the right to change dates and/or deadlines etc. for any or all courses in the case of an emergency situation or labour disruption or civil unrest/disobedience, etc. If a modification becomes necessary, reasonable notice and communication with the students will be given with an explanation and the opportunity to comment on changes. Any significant changes should be made in consultation with the Department Chair.

McMaster Student Absence Form (MSAF)

In the event of an absence for medical or other reasons, students should review and follow the Academic Regulation in the Undergraduate Calendar Requests for Relief for Missed Academic Term Work. Please note these regulations have changed beginning Fall 2015. You can find information at If you have any questions about the MSAF, please contact your Associate Dean's office.

Academic Accommodation of Students with Disabilities

Students who require academic accommodation must contact Student Accessibility Services (SAS) to make arrangements with a Program Coordinator. Academic accommodations must be arranged for each term of study. Student Accessibility Services can be contacted by phone 905-525-9140 ext. 28652 or e-mail For further information, consult McMaster University's Policy for Academic Accommodation of Students with Disabilities.

Academic Accommodation for Religious, Indigenous and Spiritual Observances

Students requiring academic accommodation based on religion and spiritual observances should follow the procedures set out in the Course Calendar or by their respective Faculty. In most cases, the student should contact his or her professor or academic advisor as soon as possible to arrange accommodations for classes, assignments, tests and examinations that might be affected by a religious holiday or spiritual observance.

Topics and Readings:

Week | Date   | Topic                       | Assigned / due                       | Readings
W1   | Jan 6  | Corpus Linguistics (CL)     | RA1, HW0 (W1)                        |
W2   | Jan 13 | Corpus analysis 1           | RA1 due, HW0 due, HW1 (W2-3, R1.1-2) | RA1
W3   | Jan 20 | Corpus analysis 2           | RA1+, InC1                           | R1.1
W4   | Jan 27 | Corpora and lexical studies | HW1 due, HW2 (W4-5, R2.1-2)          | R2.1
W5   | Feb 3  | Corpora and visualization   | RA1+ due, InC2                       | R2.2, RA1+
W6   | Feb 10 | Midterm, CL with R 1        | HW2 due, RA2                         | RA2
W7   | Feb 17 | CL with R 1                 | RA2 due, HW3 (W6-7, R3.1-2)          | R3.1, R3.2
W8   | Feb 24 | Recess                      |                                      |
W9   | Mar 3  | CL with R 2                 | HW3 due, HW4 (W9-10, R4.1-3)         | R4.1, R4.2
W10  | Mar 10 | CL with R 3                 | RA2+, InC3                           | R4.3
W11  | Mar 17 | Statistics with R 1         | HW4 due, HW5 (W11-13, R5.1-3)        | R5.1
W12  | Mar 24 | Statistics with R 2         | Final paper proposal due             | R5.2
W13  | Mar 31 | Statistics with R 3         | RA2+ due, InC4                       | R5.3, RA2+
HW5 due Apr 7, Final Exam (April 11-27), Final paper due April 21
RA1: “What is Corpus Linguistics”
RA1+: (graduate) one article in computational linguistics, TBD
R1.1: Gries 2.1, 2.2 (corpora, frequency lists)
R2.1: Gries 2.3, 2.4 (collocations, concordances)
R2.2: Gries 3.1-5 (vectors, factors, data frames, lists)
RA2: TBD
RA2+: (graduate) one article in machine or statistical learning, TBD
R3.1: Gries 3.6-3.8 (programming, strings, files)
R3.2: Gries 4.1 (frequency lists)
R4.1: Gries 4.2 (concordances)
R4.2: Gries 4.3 (collocations)
R4.3: Gries 4.4-4.5 (multi-tiered corpora, Unicode)
R5.1: Gries 5.1-5.2 (statistical thinking, categorical)
R5.2: Gries 5.3-5.4 (interval/ratio, plots)
R5.3: Gries 5.5, 5.6 (reporting, case studies)

Other Course Information:

The instructor and university reserve the right to modify elements of the course during the term. The
university may change the dates and deadlines for any or all courses in extreme circumstances. If either
type of modification becomes necessary, reasonable notice and communication with the students will be
given with explanation and the opportunity to comment on changes. It is the responsibility of the
student to check their McMaster email and course websites weekly during the term and to note any changes.
In this course we will be using Avenue to Learn. Students should be aware that, when they access
the electronic components of this course, private information such as first and last names, user names
for the McMaster e-mail accounts, and program affiliation may become apparent to all other students in
the same course. The available information is dependent on the technology used. Continuation in this
course will be deemed consent to this disclosure. If you have any questions or concerns about such
disclosure, please discuss this with the course instructor. It is your responsibility to monitor this
website at least once a week for news and updates.
Students with special needs who require accommodations must register with the Centre for Student