STA414S/2104S: Statistical Methods for Data Mining and Machine Learning

January - April, 2009

Apr. 7, 2009: Now that you have seen the final homework, several people are asking to do a project instead. Not fair! That said, the relative difficulty of the two options will be considered in the grading.

Apr. 2, 2009: Final homework due on April 16, before 5 p.m.

Meets in Sid Smith, SS1088, Tuesday 12-2, Thursday 12-1.

Course Information
This course will consider topics in statistics that have played a role in the development of techniques for data mining and machine learning. We will cover linear methods for regression and classification, nonparametric regression and classification methods, generalized additive models, aspects of model inference and model selection, model averaging, and tree-based methods.

Prerequisite: Either STA 302H (regression) or CSC 411H (machine learning). CSC 108H was recently added: this is not urgent, but you must be willing to use a statistical computing environment such as R or Matlab.

Office Hours: Tuesdays, 3-4; Thursdays, 2-3; or by appointment.

Textbook: Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. Springer-Verlag.

Book web page

Course evaluation: Three regular homework sets: 60%. Final homework or project: 40%.

For the final homework you will be expected to work alone on all material related to the homework. It will be handed out on March 31 and is due on April 16. For the three regular homework sets you are welcome to discuss the material with others, but submitted work must be your own, and you must acknowledge all sources for code. Graduate students registered under STA2104S may submit a project in lieu of the final homework: this will be a review of a topic not covered in class, drawing on textbook and journal sources. Suggestions for topics will be given during the course.

Computing: I will refer to, and provide explanations for, the R computing environment. You are welcome to use some other package if you prefer. There are many online resources for R, including:

Material from lectures