Instructors: Prof. Shou-de Lin
Prof. Chih-jen Lin
Prof. Hsuan-tien Lin
Classroom: CSIE 105
Meeting Time: Wed 9:10am~12:00
Office Hour: After class or by appointment
TA: Tim Kuo <d97944007@csie.ntu.edu.tw>, Todd G. McKenzie <d97041@csie.ntu.edu.tw>, Chen-Tse Tsai <ctse.tsai@gmail.com>
Course Description:
While it is possible to learn a variety of machine learning and data mining theories from lectures or books, applying them accurately and efficiently to real-world data is a completely different story. Very often data miners have to suffer a painful process of trial and error due to lack of experience. Dealing with the practical issues of data is more an art than a science. Nevertheless, in this course we try to build up our experience by tackling a real-world problem posed in an ongoing competition in the data mining community. In particular, we aim to participate in the ACM KDD CUP 2011, which is currently the most prestigious data mining competition. We expect to run this course in an interactive way, so every week students must discuss their findings, as well as the problems they encountered, with the lecturers and their classmates.
Pre-requisite courses:
You must have taken at least one of the following
courses (two or more is even better):
Machine Learning
Statistical Artificial Intelligence
Optimization and Machine Learning
Course Format and Workload:
You need to implement different kinds of intelligent
systems for the competition and run extensive experiments to verify them. You
will compete with the other students in the class as well as with other teams
all over the world in KDD CUP. Note that this is an extremely intensive course.
Students will give a WEEKLY presentation on their progress in the previous
week. Since the estimated time spent on this course is at least 10 hours per
week, graduate students in general need approval from their advisor to
take it.
Grades:
Grades will depend on your weekly performance (judged by your effort, novelty, and
presentation), weighted by how much you contribute to the overall
competition results.
Syllabus:
This course runs from Nov 30, 2010 to June 30, 2011 (you need to commit until June 30 if you want to participate in this course). If you are interested in taking this course, send an email to the instructor (sdlin@csie.ntu.edu.tw) ASAP.
Date | Topics | Notes |
30-Nov | Course Description & Yahoo Music Data Description | |
14-Dec | Overview of Recommendation Systems | |
28-Dec | Netflix Winners' Reports | |
18-Jan | Model-based CF approach | |
25-Jan | Random-walk based CF approach | |
15-Feb | Optimization for CF | |
23-Feb | First Class (overview of the class and what we have done so far) | Working on Yahoo music dataset |
2-Mar | TBD afterwards | Working on Yahoo music dataset |
9-Mar | Working on Yahoo music dataset | |
16-Mar | 3/15: competition begins | |
23-Mar | Working on KDDCUP 2011 dataset | |
30-Mar | | |
6-Apr | | |
13-Apr | | |
20-Apr | | |
27-Apr | | |
4-May | | |
11-May | | |
18-May | | |
25-May | | |
1-Jun | | |
8-Jun | | |
15-Jun | | |
22-Jun | | |
30-Jun | Competition Ends | |
FAQ (modified from last year's FAQ):
Q: I am interested in learning data mining and machine learning methods. Is this course the place to go?
This course aims at participating in data mining competitions (i.e., KDD CUP), so it is not the place to learn the basic material of machine learning and data mining. We assume you already know the basics. Therefore we require that participants have taken some preliminary courses (see above).
To make sure we provide sufficient support to every student in the class, we plan to take no more than 25 students. If more than 25 students express interest in joining, we will select based on their prerequisite knowledge and intention. Students form teams (3 persons per team) in this class.
In general the answer is no, because you will not learn a lot without getting your hands dirty in this class. We don't want to waste your time, and we hope every member of the class really does spend a significant amount of effort on the competition.
Please anticipate spending at least 10 hours per week on this course. Simply put: the more effort you put in, the better results you will get. When your fellow classmates spend (or have to spend) lots of time and effort on this, you will not be competitive if you don't.
We have a homepage (the one you are reading). However, the course wiki will be the main place for details. You will see our progress on the competition there. Every enrolled student will get a wiki account.
You have one single homework (that is, the music recommendation problem in KDDCUP 2011) throughout the whole course. You need to give a 20-minute presentation on your progress EVERY WEEK.
No, you should work as hard as others. We will find a way to evaluate each individual student's performance.
Failed approaches do show something. You should frankly present what you have tried. Competition results are a factor in your final score but do not fully determine it. We encourage creative thinking and out-of-the-box ideas. Novel ideas will be rewarded even if you have not proven them to be useful.
In general the department's machines (e.g., 217) should be enough. We will also provide some machines we purchased for this competition.
It depends on lots of factors, and the instructors will decide on the best submission strategy when the time is closer. It is possible that we will allow only selected teams to submit results and/or form a new ensemble of teams for submission. In any case, every team's contribution (with ideas and either positive or negative results) will be fairly acknowledged. At present, the policy is that no individual or team may submit results to KDD Cup 2011 unless permission is granted by the instructors in advance. Violating the policy will lead to serious penalties.
Of course. You pass only if you work hard enough. (Similarly, in industry, underperformers will be fired).
Well, we were not perfect, of course, but we did OK.
This year's performance will be considered satisfactory if it is similar to last year's.