Data Mining and Machine Learning: Theory and Practice
Our team from National Taiwan University wins KDD Cup 2010
See the competition results.
Our paper, the talk slides from the KDD Cup 2010 workshop,
and a more complete set of slides are also available.
A brief description of our approach:
The 19 students and one non-registered RA were split into seven groups.
Six groups expanded the features using various binarization and discretization
techniques, and the resulting sparse feature sets were used to train logistic
regression models (using LIBLINEAR).
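The feature-expansion and training steps above can be sketched as follows. This is a minimal toy, not the team's code: the field names (`skill`, `attempts`), the cut points, and all helper functions are hypothetical, and a small SGD loop stands in for LIBLINEAR, which solves L2-regularized logistic regression with dedicated, far more efficient solvers. Categorical values become one indicator feature per `field=value` pair (binarization), numeric values become the indicator of the bin they fall into (discretization), and each example is stored as the set of its active features, so the representation stays sparse.

```python
import math
from collections import defaultdict

def expand_features(record, numeric_bins):
    """Map one raw record to a set of active binary feature names.

    Categorical fields are binarized: one indicator per field=value pair.
    Numeric fields listed in numeric_bins are discretized: the value is
    replaced by the index of the bin it falls into (cut points sorted).
    """
    active = set()
    for field, value in record.items():
        if field in numeric_bins:
            # Bin index = number of cut points that the value reaches.
            idx = sum(1 for cut in numeric_bins[field] if value >= cut)
            active.add(f"{field}_bin{idx}")
        else:
            active.add(f"{field}={value}")
    return active

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logreg(samples, labels, epochs=50, lr=0.1, l2=1e-4):
    """Plain SGD on L2-regularized log loss over sparse binary features.

    A toy stand-in for a dedicated solver such as LIBLINEAR. For brevity,
    the L2 penalty is applied only to features active in the current sample.
    """
    w = defaultdict(float)
    b = 0.0
    for _ in range(epochs):
        for feats, y in zip(samples, labels):
            p = sigmoid(b + sum(w[f] for f in feats))
            g = p - y  # derivative of log loss w.r.t. the linear score
            for f in feats:
                w[f] -= lr * (g + l2 * w[f])
            b -= lr * g
    return w, b

def predict(w, b, feats):
    # Unseen features contribute 0; .get avoids growing the weight table.
    return sigmoid(b + sum(w.get(f, 0.0) for f in feats))

# Hypothetical records: did a student answer a step correctly on the first try?
records = [
    {"skill": "easy", "attempts": 1},
    {"skill": "easy", "attempts": 5},
    {"skill": "hard", "attempts": 1},
    {"skill": "hard", "attempts": 5},
]
labels = [1, 1, 0, 0]
bins = {"attempts": [2, 4]}

samples = [expand_features(r, bins) for r in records]
w, b = train_logreg(samples, labels)
p_easy = predict(w, b, expand_features({"skill": "easy", "attempts": 3}, bins))
p_hard = predict(w, b, expand_features({"skill": "hard", "attempts": 3}, bins))
```

Because every feature is binary, a record's score is just the bias plus the sum of the weights of its active features, which keeps training and prediction cheap even when the expanded feature space is very large.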
One group condensed the features to fewer than 20 and then
applied random forests (using Weka).
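The description above does not say how the features were condensed. One common way to turn high-cardinality categorical fields into a few dense numeric features, shown here purely as an assumption, is smoothed mean (target) encoding: each value is replaced by the average label observed for it in training data, shrunk toward the global mean by a hypothetical smoothing parameter `alpha`. The helper names and example fields are illustrative only. The resulting short vectors are the kind of input a random forest handles well; the forest itself (built with Weka in the original work) is not shown.

```python
from collections import defaultdict

def fit_mean_encoding(values, labels, alpha=10.0):
    """Map each categorical value to a smoothed mean of its labels.

    encoded(v) = (sum of labels for v + alpha * global_mean) / (count(v) + alpha),
    so rarely seen values are pulled toward the global mean.
    Returns (table, global_mean); global_mean is the fallback for unseen values.
    """
    global_mean = sum(labels) / len(labels)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, y in zip(values, labels):
        sums[v] += y
        counts[v] += 1
    table = {v: (sums[v] + alpha * global_mean) / (counts[v] + alpha) for v in counts}
    return table, global_mean

def make_condenser(records, labels, fields, alpha=10.0):
    """Fit one encoder per field; return a function record -> dense vector."""
    encoders = {
        f: fit_mean_encoding([r[f] for r in records], labels, alpha) for f in fields
    }

    def transform(record):
        # One number per field, so len(fields) dense features in total.
        return [encoders[f][0].get(record[f], encoders[f][1]) for f in fields]

    return transform

# Hypothetical example with two high-cardinality ID fields.
records = [
    {"student": "s1", "skill": "a"},
    {"student": "s1", "skill": "b"},
    {"student": "s2", "skill": "a"},
    {"student": "s2", "skill": "b"},
]
labels = [1, 1, 0, 1]
condense = make_condenser(records, labels, ["student", "skill"], alpha=0.0)
print(condense({"student": "s1", "skill": "a"}))  # [1.0, 0.5]
print(condense({"student": "s9", "skill": "b"}))  # [0.75, 1.0]: unseen student uses the global mean
```

In practice the encoding should be fit on a different portion of the data than the rows the forest is trained on; otherwise each row's own label leaks into its features and validation scores look optimistic.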
Initial development was conducted on an internal split of the training data
into training and validation sets, through which we identified some useful
feature combinations. For the final submission, each group submitted a few
results, and the TAs ensembled them by linear regression.
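Ensembling submissions by linear regression amounts to a least-squares fit of the labels on the models' predictions. The sketch below assumes, beyond what is stated above, that the weights are fit on predictions for held-out validation data and include an intercept; `fit_blend`, `blend`, and the tiny `ridge` term are hypothetical names and choices. It uses only the standard library, solving the normal equations with Gaussian elimination.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_blend(preds, labels, ridge=1e-6):
    """Least-squares blending weights via the normal equations.

    preds[i][k] is model k's prediction on validation example i.
    Returns [w_1, ..., w_K, intercept]. The small ridge term on the model
    weights (not the intercept) keeps the system solvable even when two
    models produce identical predictions.
    """
    rows = [list(p) + [1.0] for p in preds]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(d)]
    for i in range(d - 1):
        xtx[i][i] += ridge
    return solve(xtx, xty)

def blend(weights, p):
    # Weighted sum of one example's model predictions plus the intercept.
    return sum(w * x for w, x in zip(weights, list(p) + [1.0]))

# Hypothetical validation predictions from two models; the labels are built
# to be exactly 0.6 * model1 + 0.3 * model2 + 0.1 so the fit is easy to check.
val_preds = [[0.2, 0.9], [0.8, 0.1], [0.4, 0.3], [0.9, 0.7]]
val_labels = [0.6 * a + 0.3 * b + 0.1 for a, b in val_preds]
weights = fit_blend(val_preds, val_labels)
```

With only a handful of submissions the normal equations are tiny, so solving them directly is cheap; at larger scale a numerical library's least-squares routine would be the usual choice.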
Course Details
Course Outline
While it is possible to learn a variety of classification, clustering,
and other mining techniques from lectures or books, applying them
efficiently and accurately to real-world data is a completely
different story. Very often a painful process of trial and error is
needed. Although dealing with practical issues in data is more an
art than a science, in this course we try to gain experience by
tackling real-world problems posed in past or ongoing
competitions in the machine learning and data mining community. In
particular, we aim to participate in the ACM KDD Cup, which is currently
the most prestigious data mining competition. We plan to run this
course interactively, so every week students must discuss their findings,
as well as the problems they encountered, with the lecturers and
other classmates.
Course Format (tentative)
More details will be on the wiki.
- First three weeks: overview of machine learning
and data mining techniques by instructors
- Work on existing competitions
- Work on KDD Cup 2010
Since this course is the first of its kind,
the setting may not be perfect. Your comments
and suggestions are welcome. Moreover,
your active participation will
help to make this course a success.
Exams
No exams
Grading
Grading will be based on your results and
your weekly presentations.
Last modified: Wed Mar 16 12:08:01 CST 2011