Class given by Edo Liberty.

Data Mining is concerned with efficiently extracting statistics, patterns, structures, or meanings from raw data. This task becomes hard when the amount of data is large, which is often the case in modern data sets. This course will survey modern algorithms, concepts, and data structures common to data mining in large data sets. We will try to cover, among other topics: data sampling, finding causal relations and frequent item sets, counting in data streams, ranking and sorting, approximating large matrix operations, dimensionality reduction and efficient searching in high dimensions. We will also discuss modern cluster architectures and computational models.

I recommend that students be familiar with probability theory, basic combinatorics, linear algebra, basic complexity theory, and traditional data structures, at least at an introductory level. The class will nonetheless attempt to be self-contained.

Undergraduate students:

- Final exam (50% of the grade).
- Home assignments (50% of the grade).
- An optional project for extra credit is available to interested students.

Graduate students:

- No final exam.
- Home assignments (50% of the grade).
- Final project (50% of the grade). It should require roughly a week's worth of work, comparable to studying for and taking the final exam. These projects can contain both theoretical and experimental elements.

http://groups.google.com/group/algorithms-in-data-mining-tau

- Exam 1: "Moed Alef"
- Exam 2: "Moed Beit"

- Lesson 1: Introduction presentation.
- Lesson 1: Probability recap class notes.
- Lesson 1: Mark and recapture class notes.
- Lesson 2: Sampling class notes.
- Lesson 2: Chernoff bound class notes by John Canny.
- Lesson 3: Frequent items in streams class notes.
- Assignment 1: Size of Trees and Approximate Histograms. Due Monday Nov 28th
- Lesson 4: Frequency Moments class notes.
- Lesson 5: Random Projections class notes.
- Lesson 6: Singular Value Decomposition.
- Assignment 2: Weak Random Projection. Due Monday Dec 26th. Solutions
- Lesson 7: Matrix Sampling and efficient rank-k approximation
- Lesson 7: Roman Vershynin's proof of the matrix Chernoff bound by Ahlswede and Winter
- Lesson 8: Metric Embedding into random trees (Notes from a Metric Embedding class given by the late Avner Magen)
- Lesson 9: Nearest neighbor search, and kd-trees
- Lesson 10: Approximate nearest neighbor search, Locality-Sensitive Hashing
- Lesson 11: k-means clustering
- Assignment 3: Meta-algorithms and the power method. Due Monday Jan 30th
- Lesson 12: Very short intro to spectral graph theory (partial and temporary class notes). Based on class notes by Dan Spielman.

- Lesson 13: Review of last year's "Moed Alef" exam and answers.
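To give a flavor of the streaming material covered in Lesson 3 (frequent items in streams), here is a minimal sketch of the Misra-Gries summary, one standard algorithm for that problem. The function name and the example stream are illustrative and not taken from the course notes.

```python
def misra_gries(stream, k):
    """One-pass Misra-Gries summary using at most k-1 counters.

    Every item that occurs more than n/k times in a stream of
    length n is guaranteed to appear among the returned keys
    (possibly alongside some infrequent items).
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # Stream item has no counter and all counters are taken:
            # decrement every counter and drop the ones that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# 'a' and 'b' each occur 4 times in a stream of length 11, which is
# more than 11/3, so both must survive with k = 3.
print(misra_gries("ababcabdabe", k=3))
```

The counter values are lower bounds on the true frequencies, off by at most n/k; a second pass over the stream can compute exact counts for the surviving candidates.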