Tel-Aviv University, School of Computer Science
Leveraging Big Data, Fall 2013/2014
Time & Location
Mondays 16:00-19:00, Sherman Building Room 105 (Life Sciences building)
Course Outline
The purpose of this class is to introduce some key tools and techniques that are used to leverage massive data and are currently not covered in the basic curriculum.
The class meets once a week for a 3-hour lecture. Pointers and notes/slides for the material covered in each lecture will be posted on this Web page.
Prerequisites
Completion of all first-year courses (linear algebra, calculus, probability), programming, and data structures and algorithms. The class is open to graduate students and third-year undergraduates.
Instructors
- Edith Cohen, Visiting Professor
- Amos Fiat, Professor
- Haim Kaplan, Professor
- Tova Milo, Professor
Office hours:
By appointment { edco, fiat, haimk, milo } AT cs.tau.ac.il
Course Workload and Grading
2-4 Problem sets/projects (some require programming) 50%
Final exam 50%
Problem Sets
Submission Instructions:
Please submit in PDF format to the email address taubigdata (at) gmail.com. The email title should include HWx and your name, where x is the homework number (1, 2, 3).
File name: firstname_lastname_HWx.pdf
Your full name and ID should be part of the title.
- Homework1
Solutions: Asaf Ezra, Tal Saiag
Posted October 31, 2013. Due November 17, 2013.
EXTENSION: Due to multiple requests, the due date is now on or before Thursday, November 21.
Work submitted by November 17 will receive 8 points extra credit.
If you have already handed in the assignment and want to use the extra time to work on it more, you can of course resubmit, but you will lose the bonus points.
- Homework2
Solutions: Guy Lev
Posted December 4, 2013. Due December 29, 2013.
- Homework3
Solutions: Barak Cohen, Asaf Ezra
Posted January 8, 2014. Due January 27, 2014.
EXTENSION: Due to multiple requests, the due date is now on or before 18:00 (6pm) January 31.
Practice Final Exam
Mock-up final
Forum
Initial login to the forum is with your TAU user name and password; then register to create an account.
Please post questions/inquiries that are of general interest to students.
Schedule and slides
Topics (tentative)
- Data Streams, Synopsis structures, Min-Hash sketches/samples (2-3 sessions): Approximate distinct counting, frequent items/heavy hitters, similarity estimation, sliding windows and time-decay on streams
- Mining large graphs (2-3 sessions), with focus on social networks and web graphs. Centrality, similarity, all-distances sketches, community detection, link analysis, spectral techniques.
- Multi-dimensional data and more data mining (4-6 sessions): dimensionality reduction, locality-sensitive hashing, spectral techniques (Latent Semantic Analysis), principal component analysis, collaborative filtering and recommendation systems, clustering, summarization (sketching and sampling), mining association rules ("correlation" rules).
- Map-reduce, Pig Latin, and NoSQL (2 sessions)
- Data Sharing Incentives and Markets (1-2 sessions)
- Privacy Issues, Differential privacy (1 session)