COS 597A - fall 2023: Long Term Memory in AI - Vector Search and Databases, Princeton University. Co-instructed with Matthijs Douze
Long Term Memory is a foundational capability in the modern AI stack. At their core, long-term memory systems use vector semantics. Vector search is also a basic tool for systems that manipulate large collections of media, such as search engines, knowledge bases, content moderation tools, and recommendation systems. As such, the discipline lies at the intersection of Artificial Intelligence and Database Management Systems. This course covers the theoretical foundations and practical implementation of vector search applications, algorithms, and systems.
0368-3248-01-Algorithms in Data Mining - Tel Aviv University 2011-2013
The course covered algorithmic tools for mining massive data sets, with an emphasis on randomization and streaming algorithms. The class notes are provided for free academic use. The algorithms are presented in their most basic form for didactic reasons. The proofs are simple and self-contained. Each is presentable in roughly one long frontal session. An advanced undergraduate or graduate student with some hands-on experience in probability, linear algebra, algorithms, and combinatorics should be able to follow this course.
Interns and students I've had the privilege to mentor, and where they are now:
Omri Weinstein - Columbia University
Noa Avigdor-Elgrabli - Yahoo Research
Roy Schwartz - The Technion Institute
Dan Garber - Haifa University
Nikita Ivkin - Amazon AI Labs
Mina Ghashami - Visa Research
Ofir Geri - Stanford University
Yu Bai - Salesforce Research
Nicholas Ryder - OpenAI
Aditya Krishnan - Johns Hopkins University
[2021] CMU Database seminar: The Pinecone Vector Database System
[2018] Amazon SageMaker: Infinitely Scalable Machine Learning Algorithms
[2018] Streaming Data Mining: Mergeable Summaries and the DataSketches Library
[2016] Online Data Mining.
[2011] Fast Random Projection
[Nov 2021] Andy Pavlo's CMU Database Seminar about The Pinecone Vector Database System.
[July 2020] Keynote at San Francisco Data Council
[March 2020] Keynote at the Future of Information and Communication Conference about the Benefits and Challenges of combining Deep Learning and Retrieval Tasks.
[June 2019] Keynote at the Time Series Workshop at ICML about streaming algorithms, Apache DataSketches, and a sneak preview of the new coreset results in machine learning. The slides are found here: Streaming algorithms, Apache DataSketches, and new results on coresets. Thank you Yuyang (Bernie) Wang, Cheng Tang, Qi (Rose) Yu, Scott Yang, and Vitaly Kuznetsov for the invitation and for organizing this great workshop.
[Apr 2019] Keynote at the Southern Data Science Conference about streaming algorithms and the DataSketches library. Thank you Khalifeh AlJadda for the opportunity and for putting together an outstanding conference. Kudos!
[Feb 2019] Keynote at ITA about coresets, discrepancy, and sketches in machine learning together with Dimitris Achlioptas, Ben Recht, and Chris Re. The slides are available here, but the paper is not yet published.
[Nov 2018] SageMaker Algorithms at MLConf in San Francisco.
[Oct 2018] SageMaker Algorithms at the first-ever Amazon Research Day in Haifa. Thank you Yoelle Maarek and Liane Lewin for organizing this awesome event.
[Aug 2018] KDD Keynote on deep learning on AWS and SageMaker Algorithms with Alex Smola.
[Jun 2018] Keynote at the TMA conference in Vienna. I talked at the TMA Experts Summit about SageMaker algorithms (presentation). Later, I gave a keynote at the TMA conference about data sketches and mergeable summaries (presentation).
[Jun 2017] Processing Big Data Streams workshop in Shonan, Japan, where I presented my work with Zohar Karnin and Kevin Lang on streaming quantiles. Vladimir Braverman, David Woodruff, and Ke Yi did a wonderful job organizing it.
[May 2017] Amazon published a blog post called In the Research Spotlight, in which they interviewed me about my career and current efforts at AWS.
Correlation Clustering: from Theory to Practice
KDD 2014 Tutorial [slides]
[bib]
Streaming Data Mining
KDD 2012 tutorial on practical algorithms for mining streaming data, with Jelani Nelson.
Fast Random Projections survey and new results,
SODA 2011 and IAS and Yale math seminars 2011.
A video of the talk at IAS is available here.