Research

My academic research focuses on the mathematical foundations and algorithms for dealing with large amounts of data. Topics include streaming algorithms, numerical linear algebra, machine learning, fast dimensionality reduction, vector search, clustering, and data mining. I taught these topics at Tel Aviv University and at Princeton.

I received my B.Sc. in Physics and Computer Science from Tel Aviv University and my Ph.D. in Computer Science from Yale University. After that, I was a postdoctoral fellow at Yale in the Program in Applied Mathematics. I taught advanced CS courses at Tel Aviv University and at Princeton.

Academic Service

I served on academic review committees for conferences and journals including NeurIPS, SIGIR, AISTATS, ICML, KDD, WSDM, WWW, COLT, ESA, SODA, and FOCS.

Highlights

Nearly Optimal Attention Coresets

Edo Liberty, Alexandr Andoni, Eldar Kleiner

tl;dr: This new paper nearly resolves a critical step in KV-cache compression, allowing models to serve larger contexts or more sessions on the same infrastructure. It proves that any KV cache (K,V) contains a subset (K',V') of size roughly O(sqrt(d)e^(r)/eps) such that ||Attn(q,K,V) - Attn(q,K',V')|| < eps for all queries with ||q|| < r.
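For readers who prefer typeset notation, the guarantee above can be restated as follows. This is only a restatement of the tl;dr under one assumption that is not in the text: Attn is taken to be standard softmax attention with unscaled logits, where k_i and v_i are the rows of K and V. Any further conditions the paper places on keys and values are not reflected here.

```latex
% Assumption: unscaled softmax attention over the n cached (key, value) pairs.
\[
\mathrm{Attn}(q,K,V) \;=\;
\frac{\sum_{i=1}^{n} e^{\langle q,\,k_i\rangle}\, v_i}{\sum_{j=1}^{n} e^{\langle q,\,k_j\rangle}}
\]
% Claim as stated in the tl;dr: a small subset of the cache suffices for every bounded query.
\[
\exists\,(K',V') \subseteq (K,V):\qquad
|K'| \approx O\!\left(\frac{\sqrt{d}\,e^{r}}{\varepsilon}\right),
\qquad
\bigl\|\mathrm{Attn}(q,K,V)-\mathrm{Attn}(q,K',V')\bigr\| < \varepsilon
\quad\text{for all } \|q\| < r .
\]
```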
Amazon SageMaker Elastic Algorithms

Edo Liberty, Zohar Karnin, Bing Xiang, Laurence Rouesnel, Baris Coskun, Ramesh Nallapati, Julio Delgado Mangas, Amir Sadoughi, Yury Astashonok, Piali Das, Can Balioglu, Saswata Chakravarty, Madhav Jha, Philip Gautier, Tim Januschowski, Valentin Flunkert, David Arpin, and Alex Smola.

SIGMOD 2020

tl;dr: The culmination of more than two years of work, this paper describes the algorithms and distributed architecture behind Amazon SageMaker's elastic ML algorithms.
Optimal Quantile Approximation in Streams

Zohar Karnin, Kevin Lang, Edo Liberty

FOCS 2016

tl;dr: This paper describes the KLL algorithm. It resolves one of the longest-standing and most basic problems in the streaming algorithms literature: optimally approximating ranks and quantiles in streaming data. It is implemented in many database systems including BigQuery and Apache DataSketches. [slides] [Python code]
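To give a feel for the compactor idea behind KLL, here is a minimal Python sketch. It is a simplified illustration, not the linked Python code and not the full algorithm from the paper: the class name `SimpleKLL`, the eager compaction rule, and the defaults `k=128` and `c=2/3` are all assumptions made for this example.

```python
import math
import random


class SimpleKLL:
    """Simplified KLL-style quantile sketch built from a stack of compactors."""

    def __init__(self, k=128, c=2.0 / 3.0, seed=None):
        self.k = k                  # capacity of the top compactor
        self.c = c                  # capacities decay geometrically below the top
        self.n = 0                  # number of stream items seen so far
        self.compactors = [[]]      # compactors[h] holds items of weight 2**h
        self.rng = random.Random(seed)

    def _capacity(self, h):
        depth = len(self.compactors) - 1 - h
        return max(2, math.ceil(self.k * self.c ** depth))

    def _compact(self, h):
        buf = sorted(self.compactors[h])
        # With an odd count, one item stays behind so the total weight is preserved.
        leftover = [buf.pop()] if len(buf) % 2 else []
        offset = self.rng.randrange(2)              # keep even or odd positions
        self.compactors[h + 1].extend(buf[offset::2])
        self.compactors[h] = leftover

    def update(self, x):
        self.compactors[0].append(x)
        self.n += 1
        h = 0
        # Compact every level that reached its capacity, bottom up.
        while h < len(self.compactors) and len(self.compactors[h]) >= self._capacity(h):
            if h + 1 == len(self.compactors):
                self.compactors.append([])
            self._compact(h)
            h += 1

    def rank(self, x):
        """Estimated number of stream items <= x."""
        return sum(2 ** h * sum(1 for v in buf if v <= x)
                   for h, buf in enumerate(self.compactors))

    def quantile(self, q):
        """Estimated item at rank q * n (assumes at least one update)."""
        weighted = sorted((v, 2 ** h)
                          for h, buf in enumerate(self.compactors) for v in buf)
        target, running = q * self.n, 0
        for v, w in weighted:
            running += w
            if running >= target:
                return v
        return weighted[-1][0]
```

Each stored item at level h stands for 2**h stream items, so memory stays close to 3k items no matter how long the stream is, while `rank` and `quantile` remain approximate. Production implementations such as the ones mentioned above add further refinements that this sketch omits.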
Relative Error Streaming Quantiles

Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, Pavel Veselý

PODS 2021, Best paper award - 2022 ACM SIGMOD Research Highlight Award

tl;dr: This (finally) solves a problem I wanted to solve for years: how to efficiently sketch quantiles with relative errors. This is critical for large-scale performance monitoring, for example.
Simple and Deterministic Matrix Sketches

Edo Liberty

KDD 2013, Best paper award

tl;dr: This paper introduced frequent-directions (FD), an incredibly simple, practically efficient, and theoretically optimal algorithm for approximating the covariance of vector streams. The follow-up work Even Simpler Deterministic Matrix Sketching later simplified the proof to a single line. Frequent-directions is implemented in many AI platforms, typically for approximating the Hessian. It is also widely taught in CS classes. [slides], [talk], [bib].

See the Frequent-Directions Python code and experiments repo that Mina Ghashami and I made public.
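As a rough illustration of how simple frequent-directions is, here is a minimal NumPy sketch. It is not taken from the repo above; the function names, the 2·ell-row buffer, and the shrink rule (subtracting the (ell+1)-th squared singular value) are choices made for this example. Under these choices the sketch B should satisfy ||A^T A - B^T B||_2 <= ||A||_F^2 / ell.

```python
import numpy as np


def _shrink(B, ell):
    """Subtract the (ell+1)-th squared singular value and keep at most ell rows."""
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    delta = s[ell] ** 2 if len(s) > ell else 0.0
    k = min(ell, len(s))
    shrunk = np.sqrt(np.maximum(s[:k] ** 2 - delta, 0.0))
    out = np.zeros_like(B)
    out[:k] = shrunk[:, None] * Vt[:k]      # rows k and beyond stay zero
    return out


def frequent_directions(rows, d, ell):
    """Stream d-dimensional rows and return an (ell x d) sketch B of the input."""
    B = np.zeros((2 * ell, d))              # working buffer with 2 * ell rows
    filled = 0
    for row in rows:
        if filled == 2 * ell:               # buffer full: shrink back to ell rows
            B = _shrink(B, ell)
            filled = ell
        B[filled] = row
        filled += 1
    return _shrink(B, ell)[:ell]
```

A call such as `frequent_directions(A, d=A.shape[1], ell=5)` processes the rows of `A` one at a time and never holds more than 2·ell rows in memory.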
Threading Machine Generated Email

Nir Ailon, Zohar Karnin, Edo Liberty, Yoelle Maarek

Best paper award TechPulse 2012, and WSDM 2013

tl;dr: The paper shows how to use sketches to find causality relations between billions of events using trillions of observations. [bib]
An Almost Optimal Unrestricted Fast Johnson-Lindenstrauss Transform

Nir Ailon, Edo Liberty

SODA 2011, Best paper award

tl;dr: The main result of my PhD work on fast dimension reduction. Specifically, matching the optimal target dimension and running time of the Johnson-Lindenstrauss lemma with fast random Hadamard-based projection algorithms. Random Hadamard transformations are, by now, a workhorse of vector databases, quantization, and model compression. [bib]
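To show what a random Hadamard-based projection looks like in practice, here is a minimal NumPy sketch of the textbook subsampled randomized Hadamard transform: random sign flips, a fast Walsh-Hadamard transform, then a random choice of k coordinates. This is the generic construction, not the specific transform analyzed in the paper, and the names `fwht` and `make_srht` are made up for this example.

```python
import numpy as np


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform in O(d log d) time.

    The length d of x must be a power of two.
    """
    x = np.array(x, dtype=float)
    d = x.shape[0]
    if d == 0 or d & (d - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b              # butterfly: sums
            x[i + h:i + 2 * h] = a - b      # butterfly: differences
        h *= 2
    return x


def make_srht(d, k, seed=None):
    """Return a fixed random linear map from R^d to R^k (d a power of two)."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=d)          # random diagonal sign matrix
    coords = rng.choice(d, size=k, replace=False)    # coordinates kept after mixing

    def project(x):
        # Dividing by sqrt(d) makes the Hadamard step an orthonormal rotation.
        y = fwht(signs * np.asarray(x, dtype=float)) / np.sqrt(d)
        # Rescale so that squared norms are preserved in expectation.
        return np.sqrt(d / k) * y[coords]

    return project
```

The same `project` function must be reused for every vector so that distances between projected vectors are comparable. Applying it costs O(d log d) per vector instead of the O(dk) of a dense random matrix.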
Coresets, Discrepancy, and Sketches in Machine Learning

Zohar Karnin and Edo Liberty

COLT 2019

tl;dr: This ML-theory paper shows that many types of machine learning models have much smaller coresets than those previously known. As a special case of the general result, it resolves the open problem regarding the coreset complexity of Gaussian density estimation.

Conference Publications

Nearly Optimal Attention Coresets

Edo Liberty, Alexandr Andoni, Eldar Kleiner

Arxiv 2026
Even Simpler Deterministic Matrix Sketching

Edo Liberty

Arxiv 2022
Projective Clustering Product Quantization

Aditya Krishnan, Edo Liberty

Arxiv 2021
Relative Error Streaming Quantiles

Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, Pavel Veselý

Best paper at PODS 2021
From the lab to production: A case study of session-based recommendations in the home-improvement domain

Pigi Kouki, Ilias Fountalis, Nikolaos Vasiloglou, Xiquan Cui, Edo Liberty, Khalifeh Al Jadda

RecSys 2020
Amazon SageMaker Elastic Algorithms

Edo Liberty, Zohar Karnin, Bing Xiang, Laurence Rouesnel, Baris Coskun, Ramesh Nallapati, Julio Delgado Mangas, Amir Sadoughi, Yury Astashonok, Piali Das, Can Balioglu, Saswata Chakravarty, Madhav Jha, Philip Gautier, Tim Januschowski, Valentin Flunkert, David Arpin, and Alex Smola.

SIGMOD 2020
Streaming Quantiles Algorithms with Small Space and Update Time

Nikita Ivkin, Edo Liberty, Kevin Lang, Zohar Karnin, Vladimir Braverman

Sensors 2022
Coresets, Discrepancy, and Sketches in Machine Learning

Zohar Karnin and Edo Liberty

COLT 2019
Asymmetric Random Projections

Nicholas Ryder, Zohar Karnin, and Edo Liberty

Arxiv 2019
Proxquant: Quantized neural networks via proximal operators

Yu Bai, Yu-Xiang Wang, Edo Liberty

ICLR 2019
A High-Performance Algorithm for Identifying Frequent Items in Data Streams

Daniel Anderson, Pryce Bevan, Kevin Lang, Edo Liberty, Lee Rhodes, Justin Thaler

IMC 2017
Greedy Minimization of Weakly Supermodular Set Functions

Edo Liberty, Maxim Sviridenko

APPROX 2017 [slides] [bib]
Optimal Quantile Approximation in Streams

Zohar Karnin, Kevin Lang, Edo Liberty

FOCS 2016

[slides]
A Short Proof for Gap Independence of Simultaneous Iteration

Edo Liberty

Arxiv 2016

Professor Wenjian Yu of Tsinghua University pointed out that the square was omitted from (1+eps) in equation 2. The proof is still correct after a straightforward correction. This will be corrected in the next version.
Efficient Frequent Directions Algorithm for Sparse Matrices

Mina Ghashami, Edo Liberty, Jeff M. Phillips

KDD 2016
Stratified Sampling meets Machine Learning

Kevin Lang, Edo Liberty, Konstantin Shmakov

ICML 2016 [slides]
Space Lower Bounds for Itemset Frequency Sketches

Edo Liberty, Michael Mitzenmacher, Justin Thaler, Jonathan Ullman

PODS 2016

[bib]
An Algorithm for Online K-Means Clustering

Edo Liberty, Ram Sriharsha, Maxim Sviridenko

ALENEX 2016 [bib]
Online PCA with Spectral Bounds

Zohar Karnin, Edo Liberty

COLT 2015

[bib]

(see also 5 minute video lecture)
Online Principal Component Analysis

Christos Boutsidis, Dan Garber, Zohar Karnin, Edo Liberty

SODA 2014 [bib]
Near-optimal Distributions for Data Matrix Sampling

Dimitris Achlioptas, Zohar Karnin, Edo Liberty

NIPS 2013 [bib]
Simple and Deterministic Matrix Sketches

Edo Liberty (see slides and experimental results in json format)

Also, here is a talk I gave at the Simons Institute about this.

Best paper at KDD 2013 [bib]

See also the frequent-directions git repo by Mina Ghashami and myself.
Threading Machine Generated Email

Nir Ailon, Zohar Karnin, Edo Liberty, Yoelle Maarek

Best paper at TechPulse 2012 and WSDM 2013 [bib]
Unsupervised SVMs: On the complexity of the Furthest Hyperplane Problem

Zohar Karnin, Edo Liberty, Shachar Lovett, Roy Schwartz, and Omri Weinstein

COLT 2012 [Slides] [bib]
Framework and Algorithms for Network Bucket Testing

Liran Katzir, Edo Liberty, and Oren Somekh

WWW 2012 [bib]
An Almost Optimal Unrestricted Fast Johnson-Lindenstrauss Transform

Nir Ailon, Edo Liberty

Best paper at SODA 2011 [bib]
Improved Approximation Algorithms for Bipartite Correlation Clustering

Nir Ailon, Noa Avigdor-Elgrabli, Edo Liberty, Anke van Zuylen

ESA 2011 [slides] [bib]
Automatically Tagging Email by Leveraging Other Users' Folders

Yehuda Koren, Edo Liberty, Yoelle Maarek, and Roman Sandler

KDD 2011 [bib]
Estimating Sizes of Social Networks via Biased Sampling

Liran Katzir, Edo Liberty, and Oren Somekh

WWW 2011 [bib]
Inverted Index Compression via Online Document Routing

Gal Lavee, Ronny Lempel, Edo Liberty, and Oren Somekh

WWW 2011 [bib]
Correlation Clustering Revisited: The "True" Cost of Error Minimization Problems

Nir Ailon, Edo Liberty

ICALP 2009 [bib]
Dense Fast Random Projections and Lean Walsh Transforms

Edo Liberty, Nir Ailon, Amit Singer

RANDOM 2008 [bib]
Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes

Nir Ailon, Edo Liberty

SODA 2008 [bib]

Journal Publications

An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors

Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty

TOIS - ACM Transactions on Information Systems 2023
Frequent Directions: Simple and Deterministic Matrix Sketching

Mina Ghashami, Edo Liberty, Jeff M. Phillips, David P. Woodruff

[bib]
Estimating Sizes of Social Networks via Biased Sampling

Liran Katzir, Edo Liberty, Oren Somekh, Ioana A. Cosma

Journal of Internet Mathematics [bib]
An Almost Optimal Unrestricted Fast Johnson-Lindenstrauss Transform

Nir Ailon, Edo Liberty

Transactions on Algorithms [bib]
Improved Approximation Algorithms for Bipartite Correlation Clustering

Nir Ailon, Noa Avigdor-Elgrabli, Edo Liberty, and Anke van Zuylen

SIAM Journal on Computing [bib]
Unsupervised SVMs: On the complexity of the Furthest Hyperplane Problem

Zohar Karnin, Edo Liberty, Shachar Lovett, Roy Schwartz and Omri Weinstein

JMLR 2012 (Journal of Machine Learning Research) [bib]
Dense Fast Random Projections and Lean Walsh Transforms

Edo Liberty, Nir Ailon, Amit Singer

DCG 2010 (Discrete and Computational Geometry) [bib]
The Mailman algorithm: a note on matrix vector multiplication

Edo Liberty, Steven Zucker

IPL 2009 (Information Processing Letters) [bib]
Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes

Nir Ailon, Edo Liberty

DCG 2008 (Discrete and Computational Geometry) [bib]
A fast randomized algorithm for the approximation of matrices

Edo Liberty, Franco Woolfe, Vladimir Rokhlin, and Mark Tygert

ACHA 2008 (Applied and Computational Harmonic Analysis) [bib]
Randomized algorithms for the low-rank approximation of matrices

Edo Liberty, Franco Woolfe, Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert.

PNAS 2007 (Proceedings of the National Academy of Sciences) [bib]
Electrons and Phonons on the Square Fibonacci Tiling

Roni Ilan, Edo Liberty, Shahar Even-Dar Mandel, and Ron Lifshitz.

Ferroelectrics 2004.

Thesis

Accelerated Dense Random Projections

PhD thesis. See also the talk slides.