Research

My academic research focuses on mathematical foundations and algorithms dealing with large amounts of data. Topics include streaming algorithms, numerical linear algebra, machine learning, fast dimensionality reduction, vector search, clustering, and data mining. I taught these topics at Tel Aviv university and at Princeton.

I received my B.Sc in Physics and Computer Science from Tel Aviv University and my Ph.D. in Computer Science from Yale University. After that, I was a Postdoctoral fellow at Yale in the Program in Applied Mathematics. I taught Advanced CS courses at Tel Aviv University and at Princeton.

Academic Service

I served on academic review committees for conferences and journals including NeurIPS, SIGIR, AISTATS, ICML, KDD, WSDM, WWW, COLT, ICML, ESA, SODA, and FOCS.

Highlights

  • Nearly Optimal Attention Coresets

    Edo Liberty, Alexandr Andoni, Eldar Kleiner

    tl;dr: This new paper nearly resolves a critical step in KV-caches compression, allowing model serving of larger contexts or more sessions on the same infrastructure. It proves any KV cache (K,V) contains a subset (K',V') of size roughly O(sqrt(d)e^(r)/eps)) such that ||Attn(q,K,V) - Attn(q,K',V')|| < eps for all queries ||q|| < r.

  • Amazon SageMaker Elastic Algorithms

    Edo Liberty, Zohar Karnin, Bing Xiang, Laurence Rouesnel, Baris Coskun, Ramesh Nallapati, Julio Delgado Mangas, Amir Sadoughi, Yury Astashonok, Piali Das, Can Balioglu, Saswata Chakravarty, Madhav Jha, Philip Gautier, Tim Januschowski, Valentin Flunkert, David Arpin, and Alex Smola.

    SIGMOD 2020

    tl;dr: The culmination of more than two years of work, this paper describes the algorithms and distributed architecture behind Amazon SageMaker's elastic ML algorithms.

  • Optimal Quantile Approximation in Streams

    Zohar Karnin, Kevin Lang, Edo Liberty

    FOCS 2016

    tl;dr: This paper describes the KLL algorithm. It resolves one of the longest standing and basic problems in the streaming algorithms literature. Namely, optimally approximating ranks and quantiles in streaming data. It is implemented in many database systems including BigQuery and Apache DataSketches. [slides] [Python code]

  • Relative Error Streaming Quantiles

    Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, Pavel Veselý

    PODS 2021, Best paper award - 2022 ACM SIGMOD Research Highlight Award

    tl;dr: This (finally) solves a problem I wanted to solve for years. Namely, how to efficiently sketch quantiles with relative errors. This is critical for large scale performance monitoring, for example.

  • Simple and Deterministic Matrix Sketches

    Edo Liberty

    KDD 2013, Best paper award

    tl:dr: This paper introduced frequent-directions (FD), an incredibly simple, practically efficient, and theoretically optimal algorithm for approximating the covariance of vector streams. The followup work Even Simpler Deterministic Matrix Sketching later simplified the proof to a single line. Frequent-directions is implemented in many AI platforms, typically for approximating the Hessian. It is also widely taught in CS classes. [slides], [talk], [bib].

    See Frequent-Directions in python and experiments repo that Mina Ghashami and I made public.

  • Threading Machine Generated Email

    Nir Ailon, Zohar Karnin, Edo Liberty, Yoelle Maarek

    Best paper award TechPulse 2012, and WSDM 2013

    tl:dr the paper shows how to use sketches to find causality relations between billions of events using trillions of observations. [bib]

  • An Almost Optimal Unrestricted Fast Johnson-Lindenstrauss Transform

    Nir Ailon, Edo Liberty

    SODA 2011, Best paper award

    tl:dr The main result of my PhD work on fast dimension reduction. Specifically, matching the optimal target dimension and running time of the Johnson-Lindenstrauss lemma with fast random Hadamard based projection algorithms. Random Hadamard transformations are, by now, a workhorse of vector databases, quantization, and model compression. [bib]

  • Coresets, Discrepancy, and Sketches in Machine Learning

    Zohar Karnin and Edo Liberty

    COLT 2019

    tl;dr: This ML-theory paper shows that many types of machine learning models have much smaller coresets than those previously known. As a special case of the general result, it resolves the open problem regarding the coreset complexity of gaussian density estimation.

Conference Publications

Journal Publications