Edo Liberty

edo@edoliberty.com, bio, cv

I am the founder of HyperCube Inc. A stealth-mode company in the machine learning and cloud computing space. We are hiring!

Until April 2019, I was the Director of Research at AWS managing Amazon AI Labs. The Lab built cutting-edge machine learning algorithms, systems, and services for AWS customers. We build parts of SageMaker, Kinesis, QuickSight, Amazon Elastic Search, Glue, Rekognition, DeepRacer, Personalize, Forecast, and other yet-to-be-released services from AWS.

Before that, I was a Senior Research Director at Yahoo and head of Yahoo's Research Lab in New York. We worked on building horizontal machine learning platforms and improving applications such as online advertising, search, security, media recommendation, email abuse prevention, and many more.

I received my B.Sc in Physics and Computer Science from Tel Aviv University and my Ph.D. in Computer Science from Yale University. After that, I was a Post-Doctoral fellow at Yale in the Program in Applied Mathematics.

My research focuses on solving well-known problems with new algorithms that are suitable for use on large amounts of data. Topics include fast dimensionality reduction, clustering, streaming algorithms, machine learning, large scale numerical linear algebra, and high dimensional geometry.

Some recorded talks


Keynotes and Tutorials

[March 2020] Keynote at the Future of Information and Communication Conference about the Benefits and Challenges of combining Deep Learning and Retrieval Tasks.

[June 2019] Keynote at the Time Series Workshop at ICML about streaming algorithms, Apache DataSketches, and some sneak preview on the new coreset results in machine learning. The slides are are found here: Streaming algorithms, Apache DataSketches, and new results on corsets. Thank you Yuyang (Bernie) Wang Cheng Tang, Qi (Rose) Yu, Scott Yang and, Vitaly Kuznetsov for the invitation and for organizing this great workshop.

[Apr 2019] Keynote at the Southern Data Science Conference about streaming algorithms and the datasketches library. Thank you Khalifeh AlJadda for the opportunity and for putting together an outstanding conference. Kudos!

[Feb 2019] Keynote at ITA about coresets, discrepancy, and sketches in machine learning together with Dimitris Achlioptas, Ben Recht, and Chris Re. The slides are available here but the paper is yet unpublished.

[Nov 2018] SageMaker Algorithms at MLConf in San Francisco.

[Oct 2018] SageMaker Algorithms at the first ever Amazon Research day in Haifa. Thank you Yoelle Maarek and Liane Lewin for organizing this awesome event.

[Aug 2018] KDD Keynote on deep learning on AWS and SageMaker Algorithms with Alex Smola.

[Jun 2018] Keynote at TMA conference in Vienna. I talked at the TMA Experts Summit about SageMaker algorithm (presentation). Later, I gave a keynote and the TMA conference about data sketches and mergeble summaries (presentation).

[Jun 2017] Shonan - Japan Processing Big Data Streams workshop where I presented my work with Zohar Karnin and Kevin Lang on streaming quantiles. Vladimir Braverman, David Woodruff and Ke Yi did a wonderful job organizing it.

[May 2017] Amazon posted a blog post called In the Research Spotlight in which they interview me about my career and current efforts in AWS.

Correlation Clustering: from Theory to Practice
KDD 2014 Tutorial [slides] [bib]

Streaming Data Mining
KDD 2012 tutorial on practical algorithms in mining streaming data; with Jelani Nelson.

Fast Random Projections survey and new results,
SODA 2011 and IAS and Yale math seminars 2011.
Video of the talk at IAS available here.

Some Code

Data Sketches: Thanks to Lee Rhodes and Jon Malkin we finally have a homebrew package and a new command line tool for unix based systems. Try the following on your mac:

>> brew tap DataSketches/sketches-cmd
>> brew install data-sketches
>> ds man

Frequent Directions: I have been asked to make some matrix sketching code available for a long time now. So, Mina Ghashami and I made some of our frequent direction git repo public. This code is distributed freely for academic use only. Please feel free to send pull requests.

Streaming Quantiles in Python: I'm excited about resolving one of the longests standing open problems in the streaming model. We designed an optimal algorithm for finding any approximate quantile of a stream of elements. See also the paper which Zohar Karnin, Kevin Lang, and myself posted on Arxiv.

Ezuzah Chrome Extention: Your browser is your door to the internet, why not hang a Mzuzah? (a digital art piece)

Past Interns

Omri Weinstein Columbia University
Noa Avigdor-Elgrabli Yahoo Research
Roy SchwartzThe Technion Institute
Dan Garber Haifa University
Nikita Ivkin Amazon AI Labs
Mina Ghashami Visa Research
Ofir Geri Stanford University
Yu Bai Salesforce Research
Nicholas Ryder OpenAI

Academic work



0368-3248-01-Data Mining - Tel Aviv University
The course covered algorithmic tools for data mining massive data sets.
It was given as a theory/algorithms class with and emphasis on randomization.
fall 2011
fall 2012
fall 2013

Conference Publications

From the lab to production: A case study of session-based recommendations in the home-improvement domain
Pigi Kouki, Ilias Fountalis, Nikolaos Vasiloglou, Xiquan Cui, Edo Liberty, Khalifeh Al Jadda
RecSys 2020

Relative Error Streaming Quantiles
Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, Pavel Vesely
ARXIV 2020

Amazon SageMaker Elastic Algorithms
Edo Liberty, Zohar Karnin, Bing Xiang, Laurence Rouesnel, Baris Coskun, Ramesh Nallapati, Julio Delgado Mangas, Amir Sadoughi, Yury Astashonok, Piali Das, Can Balioglu, Saswata Chakravarty, Madhav Jha, Philip Gautier, Tim Januschowski, Valentin Flunkert, David Arpin, and Alex Smola.

Streaming Quantiles Algorithms with Small Space and Update Time
Nikita Ivkin, Edo Liberty, Kevin Lang, Zohar Karnin, Vladimir Braverman
ARXIV 2019

Coresets, Discrepancy, and Sketches in Machine Learning
Zohar Karnin and Edo Liberty
COLT 2019

Assymetric Random Projections
Nicholas Ryder, Zohar Karnin, and Edo Liberty ARXIV 2019

Proxquant: Quantized neural networks via proximal operators
Yu Bai, Yu-Xiang Wang, Edo Liberty
ICLR 2019

A High-Performance Algorithm for Identifying Frequent Items in Data Streams
Daniel Anderson, Pryce Bevan, Kevin Lang, Edo Liberty, Lee Rhodes, Justin Thaler
IMC 2017

Greedy Minimization of Weakly Supermodular Set Functions
Edo Liberty, Maxim Sviridenko, Approx 2017 [slides][bib]

Optimal Quantile Approximation in Streams
Zohar Karnin, Kevin Lang, Edo Liberty
FOCS 2016 [slides]

A Short Proof for Gap Independence of Simultaneous Iteration
Edo Liberty
Arxiv 2016
Professor Wenjian Yu of Tsinghua University pointed out that a the square was omitted from (1+eps) in equation 2. The proof is still correct after a straight forward correction. This will be corrected in the next version.

Efficient Frequent Directions Algorithm for Sparse Matrices
Mina Ghashami, Edo Liberty, Jeff M. Phillips
KDD 2016

Stratified Sampling meets Machine Learning
Kevin Lang, Edo Liberty, Konstantin Shmakov
ICML 2016 [slides]

Space Lower Bounds for Itemset Frequency Sketches
Edo Liberty, Michael Mitzenmacher, Justin Thaler, Jonathan Ullman
PODS 2016 [bib]

An Algorithm for Online K-Means Clustering
Edo Liberty, Ram Sriharsha, Maxim Sviridenko
ALENEX 2016 [bib]

Online PCA with Spectral Bounds
Zohar Karnin, Edo Liberty
COLT 2015 [bib]
(see also 5 minute video letcure)

Online Principal Component Analysis
Christos Boutsidis, Dan Garber, Zohar Karnin, Edo Liberty
SODA 2014 [bib]

Near-optimal Distributions for Data Matrix Sampling
Dimitris Achlioptas, Zohar Karnin, Edo Liberty
NIPS 2013 [bib]

Simple and Deterministic Matrix Sketches
Edo Liberty (see slides and experimental results in json format)
Also, here is talk I gave at the Simons Institute about this.
Best paper at KDD 2013 [bib]
See also frequent direction git repo by Mina Ghashami and myself.

Threading Machine Generated Email
Nir Ailon, Zohar Karnin, Edo Liberty, Yoelle Maarek
Best paper at TechPulse 2012 and WSDM 2013 [bib]

Unsupervised SVMs: On the complexity of the Furthest Hyperplane Problem
Zohar Karnin, Edo Liberty, Shachar Lovett, Roy Schwartz, and Omri Weinstein
COLT 2012 [Slides] [bib]

Framework and Algorithms for Network Bucket Testing
Liran Katzir, Edo Liberty, and Oren Somekh
WWW 2012 [bib]

An Almost Optimal Unrestricted Fast Johnson-Lindenstrauss Transform
Nir Ailon, Edo Liberty
Best paper at SODA 2011 [bib]

Improved Approximation Algorithms for Bipartite Correlation Clustering
Nir Ailon, Noa Avigdor-Elgrabli, Edo Liberty, Anke van Zuylen
ESA 2011 [slides] [bib]

Automatically Tagging Email by Leveraging Other Users' Folders
Yehuda Koren, Edo Liberty,Yoelle Maarek, and Roman Sandler
KDD 2011 [bib]

Estimating Sizes of Social Networks via Biased Sampling
Liran Katzir, Edo Liberty, and Oren Somekh
WWW 2011 [bib]

Inverted Index Compression via Online Document Routing
Gal Lavee, Ronny Lempel, Edo Liberty, and Oren Somekh
WWW 2011 [bib]

Correlation Clustering Revisited: The "True" Cost of Error Minimization Problems
Nir Ailon, Edo Liberty
ICALP 2009 [bib]

Dense Fast Random Projections and Lean Walsh Transforms,
Edo Liberty, Nir Ailon, Amit Singer
RANDOM 2008 [bib]

Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes
Nir Ailon, Edo Liberty
SODA 2008 [bib]

Journal Publications

Frequent Directions: Simple and Deterministic Matrix Sketching
Mina Ghashami, Edo Liberty, Jeff M. Phillips, David P. Woodruff
In review [bib]

Estimating Sizes of Social Networks via Biased Sampling
Liran Katzir, Edo Liberty, Oren Somekh, Ioana A. Cosma
Journal of Internet Mathematics [bib]

An Almost Optimal Unrestricted Fast Johnson-Lindenstrauss Transform
Nir Ailon, Edo Liberty
Transactions on Algorithms [bib]

Improved Approximation Algorithms for Bipartite Correlation Clustering
Nir Ailon, Noa Avigdor-Elgrabli, Edo Liberty, and Anke van Zuylen
SIAM Journal on Computing [bib]

Unsupervised SVMs: On the complexity of the Furthest Hyperplane Problem
Zohar Karnin, Edo Liberty, Shachar Lovett, Roy Schwartz and Omri Weinstein
JMLR 2012 (Journal of Machine Learning Research) [bib]

Dense Fast Random Projections and Lean Walsh Transforms,
Edo Liberty, Nir Ailon, Amit Singer
DCG 2010 (Discrete and Computational Geometry) [bib]

The Mailman algorithm: a note on matrix vector multiplication
Edo Liberty, Steven Zucker
IPL 2009 (Information Processing Letters) [bib]

Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes
Nir Ailon, Edo Liberty
DCG 2008 (Discrete and Computational Geometry) [bib]

A fast randomized algorithm for the approximation of matrices
Edo Liberty, Franco Woolfe, Vladimir Rokhlin, and Mark Tygert
ACHA 2008 (Applied and Computational Harmonic Analysis) [bib]

Randomized algorithms for the low-rank approximation of matrices,
Edo Liberty, Franco Woolfe, Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert.
PNAS 2007 (Proceedings of the National Academy of Sciences) [bib]

Electrons and Phonons on the Square Fibonacci Tiling
Roni Ilan, Edo Liberty, Shahar Even-Dar Mandel, and Ron Lifshitz.
Ferroelectrics 2004.

PHD Thesis

Accelerated Dense Random Projections
PhD Thesis. See also Talk slides


Classifying man versus machine generated email
Zohar Karnin, Guy Halawi, David Wajc, Edo Liberty

A System for Email sequence identification
Edo Liberty, Zohar Karnin, Yoelle Maarek, Natalie Aizenberg

Sponsored Apps Marketplace in eMail
Ronny Lempel, Yoelle Maarek, Edward Bortnikov, Edo Liberty

Mining Global Email Folders For Identifying Auto-folders tags
Vishwanath Ramarao, Andrei Broder, Idan Szpektor, Edo Liberty, Yehuda Koren, Mark Risher, and Yoelle Maarek

Email sequence identification
Edo Liberty ,Zohar Karnin, Yoelle Maarek

Mailing List Identification and Representation
Zohar Karnin, Michal Aharon, Edo Liberty, Yoelle Maarek

Identification of subject line templates
Zohar Karnin, Edo Liberty, David Wajk, Guy Halawi

Electronic Mail Personal Vault
Edo Liberty, Yoelle Maarek

Mail Lint: Write Better Emails
Joel Tetreaul, Aasish Pappu, Edo Liberty ,Liangliang Cao, Meizhu Liu ,Ellie Tobochnik, Gilad Tzur, Yoelle Maarek

Methods for Displaying Contextually Targeted Content on a Connected Television
Zeev Neumeier, Edo Liberty

Methods for Identifying Video Segments and Displaying Contextually Targeted Content on Connected Televisions
Zeev Neumeier, Edo Liberty

Method And System For Clustering Data Points
Nir Ailon, Edo Liberty, Hari Khalsa

Methods for filtering data and filling in missing data using nonlinear inference
Edo Liberty, Steven Zucker, Yosi Keller, Mauro M. Maggioni, Ronald R. Coifman, Frank Geshwind, and in collaboration with Plain Sight Systems.

Generalized Stratified Sampling
Kevin Lang, Edo Liberty ,Konstantin Shmakov

Contest Generation Methods for Daily Fantasy Sports
Justin Thaler, Maxim Sviridenko, Edo Liberty, Prerit Uppal, Ron Belmarch, Jerry Shen