New paper: Congrats to Pinecone's research team on publishing the paper "An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors" in ACM Transactions on Information Systems (TOIS). The research was led by the brilliant Sebastian Bruch, Franco Maria Nardini, and Amir Ingber.
New graduate course: I will teach Long Term Memory in AI - Vector Search and Databases (COS 597A) at Princeton University this upcoming fall with Matthijs Douze, the architect and main developer of FAISS.
I'm the founder and CEO of Pinecone, the first vector database for machine learning.
Until April 2019, I was a Director of Research at AWS and Head of Amazon AI Labs. The Lab built cutting-edge machine learning algorithms, systems, and services for AWS customers. We built parts of SageMaker, Kinesis, QuickSight, Amazon Elasticsearch, Glue, Rekognition, DeepRacer, Personalize, Forecast, and other yet-to-be-released AWS services.
Before AWS, I was a Senior Research Director at Yahoo and Head of Yahoo's Research Lab in New York. We worked on building horizontal machine learning platforms and improving applications such as online advertising, search, security, media recommendation, email abuse prevention, and many more.
I received my B.Sc. in Physics and Computer Science from Tel Aviv University and my Ph.D. in Computer Science from Yale University. After that, I was a postdoctoral fellow at Yale in the Program in Applied Mathematics.
My research focuses on mathematical foundations and algorithms for challenges that arise when dealing with large amounts of data. Topics include fast dimensionality reduction, clustering, streaming algorithms, machine learning, large-scale numerical linear algebra, and high-dimensional geometry.
[Nov 2021] Andy Pavlo's CMU Database Seminar about The Pinecone Vector Database System.
[July 2020] Keynote at the San Francisco Data Council
[March 2020] Keynote at the Future of Information and Communication Conference about the Benefits and Challenges of combining Deep Learning and Retrieval Tasks.
[June 2019] Keynote at the Time Series Workshop at ICML about streaming algorithms, Apache DataSketches, and a sneak preview of new coreset results in machine learning. The slides can be found here: Streaming algorithms, Apache DataSketches, and new results on coresets. Thank you Yuyang (Bernie) Wang, Cheng Tang, Qi (Rose) Yu, Scott Yang, and Vitaly Kuznetsov for the invitation and for organizing this great workshop.
[Apr 2019] Keynote at the Southern Data Science Conference about streaming algorithms and the datasketches library. Thank you Khalifeh AlJadda for the opportunity and for putting together an outstanding conference. Kudos!
[Feb 2019] Keynote at ITA about coresets, discrepancy, and sketches in machine learning together with Dimitris Achlioptas, Ben Recht, and Chris Re. The slides are available here, but the paper is not yet published.
[Nov 2018] SageMaker Algorithms at MLConf in San Francisco.
[Oct 2018] SageMaker Algorithms at the first ever Amazon Research day in Haifa. Thank you Yoelle Maarek and Liane Lewin for organizing this awesome event.
[Aug 2018] KDD Keynote on deep learning on AWS and SageMaker Algorithms with Alex Smola.
[Jun 2018] Keynote at the TMA conference in Vienna. I talked at the TMA Experts Summit about SageMaker algorithms (presentation). Later, I gave a keynote at the TMA conference about data sketches and mergeable summaries (presentation).
[Jun 2017] Shonan (Japan) workshop on Processing Big Data Streams, where I presented my work with Zohar Karnin and Kevin Lang on streaming quantiles. Vladimir Braverman, David Woodruff, and Ke Yi did a wonderful job organizing it.
[May 2017] Amazon published a blog post called In the Research Spotlight, in which they interviewed me about my career and current efforts at AWS.
Correlation Clustering: from Theory to Practice
KDD 2014 Tutorial [slides]
[bib]
Streaming Data Mining
KDD 2012 tutorial on practical algorithms for mining streaming data, with Jelani Nelson.
Fast Random Projections survey and new results,
SODA 2011 and IAS and Yale math seminars 2011.
Video of the talk at IAS available here.
Apache DataSketches is the leading and most popular open source implementation of streaming algorithms for sketching and summarizing data, covering tasks such as counting distinct items (like HLL), frequent items (aka top-k), streaming quantiles, and more. It is used by Druid, Spark, Yahoo, AWS, Google, and many more.
Frequent Directions: I have been asked for a long time to make some matrix sketching code available. So, Mina Ghashami and I made our Frequent Directions git repo public. This code is distributed freely for academic use only. Please feel free to send pull requests.
Streaming Quantiles in Python: I'm excited about resolving one of the longest-standing open problems in the streaming model. We designed an optimal algorithm for finding any approximate quantile of a stream of elements. See also the paper that Zohar Karnin, Kevin Lang, and I posted on Arxiv.
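To give a feel for what a frequent-items (top-k) sketch does, here is the textbook Misra-Gries summary. This is a minimal sketch for illustration only; it is neither the DataSketches API nor its implementation, and the function name misra_gries and its parameter k are hypothetical. With k counters, every reported count is a lower bound that undercounts by at most n/(k+1), so any item occurring more than n/(k+1) times is guaranteed to survive.

```python
from collections import Counter
from typing import Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> dict:
    """Return at most k (item, lower-bound count) pairs for the stream."""
    counters: dict = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those that hit zero.
            # The arriving item is discarded, so each such step removes k + 1 units.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


stream = ["a"] * 50 + ["b"] * 30 + list("cdefghij") * 2
summary = misra_gries(stream, k=3)
print(summary)  # -> {'a': 42, 'b': 22}
exact = Counter(stream)
# Each estimate is a lower bound, off by at most len(stream) / (k + 1) = 24 here.
for item, estimate in summary.items():
    assert exact[item] - len(stream) / 4 <= estimate <= exact[item]
```

The heavy hitters "a" (50) and "b" (30) both exceed 96/4 = 24 and are kept, while the 16 low-frequency items are flushed out.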
Omri Weinstein - Columbia University
Noa Avigdor-Elgrabli - Yahoo Research
Roy Schwartz - The Technion Institute
Dan Garber - Haifa University
Nikita Ivkin - Amazon AI Labs
Mina Ghashami - Visa Research
Ofir Geri - Stanford University
Yu Bai - Salesforce Research
Nicholas Ryder - OpenAI
Aditya Krishnan - Johns Hopkins University
I often serve on academic review committees for conferences, journals, and grants. Past activities include serving as area chair, SPC, PC member, and/or reviewer for KDD, WSDM, COLT, ICML, ESA, WWW, NIPS, SODA, FOCS, SIGIR, AISTATS, and NYCE.
COS 597A : Long Term Memory in AI - Vector Search and Databases, Princeton University. Co-instructed with Matthijs Douze
0368-3248-01 Algorithms in Data Mining - Tel Aviv University 2011-2013
The course covered algorithmic tools for mining massive data sets. It was given as a theory/algorithms class with an emphasis on randomization and streaming.
Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty
ACM Transactions on Information Systems 2023
tl;dr: This paper explores algorithms and optimizations for hybrid search in vector databases.
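For readers new to the problem, here is a minimal exact baseline for maximum inner product search over sparse vectors: an inverted index that accumulates partial dot products coordinate by coordinate and supports streaming inserts. This is not the approximate algorithm from the paper; the class name SparseInvertedIndex and the dict-of-floats vector format are assumptions made for illustration.

```python
import heapq
from collections import defaultdict


class SparseInvertedIndex:
    """Exact top-k inner product search over sparse vectors (illustrative baseline)."""

    def __init__(self) -> None:
        # coordinate -> list of (doc_id, value) postings
        self.postings: dict[int, list[tuple[str, float]]] = defaultdict(list)

    def insert(self, doc_id: str, vector: dict[int, float]) -> None:
        # Streaming insert: append one posting per nonzero coordinate.
        for coord, value in vector.items():
            self.postings[coord].append((doc_id, value))

    def search(self, query: dict[int, float], k: int) -> list[tuple[str, float]]:
        scores: dict[str, float] = defaultdict(float)
        # Only coordinates that are nonzero in the query can contribute to a score.
        for coord, q_value in query.items():
            for doc_id, d_value in self.postings.get(coord, ()):
                scores[doc_id] += q_value * d_value
        return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])


index = SparseInvertedIndex()
index.insert("d1", {0: 1.0, 3: 2.0})
index.insert("d2", {3: 0.5, 7: 4.0})
index.insert("d3", {1: 1.0})
top = index.search({3: 1.0, 7: 1.0}, k=2)
print(top)  # -> [('d2', 4.5), ('d1', 2.0)]
```

The cost of a query grows with the total length of the posting lists it touches, which is exactly what approximate methods for this problem try to cut down. Documents with no coordinate in common with the query are never scored.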
Edo Liberty
tl;dr: This is a super simple one-line proof of Frequent Directions, the matrix sketching algorithm in my 2013 KDD best paper (see below).
Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, Pavel Veselý
PODS 2021, Best paper award
tl;dr: This (finally) solves a problem I had wanted to solve for years: how to efficiently sketch quantiles with relative errors. This is critical for large-scale performance monitoring, for example.
Edo Liberty, Zohar Karnin, Bing Xiang, Laurence Rouesnel, Baris Coskun, Ramesh Nallapati, Julio Delgado Mangas, Amir Sadoughi, Yury Astashonok, Piali Das, Can Balioglu, Saswata Chakravarty, Madhav Jha, Philip Gautier, Tim Januschowski, Valentin Flunkert, David Arpin, and Alex Smola.
SIGMOD 2020
tl;dr: The culmination of more than two years of work, this paper describes the algorithms and distributed architecture behind Amazon SageMaker's elastic ML algorithms.
Zohar Karnin and Edo Liberty
COLT 2019
tl;dr: This ML-theory paper shows that many types of machine learning models have much smaller coresets than previously known. As a special case of the general result, it resolves the open problem regarding the coreset complexity of Gaussian density estimation.
Zohar Karnin, Kevin Lang, Edo Liberty
FOCS 2016
tl;dr: This paper describes the KLL algorithm. It resolves one of the longest-standing and most basic problems in the streaming algorithms literature, namely, optimally approximating ranks and quantiles in streaming data. [slides]
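The mechanism underneath KLL is a hierarchy of compactors: level h holds items of weight 2^h, and when a level fills up it sorts its buffer and promotes every other item, starting at a random offset, to the next level. The sketch below shows only that mechanism with equal-capacity levels. It is a simplified illustration, not KLL itself: the real algorithm also shrinks lower-level capacities geometrically to reach its optimal space bound. The class and method names are hypothetical and do not match Apache DataSketches or the paper's code.

```python
import random
from typing import Optional


class SimpleCompactorSketch:
    """Illustrative KLL-style quantile sketch with equal-capacity compactors."""

    def __init__(self, capacity: int = 64, seed: Optional[int] = None) -> None:
        if capacity < 2 or capacity % 2:
            raise ValueError("capacity must be an even number >= 2")
        self.capacity = capacity
        self.levels: list[list[float]] = [[]]  # levels[h] holds items of weight 2**h
        self.rng = random.Random(seed)
        self.n = 0

    def update(self, item: float) -> None:
        self.levels[0].append(item)
        self.n += 1
        h = 0
        # Cascade compactions upward while the current level is full.
        while h < len(self.levels) and len(self.levels[h]) >= self.capacity:
            if h + 1 == len(self.levels):
                self.levels.append([])
            buf = sorted(self.levels[h])
            offset = self.rng.randint(0, 1)  # random parity keeps rank errors zero-mean
            self.levels[h + 1].extend(buf[offset::2])  # survivors double in weight
            self.levels[h] = []
            h += 1

    def rank(self, x: float) -> int:
        """Estimated number of stream items <= x."""
        return sum((1 << h) * sum(1 for y in level if y <= x)
                   for h, level in enumerate(self.levels))


sketch = SimpleCompactorSketch(capacity=64, seed=0)
for value in range(10_000):
    sketch.update(float(value))
print(sketch.rank(5000.0), "vs. true rank 5001")
```

Each compaction at level h shifts any rank estimate by at most 2^h, and because the offset is random those shifts tend to cancel. The sketch keeps fewer than capacity items per level, so space grows only with the number of levels, roughly log2(n / capacity).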
Edo Liberty
KDD 2013, Best paper award
tl;dr: This paper introduced Frequent Directions, an incredibly simple, practically efficient, and theoretically optimal algorithm for approximating the covariance of vector streams. [slides], [experimental results], [talk], [bib], [git repo].
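Here is a minimal NumPy sketch of the Frequent Directions update rule, written from the description of the algorithm rather than taken from the git repo. Rows fill a buffer B of ell rows; when B is full, an SVD is taken and every squared singular value is reduced by delta, the squared singular value at index ell // 2, which zeroes out at least half of the rows. The class name and the exact choice of delta are assumptions (one common variant of the shrink step), and NumPy is a third-party dependency.

```python
import numpy as np


class FrequentDirections:
    """Streaming matrix sketch: keeps ell rows B such that B^T B approximates A^T A."""

    def __init__(self, d: int, ell: int) -> None:
        if not 2 <= ell <= d:
            raise ValueError("this sketch assumes 2 <= ell <= d")
        self.ell = ell
        self.B = np.zeros((ell, d))
        self.next_zero = 0  # index of the first all-zero row of B

    def update(self, row: np.ndarray) -> None:
        if self.next_zero == self.ell:
            self._shrink()
        self.B[self.next_zero] = row
        self.next_zero += 1

    def _shrink(self) -> None:
        # Rotate B onto its right singular vectors, then subtract the same
        # amount delta from every squared singular value (clipping at zero).
        _, s, vt = np.linalg.svd(self.B, full_matrices=False)
        delta = s[self.ell // 2] ** 2
        s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        self.B = s_shrunk[:, None] * vt
        # Singular values are sorted in descending order, so zeroed rows sit at the bottom.
        self.next_zero = int(np.count_nonzero(s_shrunk))

    def sketch(self) -> np.ndarray:
        return self.B


rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20))
fd = FrequentDirections(d=20, ell=8)
for a in A:
    fd.update(a)
B = fd.sketch()
err = np.linalg.norm(A.T @ A - B.T @ B, 2)
print(err, "<=", 2 * np.linalg.norm(A, "fro") ** 2 / 8)
```

Under these assumptions, A^T A - B^T B stays positive semidefinite, and its spectral norm is at most the sum of the deltas, which is bounded by ||A||_F^2 / (ell // 2 + 1) <= 2 ||A||_F^2 / ell because every shrink removes at least (ell // 2 + 1) * delta of Frobenius mass from B.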
Nir Ailon, Zohar Karnin, Edo Liberty, Yoelle Maarek
TechPulse 2012, Best paper award and WSDM 2013
tl;dr: The paper shows how to use sketches to find causality relations between billions of events using trillions of observations. [bib]
Nir Ailon, Edo Liberty
SODA 2011, Best paper award
tl;dr: The main result of my PhD work on fast dimension reduction, namely, matching the optimal target dimension of the Johnson-Lindenstrauss lemma with fast projection algorithms. [bib]
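To show the flavor of a fast Johnson-Lindenstrauss transform, here is a subsampled randomized Hadamard transform: random sign flips, a fast Walsh-Hadamard transform, then uniform sampling of k coordinates rescaled by sqrt(d/k). This is a standard construction in this family, not necessarily the exact transform analyzed in the SODA 2011 paper; the function names and the requirement that d be a power of two are assumptions made to keep the sketch short. The Walsh-Hadamard step costs O(d log d) per vector, compared with O(dk) for a dense Gaussian projection.

```python
import math
import random


def walsh_hadamard(x: list[float]) -> list[float]:
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    y = list(x)
    n = len(y)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b  # butterfly step
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def make_srht(d: int, k: int, seed: int = 0):
    """Return a function mapping length-d vectors to length-k vectors."""
    if d & (d - 1) or not 0 < k <= d:
        raise ValueError("assumes d is a power of two and 0 < k <= d")
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]  # random diagonal D
    rows = rng.sample(range(d), k)  # coordinates kept by the sampling step
    factor = math.sqrt(d / k)  # makes E[||Px||^2] = ||x||^2

    def project(x: list[float]) -> list[float]:
        z = walsh_hadamard([s * v for s, v in zip(signs, x)])
        return [factor * z[r] for r in rows]

    return project


project = make_srht(d=8, k=4, seed=1)
# The sign flips plus Hadamard spread a spike evenly, so each entry is about +/-0.5.
print(project([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))
```

The random signs are what make the Hadamard step "flatten" any fixed input, so that sampling a few coordinates still preserves its norm with high probability.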
Aditya Krishnan, Edo Liberty
Arxiv 2021
Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, Pavel Veselý
Best paper at PODS 2021
Pigi Kouki, Ilias Fountalis, Nikolaos Vasiloglou, Xiquan Cui, Edo Liberty, Khalifeh Al Jadda
RecSys 2020
Edo Liberty, Zohar Karnin, Bing Xiang, Laurence Rouesnel, Baris Coskun, Ramesh Nallapati, Julio Delgado Mangas, Amir Sadoughi, Yury Astashonok, Piali Das, Can Balioglu, Saswata Chakravarty, Madhav Jha, Philip Gautier, Tim Januschowski, Valentin Flunkert, David Arpin, and Alex Smola.
SIGMOD 2020
Nikita Ivkin, Edo Liberty, Kevin Lang, Zohar Karnin, Vladimir Braverman
Sensors 2022
Zohar Karnin and Edo Liberty
COLT 2019
Nicholas Ryder, Zohar Karnin, and Edo Liberty
Arxiv 2019
Yu Bai, Yu-Xiang Wang, Edo Liberty
ICLR 2019
Daniel Anderson, Pryce Bevan, Kevin Lang, Edo Liberty, Lee Rhodes, Justin Thaler
IMC 2017
Edo Liberty, Maxim Sviridenko
Approx 2017
Zohar Karnin, Kevin Lang, Edo Liberty
FOCS 2016
Edo Liberty
Arxiv 2016
Professor Wenjian Yu of Tsinghua University pointed out that the square was omitted from (1+eps) in equation 2. The proof remains correct after a straightforward correction. This will be corrected in the next version.
Mina Ghashami, Edo Liberty, Jeff M. Phillips
KDD 2016
Kevin Lang, Edo Liberty, Konstantin Shmakov
ICML 2016 [slides]
Edo Liberty, Michael Mitzenmacher, Justin Thaler, Jonathan Ullman
PODS 2016
Edo Liberty, Ram Sriharsha, Maxim Sviridenko
ALENEX 2016 [bib]
Zohar Karnin, Edo Liberty
COLT 2015
(see also the 5-minute video lecture)
Christos Boutsidis, Dan Garber, Zohar Karnin, Edo Liberty
SODA 2014 [bib]
Dimitris Achlioptas, Zohar Karnin, Edo Liberty
NIPS 2013 [bib]
Edo Liberty (see slides and experimental results in json format)
Also, here is a talk I gave at the Simons Institute about this.
Best paper at KDD 2013 [bib]
See also the Frequent Directions git repo by Mina Ghashami and me.
Nir Ailon, Zohar Karnin, Edo Liberty, Yoelle Maarek
Best paper at TechPulse 2012 and WSDM 2013 [bib]
Zohar Karnin, Edo Liberty, Shachar Lovett, Roy Schwartz, and Omri Weinstein
Liran Katzir, Edo Liberty, and Oren Somekh
WWW 2012 [bib]
Nir Ailon, Edo Liberty
Best paper at SODA 2011 [bib]
Nir Ailon, Noa Avigdor-Elgrabli, Edo Liberty, Anke van Zuylen
Yehuda Koren, Edo Liberty, Yoelle Maarek, and Roman Sandler
KDD 2011 [bib]
Liran Katzir, Edo Liberty, and Oren Somekh
WWW 2011 [bib]
Gal Lavee, Ronny Lempel, Edo Liberty, and Oren Somekh
WWW 2011 [bib]
Nir Ailon, Edo Liberty
ICALP 2009 [bib]
Edo Liberty, Nir Ailon, Amit Singer
RANDOM 2008 [bib]
Nir Ailon, Edo Liberty
SODA 2008 [bib]
Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty
TOIS - ACM Transactions on Information Systems 2023
Mina Ghashami, Edo Liberty, Jeff M. Phillips, David P. Woodruff
Liran Katzir, Edo Liberty, Oren Somekh, Ioana A. Cosma
Journal of Internet Mathematics [bib]
Nir Ailon, Edo Liberty
Transactions on Algorithms [bib]
Nir Ailon, Noa Avigdor-Elgrabli, Edo Liberty, and Anke van Zuylen
SIAM Journal on Computing [bib]
Zohar Karnin, Edo Liberty, Shachar Lovett, Roy Schwartz and Omri Weinstein
JMLR 2012 (Journal of Machine Learning Research) [bib]
Edo Liberty, Nir Ailon, Amit Singer
DCG 2010 (Discrete and Computational Geometry) [bib]
Edo Liberty, Steven Zucker
IPL 2009 (Information Processing Letters) [bib]
Nir Ailon, Edo Liberty
DCG 2008 (Discrete and Computational Geometry) [bib]
Edo Liberty, Franco Woolfe, Vladimir Rokhlin, and Mark Tygert
ACHA 2008 (Applied and Computational Harmonic Analysis) [bib]
Edo Liberty, Franco Woolfe, Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert.
PNAS 2007 (Proceedings of the National Academy of Sciences) [bib]
Roni Ilan, Edo Liberty, Shahar Even-Dar Mandel, and Ron Lifshitz.
Ferroelectrics 2004.
PhD Thesis. See also the talk slides.
Edo Liberty
Nir Ailon, Edo Liberty, Hari Khalsa
Edo Liberty, Steven Zucker, Yosi Keller, Mauro M. Maggioni, Ronald R. Coifman, Frank Geshwind, and in collaboration with Plain Sight Systems.
Kevin Lang, Edo Liberty, Konstantin Shmakov
KJ Lang, E Liberty, K Shmakov
Edo Liberty, Stefano Stefani, Alexander Smola, Craig Wiley, Steve Loeppky, Tom Faulhaber, Swami Sivasubramanian, Zohar Karnin
Edo Liberty, Zohar Karnin
Edo Liberty, Stefano Stefani, Swami Sivasubramanian, Zohar Karnin, Tom Faulhaber, Alexander Smola, Craig Wiley, Amir Sadoughi, Dayanand Rangegowda
Edo Liberty, Stefano Stefani, Steve Loeppky, Craig Wiley, Tom Faulhaber
Edo Liberty, Madhav Jha
Stefano Stefani, Craig Wiley, Thomas Faulhaber, Alexander Smola, Steven Loeppky, Richard Bice, Edo Liberty, Swaminathan Sivasubramanian, Charles Swan, Taylor Goodhart
Mu Li, Edo Liberty, Alexander Smola, Leyuan Wang
Madhav Jha, Edo Liberty
S Genc, E Liberty
Edo Liberty, Leo Dirac
Zohar Karnin, Guy Halawi, David Wajc, Edo Liberty
Edo Liberty, Zohar Karnin, Yoelle Maarek, Natalie Aizenberg
Ronny Lempel, Yoelle Maarek, Edward Bortnikov, Edo Liberty
Vishwanath Ramarao, Andrei Broder, Idan Szpektor, Edo Liberty, Yehuda Koren, Mark Risher, and Yoelle Maarek
Edo Liberty, Zohar Karnin, Yoelle Maarek
Zohar Karnin, Michal Aharon, Edo Liberty, Yoelle Maarek
Zohar Karnin, Edo Liberty, David Wajc, Guy Halawi
Edo Liberty, Yoelle Maarek
J Tetreault, A Pappu, E Liberty, L Cao, M Liu, E Pavlick, G Tsur, Y Maarek
Joel Tetreault, Aasish Pappu, Edo Liberty, Liangliang Cao, Meizhu Liu, Ellie Tobochnik, Gilad Tzur, Yoelle Maarek
Zeev Neumeier, Edo Liberty
Zeev Neumeier, Edo Liberty
Justin Thaler, Maxim Sviridenko, Edo Liberty, Prerit Uppal, Ron Belmarch, Jerry Shen
Justin Thaler, Maxim Sviridenko, Edo Liberty, Prerit Uppal, Ron Belmarch, Jerry Shen