Engineering

I spent much of my career in software companies as an applied scientist. Most of that work is proprietary: prototyping, benchmarking, building data pipelines, training models, and so on. The public artifacts are mainly patents and the occasional open-source project. They are listed below by company, most recent first, along with the kind of infrastructure I worked on at each.

Pinecone

Founder and Chief Scientist. Building the Pinecone vector database - managed, large-scale, low-latency vector search that serves as long-term memory for AI applications.

Open source and benchmarks

  • Big ANN Benchmarks - a benchmark and competition for billion-scale approximate nearest-neighbor search, pushing the state of the art in vector-search algorithms and systems.

  • vq-bench - a benchmark for vector quantization (coming soon).
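For readers unfamiliar with how approximate nearest-neighbor search is usually scored, here is a minimal sketch of recall@k: the fraction of the true k nearest neighbors that an approximate index returns. It is an illustration only, not the Big ANN Benchmarks harness; the function names and the toy data are assumptions made for this example.

```python
import math

def exact_top_k(data, query, k):
    """Brute-force ground truth: indices of the k points closest to query."""
    order = sorted(range(len(data)), key=lambda i: math.dist(data[i], query))
    return order[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors found by the approximate search."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)

# Toy data (hypothetical): four 2-D points and one query.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
query = (0.1, 0.0)

truth = exact_top_k(data, query, k=2)
# Pretend an approximate index returned points 0 and 2 instead of 0 and 1.
score = recall_at_k([0, 2], truth)
```

Benchmarks of this kind typically report such an accuracy measure together with throughput or latency, because an index can trade one for the other.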

AWS

Director of Research and Head of Amazon AI Labs. Built the algorithms and distributed systems behind Amazon SageMaker - AWS's platform for training and serving machine learning models at scale.

See the paper Amazon SageMaker Elastic Algorithms (SIGMOD 2020) on the Research page.

Patents

  • System and Method for Experimentation and Deployment of Machine Learning Models on Cloud Based Platforms

    Edo Liberty, Stefano Stefani, Alexander Smola, Craig Wiley, Steve Loeppky, Tom Faulhaber, Swami Sivasubramanian, Zohar Karnin

  • Method for post-training Hyperparameter Tuning by training Machine Learning States

    Edo Liberty, Zohar Karnin

  • Autoscaling of Training Machine Learning Jobs on Cloud Infrastructures

    Edo Liberty, Stefano Stefani, Swami Sivasubramanian, Zohar Karnin, Tom Faulhaber, Alexander Smola, Craig Wiley, Amir Sadoughi, Dayanand Rangegowda

  • A system for autoscaling and hosting of ML Models for production inference

    Edo Liberty, Stefano Stefani, Steve Loeppky, Craig Wiley, Tom Faulhaber

  • Online training with delayed feedback with applications to bandwidth-efficient communication over networks

    Edo Liberty, Madhav Jha

  • System Architecture for Container Based Large Scale Machine Learning Platforms

    Stefano Stefani, Craig Wiley, Thomas Faulhaber, Alexander Smola, Steven Loeppky, Richard Bice, Edo Liberty, Swaminathan Sivasubramanian, Charles Swan, Taylor Goodhart

  • Method and Systems for Optimal Graph Synchronization for Distributed Machine Learning

    Mu Li, Edo Liberty, Alexander Smola, Leyuan Wang

  • Machine Learning model-assisted real-time enhancement of audio/video over a network call to significantly lower bandwidth requirements

    Madhav Jha, Edo Liberty

  • Training machine learning models for physical agents and robotic controls with simulations

    S. Genc, E. Liberty

  • Machine Learning system to remove accent from spoken speech

    Edo Liberty, Leo Dirac

Yahoo

Senior Research Director and Head of Yahoo Labs, New York. Built horizontal machine-learning platforms and the streaming-data systems that powered Yahoo's products, from advertising to mail.

Open source

  • Apache DataSketches - a widely used open-source library of streaming algorithms for sketching and summarizing data, including distinct counting (e.g., HLL), frequent items (aka top-k), and streaming quantiles. It is used by Druid, Spark, Yahoo, AWS, Google, and many more.
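To give a flavor of what a streaming sketch does, here is a minimal textbook Misra-Gries summary for frequent items. It is a sketch of the general idea only, not the DataSketches implementation or API; the function name and the example stream are assumptions made for this illustration.

```python
def misra_gries(stream, k):
    """Summarize a stream with at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n
    is guaranteed to survive, and each stored count underestimates
    the true count by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Hypothetical stream of 11 items in which "a" is the only heavy hitter.
stream = ["a"] * 6 + ["b"] * 3 + ["c", "d", "e"]
summary = misra_gries(stream, k=3)
```

The summary uses memory proportional to k rather than to the number of distinct items, which is the property that makes sketches practical on unbounded streams.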

Patents

  • Generalized Stratified Sampling

    Kevin Lang, Edo Liberty, Konstantin Shmakov

  • On-line content sampling

    Kevin Lang, Edo Liberty, Konstantin Shmakov

  • Classifying man versus machine generated email

    Zohar Karnin, Guy Halawi, David Wajc, Edo Liberty

  • A System for Email sequence identification

    Edo Liberty, Zohar Karnin, Yoelle Maarek, Natalie Aizenberg

  • Sponsored Apps Marketplace in eMail

    Ronny Lempel, Yoelle Maarek, Edward Bortnikov, Edo Liberty

  • Mining Global Email Folders For Identifying Auto-folders tags

    Vishwanath Ramarao, Andrei Broder, Idan Szpektor, Edo Liberty, Yehuda Koren, Mark Risher, and Yoelle Maarek

  • Email sequence identification

    Edo Liberty, Zohar Karnin, Yoelle Maarek

  • Mailing List Identification and Representation

    Zohar Karnin, Michal Aharon, Edo Liberty, Yoelle Maarek

  • Identification of subject line templates

    Zohar Karnin, Edo Liberty, David Wajc, Guy Halawi

  • Computerized system and method for modifying a message to apply security features to the message's content

    Edo Liberty, Yoelle Maarek

  • Electronic message composition support method and apparatus

    J Tetreault, A Pappu, E Liberty, L Cao, M Liu, E Pavlick, G Tsur, Y Maarek

  • Mail Lint: Write Better Emails

    Joel Tetreault, Aasish Pappu, Edo Liberty, Liangliang Cao, Meizhu Liu, Ellie Tobochnik, Gilad Tzur, Yoelle Maarek

  • Contest Generation Methods for Daily Fantasy Sports

    Justin Thaler, Maxim Sviridenko, Edo Liberty, Prerit Uppal, Ron Belmarch, Jerry Shen

  • Fantasy Sports Data Analysis for Game Structure Development

    Justin Thaler, Maxim Sviridenko, Edo Liberty, Prerit Uppal, Ron Belmarch, Jerry Shen

Google

Interned twice at Google, working on Google Analytics and Google Maps.

Patents

  • Method And System For Clustering Data Points

    Nir Ailon, Edo Liberty, Hari Khalsa

Inscape

Technical founder. Built automatic content-recognition (ACR) infrastructure - fingerprinting broadcast video in real time to identify what is on screen and target contextually relevant content across millions of connected televisions.

Patents

  • Methods for Displaying Contextually Targeted Content on a Connected Television

    Zeev Neumeier, Edo Liberty

  • Methods for Identifying Video Segments and Displaying Contextually Targeted Content on Connected Televisions

    Zeev Neumeier, Edo Liberty

During my PhD

  • Methods for filtering data and filling in missing data using nonlinear inference

    Edo Liberty, Steven Zucker, Yosi Keller, Mauro M. Maggioni, Ronald R. Coifman, Frank Geshwind, and in collaboration with Plain Sight Systems.

For fun

  • Ezuzah Chrome Extension (a digital art piece) - your browser is your door to the internet, so why not hang a Mezuzah?