https://github.com/alibaba/nann
Introduction
NANN is a flexible, high-performance framework for large-scale retrieval problems based on TensorFlow.
Background
NANN has been developed and widely applied at Alibaba since 2021, where it supports many businesses such as Taobao display advertising, Taobao search advertising, and Shenma search. NANN aims to solve the large-scale retrieval problem by integrating a post-training index with arbitrarily advanced models. It provides model-based and heuristic methods to ensure that arbitrarily advanced models retain their capability during large-scale retrieval. NANN also includes in-depth performance optimizations for GPU and CPU to ensure inference performance. Finally, NANN emphasizes user-friendliness, especially for TensorFlow users.
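As a rough illustration of what pairing a post-training index with an arbitrary model can look like, the sketch below builds a k-nearest-neighbour graph over already-trained item vectors and then walks it with a beam search that ranks candidates using an opaque scoring function. Everything in it is an assumption made for illustration: the function names (build_knn_graph, beam_search, make_toy_model), the toy scoring function, and the choice of a brute-force kNN graph are hypothetical and are not NANN's actual API or index structure.

```python
import heapq
import math
import random


def build_knn_graph(item_vecs, k):
    """Hypothetical post-training index: link each item to its k nearest
    neighbours by L2 distance. Brute force, for illustration only."""
    graph = {}
    for i, vi in enumerate(item_vecs):
        dists = [(math.dist(vi, vj), j) for j, vj in enumerate(item_vecs) if j != i]
        graph[i] = [j for _, j in heapq.nsmallest(k, dists)]
    return graph


def beam_search(graph, score_fn, entry_points, beam_size, top_k, max_steps=50):
    """Walk the graph, ranking candidates with an arbitrary scoring function.
    The search never inspects how score_fn is computed."""
    visited = set(entry_points)
    scored = {i: score_fn(i) for i in entry_points}
    frontier = heapq.nlargest(beam_size, scored, key=scored.get)
    for _ in range(max_steps):
        newly_scored = []
        for node in frontier:
            for nb in graph[node]:
                if nb not in visited:
                    visited.add(nb)
                    scored[nb] = score_fn(nb)
                    newly_scored.append(nb)
        if not newly_scored:
            break  # nothing left to expand
        # Keep only the best newly scored items as the next frontier.
        frontier = heapq.nlargest(beam_size, newly_scored, key=scored.get)
    return heapq.nlargest(top_k, scored, key=scored.get)


def make_toy_model(user_vec, item_vecs):
    """Stand-in for an 'arbitrarily advanced' model: any callable that maps
    an item id to a relevance score. Deliberately not a plain inner product."""
    def score(item_id):
        v = item_vecs[item_id]
        dot = sum(u * x for u, x in zip(user_vec, v))
        return math.tanh(dot) - 0.1 * abs(v[0])
    return score


random.seed(0)
items = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
user = [random.gauss(0, 1) for _ in range(4)]

graph = build_knn_graph(items, k=8)   # built after training, from item vectors only
score = make_toy_model(user, items)   # the model is swapped in at search time
result = beam_search(graph, score, entry_points=[0, 1, 2], beam_size=10, top_k=5)
print(result)
```

The point of the sketch is the separation of concerns: the graph is built once from item vectors without reference to the scoring model, so the model can in principle be replaced without rebuilding the index. Whether retrieval quality holds up after such a swap is exactly the problem the model-based and heuristic methods mentioned above are meant to address.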
https://zhuanlan.zhihu.com/p/614214963
1. Background
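Large-scale information retrieval has long been one of the core problems in search, recommendation, and advertising, and retrieval based on arbitrarily complex models is one of the industry's most important directions for iteration. In recent years, the Alimama display advertising Match team and the prediction engine team have focused on advancing industrial-grade large-scale retrieval technology from both the algorithm and the engineering side. We have built up experience in retrieval based on arbitrarily complex models and achieved good business results. We are now packaging this work as NANN (Neural Approximate Nearest Neighbor) and releasing it as open source, in the hope that the collaborative creativity of the community will help move the field forward.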
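The NANN described in this article originates from the 二向箔 ("dual-vector foil") family of algorithms developed by the Alimama display advertising Match team. The approach preserves the retrieval capability of complex models while decoupling index learning from model training, giving a lightweight solution for retrieval with arbitrarily complex models. NANN is built on TensorFlow and ships with a performance benchmarking tool and a complete demo that runs from model training through online deployment.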
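The approach was developed in-house by the Alimama technical team and has been rolled out to other businesses within Alibaba Group, where it has delivered significant business gains in typical search, recommendation, and advertising scenarios.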
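This article gives a brief introduction to NANN's core features and the updates to its algorithm family. You are welcome to try it out and join the discussion.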