In machine learning parlance, clustering or similarity search means finding items that resemble one another in datasets that don’t lend themselves to easy comparison. If you wanted to compare 100 million images against each other and find the ones that look most alike, that’s a clustering job. The hard part is scaling the work well across multiple processors, which is where you’d get the biggest speedup.
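To make the idea concrete, here is a minimal sketch of a brute-force similarity search in plain Python. It assumes each image has already been reduced to a short numeric feature vector. The toy vectors, the function names, and the choice of cosine similarity as the measure are all illustrative assumptions, not anything taken from a particular library.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(vectors, query_index):
    """Scan every other vector and return (index, score) of the closest match."""
    query = vectors[query_index]
    best_index, best_score = None, float("-inf")
    for i, candidate in enumerate(vectors):
        if i == query_index:
            continue  # do not compare an item with itself
        score = cosine_similarity(query, candidate)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


# Hypothetical feature vectors standing in for four images.
features = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 1.0],
]

match, score = most_similar(features, 0)
print(f"Item 0 is most similar to item {match}")
```

Each query here scans the whole dataset, so comparing every item against every other takes a number of comparisons that grows with the square of the dataset size. At 100 million items that is far too much work for a simple loop, which is why scaling across processors matters.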
Facebook’s AI research division (FAIR) recently unveiled, with little fanfare, a proposed solution called Faiss. It’s an open source library, written in C++ with bindings for Python, that allows massive datasets, such as collections of still images or videos, to be searched efficiently for similar items.