Semantic Search in Under 3MB

Fri, 19 Jun 2026 22:24:12 -0700

This project continues my previous autoresearch project, which optimized a reranking model to fit in under 10MB. Digging deeper by hand, I took the size reduction much further while outperforming reranking models that are 30x larger on this task. In the end, I reduced the payload from 11.4 MB to 2.79 MB gzipped.

You can see it in action on my resume page.

Each square represents 1 kB. Most of the overall size reduction came from removing the ONNX Runtime (ORT) dependency. The other changes saved fewer bytes, but they enabled much better representation quality than the baseline.
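For readers who want to check numbers like these on their own bundles, here is a minimal sketch of the arithmetic. It is not the tooling used for this project: the helper names `gzipped_size`, `squares`, and `reduction` are made up for illustration, and it assumes 1 kB = 1000 bytes and 1 MB = 1,000,000 bytes, which may differ from the units used in the chart.

```python
import gzip
import math


def gzipped_size(data: bytes, level: int = 9) -> int:
    """Return how many bytes `data` occupies after gzip compression."""
    return len(gzip.compress(data, compresslevel=level))


def squares(size_bytes: int, bytes_per_square: int = 1000) -> int:
    """Number of squares needed to draw a payload, at one square per kB.

    Assumption: 1 kB = 1000 bytes. Pass 1024 if you prefer binary units.
    """
    return math.ceil(size_bytes / bytes_per_square)


def reduction(before_mb: float, after_mb: float) -> tuple:
    """Return (shrink factor, percent saved) between two payload sizes."""
    return before_mb / after_mb, 100.0 * (1.0 - after_mb / before_mb)


# The two sizes quoted in this post: 11.4 MB before, 2.79 MB after (gzipped).
factor, saved = reduction(11.4, 2.79)
print(f"{factor:.2f}x smaller, {saved:.1f}% saved")
```

To measure a real file, read it in binary mode and pass the bytes to `gzipped_size`, then pass the result to `squares` to get the number of 1 kB squares it would occupy in a chart like the one above.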

Quantization on Luke Salamone's Blog
