Blog | UTN Data Systems

String Fingerprints

Cloud data warehouses are text-heavy. As the volume of text to scan grows, queries slow down, so query engines need fast pre-filters to accelerate them. We present string fingerprints, a lightweight secondary index structure that approximates LIKE predicates at the cost of occasional false positives. Fingerprints can be optimized for specific workloads using mixed-integer optimization, and they even generalize to unseen table filters.

5 min read · March 23, 2026

2026
Benchmarking Semantic Query Processing Systems

Semantic query processing is emerging as a new layer atop relational engines, elevating LLM-backed semantic operators to first-class SQL primitives for multimodal data. We present SemBench, the first benchmark to rigorously evaluate these systems end-to-end, and outline the roadmap for our own system, Spectra, which aims to make semantic operators affordable at scale.

13 min read · February 16, 2026

2026
Democratizing Data Science

Our vision is to build an end-to-end agentic data platform, enabling domain experts to acquire, clean, analyze, and visualize data in a principled manner by combining the benefits of LLMs with decades of database research.

5 min read · January 16, 2026

2026
Launching Our Blog And Wrapping Up 2025

I'm super excited to launch our blog! We'll use this space to share what's happening in our lab, from research papers and systems to the day-to-day life of our team. To kick things off, let's look back at 2025.

4 min read · December 31, 2025

2025
