UTN Data Systems
  • Text Compression Through the Looking Glass

    Cold, unstructured text has long been a storage burden, driving up costs for data that is unlikely to ever be accessed again. The rise of accessible large language models (LLMs) has intensified this challenge by dramatically increasing the volume of generated content that must still be retained, e.g., for compliance reasons. This post explores a new class of LLM-based compression methods that can significantly reduce the storage footprint of text-heavy data, and explains why LLMs are particularly well-suited to text compression.

    12 min read   ·   May 11, 2026

  • String Fingerprints

    Cloud data warehouses are text-heavy. As the amount of text to scan grows, queries become slower, so query engines need fast pre-filters to accelerate them. We present string fingerprints, a lightweight secondary index structure designed to approximate LIKE predicates, at the cost of some false positives. Fingerprints can be optimized for specific workloads using mixed-integer optimization, and they even generalize to unseen table filters.

    5 min read   ·   March 23, 2026

  • Benchmarking Semantic Query Processing Systems

    Semantic query processing is emerging as a new layer atop relational engines, elevating LLM-backed semantic operators to first-class SQL primitives for multimodal data. We present SemBench, the first benchmark to rigorously evaluate these systems end-to-end, and outline our roadmap towards our own system, Spectra, to make semantic operators affordable at scale.

    13 min read   ·   February 16, 2026

  • Democratizing Data Science

    Our vision is to build an end-to-end agentic data platform, enabling domain experts to acquire, clean, analyze, and visualize data in a principled manner by combining the benefits of LLMs with decades of database research.

    5 min read   ·   January 16, 2026

  • Launching Our Blog And Wrapping Up 2025

    I'm super excited to launch our blog! We'll use this space to share what's happening in our lab, from research papers and systems to the day-to-day life of our team. To kick things off, let's look back at 2025.

    4 min read   ·   December 31, 2025

© Copyright 2026. Impressum. Last updated: May 12, 2026.