nCompass Technologies: Optimize performance on GPUs - 10x faster

Optimize performance on GPUs - 10x faster

Identifying performance bottlenecks and strategizing ways to solve them takes 4-8x longer than actually writing the code to fix them. We're building an agent that is an expert at analyzing the performance of GPU systems, such as inference engines, at all levels of the stack - from CPU-GPU interactions down to GPU kernels. Pairing our agent with Cursor / Claude Code allows you to automate both the reasoning and code implementation steps of performance optimization. What used to take weeks can now be done in days. Along with our AI agent, we offer trace diffs on system traces as well as sharing and collaboration features that make our VSCode extension the most powerful way to work on performance optimization.

Active Founders

Aditya Rajagopal

Co-Founder

I am a recent PhD graduate from Imperial College London with experience in machine learning algorithms, compilers and hardware architectures. I've worked in compiler teams at Qualcomm and Huawei and served as a reviewer for ICML. My co-founder and I are building nCompass, a platform for accelerating and hosting both open-source and custom large AI models. Our focus is on providing rate-unlimited and low-latency large AI inference with only one line of code.

Diederik Vink

Founder

I'm a recent Imperial College London PhD Graduate where I specialized in reconfigurable hardware architectures for accelerated machine learning and reduced precision training algorithms. I have worked as an AI feasibility consultant prototyping and evaluating AI spin-outs. We are building nCompass, a platform for accelerating and hosting both open-source and custom large AI models. Our focus is on providing rate-unlimited and low latency large AI inference with only one line of code.

Company Launches

nCompass - Optimize performance on GPUs, 10x faster

Hey everyone,

tldr;

I’m Aditya, co-founder of nCompass Technologies. We’re building a developer tool that unifies profiling, trace collaboration and trace analysis of AI systems. We automate performance optimization of AI systems across all levels of the infrastructure stack.

Using this tool, we implemented a Hopper GEMM kernel that outperformed NVIDIA's CUTLASS GEMMs by 3%, within a day - this took us months before.

Check out our product demo below. It’s free to use and you can get started today in VS Code, Cursor or Claude Code - Quick Start

https://www.youtube.com/watch?v=Q3Pq-BPU2Ec

THE PROBLEM

Identifying the root cause of performance bottlenecks is 4-8x slower than writing the code to fix them.

If you are optimizing a system like vLLM, you have to:

Run a profile and then copy a giant trace file to your local machine just to view it.
Spend hours identifying opportunities for performance improvement.
If this involves writing a kernel, you profile the kernel and then spend hours or days digging through ncu traces, which are massive data dumps.
Then you identify your bottlenecks and formulate a plan.

Running this loop till you have a performant system can take weeks, even months.

OUR SOLUTION

By building an AI agent that can analyze profiling data as well as interact with a bank of deep technical knowledge and expertise, we’re automating the process of identifying performance bottlenecks.

Now in a single VSCode interface, you can:

Open and view trace files
Use our novel tools like trace diffs to analyze them
Generate share links to easily share traces with team members
Feed source + profiling data to our AI agent and get back actionable analysis on how you can optimize the performance of your system.

This applies to both system-level and GPU kernel-level analysis, and our agent integrates directly into Cursor / Claude Code, so you never have to leave your normal workflow!
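
To make the idea of a trace diff concrete, here is a minimal sketch in Python. It is not how the nCompass extension works internally; it only illustrates the general technique, assuming both traces are in the Chrome Trace Event JSON format (a `traceEvents` list whose complete events have `"ph": "X"` and a `dur` in microseconds). The function names `total_time_by_name` and `diff_traces` and the sample event names are hypothetical.

```python
from collections import defaultdict


def total_time_by_name(trace: dict) -> dict:
    """Sum the duration (in microseconds) of complete events per event name."""
    totals = defaultdict(float)
    for event in trace.get("traceEvents", []):
        # Only "X" (complete) events carry a duration; skip metadata, counters, etc.
        if event.get("ph") == "X":
            totals[event["name"]] += event.get("dur", 0.0)
    return dict(totals)


def diff_traces(baseline: dict, candidate: dict) -> list:
    """Return (name, baseline_us, candidate_us, delta_us), largest change first."""
    before = total_time_by_name(baseline)
    after = total_time_by_name(candidate)
    rows = []
    for name in sorted(set(before) | set(after)):
        b = before.get(name, 0.0)
        a = after.get(name, 0.0)
        rows.append((name, b, a, a - b))
    # Surface the events whose total time moved the most, in either direction.
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows


if __name__ == "__main__":
    # Tiny hand-written traces standing in for real profiler output (assumed data).
    baseline = {"traceEvents": [
        {"name": "gemm_kernel", "ph": "X", "ts": 0, "dur": 120},
        {"name": "memcpy_h2d", "ph": "X", "ts": 120, "dur": 40},
    ]}
    candidate = {"traceEvents": [
        {"name": "gemm_kernel", "ph": "X", "ts": 0, "dur": 90},
        {"name": "memcpy_h2d", "ph": "X", "ts": 90, "dur": 40},
    ]}
    for name, b, a, delta in diff_traces(baseline, candidate):
        print(f"{name}: {b:.0f}us -> {a:.0f}us ({delta:+.0f}us)")
```

Aggregating by event name keeps the sketch short; a real diff would likely also need to align events by stream, thread, or call stack.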

Anyone can now write both correct and performant code with AI!

ASKS

Install our VSCode extension and start optimizing your system's performance!

We also offer FDE services. If you would like us to step in, analyze your system’s performance and estimate how much we could improve it, reach out at hello@ncompass.tech

Previous Launches

Optimize performance on GPUs - 10x faster

nCompass Technologies: Reliable LLM API with no rate-limits

nCompass Technologies: Realtime audio denoising

nCompass Technologies - Low-latency deployment of AI models made easy