LoreonLabsPlatform

Red Hat

Matthew Bonanni

vLLM Maintainer | MLE at Red Hat | Stanford PhD '25 | HPC, C++, CUDA, LLM inference

Followers: 59
Public repos: 25
Stars (recent): 17
Ecosystems: 1

Projects

Repositories this builder owns.

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

flash-attention

Fast and memory-efficient exact attention

matthewbonanni.github.io

No description.

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Connected narratives

AI Agents, Consumer Crypto, SocialFi, Stablecoins, Agent Commerce, Onchain Apps

Related builders

Others building in the same ecosystem.

Interactive visualizer for transformer self-attention variants (MHA, GQA, MQA, MLA) with tensor shapes, FLOPs/memory cost analysis, and per-GPU roofline estimates

A simple GPU reservation tool for single host shared development systems

Common recipes to run vLLM

This repo hosts code for vLLM CI & Performance Benchmark infrastructure.

Recent activity

Most recently pushed work.

MatthewBonanni/vllm: pushed 8h ago
MatthewBonanni/flash-attention: pushed 8h ago
MatthewBonanni/matthewbonanni.github.io: pushed 10h ago
MatthewBonanni/pytorch: pushed 5d ago