Memory swizzling is the quiet tax that every hierarchical-memory accelerator pays. It is fundamental to how GPUs, TPUs, NPUs, ...
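To make the term concrete, here is a minimal sketch of one common swizzling scheme: an XOR-based shared-memory bank swizzle in a CUDA tile transpose. The kernel and helper names (`swizzle`, `transpose_swizzled`, `TILE`) are illustrative choices for this sketch, not taken from any particular library; the technique itself (XOR-ing column bits with row bits so a column-wise walk through a tile hits 32 distinct banks instead of one) is the textbook alternative to padding the tile with an extra column.

```cuda
#define TILE 32  // 32x32 float tile; shared memory has 32 four-byte banks

// XOR the column index with the row index (mod 32). A linear layout maps
// column c of every row to bank c; after the XOR, column c of row r lands
// in bank (c ^ r), so reading one column touches 32 different banks.
__device__ __forceinline__ int swizzle(int row, int col) {
    return (col ^ row) & (TILE - 1);
}

__global__ void transpose_swizzled(const float* in, float* out, int n) {
    __shared__ float tile[TILE][TILE];  // no padding column needed

    // Coalesced load: logical element (row, col) of the tile is stored at
    // physical column swizzle(row, col).
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n)
        tile[threadIdx.y][swizzle(threadIdx.y, threadIdx.x)] = in[y * n + x];
    __syncthreads();

    // Coalesced store of the transpose: consecutive threads read a "column"
    // of the tile, which the swizzle has spread across all 32 banks.
    int tx = blockIdx.y * TILE + threadIdx.x;
    int ty = blockIdx.x * TILE + threadIdx.y;
    if (tx < n && ty < n)
        out[ty * n + tx] = tile[threadIdx.x][swizzle(threadIdx.x, threadIdx.y)];
}
```

The point of the sketch is the address mapping, not the transpose: the same `physical_col = logical_col ^ logical_row` trick, with the bits shifted to whatever granularity the hardware banks on, is what the swizzled layouts baked into GPU tensor-core and texture paths are doing under the hood.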