In brief
- Google said its TurboQuant algorithm can shrink a major AI memory bottleneck at least sixfold with no accuracy loss during inference.
- Memory stocks including Micron, Western Digital and Seagate fell after the paper circulated.
- The method compresses inference memory, not model weights, and has only been tested in research benchmarks.
On Wednesday, Google Research published TurboQuant, a compression algorithm that shrinks a major inference-memory