KVarN: Native vLLM backend for KV-cache quantization by Huawei

github.com

133 points by theanonymousone 19 hours ago


throwa356262 - 18 hours ago

Better performance than TQ and better quality than FP16?

Am I reading this right??

v3ss0n - 18 hours ago

Why is this not a PR for vLLM?

0xjeffro - 12 hours ago

yao yao ling xian ("far ahead")