Launch HN: General Instinct (YC P26) – Frontier models on edge devices

BoorishBears · 2026-06-05T18:14:49 1780683289

I like the technique described here of using distillation to recover from quantization loss, but I don't understand why we keep applying lossy compression to LLMs and then measuring the effects with benchmarks that were nearly saturated before post-training.

You could erase the gains from literally half the compute going into some of these recent models and barely make a dent in MMLU-Pro and GPQA-D.

rdksu · 2026-06-05T18:28:00 1780684080

Have you run ablations on how much on-policy distillation actually contributes to the performance? Just curious, since Unsloth-based mixed quantisation methods for MoE models are widely used and have a great reputation in the community.

VikRubenfeld · 2026-06-05T16:50:37 1780678237

You've likely heard about this - he'd probably like to talk to you and might potentially give you some good PR.

https://www.youtube.com/watch?v=rAzT5lcezPs&t=467s

smokel · 2026-06-05T17:29:11 1780680551

For those too lazy to watch someone talk on video for ages to make a point:

The link is to a famous YouTuber called PewDiePie, who uses a local LLM to parse his email and save time. They have an autoreply system and get notified about urgent matters.

guanming0717 · 2026-06-05T16:59:04 1780678744

Thanks for sharing! I'd love to chat with him. Would you be open to introducing us? :)

XenophileJKO · 2026-06-05T17:34:33 1780680873

I'm still kind of surprised that people are targeting edge deployment of MoE models. By definition they optimize for computation cost at the expense of memory efficiency. We generally need the opposite on the edge.

I'm hoping to see more work in the other direction, with cyclic/looped transformers and other memory-dense approaches.
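
To put rough numbers on the compute-versus-memory trade-off above, here is a back-of-the-envelope sketch. It assumes the 26B total / 4B active split implied by the name Gemma-4-26B-A4B, a flat 3 bits per weight, and the common rule of thumb of about 2 FLOPs per active parameter per token. It ignores the KV cache, activations, and quantization metadata, and `moe_footprint` is a hypothetical helper, not anything from the post.

```python
def moe_footprint(total_params, active_params, bits_per_weight):
    """Return (weight memory in GB, approximate FLOPs per generated token).

    Assumptions: every weight is stored at the same bit width, and a forward
    pass costs roughly 2 FLOPs per *active* parameter (rule of thumb only).
    """
    weight_bytes = total_params * bits_per_weight / 8
    flops_per_token = 2 * active_params
    return weight_bytes / 1e9, flops_per_token


# Assumed sizes: MoE with 26B total / 4B active parameters, versus a
# hypothetical dense model with the same 4B parameters active per token.
moe_gb, moe_flops = moe_footprint(26e9, 4e9, 3)
dense_gb, dense_flops = moe_footprint(4e9, 4e9, 3)

print(f"MoE:   {moe_gb:.2f} GB of weights, {moe_flops:.1e} FLOPs/token")
print(f"Dense: {dense_gb:.2f} GB of weights, {dense_flops:.1e} FLOPs/token")
print(f"Memory ratio at equal compute: {moe_gb / dense_gb:.1f}x")
```

Under these assumptions the two models cost about the same compute per token, but the MoE has to keep every expert resident, so its weights take several times more memory. That is the mismatch with memory-constrained edge hardware.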

rohansood15 · 2026-06-05T17:40:34 1780681234

Have you benchmarked against other 3-bit dynamic quants like Unsloth? I'm sorry, but this framing against a full-precision, newer, smaller MoE just seems misleading. Also, Gemma-4-26B-A4B is not the SOTA for edge. Even at launch, that would be the 31B.

guanming0717 · 2026-06-05T17:44:31 1780681471

Yes, I did, against other SOTA quant methods such as HQQ and AWQ. You can find more info in our blog :) https://general-instinct.com/blog/frontier-moe-sub-4-bit

rohansood15 · 2026-06-05T17:58:55 1780682335

I can't find it. Can you state your performance versus comparable 3-bit quantization from Unsloth/Bartowski? Edit: I appreciate that you seem to have open-sourced the quantization pipeline. This is not to question your work, but to understand where the outputs stand relative to the SoTA for quantization.