
Subquadratic claims to have solved the efficiency problem in large language models

by TechDefused Newsroom
A digital sign displaying the message 'EXPECT DELAYS' is prominently featured in an urban environment during low visibility conditions. The background includes a bank and a brick building, indicating a commercial area. Credit: Photo by Erik Mclean on Unsplash

Subquadratic, a Miami-based AI startup, released results from third-party testing claiming its SubQ model runs far faster and far cheaper than typical large language models while handling context windows up to 12 million tokens.

Appen, an independent testing firm, measured SubQ as 56 times faster than models using FlashAttention in a theoretical speed test. The model scored 89.7% on LiveCodeBench for coding and 98% on a long-context retrieval test at 6 million and 12 million token windows.

The cost comparison is striking. Running Nvidia's RULER 128 benchmark on Anthropic's Opus 4.6 costs $2,600. Subquadratic says the same test costs $8 on SubQ, a 325-fold reduction. If accurate, that is not an incremental improvement. That is a different category of efficiency.

CEO Justin Dangel said the company is "kicking off a new age of efficiency." The claim rests on replacing transformers' dense attention mechanism with dynamic sparse attention, which computes attention only for a selected subset of token pairs rather than across all pairs. Dense attention compares every token with every other token, so its cost grows with the square of the context length. In theory, skipping most pairs removes the quadratic compute bottleneck that has constrained long-context processing.

In theory.
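To see why the claim matters at these context lengths, it helps to count the token pairs each approach would have to score. The sketch below is a back-of-the-envelope illustration, not Subquadratic's method: the function names and the per-token budget of 4,096 keys are assumptions chosen for the example, since the company has not disclosed how SubQ selects pairs.

```python
# Back-of-the-envelope count of query-key pairs scored by attention.
# ASSUMPTION: keys_per_query is a made-up budget for illustration only;
# Subquadratic has not published how SubQ selects pairs.


def dense_pairs(n_tokens: int) -> int:
    """Dense attention scores every token against every token."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Sparse attention scores each token against a capped set of keys."""
    return n_tokens * min(keys_per_query, n_tokens)


if __name__ == "__main__":
    budget = 4096  # hypothetical per-token key budget
    for n in (128_000, 6_000_000, 12_000_000):
        dense = dense_pairs(n)
        sparse = sparse_pairs(n, budget)
        print(
            f"{n:>12,} tokens: dense {dense:.2e} pairs, "
            f"sparse {sparse:.2e} pairs, ratio {dense / sparse:,.0f}x"
        )
```

With a fixed budget, the sparse count grows in step with context length while the dense count grows with its square, so the gap widens as the window gets longer. The count leaves out the cost of deciding which pairs to keep, and that selection step is the part Subquadratic has not described.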

Opacity problem

Subquadratic declines to disclose the exact selection algorithm used for its dynamic sparse attention. That is a notable choice. If the company has solved a fundamental problem in AI efficiency, publishing the algorithm would establish credibility, attract talent and potentially command a premium valuation.

The silence suggests either that the solution is incremental rather than fundamental, or that the company believes it can maintain competitive advantage through secrecy. Neither is reassuring.

The company also bootstrapped SubQ from weights of an open-source Qwen model rather than training wholly from scratch. That means SubQ is an optimized version of an existing model, not a new architecture proving the approach works from first principles.

Access bottleneck

Subquadratic has kept access severely limited. The company says tens of thousands have joined a waitlist and more than 500 enterprises have signed up for early access. But very few people have live access to test the claims.

That is the real red flag. If SubQ delivers what the benchmarks claim, the company has every reason to let outsiders confirm it, and companies would fight for access. Instead, Subquadratic is rationing access, which suggests the company is controlling the narrative around what SubQ can actually do.

What remains to be proven

The third-party testing is real. Appen is a credible firm. The benchmarks are impressive on paper. But benchmarks are not products. A model that is 56 times faster in theoretical tests may not be 56 times faster in production workloads with real data and real constraints.

Subquadratic will eventually have to grant access to enough users that independent verification becomes possible. Until then, the efficiency gains remain claims, not proven capability.
