Down And Out With Cerebras Code

Out of Fireworks and into the fire

However, my start with Cerebras’s hosted Qwen was nothing like what I had experienced (for a lot more money) on Fireworks, another provider. Initially, Cerebras’s Qwen didn’t even work in my CLI. It also didn’t seem to work in Roo Code or any other tool I knew how to use. After taking my bug report, Cerebras told me it was my code. The same CLI that worked on Fireworks, for Claude, for GPT-4.1 and GPT-5, for o3, and for Qwen hosted by Qwen/Alibaba was at fault, said Cerebras. To be fair, my log did include misleading artifacts from when Cerebras fragmented the stream, putting out stream parts as messages (which Cerebras still does on occasion). However, this has generally been their approach: don’t fix their so-called OpenAI compatibility; blame the client, adapt the client, or both. I took the challenge and adapted my CLI, but it took a lot of workarounds. This was a massive contrast with Fireworks. I had issues with Fireworks when it started and showed them my debug output; they immediately acknowledged the problem (occasionally it would spit out corrupt, native tool calls instead of OpenAI-style output) and fixed it overnight. Cerebras repeatedly claimed their infrastructure was working perfectly and requests were all successful, in direct contradiction to most commentary on their Discord.

Feeling like I had finally cracked the nut after three weeks of on-and-off testing and adapting, I grabbed a second Cerebras Code Max account when the window opened again. This was after discovering that for part of the time, Cerebras had charged me for a Max account but given me a Pro account. They fixed it but offered no compensation for the days my service was set to Pro, not Max. The shortfall is difficult to prove because their analytics console is broken, in part because it reports measurements in local time while the limits are in UTC.

Then I did the math. One Cerebras Code Max account is limited to 120 million tokens per day and costs four times as much as a Cerebras Code Pro account. The Pro account is limited to 24 million tokens per day. Multiply that by four and you get 96 million tokens, somewhat less than the Max. However, the Pro account is limited to 300k tokens per minute (TPM), compared to 400k for the Max, and the per-minute cap is the one that hurts. Using Cerebras is a bit frustrating. For 10 to 20 seconds, it really flies, then you hit the cap on tokens per minute, and it throws 429 errors (too many requests) until the minute is up. If your coding tool is smart, it will just retry with an exponential back-off. If not, it will break the stream. So, had I bought four Pro accounts for the same money, I could have had 1,200,000 TPM in theory, three times the Max account’s 400k and a much better value.
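For readers whose tool does not do this already, here is a minimal sketch of retrying with exponential back-off when a request hits the per-minute cap. It is an illustration, not what any particular coding tool or the Cerebras API actually does: `RateLimitError`, `retry_with_backoff`, and `fake_request` are hypothetical names, and the starting delay, cap, and jitter values are assumptions chosen only to make the pattern visible.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 (too many requests) response."""


def retry_with_backoff(call, max_retries=6, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep, jitter=random.random):
    """Run call(); on RateLimitError wait and retry, doubling the wait each time."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the 429 to the caller
            # 1s, 2s, 4s, ... capped at max_delay (assumed: the cap clears within a minute)
            delay = min(max_delay, base_delay * 2 ** attempt)
            # add up to 10% jitter so parallel requests do not retry in lockstep
            sleep(delay + delay * 0.1 * jitter())


# Simulated provider (hypothetical): two 429s, then a successful response.
responses = iter([RateLimitError(), RateLimitError(), "ok"])


def fake_request():
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


waits = []  # record the delays instead of actually sleeping
result = retry_with_backoff(fake_request, sleep=waits.append, jitter=lambda: 0.0)
```

In a real client, `call` would wrap the actual request and raise on a 429 status. The `sleep` and `jitter` parameters exist only so the behavior can be checked without waiting.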
