
The Atlas accelerator from Positron AI is claimed to outperform Nvidia's H200 in inference while drawing only a third of the power, processing 280 tokens per second per user on the Llama 3.1 8B model within a 2000W power envelope.

Cloudflare is testing Positron AI's Atlas machine, a power-efficient inference-only system that supposedly surpasses Nvidia's H200 DGX in performance while using only a third of the power.


Positron AI, a US-based company specialising in AI accelerators, is making waves in the AI industry with its latest product, the Atlas system. This custom AI inference accelerator is claimed to deliver substantially better performance and power efficiency than Nvidia's DGX H200 system.

Atlas, Positron AI's flagship product, delivers around 280 tokens per second per user at 2000W, while Nvidia's 8-way DGX H200 system achieves about 180 tokens per second per user but consumes 5900W. This translates to Atlas having roughly 3 to 4.5 times better performance per watt and about 3 times better value for money compared to the Nvidia system.
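The figures quoted above can be sanity-checked with simple arithmetic; the per-user token rates and system power draws are the article's numbers, and the resulting ratio lands at roughly the top of the quoted "3 to 4.5 times" range:

```python
# Back-of-the-envelope check using the article's figures.
atlas_tps, atlas_watts = 280, 2000   # Positron Atlas: tokens/s per user, system power
dgx_tps, dgx_watts = 180, 5900       # Nvidia 8-way DGX H200: tokens/s per user, system power

atlas_perf_per_watt = atlas_tps / atlas_watts   # 0.14 tokens/s per watt
dgx_perf_per_watt = dgx_tps / dgx_watts         # ~0.0305 tokens/s per watt

ratio = atlas_perf_per_watt / dgx_perf_per_watt
print(f"Atlas advantage: {ratio:.2f}x performance per watt")  # ~4.59x
```

The lower end of the quoted range presumably reflects workloads where Atlas's per-user throughput advantage is smaller.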

One of the key features of Atlas is its highly efficient memory usage. It uses over 93% of its memory bandwidth, compared to a typical GPU's 10-30%, enabling about 70% faster throughput while consuming two-thirds less power than Nvidia’s top chips.
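Bandwidth utilization matters because LLM decoding is memory-bound: each generated token must stream the model's weights from memory, so throughput scales roughly with *effective* bandwidth, not peak bandwidth. A minimal sketch of that relationship follows; the peak bandwidth figure is an illustrative assumption, not a Positron or Nvidia specification:

```python
# Illustrative only: why bandwidth utilization dominates memory-bound decode.
# The peak bandwidth below is an assumed round number, not a vendor spec.
peak_bw_gbps = 4000          # hypothetical peak memory bandwidth, GB/s
model_bytes = 16e9           # Llama 3.1 8B at FP16 is roughly 16 GB of weights

def decode_tps(utilization):
    """Rough upper bound: each generated token streams all weights once."""
    return peak_bw_gbps * 1e9 * utilization / model_bytes

high = decode_tps(0.93)      # >93% utilization, as claimed for Atlas
low = decode_tps(0.25)       # mid-range of the "typical GPU" 10-30% figure
print(f"{high:.0f} vs {low:.0f} tokens/s at the same peak bandwidth")
```

Under this simplified model, the same silicon delivers several times the throughput purely by keeping the memory bus busy, which is the efficiency argument the article is making.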

Atlas is built specifically for AI inference tasks, optimising power consumption and throughput. Unlike Nvidia GPUs that target diverse workloads including training, Atlas is designed from the ground up for inference tasks. It is also compatible with all Hugging Face transformer models and provides an OpenAI-compatible API for easy integration with existing AI workflows.
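An OpenAI-compatible API means existing clients only need a new base URL: the request and response shapes follow the standard `/chat/completions` schema. A minimal sketch is below; the endpoint host is a hypothetical placeholder, and the model name assumes the box is serving a Llama 3.1 8B checkpoint from Hugging Face:

```python
import json
import urllib.request

# Hypothetical endpoint; Positron's documentation would give the real host.
BASE_URL = "http://atlas.example.internal/v1"

# Standard OpenAI-style chat payload; any Hugging Face transformer model the
# system serves would be named here.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
}
request = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(request) would send it; omitted here because the
# endpoint above is a placeholder.
```

The same swap works with the official OpenAI client libraries, which accept a custom base URL for exactly this kind of drop-in compatibility.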

The ASIC chips in Atlas are produced at TSMC's Fab 21 in Arizona using N4/N5 processes, and the system is assembled in the USA, underscoring its domestic supply chain. Thanks to its energy efficiency and architectural design, Atlas could reduce data center capital expenditure (CapEx) by about 50% compared with Nvidia H100/H200 systems.

Cloud service provider Cloudflare is currently testing Positron AI's Atlas for AI inference. Positron's next-generation system, Titan, built around eight of its 2nd-generation "Asimov" AI accelerators with 16 TB of memory in total, is designed to run models with up to 16 trillion parameters on a single machine and is expected to compete against Nvidia's Vera Rubin platforms in 2026.

As the power demands of the AI industry continue to escalate, with some AI model training clusters consuming the same power as cities, solutions like Atlas that offer efficient inference capabilities and minimal power consumption are becoming increasingly important.

| Feature | Positron AI Atlas | Nvidia DGX H200 |
|------------------------------|---------------------------------------------|----------------------------------|
| Tokens per second per user | ~280 | ~180 |
| Power consumption | 2000W | 5900W |
| Performance per watt | ~3 to 4.54 times higher | Baseline |
| Memory bandwidth utilization | >93% | Typically 10-30% |
| Supported models | All Hugging Face transformers | Wide range, general-purpose GPUs |
| Cost efficiency | 3.08x better value (performance-per-dollar) | Baseline |

In summary, Positron AI's Atlas offers a specialized AI inference solution with substantially better power efficiency and performance per watt than Nvidia's H200, making it a compelling alternative for data centers focused on AI inference workloads.


