
The Atlas accelerator from Positron AI is claimed to outperform Nvidia's H200 in inference while drawing only a third of the power, processing 280 tokens per second per user on the Llama 3.1 8B model within a 2000W power envelope.

Cloudflare is testing Positron AI's Atlas machine, a power-efficient inference-only system that supposedly surpasses Nvidia's H200 DGX in performance while using only a third of the power.


Positron AI, a US-based company specialising in AI accelerators, is making waves in the AI industry with its latest product, the Atlas system. This custom AI inference accelerator is claimed to deliver substantially better performance and power efficiency than Nvidia's DGX H200 system.

Atlas, Positron AI's flagship product, delivers around 280 tokens per second per user at 2000W, while Nvidia's 8-way DGX H200 system achieves about 180 tokens per second per user but consumes 5900W. This translates to Atlas having roughly 3 to 4.5 times better performance per watt and about 3 times better value for money compared to the Nvidia system.
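The figures quoted above can be sanity-checked with simple arithmetic; the per-user token rates and system power draws are the article's numbers, and the resulting ratio lands at roughly the top of the quoted "3 to 4.5 times" range:

```python
# Back-of-the-envelope check using the article's figures.
atlas_tps, atlas_watts = 280, 2000   # Positron Atlas: tokens/s per user, system power
dgx_tps, dgx_watts = 180, 5900       # Nvidia 8-way DGX H200: tokens/s per user, system power

atlas_perf_per_watt = atlas_tps / atlas_watts   # 0.14 tokens/s per watt
dgx_perf_per_watt = dgx_tps / dgx_watts         # ~0.0305 tokens/s per watt

ratio = atlas_perf_per_watt / dgx_perf_per_watt
print(f"Atlas advantage: {ratio:.2f}x performance per watt")  # ~4.59x
```

The lower end of the quoted range presumably reflects workloads where Atlas's per-user throughput advantage is smaller.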

One of the key features of Atlas is its highly efficient memory usage. It uses over 93% of its memory bandwidth, compared to a typical GPU's 10-30%, enabling about 70% faster throughput while consuming two-thirds less power than Nvidia’s top chips.
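Bandwidth utilization matters because LLM decoding is memory-bound: each generated token must stream the model's weights from memory, so throughput scales roughly with *effective* bandwidth, not peak bandwidth. A minimal sketch of that relationship follows; the peak bandwidth figure is an illustrative assumption, not a Positron or Nvidia specification:

```python
# Illustrative only: why bandwidth utilization dominates memory-bound decode.
# The peak bandwidth below is an assumed round number, not a vendor spec.
peak_bw_gbps = 4000          # hypothetical peak memory bandwidth, GB/s
model_bytes = 16e9           # Llama 3.1 8B at FP16 is roughly 16 GB of weights

def decode_tps(utilization):
    """Rough upper bound: each generated token streams all weights once."""
    return peak_bw_gbps * 1e9 * utilization / model_bytes

high = decode_tps(0.93)      # >93% utilization, as claimed for Atlas
low = decode_tps(0.25)       # mid-range of the "typical GPU" 10-30% figure
print(f"{high:.0f} vs {low:.0f} tokens/s at the same peak bandwidth")
```

Under this simplified model, the same silicon delivers several times the throughput purely by keeping the memory bus busy, which is the efficiency argument the article is making.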

Atlas is built specifically for AI inference tasks, optimising power consumption and throughput. Unlike Nvidia GPUs that target diverse workloads including training, Atlas is designed from the ground up for inference tasks. It is also compatible with all Hugging Face transformer models and provides an OpenAI-compatible API for easy integration with existing AI workflows.
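An OpenAI-compatible API means existing clients only need a new base URL: the request and response shapes follow the standard `/chat/completions` schema. A minimal sketch is below; the endpoint host is a hypothetical placeholder, and the model name assumes the box is serving a Llama 3.1 8B checkpoint from Hugging Face:

```python
import json
import urllib.request

# Hypothetical endpoint; Positron's documentation would give the real host.
BASE_URL = "http://atlas.example.internal/v1"

# Standard OpenAI-style chat payload; any Hugging Face transformer model the
# system serves would be named here.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
}
request = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(request) would send it; omitted here because the
# endpoint above is a placeholder.
```

The same swap works with the official OpenAI client libraries, which accept a custom base URL for exactly this kind of drop-in compatibility.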

The ASIC chips in Atlas are produced at TSMC's Fab 21 in Arizona using N4/N5 processes, and the system is assembled in the USA, underscoring its domestic supply chain. Thanks to its energy efficiency and architectural design, Atlas could reduce data center capital expenditure (CapEx) by about 50% compared with Nvidia H100/H200 systems.

Cloud service provider Cloudflare is currently testing Positron AI's Atlas for AI inference. Positron's next-generation system, Titan, built around eight of its 2nd-generation "Asimov" AI accelerators with 16 TB of memory in total, is designed to run models with up to 16 trillion parameters on a single machine and is expected to compete against Nvidia's Vera Rubin platforms in 2026.

As the power demands of the AI industry continue to escalate, with some AI model training clusters consuming the same power as cities, solutions like Atlas that offer efficient inference capabilities and minimal power consumption are becoming increasingly important.

| Feature | Positron AI Atlas | Nvidia DGX H200 |
|------------------------------|---------------------------------------------|----------------------------------|
| Tokens per second per user | ~280 | ~180 |
| Power consumption | 2000W | 5900W |
| Performance per watt | ~3 to 4.54 times higher | Baseline |
| Memory bandwidth utilization | >93% | Typically 10-30% |
| Supported models | All Hugging Face transformers | Wide range, general-purpose GPUs |
| Cost efficiency | 3.08x better value (performance-per-dollar) | Baseline |

In summary, Positron AI's Atlas offers a specialized AI inference solution with substantially better power efficiency and performance per watt than Nvidia's H200, making it a compelling alternative for data centers focused on AI inference workloads.


