
Overreaching commitments?

Anticipation for the latest ChatGPT upgrade ran high, yet numerous users have expressed dissatisfaction. In our testing, we've discovered the AI's areas of strength and weakness.


In a recent announcement, OpenAI unveiled its latest language model, GPT-5. The new model aims to revolutionise language processing and promises significant improvements over its predecessor, GPT-4. However, some users question whether these claims are overstated and whether GPT-5 truly delivers on them.

According to OpenAI, GPT-5 is designed to handle more complex tasks, write more beautifully, and program better. The company claims measurable advances in reasoning, accuracy, efficiency, safety, personalization, and multimodal capabilities.

The key areas where GPT-5 is said to outperform GPT-4 include:

  • Reasoning and Problem Solving: GPT-5’s “deep thinking mode” approaches expert-level problem solving, far surpassing GPT-4’s capabilities on complex reasoning tasks, including math and coding.
  • Accuracy: GPT-5 produces up to 80% fewer factual errors than GPT-4, with large gains on health, math, and coding benchmarks.
  • Efficiency: It completes complex tasks using half as many output tokens as previous models, improving speed and reducing cost without sacrificing quality.
  • Safety and Trustworthiness: GPT-5 reduces over-agreeable responses by more than 50% and is better at honestly acknowledging its limitations.
  • Personalization: It introduces four preset personalities (Cynic, Robot, Listener, Nerd) and supports quick switching of tone and style without elaborate instructions, making it more adaptable to each user's style.
  • Multimodal Abilities: GPT-5 handles inputs combining text, images, screenshots, diagrams, and even video frames much better than GPT-4, enabling richer and more accurate interpretation and responses.
  • Instruction Following and Tool Use: GPT-5 better follows complex, evolving instructions and coordinates multi-step requests across tools, leading to more reliable end-to-end task completion.
  • Real-World Task Performance: GPT-5 achieves state-of-the-art results on benchmarks in math, coding, multimodal understanding, health, and human-like interaction.
  • Human-Like Interaction: Compared side-by-side, GPT-5 produces responses that are more methodical, realistic, and emotionally nuanced, making interactions feel more “human” and expressive.

Users are eager to put GPT-5 to practical use, but how the model actually performs on hallucination, programming, writing, and complex tasks remains to be seen. The debate over the gap between developers' promises and the actual performance of advanced language models continues.

How well GPT-5 performs, and how satisfied its users are, will play a significant role in shaping the future of language model development. As users begin to interact with GPT-5, it will become clear whether it lives up to the hype and delivers on its promises.

