Abstract
The emergence of ChatGPT has generated much speculation in the press about
its potential to disrupt social and economic systems. Its astonishing language
ability has aroused strong curiosity among scholars about its performance in
different domains. Many studies have evaluated the abilities of
ChatGPT and GPT-4 across different tasks and disciplines. However, a comprehensive
review summarizing the collective assessment findings is lacking. The objective
of this survey is to thoroughly analyze prior assessments of ChatGPT and GPT-4,
focusing on their language and reasoning abilities, scientific knowledge, and
ethical considerations. Furthermore, existing evaluation methods are
examined, and several recommendations are offered for future research on
evaluating large language models.