The Chicago Journal

ChatGPT: performance has fallen following several updates

ChatGPT — OpenAI startled the world in late 2022 when it released ChatGPT, a revolutionary AI language model. The AI service’s success paved the way for a one-of-a-kind AI race, with hundreds of tech companies vying to emulate it.

Despite early criticism, OpenAI has continued to update the service, aiming to make the language model as error-free as possible. After a few tweaks, ChatGPT appeared to have found its stride.

The most recent version of the AI language model pioneer caused a cryptocurrency increase, prompting calls to halt research. According to a new study, however, the chatbot may have hit a snag, and its performance has declined.

The ChatGPT study

Stanford and UC Berkeley researchers compared versions of ChatGPT released between March and June 2023. They devised demanding benchmarks to evaluate the chatbot's ability to handle coding, math, and visual reasoning tasks. On several of them, the newer versions performed markedly worse.

The results of the testing indicated a concerning drop in performance between the versions examined. ChatGPT properly answered 488 of 500 questions on prime numbers during a March math challenge, obtaining a 97.6% accuracy rate. The proportion had dropped to 2.4% by June, with only 12 questions properly answered.
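The arithmetic behind those figures can be checked in a few lines. This is an illustrative sketch only: the function name `accuracy_pct` is an assumption made for this example and does not come from the study.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Return the share of correctly answered questions as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Counts reported in the article for the prime-number questions.
march = accuracy_pct(488, 500)
june = accuracy_pct(12, 500)
drop = march - june  # difference in percentage points

print(f"March: {march:.1f}%  June: {june:.1f}%  drop: {drop:.1f} points")
# prints "March: 97.6%  June: 2.4%  drop: 95.2 points"
```

Put another way, the June version answered correctly on roughly one question for every forty the March version got right.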

The drop in quality was also apparent when the chatbot's software development skills were put to the test.

“For GPT-4, the percentage of generations that are directly executable dropped from 52.0% in March to 10.0% in June,” the study said.

The results were obtained by running the models in their base form, without any code interpreter plugins.
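To give a rough sense of what a "directly executable" check might look like, here is a minimal sketch. It assumes the generated code is Python and uses a syntax check as a simplified stand-in; the researchers' actual evaluation harness is not described in this article, and the names `is_directly_executable` and `executable_rate` are hypothetical.

```python
# Markdown code-fence marker, built from single backticks for readability.
FENCE = "`" * 3


def is_directly_executable(generation: str) -> bool:
    """Return True if the raw model output parses as Python source as-is.

    Assumption: this only checks syntax with the built-in compile().
    It does not run the code or judge whether the answer is correct.
    """
    try:
        compile(generation, "<generation>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def executable_rate(generations: list[str]) -> float:
    """Percentage of generations that pass the syntax check above."""
    if not generations:
        raise ValueError("need at least one generation")
    passed = sum(is_directly_executable(g) for g in generations)
    return 100.0 * passed / len(generations)


# Two made-up outputs: bare code, and the same code wrapped in extra text.
plain = "def add(a, b):\n    return a + b\n"
wrapped = f"Here is the solution:\n{FENCE}python\n{plain}{FENCE}\n"

print(executable_rate([plain, wrapped]))  # prints 50.0
```

The second sample contains the same function, but the surrounding sentence and fence markers mean it cannot be run without cleanup, so a strict check of this kind counts it as a failure.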

To test reasoning, the researchers used visual prompts and a dataset from the Abstract Reasoning Corpus. Performance declined here as well, though not as sharply as in math and coding.

“GPT-4 in June made mistakes on queries on which it was correct for in March,” the study said.

Possible reasons for the decline

The drop was unexpected, raising the question, “What could explain ChatGPT’s painfully obvious downgrades in recent months?” One possible explanation is that it was a side effect of OpenAI’s own changes to the model.

Another likely explanation is that the changes were made to prevent ChatGPT from responding to potentially dangerous queries. However, aligning the model for safety may limit its usefulness for other tasks.

The model, according to the researchers, now tends to produce wordy, indirect answers rather than plain ones.

According to AI researcher Santiago Valderrama, “GPT-4 is getting worse over time, not better.” He also believes that the original ChatGPT architecture has been replaced by a cheaper, faster mix of models.

“Rumors suggest they are using several smaller and specialized GPT-4 models that act similarly to a large model but are less expensive to run,” he noted. 

According to Valderrama, while smaller models may deliver faster responses, they do so at the expense of capability.

“There are hundreds (maybe thousands already?) of replies from people saying they have noticed the degradation in quality,” Valderrama continued. “Browse the comments, and you’ll read about many situations where GPT-4 is not working as before.”

Other insights

Another AI researcher, Dr. Jim Fan, tweeted some of his own observations, attempting to connect the study's results to the way OpenAI fine-tunes its models.

“Unfortunately, more safety typically comes at the cost of less usefulness, leading to a possible degrade in cognitive skills,” he wrote.

“My guess (no evidence, just speculation) is that OpenAI spent the majority of efforts doing lobotomy from March to June, and didn’t have time to fully recover the other capabilities that matter.”

Fan went on to claim that safety alignment made code output needlessly long by adding information unrelated to the prompt.

“I believe this is a side effect of safety alignment,” he offered. “We’ve all seen GPTs add warnings, disclaimers, and back-pedaling.”

Fan attributed ChatGPT’s decline to cost-cutting efforts, as well as the addition of disclaimers and warnings. A lack of significant community feedback may also have played a role. Although more testing is needed, the findings lent weight to users’ concerns about the declining quality of ChatGPT’s once highly regarded outputs.

Some have called for open-source alternatives such as Meta’s LLaMA, which allow for community debugging, to guard against future degradation. They also emphasized the need for continuous benchmarking to catch regressions.
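Continuous benchmarking of this kind could, in principle, be as simple as comparing scores between model snapshots and flagging large drops. The sketch below is one hypothetical way to do that: the task labels, the `find_regressions` helper, and the 5-point threshold are all assumptions made for illustration, and only the four percentages come from the figures quoted in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regression:
    task: str
    before: float  # score of the earlier snapshot, in percent
    after: float   # score of the later snapshot, in percent

    @property
    def drop(self) -> float:
        """Size of the decline in percentage points."""
        return self.before - self.after


def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    max_drop: float = 5.0,  # hypothetical tolerance, in percentage points
) -> list[Regression]:
    """Return tasks whose score fell by more than max_drop, worst first."""
    regressions = []
    for task, before in baseline.items():
        if task not in candidate:
            continue  # skip tasks the newer snapshot was not scored on
        after = candidate[task]
        if before - after > max_drop:
            regressions.append(Regression(task, before, after))
    return sorted(regressions, key=lambda r: r.drop, reverse=True)


# Percentages quoted in the article; the task labels are made up here.
march = {"prime_numbers": 97.6, "executable_code": 52.0}
june = {"prime_numbers": 2.4, "executable_code": 10.0}

for r in find_regressions(march, june):
    print(f"{r.task}: {r.before:.1f}% -> {r.after:.1f}% ({r.drop:.1f} points)")
```

Run on every new model release, a check like this would surface a decline soon after it appears rather than months later.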

Meanwhile, ChatGPT aficionados should temper their expectations, because the quality of the chatbot’s underlying language model appears to have dropped.