After months of rumors and speculation, OpenAI has announced GPT-4: the latest in its line of AI language models that power applications like ChatGPT and the new Bing.
The company claims the model is “more creative and
collaborative than ever before” and “can solve difficult problems with greater
accuracy.” It can parse both text and image input, though it can only respond
via text. OpenAI also cautions that
the systems retain many of the same problems as earlier language models,
including a tendency to make up information (or “hallucinate”) and the capacity
to generate violent and harmful text.
OpenAI says it’s
already partnered with a number of companies to integrate GPT-4 into their
products, including Duolingo and Stripe. The new model is available to the
general public via ChatGPT Plus, OpenAI’s
$20 monthly ChatGPT subscription, and is powering Microsoft’s Bing chatbot. It
will also be accessible as an API for
developers to build on. (There is a waitlist for API access, which OpenAI says it will start admitting users from today.)
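For developers who do get off the waitlist, calling GPT-4 will presumably look much like calling GPT-3.5 does today. The sketch below uses the openai Python package’s chat-completions interface with the “gpt-4” model name from the announcement; the prompt, key handling, and response handling are illustrative assumptions rather than anything from OpenAI’s documentation:

import os
import openai

# Illustrative sketch: assumes the account has been granted GPT-4 API access
# and that an API key is set in the OPENAI_API_KEY environment variable.
openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what GPT-4 can do in one sentence."},
    ],
)

# The reply text sits inside the first choice's message.
print(response["choices"][0]["message"]["content"])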
In a research blog post, OpenAI said the distinction between GPT-4 and its predecessor
GPT-3.5 is “subtle” in casual conversation (GPT-3.5 is the model that powers
ChatGPT). OpenAI CEO Sam Altman
tweeted that GPT-4 “is still flawed, still limited” but that it also “still
seems more impressive on first use than it does after you spend more time with
it.”
The company says GPT-4’s improvements are evident in the
system’s performance on a number of tests and benchmarks, including the Uniform
Bar Exam, LSAT, SAT Math, and SAT Evidence-Based Reading & Writing exams.
On those exams, GPT-4 scored in the 88th percentile and above; a full list of exams and the system’s scores is included in OpenAI’s announcement.
Speculation about GPT-4 and its capabilities has been rife
over the past year, with many suggesting it would be a huge leap over previous
systems. However, judging from OpenAI’s
announcement, the improvement is more iterative, as the company previously
warned.
“People are begging to be disappointed and they will be,”
said Altman in an interview about GPT-4 in January. “The hype is just like...
We don’t have an actual AGI and that’s sort of what’s expected of us.”
The rumor mill was further energized last week after a
Microsoft executive let slip, in an interview with the German press, that the
system would launch this week. The executive also suggested the system would
be multi-modal — that is, able to generate not only text but other mediums.
Many AI researchers believe that multi-modal systems that integrate text,
audio, and video offer the best path toward building more capable AI systems.
GPT-4 is indeed multimodal, but in fewer mediums than some
predicted. OpenAI says the system can
accept both text and image inputs and emit text outputs. The company says the
model’s ability to parse text and image simultaneously allows it to interpret
more complex input. In samples shared by OpenAI, the system explains
memes and unusual images.
It’s been a long journey to get to GPT-4, with OpenAI — and AI language models in general
— building momentum slowly over several years before rocketing into the
mainstream in recent months.
The original research paper describing GPT was published in
2018, with GPT-2 announced in 2019 and GPT-3 in 2020. These models are trained
on huge datasets of text, much of it scraped from the internet, which is mined
for statistical patterns. These patterns are then used to predict what word
follows another. It’s a relatively simple mechanism to describe, but the end
result is flexible systems that can generate, summarize, and rephrase writing,
as well as perform other text-based tasks like translation or generating code.
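As a rough illustration of that next-word idea, and nothing like the transformer networks GPT models actually use, a program can simply count which words follow which in a corpus and predict the most common follower:

from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows which in a tiny corpus.
# A deliberately simplified stand-in for the statistical patterns the article
# describes, not how GPT models are built.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def predict_next(word):
    # Return the most frequently observed follower of `word`, if any.
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # -> "cat", the most common word after "the" here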
OpenAI originally
delayed the release of its GPT models for fear they would be used for malicious
purposes like generating spam and misinformation. But in late 2022, the company
launched ChatGPT — a conversational chatbot based on GPT-3.5 that anyone could
access. ChatGPT’s launch triggered a frenzy in the tech world, with Microsoft
soon following it with its own AI chatbot Bing (part of the Bing search engine)
and Google scrambling to catch up.
As predicted, the wider availability of these AI language
models has created problems and challenges. The education system is still
adapting to the existence of software that writes respectable college essays;
online sites like Stack Overflow and sci-fi magazine Clarkesworld have had to
close submissions due to an influx of AI-generated content; and early uses of
AI writing tools in journalism have been rocky at best. But some experts have
argued that the harmful effects have still been less than anticipated.
In its announcement of GPT-4, OpenAI stressed that the system had gone through six months of
safety training, and that in internal tests, it was “82 percent less likely to
respond to requests for disallowed content and 40 percent more likely to
produce factual responses than GPT-3.5.”
However, that doesn’t mean the system doesn’t make mistakes
or output harmful content. For example, Microsoft revealed that its Bing
chatbot has been powered by GPT-4 all along, and many users were able to break
Bing’s guardrails in all sorts of creative ways, getting the bot to offer
dangerous advice, threaten users, and make up information. GPT-4 also still
lacks knowledge about events “that have occurred after the vast majority of its
data cuts off” in September 2021.