In August, a blog post about personal productivity surfaced at the top of Hacker News, a website and message board well known in Silicon Valley circles that focuses on computer science and entrepreneurship.
There was nothing particularly original about the post. It offered simple, relatively common advice. Some readers discussed its merits in comments and offered their own thoughts about how to be more productive.
A few people, however, found the post a little suspicious.
“This is either something written by GPT-3, or the human equivalent,” one person commented. “Zero substantive content, pure regurgitation.”
Another person had the same assessment: “I think this was written by GPT-3.”
It turns out that they were right. The blog post was written almost entirely by a piece of software called Generative Pre-trained Transformer 3, or GPT-3. Liam Porr, a computer science student at the University of California, Berkeley, used the new machine learning model to generate the post with the intention of fooling the public into believing it was the product of a human mind.
He wanted to see just how good GPT-3 is.
“With something like a language model, there’s not a good quantitative way to see how good it is, because language and writing is qualitative,” Porr said. “With this kind of experiment, I can concretely say 20,000 unique people came to my website and only three actually had the sense to say it was written by a robot.”
GPT-3 is not the first natural language program of its kind, but it has already drawn widespread attention for how well it mimics simple human writing. Its release into the world, while not entirely public, has caused concern that it could be used to generate misinformation or propaganda quickly and cheaply. Porr's post, though a harmless experiment, offered a concrete example of the risk.
That places GPT-3 alongside other advanced software that has spread across the internet and caused alarm. Deepfake technology, which can produce doctored videos of real people, has become common enough to spur congressional hearings.
But the technology has also been warmly welcomed by some technologists who are already using it to automate parts of their operations.
OpenAI, an artificial intelligence research lab, announced GPT-3 in July. To simulate human language, the autocomplete model was trained on a massive dataset: text from 60 million internet domains and the sites they link to, along with other sites and text the researchers fed it.
The program can’t think for itself. Instead, it takes a short prompt from a person and predicts what text should come next.
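That "guess what comes next" behavior can be illustrated with a toy model. The sketch below is emphatically not GPT-3; it is a minimal bigram-based autocomplete, trained on three sentences and written with only the Python standard library, but it demonstrates the same underlying idea: predict the most likely next word given the word before it.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for each word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows

def autocomplete(follows, prompt, max_words=5):
    """Repeatedly append the most frequent next word to the prompt."""
    words = prompt.lower().split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # no data on what follows this word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

corpus = [
    "the model predicts the next word",
    "the model was trained on text",
    "the next word is predicted from context",
]
follows = train_bigrams(corpus)
print(autocomplete(follows, "the model", max_words=3))
```

GPT-3 does something conceptually similar at enormously larger scale, conditioning on far more context than the single previous word and drawing on hundreds of billions of words of training text rather than three sentences.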
People interested in experimenting with the language generator can request access from the research lab. Porr himself said it was pretty easy for him to gain access thanks to his connections to his college machine-learning community. OpenAI has also made GPT-3’s API commercially available.
Based on the numerous use cases popping up online, everyone from hobbyists to machine learning experts to a friend of a friend of a machine learning expert has managed to gain access to the simple yet powerful piece of tech without much trouble.
Francis Jervis, founder of Augrented, a startup that helps tenants assert their rights, is one of those who gained early access to GPT-3.
Jervis uses the tool to help tenants automatically draft letters to their landlords. Renters enter four or five bullet points arguing against their eviction, and GPT-3 generates a paragraph that fits into a negotiation letter template. Jervis notes that users should still fact-check the text the language model generates.
“It’ll occasionally add in creative details, which might not be 100 percent appropriate to use in this kind of context,” he said. “This is why I wouldn’t use it for DIY legal documentation.”
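A workflow like the one Jervis describes can be sketched as a prompt template: the renter's bullet points are folded into an instruction, and the model's completion is then dropped into a fixed letter. The snippet below is a hypothetical illustration only; `render_prompt` and the template wording are assumptions for the sake of the example, not Augrented's actual code.

```python
def render_prompt(bullet_points):
    """Fold a renter's bullet points into one instruction for the model.

    Hypothetical template for illustration, not Augrented's real prompt.
    """
    bullets = "\n".join(f"- {point}" for point in bullet_points)
    return (
        "Rewrite the following points as one polite, factual paragraph "
        "for a letter disputing an eviction:\n"
        + bullets
        + "\nParagraph:"
    )

prompt = render_prompt([
    "rent was paid on time every month",
    "no lease terms were violated",
    "the notice gave less than the required 30 days",
])
print(prompt)
```

The model's generated paragraph would then be pasted into the letter template and, as Jervis cautions, fact-checked by the renter before it is sent, since the model can invent details that were never in the bullet points.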
Qasim Munye, who is studying medicine at King’s College London, applied for access to GPT-3 as soon as it was released and was approved a few weeks later. He built the tool into his existing app, Shortly, which helps users write short stories.
“For me it’s like, I might not want to use this tech where truth is important,” he said. “A lot of the times it will be very confidently wrong, a very wrong answer but in a confident way, so I wanted a use case where truth wasn’t important, and fiction is one example of that.”
GPT-3’s potential to be profoundly inaccurate hasn’t stopped those with access from using it as a source of knowledge and enlightenment. Learn From Anyone, a piece of software that creates one-on-one conversations with famous or historical figures, is powered by GPT-3.
“Ever wanted to learn about rockets from Elon Musk? How to write better from Shakespeare? Philosophy from Aristotle? GPT-3 made it possible,” founder McKay Wrigley tweeted in July.
And there aren’t just issues of accuracy to consider. GPT-3 was trained on a vast expanse of the internet, including nearly 500 billion words drawn from sources such as Wikipedia, fan fiction and Reddit. As has been repeatedly shown, the internet is rife with bias and discrimination, which can get baked into automated systems.
Animashree Anandkumar, a professor of computing at the California Institute of Technology and director of machine learning research at Nvidia, a computer graphics chip company, said that as a researcher working in AI, she wants these kinds of models to be used for the benefit of humanity, and that doing so means rebuilding the foundations of the industry itself.
She noted that GPT-3’s use of parts of the internet to train its systems, such as Reddit, can introduce biases.
“It was certainly not a minority person who decided to use links from Reddit,” she said, adding that she has been threatened on the news aggregation site. She pointed out that a decision like that highlights the lack of diversity in the teams that build these technologies.
“There are already many examples of hiring apps discriminating against women and minorities based on indirect cues,” Anandkumar said. “If this is used as a way to generate text, it would only be generating certain types of gendered language and not handle applications that require an unbiased approach to gender, race, religion and many other attributes.”
OpenAI declined to comment, but did address the potential harms and biases of its AI model in a blog post.
Anandkumar said GPT-3 can’t replace a human journalist because it doesn’t know the facts of the world and cannot differentiate fact from fiction. She said it could become a ready source of fake news, and that while fiction can be a fun setting in which to see what the tool generates, issues remain even there.
“If you are constantly looking at stories GPT-3 generates and it’s gendered and it has sexism and racism, that furthers as a positive feedback loop,” she said.
“I worry if we are constantly exposed to AI models like GPT-3 that are biased, that further amplify the bias and continue to propagate it, what is the impact on human psychology?”