(Bloomberg) -- The Defense Department’s top artificial intelligence official said the agency needs to know more about AI tools before it fully commits to using the technology and urged developers to be more transparent.
Craig Martell, the Pentagon’s chief digital and artificial intelligence officer, wants companies to share insights into how their AI software is built — without forfeiting their intellectual property — so that the department can “feel comfortable and safe” adopting it.
AI software relies on large language models, or LLMs, which use massive data sets to power tools such as chatbots and image generators. The services are typically offered without showing their inner workings — in a so-called black box. That makes it hard for users to understand how the technology comes to decisions or what makes it get better or worse at its job over time.
“We’re just getting the end result of the model-building — that’s not sufficient,” Martell said in an interview. The Pentagon has no idea how models are structured or what data has been used, he said.
Companies also aren’t explaining what dangers their systems could pose, Martell said.
“They’re saying: ‘Here it is. We’re not telling you how we built it. We’re not telling you what it’s good or bad at. We’re not telling you whether it’s biased or not,’” he said.
He described such models as the equivalent of “found alien technology” for the Defense Department. He’s also concerned that only a few groups of people have enough money to build LLMs. Martell didn’t identify any companies by name, but Microsoft Corp., Alphabet Inc.’s Google and Amazon.com Inc. are among those developing LLMs for the commercial market, along with startups OpenAI and Anthropic.
Martell is inviting industry and academics to Washington in February to address the concerns. The Pentagon’s symposium on defense data and AI aims to figure out what jobs LLMs may be suitable to handle, he said.
Martell’s team, which is running a task force to assess LLMs, has already found 200 potential uses for them within the Defense Department, he said.
“We don’t want to stop large language models,” he said. “We just want to understand the use, the benefits, the dangers and how to mitigate against them.”
There is “a large upswell” within the department of people who would like to use LLMs, Martell said. But they also recognize that if the technology hallucinates — the term for when AI software fabricates information or delivers an incorrect result, which is not uncommon — they are the ones who must take responsibility for it.
He hopes the February symposium will help build what he called “a maturity model” to establish benchmarks relating to hallucination, bias and danger. While it might be acceptable for the first draft of a report to include AI-related mistakes — something a human could later weed out — those errors wouldn’t be acceptable in riskier situations, such as information that’s needed to make operational decisions.
A classified session at the three-day February event will focus on how to test and evaluate models, and protect against hacking.
Martell said his office is playing a consulting role within the Defense Department, helping different groups figure out the right way to measure the success or failure of their systems. The agency has more than 800 AI projects underway, some of them involving weapons systems.
Given the stakes involved, the Pentagon will apply a higher bar for how it uses algorithmic models than the private sector, he said.
“There’s going to be lots of use cases where lives are on the line,” he said. “So allowing for hallucination or whatever we want to call it — it’s just not going to be acceptable.”
©2023 Bloomberg L.P.