Artificial intelligence (AI) is at that stage in a new technology's development when excitement about it is matched by alarm and uncertainty.

Most of the FUD — fear, uncertainty and doubt — about AI has been about the prospect that machines will take away people's jobs.

But that's a distraction from a more immediate challenge: who owns the information being used to train generative language programs such as ChatGPT, the first large-scale manifestations of AI.

This is a major issue for the media industry, of course. Eagan-based West Publishing, the unit of Thomson Reuters that is one of the nation's largest legal publishers, is the plaintiff in an AI-related lawsuit that the entire industry is watching. The firm alleges that Ross Intelligence has unfairly used large portions of its legal database to build an AI platform.

But it's also an issue for most corporations and consumers, who may not realize that as they use chatbots, they could be giving up information that has financial value or that they would rather keep private.

After its release last fall, ChatGPT quickly became one of the most popular apps ever created. Investors poured money into companies that make key components for AI systems.

And all of Silicon Valley's heavyweights are betting big on AI. Microsoft jumped first by taking a sizable stake in OpenAI and reshaping its Bing search engine around generative language technology. Google quickly scrambled, wary of losing its dominant grip on search. Apple just last week announced an AI product of its own.

But earlier this month, the Federal Trade Commission announced it was investigating OpenAI, maker of ChatGPT, for possibly running afoul of laws designed to protect consumers' data. That came after Italy briefly banned ChatGPT over privacy concerns.

The issue is that the algorithmic models that generate ChatGPT's output (and that of other new services like it) also consume the input. In other words, OpenAI uses the information people type in to improve the model, which it hopes will make the output better in the future.

For instance, an accountant may paste some financial tables into a chatbot to generate an essay describing what they show. Once entered, that data stays with the chatbot, which may use it to answer another person's question about the accountant's company.

To get a grip on that risk, the FTC ordered OpenAI to explain, among other things, "the process of retraining a Large Language Model to create a substantially new version of the model."

Some businesses are learning this the hard way. Samsung Electronics forbade the use of ChatGPT after a chip engineer uploaded some software code while looking for help diagnosing problems with it. The company feared the code was then available to other users of the AI platform.

OpenAI has said that ChatGPT contains no information newer than September 2021. However, that limitation can be partly overcome with a "browsing" feature that gives the language model access to the internet, where it can retrieve news and more recent data.

"It's a case of that old adage of 'if something is free, then you are the product,'" said Matthew Kraft, an insider risk adviser at Code42, a Minneapolis-based firm that provides data-security services to businesses.

"It works the same way with generative AI because they want the information that we're inputting to help train the model," Kraft said.

In May, he wrote an essay for Code42's customers that outlined the risk. The company added AI models to the list of places it helps companies monitor for data leaks. Many corporate IT departments recognized the problem right away, of course.

But one of the cases Code42 addressed involved doctors using ChatGPT to more quickly write up information to be sent to a patient's insurer. Putting a patient's name into the public version of the chatbot could violate medical privacy laws.

"If they copy a patient's medical file and try to paste that in an untrusted location, we can block that," Kraft said. "But we can also not block it and simply let the hospital's data-security team know that someone uploaded a file and it's something they may want to look at."

I've spent most of my career writing about technology companies and their innovations, and I tend to be an optimist about the way they ultimately work out.

And we've all grown accustomed to the two-way street between ourselves and providers of digital services, from global search engines to local retailers. Even the Star Tribune, with users' permission, monitors what they read to provide a "news for you" feature.

But the risk in generative language technology is on a new level and something for all of us to watch with caution.