Generative AI can seem like magic. Image generators such as Stable Diffusion, Midjourney, or DALL·E 2 can produce remarkable visuals in styles ranging from aged photographs and watercolors to pencil drawings and Pointillism. The results can be fascinating: both the quality and the speed of creation exceed average human performance. The Museum of Modern Art in New York hosted an AI installation generated from the museum’s own collection, and the Mauritshuis in The Hague hung an AI variant of Vermeer’s Girl with a Pearl Earring while the original was away on loan.

The capabilities of text generators are perhaps even more striking, as they write essays, poems, and summaries, and are proving adept mimics of style and form (though they can take creative license with facts).

While it may seem like these new AI tools can conjure new material from the ether, that’s not quite the case. Generative AI platforms are trained on data lakes and question snippets, from which software processing huge archives of images and text constructs billions of model parameters. From that data the platforms recover patterns and relationships, which they use to form rules and then to make judgments and predictions when responding to a prompt.
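To make that learning step concrete, the toy sketch below (ours, not any platform’s actual architecture; real models fit billions of parameters rather than counting words) shows how software can extract patterns from a small text archive and reuse them to respond to a prompt:

```python
from collections import Counter, defaultdict

# Toy illustration: "learn" word-to-word patterns from a tiny corpus,
# then use those statistics to predict a likely next word for a prompt.
corpus = "the girl with the pearl earring is a painting the girl smiles".split()

transitions = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word][next_word] += 1  # count observed patterns

def predict_next(word: str) -> str:
    """Return the most frequently observed follower of `word`."""
    followers = transitions.get(word)
    return followers.most_common(1)[0][0] if followers else "<unknown>"

print(predict_next("the"))  # -> "girl", the most common pattern in the corpus
```

Commercial systems replace these word counts with neural networks trained on vastly larger archives, but the principle of learning patterns and replaying them as predictions is the same.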

This process comes with legal risks, including intellectual property infringement. In many cases, it also poses legal questions that are still being resolved. For example, do copyright, patent, and trademark infringement laws apply to AI creations? Is it clear who owns the content that generative AI platforms create for you or for your customers? Before businesses can embrace the benefits of generative AI, they need to understand the risks and how to protect themselves.

Where Generative AI Fits into Today’s Legal Landscape

Though generative AI may be new to the market, existing laws have significant implications for its use, and courts are now sorting out how the laws on the books should be applied. There are infringement and rights-of-use issues, uncertainty about ownership of AI-generated works, and questions about unlicensed content in training data, as well as whether users should be able to prompt these tools with direct references to other creators’ copyrighted and trademarked works by name without their permission.

These claims are already being litigated. In a case filed in late 2022, Andersen v. Stability AI et al., three artists formed a class to sue multiple generative AI platforms, alleging that the platforms used the artists’ original works without a license to train their AI in those artists’ styles. This, they argue, allows users to generate works that may be insufficiently transformative from the existing, protected works and that would, as a result, be unauthorized derivative works. If a court finds that the AI’s works are unauthorized and derivative, substantial infringement penalties can apply.

Similar cases filed in 2023 claim that companies trained AI tools using data lakes containing thousands, or even many millions, of unlicensed works. Getty, an image licensing service, filed a lawsuit against the creators of Stable Diffusion alleging the improper use of its photos in violation of both the copyright and trademark rights it holds in its watermarked photograph collection.

In each of these cases, the legal system is being asked to clarify the bounds of what is a “derivative work” under intellectual property laws — and depending upon the jurisdiction, different federal circuit courts may respond with different interpretations. The outcome of these cases is expected to hinge on the interpretation of the fair use doctrine, which allows copyrighted work to be used without the owner’s permission “for purposes such as criticism (including satire), comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research,” and for a transformative use of the copyrighted material in a manner for which it was not intended.

This isn’t the first time technology and copyright law have crashed into each other. Google successfully defended itself against a lawsuit by arguing that transformative use allowed for the scraping of text from books to create its search engine, and for the time being, this decision remains precedential.

But there are other, non-technological cases that could shape how the products of generative AI are treated. A case before the U.S. Supreme Court against the Andy Warhol Foundation, brought by photographer Lynn Goldsmith, who had licensed an image of the late musician Prince, could refine U.S. copyright law on when a piece of art is sufficiently different from its source material to become unequivocally “transformative,” and on whether a court can consider the meaning of the derivative work when it evaluates that transformation. If the court finds that the Warhol piece is not a fair use, it could mean trouble for AI-generated works.

All this uncertainty presents a slew of challenges for companies that use generative AI. There are risks of infringement, direct or unintentional, under contracts that are silent on generative AI usage by vendors and customers. If a business user knows that the training data might include unlicensed works or that an AI can generate unauthorized derivative works not covered by fair use, it could be on the hook for willful infringement, which can carry damages of up to $150,000 per infringed work. There’s also the risk of accidentally sharing confidential trade secrets or business information by inputting data into generative AI tools.

Mitigating Risk and Building a Way Forward

This new paradigm means that companies need to take new steps to protect themselves for both the short and long term.

AI developers, for one, should ensure that they comply with the law in acquiring the data used to train their models. This should involve compensating the individuals who own the IP that developers seek to add to their training data, whether by licensing it or by sharing revenue generated by the AI tool. Customers of AI tools should ask providers whether their models were trained with any protected content, review the terms of service and privacy policies, and avoid generative AI tools that cannot confirm that their training data is properly licensed from content creators or covered by open-source licenses with which the AI companies comply.

Developers

In the long run, AI developers will need to take the initiative in how they source their data, and investors need to know the origin of that data. Stable Diffusion, Midjourney, and others have built their models on the LAION-5B dataset, which contains almost six billion tagged images compiled by scraping the web indiscriminately and is known to include a substantial number of copyrighted creations.

Stability AI, which developed Stable Diffusion, has announced that artists will be able to opt out of the next generation of the image generator. But this puts the onus on content creators to actively protect their IP, rather than requiring the AI developers to secure rights to the work before using it. And even when artists opt out, that decision will be reflected only in the next iteration of the platform. Instead, companies should require the creator’s opt-in rather than opt-out.

Developers should also work on ways to maintain the provenance of AI-generated content, which would increase transparency about the works included in the training data. A provenance record would capture the platform used to develop the content, the settings employed, the metadata of the seed data, and tags that facilitate AI reporting, including the generative seed and the specific prompt used to create the content. Such information would not only allow the image to be reproduced, so its veracity could be verified easily, but would also speak to the user’s intent, protecting business users who may need to defend against intellectual property infringement claims by demonstrating that the output was not the product of a willful intent to copy or steal.
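As an illustration of what such an audit-trail entry might contain, here is a minimal sketch in Python; the class and field names are our own assumptions rather than an established provenance standard:

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

# Hypothetical provenance record for one AI-generated image. The field
# names are illustrative assumptions, not an industry standard.
@dataclass
class ProvenanceRecord:
    platform: str             # the tool used to develop the content
    model_version: str        # which iteration of the model produced it
    prompt: str               # the specific prompt used to create the content
    generation_seed: int      # the generative seed, enabling reproduction
    settings: dict            # sampler, steps, guidance scale, etc.
    training_data_tags: list  # tags tracing works in the training data
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = ProvenanceRecord(
    platform="example-image-generator",  # hypothetical platform name
    model_version="2.1",
    prompt="a harbor at dusk, oil-painting style",
    generation_seed=1234567890,
    settings={"steps": 30, "guidance_scale": 7.5},
    training_data_tags=["licensed-stock-set-A"],
)

# Persist the audit-trail entry alongside the generated asset.
print(json.dumps(asdict(record), indent=2))
```

Storing such a record next to each generated asset is what would let a business reproduce the image on demand and show exactly how it was created.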

Developing these audit trails would ensure that companies are prepared if (or, more likely, when) customers start including demands for them in contracts as a form of insurance that a vendor’s works aren’t willfully, or unintentionally, unauthorized derivatives. Looking further into the future, insurance companies may require these reports in order to extend traditional insurance coverage to business users whose assets include AI-generated works. Breaking down the contributions of the individual artists whose works in the training data shaped an image would further support efforts to compensate contributors appropriately, and even to embed the copyright of the original artist in the new creation.

Creators

Both individual content creators and brands that create content should take steps to assess the risk to their intellectual property portfolios and to protect them. This involves proactively looking for their work in compiled datasets or large-scale data lakes, including visual elements such as logos and artwork and textual elements such as image tags. Obviously, this could not be done manually across terabytes or petabytes of content data, but existing search tools should allow the task to be automated cost-effectively. New tools even promise to obfuscate works from these algorithms.
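One plausible way to automate that search is perceptual hashing, sketched below. The sketch assumes local copies of both your portfolio and a dataset sample, plus the third-party Pillow and imagehash libraries; the directory paths and match threshold are illustrative:

```python
from pathlib import Path

from PIL import Image  # Pillow
import imagehash       # third-party perceptual-hashing library

# Fingerprint your own works once, then scan a local sample of a dataset
# for near-duplicates. A small Hamming distance between perceptual hashes
# suggests the images are visually very similar.
THRESHOLD = 8  # max distance to treat two images as a likely match (assumed)

own_hashes = {
    p.name: imagehash.phash(Image.open(p))
    for p in Path("my_portfolio").glob("*.png")  # hypothetical folder
}

for candidate in Path("dataset_sample").glob("*.jpg"):  # hypothetical folder
    candidate_hash = imagehash.phash(Image.open(candidate))
    for name, own_hash in own_hashes.items():
        if candidate_hash - own_hash <= THRESHOLD:  # Hamming distance
            print(f"Possible use of {name} found in {candidate}")
```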

Content creators should actively monitor digital and social channels for the appearance of works that may be derived from their own. For brands with valuable trademarks to protect, it’s not simply a matter of looking for specific elements such as the Nike Swoosh or Tiffany Blue. Rather, trademark and trade dress monitoring may need to evolve to examine the style of derivative works, which may have arisen from training on a specific set of a brand’s images. Even when critical elements such as a logo or a specific color are absent from an AI-generated image, other stylistic elements may suggest that salient elements of a brand’s content were used to produce a derivative work. Such similarities may suggest an intent to appropriate the average consumer’s goodwill toward the brand by using recognizable visual or auditory elements. Mimicry may be the sincerest form of flattery, but it can also signal the purposeful misuse of a brand.

The good news for business owners regarding trademark infringement is that trademark attorneys have well-established ways to notify infringers and enforce trademark rights, such as sending a strongly worded cease-and-desist notice or a licensing demand letter, or moving directly to filing a trademark infringement claim, regardless of whether the unauthorized branding was generated by an AI platform or by a human.

Businesses

Businesses should evaluate their transaction terms and write protections into their contracts. As a starting point, they should demand terms of service from generative AI platforms that confirm proper licensure of the training data that feeds their AI. They should also demand broad indemnification for intellectual property infringement arising from the AI companies’ failure to properly license training data, as well as self-reporting by the AI of outputs flagged as potentially infringing.

At a minimum, businesses should add disclosures to their vendor and customer agreements (for custom services and product delivery) when either party uses generative AI, to ensure that intellectual property rights are understood and protected on both sides of the table, and to spell out how each party will support registration of authorship and ownership of those works. Vendor and customer contracts can also add AI-related language to confidentiality provisions to bar receiving parties from inputting a disclosing party’s confidential information into the text prompts of AI tools.

Some leading firms have created generative AI checklists for contract modifications that assess each clause of a client’s agreements for AI implications in order to reduce unintended risks of use. Organizations that use generative AI, or that work with vendors that do, should keep their legal counsel abreast of the scope and nature of that use, as the law will continue to evolve rapidly.

• • •

Going forward, content creators that have a sufficient library of their own intellectual property to draw upon may consider building their own datasets to train and mature AI platforms. The resulting generative AI models need not be trained from scratch but can build upon open-source generative AI that has used lawfully sourced content. This would enable content creators to produce new content in the style of their own work, with an audit trail back to their own data lake, or to license such tools to interested parties with cleared title in both the AI’s training data and its outputs. In the same spirit, content creators that have developed an online following may consider co-creation with followers as another means of sourcing training data, recognizing that these co-creators should be asked for permission to use their content in terms of service and privacy policies that are updated as the law changes.
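A minimal sketch of how such a rights-cleared data lake might be assembled appears below; the manifest layout, column names, and consent rule are our own assumptions, not a legal standard:

```python
import csv
from pathlib import Path

# Sketch of a rights-cleared training manifest: every asset that enters the
# data lake carries its source, rights holder, and consent status, giving
# the resulting model an audit trail back to lawfully sourced content.
MANIFEST = Path("training_manifest.csv")  # hypothetical file name
COLUMNS = ["asset_path", "rights_holder", "license", "consent_obtained", "source"]

def add_asset(asset_path: str, rights_holder: str, license_name: str,
              consent_obtained: bool, source: str) -> None:
    """Append one asset to the manifest, refusing entries without consent."""
    if not consent_obtained:
        raise ValueError(f"No documented consent for {asset_path}; excluded.")
    is_new = not MANIFEST.exists()
    with MANIFEST.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(COLUMNS)  # write the header once
        writer.writerow([asset_path, rights_holder, license_name,
                         consent_obtained, source])

# Own archive and opted-in co-creator content both enter with documented rights.
add_asset("images/own_work_001.png", "Studio Example LLC",
          "proprietary", True, "in-house archive")
add_asset("images/fan_submission_17.png", "co-creator: @example_fan",
          "co-creation terms v2", True, "community upload")
```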

Generative AI will change the nature of content creation, enabling many to do what, until now, only a few had the skills or advanced technology to accomplish at high speed. As this burgeoning technology develops, users must respect the rights of those who enabled its creation: the very content creators who may be displaced by it. And while we understand the real threat that generative AI poses to part of the livelihood of the creative class, it also poses a risk to brands that have used visuals to meticulously craft their identities. At the same time, both creatives and corporate interests have a dramatic opportunity to build portfolios of their works and branded materials, meta-tag them, and train their own generative AI platforms that can produce authorized, proprietary (paid-up or royalty-bearing) goods as sources of instant revenue streams.