Intellectual property rights over input data in AI training

1. Introduction

In the context of the Fourth Industrial Revolution, Artificial Intelligence (AI) – particularly Generative AI – has been creating remarkable technological breakthroughs. To achieve natural language processing capabilities and complex visual thinking, AI models require a massive resource: Input Data. This “training” process depends largely on collecting and analyzing billions of data units from cyberspace, including literary and artistic works, source code, and personal data. 

However, large-scale data scraping is posing legal challenges to the traditional Intellectual Property (IP) protection system. The conflict between the need to access and use data to promote technological innovation and the need to protect the legitimate rights and interests of individuals and organizations holding intellectual property rights has become a pressing legal issue that needs to be resolved.

This article briefly introduces the current legal status of intellectual property rights over input data in the AI training process. It covers the following main topics: (a) AI training data collection activities and their relationship with intellectual property rights; (b) the “legal gap” in Vietnam and related risks; and (c) legal recommendations for enterprises while the law is still being completed.

2. AI training data collection activities and the relationship with intellectual property rights

Technically, the process of developing Large Language Models (LLMs) often involves the use of automated tools to scan, copy, and store data from public platforms. From the perspective of Intellectual Property Law, this act directly impacts the fundamental property rights of copyright owners, such as: 

  1. Reproduction Right: When an AI system downloads data to servers for analysis, whether as temporary copies in random access memory (RAM) or as long-term stored copies for training, this may in principle constitute an act of reproducing works.
  2. Right to Prepare Derivative Works: When the output of AI bears expressive characteristics similar to, or developed from, the original work, the legal boundary between “inspiration” and “infringing the right to prepare derivative works” becomes very thin.
  3. Related Rights: For data that are sound recordings, video recordings, or broadcasts, exploitation may affect the rights of producers and broadcasting organizations.

Accordingly, the central legal question is whether this exploitation is an exception that allows use without permission (based on fair use principles or exceptions for data mining), or whether it is an infringement of IP rights on an industrial scale.

3. Inadequacies in the Vietnamese Legal Framework on Intellectual Property: when the mechanism of “exceptions” has not kept pace with the speed of scientific and technological development

Through the Law on Intellectual Property 2005 (amended and supplemented in 2009, 2019, and 2022) and its implementing documents, Vietnamese law has established a legal framework regulating copyright and related rights. This framework includes specific regulations on conditions for protection, content of rights, limitations on rights, enforcement mechanisms, and the handling of infringing acts. The basic principle recorded by law is the exclusive right of the owner. Accordingly, any exploitation or use of protected objects by a third party must have the consent of the rights holder, except in cases falling under the list of exceptions and limitations on rights prescribed by law. However, for a specific field like AI, the legal system still has certain “gaps”, as follows:

First, the absence of definitions and specialized regulations. Current Vietnamese intellectual property law contains neither an official definition of “AI training data” nor regulations directly governing the use of protected works for AI training. Because the legal status of AI input data is not clearly determined, there is no unified legal basis for identifying infringing acts, and legal disputes over the legitimate rights and interests of the parties involved may arise as the technology develops.

Second, the principle of “exclusivity” may be a major barrier to AI training data collection activities. In the spirit of Vietnamese legal regulations on intellectual property, intellectual property rights in general, and the property rights of authors in particular, are exclusive rights. Under this principle, any act of copying works to feed into Generative AI applications without the prior consent of the rights holder has a high probability of being considered an infringing act. In other words, the current legal mechanism still operates strictly on the principle that use requires permission and payment of fees, unless it falls within the exceptions under intellectual property law analyzed below.

Third, the “narrow door” of exceptions. Many views suggest that AI training may fall under the cases prescribed in Article 25, Article 25a, and Article 32 of the Law on Intellectual Property. However, a closer analysis of the nature of these exceptions shows that this argument is legally unsound. The current exceptions in Vietnam are designed mainly for non-commercial purposes and for serving public interests (such as scientific research, teaching, library archiving, or supporting people with disabilities). Meanwhile, the majority of current AI models pursue commercial purposes or operate on an industrial scale. This disparity between the commercial nature of AI and the non-commercial nature of the statutory exceptions is what makes it difficult to place AI training within the permissible exceptions.

4. Conclusion and recommendations

The development of Artificial Intelligence is an inevitable trend, and it requires the Intellectual Property legal system to adapt flexibly in order to balance the interests of the parties. While the legal framework is still being completed, proactive legal risk management is a key factor, specifically as follows:

For Enterprises developing and applying AI: It is necessary to build a strict legal Due Diligence process regarding input data sources. Prioritize the use of Open Data, data belonging to the Public Domain, or establish clear Licensing Agreements. 
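As a rough illustration of how such a due diligence step might be supported in practice, the sketch below separates dataset records whose license tag is on an approved list from records that need legal review. It is a minimal sketch under stated assumptions: the `Record` type, the `partition_by_license` function, and the tags in `ALLOWED_LICENSES` are all hypothetical names chosen for this example, and which licenses actually permit AI training remains a legal question that a tag check cannot answer.

```python
from dataclasses import dataclass

# Hypothetical allowlist for illustration only. Whether a given license
# really permits AI training must be confirmed by legal review.
ALLOWED_LICENSES = {"public-domain", "cc0-1.0", "licensed-by-agreement"}


@dataclass(frozen=True)
class Record:
    """One item of candidate training data (hypothetical structure)."""
    source_url: str
    license: str  # license tag recorded at collection time; "" if unknown


def partition_by_license(records):
    """Split records into (cleared, needs_review) using the allowlist.

    Records with an unknown or unlisted license are never silently
    accepted; they are routed to manual review instead.
    """
    cleared, needs_review = [], []
    for record in records:
        tag = record.license.strip().lower()
        if tag in ALLOWED_LICENSES:
            cleared.append(record)
        else:
            needs_review.append(record)
    return cleared, needs_review


records = [
    Record("https://example.com/a", "CC0-1.0"),
    Record("https://example.com/b", "all-rights-reserved"),
    Record("https://example.com/c", ""),
]
cleared, needs_review = partition_by_license(records)
```

The design choice here is to default to review: anything that cannot be positively matched to an approved source category is held back rather than used.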

For IP Rights Owners: It is necessary to proactively review and update Terms of Use on digital platforms, supplementing regulations prohibiting or restricting automated data collection (crawling/scraping) for AI training purposes. In addition, applying Technological Protection Measures (TPMs) is a necessary self-defense solution to prevent potential infringing acts. 
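One widely used, voluntary way to signal such restrictions to automated crawlers is a robots.txt file. The sketch below uses Python's standard `urllib.robotparser` module to show how a well-behaved crawler would interpret that signal. The crawler name `ExampleAIBot`, the `is_allowed` helper, and the robots.txt content are assumptions made for this example. Note that robots.txt only expresses the owner's wishes and does not technically block access, so it complements rather than replaces Terms of Use and Technological Protection Measures.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a rights owner might publish. "ExampleAIBot" is an
# invented crawler name used only for illustration.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ai_bot_ok = is_allowed(ROBOTS_TXT, "ExampleAIBot", "https://example.com/articles/1")
other_bot_ok = is_allowed(ROBOTS_TXT, "SomeSearchBot", "https://example.com/articles/1")
```

In this example the hypothetical AI crawler is refused for every path, while other crawlers remain permitted.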

Date Written: 20/11/2025

Disclaimers:

This article is for general information purposes only and is not intended to provide any legal advice for any particular case. The legal provisions referenced in the content are in effect at the time of publication but may have expired at the time you read the content. We therefore advise that you always consult a professional consultant before applying any content.

For issues related to the content or intellectual property rights of the article, please email cs@apolatlegal.vn.

Apolat Legal is a law firm in Vietnam with the experience and capacity to provide consulting services. To get in touch, contact our team of lawyers in Vietnam via email at info@apolatlegal.com.
