Balancing data sovereignty and AI
With data regulations growing more complex, where data comes from, where it goes, and who processes it matter more than ever. In many parts of the world, data regulations require data to remain in its region of origin unless external organizations can demonstrate compliance with those regulations. This concept is called "data sovereignty": the idea that data is subject to the laws of the country or region in which it originates.
Yet even if the location or vendor to which data is being transferred is itself compliant, cross-border data transfers can result in violations. For instance, government agencies in some countries may be empowered to examine data traversing their borders, which would violate the data regulations of other countries.
Organizations that transfer data out of their region of origin without adequate protection in place can face serious legal and financial consequences. As an example, in 2023 Meta received a US $1.3 billion fine for transferring personal data from the EU to the US without adequate privacy protections for the transferred data.
The above has given rise to the concept of data localization: To maintain data regulatory compliance and consumer trust, organizations often face the need to keep data within their own regions.
The idea of data localization is that data is kept within a given country or region, rather than transferred across borders and processed or stored on servers in remote areas. However, this approach makes cloud computing and the use of external third-party services more complex, as such services are usually not localized in this way. Cloud data centers are located all over the world, regardless of where the services they support are based.
This means the need to localize data, for many organizations, may come into conflict with one of the most important cloud-based services available today: Artificial intelligence (AI).
AI has emerged as a powerful tool for business
In recent years, a combination of more powerful hardware and increasingly refined software has led to an explosion in AI capabilities. Organizations are incorporating AI into their processes to assist with predictive modeling, content ideation, research, sentiment analysis, and customer service automation. Analyst firms such as McKinsey continue to be optimistic about the expanding business uses for generative AI (GenAI). Most businesses do not have the time or resources to build their own AI models, so they rely on outside vendors to use these technologies.
AI, however, vacuums up data in order to function. AI models are based on large data sets that are used for training complex algorithms. Large data sets can be, and are, stored in a variety of places. But because the cloud scales so readily, training data for AI is almost always stored there, in data centers around the world. (From the FAQs for OpenAI consumer services: "Content is stored on OpenAI systems and our trusted service providers' systems in the US and around the world [emphasis added].")
This means data uploaded to AI or used to train GenAI models passes outside the control of the organization that originally held it, and most likely ends up outside the geographic region in which it originated.
As the models receive more inputs, they continue to be fine-tuned. This means inputs may influence future outputs, or even reappear as future outputs. The latter is a risk to sensitive data, and it has led some organizations to ban the use of GenAI by their employees. Often this happens with very little visibility: AI users may not know where the machines that process their data are located. Also of concern is shadow AI, the unsanctioned use of AI tools without the visibility or approval of IT teams.
In many jurisdictions, this potentially brings businesses into conflict with data sovereignty requirements. The risks of conflicting with such requirements include fines (from small fines to the massive one levied against Meta), sanctions, and a decline in public reputation and customer trust.
On the other hand, the risks from not using AI, and falling behind the competition, pose a similar threat to businesses.
To summarize: AI is hugely useful but may be risky for organizations operating under strict data regulations — unless they can find a data-sovereignty-friendly approach for AI.
Options for leveraging AI without crossing borders
How can companies use AI while avoiding the risk of data crossing geographical borders? What is needed is an approach that offers computational power capable of supporting complex AI models but in a localized fashion. Organizations also need to make sure they control where their data is stored and processed, both in transit and at rest.
The best path forward is therefore data localization combined with local AI instances, either built on a third-party platform or offered pre-built by a vendor. Full data localization involves full control over where data is stored, where users are served from, and where cryptographic keys are stored (since this dictates where data exists in decrypted form). These capabilities must be integrated with a powerful global AI network with local presence, one with sufficient computational power available on demand to operate AI models.
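As a rough illustration of the localization idea above, the sketch below routes each record to an AI inference endpoint in the same region the record came from, and refuses to send it anywhere else. It is a minimal sketch under stated assumptions, not any vendor's implementation: the endpoint URLs, the region codes, and the names Record, select_endpoint, and LocalizationError are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical map of region -> in-region inference endpoint.
# The URLs are placeholders, not real services.
REGIONAL_ENDPOINTS: Dict[str, str] = {
    "eu": "https://ai.eu.example.com/v1/infer",
    "us": "https://ai.us.example.com/v1/infer",
}


class LocalizationError(Exception):
    """Raised when a record has no inference endpoint in its own region."""


@dataclass(frozen=True)
class Record:
    record_id: str
    origin_region: str  # region in which the data originated, e.g. "eu"
    payload: str


def select_endpoint(record: Record,
                    endpoints: Optional[Dict[str, str]] = None) -> str:
    """Return the endpoint in the record's own region, or refuse to route it."""
    table = REGIONAL_ENDPOINTS if endpoints is None else endpoints
    region = record.origin_region.strip().lower()
    if region not in table:
        # Fail closed: never fall back to an endpoint in another region.
        raise LocalizationError(
            f"no in-region endpoint for {region!r} (record {record.record_id})"
        )
    return table[region]


if __name__ == "__main__":
    print(select_endpoint(Record("r-1", "EU", "customer feedback text")))
```

The design choice worth noting is that the function fails closed: when no in-region endpoint exists, it raises an error instead of silently picking the nearest alternative, since a silent fallback is exactly how data ends up crossing a border unnoticed. A real deployment would also need to address where cryptographic keys are held and where data is decrypted, which this sketch does not cover.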
Businesses that need both to use AI and to localize data need a partner who understands these requirements and can support them. Cloudflare offers a data localization suite to support all organizations that have data sovereignty requirements to meet. But more importantly, Cloudflare for AI offers access to GPUs anywhere in the world, and quick ways for developers to integrate popular AI models.
This article is part of a series on the latest trends and topics impacting today’s technology decision-makers.
Dive deeper into this topic.
Learn more about how to simplify and secure AI initiatives in the ebook The connectivity cloud: A way to take back IT and security control.