GenAI Security Checklist: Top 10 Authorization Strategies for Handling Sensitive Data
IMPORTANT: This article is a collaborative effort, co-authored with my colleague and friend, Grant Miller, IBM Distinguished Engineer and CTO for IBM Data Protection. Together, we’ve combined our points of view and experience to shape the insights shared in this piece.
As organizations increasingly adopt Generative AI (GenAI) for applications involving people-related data, ensuring the responsible and secure handling of sensitive information has become paramount. GenAI's versatility is revolutionizing fields such as human resources, finance, healthcare, and customer service. However, its use also presents unique challenges in managing personally identifiable information (PII) and sensitive personal information (SPI). These challenges encompass ethical considerations, regulatory compliance, data governance, and the potential for unintended consequences affecting individuals and society.
This article addresses a critical issue: securing access to PII and SPI in GenAI applications. Drawing on our experience, expertise, and real-world implementations at IBM, we outline the top 10 strategies organizations should use to safeguard sensitive data in GenAI deployments. These recommendations are categorized into three key scenarios of GenAI where sensitive data is typically encountered:
A. Using PII/SPI for training and fine-tuning models
B. Accessing PII/SPI during user interactions
C. Generating PII/SPI as part of user interactions
Each scenario presents unique risks and considerations, requiring tailored approaches to ensure the secure and ethical handling of sensitive information. In this article, we will introduce 10 authorization strategies and explore how to apply these strategies across various GenAI use cases. We will also highlight best practices and provide practical examples, with a particular focus on the HR domain.
A. Using PII/SPI for training and fine-tuning models
Generative AI solutions are predominantly powered by Large Language Models (LLMs), which are sophisticated AI systems trained on massive datasets to comprehend and generate human-like language. Within enterprise contexts, it is highly uncommon to use personally identifiable information (PII) or sensitive personal information (SPI) to train a new model from scratch. Instead, organizations typically leverage pre-trained LLMs and fine-tune them using a much smaller set of domain-specific data. This approach allows the organization to customize the general-purpose LLM to perform specialized tasks while minimizing the need for extensive data.
For example, consider the following two use cases of fine-tuning LLMs for HR:
Using people data for fine-tuning requires a robust strategy to manage sensitive information. Below are the first three strategies for ensuring privacy and security for this scenario:
1/10 - Understand the Distinction Between Confidentiality and Privacy for Proper Classification
Not all sensitive data is the same, and understanding the nuances between confidentiality and privacy is critical. Confidential data, such as general HR policies or organizational processes, can often be accessed by a broader group without compromising privacy. However, personal information that pertains to a specific individual (e.g., performance reviews or sales quota attainment) needs to be handled with strict privacy controls. Misclassification can result in either unnecessary restrictions or the unintentional exposure of sensitive data. For instance, in IBM HR, most of our GenAI solutions are specifically fine-tuned using confidential information like policies and regulations, but not PI/SPI data.
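To make this concrete, below is a minimal Python sketch of how a fine-tuning corpus could be screened so that only confidential (but not personal) content is retained. The labels, the simple pattern-based detector, and the function names are illustrative assumptions, not an IBM tool.

```python
# Illustrative sketch: keep only confidential, non-personal documents (e.g.,
# policies) in a fine-tuning corpus, excluding anything tagged as or looking
# like personal information. Labels and patterns are assumptions for illustration.
import re
from dataclasses import dataclass

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # identifier-like numbers (e.g., SSN format)
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # e-mail addresses
]

@dataclass
class Document:
    doc_id: str
    text: str
    label: str  # "CONFIDENTIAL" (e.g., HR policy) or "PERSONAL" (pertains to an individual)

def looks_personal(doc: Document) -> bool:
    """Treat a document as personal if it is labeled so or matches a PII pattern."""
    return doc.label == "PERSONAL" or any(p.search(doc.text) for p in PII_PATTERNS)

def build_tuning_corpus(docs: list[Document]) -> list[Document]:
    """Keep only confidential, non-personal documents for fine-tuning."""
    return [d for d in docs if not looks_personal(d)]
```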
2/10 - Use Data Already Accessible by the Intended Audience
When fine-tuning models, consider focusing on data that is already accessible to the intended users of the solution. For example, if the model will be used only by the HR Case Management team, and that team already has access to all the data used for tuning, it is relatively safe to fine-tune the model on historical data. Conversely, if the use case is designed to support a broader audience (e.g., all HR personnel or all managers within a company), restricting access to only the relevant subset of data after the model has been fine-tuned with broader PI/SPI data becomes highly complex (if practically possible at all).
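As a sketch of this strategy, the snippet below filters a candidate tuning set down to records that every intended user is already entitled to see. The can_access callback stands in for the source system’s real entitlement check; the data shapes are hypothetical.

```python
# Illustrative sketch: keep only records that the entire intended audience of
# the GenAI solution can already access today. can_access is a placeholder for
# the source system's own entitlement check.
from typing import Callable, Iterable

def audience_safe_records(
    records: Iterable[dict],
    intended_users: list[str],
    can_access: Callable[[str, dict], bool],
) -> list[dict]:
    """Return only records every intended user is already entitled to see."""
    return [
        record for record in records
        if all(can_access(user, record) for user in intended_users)
    ]

# Usage (hypothetical): tuning data for an HR Case Management assistant
# safe = audience_safe_records(case_records, case_team_users, hr_system.can_access)
```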
3/10 – Opt for Very Coarse-Grain Anonymization to Ensure Broad Data Protection
To protect privacy, one might consider applying anonymization or data masking techniques when using people data for model fine-tuning. However, anonymization performed during model tuning needs to be done very carefully. Depending on the amount of anonymized data being passed, it may be possible to infer the identity of an individual given enough context. This problem is not new, but it is difficult to truly control once data is passed to a GenAI model. That said, use cases that rely on a high level of data abstraction, along the lines of what is used in external company reports, can be safely used for tuning.
Be careful: depending on the amount of anonymized data shared, it may be possible to infer an individual’s identity if enough context is provided
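One way to think about coarse-grain abstraction is to aggregate individual-level records up to a grain similar to external reporting before any tuning use, suppressing groups that are too small. The grouping fields and the minimum group size below are illustrative assumptions, not a prescribed threshold.

```python
# Illustrative sketch: roll individual-level records up to a coarse grain
# (e.g., business unit) and drop groups too small to be safely abstracted.
from collections import defaultdict

MIN_GROUP_SIZE = 5  # illustrative cut-off, not an IBM-prescribed value

def coarse_grain(records: list[dict], group_key: str, value_key: str) -> list[dict]:
    groups: dict[str, list[float]] = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return [
        {group_key: name, "count": len(values), "avg": sum(values) / len(values)}
        for name, values in groups.items()
        if len(values) >= MIN_GROUP_SIZE  # suppress small groups that could identify someone
    ]
```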
B. Accessing PII/SPI during user interactions
This second scenario focuses on how sensitive data is accessed during interactions between users and digital assistants powered by GenAI. In these cases, PII/SPI is not used to train or fine-tune models but is dynamically retrieved on-demand to respond to specific user queries.
Let’s consider two examples once again drawn from the HR space:
Given the risks associated with retrieving PII/SPI data during these interactions, organizations must implement stringent access controls and continuously monitor data usage. Below are three more strategies to ensure secure handling of sensitive data for these scenarios:
4/10 - GenAI is not the place to set authorization – Use existing entitlements and APIs instead
GenAI should not be used to set data access controls. Instead, utilize the entitlements and APIs of the systems where the source data is stored. The user's identity should be passed to the source system, which acts as the access decision point and fetches only the data the user is authorized to access. If a digital assistant is used to access the data, the identity of the user interacting with the agent should be propagated along with the agent's identity and used to determine access. A digital assistant may therefore have its own rights against the source data; in that case, apply least privilege between the agent and the user, so the effective access is the more restrictive of the two.
Remember: A digital assistant may have its own rights to PII/SPI data, independent of the rights held by the user operating the assistant.
This approach ensures consistent and efficient access control by implementing it as close to the source as possible. For example, IBM’s AskHR solution does not manage HR Partner identities or their access rights to PI/SPI data. Instead, the user’s identity is transmitted to WF360, IBM’s people data platform, which is responsible for managing user authorization across all reporting and analytics data needs.
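The sketch below shows the general pattern of delegating the access decision to the source system by propagating both identities on every call. The endpoint, header names, and token handling are hypothetical; this is not the AskHR or WF360 API.

```python
# Illustrative sketch: the assistant never decides access itself; it forwards
# both the end-user identity and its own identity to the source system, which
# enforces entitlements and returns only authorized data.
import requests

def fetch_people_data(query: dict, user_token: str, agent_id: str) -> dict:
    response = requests.post(
        "https://people-data.example.com/api/v1/query",  # hypothetical endpoint
        json=query,
        headers={
            "Authorization": f"Bearer {user_token}",  # identity of the person asking
            "X-Agent-Identity": agent_id,             # identity of the digital assistant
        },
        timeout=30,
    )
    response.raise_for_status()
    # The source system applies least privilege between the two identities and
    # returns only rows that both the user and the agent are entitled to see.
    return response.json()
```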
5/10 - Apply Data Masking, Especially for Summary-Level Responses
Data masking is an effective way to protect sensitive information by partially or fully obfuscating data elements based on the context. For example, in business intelligence reports, IBM employs masking techniques that display only aggregated results if a query returns data on fewer than five employees. Although masking cannot eliminate all risks, it greatly reduces the likelihood of exposing identifiable details in summary-level interactions.
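A minimal sketch of this kind of threshold-based masking is shown below; the field names are illustrative, and the five-employee cut-off simply mirrors the reporting rule described above.

```python
# Illustrative sketch: if a query resolves to fewer than five employees,
# suppress detail and return only a coarse aggregate.
SMALL_GROUP_THRESHOLD = 5

def summarize(rows: list[dict], metric: str) -> dict:
    if not rows:
        return {"count": 0}
    if len(rows) < SMALL_GROUP_THRESHOLD:
        # Too few people: expose only that the group is small, nothing identifiable.
        return {"count": f"<{SMALL_GROUP_THRESHOLD}", "values": "suppressed"}
    values = [row[metric] for row in rows]
    return {"count": len(rows), "average": sum(values) / len(values)}
```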
6/10 - Monitor and Audit Continuously as GenAI Evolves
Even with robust controls in place to secure data as described, continuous monitoring and auditing are critical to maintaining data security as GenAI systems evolve. Monitor user interactions, data access patterns, and any anomalies in real time to identify and mitigate new risks as they emerge. Record actions and implement audit trails to track who accessed what data, when, and for what purpose. This proactive approach helps detect potential misuse early and ensures ongoing compliance with organizational policies and regulations.
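As a simple illustration of such an audit trail, the sketch below records who accessed what data, when, through which assistant, and for what purpose. Writing to a local JSON-lines file is only a placeholder; in practice these events would flow to your logging or SIEM platform.

```python
# Illustrative sketch: append one audit record per data access decision.
import json
from datetime import datetime, timezone

AUDIT_LOG = "genai_data_access_audit.jsonl"  # placeholder sink for illustration

def audit_access(user_id: str, agent_id: str, resource: str, purpose: str, granted: bool) -> None:
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        "agent": agent_id,
        "resource": resource,
        "purpose": purpose,
        "granted": granted,
    }
    with open(AUDIT_LOG, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(entry) + "\n")
```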
C. Generating PII/SPI as part of user interactions
The third scenario involves PII/SPI data that is shared by users during their interactions with a GenAI solution. This can occur either implicitly, where a digital assistant pulls contextual information to offer personalized experiences, or explicitly, where users input personal details directly into the conversation. Because these interactions happen dynamically, it can be highly challenging to pinpoint which PII/SPI is being shared and for what purpose. However, ensuring proper handling of such data remains critical to maintaining user trust and compliance.
To further illustrate this scenario, let's consider the following two GenAI use cases from the HR domain:
The key is ensuring the use case is approved to pull specific data. Here are another three strategies to support these use cases:
7/10 – Establish a “User–Digital Assistant Confidentiality” Agreement
Conversations between users and digital assistants may contain sensitive PII/SPI that is difficult to classify and has no long-term value to the organization. In use cases like the “Guide on Filing Concerns,” employees will only trust the assistant if they are assured that their personal information is not being stored or used beyond the immediate session. For this, consider implementing a “User–Digital Assistant Confidentiality” policy that guarantees session data, transient or stored, will be immediately deleted after the interaction ends. This approach not only reduces privacy risks but also enhances user trust in the system.
Consider this: a 'User–Digital Assistant Confidentiality' policy to ensure session data, whether transient or stored, is deleted immediately after each interaction. This reduces privacy risks and builds user trust
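The sketch below illustrates the idea of such a confidentiality agreement at the session level: conversation turns are held only in memory and wiped as soon as the interaction ends. It is a simplified assumption; a real deployment must extend the same guarantee to logs, caches, and backups.

```python
# Illustrative sketch: a session object that never persists conversation data
# and deletes everything when the interaction ends.
class ConfidentialSession:
    def __init__(self, user_id: str):
        self.user_id = user_id
        self._turns: list[dict] = []  # transient only; never written to storage

    def add_turn(self, role: str, text: str) -> None:
        self._turns.append({"role": role, "text": text})

    def close(self) -> None:
        """Delete all session data as soon as the interaction ends."""
        self._turns.clear()

# Usage (hypothetical):
# session = ConfidentialSession(user_id="u123")
# session.add_turn("user", "I want to report a concern about ...")
# ... generate assistant responses ...
# session.close()  # nothing is retained after the conversation
```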
8/10 - Apply Standard Data Handling Practices to GenAI-Generated PII/SPI
Any PII/SPI generated during user interactions with GenAI should be treated with the same rigor as sensitive data collected through non-GenAI interactions. This means applying all the standard principles: obtaining user consent, enforcing data retention policies, and ensuring data is only shared on a strict “need-to-know” basis. At IBM, for example, the “Security and Privacy by Design” framework guides all our development, ensuring that privacy and security are integrated into the product lifecycle from the ground up.
9/10 - Prepare for Incident Response
Effective incident response planning is crucial across all GenAI use cases but is particularly important for interactions involving PII/SPI. Organizations should be ready to respond quickly if there is a data breach, unauthorized access, or unexpected data exposure. This includes having designated teams, communication plans, and predefined actions to take if a user’s sensitive data is compromised. Establish clear protocols for detecting, reporting, and mitigating incidents.
As GenAI systems become deeply embedded in enterprise solutions, establishing robust data governance frameworks for handling PII and SPI is not just a technical requirement, but a strategic imperative. Effectively managing sensitive data across the three key scenarios—data used for fine-tuning, data retrieved during interactions, and data generated through interactions—ensures that organizations can safeguard user information, maintain compliance, and foster trust in AI technologies.
Throughout this article, we have outlined nine practical strategies for managing sensitive data in GenAI applications. However, before concluding, it’s important to recognize that not all GenAI use cases carry the same level of risk when it comes to PII/SPI. For this reason, we recommend one final strategy:
Final note: Not all GenAI use cases carry the same risk for PII/SPI. Choose wisely.
10/10 - Tailor Your Strategy Based on Use-Case (Risk) Assessment
When working with GenAI, it’s crucial to understand that different use cases present varying levels of risk depending on how they interact with sensitive data. For example, a chatbot providing generic career guidance may not require the same stringent data handling measures as a tool designed to support confidential employee relations issues. Privacy risks, data quality, robustness of data pipelines, and ethical considerations are key factors that should be part of a comprehensive risk assessment when selecting and designing GenAI use cases. For a deeper dive into these considerations and how to apply them, please refer to our separate article.
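To illustrate how such an assessment might be operationalized, here is a minimal scoring sketch covering the factors listed above; the factor names, weights, and thresholds are purely illustrative and not an IBM methodology.

```python
# Illustrative sketch: a coarse per-use-case risk screen over the factors
# discussed above. Scores, weights, and tiers are assumptions for illustration.
RISK_FACTORS = ["privacy_risk", "data_quality_risk", "pipeline_robustness_risk", "ethical_risk"]

def assess_use_case(scores: dict[str, int]) -> str:
    """Each factor is scored 1 (low) to 5 (high); returns a coarse risk tier."""
    total = sum(scores[factor] for factor in RISK_FACTORS)
    if total >= 16 or max(scores.values()) == 5:
        return "HIGH - apply the strictest controls (masking, approvals, audits)"
    if total >= 10:
        return "MEDIUM - apply standard PII/SPI controls"
    return "LOW - baseline controls may suffice"

# Example: a generic career-guidance chatbot would typically score far lower
# than a confidential employee-relations support tool.
```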
Thank you for your attention. We look forward to hearing which strategies resonate most with you and if there are any critical ones we may have overlooked. Please also consider joining my people data platform newsletter for monthly insights.
IMPORTANT: This article is a collaborative effort, co-authored with my colleague and friend, Grant Miller, IBM Distinguished Engineer and CTO for IBM Data Protection. Thanks, Grant, for the opportunity and the partnership!