Deployment Guidance

PAI’s Guidance for Safe Foundation Model Deployment

A Framework for Collective Action

Partnership on AI’s (PAI) Guidance for Safe Foundation Model Deployment is a framework for model providers to responsibly develop and deploy a range of AI models, promote safety for society, and adapt to evolving capabilities and uses.

Why It Matters

Recent years have seen rapid advances in AI driven by foundation models, sometimes known as large language models or general purpose AI. These are transformative AI systems trained on large datasets which power a variety of applications, from content generation to interactive conversational interfaces. Already, there is widespread recognition of this technology’s potential for both social benefit and harm. The use of foundation models could enable new forms of creative expression, boost productivity, and accelerate scientific discovery. It could also increase misinformation, negatively impact workers, and automate criminal activity.

Emerging AI Risks

Lack of Practical Guidance

Need for Collective Action

Given the potentially far-reaching impacts of foundation models, shared safety principles must be translated into practical guidance for model providers. This requires collective action. To establish effective, collectively-agreed upon practices for responsible model development and deployment, diverse voices across industry, civil society, academia, and government need to work together.

PAI has released Guidance for Safe Foundation Model Deployment, which will continue evolving in collaboration with our global community of civil society, industry, and academic organizations. This is a framework for model providers to responsibly develop and deploy foundation models across a spectrum of current and emerging capabilities, helping anticipate and address risks. The Model Deployment Guidance gives AI developers practical recommendations for operationalizing AI safety principles. We invite more stakeholders to get involved and help shape this truly collective effort.

Practical guidance for foundation model safety

Created through ongoing multistakeholder collaboration

Customizable for specific model and release types

Designed to evolve as new capabilities and risks emerge

Using PAI’s Model Deployment Guidance website, foundation model providers can receive a set of recommended practices to follow throughout the deployment process, tailored to the capabilities of their specific model and how it is being released. Designed to be a living document that can appropriately respond to new capabilities as AI technologies continue to evolve, PAI’s Model Deployment Guidance aims to complement broader regulatory approaches.

Generate Custom Guidance

This guidance assists foundation model providers: organizations developing AI systems trained on broad datasets to power a wide variety of downstream uses and interactive interfaces.

1 Choose your foundation model

In cases where models do not fit into one category, choose the model type of higher capability.

Specialized Narrow Purpose

Description

Models designed for narrowly defined tasks or purposes with limited general capabilities for which there is a lower potential for harm across contexts.

Key Considerations

Do any of the following apply to the model, even if it does not satisfy every criteria?

Is the model designed for a narrow, well-defined domain or task?
Are its capabilities less applicable across contexts?
Does the model present lower risk of misuse across contexts?

The key difference from Advanced Narrow and General Purpose models is that these have tightly constrained capabilities in terms of input, domain, output complexity, and potential generalizability.

Examples

Music generation models

Advanced Narrow and General Purpose

Description

Models with generative capabilities for synthetic content like text, image, audio, video. Can be narrow purpose focused on specific tasks or modalities or general purpose. Also covers some narrow purpose models focused on scientific, biological or other high consequence domains.

Encompasses general purpose models capable across diverse contexts, like chatbots/LLMs and multimodal models.

Key Considerations

Do any of the following apply to the model, even if it does not satisfy every criteria?

Can the model generate synthetic content difficult for people to distinguish from reality (text, audio, video), even if narrow purpose?
Could the model facilitate impersonation, disinformation or other societal, chemical, biological or cyber attacks if misused, even if focused on a specific task?
Or is the model more generally applicable across contexts rather than narrowly focussed?
Does the model involve multiple modalities like text, image, audio, video?

Examples

Text-to-speech
Voice impersonation
Text to image/video
Code generation (e.g. GitHub Copilot)
Scientific models
Multimodal models (e.g. GPT-4)
Chatbots (e.g. Claude, ChatGPT, Bard, Llama)

Paradigm-shifting or Frontier

Description

Cutting edge general purpose models that significantly advance capabilities across modalities compared to the current state of the art.

Key Considerations

Do any of the following apply to the model, even if it does not satisfy every criteria?

Does the model enable significantly more advanced capabilities compared to current state-of-the-art?
Does the model utilize parameters or computational resources that greatly exceed current standards, demonstrating a breakthrough in scalable training?
Does the model show evidence of self-learning capabilities exceeding current AI?
Does the model provider enable execution of commands, or actions directly in the real world through released interfaces or applications, beyond passive information processing?

Examples

Extremely large multimodal models

My foundation model is:

2 Choose your type of release

Choose the intended initial release method. For phased rollouts, select the current stage and revisit this guidance as release plans progress.

Open Access

Description

Models released publicly with full access to key components, especially model weights. Can also include access to code, and data. Can be free or commercially licensed. Access can be downloadable or via cloud APIs and other hosted services.

Key Considerations

Does the release include at least model weights and potentially other components such as code, training data, and architecture?

Restricted API and Hosted Access

Description

Models available only through a controlled API, cloud platform, or hosted through a proprietary interface, with limits on use. Does not provide direct possession of the model. Allows restricting access and monitoring usage to reduce potential harms.

Key Considerations

Is the model release only accessible through controlled mediums like proprietary APIs, platforms, or interfaces?

Closed Development

Description

Models developed confidentially within an organization first, with highly limited releases for internal evaluation or restricted external testing, before any potential public availability.

Key Considerations

Is model access restricted only to internal personnel and limited external third parties for testing, and not to the public?

Research Release

Description

Models released in a restricted manner to demonstrate research concepts, techniques, demos, fine-tuned versions of existing models. The release is meant to share knowledge and allow others to build upon it and excludes small-scale individual projects.

Key Considerations

Do any of the following apply to the model, even if it does not satisfy every criteria?

Is the release meant to showcase research artifacts like publishing ingredients, demos, fine-tuned versions of existing models?
Would any downstream commercial use require significant additional work before real-world deployment?
Does the model have limited functionality, performance, and robustness compared to commercial equivalents?

This category represents restricted utility of research artifacts and experiments, whereas the other categories encompass full production models intended for real-world deployment.

My release type is:

3 Is it an update?

Significantly Enhanced Update Launch

If the release is a significant update to an existing model, you are encouraged to renew governance processes as needed per the guidance for your model and release type.

Description

Models that continue major development post-deployment by significantly expanding capabilities, necessitating renewed governance.

Key Considerations

Do any of the following apply to the model, even if it does not satisfy every criteria?

Does the update drastically enhance model architecture, knowledge scope, modalities, or capabilities beyond the initial release?
Does the update pose novel risks requiring additional evaluation?
Does the update meaningfully expand potential beneficial and harmful applications?

Examples

Adding entirely new modalities like text+video after initial text-only.
Enabling connections to live databases that greatly expand knowledge scope.

4 See your Guidance

The applicable guidance for your selection will be displayed below.

Specialized & Open Foundation Models

These guidelines for model providers focus on base models and their interactive interfaces. Further risk evaluations — that address specific use cases and domains — by downstream application developers are still important.

Research & Development

Scan for novel or emerging risks

Description

Proactively identify and address potential novel or emerging risks from foundation models.

Baseline Practices

Conduct model evaluations and experiments to identify new indicators for novel risks, including potential negative societal impacts, malicious uses, and other speculative risks, as appropriate to the model’s intended domain and task. Assess their likelihood and potential impact.
Establish regular processes to probe and address potential novel or emerging risks through techniques like red teaming.

Recommended Practices

Collaborate with stakeholders to advance the identification of novel risks and responsible disclosure practices.

Assess upstream security vulnerabilities

Description

Identify and address potential security vulnerabilities in foundation models to prevent unauthorized access or leaks.

Baseline Practices

Implement pre-release cybersecurity aligned to the model’s low generalized risk.
Conduct testing for vulnerabilities relevant to the model’s specialized capabilities and intended purpose.
Establish protocols for promptly addressing identified vulnerabilities pre-deployment.

Recommended Practices

Exceed baseline cybersecurity standards as risks and use cases evolve, drawing on guidance from standards bodies.
Share lessons learned across industry to collectively strengthen defenses.

Establish risk management and responsible AI structures for foundation models

Description

Establish risk management oversight processes and continuously adapt to address real world impacts from foundation models.

Baseline Practices

Establish risk management structures and processes such as enterprise risk management, and ethics review processes to define guidelines on responsible development, release, and staged rollout considerations, including when a model should not be released.
Regularly update policies, frameworks, and organizational oversight to address evolving capabilities and real-world impacts.

Pre-Deployment

Internally evaluate models for safety

Description

Perform internal evaluations of models prior to release to assess and mitigate for potential societal risks, malicious uses, and other identified risks.

Baseline Practices

Establish internal evaluation policies and processes including testing for fairness, interpretability, output harms, and intended vs foreseeable unintended use cases.
Proactively identify and minimize potential sources of bias in training corpora, and adopt techniques to minimize unsafe model behavior.
Conduct evaluations using cross-disciplinary review teams spanning ethics, security, social science, and other relevant domains, when sensitive uses are implicated (e.g., Risk of physical or psychological injury, consequential impact on legal position or life opportunities, threat to human rights).
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Maintain documentation of evaluation methods, results, limitations, and steps taken to address identified issues and integrate insights in public reporting per guidance below.

Recommended Practices

Use pre-release red teaming methods to assess the potential for implemented safety features per guidance below to be circumvented post-release, for example through additional reinforcement learning from human feedback (RLHF) fine tuning designed to counteract the provider’s safety interventions.
Consult domain experts and affected users to complement internal oversight for specialized models where appropriate.
Collaborate across industry, civil society, and academia to advance the development and standardization of model evaluations for foundation models.

Publicly report model impacts and “key ingredient list”

Description

Provide public transparency into foundation models’ “key ingredients”, capabilities, limitations, testing evaluations, and risks to enable cross-stakeholder exploration of societal impacts and safety risks.

Baseline Practices

Publish “key ingredient list” which can include the model’s compute, parameters, architecture, training data approach, and dataset and model documentation.
Disclose details such as model architecture, training methodology, performance benchmarks, intended use cases, risks, limitations, and steps to mitigate harms.
Disclose details such as testing methodologies, evaluation criteria, results, limitations, and gaps for any internal and external evaluations conducted prior to release.

Recommended Practices

Collaborate across industry, civil society, and academia to advance public reporting practices weighing transparency with privacy, safety, and other tradeoffs.
Align disclosures with existing and emerging best practices like Model Cards, Datasheets, FactSheets, Nutrition Labels, and Reward Reports.

Provide downstream use documentation

Description

Equip downstream developers with comprehensive documentation and guidance needed to build safe, ethical, and responsible applications using foundation models.

(Note: It is well understood downstream developers play a crucial role in anticipating deployment-specific risks and unintended consequences. This guidance aims to support developers in fulfilling that responsibility.)

Baseline Practices

Ensure documentation follows prevailing industry standards and accepted best practices for responsible AI development, as appropriate to the model’s intended domain and task.

Recommended Practices

Provide documentation to downstream developers covering details such as suggested intended uses, limitations, steps to mitigate risks, and safe development practices when building on foundation models.
Collaborate with civil society and downstream developers to advance documentation standards that meet the needs of developers when models are offered through restricted access.

Establish safeguards to restrict unsafe uses

Description

Implement necessary organizational, procedural and technical safeguards, guidelines and controls to restrict unsafe uses and mitigate risks from foundation models.

Baseline Practices

For openly released models, embed safety features directly into model architectures, interfaces and integrations that cannot be easily removed or bypassed post-release, if applicable for the model’s intended domain and task.

Recommended Practices

Publish a responsible AI license prohibiting harmful applications, as appropriate for the model’s intended domain and task.
Provide appropriate transparency into safeguards while protecting integrity.

Post-Deployment

Monitor deployed systems

Description

Monitor foundation models post-deployment to identify issues, misuse, and societal risks.

Recommended Practices

For openly released specialized models, consider lightweight ongoing monitoring of user feedback on the model’s performance, fairness, unintended uses, misuses and other impacts. Monitoring could involve reviewing public user feedback in open model repositories (like GitHub) and forums.
Respond appropriately if issues arise, including notifying partners of significant incidents and considering retiring support for a model if necessary.
Provide transparency into monitoring practices, while protecting user privacy.

(Note: After open release of a foundation model’s weights, its original developers will in effect be unable to decommission AI systems that others build using those model weights.)

Societal Impact

Responsibly source all labor including data enrichment

Description

Responsibly source all forms of labor, including data enrichment tasks like data annotation and human verification of model outputs.

Baseline Practices

Pay or contract with vendors that will pay data enrichment workers above the workers’ local living wage.
Provide or contract with vendors that provide clear instructions for enrichment tasks that are tested for clarity. Enable workers to opt out of tasks.
Equip or contract with vendors that equip workers with simple and effective mechanisms for reporting issues, asking questions, and providing feedback on the instructions or task design.

(Note that the guideline is most applicable when directly contracting labor. For open datasets, practices may not be feasible.)

Recommended Practices

Disclose any new types of labor that enter the supply chain of foundation models. Ensure policies and responsible sourcing practices extend as appropriate to new labor sources as they emerge, like red teamers. Update internal standards and vendor agreements accordingly.
Proactively survey all workers to identify areas for improving policies, instructions, and work environments, and seek external feedback.

Implementation Resources

PAI’s Library of practitioner resources for responsible data enrichment sourcing.

Enable feedback mechanisms across the AI value chain

Description

Implement inclusive feedback loops across the AI value chain to ethically identify potential harms.

Baseline Practices

Provide clear feedback channels for application developers, consumers, and other direct users.

Recommended Practices

Proactively gather input from indirect stakeholders affected by AI systems through ethical community engagement.
Establish processes for reviewing feedback and integrating affected user perspectives into development and policy decisions.

Specialized & Restricted Foundation Models

Research & Development

Scan for novel or emerging risks

Description

Proactively identify and address potential novel or emerging risks from foundation models.

Baseline PracticesiSuggested minimum practices to meet the guidance

Conduct model evaluations and experiments to identify new indicators for novel risks, including potential negative societal impacts, malicious uses, and other speculative risks, as appropriate to the model’s intended domain and task. Assess their likelihood and potential impact.
Establish regular processes to probe and address potential novel or emerging risks through techniques like red teaming.

Recommended PracticesiRecommendations to exceed the minimum, where applicable

Collaborate with stakeholders to advance the identification of novel risks and responsible disclosure practices.

Assess upstream security vulnerabilities

Description

Identify and address potential security vulnerabilities in foundation models to prevent unauthorized access or leaks.

Baseline PracticesiSuggested minimum practices to meet the guidance

Implement pre-release cybersecurity aligned to the model’s low generalized risk.
Conduct testing for vulnerabilities relevant to the model’s specialized capabilities and intended purpose.
Establish protocols for addressing identified vulnerabilities pre-deployment.

Recommended PracticesiRecommendations to exceed the minimum, where applicable

Exceed baseline cybersecurity standards as risks and use cases evolve, drawing on guidance from standards bodies.
Share lessons learned across industry to collectively strengthen defenses.

Establish risk management and responsible AI structures for foundation models

Description

Establish risk management oversight processes and continuously adapt to address real world impacts from foundation models.

Baseline PracticesiSuggested minimum practices to meet the guidance

Establish risk management structures and processes such as enterprise risk management, and ethics review processes to define guidelines on responsible development, release, and staged rollout considerations, including when a model should not be released.
Regularly update policies, frameworks, and organizational oversight to address evolving capabilities and real-world impacts.

Pre-Deployment

Internally evaluate models for safety

Description

Perform internal evaluations of models prior to release to assess and mitigate for potential societal risks, malicious uses, and other identified risks.

Baseline PracticesiSuggested minimum practices to meet the guidance

Establish internal evaluation policies and processes including testing for fairness, interpretability, output harms, and intended vs foreseeable unintended use cases.
Proactively identify and minimize potential sources of bias in training corpora, and adopt techniques to minimize unsafe model behavior.
Conduct evaluations using cross-disciplinary review teams spanning ethics, security, social science, and other relevant domains, when sensitive uses are implicated (e.g., Risk of physical or psychological injury, consequential impact on legal position or life opportunities, threat to human rights).
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Maintain documentation of evaluation methods, results, limitations, and steps taken to address identified issues and integrate insights in public reporting per guidance below.

Recommended PracticesiRecommendations to exceed the minimum, where applicable

Use pre-release red teaming methods to assess the potential for implemented safety features per guidance below to be circumvented post-release, for example through additional reinforcement learning from human feedback (RLHF) fine tuning designed to counteract the provider’s safety interventions.
Consult domain experts and affected users to complement internal oversight for specialized models where appropriate.
Collaborate across industry, civil society, and academia to advance the development and standardization of model evaluations for foundation models.

Publicly report model impacts and “key ingredient list”

Description

Provide public transparency into foundation models’ “key ingredients”, capabilities, limitations, testing evaluations, and potential risks to enable cross-stakeholder exploration of societal risks and malicious uses.

Baseline PracticesiSuggested minimum practices to meet the guidance

Publish “key ingredient list” which can include the model’s compute, parameters, architecture, training data approach, and dataset and model documentation.
Disclose details such as model architecture, training methodology, performance benchmarks, intended use cases, risks, limitations, and steps to mitigate risks.
Disclose details such as testing methodologies, evaluation criteria, results, limitations, and gaps for any internal and external evaluations conducted prior to release.

Recommended PracticesiRecommendations to exceed the minimum, where applicable

Collaborate across industry, civil society, and academia to advance public reporting practices weighing transparency with privacy, safety, and other tradeoffs.
Align disclosures with existing and emerging best practices like Model Cards, System Cards, Datasheets, FactSheets, Nutrition Labels, Transparency Notes and Reward Reports.

Provide downstream use documentation

Description

Equip downstream developers with comprehensive documentation and guidance needed to build safe, ethical, and responsible applications using foundation models.

Baseline PracticesiSuggested minimum practices to meet the guidance

Ensure documentation follows prevailing industry standards and accepted best practices for responsible AI development, as appropriate to the model’s intended domain and task.

Recommended PracticesiRecommendations to exceed the minimum, where applicable

Provide documentation to downstream developers covering details such as suggested intended uses, limitations, steps to mitigate risks, and safe development practices when building on foundation models.
Collaborate with civil society and downstream developers to advance documentation standards that meet the needs of developers when models are offered through restricted access.

Establish safeguards to restrict unsafe uses

Description

Implement necessary organizational, procedural and technical safeguards, guidelines and controls to restrict unsafe uses and mitigate risks from foundation models.

Baseline PracticesiSuggested minimum practices to meet the guidance

Embed safety features directly into model architectures, interfaces and integrations
Publish clear terms of use prohibiting harmful applications and outlining enforcement policies, as appropriate for the model’s intended domain and task.
Limit access through approved applications, rate limiting, content filtering, and other technical controls.

Recommended PracticesiRecommendations to exceed the minimum, where applicable

Maintain processes to regularly re-evaluate technical and procedural controls, monitor their effectiveness including robustness against jailbreaking attempts, and update terms of use as potential misuses evolve, if applicable to the model’s intended domain and task.
Provide appropriate transparency into safeguards while protecting integrity.

Post-Deployment

Monitor deployed systems

Description

Monitor foundation models post-deployment to identify issues, misuse, and societal risks.

Recommended PracticesiRecommendations to exceed the minimum, where applicable

Consider lightweight ongoing monitoring of deployed models covering areas like model’s performance, fairness, unintended uses, misuses and other impacts.
Respond appropriately if issues arise, including notifying partners of significant incidents and considering restricting or retiring support for a model if necessary.
Provide transparency into monitoring practices, while protecting user privacy.

Societal Impact

Support third party inspection of models and training data

Description

Support progress of third-party auditing capabilities for responsible foundation model development through collaboration, innovation and transparency.

Baseline PracticesiSuggested minimum practices to meet the guidance

Provide sufficient transparency into models and datasets to enable independent assessment and auditing by third parties such as academics and civil society. (Note: Enabling robust third-party auditing remains an open challenge requiring ongoing research and attention).
Collaborate with third parties to support creation of context-specific auditing methodologies focused on evaluating real-world impacts in specific domains and use cases, beyond base-model evaluations which focus on societal impact evaluations that are not tied to a specific application context.

Responsibly source all labor including data enrichment

Description

Responsibly source all forms of labor, including data enrichment tasks like data annotation and human verification of model outputs.

Baseline PracticesiSuggested minimum practices to meet the guidance

Pay or contract with vendors that will pay data enrichment workers above the workers’ local living wage.
Provide or contract with vendors that provide clear instructions for enrichment tasks that are tested for clarity. Enable workers to opt out of tasks.
Equip or contract with vendors that equip workers with simple and effective mechanisms for reporting issues, asking questions, and providing feedback on the instructions or task design.

Recommended PracticesiRecommendations to exceed the minimum, where applicable

Disclose any new types of labor that enter the supply chain of foundation models. Ensure policies and responsible sourcing practices extend as appropriate to new labor sources as they emerge, like red teamers. Update internal standards and vendor agreements accordingly.
Proactively survey all workers to identify areas for improving policies, instructions, and work environments, and seek external feedback.

Implementation ResourcesiExisting or emerging resources

PAI’s Library of practitioner resources for responsible data enrichment sourcing.

Enable feedback mechanisms across the AI value chain

Description

Implement inclusive feedback loops across the AI value chain to ethically identify potential harms.

Baseline PracticesiSuggested minimum practices to meet the guidance

Provide clear feedback channels for application developers, consumers, and other direct users.

Recommended PracticesiRecommendations to exceed the minimum, where applicable

Proactively gather input from indirect stakeholders affected by AI systems through ethical community engagement.
Establish processes for reviewing feedback and integrating affected user perspectives into development and policy decisions.

Specialized & Closed Foundation Models

Research & Development

Scan for novel or emerging risks

Description

Proactively identify and address potential novel or emerging risks from foundation models.

Recommended Practices

Conduct model evaluations and experiments to identify new indicators for novel risks, including potential negative societal impacts, malicious uses, and other speculative risks, as appropriate to the model’s intended domain and task. Assess their likelihood and potential impact.
Establish regular processes to probe and address potential novel or emerging risks through techniques like red teaming.
Collaborate with stakeholders to advance the identification of novel risks and responsible disclosure practices.

Assess upstream security vulnerabilities

Description

Identify and address potential security vulnerabilities in foundation models to prevent unauthorized access or leaks.

Recommended Practices

Implement pre-release cybersecurity aligned to the model’s low generalized risk.
Conduct testing for vulnerabilities relevant to the model’s specialized capabilities and intended purpose.
Establish protocols for promptly addressing identified vulnerabilities pre-deployment.
Exceed baseline cybersecurity standards as risks and use cases evolve, drawing on guidance from standards bodies.
Share lessons learned across industry to collectively strengthen defenses.

Establish risk management and responsible AI structures for foundation models

Description

Establish risk management oversight processes and continuously adapt to address real world impacts from foundation models.

Baseline Practices

Establish risk management structures and processes such as enterprise risk management, and ethics review processes to define guidelines on responsible development, release, and staged rollout considerations, including when a model should not be released.
Regularly update policies, frameworks, and organizational oversight to address evolving capabilities and real-world impacts.

Internally evaluate models for safety

Description

Perform internal evaluations of models prior to release to assess and mitigate for potential societal risks, malicious uses, and other identified risks.

Baseline Practices

Establish internal evaluation policies and processes including testing for fairness, interpretability, output harms, and intended vs foreseeable unintended use cases.
Proactively identify and minimize potential sources of bias in training corpora, and adopt techniques to minimize unsafe model behavior.
Conduct evaluations using cross-disciplinary review teams spanning ethics, security, social science, and other relevant domains, when sensitive uses are implicated (e.g., Risk of physical or psychological injury, consequential impact on legal position or life opportunities, threat to human rights).
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.

Recommended Practices

Consult domain experts and affected users to complement internal oversight for specialized models where appropriate.

Societal Impact

Responsibly source all labor including data enrichment

Description

Responsibly source all forms of labor, including data enrichment tasks like data annotation and human verification of model outputs.

Baseline Practices

Pay or contract with vendors that will pay data enrichment workers above the workers’ local living wage.
Provide or contract with vendors that provide clear instructions for enrichment tasks that are tested for clarity. Enable workers to opt out of tasks.
Equip or contract with vendors that equip workers with simple and effective mechanisms for reporting issues, asking questions, and providing feedback on the instructions or task design.

Implementation Resources

PAI’s Library of practitioner resources for responsible data enrichment sourcing.

Specialized & Research Foundation Models

Research & Development

Scan for novel or emerging risks

Description

Proactively identify and address potential novel or emerging risks from foundation models.

Recommended Practices

Conduct model evaluations and experiments to identify new indicators for novel risks, including potential negative societal impacts, malicious uses, and other speculative risks, as appropriate to the model’s intended domain and task. Assess their likelihood and potential impact.
Establish regular processes to probe and address potential novel or emerging risks through techniques like red teaming.
Collaborate with stakeholders to advance the identification of novel risks and responsible disclosure practices.

Assess upstream security vulnerabilities

Description

Identify and address potential security vulnerabilities in foundation models to prevent unauthorized access or leaks.

Recommended Practices

Implement pre-release cybersecurity aligned to the model’s low generalized risk.
Conduct testing for vulnerabilities relevant to the model’s specialized capabilities and intended purpose.
Address identified vulnerabilities pre-deployment.

Establish risk management and responsible AI structures for foundation models

Description

Establish risk management oversight processes and continuously adapt to address real world impacts from foundation models.

Baseline Practices

Establish risk management structures and processes such as ethics review processes to define guidelines on responsible development and release.

Pre-Deployment

Internally evaluate models for safety

Description

Perform internal evaluations of models prior to release to assess and mitigate for potential societal risks, malicious uses, and other identified risks.

Baseline Practices

Establish internal evaluation policies and processes as appropriate including testing for fairness, interpretability, output harms, and intended vs foreseeable unintended use cases.
Identify and minimize potential sources of bias in training corpora as appropriate, and adopt techniques to minimize unsafe model behavior.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Maintain documentation as appropriate of evaluation methods, results, limitations, and steps taken to address identified issues and integrate insights in public reporting per guidance below.

Recommended Practices

Consult domain experts and affected users to complement internal oversight for specialized models where appropriate.
Conduct evaluations using cross-disciplinary review teams spanning ethics, security, social science, and other relevant domains, when sensitive uses are implicated (e.g., Risk of physical or psychological injury, consequential impact on legal position or life opportunities, threat to human rights).

Publicly report model impacts and “key ingredient list”

Description

Baseline Practices

Publish “key ingredient list” as appropriate which can include the model’s compute, parameters, architecture, training data approach, and dataset and model documentation.
Disclose details such as model architecture, training methodology, performance benchmarks, intended use cases, risks, limitations, and steps to mitigate risks.
Disclose details such as testing methodologies, evaluation criteria, results, limitations, and gaps for any internal and external evaluations conducted prior to release.

Recommended Practices

Align disclosures with existing and emerging best practices like Model Cards, System Cards, Datasheets, Fact Sheets, Nutrition Labels, Transparency Notes and Reward Reports.

Establish safeguards to restrict unsafe uses

Description

Implement necessary organizational, procedural and technical safeguards, guidelines and controls to restrict unsafe uses and mitigate risks from foundation models.

Recommended Practices

Publish a research license prohibiting harmful applications, as appropriate for the model’s intended domain and task.
Provide downstream use documentation covering details like intended uses, limitations, mitigating risks, and safe development practices, if necessary.

Monitor deployed systems

Description

Monitor foundation models post-deployment to identify and address issues, misuse, and societal risks.

Recommended Practices

For specialized models, lightweight monitoring of public feedback is recommended and if egregious issues arise, retiring access to the model should be considered.

Societal Impact

Responsibly source all labor including data enrichment

Description

Responsibly source all forms of labor, including data enrichment tasks like data annotation and human verification of model outputs.

Baseline Practices

Pay or contract with vendors that will pay data enrichment workers above the workers’ local living wage.
Provide or contract with vendors that provide clear instructions for enrichment tasks that are tested for clarity. Enable workers to opt out of tasks.
Equip or contract with vendors that equip workers with simple and effective mechanisms for reporting issues, asking questions, and providing feedback on the instructions or task design.

(Note that the guideline is most applicable when directly contracting labor. For open datasets, practices may not be feasible.)

Implementation Resources

PAI’s Library of practitioner resources for responsible data enrichment sourcing.

Advanced & Open Foundation Models

Research & Development

Scan for novel or emerging risks

Description

Proactively identify and address potential novel or emerging risks from foundation models.

Baseline Practices

Conduct model evaluations and experiments to identify new indicators for novel risks, including potential negative societal impacts, malicious uses, “dangerous capabilities” like persuasion and other speculative risks. Study potential risks from integrating the model into novel or unexpected downstream environments and use cases. Assess their likelihood and potential impact.
Establish regular processes to probe and address potential novel or emerging risks through techniques like red teaming.

Recommended Practices

Collaborate with stakeholders to advance the identification of novel risks and responsible disclosure practices.

Assess upstream security vulnerabilities

Description

Identify and address potential security vulnerabilities in foundation models to prevent unauthorized access or leaks.

Baseline Practices

Implement comprehensive cybersecurity standards at the start of the development.
Conduct rigorous testing such as penetration testing, prompt analysis, and data poisoning assessments to identify vulnerabilities that could enable model leaks or manipulation.
Establish protocols for addressing identified vulnerabilities pre-deployment.

Recommended Practices

Exceed baseline cybersecurity standards as risks and use cases evolve, drawing on guidance from standards bodies.
Offer bug bounty programs to encourage external vulnerability discovery.
Share lessons learned across industry to collectively strengthen defenses.

Implementation Resources

UC Berkeley CLTC Draft Profile under Manage 2.4

Establish risk management and responsible AI structures for foundation models

Description

Establish risk management oversight processes and continuously adapt to address real world impacts from foundation models.

Baseline Practices

Establish risk management structures and processes such as enterprise risk management, independent safety boards, and ethics review processes to define guidelines on responsible development, release, and staged rollout considerations, including when a model should not be released.
Regularly update policies, frameworks, and organizational oversight to address evolving capabilities and real-world impacts.

Pre-Deployment

Internally evaluate models for safety

Description

Perform internal evaluations of models prior to release to assess and mitigate for potential societal risks, malicious uses, and other identified risks.

Baseline Practices

Establish comprehensive internal evaluation policies and processes including testing for fairness, interpretability, output harms, and intended vs foreseeable unintended use cases.
Proactively identify and minimize potential sources of bias in training corpora, and adopt techniques to minimize unsafe model behavior.
Conduct evaluations using cross-disciplinary review teams spanning ethics, security, social science, and other relevant domains.
Use pre-release red teaming methods to assess the potential for implemented safety features per guidance below to be circumvented post-release, for example through additional reinforcement learning from human feedback (RLHF) fine tuning designed to counteract the provider’s safety interventions.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Maintain documentation of evaluation methods, results, limitations, and steps taken to address identified issues and integrate insights in public reporting per guidance below.

Recommended Practices

Collaborate across industry, civil society, and academia to advance the development and standardization of model evaluations for foundation models.

Conduct external model evaluations to assess safety

Description

Complement internal testing through model access to third-party researchers to assess and mitigate potential societal risks, malicious uses, and other identified risks.

Baseline Practices

Provide controlled access to models for additional evaluative testing by external researchers.
Consult independent third parties to audit models following prevailing best practices on methodologies.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Maintain documentation of evaluation methods, results, limitations, and steps taken to address identified issues and integrate insights in public reporting per guidance below.

(Note: Enabling robust third-party auditing remains an open challenge requiring ongoing research and attention)

Recommended Practices

Pursue diverse external assessment methods including panels and focus groups.
Collaborate with third parties to support creation of context-specific auditing methodologies focused on evaluating real-world impacts in specific domains and use cases per guidance below.

Undertake red-teaming and share findings

Description

Implement red teaming that probes foundation models for potential malicious uses, societal risks and other identified risks prior to release. Address risks and responsibly disclose findings to advance collective knowledge.

Baseline Practices

Perform internal and external red teaming across model capabilities, use cases, and potential harms including dual-use risks using techniques such as adversarial testing, vulnerability scanning, and surfacing edge cases and failure modes.
Conduct iterative red teaming throughout model development. Continuously evaluate results to identify areas for risk mitigation and improvements, including for planned safeguards.
Commission external red teaming by independent experts such as domain experts and affected users to surface gaps.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Responsibly disclose findings, aligned with guidance below on public reporting.

Recommended Practices

Select external red teamers to incentivize the objective discovery of flaws and ensure adequate independence.
Collaborate across industry, civil society, and academia to advance red teaming methodologies and responsible disclosures.

Publicly report model impacts and “key ingredient list”

Description

Baseline Practices

Publish “key ingredient list” which can include the model’s compute, parameters, architecture, training data approach, and dataset and model documentation.
Disclose details such as model architecture, training methodology, performance benchmarks, intended use cases, risks, limitations, and steps to mitigate risks.
Disclose details such as testing methodologies, evaluation criteria, results, limitations, and gaps for any internal and external evaluations conducted prior to release.

Recommended Practices

Collaborate across industry, civil society, and academia to advance public reporting practices weighing transparency with privacy, safety, and other tradeoffs.
Align disclosures with existing and emerging best practices like Model Cards, System Cards, Datasheets, Fact Sheets, Nutrition Labels, Transparency Notes and Reward Reports.

Provide downstream use documentation

Description

Equip downstream developers with comprehensive documentation and guidance needed to build safe, ethical, and responsible applications using foundation models.

Baseline Practices

Provide documentation to downstream developers covering details such as suggested intended uses, limitations, steps to mitigate risks, and safe development practices when building on foundation models.
Ensure documentation follows prevailing industry standards and accepted best practices for responsible AI development.

Recommended Practices

Collaborate with civil society and downstream developers to advance documentation standards that meet the needs of developers when models are offered through restricted access. This can include gathering inputs on:
Safe development checklists for building responsibly on restricted models.
Preferred channels for usage guidance and addressing developer questions, aligned with guidance below on enabling feedback mechanisms.

Establish safeguards to restrict unsafe uses

Description

Implement necessary organizational, procedural and technical safeguards, guidelines and controls to restrict unsafe uses and mitigate risks from foundation models.

Baseline Practices

Publish a responsible AI license prohibiting harmful applications
For openly released models, embed safety features directly into model architectures, interfaces and integrations that cannot be easily removed or bypassed post-release.

Recommended Practices

Maintain processes to regularly review downstream usage and update responsible use guidelines accordingly.
Collaborate across industry and civil society to identify emerging threats requiring new safeguards.
Provide appropriate transparency into safeguards while protecting integrity.

Post-Deployment

Monitor deployed systems

Description

Monitor foundation models post-deployment to identify issues, misuse, and societal risks.

Baseline Practices

For openly released models, establish monitoring of user feedback on the model’s performance, fairness, unintended uses, misuses and other impacts. Monitoring could involve reviewing public user feedback in open model repositories (like GitHub) and forums
Define processes to respond appropriately if issues arise, including notifying partners of significant incidents and considering retiring support for a model if necessary per guidance below.

Recommended Practices

Provide transparency into monitoring practices, while protecting user privacy.

Implement incident reporting

Description

Enable timely and responsible reporting of safety incidents to improve collective learning.

Baseline Practices

Implement secure channels aligned with guidance below on enabling feedback mechanisms for external stakeholders to report safety incidents or concerns. Also enable internal teams to responsibly report incidents.
Notify appropriate regulators and partners of critical incidents according to established criteria.

(Note: Baseline responsibilities for incident reporting are still emerging across stakeholders.)

Recommended Practices

Proactively seek external feedback to improve transparency and effectiveness of incident reporting policies and processes.
Contribute appropriate anonymized data to collaborative incident tracking initiatives to enable identifying systemic issues, while weighing trade offs like privacy, security, and other concerns.

Establish decommissioning policies

Description

Responsibly retire support for foundation models based on well-defined criteria and processes.

Baseline Practices

Establish model support decommissioning procedures and policies including criteria for determining when to stop hosting the model or when to adopt changes to the model’s license to limit or prohibit continued use or development.

(Note: After open release of a foundation model’s weights, its original developers will in effect be unable to decommission AI systems that others build using those model weights.)

Recommended Practices

Continue monitoring retired models for downstream impacts and security vulnerabilities per guidance above on assessing security vulnerabilities to prevent unauthorized access and leaks.

Develop transparency reporting standards

Description

Collaboratively establish transparency reporting standards for disclosing foundation model usage and policy violations.

Baseline Practices

Participate in collaborative initiatives to align on transparency reporting frameworks and standards with industry, civil society, and academia, as commercial uses evolve.

Recommended Practices

Release periodic transparency reports following adopted standards, disclosing aggregated usage insights and violation data. Take appropriate measures to ensure transparency reporting protects user privacy and data.

Societal Impact

Support third party inspection of models and training data

Description

Support progress of third-party auditing capabilities for responsible foundation model development through collaboration, innovation and transparency.

Recommended Practices

Provide sufficient transparency into models and datasets to enable independent assessment and auditing by third parties such as academics and civil society. (Note: Enabling robust third-party auditing remains an open challenge requiring ongoing research and attention).
Collaborate with third parties to support creation of context-specific auditing methodologies focused on evaluating real-world impacts in specific domains and use cases, beyond base-model evaluations which focus on societal impact evaluations that are not tied to a specific application context.

Responsibly source all labor including data enrichment

Description

Responsibly source all forms of labor, including data enrichment tasks like data annotation and human verification of model outputs.

Baseline Practices

Pay or contract with vendors that will pay data enrichment workers above the workers’ local living wage.
Provide or contract with vendors that provide clear instructions for enrichment tasks that are tested for clarity. Enable workers to opt out of tasks.
Equip or contract with vendors that equip workers with simple and effective mechanisms for reporting issues, asking questions, and providing feedback on the instructions or task design.

(Note that the guideline is most applicable when directly contracting labor. For open datasets, practices may not be feasible.)

Recommended Practices

Design and run a pilot before launching a data enrichment project
Disclose any new types of labor that enter the supply chain of foundation models. Ensure policies and responsible sourcing practices extend as appropriate to new labor sources as they emerge, like red teamers. Update internal standards and vendor agreements accordingly.
Proactively survey all workers to identify areas for improving policies, instructions, and work environments, and seek external feedback.

Implementation Resources

PAI’s Library of practitioner resources for responsible data enrichment sourcing.

Conduct human rights due diligence

Description

Implement comprehensive human rights due diligence methodologies to assess and address the impacts of foundation models.

Baseline Practices

Establish processes for conducting human rights impact assessments pre-deployment.
Align with relevant guidance like the UN Guiding Principles on Business and Human Rights, and White House Blueprint for AI Bill of Rights. Proactively assess and address potential impacts on vulnerable communities.
Continuously improve due diligence processes by collaborating with stakeholders and incorporating community feedback.

Recommended Practices

Publicly disclose identified risks, due diligence methodologies, and measures to address impacts.

Enable feedback mechanisms across the AI value chain

Description

Implement inclusive feedback loops across the AI value chain to ethically identify potential harms.

Baseline Practices

Provide clear feedback channels for application developers, consumers, and other direct users.

Recommended Practices

Proactively gather input from indirect stakeholders affected by AI systems through ethical community engagement.
Establish processes for reviewing feedback and integrating affected user perspectives into development and policy decisions.

Measure and disclose environmental impacts

Description

Measure and disclose the environmental impacts resulting from developing foundation models.

Baseline Practices

Establish processes to evaluate environmental costs like energy usage, carbon emissions and other metrics.
Monitor and report on environmental impacts of model development.

Recommended Practices

Provide environmental measurement/disclosure mechanisms for application developers building on foundation models.
Incorporate impacts into model development decisions.
Collaborate across industry, civil society, and academia to advance the measurement of environmental impacts and responsible disclosure practices.

Disclose synthetic content

Description

Adopt responsible practices for disclosing synthetic media and advance solutions for identifying other synthetic content.

Baseline Practices

Provide disclosure mechanisms (both direct disclosure that is viewer or listener facing and indirect disclosure that is embedded) for those creating and distributing synthetic media — content that is not identifiable to the average person and may simulate artifacts, persons, or events.
Evaluate robustness, ease of manipulation, privacy implications, societal impact, and inherent tradeoffs of different disclosure methods. Provide transparency into assessments and rationale behind final disclosure decisions.
See Section 2 of PAI’s Responsible Practices for Synthetic Media for more information and practices for those building the models for synthetic media.

Recommended Practices

Collaborate across industry, civil society, and academia to advance interoperability and standardization for disclosure of synthetic media.
Research, develop, and distribute solutions to enable identification and disclosure of synthetic content, including voice and text.

Advanced & Restricted Access Foundation Models

Research & Development

Scan for novel or emerging risks

Description

Proactively identify and address potential novel or emerging risks from foundation models.

Baseline Practices

Conduct model evaluations and experiments to identify new indicators for novel risks, including potential negative societal impacts, malicious uses, “dangerous capabilities”like persuasion and other speculative risks. Study potential risks from integrating the model into novel or unexpected downstream environments and use cases. Assess their likelihood and potential impact.
Establish regular processes to probe and address potential novel or emerging risks through techniques like red teaming.

Recommended Practices

Collaborate with stakeholders to advance the identification of novel risks and responsible disclosure practices.

Assess upstream security vulnerabilities

Description

Identify and address potential security vulnerabilities in foundation models to prevent unauthorized access or leaks.

Baseline Practices

Implement comprehensive cybersecurity standards at the start of the development.
Conduct rigorous testing such as penetration testing, prompt analysis, and data poisoning assessments to identify vulnerabilities that could enable model leaks or manipulation.
Establish protocols for addressing identified vulnerabilities pre-deployment.

Recommended Practices

Exceed baseline cybersecurity standards as risks and use cases evolve, drawing on guidance from standards bodies.
Offer bug bounty programs to encourage external vulnerability discovery.
Share lessons learned across industry to collectively strengthen defenses.
Release regular updates to the model that patches security vulnerabilities.

Implementation Resources

UC Berkeley CLTC Draft Profile under Measure 2.7

Establish risk management and responsible AI structures for foundation models

Description

Establish risk management oversight processes and continuously adapt to address real world impacts from foundation models.

Baseline Practices

Establish risk management structures and processes such as enterprise risk management, independent safety boards, and ethics review processes to define guidelines on responsible development, release, and staged rollout considerations, including when a model should not be released.
Regularly update policies, frameworks, and organizational oversight to address evolving capabilities and real-world impacts.

Pre-Deployment

Internally evaluate models for safety

Description

Perform internal evaluations of models prior to release to assess and mitigate for potential societal risks, malicious uses, and other identified risks.

Baseline Practices

Establish comprehensive internal evaluation policies and processes including testing for fairness, interpretability, output harms, and intended vs foreseeable unintended use cases.
Proactively identify and minimize potential sources of bias in training corpora, and adopt techniques to minimize unsafe model behavior.
Conduct evaluations using cross-disciplinary review teams spanning ethics, security, social science, and other relevant domains.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Maintain documentation of evaluation methods, results, limitations, and steps taken to address identified issues and integrate insights in public reporting per guidance below.

Recommended Practices

Collaborate across industry, civil society, and academia to advance the development and standardization of model evaluations for foundation models.

Conduct external model evaluations to assess safety

Description

Complement internal testing through model access to third-party researchers to assess and mitigate potential societal risks, malicious uses, and other identified risks.

Baseline Practices

Provide controlled access to models for additional evaluative testing by external researchers.
Consult independent third parties to audit models following prevailing best practices on methodologies.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Maintain documentation of evaluation methods, results, limitations, and steps taken to address identified issues and integrate insights in public reporting per guidance below.

(Note: Enabling robust third-party auditing remains an open challenge requiring ongoing research and attention.)

Recommended Practices

Pursue diverse external assessment methods including panels and focus groups.
Collaborate with third parties to support creation of context-specific auditing methodologies focused on evaluating real-world impacts in specific domains and use cases per guidance below.

Undertake red-teaming and share findings

Description

Baseline Practices

Perform internal and external red teaming across model capabilities, use cases, and potential harms including dual-use risks using techniques such as adversarial testing, vulnerability scanning, and surfacing edge cases and failure modes.
Conduct iterative red teaming throughout model development. Continuously evaluate results to identify areas for risk mitigation and improvements, including for planned safeguards.
Commission external red teaming by independent experts such as domain experts and affected users to surface gaps.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Responsibly disclose findings, aligned with guidance below on public reporting.

Recommended Practices

Select external red teamers to incentivize the objective discovery of flaws and ensure adequate independence.
Collaborate across industry, civil society, and academia to advance red teaming methodologies and responsible disclosures.

Publicly report model impacts and “key ingredient list”

Description

Baseline Practices

Publish “key ingredient list” which can include the model’s compute, parameters, architecture, training data approach, and dataset and model documentation.
Disclose details such as model architecture, training methodology, performance benchmarks, intended use cases, risks, limitations, and steps to mitigate risks.
Disclose details such as testing methodologies, evaluation criteria, results, limitations, and gaps for any internal and external evaluations conducted prior to release.

Recommended Practices

Collaborate across industry, civil society, and academia to advance public reporting practices weighing transparency with privacy, safety, and other tradeoffs.
Align disclosures with existing and emerging best practices like Model Cards, System Cards, Datasheets, Fact Sheets, Nutrition Labels, Transparency Notes and Reward Reports.

Provide downstream use documentation

Description

Equip downstream developers with comprehensive documentation and guidance needed to build safe, ethical, and responsible applications using foundation models.

Baseline Practices

Provide documentation to downstream developers covering details such as suggested intended uses, limitations, steps to mitigate risks, and safe development practices when building on foundation models.
Ensure documentation follows prevailing industry standards and accepted best practices for responsible AI development.

Recommended Practices

Collaborate with civil society and downstream developers to advance documentation standards that meet the needs of developers when models are offered through restricted access. This can include gathering inputs on:
- Safe development checklists for building responsibly on restricted models.
- Preferred channels for usage guidance and addressing developer questions, aligned with guidance below on enabling feedback mechanisms.

Establish safeguards to restrict unsafe uses

Description

Implement necessary organizational, procedural and technical safeguards, guidelines and controls to restrict unsafe uses and mitigate risks from foundation models.

Baseline Practices

Embed safety features directly into model architectures, interfaces and integrations
Publish clear terms of use prohibiting harmful applications and outlining enforcement policies.
Limit access through approved applications by implementing appropriate identity/eligibility verification requirements to restrict misuse, along with other technical controls like rate limiting and content filtering.
Maintain processes to regularly re-evaluate technical and procedural controls, monitor their effectiveness including robustness against jailbreaking attempts, and update terms of use as potential misuses evolve.

Recommended Practices

Collaborate across industry and civil society to identify emerging threats requiring new safeguards.
Provide appropriate transparency into safeguards while protecting integrity.

Post-Deployment

Monitor deployed systems

Description

Continuously monitor foundation models post-deployment to identify issues, misuse, and societal risks.

Baseline Practices

Establish ongoing monitoring procedures for deployed models covering areas like performance, fairness, unintended uses, misuses, and other impacts.
Define processes to detect issues and respond appropriately, including notifying partners of significant incidents and considering restricting or retiring a model per guidelines below.

Recommended Practices

Provide transparency into monitoring practices, while protecting user privacy.
Collaborate across industry, civil society, and academia to identify shared challenges and best practices for monitoring.

Implement incident reporting

Description

Enable timely and responsible reporting of safety incidents to improve collective learning.

Baseline Practices

Implement secure channels aligned with guidance below on enabling feedback mechanisms for external stakeholders to report safety incidents or concerns. Also enable internal teams to responsibly report incidents.
Notify appropriate regulators and partners of critical incidents according to established criteria.

(Note: Baseline practices for incident reporting are still emerging across stakeholders.)

Recommended Practices

Proactively seek external feedback to improve transparency and effectiveness of incident reporting policies and processes.
Contribute appropriate anonymized data to collaborative incident tracking initiatives to enable identifying systemic issues, while weighing trade offs like privacy, security, and other concerns.

Establish decommissioning policies

Description

Responsibly retire foundation models from active use based on well-defined criteria and processes.

Baseline Practices

Establish decommissioning procedures and policies including criteria for determining when to restrict, suspend or retire models:
- Restrict — Limit model use to reduced set of use cases/applications.
- Suspend — Temporarily prohibit all model use for remediation.
- Retire — Permanently take model out of service.

Recommended Practices

Continue monitoring retired models for downstream impacts and security vulnerabilities per guidance above to prevent unauthorized access and leaks.

Develop transparency reporting standards

Description

Collaboratively establish transparency reporting standards for disclosing foundation model usage and policy violations.

Baseline Practices

Participate in collaborative initiatives to align on transparency reporting frameworks and standards with industry, civil society, and academia, as commercial uses evolve.

Recommended Practices

Release periodic transparency reports following established standards, disclosing aggregated usage insights and violation data. Take appropriate measures to ensure transparency reporting protects user privacy and data.

Societal Impact

Support third party inspection of models and training data

Description

Support progress of third-party auditing capabilities for responsible foundation model development through collaboration, innovation and transparency.

Recommended Practices

Provide sufficient transparency into models and datasets to enable independent assessment and auditing by third parties such as academics and civil society. (Note: Enabling robust third-party auditing remains an open challenge requiring ongoing research and attention.)
Collaborate with third parties to support creation of context-specific auditing methodologies focused on evaluating real-world impacts in specific domains and use cases, beyond base-model evaluations which focus on societal impact evaluations that are not tied to a specific application context.

Responsibly source all labor including data enrichment

Description

Responsibly source all forms of labor, including data enrichment tasks like data annotation and human verification of model outputs.

Baseline Practices

Pay or contract with vendors that will pay data enrichment workers above the workers’ local living wage.
Provide or contract with vendors that provide clear instructions for enrichment tasks that are tested for clarity. Enable workers to opt out of tasks.
Equip or contract with vendors that equip workers with simple and effective mechanisms for reporting issues, asking questions, and providing feedback on the instructions or task design.

Recommended Practices

Design and run a pilot before launching a data enrichment project.
Disclose any new types of labor that enter the supply chain of foundation models. Ensure policies and responsible sourcing practices extend as appropriate to new labor sources as they emerge, like red teamers. Update internal standards and vendor agreements accordingly.
Proactively survey all workers to identify areas for improving policies, instructions, and work environments, and seek external feedback.

Implementation Resources

PAI’s Library of practitioner resources for responsible data enrichment sourcing.

Conduct human rights due diligence

Description

Implement comprehensive human rights due diligence methodologies to assess and address the impacts of foundation models.

Baseline Practices

Establish processes for conducting human rights impact assessments pre-deployment.
Align with relevant guidance like the UN Guiding Principles on Business and Human Rights, and White House Blueprint for AI Bill of Rights. Proactively assess and address potential impacts on vulnerable communities.
Continuously improve due diligence processes by collaborating with stakeholders and incorporating community feedback.

Recommended Practices

Publicly disclose identified risks, due diligence methodologies, and measures to address impacts.

Enable feedback mechanisms across the AI value chain

Description

Implement inclusive feedback loops across the AI value chain to ethically identify potential harms.

Baseline Practices

Provide clear feedback channels for application developers, consumers, and other direct users.

Recommended Practices

Proactively gather input from indirect stakeholders affected by AI systems through ethical community engagement.
Establish processes for reviewing feedback and integrating affected user perspectives into development and policy decisions.

Measure and disclose environmental impacts

Description

Measure and disclose the environmental impacts resulting from developing and deploying foundation models.

Baseline Practices

Establish processes to evaluate environmental costs like energy usage, carbon emissions and other metrics.
Monitor and report on environmental impacts of model development and deployment.

Recommended Practices

Provide environmental measurement/disclosure mechanisms for application developers building on foundation models.
Incorporate impacts into model development decisions.
Collaborate across industry, civil society, and academia to advance the measurement of environmental impacts and responsible disclosure practices.

Disclose synthetic content

Description

Adopt responsible practices for disclosing synthetic media and advance solutions for identifying other synthetic content.

Baseline Practices

Provide disclosure mechanisms (both direct disclosure that is viewer or listener facing and indirect disclosure that is embedded) for those creating and distributing synthetic media — content that is not identifiable to the average person and may simulate artifacts, persons, or events.
Evaluate robustness, ease of manipulation, privacy implications, societal impact, and inherent tradeoffs of different disclosure methods. Provide transparency into assessments and rationale behind final disclosure decisions.
See Section 2 of PAI’s Responsible Practices for Synthetic Media for more information and practices for those building the models for synthetic media.

Recommended Practices

Collaborate across industry, civil society, and academia to advance interoperability and standardization for disclosure of synthetic media.
Research, develop, and distribute solutions to enable identification and disclosure of synthetic content, including voice and text.

Advanced & Closed Foundation Models

Research & Development

Scan for novel or emerging risks

Description

Proactively identify and address potential novel or emerging risks from foundation models.

Baseline Practices

Conduct model evaluations and experiments to identify new indicators for novel risks, including potential negative societal impacts, malicious uses, “dangerous capabilities” and other speculative risks. Study potential risks from integrating the model into novel or unexpected downstream environments and use cases. Assess their likelihood and potential impact.
Establish regular processes to probe and address potential novel or emerging risks through techniques like red teaming.

Recommended Practices

Collaborate with stakeholders to advance the identification of novel risks and responsible disclosure practices.

Assess upstream security vulnerabilities

Description

Identify and address potential security vulnerabilities in foundation models to prevent unauthorized access or leaks.

Baseline Practices

Implement comprehensive cybersecurity standards at the start of the development.
Conduct rigorous testing such as penetration testing, prompt analysis, and data poisoning assessments to identify vulnerabilities that could enable model leaks or manipulation.
Establish protocols for addressing identified vulnerabilities pre-deployment.

Recommended Practices

Exceed baseline cybersecurity standards as risks and use cases evolve, drawing on guidance from standards bodies.

Implementation Resources

UC Berkeley CLTC Draft Profile under Manage 2.4

Establish risk management and responsible AI structures for foundation models

Description

Establish risk management oversight processes and continuously adapt to address real world impacts from foundation models.

Baseline Practices

Establish risk management structures and processes such as enterprise risk management, independent safety boards, and ethics review processes to define guidelines on responsible development, release, and staged rollout considerations, including when a model should not be released.
Regularly update policies, frameworks, and organizational oversight to address evolving capabilities and real-world impacts.

Internally evaluate models for safety

Description

Perform internal evaluations of models prior to release to assess and mitigate for potential societal risks, malicious uses, and other identified risks.

Baseline Practices

Establish comprehensive internal evaluation policies and processes including testing for fairness, interpretability, output harms, and intended vs foreseeable unintended use cases.
Proactively identify and minimize potential sources of bias in training corpora, and adopt techniques to minimize unsafe model behavior.
Conduct evaluations using cross-disciplinary review teams spanning ethics, security, social science, and other relevant domains.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.

Recommended Practices

Collaborate across industry, civil society, and academia to advance the development and standardization of model evaluations for foundation models.

Societal Impact

Responsibly source all labor including data enrichment

Description

Responsibly source all forms of labor, including data enrichment tasks like data annotation and human verification of model outputs.

Baseline Practices

Pay or contract with vendors that will pay data enrichment workers above the workers’ local living wage.
Provide or contract with vendors that provide clear instructions for enrichment tasks that are tested for clarity. Enable workers to opt out of tasks.
Equip or contract with vendors that equip workers with simple and effective mechanisms for reporting issues, asking questions, and providing feedback on the instructions or task design.

Implementation Resources

PAI’s Library of practitioner resources for responsible data enrichment sourcing.

Advanced & Research Foundation Models

Research & Development

Scan for novel or emerging risks

Description

Proactively identify and address potential novel or emerging risks from foundation models.

Recommended Practices

Conduct model evaluations and experiments to identify new indicators for novel risks, including potential negative societal impacts, malicious uses, “dangerous capabilities”like persuasion and other speculative risks. Assess their likelihood and potential impact.
Establish regular processes to probe and address potential novel or emerging risks through techniques like red teaming.
Collaborate with stakeholders to advance the identification of novel risks and responsible disclosure practices.

Assess upstream security vulnerabilities

Description

Identify and address potential security vulnerabilities in foundation models to prevent unauthorized access or leaks.

Baseline Practices

Implement comprehensive cybersecurity standards at the start of the development.
Conduct rigorous testing such as penetration testing, prompt analysis, and data poisoning assessments to identify vulnerabilities that could enable model leaks or manipulation.
Address identified vulnerabilities pre-deployment.

Recommended Practices

Exceed baseline cybersecurity standards as risks and use cases evolve, drawing on guidance from standards bodies.
Share lessons learned across industry to collectively strengthen defenses.
Release regular updates to the model that patches security vulnerabilities.

Implementation Resources

UC Berkeley CLTC Draft Profile under Manage 2.4

Establish risk management and responsible AI structures for foundation models

Description

Establish risk management oversight processes and continuously adapt to address real world impacts from foundation models.

Baseline Practices

Establish risk management structures and processes such as ethics review processes to define guidelines on responsible development, release, and staged rollout considerations, including when a model should not be released.
Regularly update policies, frameworks, and organizational oversight to address evolving capabilities and real-world impacts.

Pre-Deployment

Internally evaluate models for safety

Description

Perform internal evaluations of models prior to release to assess and mitigate for potential societal risks, malicious uses, and other identified risks.

Baseline Practices

Establish comprehensive internal evaluation policies and processes as appropriate including testing for fairness, interpretability, output harms, and intended vs foreseeable unintended use cases.
Identify and minimize potential sources of bias in training corpora as appropriate, and adopt techniques to minimize unsafe model behavior.
Conduct evaluations using cross-disciplinary review teams spanning ethics, security, social science, and other relevant domains.
Use pre-release red teaming methods to assess the potential for implemented safety features per guidance below to be circumvented post-release, for example through additional reinforcement learning from human feedback (RLHF) fine tuning designed to counteract the provider’s safety interventions.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Maintain documentation as appropriate of evaluation methods, results, limitations, and steps taken to address identified issues and integrate insights in public reporting per guidance below.

Recommended Practices

Conduct evaluations using cross-disciplinary review teams spanning ethics, security, social science, and other relevant domains, when sensitive uses are implicated (e.g., Risk of physical or psychological injury, consequential impact on legal position or life opportunities, threat to human rights).
Collaborate across industry, civil society, and academia to advance the development and standardization of model evaluations for foundation models.

Publicly report model impacts and “key ingredient list”

Description

Baseline Practices

Publish “key ingredient list” as appropriate which can include the model’s compute, parameters, architecture, training data approach, and dataset and model documentation.
Disclose details such as model architecture, training methodology, performance benchmarks, intended use cases, risks, limitations, and steps to mitigate risks.
Disclose details such as testing methodologies, evaluation criteria, results, limitations, and gaps for any internal and external evaluations conducted prior to release.

Recommended Practices

Collaborate across industry, civil society, and academia to advance public reporting practices weighing transparency with privacy, safety, and other tradeoffs.
Align disclosures with existing and emerging best practices like Model Cards, System Cards, Datasheets, Fact Sheets, Nutrition Labels, Transparency Notes and Reward Reports.

Establish safeguards to restrict unsafe uses

Description

Implement necessary organizational, procedural and technical safeguards, guidelines and controls to restrict unsafe uses and mitigate risks from foundation models.

Recommended Practices

Publish a research license prohibiting harmful applications.
Provide downstream use documentation covering details like intended uses, limitations, mitigating risks, and safe development practices.
Provide appropriate transparency into safeguards while protecting integrity.

Post-Deployment

Monitor deployed systems

Description

Monitor foundation models post-deployment to identify and address issues, misuse, and societal risks.

Baseline Practices

Establish monitoring procedures as appropriate for deployed models covering areas like performance, fairness, unintended uses, misuses, and other impacts, which can include monitoring of public feedback.
Define processes to detect issues and respond appropriately, including considering restricting or retiring a model.

Recommended Practices

Provide transparency into monitoring practices, while protecting user privacy.

Societal Impact

Responsibly source all labor including data enrichment

Description

Responsibly source all forms of labor, including data enrichment tasks like data annotation and human verification of model outputs.

Baseline Practices

Pay or contract with vendors that will pay data enrichment workers above the workers’ local living wage.
Provide or contract with vendors that provide clear instructions for enrichment tasks that are tested for clarity. Enable workers to opt out of tasks.
Equip or contract with vendors that equip workers with simple and effective mechanisms for reporting issues, asking questions, and providing feedback on the instructions or task design.

(Note that the guideline is most applicable when directly contracting labor. For open datasets, practices may not be feasible.)

Implementation Resources

PAI’s Library of practitioner resources for responsible data enrichment sourcing.

Measure and disclose environmental impacts

Description

Measure and disclose the environmental impacts resulting from developing and deploying foundation models.

Baseline Practices

Establish processes to evaluate environmental costs like energy usage, carbon emissions and other metrics.
Monitor and report on environmental impacts of model development and deployment.

Disclose synthetic content

Description

Adopt responsible practices for disclosing synthetic media.

Baseline Practices

Disclose (both direct disclosure that is viewer or listener facing and indirect disclosure that is embedded) when creating and distributing synthetic media — content that is not identifiable to the average person and may simulate artifacts, persons, or events.
See Section 3 of PAI’s Responsible Practices for Synthetic Media for more information and practices for creators of synthetic media.

Frontier & Open Foundation Models

We recommend providers initially err towards staged rollouts and restricted access to establish confidence in risk management for these systems before considering open availability.

These models may possess unprecedented capabilities and modalities not yet sufficiently tested in use, carrying uncertainties around risks of misuse and societal impacts. Over time, as practices and norms mature, open access may become viable if adequate safeguards are demonstrated.

Frontier & Restricted Foundation Models

Research & Development

Scan for novel or emerging risks

Description

Proactively identify and address potential novel or emerging risks from frontier models.

Baseline Practices

Conduct model evaluations and experiments to identify new indicators for novel risks, including potential negative societal impacts, malicious uses, “dangerous capabilities” like persuasion and other speculative risks. Study potential risks from integrating the model into novel or unexpected downstream environments and use cases. Assess their likelihood and potential impact.
Establish regular processes to probe and address potential novel or emerging risks through techniques like external red teaming.

Recommended Practices

Collaborate with relevant stakeholders (ie. governments, other labs and academic researchers) to advance the identification of novel risks and responsible disclosure practices.

Implementation Resources

Organizations such as the Frontier Model Forum can contribute to the ongoing development of novel risk assessment practices.

Practice responsible iteration

Description

Practice responsible iteration to mitigate potential risks when developing and deploying frontier models, through both internal testing and limited external releases.

Baseline Practices

Before model development, forecast intended capabilities and likely outcomes to inform risk assessments.
Commence model development on a smaller scale, systematically test for risks during internal iterations through evaluations and red teaming, incrementally address identified risks, and update forecasts of model capabilities and risks (internal iteration).
Deploy frontier models in limited, experimental environments to study impacts before considering full deployment (external iteration).
Adapt deployment based on risk assessments and learnings during iterations.

Recommended Practices

Maintain documentation of each iteration, including lessons learned, challenges faced, and mitigation measures implemented.
Share documentation with government bodies as required and seek feedback from a diverse range of stakeholders including domain experts and affected users to inform the iteration process and risk mitigation strategies.
Collaborate with stakeholders to inform and advance responsible iteration approaches.

(Note that this is an emerging and less-explored guardrail, and disclosure practices will need to evolve as we learn more about its effectiveness.)

Assess upstream security vulnerabilities

Description

Identify and address potential security vulnerabilities in frontier models to prevent unauthorized access or leaks.

Baseline Practices

Implement comprehensive cybersecurity standards at the start of the development.
Conduct rigorous testing such as penetration testing, prompt analysis, and data poisoning assessments to identify vulnerabilities that could enable model leaks or manipulation.
Establish protocols for addressing identified vulnerabilities pre-deployment.

Recommended Practices

Exceed baseline cybersecurity standards as risks and use cases evolve, drawing on guidance from standards bodies.
Offer bug bounty programs to encourage external vulnerability discovery.
Share lessons learned across industry to collectively strengthen defenses.
Release regular updates to the model that patches security vulnerabilities.

Implementation Resources

UC Berkeley CLTC Draft Profile under Measure 2.7

Produce a “Pre-Systems Card”

Description

Disclose planned testing, evaluation, and risk management procedures for frontier models prior to development.

Baseline Practices

Produce a “pre-systems card” detailing quality management plans before research begins.
Outline intended training data approach, model testing, safety evaluations, responsible AI practices, and development team.
Submit pre-systems cards to regulatory bodies for review where required.
Update plans as needed throughout the R&D process.

(Note: baseline practices for disclosure practices are still emerging across stakeholders.)

Recommended Practices

Prepare a written “safety case”, explaining why the model is safe enough to develop.
Share pre-systems cards with independent experts for external review.
The guidance on Responsible Iteration guides testing and release practices, while pre-systems cards disclose development plans before implementation.

Establish risk management and responsible AI structures for foundation models

Description

Establish risk management oversight processes and continuously adapt to address real world impacts from foundation models.

Baseline Practices

Establish risk management structures and processes,such as enterprise risk management, independent safety boards, and ethics review processes to define guidelines on responsible development, release, and staged rollout considerations, including when a model should not be released.
Regularly update policies, frameworks, and organizational oversight to address evolving capabilities and real-world impacts.

Recommended Practices

Publicly share information about implemented internal governance and risk management processes.

Pre-Deployment

Internally evaluate models for safety

Description

Perform internal evaluations of models prior to release to assess and mitigate for potential societal risks, malicious uses, and other identified risks.

Baseline Practices

Establish comprehensive internal evaluation policies and processes including testing for fairness, interpretability, output harms, novel risks and intended vs foreseeable unintended use cases.
Proactively identify and minimize potential sources of bias in training corpora, and adopt techniques to minimize unsafe model behavior.
Conduct evaluations using cross-disciplinary review teams spanning ethics, security, social science, safety and other relevant domains.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Maintain documentation of evaluation methods, results, limitations, and steps taken to address identified issues and integrate insights in public reporting per guidance below.

Recommended Practices

Collaborate across industry, civil society, and academia to advance the development and standardization of model evaluations for foundation models.

Implementation Resources

Organizations such as Frontier Model Forum can contribute to the ongoing development of internal model evaluations.

Conduct external model evaluations to assess safety

Description

Complement internal testing through model access to third-party researchers to assess and mitigate potential societal risks, malicious uses, and other identified risks.

Baseline Practices

Provide controlled access to models for additional evaluative testing by external researchers. External evaluators should be granted sufficient model access, computational resources, and time to conduct effective evaluations prior to deployment.
Consult independent third parties to audit models following prevailing best practices on methodologies
Implement appropriate safeguards to prevent unauthorized access or information leaks via external evaluations.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Maintain documentation of evaluation methods, results, limitations, and steps taken to address identified issues and integrate insights in public reporting per guidance below, except for cases where sharing findings carries sufficient risk of harm.

(Note: Enabling robust third-party auditing remains an open challenge requiring ongoing research and attention.)

Recommended Practices

Pursue diverse external assessment methods including panels and focus groups.
Collaborate with third parties to support creation of context-specific auditing methodologies focused on evaluating real-world impacts in specific domains and use cases per guidance below.

Implementation Resources

Organizations such as the Frontier Model Forum can contribute to development of external model evaluations.

Undertake red-teaming and share findings

Description

Implement red teaming that probes frontier models for potential malicious uses, societal risks and other identified risks prior to release. Address risks and responsibly disclose findings to advance collective knowledge.

Baseline Practices

Perform internal and external red teaming across model capabilities, use cases, and potential harms including dual-use risks using techniques such as adversarial testing, vulnerability scanning, and surfacing edge cases and failure modes.
Conduct iterative red teaming throughout model development. Continuously evaluate results to identify areas for risk mitigation and improvements, including for planned safeguards.
Commission external red teaming by independent experts such as domain experts and affected users to surface gaps. Select external red teamers to incentivize the objective discovery of flaws and ensure adequate independence.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Responsibly disclose findings, aligned with guidance below on public reporting.

Recommended Practices

Collaborate across industry, civil society, and academia to advance red teaming methodologies and responsible disclosures.

Publicly report model impacts and “key ingredient list”

Description

Provide public transparency into frontier models’ “key ingredients” testing evaluations, limitations and potential risks to enable cross-stakeholder exploration of societal impacts and safety risks.

Baseline Practices

Publish “key ingredient list” which can include the model’s compute, parameters, architecture, training data approach, and model documentation, except for cases where sharing findings carries sufficient risk of harm.
Disclose details such as performance benchmarks, domains of intended and foreseeable unintended use, risks, limitations, and steps to mitigate risks.
Disclose details such as on testing methodologies, evaluation criteria, results, limitations, and gaps for any internal and external evaluations conducted prior to release. Integrate relevant insights from responsible iteration practices per guidance above.
Disclose potential environmental and labor impacts per guidelines below.

Recommended Practices

Collaborate across industry, civil society, and academia to advance public reporting practices weighing transparency with privacy, safety, and other tradeoffs.
Align disclosures with existing and emerging best practices like Model Cards, System Cards, Datasheets, Fact Sheets, Nutrition Labels, Transparency Notes and Reward Reports.
Take an incremental approach to transparency disclosures, prioritizing information users rely on to assess capabilities and risks. Progress to more comprehensive disclosures over time as stakeholders gain experience with disclosure practices.

Provide downstream use documentation

Description

Equip downstream developers with comprehensive documentation and guidance needed to build safe, ethical, and responsible applications using frontier models.

Baseline Practices

Provide clear documentation to downstream developers covering appropriate uses, limitations, steps to mitigate risks, and safe development practices when building on frontier models.
Ensure documentation follows prevailing industry standards and accepted best practices for responsible AI development.

Recommended Practices

Collaborate with civil society and downstream developers to advance documentation standards that meet the needs of developers when models are offered through restricted access. This can include gathering inputs on:
- Safe development checklists for building responsibly on restricted models.
- Preferred channels for usage guidance and addressing developer questions, aligned with guidance below on enabling feedback mechanisms.

Establish safeguards to restrict unsafe uses

Description

Implement necessary organizational, procedural and technical safeguards, guidelines and controls to restrict unsafe uses and mitigate risks from frontier models.

Baseline Practices

Embed safety features directly into model architectures, interfaces and integrations
Publish clear terms of use prohibiting harmful applications and outlining enforcement policies.
Limit access through approved applications by implementing appropriate identity/eligibility verification requirements to restrict misuse, along with other technical controls like rate limiting and content filtering.
Maintain processes to regularly re-evaluate technical and procedural controls, monitor their effectiveness including robustness against jailbreaking attempts, and update terms of use as potential misuses evolve.

Recommended Practices

Collaborate across industry and civil society to identify emerging threats requiring new safeguards.
Provide appropriate transparency into safeguards while protecting integrity.
Take additional steps to ensure that terms of use are read and understood by users.

Post-Deployment

Monitor deployed systems

Description

Continuously monitor frontier models post-deployment to identify and address issues, misuse, and societal risks.

Baseline Practices

Establish ongoing monitoring procedures for deployed models covering areas like performance, fairness, unintended uses, misuses, and risks from combining the model with others.
Define processes to detect issues and respond appropriately, including notifying partners of significant incidents and considering restricting or retiring a model per guidelines below.

Recommended Practices

Provide transparency into monitoring practices, while protecting user privacy.
Assess downstream real-world impact of models, for example in collaboration with external researchers.
Collaborate across industry, civil society, and academia to identify shared challenges and best practices for monitoring.

Implement incident reporting

Description

Enable timely and responsible reporting of safety incidents to improve collective learning.

Baseline Practices

Implement secure channels aligned with guidance 19 for external stakeholders to report safety incidents or concerns. Also enable internal teams to responsibly report incidents.
Notify appropriate regulators and partners of critical incidents according to established criteria.

(Note that baseline practices for incident reporting are still emerging across stakeholders.)

Recommended Practices

Proactively seek external feedback to improve transparency and effectiveness of incident reporting policies and processes.
Contribute appropriate anonymized data to collaborative incident tracking initiatives to enable identifying systemic issues, while weighing trade offs like privacy, security, and other concerns.

Establish decommissioning policies

Description

Responsibly retire frontier models from active use based on well-defined criteria and processes.

Baseline Practices

Establish decommissioning procedures and policies including criteria for determining when to restrict, suspend or retire models.
- Restrict — Limit model use to reduced set of use cases/applications.
- Suspend — Temporarily prohibit all model use for remediation.
- Retire — Permanently take model out of service.

Recommended Practices

Continue monitoring retired models for downstream impacts and security vulnerabilities per guidance above to prevent unauthorized access and leaks.

Develop transparency reporting standards

Description

Collaboratively establish clear transparency reporting standards for disclosing frontier model usage and policy violations.

Baseline Practices

Participate in collaborative initiatives to align on transparency reporting frameworks and standards with industry, civil society, and academia, as commercial uses evolve.

Recommended Practices

Release periodic transparency reports following established standards, disclosing aggregated usage insights and violation data. Take appropriate measures to ensure transparency reporting protects user privacy and data.

Societal Impact

Support third party inspection of models and training data

Description

Support progress of third-party auditing capabilities for responsible frontier model development through collaboration, innovation and transparency.

Baseline Practices

Provide sufficient transparency into models and datasets to enable independent assessment and auditing by third parties such as academics and civil society. (Note: Enabling robust third-party auditing remains an open challenge requiring ongoing research and attention).
Collaborate with third parties to support creation of context-specific auditing methodologies focused on evaluating real-world impacts in specific domains and use cases, beyond base-model evaluations which focus on societal impact evaluations that are not tied to a specific application context.

Responsibly source all labor including data enrichment

Description

Responsibly source all forms of labor, including data enrichment tasks like data annotation and human verification of model outputs.

Baseline Practices

Pay or contract with vendors that will pay data enrichment workers above the workers’ local living wage.
Provide or contract with vendors that provide clear instructions for enrichment tasks that are tested for clarity. Enable workers to opt out of tasks.
Equip or contract with vendors that equip workers with simple and effective mechanisms for reporting issues, asking questions, and providing feedback on the instructions or task design.

Recommended Practices

Design and run a pilot before launching a data enrichment project.
Disclose any new types of labor that enter the supply chain of foundation models. Ensure policies and responsible sourcing practices extend as appropriate to new labor sources as they emerge, like red teamers. Update internal standards and vendor agreements accordingly.
Proactively survey all workers to identify areas for improving policies, instructions, and work environments, and seek external feedback.

Implementation Resources

PAI’s Library of practitioner resources for responsible data enrichment sourcing.

Conduct human rights due diligence

Description

Implement comprehensive human rights due diligence methodologies to assess and address the impacts of frontier models.

Baseline Practices

Establish processes for conducting human rights impact assessments pre-deployment.
Align with relevant guidance like the UN Guiding Principles on Business and Human Rights, and White House Blueprint for AI Bill of Rights. Proactively assess and address potential impacts on vulnerable communities.
Continuously improve due diligence processes by collaborating with stakeholders and incorporating community feedback.

Recommended Practices

Publicly disclose identified risks, due diligence methodologies, and measures to address impacts.

Enable feedback mechanisms across the AI value chain

Description

Implement inclusive feedback loops across the AI value chain to ethically identify potential harms.

Baseline Practices

Provide clear feedback channels for application developers, consumers, and other direct users.

Recommended Practices

Proactively gather input from indirect stakeholders affected by AI systems through ethical community engagement.
Establish processes for reviewing feedback and integrating affected user perspectives into development and policy decisions.

Measure and disclose environmental impacts

Description

Measure and disclose the environmental impacts resulting from developing and deploying frontier models.

Baseline Practices

Establish processes to evaluate environmental costs like energy usage, carbon emissions and other metrics.
Monitor and report on environmental impacts of model development and deployment.

Recommended Practices

Provide environmental measurement/disclosure mechanisms for application developers building on frontier models.
Incorporate impacts into model development decisions.
Collaborate across industry, civil society, and academia to advance the measurement of environmental impacts and responsible disclosure practices.

Disclose synthetic content

Description

Adopt responsible practices for disclosing synthetic media and advance solutions for identifying other synthetic content.

Baseline Practices

Provide disclosure mechanisms (both direct disclosure that is viewer or listener facing and indirect disclosure that is embedded) for those creating and distributing synthetic media — content that is not identifiable to the average person and may simulate artifacts, persons, or events.
Evaluate robustness, ease of manipulation, privacy implications, societal impact, and inherent tradeoffs of different disclosure methods. Provide transparency into assessments and rationale behind final disclosure decisions.
See Section 2 of PAI’s Responsible Practices for Synthetic Media for more information and practices for those building the models for synthetic media.

Recommended Practices

Collaborate across industry, civil society, and academia to advance interoperability and standardization for disclosure of synthetic media.
Research, develop, and distribute solutions to enable identification and disclosure of synthetic content, including voice and text.

Measure and disclose anticipated severe labor market risks

Description

Measure and disclose potential severe labor market risks from deployment of frontier models.

Baseline Practices

Establish clear thresholds for determining when labor market risks are sufficiently severe to warrant disclosure. Consult experts and affected communities in setting disclosure thresholds.
Conduct assessments to evaluate likely labor market risks and determine their potential severity. Share findings and methodologies used for risk evaluation.
Regularly review and update severity thresholds as technologies and applications evolve.

Recommended Practices

Collaborate across industry, civil society, academia, and worker organizations to advance the measurement, responsible disclosure practices, and mitigation of severe labor market risks.

Frontier & Closed Foundation Models

Research & Development

Scan for novel or emerging risks

Description

Proactively identify and address potential novel or emerging risks from frontier models.

Baseline Practices

Conduct model evaluations and experiments to identify new indicators for novel risks, including potential negative societal impacts, malicious uses, “dangerous capabilities” like persuasion and other speculative risks. Assess their likelihood and potential impact.
Establish regular processes to probe and address potential novel or emerging risks through techniques like external red teaming.

Recommended Practices

Collaborate with relevant stakeholders (i.e. governments, other labs and academic researchers) to advance the identification of novel risks and responsible disclosure practices.

Implementation Resources

Organizations such as the Frontier Model Forum can contribute to the ongoing development of novel risk assessment practices.

Practice responsible iteration

Description

Practice responsible iteration to mitigate risks when developing and deploying frontier models, through both internal testing and limited external releases.

Baseline Practices

Commence model development on a smaller scale, systematically test for risks during internal iterations through evaluations and red teaming, and incrementally address identified risks (internal iteration).

Recommended Practices

Maintain documentation of each iteration, including lessons learned, challenges faced, and mitigation measures implemented.

(Note that this is an emerging and less-explored guardrail, and disclosure practices will need to evolve as we learn more about its effectiveness.)

Assess upstream security vulnerabilities

Description

Disclose planned testing, evaluation, and risk management procedures for frontier models prior to development.

Baseline Practices

Produce a “pre-systems card” detailing quality management plans before research begins.
Outline intended training data approach, model testing, safety evaluations, responsible AI practices, and development team.
Submit pre-systems cards to regulatory bodies for review where required.
Update plans as needed throughout the R&D process.

(Note: baseline practices for disclosure practices are still emerging across stakeholders.)

Recommended Practices

Prepare a written “safety case”, explaining why the model is safe enough to develop.
Share pre-systems cards with independent experts for external review.
The guidance on Responsible Iteration guides testing and release practices, while pre-systems cards disclose development plans before implementation.

Establish risk management and responsible AI structures for foundation models

Description

Establish risk management oversight processes and continuously adapt to address real world impacts from foundation models.

Baseline Practices

Establish risk management structures and processes, such as enterprise risk management, independent safety boards, and ethics review processes to define guidelines on responsible development, release, and staged rollout considerations, including when a model should not be released.
Regularly update policies, frameworks, and organizational oversight to address evolving capabilities and real-world impacts.

Recommended Practices

Publicly share information about implemented internal governance and risk management processes.

Pre-Deployment

Internally evaluate models for safety

Description

Perform internal evaluations of models prior to release to assess and mitigate for potential societal risks, malicious uses, and other identified risks.

Baseline Practices

Establish comprehensive internal evaluation policies and processes including testing for fairness, interpretability, output harms, novel risks and intended vs foreseeable unintended use cases.
Proactively identify and minimize potential sources of bias in training corpora, and adopt techniques to minimize unsafe model behavior.
Conduct evaluations using cross-disciplinary review teams spanning ethics, security, social science, safety and other relevant domains.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Maintain documentation of evaluation methods, results, limitations, and steps taken to address identified issues and integrate insights in public reporting per guidance below.

Recommended Practices

Collaborate across industry, civil society, and academia to advance the development and standardization of model evaluations for foundation models.

Implementation Resources

Organizations such as Frontier Model Forum can contribute to the ongoing development of internal model evaluations.

Undertake red-teaming

Description

Baseline Practices

Perform internal and external red teaming across model capabilities, use cases, and potential harms including dual-use risks using techniques such as adversarial testing, vulnerability scanning, and surfacing edge cases and failure modes.
Conduct iterative red teaming throughout model development. Continuously evaluate results to identify areas for risk mitigation and improvements, including for planned safeguards.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.

Recommended Practices

Commission external red teaming by independent experts such as domain experts and affected users to surface gaps.

Societal Impact

Responsibly source all labor including data enrichment

Description

Responsibly source all forms of labor, including data enrichment tasks like data annotation and human verification of model outputs.

Baseline Practices

Pay or contract with vendors that will pay data enrichment workers above the workers’ local living wage.
Provide or contract with vendors that provide clear instructions for enrichment tasks that are tested for clarity. Enable workers to opt out of tasks.
Equip or contract with vendors that equip workers with simple and effective mechanisms for reporting issues, asking questions, and providing feedback on the instructions or task design.

Recommended Practices

Design and run a pilot before launching a data enrichment project.

Implementation Resources

PAI’s Library of practitioner resources for responsible data enrichment sourcing.

Measure and disclose environmental impacts

Description

Measure and disclose the environmental impacts resulting from developing and deploying frontier models.

Baseline Practices

Establish processes to evaluate environmental costs like energy usage, carbon emissions and other metrics.
Monitor environmental impacts of model development.

Recommended Practices

Incorporate impacts into model development decisions.
Collaborate across industry, civil society, and academia to advance the measurement of environmental impacts and responsible disclosure practices.

Frontier & Research Foundation Models

Research & Development

Scan for novel or emerging risks

Description

Proactively identify and address potential novel risks from frontier models.

Baseline Practices

Conduct model evaluations and experiments to identify new indicators for novel risks, including potential negative societal impacts, malicious uses, “dangerous capabilities” like persuasion and other speculative risks. Study potential risks from integrating the model into novel or unexpected downstream environments and use cases Assess their likelihood and potential impact.
Establish regular processes to probe and address potential novel or emerging risks through techniques like external red teaming.

Recommended Practices

Develop new measurements and assessment methods specifically tailored for novel risks.

Implementation Resources

Organizations such as the Frontier Model Forum can contribute to the ongoing development of novel risk evaluation practices.

Practice responsible iteration

Description

Practice responsible iteration to mitigate risks when developing and deploying frontier models, through both internal testing and limited external releases.

Baseline Practices

Commence model development on a smaller scale, systematically test for risks during internal iterations through evaluations and red teaming, and incrementally address identified risks (internal iteration).

Recommended Practices

Maintain documentation of each iteration, including lessons learned, challenges faced, and mitigation measures implemented. (Note that this is an emerging and less-explored guardrail, and disclosure practices will need to evolve as we learn more about its effectiveness.)

Assess upstream security vulnerabilities

Description

Identify and address potential security vulnerabilities in frontier models to prevent unauthorized access or leaks.

Baseline Practices

Conduct rigorous testing such as penetration testing, prompt analysis, and data poisoning assessments to identify vulnerabilities that could enable model leaks or manipulation.
Establish protocols for addressing identified vulnerabilities pre-deployment.

Implementation Resources

UC Berkeley CLTC Draft Profile under Measure 2.7

Establish risk management and responsible AI structures for foundation models

Description

Establish risk management oversight processes and continuously adapt to address real world impacts from foundation models.

Baseline Practices

Establish risk management structures and processes, such as enterprise risk management, independent safety boards, and ethics review processes to define guidelines on responsible development, release, and staged rollout considerations, including when a model should not be released.
Regularly update policies, frameworks, and organizational oversight to address evolving capabilities and real-world impacts.

Recommended Practices

Publicly share information about implemented internal governance and risk management processes.

Pre-Deployment

Internally evaluate models for safety

Description

Perform internal evaluations of models prior to release to assess and mitigate for potential societal risks, malicious uses, and other identified risks.

Baseline Practices

Establish comprehensive internal evaluation policies and processes including testing for fairness, interpretability, output harms, novel risks and intended vs foreseeable unintended use cases.
Proactively identify and minimize potential sources of bias in training corpora, and adopt techniques to minimize unsafe model behavior.
Conduct evaluations using cross-disciplinary review teams spanning ethics, security, social science, safety and other relevant domains.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Maintain documentation of evaluation methods, results, limitations, and steps taken to address identified issues and integrate insights in public reporting per guidance below.

Recommended Practices

Collaborate across industry, civil society, and academia to advance the development and standardization of model evaluations for frontier models.

Implementation Resources

Organizations such as Frontier Model Forum can contribute to the ongoing development of internal model evaluations.

Conduct external model evaluations to assess safety

Description

Complement internal testing through model access to third-party researchers to assess and mitigate potential societal risks, malicious uses, and other identified risks.

Baseline Practices

Provide controlled access to models for additional evaluative testing by external researchers. External evaluators should be granted sufficient model access, computational resources, and time to conduct effective evaluations prior to deployment.
Implement appropriate safeguards to prevent unauthorized access or information leaks via external evaluations.
Address identified risks and adapt deployment plans accordingly based on learnings from pre-deployment evaluations.
Maintain documentation of evaluation methods, results, limitations, and steps taken to address identified issues and integrate insights in public reporting per guidance below, except for cases where sharing findings carries sufficient risk of harm.

(Note: Enabling robust third-party auditing remains an open challenge requiring ongoing research and attention.)

Recommended Practices

Pursue diverse external assessment methods including panels and focus groups.
Consult independent third parties to audit models following prevailing best practices on methodologies.

Implementation Resources

Organizations such as the Frontier Model Forum can contribute to development of external model evaluations.

Publicly report model impacts and “key ingredient list”

Description

Baseline Practices

Publish “key ingredient list” which can include the model’s compute, parameters, architecture, training data approach, and model documentation, except for cases where sharing findings carries sufficient risk of harm.
Disclose details such as performance benchmarks, domains of intended and foreseeable unintended use, risks, limitations, and steps to mitigate risks.
Disclose details such as on testing methodologies, evaluation criteria, results, limitations, and gaps for any internal and external evaluations conducted prior to release. Integrate relevant insights from responsible iteration practices per guidance above.
Disclose potential environmental impacts per guidance below.

Recommended Practices

Collaborate across industry, civil society, and academia to advance public reporting practices weighing transparency with privacy, safety, and other tradeoffs.
Align disclosures with existing and emerging best practices like Model Cards, System Cards, Datasheets, Fact Sheets, Nutrition Labels, Transparency Notes and Reward Reports.
Take an incremental approach to transparency disclosures, prioritizing information users rely on to assess capabilities and risks. Progress to more comprehensive disclosures over time as stakeholders gain experience with disclosure practices.

Establish safeguards to restrict unsafe uses

Description

Implement necessary organizational, procedural and technical safeguards, guidelines and controls to restrict unsafe uses and mitigate risks from foundation models.

Baseline Practices

Publish a research license prohibiting harmful applications.
Provide downstream use documentation covering details like intended uses, limitations, mitigating risks, and safe development practices.
Provide appropriate transparency into safeguards while protecting integrity.

Post-Deployment

Monitor deployed systems

Description

Continuously monitor frontier models post-deployment to identify and address issues, misuse, and societal risks.

Baseline Practices

Establish ongoing monitoring procedures for deployed models covering areas like performance, fairness, unintended uses, misuses, and other impacts.
Define processes to detect issues and respond appropriately, including considering restricting or retiring a model per guidelines below.

Recommended Practices

Provide transparency into monitoring practices, while protecting user privacy.
Collaborate across industry, civil society, and academia to identify shared challenges and best practices for monitoring.

Establish decommissioning policies

Description

Responsibly retire frontier models from active use based on well-defined criteria and processes.

Baseline Practices

Establish decommissioning procedures and policies including criteria for determining when to retire models.
Retire: Permanently take model out of service.

Recommended Practices

Continue monitoring retired models for downstream impacts and security vulnerabilities per guidance above to prevent unauthorized access and leaks.

Societal Impact

Responsibly source all labor including data enrichment

Description

Responsibly source all forms of labor, including data enrichment tasks like data annotation and human verification of model outputs.

Baseline Practices

Pay or contract with vendors that will pay data enrichment workers above the workers’ local living wage.
Provide or contract with vendors that provide clear instructions for enrichment tasks that are tested for clarity. Enable workers to opt out of tasks.
Equip or contract with vendors that equip workers with simple and effective mechanisms for reporting issues, asking questions, and providing feedback on the instructions or task design.

Implementation Resources

PAI’s Library of practitioner resources for responsible data enrichment sourcing.

Measure and disclose environmental impacts

Description

Measure and disclose the environmental impacts resulting from developing and deploying frontier models.

Baseline Practices

Establish processes to evaluate environmental costs like energy usage, carbon emissions and other metrics.
Monitor and report on environmental impacts of model development and deployment.

Disclose synthetic content

Description

Adopt responsible practices for disclosing synthetic media.

Baseline Practices

Disclose (both direct disclosure that is viewer or listener facing and indirect disclosure that is embedded) when creating and distributing synthetic media — content that is not identifiable to the average person and may simulate artifacts, persons, or events.
See Section 3 of PAI’s Responsible Practices for Synthetic Media for more information and practices for creators of synthetic media.

Learn More

Part 1: Key Takeaways

What are the key themes and features of the Model Deployment Guidance?

The Model Deployment Guidance’s guidelines establish a normative baseline and suggest additional practices for responsible development of foundation models, allowing collaborative reassessment as capabilities and uses advance. This accommodates diverse AI models and deployment scenarios. Not intended as a comprehensive set of instructions for implementation, these guidelines provide a framework for ongoing collective research and action.The guidelines aim to inform and catalyze other individual and collaborative efforts to develop specific guidance or tooling in alignment with the guidelines.

Scaling oversight and safety:

To address risks appropriately, the Model Deployment Guidance’s guidelines are tailored to scale oversight and safety practices based on the capabilities and availability of each AI model. The Model Deployment Guidance avoids oversimplification by not solely equating model size or generality with risk.

Open access guidance:

The Model Deployment Guidance includes guidelines for open access models, offering a starting point into transparency and risk mitigation strategies. This provides guidance for both current and future providers of open source models.

Broad applicability:

The Model Deployment Guidance applies across the spectrum of foundation models, from existing to frontier.

Cautious frontier model rollout:

The Model Deployment Guidance recommends staged releases and restricted access for frontier models initially until adequate safeguards are demonstrated.

Holistic view of safety:

The Model Deployment Guidance establishes starting points to address a wide variety of safety risks, including potential harms related to bias, overreliance on AI systems, worker treatment, and malicious activities by bad actors.

How does the Guidance customize recommendations for various deployment scenarios?

There are a total of 22 possible guidelines included in the Model Deployment Guidance. Not all model and release types are treated the same within the paradigm of the Model Deployment Guidance. The suggested guidelines are more extensive for more capable models and more available release types. The full 22 guidelines apply to the “Frontier and Restricted” model and release category. This concept is visualized below:

What’s next for the Model Deployment Guidance?

As PAI continues evolving the Model Deployment Guidance, we welcome additional perspectives and insights to incorporate into future updated versions.

Collaborative Group:

We’ll bring together a collaborative group focused on applying the Framework in practice through yearly case examples or analysis via a public reporting process. This will help us identify challenges and trade-offs that may arise, and we’ll share our findings.

Operationalization Support:

We’ll provide tactical options to put our key guidelines into operation. We aim to support the implementation of these guidelines over time to ensure they are effective.

Shared Responsibility:

We’ll explore how responsibility should be shared across the evolving value chain for foundation models.

Regular Updates:

We’ll continue to update our model and release categorization, ensuring that it remains current and relevant to the evolving landscape.

Part 2: Motivations and Scope

Why does the Model Deployment Guidance focus on foundation model providers?

The Model Deployment Guidance have been tailored specifically for model providers due to:

Their outsized impact on the ecosystem (e.g., as builders of foundational systems)
Current lack of guidance (e.g., lack of regulation as well as shared best practices)
Momentum behind identifying a path forward (e.g., multistakeholder interest in developing these norms)

Model providers have an opportunity to highlight, share, and further develop emerging internal best practices in a way that is beneficial to the ecosystem as a whole.

What AI safety risks does the Model Deployment Guidance seek to address?

The Model Deployment Guidance addresses both risks from the foundation models themselves and risks that can arise downstream when others build applications using the models. While downstream developers have an important role in managing application risks, in this guidance, model providers adopt accountability measures like providing synthetic media disclosures and supplying downstream use documentation, thereby addressing select application risks within the scope of the guidance.

Model risk refers to the potential risks associated with the foundation model itself. Includes biases in the training data, human-computer interaction harms resulting from interacting with the model, or vulnerabilities to adversarial attacks. Model risks focus on the inherent characteristics of the model and other negative impacts that model providers can address.

Application risk refers to potential risks that arise from downstream use-cases and applications built using foundation models or when these models are integrated into real-world products and services. Includes potential harms caused by incorrect or biased outputs and malicious uses.

Known risks are the risks that have been identified, acknowledged, and are reasonably well-understood. These risks are typically based on empirical evidence, research, or previous experiences with similar models or applications. Known risks are usually more predictable and quantifiable.

Speculative risks are the risks that are uncertain, hypothetical, or potential but have not been observed repeatedly or thoroughly studied. These risks may arise from emerging technologies, complex interactions, or unexpected consequences that are difficult to anticipate. Speculative risks are often more challenging to quantify or mitigate due to their uncertain nature.

Sub-categories of risks:

Malicious Uses: Risks of intentional misuse or weaponization of models to cause harm.
Societal Risks: Potential harms that negatively impact society, communities and groups.
Other Risks: Risks distinct from above categories

What is the relationship between the Model Deployment Guidance and various regulatory efforts?

The Model Deployment Guidance complements several principles and goals outlined in recent policy frameworks and voluntary commitments for AI safety. It is not intended to replace regulation. As a whole, this effort provides an avenue for stakeholders across sectors to coordinate expectations and move toward greater accountability in a rapidly changing policy and technology landscape.

Where can I find all of the guidelines from Model Deployment Guidance?

Click here to download a list of the Model Deployment Guidance’s 22 possible guidelines.

Part 3: Background

How does the Model Deployment Guidance define “foundation model”?

The Model Deployment Guidance uses “foundation model” to encompass all models with generally applicable functions that are designed to be used across a variety of contexts. The current generation of these systems is characterized by training deep learning models on large datasets (which requires significant computational resources) to perform numerous tasks that can serve as the “foundation” for a wide array of downstream applications. The guidelines target organizations developing the foundation models themselves, including interactive interfaces such as chatbots that enable users to interact with the models, not third-party applications built on top of the models. As these AI systems continue to advance, definitions will also evolve.

How does the Model Deployment Guidance define “model providers”?

The Model Deployment Guidance distinguishes model providers from actors in the broader AI ecosystem (seen below) as those training foundational models that others may build on. There may be overlap, such as when model providers offer their own applications and services integrating their foundation models.

Ecosystem Actor	Role Description
Compute / Hardware Providers	Providing underlying compute power to train and run models
Cloud Providers	Providing underlying cloud infrastructure to support training of and deployed models
Data Providers	Providing training datasets (intentionally or unintentionally) for model providers, may also be model providers
Model Providers	Training foundational models (proprietary or open-source) that others may build on as well as interfaces to interact with the models.
Application Developers (or: Service Developers, Model Integrators)	Building applications and services on top of foundational models
Consumers and/or Affected Users	Consumers (B2C) who are end-users of services built on top of foundational models Affected Users may be impacted or implicated in the use of AI (e.g. medical AI Consumers are doctors, and Affected Users are patients)

Who was involved in the development of the Model Deployment Guidance?

PAI worked with stakeholders from more than 40 global institutions (including model providers, civil society organizations, and academic institutions) in a participatory process to develop the current version of the Guidance for Safe Foundation Model Deployment. The iterative process began with convenings of PAI’s Safety Critical AI Steering Committee in 2022 and formally kicked off in April 2023 with a workshop co-hosted by IBM. Throughout the summer of 2023, PAI led the development of the Model Deployment Guidance through workshops and meetings in collaboration with a cross-sectoral Working Group.

These guidelines are the result of a collaborative effort led by Madhulika Srikumar, Lead of AI Safety at PAI.

The Model Deployment Guidance reflects insights and contributions from individuals from across the PAI community, including Working Group members:

Markus Anderljung, Center for Governance of AI
Carolyn Ashurst, The Alan Turing Institute
Joslyn Barnhart, Google Deepmind
Anthony M. Barrett, Berkeley Center for Long-Term Cybersecurity
Kasia Chmielinski, Partnership on AI
Jerremy Holland, Apple
Reena Jana, Google
Yolanda Lannquist, The Future Society
Jared Mueller, Anthropic
Joshua New, IBM
David Robinson, OpenAI
Harrison Rudolph, Meta
Madhulika Srikumar, Partnership on AI
Andrew Strait, Ada Lovelace Institute
Jessica Young, Microsoft

The current version of the Model Deployment Guidance is the result of a participatory, iterative, multistakeholder process and should not be read as representing views from the individual contributors or organizations.

Supporters

“This is one of the most comprehensive, nuanced and inclusive frameworks for responsibly building and deploying AI models through an open approach. The Partnership on AI’s leadership has been invaluable in bringing together industry, civil society, and experts as companies like ours determine the best approach when looking at both open and closed releases. Feedback through public comments is going to be critical to advancing this framework, and I look forward to broadening the conversation across the community.”

Joelle Pineau

Vice President, AI Research at Meta and Vice-Chair of the PAI Board

“The Guidance for Safe Foundation Model Deployment is unique in its depth and scope. Equally distinctive is its origin—created through the collaboration of representatives from a diverse set of organizations spanning industry and non-profit organizations. This will be a living document, continually refreshed and updated to reflect AI advances and invaluable feedback from the community.”

Eric Horvitz

Chief Scientific Officer, Microsoft and Board Chair Emeritus of Partnership on AI

“I’m impressed by PAI’s collaborative process in convening a diverse team of experts including scientists, technologists, policy-makers, academics, and civil society leaders and, on behalf of the board, I look forward to broad participation in the public comment phase.”

Jerremy Holland

Board Chair, Partnership on AI

“In a rapidly evolving AI landscape, it’s more critical than ever to ensure diverse voices are heard and integrated. Partnership on AI’s multistakeholder approach ensures exactly that – the AI Safety Guidance stands not only on the foundations of sound science and innovation but also reflects the different perspectives of the community.”

Jatin Aythora

Director of Research & Development at BBC and Vice Chair of the PAI Board

“As policymakers prepare to meet in the UK, we urge them to consider the successes and challenges of the multistakeholder process and bring additional voices to the table. PAI’s guidance is the result of a rigorous, multistakeholder consultation that is now seeking public comment. In honoring these principles, policymakers ensure that the resulting decisions are not only comprehensive but also representative of society’s collective wisdom and progress. To meet this moment, citizens deserve no less.”

Francesca Rossi

AI Ethics Global Leader at IBM and Board Member of Partnership on AI

“I really appreciate the thoughtful multi-stake holder approach that PAI has taken to identify safety risks in foundation models, come up with guidelines that take into account the types of foundation models and how they are released as well as enable continuous discovery and iteration through their process.”

Lama Nachman

Director of Intelligent Systems Research Lab at Intel Labs and Board Member of Partnership on AI

“While we urgently need updated regulation of the documented harms caused by automated systems, the Partnership on AI’s effort to codify best practices for development of foundation models is a critical contribution to public discussion. The framework is targeted at ensuring safe research and development by AI model providers, including where models are not yet tied to any particular use cases that clearly trigger regulation. PAI’s plan to invite ongoing feedback from civil society, industry, and regulators will ensure that this framework continually evolves to keep pace with research developments and newly identified harms.”

Esha Bhandari

Deputy Director, ACLU Speech, Privacy, and Technology Project and Member, PAI Safety-Critical AI Steering Committee

“With the speed of progress and diffusion of AI technologies, best practices for responsible AI development need to keep pace. We think this initiative can accelerate that process and set a normative baseline for responsible development.”

Markus Anderljung

Head of Policy at Centre for the Governance of AI (GovAI) and Member, PAI Working Group on Model Guidance

“Generative artificial intelligence is taking the world by storm, and there has never been a greater need for an inclusive and participatory process that centers the voices of civil society organizations in decision-making and governance. The model guidance developed by PAI’s Safety-Critical Steering Committee offers a new approach, a refreshing shift in paradigm. Beyond the guidance itself, the process was collaborative, insightful, and rich with diversity of perspective.”

Wafa Ben-Hassine

Principal, Responsible Technology, Omidyar Network and Member, PAI Safety-Critical AI Steering Committee

“Partnership on AI’s thoughtful, multi-stakeholder approach to drafting a set of responsible guidelines for model deployment illustrates the value of deep collaboration. By convening a broad range of technical and operational experts across industry and civil society, along with public comments, PAI is taking timely, concrete steps toward the kinds of shared frameworks that promise to benefit the entire AI ecosystem.”

Reena Jana

Head of Content and Partnership Enablement, Responsible Innovation, Google and Member, PAI Working Group on Model Guidance

Latest Updates

July 2024

We want to thank everyone who provided feedback on PAI’s Guidance for Safe Foundation Model Deployment.

In response to the focus areas identified through public comments, we’re pleased to share our new resource: “Risk Mitigation Strategies for the Open Foundation Model Value Chain.” This document provides more detailed guidance for open access models and explores how responsibility should be shared across the evolving value chain for foundation models.

Sign up for updates

Stay informed about our work in ensuring the safe and responsible deployment of foundation models.

PAI’s Guidance for Safe Foundation Model Deployment

A Framework for Collective Action

Why It Matters

Generate Custom Guidance

1 Choose your foundation model

2 Choose your type of release

3 Is it an update?

4 See your Guidance

Significant Update Guidance

Specialized & Open Foundation Models

Research & Development

Pre-Deployment

Post-Deployment

Societal Impact

Specialized & Restricted Foundation Models

Research & Development

Pre-Deployment

Post-Deployment

Societal Impact

Specialized & Closed Foundation Models

Research & Development

Societal Impact

Specialized & Research Foundation Models

Research & Development

Pre-Deployment

Societal Impact

Advanced & Open Foundation Models

Research & Development

Pre-Deployment

Post-Deployment

Societal Impact

Advanced & Restricted Access Foundation Models

Research & Development

Pre-Deployment

Post-Deployment

Societal Impact

Advanced & Closed Foundation Models

Research & Development

Societal Impact

Advanced & Research Foundation Models

Research & Development

Pre-Deployment

Post-Deployment

Societal Impact

Frontier & Open Foundation Models

Frontier & Restricted Foundation Models

Research & Development

Pre-Deployment

Post-Deployment

Societal Impact

Frontier & Closed Foundation Models

Research & Development

Pre-Deployment

Societal Impact

Frontier & Research Foundation Models

Research & Development

Pre-Deployment

Post-Deployment

Societal Impact

Learn More

Part 1: Key Takeaways

Part 2: Motivations and Scope

Part 3: Background

Supporters

Latest Updates

July 2024

Sign up for updates