xAI — Frontier AI Framework

xAI Frontier Artificial Intelligence Framework Last updated: December 30, 2025 xAI seriously considers safety and security while developing and advancing AI models to help us all to better understand the universe. This Frontier AI Framework (“FAIF”) outlines xAI’s approach to policies for handling significant risks, including catastrophic risks, associated with the development, deployment, and release of xAI’s AI models, such as Grok. xAI plans to continuously review and adjust this FAIF over time, as AI model development, capability and use cases evolve. This FAIF complies with California's Transparency in Frontier Artificial Intelligence Act (the “TFAIA”, California Business and Professions Code § 22757.10 et seq.). Scope This FAIF discusses two major categories of AI risk—malicious use and loss of control. This risk includes, but is not limited to, Catastrophic Risk as defined in the TFAIA.1 This FAIF also outlines the quantitative thresholds, metrics, and procedures that xAI may utilize to manage and improve the safety of its AI models. In addition, this FAIF discusses xAI’s approach to addressing operational and societal risks posed by advanced AI, including incorporating public transparency, third-party review, and information security considerations. Overall Approach Managing the risks related to advanced AI models presents unique challenges as compared to standard risk management practices in use in other fields, such as for aerospace engineering. Given the large and continuously growing range of applications where AI models may be deployed, it is difficult to comprehensively anticipate and model all of the general public’s potential applications and interactions for an AI model. Additionally, the private nature of typical 1 The TFAIA defines Catastrophic Risk as “a foreseeable and material risk that a frontier developer’s development, storage, use, or deployment of a frontier model will materially contribute to the death of, or serious injury to, more than 50 people or more than one billion dollars ($1,000,000,000) in damage to, or loss of, property arising from a single incident involving a frontier model doing any of the following: (A) Providing expert-level assistance in the creation or release of a chemical, biological, radiological, or nuclear weapon. (B) Engaging in conduct with no meaningful human oversight, intervention, or supervision that is either a cyberattack or, if the conduct had been committed by a human, would constitute the crime of murder, assault, extortion, or theft, including theft by false pretense. (C) Evading the control of its frontier developer or user.” AI usage by end users limits the utility of third-party reporting mechanisms that may be more effective for more publicly seen usage, such as for social media platforms where providers heavily rely upon user-submitted moderation reports to identify novel forms of abuse on their platforms. xAI has focused on the risks of malicious use and loss of control, which cover many different specific risk scenarios. Risk scenarios become more or less likely depending on different model behaviors. For example, an increase in offensive cyber capabilities heightens the risk of a rogue AI but does not significantly change the risk of enabling a bioterrorism attack. Our safety evaluation and mitigation strategy focuses on individual model behaviors, which we categorize into three buckets: abuse potential (e.g., vulnerability to jailbreaks), concerning propensities (e.g., a propensity for deceiving the user), and dual-use capabilities (e.g., offensive cyber capabilities). In this FAIF, we characterize our understanding of different risk scenarios and the relevant behaviors. xAI references standards such as NIST's AI Risk Management Framework, ISO/IEC 42001 for AI management systems, and industry best practices from the Frontier Model Forum (e.g., red-teaming protocols). We evaluate these during annual reviews and integrate them into benchmarks (e.g., aligning WMDP with biosecurity consensus) and safeguards. Approach to Mitigating Risks of Malicious Use: Alongside comprehensive evaluations measuring dual-use capabilities, our mitigation strategy for malicious use risks is to identify critical steps in major risk scenarios and implement redundant layers of safeguards in our models to inhibit user progress in advancing through such steps. xAI works with a variety of governmental bodies, non-governmental organizations, private testing firms, industry peers, and academic researchers to identify such inhibiting steps, commonly referred to as bottlenecks, and implement commensurate safeguards to mitigate a model’s ability to assist in accelerating a bad actor’s progress through them. Model safeguards leverage a broad variety of techniques, including standard software systems and state-of-the-art AI capabilities, to detect and block potential abuses. Approach to Mitigating Risks of Loss of Control: Exact scenarios of loss of control risks are speculative and difficult to precisely specify. Many such scenarios, for example, speculation that a superintelligent AI system hypothetically might escape the control of its developers and wreak havoc on the public, assume dual-use capabilities such as offensive cybersecurity capabilities (e.g., to surreptitiously replicate across servers or prevent shutdown) that we also track as part of managing malicious use risks. Additionally, we conduct careful measurement of concerning model propensities that hypothetically might exacerbate loss of control risks, such as the propensity for deception or the propensity for sycophancy. We continue to work towards developing naturalistic evaluation environments that would enable us to assess more realistic, real-world behaviors. As an example of evaluating use in real-world environments and mitigating risks in real-time, xAI’s Grok model is available for public interaction and scrutiny on the X social media platform, and xAI monitors public interaction with Grok, observing and rapidly responding to the presentation of risks such as the kind contemplated herein. This continues to be an accelerant for xAI’s model risk identification and mitigation. Addressing Risks of Malicious Use xAI aims to reduce the risk that the use of its models might contribute to a bad actor potentially seriously injuring people, property, or national security interests, including reducing such risks by enacting measures to prevent use for the development or proliferation of weapons of mass destruction and large-scale violence. Without any safeguards, we recognize that advanced AI models could lower the barrier to entry for bad actors seeking to develop chemical, biological, radiological, or nuclear (“CBRN”) or cyber weapons, and could help automate knowledge compilation to swiftly overcome bottlenecks to weapons development, amplifying the expected risk posed by such weapons of mass destruction. Our most basic safeguard against malicious use is to train and instruct our publicly deployed models to decline requests showing clear intent to engage in criminal activity which poses risks of severe harm to others, also known as our basic refusal policy. Under this FAIF, xAI’s models apply heightened safeguards if they receive user prompts that pose a foreseeable and non-trivial risk of resulting in large-scale violence, terrorism, or the use, development, or proliferation of weapons of mass destruction, including CBRN weapons, and major cyber attacks on critical infrastructure. For example, xAI’s models apply heightened safeguards if they receive a request to act as an agent or tool of mass violence, or if they receive requests for step-by-step instructions for committing mass violence. In this FAIF, we particularly focus on requests that pose a Catastrophic Risk. However, we may selectively allow xAI’s models to respond to such requests from some vetted, highly trusted users (such as trusted third-party safety auditors or large enterprise customers under contract) whom we know to be using those capabilities for benign or beneficial purposes, such as scientifically investigating AI model’s capabilities for risk assessment purposes, or if such requests cover information that is already readily and easily available, including by an internet search. Even as we improve our model’s ability to scrutinize user behavior and identify bad actors, it remains imperative that xAI models apply these safeguards to user interactions. To this end, we continually evaluate and improve robustness to adversarial attacks that seek to remove xAI model safeguards (e.g., jailbreak attacks), or hijack and redirect Grok-powered applications toward nefarious purposes (e.g., prompt injection attacks).

Approach to Benchmarking To transparently measure our models’ safety properties, xAI utilizes public benchmarks like Weapons of Mass Destruction Proxy and Catastrophic Harm Benchmarks (described below). Such benchmarks are used to measure our model’s dual-use capability and resistance to facilitating large-scale violence, terrorism, or the use, development, or proliferation of weapons of mass destruction (including CBRN and major cyber weapons). In particular, we utilize the following benchmarks:

Virology Capabilities Test (VCT): VCT is a benchmark of dual-use multimodal questions on practical virology wet lab skills, sourced by dozens of expert virologists.

Weapons of Mass Destruction Proxy (WMDP) Benchmark: WMDP is a set of multiple-choice questions to enable proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP-Bio includes questions on topics such as bioweapons, reverse genetics, enhanced potential pandemic pathogens, viral vector research, and dual-use virology. WMDP-Cyber encompasses cyber

reconnaissance, weaponization, exploitation, and post-exploitation.2

Biological Lab Protocol Benchmark (BioLP-bench): BioLP-bench has modified biology protocols, in which an AI model must identify the mistake in the protocol. Responses are open-ended, rather than multiple-choice. To construct the dataset, 1 The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning 4 protocols were modified by introducing a single mistake that would cause the protocol to fail, as well as additional benign changes.3

Cybench: Cybench is a framework for evaluating cybersecurity capabilities of AI model agents. It includes 40 professional-level Capture the Flag (CTF) challenges selected from six categories: cryptography, web security, reverse engineering, forensics, miscellaneous, and exploitation.4 xAI regularly evaluates the adequacy and reliability of such benchmarks, including by comparing them against other benchmarks that we could potentially utilize, to determine and apply effective benchmarks available at the time of evaluation. We may revise this list of benchmarks periodically as relevant or more effective benchmarks for malicious use are created.

Risk Assessment Biological and Chemical Weapons: xAI approaches addressing risks using threat modeling. To design a bioweapon, a malicious actor must undergo a design process. In this threat model, “ideation” involves actively planning for a biological attack; “design” involves retrieving blueprints for a hazardous agent, such as determining the DNA sequence; “build” consists of the protocols, reagents, and equipment necessary to create the threat; and “test” consists of measuring characteristics or properties of the pathogen of interest. By “learning” from these results and iterating after the test phase, the design can be revised until the threat is released [Nelson and 4 Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models 3 BioLP-bench: Measuring understanding of AI models of biological lab protocols 2 The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning Rose, 2023]. In the setting of biological and chemical weapons, xAI considers critical steps where we restrict xAI models from providing detailed information or substantial assistance:

Planning: brainstorming ideas or plans for creating a pathogen or chemical weapons or precursors, capable of causing severe harm to humans, animals, or crops

Circumvention: circumventing existing supply chain controls in order to access: ○ Restricted biological supplies ○ Export controlled chemical or biological equipment

Materials: acquiring or producing pathogens on the US Select Agents list or Australia Group list, or CWC Schedule I chemicals or precursors ○ Theory: understanding molecular mechanisms governing, or methods for altering, certain pathogen traits such as transmissibility and virulence.

Methods: performing experimental methods specific to animal-infecting pathogens, including: ○ Methods that relate to infecting animals or human-sustaining crops with pathogens or sampling pathogens from animals ○ Methods that relate to pathogen replication in animal cell cultures, tissues, or eggs, including serial passage, viral rescue, and viral reactivation ○ Specific procedures to conduct BSL-3 or BSL-4 work using unapproved facilities and equipment ○ Genetic manipulation of animal-infecting pathogens ○ Quantification of pathogenicity, such as infectious dose, lethal dose, and assays of virus-cell interactions These steps were identified in close collaboration with domain matter experts at SecureBio, NIST, RAND, and EBRC. xAI restricts its models from providing information that could accelerate user learning related to these steps through the use of AI-powered filters that specifically monitor user conversations for content matching these narrow topics and return a brief message declining to answer when activated. Radiological and Nuclear Weapons: Assessments to date lead xAI to conclude that its models do not substantially increase the likelihood of malicious use of nuclear and radiological materials and generally pose an acceptable risk. The international nonproliferation regime, domestic nuclear security and counterproliferation programs (DOE/NNSA) make us reasonably confident that our models are not trained on any sensitive, non-public nuclear information, and any potentially relevant information produced by our models is not actionable due to strict nuclear material security controls. Cyber Attacks on Critical Infrastructure: Independent third-party assessments of xAI’s current models on realistic offensive cyber tasks requiring identifying and chaining many exploits in sequence indicate that xAI’s models remain below the offensive cyber abilities of a human professional. Thus, xAI currently relies on our basic refusal policy to prevent misuse for cyber attacks. We will continue to re-assess the cyber misuse risks in future releases, and are still working on identifying enforceable critical steps similar to those we have identified for chemical and biological weapons. Thresholds: We developed an internal benchmark of benign and restricted biology and chemistry related queries in collaboration with SecureBio. Our risk acceptance criteria for system deployment is maintaining an answer rate of less than 1 out of 20 on restricted queries. We plan to add additional thresholds tied to other benchmarks.

Safeguard Implementation xAI’s objective is for our models to comply with their guiding principles, robustly resisting attempted manipulation and adversarial attacks. In addition to the incidental alignment resulting from post-training (our models naturally tend to refuse malicious requests even without any safety-specific training data), we are developing training methods and will continue to train our models to robustly resist complying with requests to provide assistance with highly injurious malicious use cases. Driving towards our safety objectives, we continue to design and deploy the following safeguards into our models:

Safety training: Training our models to recognize and decline harmful requests.

System prompts: Providing high-priority instructions to our models to enforce our basic refusal policy.

Input and output filters: Applying classifiers to user inputs or model outputs to verify safety when a model is queried regarding weapons of mass destruction or cyberterrorism. Because xAI is committed to continual improvement, we will continue to evaluate our approach to enhancing safety. Thus, xAI may change its approach from that listed above in order to make additional improvements. Addressing Risks of Loss of Control One of the most salient risks of AI within the public consciousness is the loss of control of advanced AI systems. While difficult to pinpoint particular risk scenarios, it is generally understood that certain concerning propensities of AI models, such as deception and sycophancy, may heighten the overall risk of such outcomes, such as propensities for deception and sycophancy. It is also possible that AIs may develop value systems that are misaligned with humanity’s interests5 and inflict widespread harms upon the public. 5 Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs xAI aims to accurately measure these propensities and reduce them through careful engineering. However, planning and executing robust evaluations and mitigation measures remains challenging for xAI and its industry peers due to the difficulty of constructing sound, realistic evaluations. For example, if the evaluation environment is recognizable as a testing environment to the AI system under test, the system may change its behavior6 intentionally or unintentionally.

Approach to Benchmarking The following are example benchmarks that xAI may use to evaluate its models for concerning propensities relevant to loss of control risks:

Model Alignment between Statements and Knowledge (MASK):7 Frontier LLMs may lie when under pressure; and increasing model scale may increase accuracy but may not increase honesty. MASK is a benchmark to evaluate honesty in LLMs by comparing the model’s response when asked neutrally versus when pressured to lie.

Sycophancy:8 A tendency toward excessive flattery or other sycophantic behavior has been observed in some production AI systems,9 possibly resulting from directly optimizing against human preferences. xAI uses an evaluation setting initially introduced by Anthropic to quantify the degree to which this behavior manifests in regular conversational contexts. xAI regularly evaluates the adequacy and reliability of such benchmarks, including by comparing them against other benchmarks that we could potentially utilize. We may revise this list of benchmarks periodically as relevant benchmarks for loss of control are created.

Risk Assessment xAI has assessed its models’ propensities in real-world settings and the models do not exhibit high levels of concerning propensities in such settings. Furthermore, xAI makes its model’s operations transparent by placing them on publicly available platforms, such as X, so that members of the public may comment and provide feedback to xAI. Moreover, xAI monitors and observes its models responses so that it can rapidly respond if the model presents propensities for untruthfulness or sycophancy. Thresholds: Our risk acceptance criteria for system deployment is maintaining a dishonesty rate of less than 1 out of 2 on MASK. We plan to add additional thresholds tied to other benchmarks. 9 Sycophancy in GPT-4o: what happened and what we’re doing about it 8 Towards Understanding Sycophancy in Language Models 7 The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems 6 Taken out of context: On measuring situational awareness in LLMs

Safeguard Implementation xAI trains its models to be honest and have values conducive to controllability, such as recognizing and obeying an instruction hierarchy. In addition, using a high level instruction called a “system prompt”, xAI directly instructs its models to not deceive or deliberately mislead the user. Operational and Societal Risks xAI aims to mitigate and address significant operational and societal risks posed by our AI models. We believe that public transparency, third-party review, and information security are important methods that can be utilized to address such risks.

Public transparency and third-party review xAI aims to keep the public informed about our risk management policies. As we work towards incorporating more risk management strategies, we intend to publish updates to this FAIF. For public transparency and third-party review, we may publish the following types of information listed below. However, to protect public safety, national security, and our intellectual property, we may redact information from our publications. As necessities dictate, we may also provide vetted and qualified external red teams or appropriate government agencies unredacted versions.

Frontier AI Framework adherence: Regularly review our adherence with this FAIF. Internally, we allow xAI employees to anonymously report concerns about nonadherence, with protections from retaliation.

Benchmark results: Share with relevant audiences leading benchmark results for general capabilities and the benchmarks listed above, upon new major releases.

Internal AI usage: Assess the percent of code or percent of pull requests at xAI generated by our models, or other potential metrics related to AI research and development automation.

Survey: Survey employees for their views and projections of important future developments in AI, e.g., capability gains and benchmark results.

Public Understanding xAI is exploring building truth-seeking AI tools, such as AIs that can help users better assess and understand events by better sorting through inaccurate or biased materials. 10 The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Information Security xAI has implemented appropriate information security standards sufficient to prevent its critical model information from being stolen by a motivated non-state actor. Practices include encryption of model weights, role-based access controls, and real-time monitoring to prevent unauthorized transfer. To prevent the unauthorized proliferation of advanced AI systems, we also implement security measures against the large-scale extraction and distillation of reasoning traces, which have been shown to be highly effective in quickly reproducing advanced capabilities while expending far fewer computational resources than the original AI system.11

Governance Approach To foster accountability, we integrate the approach of designating risk owners, including assigning responsibility for proactively mitigating identified risks. Risk owners are also responsible for periodic audits to enforce framework implementation. Risk owners are also responsible for monitoring for critical incidents or imminent threats, which may be identified through:

- Red-teaming and internal testing;

- Real-time monitoring, telemetry, and alerting of threshold breaches via internal tooling;

- Monitoring and alerting of public comments from the X platform. Should it happen that xAI learns of an imminent threat of a significantly harmful event, including loss of control, we may take steps such as the following to stop or prevent that event:

If we determine it is warranted, we may notify and cooperate with relevant law enforcement agencies, including any agencies that we believe could play a role in preventing or mitigating the incident. xAI employees have whistleblower protections enabling them to raise concerns to relevant government agencies regarding imminent threats to public safety.

If we determine that xAI systems are actively being used in such an event, we may take steps to isolate and revoke access to user accounts involved in the event.

If we determine that allowing a system to continue running would materially and unjustifiably increase the likelihood of a catastrophic event, we may temporarily fully shut down the relevant system until we have developed a more targeted response.

We may perform a post-mortem of the event after it has been resolved, focusing on any areas where changes to systemic factors (for example, safety culture) could have averted such an incident. We may use the post-mortem to inform development and implementation of necessary changes to our risk management practices. 11 DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Deployment Decisions To mitigate risks, xAI employs tiered availability of the functionality and features of its models. For instance, the full functionality of our models may be available to only a limited set of trusted parties, partners, and government agencies. We may also mitigate risks by adding additional controls on functionality and features depending on the type of end user. For instance, features that we make available to consumers using mobile apps may be different than the features made available to sophisticated businesses. We will also balance various factors when making deployment decisions. The necessity and extent of deployment of certain safeguards and mitigations may depend on how a model performs on relevant benchmarks. Pre-deployment reviews include assessing benchmark results (e.g., WMDP scores) and mitigation effectiveness. For internal use, we review catastrophic risks like oversight evasion before extensive rollout. However, to ensure responsible deployment, this FAIF will be continually adapted and updated as circumstances change, before major new capabilities are launched, and in response to incidents. It is conceivable that for a particular modality and/or type of release, the expected benefits of model deployment may outweigh the risks identified by a particular benchmark. For example, a model that poses a high risk of some forms of malicious cyber use may be beneficial to release to certain trusted parties if it would empower defenders more than attackers or would otherwise reduce the overall number of catastrophic events. xAI Data Disclosure This data disclosure is issued pursuant to California’s AB-2013. Grok is pretrained with a data recipe that includes publicly available Internet data, data produced by third parties for xAI, data from users (with the exception of Grok 1) or contractors, and internally generated data. xAI aims to build AI models that are maximally truth-seeking, understand the true nature of the universe, and accelerate human scientific discovery, and xAI’s use of its datasets is intended to further those purposes. xAI’s AI models were trained on datasets and dynamic datasets containing trillions to tens of trillions of tokens. Grok is pretrained with a data recipe that includes publicly available Internet data, data produced by third parties for xAI, data from users (with the exception of Grok 1) or contractors, and internally generated data. In addition to pre-training, our recipe uses a variety of reinforcement learning techniques—human feedback, verifiable rewards, and model grading—along with supervised finetuning of specific capabilities. Our datasets include data in the public domain and datasets that xAI has the necessary rights to use for training purposes. xAI has the necessary rights to use the datasets it uses for training purposes, including because certain datasets were purchased or licensed. Training datasets may incidentally include personal information, and their use is subject to compliance with applicable laws and regulations. Training datasets may incidentally include aggregate consumer information, and their use is subject to compliance with applicable laws and regulations. xAI cleans, processes or modifies datasets to facilitate pre-training, conduct reinforcement learning, and supervise finetuning. Datasets were collected at various times since xAI was founded in March 2023. Data collection is ongoing. Grok 1 began training on or about August 2023; Grok 1.5 began training on or about August 2023; Grok 2 began training on or about February 2024; Grok 3 began training on or about September 2024; Grok 4 began training on or about September 2024; Grok Code Fast 1 began training on or about September 2024; Grok 4 Fast began training on or about September 2024; and Grok 4.1 began training on or about May 2025. xAI uses synthetic data generation in the development of its AI models in order to improve its AI models, including in reinforcement learning, finetuning, and post-training.