AI Safety and Alignment Leadership · Fractional & Remote Executive
When capable AI scales, safety becomes a leadership problem
As AI systems grow more capable, they exhibit emergent behavior that no evaluation pipeline was designed to catch. Risks arise from scale, context, and interaction — not from design alone. Static controls and compliance checklists were built for software that behaves predictably. Consequential AI does not.
I work with organizations to close the robustness gap: the distance between nominal safety and real-world resilience. My work produces technical architectures and organizational designs that function together — because one without the other is not AI safety.
Available for fractional and remote executive roles in AI safety and alignment leadership through 1023AI.

Stephen Lieberman
AI Safety and Alignment Leadership through 1023AI
The Problem
Consequential AI creates a leadership problem that evaluation frameworks cannot solve
As capable systems scale, they produce emergent capabilities that were not designed, emergent risks that were not anticipated, and interaction effects with human systems that no evaluation framework fully captures. The gap between what the model was tested on and what it actually does — in messy organizational contexts, across human-AI teams, under real-world deployment pressure — is where the most consequential safety failures live.
This is why safety at scale is not just a research problem. It is a leadership problem.
Static controls, compliance checklists, and point-in-time evaluations are designed for systems with stable behavior and enumerable failure modes. They were not designed for systems whose capabilities and risks emerge at scale, shift with deployment context, or interact with human behavior in ways that are invisible during testing.
The robustness gap
A system can appear safe in theory and still fail under real deployment pressure. I use the term robustness gap to describe the distance between nominal safety and real-world resilience. In consequential AI environments, safety claims are exposed to shifting incentives, changing contexts, adversarial pressure, organizational fragmentation, and downstream effects that do not appear in controlled settings.
This is also where AI iatrogenics becomes dangerous. In medicine, iatrogenics refers to harm caused by the treatment itself. A narrow intervention can reduce one visible risk while creating new harms elsewhere — distorting incentives, increasing brittleness, or destabilizing the broader system.
In consequential AI systems, emergent misalignment is the most significant expression of the robustness gap. Alignment that appeared solid at one capability level quietly breaks down as the system becomes more capable — through a process that standard evaluation pipelines were not designed to detect. Closing this gap requires leadership that understands how emergence, nonlinearity, and sociotechnical systems behave in the real world.
The Gap
When to bring me in
AI safety and alignment become a leadership function when:
Approach
How I approach AI safety and alignment
My work produces two things simultaneously: technical systems and the organizational architectures that make those systems safe under real-world pressure. These are not separate deliverables. They are developed together — because the technical system shapes what the human system can do, and the human system shapes what the technical system needs to be.
I approach consequential AI as a systems problem, a leadership problem, and a human problem.
Complex adaptive systems
Consequential AI does not operate in isolation. It interacts with organizations, incentives, feedback loops, and people in ways that produce behavior no single component was designed to generate. Safety is a property of the whole sociotechnical environment, not just the model.
Emergent capabilities and risks
Both capabilities and risks emerge through scale, interaction, and deployment context — not through design alone. This includes emergent misalignment: alignment that held at one capability level degrading quietly as the system grows more capable. Safety strategy must be designed for a moving target.
Epistemic uncertainty
Leaders deploying consequential AI make real decisions under genuine uncertainty. That uncertainty is not a gap to be closed by better evaluation. At scale, in messy organizational contexts, and across human-AI teams, uncertainty is a structural feature of the domain.
Safety at scale
The real test is whether safety survives growth, speed, strategic pressure, and social consequence. That standard cannot be met by evaluation frameworks alone. It requires leadership that can govern the whole system as it scales.
Emergence foresight
Emergence foresight is the capacity to reason about what a system might become, not just what it currently is. Governing for the capability horizon — not just the current deployment state — is what distinguishes genuine AI safety leadership from point-in-time compliance.
Emergent AI safety
Safety itself can be treated as an emergent property of the broader sociotechnical system, not a fixed specification applied to the model. It must be cultivated across technical architecture, organizational design, human-AI teaming, and institutional governance simultaneously.
About
Stephen Lieberman
Stephen Lieberman is an AI safety and alignment leader whose work sits at the intersection of consequential AI, emergent complex systems, institutional governance, and human consequence. Through 1023AI, he works with organizations as capable AI systems move from controlled research environments into the messy, high-stakes realities of real-world deployment.
He focuses on emergent capabilities and emergent risks, emergent misalignment, the robustness gap between evaluated safety and deployed safety, and the institutional and human conditions under which consequential AI safety holds or disappears.
His core view is that capable AI cannot be governed as if it were ordinary software. As systems scale, they become emergent complex systems shaped not only by model architecture but by interaction effects, organizational structure, human-AI teaming dynamics, and downstream social consequence.
Organizations deploying consequential AI need more than a policy specialist, more than an ethicist, and more than a narrow technical reviewer. They need leadership that can move between model behavior, executive judgment, institutional design, and real-world consequence.
Mission-critical technical and operational leadership
More than 20 years leading technical and operational teams across government, defense, academia, nonprofit, and industry. Senior leadership on Department of Defense and Veterans Affairs programs within funding environments exceeding $100 billion, spanning enterprise architecture, decision-support systems, security and compliance, electronic health records, cloud systems, and data strategy.
Defense, security, and international systems
At the Naval Postgraduate School, served as a DoD civilian program leader and Principal Investigator for programs in defense technology, modeling and simulation, collaboration platforms, and decision-support systems. Work included counterterrorism, counterinsurgency, peacekeeping operations, and international collaboration across more than 100 countries.
Recognized leadership in high-consequence environments
Undersecretary of Defense Michael G. Vickers recognized this technical leadership with an official letter of commendation for creating "a ground-breaking tool that will benefit the U.S. government and our allies as we continue to combat terror." Led programs with multimillion-dollar budgets and worked directly with senior leaders across defense, government, and institutional settings.
Deep research foundation in complex systems
Research background spans modeling and simulation, agent-based modeling, network theory, human behavior forecasting, sociotechnical systems, cognitive neuroscience, and human-computer interaction. H-index of 7, more than 100 citations, and 8 highly influential citations (Semantic Scholar).
Human systems as core variables in AI safety
Most AI safety frameworks treat human systems as context rather than as a core variable. That framing misses something consequential. Organizational dynamics, institutional incentives, and social structures determine whether safety holds or fails in deployment. Interventions that ignore these dimensions do not simply miss a variable — they create new failure modes.
Sociotechnical and human-centered disciplinary grounding
Approach draws on sociotechnical systems theory, organizational behavior, industrial psychology, human-centered design, and macro social work. These disciplines illuminate how people actually act inside institutions under real pressure — and how to intervene at the level of systems and structural conditions, which is precisely the level at which consequential AI governance must operate.
The Grand Challenge to Harness Technology for Social Good
Currently advancing AI safety research through the Doctor of Social Work program at the University of Southern California, supporting the Grand Challenge to Harness Technology for Social Good. The DSW is a practice-focused doctorate designed for real institutional contexts. The most significant gaps in consequential AI governance are not purely technical — they are organizational, institutional, and deeply human.
Executive leadership that is operational, not theoretical
Strategic and operational executive since 2005. President and Executive Director of a California technology nonprofit through a decade of sustained growth. CEO and C-suite roles across advisory, technology, and media. Quantitative trading in high-dimensional risk modeling — where the cost of being wrong is immediate and measurable. That reasoning structure transfers directly to consequential AI safety.
Selected institutions and mission areas include the Department of Defense, Department of State, U.S. Congress, FEMA, Northrop Grumman, the Defense Manpower Data Center, the Department of the Navy, the Department of Veterans Affairs, the Undersecretary of Defense, the Naval Postgraduate School, and the University of Southern California. Mission areas: defense and security, counterterrorism, counterinsurgency, peacekeeping operations, health systems, decision-support systems, disaster recovery, nonprofit leadership, workforce development, digital inclusion, and consequential AI alignment.
Why 1023AI
The name references Avogadro's number (6.022 × 10²³), the scale at which immense collections of microscopic interactions give rise to emergent macroscopic behavior. That is not a metaphor for AI. It is a description of what actually happens. Scaling does not simply improve performance. It changes what the system is, what it can do, and what it can get wrong. Beyond a certain scale, aggregate behavior changes qualitatively, demanding a different approach.
The European Commission's official Guidelines under the EU AI Act arrive at the same number — establishing 10²³ floating-point operations of training compute as the threshold at which AI models qualify as General Purpose AI, triggering mandatory regulatory oversight. That convergence is not coincidental. It marks the boundary where AI generality becomes real, emergent capabilities and emergent risks become the dominant safety challenge, and governance must cross the same threshold the model does. Safety at that scale requires leadership that understands emergence, not just evaluation. That is what my work is about.
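For a rough sense of where that boundary falls in practice, training compute is often estimated with the widely used heuristic C ≈ 6 × parameters × training tokens. The sketch below applies that heuristic to a purely hypothetical model; the parameter count and token count are illustrative assumptions, not figures from any real system.

```python
# Back-of-envelope check against the EU AI Act's GPAI compute threshold.
# Uses the common estimate C ≈ 6 * N * D (FLOPs ≈ 6 × params × tokens).

GPAI_THRESHOLD_FLOPS = 1e23  # 10^23 FLOPs of training compute

def training_flops(params: float, tokens: float) -> float:
    """Approximate total training compute in FLOPs via C ≈ 6 * N * D."""
    return 6 * params * tokens

# Hypothetical example: a 10B-parameter model trained on 2 trillion tokens.
compute = training_flops(10e9, 2e12)          # ≈ 1.2e23 FLOPs
crosses = compute >= GPAI_THRESHOLD_FLOPS     # True: past the GPAI boundary
```

Even a mid-sized model at today's typical token counts can land on the far side of the threshold, which is why the boundary matters operationally and not just on paper.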
If you are deploying consequential AI, safety cannot stay downstream
I am open to conversations with organizations deploying consequential AI that are exploring fractional or in-house executive leadership in AI safety and alignment. The work is never generic — every engagement is shaped by the specific organization, its specific challenges, and the specific sociotechnical system it is operating within.
If your organization is navigating emergent capabilities or emergent risks, the gap between evaluated safety and real-world resilience, or the human and institutional conditions that determine whether safety holds at scale — reach out.
Safety at scale. That is what I do.