Stephen Lieberman
Through 1023AI

AI Safety and Alignment Leadership  ·  Fractional & Remote Executive

When capable AI scales, safety becomes a leadership problem

As AI systems grow more capable, they exhibit emergent behavior that no evaluation pipeline was designed to catch. Risks arise from scale, context, and interaction — not from design alone. Static controls and compliance checklists were built for software that behaves predictably. Consequential AI does not.

I work with organizations to close the robustness gap: the distance between nominal safety and real-world resilience. My work produces technical architectures and organizational designs that function together — because one without the other is not AI safety.

Available for fractional and remote executive roles in AI safety and alignment leadership through 1023AI.


Stephen Lieberman

AI Safety and Alignment Leadership through 1023AI

20+ years leading technical and operational teams
Senior leadership on DoD/VA programs in $100B+ environments
Principal Investigator for multimillion-dollar defense programs
8 highly influential citations (Semantic Scholar)
Executive leadership across government, academia, nonprofit, and industry

The Problem

Consequential AI creates a leadership problem that evaluation frameworks cannot solve

As capable systems scale, they produce emergent capabilities that were not designed, emergent risks that were not anticipated, and interaction effects with human systems that no evaluation framework fully captures. The gap between what the model was tested on and what it actually does — in messy organizational contexts, across human-AI teams, under real-world deployment pressure — is where the most consequential safety failures live.

This is why safety at scale is not just a research problem. It is a leadership problem.

Static controls, compliance checklists, and point-in-time evaluations are designed for systems with stable behavior and enumerable failure modes. They were not designed for systems whose capabilities and risks emerge at scale, shift with deployment context, or interact with human behavior in ways that are invisible during testing.

Capable AI becomes hardest to govern at exactly the point it becomes most consequential.

The robustness gap

A system can appear safe in theory and still fail under real deployment pressure. I use the term robustness gap to describe the distance between nominal safety and real-world resilience. In consequential AI environments, safety claims are exposed to shifting incentives, changing contexts, adversarial pressure, organizational fragmentation, and downstream effects that do not appear in controlled settings.

This is also where AI iatrogenics becomes dangerous. In medicine, iatrogenics refers to harm caused by the treatment itself. A narrow intervention can reduce one visible risk while creating new harms elsewhere — distorting incentives, increasing brittleness, or destabilizing the broader system.

In consequential AI systems, emergent misalignment is the most significant expression of the robustness gap. Alignment that appeared solid at one capability level quietly breaks down as the system becomes more capable — through a process that standard evaluation pipelines were not designed to detect. Closing this gap requires leadership that understands how emergence, nonlinearity, and sociotechnical systems behave in the real world.

The Gap

Nominal safety: what evaluation environments measure
Deployment reality: shifting incentives, adversarial pressure, context drift
Organizational complexity: fragmentation, competing priorities, downstream effects
Real-world resilience: safety that survives at scale

When to bring me in

AI safety and alignment become a leadership function when:

Emergent capabilities are scaling faster than existing governance structures can track
Safety needs executive ownership — not just downstream review
You need a credible senior integrator across research, policy, product, legal, and operations
Your leadership needs a technically serious voice that also understands institutional dynamics
Human-AI teaming is creating accountability gaps that compliance frameworks were not built for
Deployment risk extends beyond benchmarks to emergent behavior in real organizational contexts
You want a safety leader who understands both emergent AI and real-world operational consequence

Approach

How I approach AI safety and alignment

My work produces two things simultaneously: technical systems and the organizational architectures that make those systems safe under real-world pressure. These are not separate deliverables. They are developed together — because the technical system shapes what the human system can do, and the human system shapes what the technical system needs to be.

I approach consequential AI as a systems problem, a leadership problem, and a human problem.

Complex adaptive systems

Consequential AI does not operate in isolation. It interacts with organizations, incentives, feedback loops, and people in ways that produce behavior no single component was designed to generate. Safety is a property of the whole sociotechnical environment, not just the model.

Emergent capabilities and risks

Both capabilities and risks emerge through scale, interaction, and deployment context — not through design alone. This includes emergent misalignment: alignment that held at one capability level degrading quietly as the system grows more capable. Safety strategy must be designed for a moving target.

Epistemic uncertainty

Leaders deploying consequential AI make real decisions under genuine uncertainty. That uncertainty is not a gap to be closed by better evaluation. At scale, in messy organizational contexts, and across human-AI teams, uncertainty is a structural feature of the domain.

Safety at scale

The real test is whether safety survives growth, speed, strategic pressure, and social consequence. That standard cannot be met by evaluation frameworks alone. It requires leadership that can govern the whole system as it scales.

Emergence foresight

Emergence foresight is the capacity to reason about what a system might become, not just what it currently is. Governing for the capability horizon — not just the current deployment state — is what distinguishes genuine AI safety leadership from point-in-time compliance.

Emergent AI safety

Safety itself can be treated as an emergent property of the broader sociotechnical system, not a fixed specification applied to the model. It must be cultivated across technical architecture, organizational design, human-AI teaming, and institutional governance simultaneously.

About

Stephen Lieberman

I am an AI safety and alignment leader whose work sits at the intersection of consequential AI, emergent complex systems, institutional governance, and human consequence. Through 1023AI, I work with organizations as capable AI systems move from controlled research environments into the messy, high-stakes realities of real-world deployment.

I focus on emergent capabilities and emergent risks, emergent misalignment, the robustness gap between evaluated safety and deployed safety, and the institutional and human conditions under which consequential AI safety holds or disappears.

My core view is that capable AI cannot be governed as if it were ordinary software. As systems scale, they become emergent complex systems shaped not only by model architecture but by interaction effects, organizational structure, human-AI teaming dynamics, and downstream social consequence.

My background is not conventional. That is precisely the point.

Organizations deploying consequential AI need more than a policy specialist, more than an ethicist, and more than a narrow technical reviewer. They need leadership that can move between model behavior, executive judgment, institutional design, and real-world consequence.

Mission-critical technical and operational leadership

More than 20 years leading technical and operational teams across government, defense, academia, nonprofit, and industry. Senior leadership on Department of Defense and Veterans Affairs programs within funding environments exceeding $100 billion, spanning enterprise architecture, decision-support systems, security and compliance, electronic health records, cloud systems, and data strategy.

Defense, security, and international systems

At the Naval Postgraduate School, served as a DoD civilian program leader and Principal Investigator for programs in defense technology, modeling and simulation, collaboration platforms, and decision-support systems. Work included counterterrorism, counterinsurgency, peacekeeping operations, and international collaboration across more than 100 countries.

Recognized leadership in high-consequence environments

Undersecretary of Defense Michael G. Vickers recognized my technical leadership with an official letter of commendation for creating "a ground-breaking tool that will benefit the U.S. government and our allies as we continue to combat terror." I led programs with multimillion-dollar budgets and worked directly with senior leaders across defense, government, and institutional settings.

Deep research foundation in complex systems

Research background spans modeling and simulation, agent-based modeling, network theory, human behavior forecasting, sociotechnical systems, cognitive neuroscience, and human-computer interaction. H-index of 7, more than 100 citations, and 8 highly influential citations (Semantic Scholar).

Human systems as core variables in AI safety

Most AI safety frameworks treat human systems as context rather than as a core variable. That framing misses something consequential. Organizational dynamics, institutional incentives, and social structures determine whether safety holds or fails in deployment. Interventions that ignore these dimensions do not simply miss a variable — they create new failure modes.

Sociotechnical and human-centered disciplinary grounding

Approach draws on sociotechnical systems theory, organizational behavior, industrial psychology, human-centered design, and macro social work. These disciplines illuminate how people actually act inside institutions under real pressure — and how to intervene at the level of systems and structural conditions, which is precisely the level at which consequential AI governance must operate.

The Grand Challenge to Harness Technology for Social Good

Currently advancing AI safety research through the Doctor of Social Work program at the University of Southern California, supporting the Grand Challenge to Harness Technology for Social Good. The DSW is a practice-focused doctorate designed for real institutional contexts. The most significant gaps in consequential AI governance are not purely technical — they are organizational, institutional, and deeply human.

Executive leadership that is operational, not theoretical

Strategic and operational executive since 2005. President and Executive Director of a California technology nonprofit through a decade of sustained growth. CEO and C-suite roles across advisory, technology, and media. Quantitative trading in high-dimensional risk modeling — where the cost of being wrong is immediate and measurable. That reasoning structure transfers directly to consequential AI safety.

Selected institutions include the Department of Defense, Department of State, U.S. Congress, FEMA, Northrop Grumman, the Defense Manpower Data Center, the Department of the Navy, the Department of Veterans Affairs, the Undersecretary of Defense, the Naval Postgraduate School, and the University of Southern California. Mission areas: defense and security, counterterrorism, counterinsurgency, peacekeeping operations, health systems, decision-support systems, disaster recovery, nonprofit leadership, workforce development, digital inclusion, and consequential AI alignment.

Why 1023AI

The name references Avogadro's number (6.022 × 10²³), the scale at which immense collections of microscopic interactions give rise to emergent macroscopic behavior. That is not a metaphor for AI. It is a description of what actually happens. Scaling does not simply improve performance. It changes what the system is, what it can do, and what it can get wrong. Beyond a certain scale, aggregate behavior changes qualitatively, demanding a different approach.

The European Commission's official Guidelines under the EU AI Act arrive at the same order of magnitude, setting 10²³ floating-point operations of training compute as the indicative threshold above which a model is presumed to be a general-purpose AI model subject to the Act's obligations. That convergence is not coincidental. It marks the boundary where AI generality becomes real, where emergent capabilities and emergent risks become the dominant safety challenge, and where governance must cross the same threshold the model does. Safety at that scale requires leadership that understands emergence, not just evaluation. That is what my work is about.
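For readers who want the two quantities side by side, here is a minimal back-of-the-envelope comparison. The symbols N_A and C_train are labels introduced here for illustration only; the physical constant and the regulatory threshold measure very different things, and what they share is an order of magnitude rather than an identity.

```latex
% Illustrative order-of-magnitude comparison behind the 10^23 framing.
% N_A: Avogadro's number (particles per mole).
% C_train: training compute above which a model is presumed to be a
%          general-purpose AI model under the European Commission's
%          guidelines for the EU AI Act.
\[
  N_A \approx 6.022 \times 10^{23}\,\mathrm{mol}^{-1},
  \qquad
  C_{\mathrm{train}} > 10^{23}\ \mathrm{FLOP}
\]
\[
  \log_{10} N_A \approx 23.8,
  \qquad
  \log_{10}\!\bigl(10^{23}\bigr) = 23
  \quad\text{(same order of magnitude, not the same number)}
\]
```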

If you are deploying consequential AI, safety cannot stay downstream

I am open to conversations with organizations deploying consequential AI that are exploring fractional or in-house executive leadership in AI safety and alignment. The work is never generic — every engagement is shaped by the specific organization, its specific challenges, and the specific sociotechnical system it is operating within.

If your organization is navigating emergent capabilities or emergent risks, the gap between evaluated safety and real-world resilience, or the human and institutional conditions that determine whether safety holds at scale — reach out.

Safety at scale. That is what I do.

Start a Confidential Conversation

Your message goes directly to my private inbox. I treat every conversation as confidential.