Methodology

The new math behind Simulacra's Generative Causal AI.

Research asks questions. Simulacra finds answers. Every study contains a sample of its response structure: the way choices, prices, claims, segments, attitudes, baskets, transactions, and outcomes move together; how every measured variable shifts when the others move. Most synthetic-data tools approximate this structure as a product of marginals, or as text completions over field names, losing the conditional dynamics that make the original sample useful. Simulacra's Generative Causal AI learns the study's response structure directly, then lets you run new scenarios against the world your data actually measured.

Causality means fidelity

A row is a linked system, not a pile of marginals.

A generator that samples each variable independently from its own marginal can hit every univariate target while returning an impossible record, one that no respondent or process could have produced. Older generators often yield such incoherent rows because they ignore the causal structure and dependencies within the study.

Simulacra fits those dependencies directly. It learns the response structure of the data: which variables can move freely, which variables have to move together, and which combinations cannot be defended by the measured process. That's why Simulacra can generate coherent rows, condition on a segment, run an intervention, and refuse a query the fitted model cannot answer. The causal behavior starts with row-level fidelity.

Generator that samples marginals

Each variable drawn from its own marginal. Resulting row:

Age: 6
Income: $212k
Children: 3
Home owner: Yes

Every column is inside its real range, but the respondent cannot exist.

Simulacra: linked system

Every value predicted in the context of every other:

Age: 34
Income: $84k
Children: 2
Home owner: Yes

Rows are a linked system; dependencies are preserved.
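
The contrast between the two rows above can be reproduced with a few lines of Python. This is a minimal sketch on a hypothetical four-row toy study, not Simulacra's generator: the data, the `is_coherent` rule, and every function name are assumptions made for illustration. It enumerates every row that independent marginal sampling is able to emit and counts how many of them no respondent could have produced.

```python
import itertools
import random

# Hypothetical toy study. The second row is the coherent row shown above;
# the other three rows are invented for illustration.
OBSERVED = [
    {"age": 6, "income_k": 0, "children": 0, "home_owner": False},
    {"age": 34, "income_k": 84, "children": 2, "home_owner": True},
    {"age": 51, "income_k": 212, "children": 3, "home_owner": True},
    {"age": 23, "income_k": 41, "children": 0, "home_owner": False},
]
COLUMNS = list(OBSERVED[0])


def is_coherent(row):
    """Assumed plausibility rule: a minor has no income, children, or home."""
    if row["age"] >= 18:
        return True
    return row["income_k"] == 0 and row["children"] == 0 and not row["home_owner"]


def marginal_values(rows, column):
    """Distinct values observed for one column, ignoring every other column."""
    return sorted(set(row[column] for row in rows))


def sample_independent(rows, rng):
    """Draw each column from its own marginal, independently of the rest."""
    return {col: rng.choice([row[col] for row in rows]) for col in COLUMNS}


def product_support(rows):
    """Every row that independent marginal sampling is able to emit."""
    value_lists = [marginal_values(rows, col) for col in COLUMNS]
    return [dict(zip(COLUMNS, combo)) for combo in itertools.product(*value_lists)]


support = product_support(OBSERVED)
impossible = [row for row in support if not is_coherent(row)]

print(all(is_coherent(row) for row in OBSERVED))  # observed rows all pass the rule
print(len(support), len(impossible))              # reachable rows vs. impossible ones
print(sample_independent(OBSERVED, random.Random(0)))
```

Every value in every reachable row comes from a real marginal, yet the six-year-old home owner with three children is among the rows the independent sampler can return. A generator that models the columns jointly has no reason to place mass on that combination.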

The response surface

Simulacra learns the shape of the measured world.

The AI is tabular-native and diffusion-inspired, built for mixed data rather than text personas. It learns the joint response structure of the study: distributions, higher-order dependencies, segment dynamics, and the constraints that make one completed row plausible and another incoherent. A condition does not filter to nearby records; it reweights the whole model, with confidence that changes as the query moves toward thinner evidence.

Query the surface

Move the controls. Simulacra treats the values you set as a condition, then regenerates the remaining variables from the fitted response structure. In dense regions the answer is high-confidence. Near the edge, the engine labels the uncertainty. When the request goes beyond what the fitted model can defend, it refuses instead of inventing a row.

[Interactive figure: query controls set to 34 years, $78k, and 1 child. The plot distinguishes observed records, the evidence around the query, and the conditioned query, and shows the model response alongside.]

This visual is intentionally simple: real projects have dozens to hundreds of variables. Simulacra is not looking up nearby rows. It learns a joint response structure and reweights that structure under a condition. Confidence is highest where the fitted structure has strong evidence and lower near the edge, and Simulacra says so explicitly when the data cannot answer.
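
To make "reweights that structure under a condition" concrete, here is a minimal sketch that uses a two-component latent-class mixture as a stand-in for a fitted joint model. It is not Simulacra's architecture, which the text describes only as tabular-native and diffusion-inspired; the `Component` class, the component probabilities, and the function names are all hypothetical. Conditioning applies Bayes' rule to every component weight, and the remaining variable is then predicted from the reweighted model rather than from a filtered subset of records.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    """One latent class: a prior weight plus per-variable value probabilities."""
    weight: float
    probs: Dict[str, Dict[str, float]]


def condition(components: List[Component], evidence: Dict[str, str]) -> List[float]:
    """Reweight every component by how well it explains the evidence."""
    scores = []
    for comp in components:
        likelihood = 1.0
        for var, val in evidence.items():
            likelihood *= comp.probs[var].get(val, 0.0)
        scores.append(comp.weight * likelihood)
    total = sum(scores)
    if total == 0.0:
        # The toy model has no support for this condition, so it declines to answer.
        raise ValueError("condition has zero probability under the toy model")
    return [score / total for score in scores]


def predictive(components: List[Component], weights: List[float], var: str) -> Dict[str, float]:
    """Distribution of one variable under the given component weights."""
    out: Dict[str, float] = {}
    for comp, w in zip(components, weights):
        for val, p in comp.probs[var].items():
            out[val] = out.get(val, 0.0) + w * p
    return out


# Hypothetical fitted model with two latent classes.
MODEL = [
    Component(0.5, {"income": {"high": 0.2, "low": 0.8}, "home_owner": {"yes": 0.1, "no": 0.9}}),
    Component(0.5, {"income": {"high": 0.8, "low": 0.2}, "home_owner": {"yes": 0.7, "no": 0.3}}),
]

prior = [comp.weight for comp in MODEL]
posterior = condition(MODEL, {"income": "high"})

print(predictive(MODEL, prior, "home_owner"))      # before conditioning
print(predictive(MODEL, posterior, "home_owner"))  # after conditioning on income
```

In this toy, setting income to high shifts the weight toward the second class, so the predicted share of home owners rises even though no record was looked up or filtered.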

Inference safe by design

The schema defines what Simulacra can be asked. The fitted response structure defines what kind of answer it can generate.

Schema gate: the cleaned schema (price, segment, outcome) lists the measured fields. Only measured fields can be conditioned or generated; an unfielded new variable is blocked.

Return type:
  High evidence: 200 / 200 rows returned (full generation)
  Thin evidence: 87 / 200 feasible rows (viable ratio)
  No prior: 0 / 200 rows returned (refusal reason)

Figure: Schema gate and return behavior, shown separately.

Simulacra does not pretend every condition is equally answerable. It returns the strongest answer the fitted model can defend.

  1. Measured variables only. The cleaned schema is the contract. If the study did not field the question, Simulacra will not invent it later.
  2. Generalization with diagnostics. Inside the measured world, Simulacra can move along the learned response surface. As evidence thins, Simulacra marks the answer and never presents thin evidence as full support.
  3. Partial generation comes before refusal. When the fitted structure does not have enough evidence for the full request, Simulacra may return fewer rows than requested, mark the result with a partial-feasibility warning, and provide the closest viable condition or ratio when tractable.
  4. No prior means no fabricated answer. When the fitted model has no defensible prior probability for the condition, the Studio explains the boundary and the API returns a structured refusal reason.
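
The four rules above can be sketched as a small decision function. This is an illustration only, not the Simulacra API: `GenerationResult`, `respond`, the field names, and the message strings are all hypothetical, and the feasible row count is passed in as an input because the text does not describe how the engine estimates it. The schema fields and the 200 / 87 / 0 row counts mirror the figure.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Measured fields from the figure; anything else was never fielded.
SCHEMA = {"price", "segment", "outcome"}


@dataclass(frozen=True)
class GenerationResult:
    status: str  # "full", "partial", or "refused"
    rows_requested: int
    rows_returned: int
    warning: Optional[str] = None
    refusal_reason: Optional[str] = None


def respond(condition: Dict[str, object], requested: int, feasible: int) -> GenerationResult:
    """Pick the strongest return type the evidence supports."""
    # Rule 1: measured variables only.
    unfielded = sorted(set(condition) - SCHEMA)
    if unfielded:
        reason = "unfielded variable(s): " + ", ".join(unfielded)
        return GenerationResult("refused", requested, 0, refusal_reason=reason)
    # Rule 4: no prior means no fabricated answer.
    if feasible <= 0:
        reason = "no defensible prior for this condition"
        return GenerationResult("refused", requested, 0, refusal_reason=reason)
    # Rule 3: partial generation comes before refusal.
    if feasible < requested:
        warning = f"partial feasibility: {feasible} of {requested} rows"
        return GenerationResult("partial", requested, feasible, warning=warning)
    # Rule 2, dense region: full generation.
    return GenerationResult("full", requested, requested)


print(respond({"segment": "A"}, requested=200, feasible=200))
print(respond({"segment": "A"}, requested=200, feasible=87))
print(respond({"segment": "A"}, requested=200, feasible=0))
print(respond({"brand_love": 5}, requested=200, feasible=200))
```

Returning a structured result instead of raising an exception keeps the partial and refused cases inspectable by the caller, which matches the structured refusal reason the text describes for the API.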

Simulacra in comparison

How does Simulacra improve on existing methods?

A Bayesian network gives a conditional probability table. A structural causal model gives an estimate of a single causal estimand under a specified graph and adjustment set. Simulacra generates a complete dataset for the conditioned population in which each row remains internally coherent.

The comparison below uses a blinded food-delivery study to show the native output of each method: distribution lookup, effect estimate, and generated scenario population.

Scenario

Set income to its high tier (₹25,000 or more). What does each method tell you about behavior under that scenario?

[Interactive comparison with four tabs: Bayesian network, structural causal model, Simulacra, and a side-by-side Compare view. The structural causal model tab shows a directed graph among the five variables, with do(income) severing the incoming arrows of the intervention target.]

Validation philosophy

Every claim ships with a paper.

Twin-2K-500 tests survey synthesis on a public benchmark. Pricing & Promo tests an applied sales holdout. Data Reduction tests sample-size economics. Each page includes methodology, holdout design, and documented gaps. We'll run the same validation on your data.
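
A holdout design of the kind mentioned above could be sketched as follows. This is a generic illustration under stated assumptions, not the procedure or the scorecard used in the validation papers: the function names are hypothetical, total variation distance is a metric chosen here for simplicity, and `synthetic` is a hand-written stand-in for the output of a generator fitted on the training rows.

```python
import random
from collections import Counter


def holdout_split(rows, holdout_fraction, seed):
    """Shuffle a copy of the rows and split it into (train, holdout)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def distribution(rows, column):
    """Empirical share of each value in one column."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in set(p) | set(q))


# Hypothetical study with one categorical field.
study = [{"segment": "A"}] * 6 + [{"segment": "B"}] * 4
train, holdout = holdout_split(study, holdout_fraction=0.2, seed=7)

# Stand-in for rows generated by a model fitted on `train` only.
synthetic = [{"segment": "A"}] * 3 + [{"segment": "B"}] * 2

score = total_variation(distribution(holdout, "segment"), distribution(synthetic, "segment"))
print(len(train), len(holdout), score)
```

The point of the split is that the generator never sees the holdout rows, so a low distance on them is evidence of generalization rather than memorization. A full scorecard would also compare joint structure, not only single-column shares.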

Run the methodology on your data

See our math on your data.

Bring a study you already fielded. We hold out a portion, fit Simulacra on the remainder, run the same methodology you just read, and send back the scorecard. Standard NDA, no contract required.