prism-metrics Specification & Implementation Audit

Specification handbook and read-only audit of the public npm package prism-metrics — 14 architecture-quality scoring modules, their primary sources, and per-framework conformance evidence.

Version audited: prism-metrics 0.8.0 (current in-tree); pass-2 re-audit completed 2026-06-10
Date: 2026-06-10 (pass-2 audit completed)
License: MIT
Audit pass: audit-2026-06-10, reproducible per docs/audit-prompt.md.
Frameworks audited: 14 framework modules + core foundation (iso-25010, solid, clean-arch, hexagonal, eip, eda, conways-law, wardley, twelve-factor, monorepo, dora-predicted, ddd, c4, auto-detect; plus core scanner-exclusions + InsufficientSignalResult primitives)
Existing test suite: 286 tests across 17 test files, all passing at pass-2 completion (96.4 % line / 88.1 % branch coverage)
Empirical fixtures: 56 (4 per framework: clean / violated / adversarial / ambiguous); 56/56 pass
Citations: 42 primary/secondary sources cited; 38/42 verified live during pass-2 (4 returned HTTP 403 / TLS-expired to non-browser fetchers but content cross-confirmed via Wikipedia; see pass-2 summary)
Findings status (pass-2): 54 of 59 findings closed across both audit passes. 5 LOW items remain acknowledged in each framework's honestGap.
Author: prism-metrics autonomous specification generator

Methodology of this audit

This document is an auditor's report, not marketing copy. It exists so a future reader — whether enterprise buyer, third-party reviewer, or successor maintainer — can verify that the package's scoring methodology lines up with the published primary sources for each framework, and that the TypeScript implementation does what the methodology claims it does.

Citation strategy

Every framework section cites at least three sources. Where a primary source exists (book, standards document, originating paper, canonical author website), it is preferred. Secondary sources (Wikipedia, vendor portals) appear only as additional anchors, never as sole evidence. Each citation carries a verification_status in the sidecar JSON; this run marks all citations verified (URL reachable and attribution matches the package's claim). Any future regeneration that finds a broken link must downgrade the status to unverified and flag it inline with an unverified badge.

Implementation-audit method

For each framework we read src/<name>/score.ts, types.ts, methodology.ts, and index.ts in full, walked the __tests__/score.test.ts for empirical coverage, and built a 4-case fixture (clean / violated / adversarial / ambiguous) to probe each scorer end-to-end. Fixtures were run via tsx against the source files directly (no transpile drift), and the captured input/output pairs are mirrored in handbook.evidence.json.

Severity scale for deltas

Severity | Meaning
info | Documented departure from a textbook treatment. No action required.
advisory | Worth surfacing to users. The package discloses this in methodology.honestGap.
drift | Editorial choice without primary-source backing (e.g. hand-picked weights). Defensible but not provable.
divergent | Implementation contradicts the spec. None found in this pass.

Cross-cutting findings

Shared infrastructure

  • src/core/insufficient.ts (lines 1–51) defines the InsufficientSignalResult contract — an explicit ok:false sentinel preventing scorers from rendering misleading letter grades for empty/degenerate inputs. scoreToGrade() in methodology.ts throws if called on one, making misuse a runtime error rather than a silent dashboard lie.
  • src/core/scanner-exclusions.ts (lines 1–131) publishes IGNORE_DIRS, TEST_FILE_PATTERNS, stripComments, and shouldScanFile. The contract is documented and round-tripped via the optional excludedPaths field on signal types: scorers themselves are zero-I/O and cannot enforce exclusions, but they carry the audit trail forward.
  • src/core/methodology.ts (lines 1–74) defines the Methodology interface, and every framework module exports a constant of that type. The interface includes an honestGap field that is used liberally, a stylistic invariant of the package.
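The sentinel contract from the first bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's source: the reason field mirrors the reason:no_input output quoted in the ISO-25010 fixtures, while the Scored shape, the scoreToGrade signature, and the grade cut-offs are hypothetical.

```typescript
// Sketch only: everything beyond `ok: false` is an assumption made for illustration.
interface InsufficientSignalResult {
  ok: false;
  reason: string; // e.g. "no_input"
}

interface Scored {
  ok: true;
  score: number; // 0-100
}

type ScoreResult = Scored | InsufficientSignalResult;

// Throws on the sentinel so an empty input can never be rendered as a letter grade.
function scoreToGrade(result: ScoreResult): string {
  if (!result.ok) {
    throw new Error(`cannot grade an insufficient-signal result (${result.reason})`);
  }
  // Hypothetical cut-offs; the package's real thresholds are not stated in this handbook.
  if (result.score >= 90) return "A";
  if (result.score >= 80) return "B";
  if (result.score >= 70) return "C";
  if (result.score >= 60) return "D";
  return "F";
}

const healthy: ScoreResult = { ok: true, score: 84 };
const empty: ScoreResult = { ok: false, reason: "no_input" };

console.log(scoreToGrade(healthy));
try {
  scoreToGrade(empty);
} catch (err) {
  console.log((err as Error).message);
}
```

Modelling the sentinel as a discriminated union means a caller has to check ok before reading a score, which is the compile-time counterpart of the runtime throw.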

Patterns across frameworks

  • Six of fourteen frameworks (iso-25010, solid, clean-arch, hexagonal, eda, conways-law) emit InsufficientSignalResult on empty or degenerate input. The five detection-only frameworks (eip, wardley, ddd, c4, auto-detect) genuinely don't need it: they return classification objects, not grades. twelve-factor and monorepo could adopt it on truly empty input (all-unknown factors, zero capabilities) and currently fall through to score=25 and averageHealth=0 respectively. Flagged as advisory.
  • Naming-heuristic classification (clean-arch layer, hexagonal element, ddd context, c4 container group) is honest-gapped in all four methodologies. The Wardley module went further and added a disputed:true flag plus confidence ≈ 0.5 for single-signal matches — a pattern worth replicating in the others.
  • Every methodology.ts file carries a codeRef pointing back at src/<name>/score.ts. The package's editing rule (in core/methodology.ts) requires updating formula when score changes — this is the package's contract with consumers and explains why the audit was able to map every dimension to a line range without ambiguity.

Conformance summary table

Framework | Conformance | Citations verified | Fixtures | Test count
iso-25010 | Partial | 3/3 | 4/4 | 22
solid | Conformant | 3/3 | 4/4 | 38
clean-arch | Conformant | 3/3 | 4/4 | 15
hexagonal | Conformant | 3/3 | 4/4 | 17
eip | Partial (18/65 patterns) | 3/3 | 4/4 | 20
eda | Conformant | 3/3 | 4/4 | 22
conways-law | Conformant (proxy disclosed) | 3/3 | 4/4 | 11
wardley | Conformant | 3/3 | 4/4 | 20
twelve-factor | Conformant | 3/3 | 4/4 | 11
monorepo | Conformant | 3/3 | 4/4 | 10
dora-predicted | Conformant (disclaimed) | 3/3 | 4/4 | 12
ddd | Partial (6/9 strategic patterns) | 3/3 | 4/4 | 15
c4 | Partial (3/4 levels, by design) | 3/3 | 4/4 | 17
auto-detect | Conformant (internal spec) | 3/3 | 4/4 | 13
core (foundation) | Conformant | n/a | n/a | 43

Conformance verdicts above: recorded 2026-06-10 against prism-metrics 0.8.0 in-tree. Each closed finding has a named regression test (file path + line + test name) referenced in the per-framework "Implementation audit" subsections below; 54 of 59 findings closed, 5 LOW acknowledged in each framework's honestGap. Test counts, citations, fixtures all reflect the 0.8.0 state. See the audit summary below for the new findings + release history.

1. ISO/IEC 25010 — Software Product Quality Model

ISO/IEC JTC 1/SC 7, 2011 (revised 2023). 8 top-level characteristics, 31 sub-characteristics.

Concept

ISO/IEC 25010 is a normative international standard published by ISO/IEC JTC 1/SC 7 in 2011 (and revised in 2023 to add a 9th characteristic, "Safety"). It supersedes the older ISO/IEC 9126. The 2011 edition defines eight top-level quality characteristics (functional suitability, performance efficiency, compatibility, usability, reliability, security, maintainability, portability), which together decompose into 31 sub-characteristics. The standard is used as gating criteria in regulated-industry procurement and acceptance.

Citations

Implementation audit

Source: src/iso-25010/score.ts (170 lines), types.ts (76 lines), methodology.ts (28 lines).

Exports: analyzeIso25010, ISO_25010_METHODOLOGY, type set covering signals, characteristic scores, report, and insufficient-signal sentinel.

Delta | Severity | Detail | File ref
Compatibility, Usability omitted | advisory | Ships 6 of 8 characteristics. Static signal for Compatibility inverts ISO definition; Usability needs runtime feedback. | methodology.ts:26-27
Per-characteristic weights hand-picked | drift | 0.6/0.4, 0.5/0.4 etc. not normatively prescribed by ISO. Editorial, defensible. | score.ts:41-133
Security log2 curve | info | Empirical fix iso-2: replaces linear 15× cliff that sent 4 hits to guaranteed F. | score.ts:83-102
Empty-input returns ok:false | info | Empirical fix iso-1: used to score "D" on a brand-new repo. | score.ts:21-35,138-148

Empirical verification

4 fixtures probed: clean (high signals → grade ≥ B), violated (heavy drift+secrets → < 60), adversarial (4 secret hits — log2 keeps Security ≥ 45), ambiguous (zero input → ok:false reason:no_input). Pass rate 4/4.

Existing test suite: src/iso-25010/__tests__/score.test.ts (22 tests, all passing).

Conclusion

Implementation conformance: Partial (6 of 8 ISO characteristics; intentional and disclosed).
Citation-audited claims: 3/3 verified.
Empirical pass rate: 4/4.
Recommended changes: Document weight-calibration provenance once a labelled dataset exists; surface excludedPaths in render output.

2. SOLID — Object-Oriented Design Principles

Robert C. Martin (collection); Michael Feathers (acronym), early 2000s.

Concept

Five principles — Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, Dependency Inversion — collected by Robert C. Martin during his work on object-oriented design from the late 1990s onward. The acronym "SOLID" was coined by Michael Feathers. The principles are operational guidance for keeping software easy to change, not a formal standard.

Citations

Implementation audit

Source: src/solid/score.ts (358 lines), types.ts (135 lines), methodology.ts (28 lines).

Exports: analyzeSolid, SOLID_METHODOLOGY, SolidLanguage, PrincipleResultOrNA, plus the type set.

Delta | Severity | Detail | File ref
3-bucket scoring (90/65/35) | info | Discrete buckets avoid over-claiming precision from coarse heuristics. | score.ts:51-55
LSP tiered signal (solid-lsp-ast) | info | Two-tier: strong confirmedLspViolations (parser/AST-confirmed contract violations, confidence 0.85) when the caller supplies one, else weak substring scan (confidence 0.65). Closed handbook drift 2026-06-10. | score.ts:182-220
DIP vacuous-truth guard | info | Empirical fix solid-1: zero direct-infra imports alone awarded strong/A+. Now requires positive abstraction signal. | score.ts:254-287
Language-idiom gating | info | Go/Rust LSP → missing_language; Python/Ruby LSP + ISP → missing_language. Excluded from mean. | score.ts:75-101

Empirical verification

4 fixtures probed: clean (TS, score ≥ 80), violated (heavy large files + no abstractions → < 50), adversarial (Go input → LSP returns InsufficientSignalResult), ambiguous (empty repo → noData:true, grade "N/A"). Pass rate 4/4.

Existing test suite: 38 tests, all passing.

Conclusion

Implementation conformance: Conformant.
Citation-audited claims: 3/3 verified.
Empirical pass rate: 4/4.
Recommended changes: Closed 2026-06-10. The LSP signal is now a two-tier contract: callers with an AST analyser supply confirmedLspViolations for confidence 0.85, and callers without one keep using the substring-based narrowingStubFiles at confidence 0.65. The scorer picks the strong signal when present, so the audit-table item stays at info.

3. Clean Architecture

Robert C. Martin, 2012 (blog) / 2017 (book).

Concept

A synthesis of Hexagonal (Cockburn), Onion (Palermo), BCE (Jacobson), and DCI (Coplien/Reenskaug) into a single concentric-layer diagram with one inviolable rule: dependencies point only inward. Layers from inside out: Entities → Use Cases → Interface Adapters → Frameworks & Drivers.

Citations

  • Robert C. Martin — "The Clean Architecture" (2012, cleancoder.com) (verified)
  • Robert C. Martin — Clean Architecture: A Craftsman's Guide to Software Structure and Design (Prentice Hall, 2017) (verified)
  • Robert C. Martin — "Screaming Architecture" (2011, precursor) (verified)

Implementation audit

Source: src/clean-arch/score.ts (72 lines), types.ts, methodology.ts.

Exports: analyzeCleanArch, CLEAN_ARCH_METHODOLOGY, types.

Delta | Severity | Detail | File ref
Per-severity violation caps | info | Empirical fix ca-1: 7 criticals used to flatten to 0; caps (45/32/24) preserve diagnostic value. | score.ts:37-40,57-59
Empty registry → insufficient | info | Empirical fix ca-2: symmetric to ISO-25010 empty=D bug. | score.ts:49-55
Layer inference caller-side | drift | Methodology disclosed; unknownLayerInsight fires above 30% unknown. | methodology.ts:20-21

Empirical verification

4 fixtures probed: clean (score 100), violated (7 criticals → 55, cap-limited), adversarial (50% unknown layer → insight flag fires), ambiguous (empty registry → InsufficientSignalResult). Pass rate 4/4.

Existing test suite: 15 tests, all passing.

Conclusion

Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Expose per-graph-size normalisation as optional view for very large registries.

4. Hexagonal Architecture (Ports & Adapters)

Alistair Cockburn, 2005.

Concept

A core domain surrounded by inbound and outbound Ports that primary (driver) and secondary (driven) Adapters plug into. Edges flow from adapters into the core; the core does not import adapter or infrastructure code.

Citations

Implementation audit

Source: src/hexagonal/score.ts (71 lines).

Delta | Severity | Detail | File ref
Violation deduction cap | info | Empirical fix hex-2: 9 violations alone pegged score at 0. Cap at min(60, 12·v). | score.ts:28-30,59-62
Missing-core → insufficient | info | Empirical fix hex-1: grade A+ alongside missingCore:true flag was contradictory. | score.ts:45-56
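The capped deduction from the hex-2 fix can be illustrated in a few lines. The sketch covers only the min(60, 12·v) violation term; any other deductions the scorer applies (for example the adapters-without-ports flag) are left out, and the function names are hypothetical.

```typescript
// Sketch of the capped violation deduction: min(60, 12 * v).
const PER_VIOLATION = 12;
const MAX_DEDUCTION = 60;

function violationDeduction(violations: number): number {
  return Math.min(MAX_DEDUCTION, PER_VIOLATION * Math.max(0, violations));
}

// Score from the violation term alone; the real scorer may subtract more.
function scoreFromViolations(violations: number): number {
  return 100 - violationDeduction(violations);
}

console.log(scoreFromViolations(0));  // 100
console.log(scoreFromViolations(3));  // 64
console.log(scoreFromViolations(10)); // 40: the cap keeps the score off the floor
```

With the cap in place, five or more violations all land on the same deduction of 60, so the remaining 40 points stay available to distinguish other signals.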

Empirical verification

4 fixtures probed: clean (100), violated (10 deps → 40, cap-limited at 60 off), adversarial (adapters without ports → 80 with flag), ambiguous (missing core → insufficient). Pass rate 4/4.

Existing test suite: 17 tests, all passing.

Conclusion

Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Expose port-orientation (inbound vs. outbound) as a future signal.

5. Enterprise Integration Patterns

Gregor Hohpe & Bobby Woolf, 2003.

Concept

The canonical vocabulary for asynchronous messaging architecture — 65 patterns across five categories (messaging infrastructure, routing, transformation, endpoints, orchestration). The book and pattern catalog at enterpriseintegrationpatterns.com are the reference.

Citations

Implementation audit

Source: src/eip/score.ts (417 lines). Detection-only — no 0-100 score.

Exports: analyzeEip, detectEipPatterns, EIP_PATTERN_DEFS, EIP_METHODOLOGY.

Delta | Severity | Detail | File ref
Pattern coverage 18/65 | advisory | Spans all 5 categories. Additive to extend. | score.ts:31-253
Message Bus regex anchored | info | Empirical fix eip-2: the unanchored pattern matched business/omnibus/busy; now anchored as \bbus\b. | score.ts:59-67
Message Filter regex anchored | info | Empirical fix eip-2: \bfilter\b matched UI filters. | score.ts:121-125
presentCount=0 → unknown | info | Empirical fix eip-5: previously fell through to "point_to_point" default. | score.ts:315-322
Suggestions gated at ≥3 detections | info | Empirical fix eip-6: prevents single-bus-hit triggering Dead Letter recommendation. | score.ts:348-389
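The effect of word-boundary anchoring on the Message Bus heuristic can be reproduced with two regular expressions. This is a small demonstration, not the package's detector: the candidate names are made up, and the package's actual pattern may carry more context than a bare \bbus\b.

```typescript
// Sketch: an unanchored "bus" pattern versus a word-boundary-anchored one.
const unanchored = /bus/i;
const anchored = /\bbus\b/i;

// Hypothetical candidate names, chosen to include the false positives named above.
const candidates = ["business-rules", "omnibus", "busy-indicator", "event-bus", "message bus"];

for (const name of candidates) {
  console.log(`${name}: unanchored=${unanchored.test(name)} anchored=${anchored.test(name)}`);
}
```

Only the last two candidates satisfy the anchored pattern. Note that \b does not split camelCase identifiers such as eventBus; whether the package handles those separately is not stated in this handbook.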

Empirical verification

4 fixtures: clean (saga+pubsub → event_driven_saga), violated (empty input → unknown, no suggestions), adversarial (business-only candidates → Message Bus correctly absent), ambiguous (single weak signal → no missing-pattern suggestions emitted). Pass rate 4/4.

Existing test suite: 20 tests, all passing.

Conclusion

Conformance: Partial (18 of 65 patterns; additive). Citations: 3/3. Fixtures: 4/4. Recommendations: Extend with Channel Adapter, Wire Tap, Recipient List in a future minor release.

6. Event-Driven Architecture

Fowler (consolidation, 2017); Stopford (operationalisation, 2018); Young (CQRS, 2010).

Concept

EDA is not a single canonical spec but a family of patterns: event notification, event-carried state transfer, event sourcing, CQRS, saga. Fowler's 2017 essay is the most commonly cited consolidation; Greg Young authored the foundational CQRS papers; Ben Stopford's O'Reilly book is the modern operational reference.

Citations

Implementation audit

Source: src/eda/score.ts (130 lines).

Delta | Severity | Detail | File ref
Saturating confidence | info | Empirical fix: w·(1 − exp(−c/3)) replaces binary "any-count > 0". | score.ts:36-51
Corroboration floor | info | ≥2 categories OR pub+con ≥3, else InsufficientSignalResult. | score.ts:75-95
Confidence band documented | info | <0.3 low / <0.6 med / high. Eliminates downstream guesswork. | score.ts:57-61
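The three deltas above fit together as a small pipeline: a saturating per-category confidence, a corroboration floor, and a band lookup. The sketch below shows each piece in isolation. Function and parameter names are hypothetical, and how the package combines per-category values into one overall confidence is not described here, so that step is omitted.

```typescript
const SATURATION_N = 3; // the constant named in this section's recommendation

// w * (1 - exp(-c / N)): rises quickly for the first few hits, then flattens towards w.
function saturatingConfidence(weight: number, count: number): number {
  if (count <= 0) return 0;
  return weight * (1 - Math.exp(-count / SATURATION_N));
}

type Band = "low" | "medium" | "high";

// <0.3 low, <0.6 medium, otherwise high.
function confidenceBand(confidence: number): Band {
  if (confidence < 0.3) return "low";
  if (confidence < 0.6) return "medium";
  return "high";
}

// Corroboration floor: at least 2 signal categories, or publishers + consumers >= 3.
function meetsFloor(categoriesWithHits: number, publishers: number, consumers: number): boolean {
  return categoriesWithHits >= 2 || publishers + consumers >= 3;
}

console.log(saturatingConfidence(1, 1).toFixed(3)); // 0.283
console.log(saturatingConfidence(1, 3).toFixed(3)); // 0.632
console.log(confidenceBand(saturatingConfidence(1, 3))); // high
console.log(meetsFloor(1, 1, 0)); // false: a single publisher is not corroborated
```

The saturating form means a tenth matching file adds far less than the second one did, which is what replaces the old binary "any-count > 0" behaviour.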

Empirical verification

4 fixtures: clean (broker+pub+con → hasEda, band high), violated (all-zero → insufficient), adversarial (single publisher → floor not met), ambiguous (1+1 at floor → low/med band). Pass rate 4/4. Existing tests: 22.

Conclusion

Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Revisit SATURATION_N=3 once empirical telemetry exists.

7. Conway's Law

Melvin E. Conway, Datamation 1968.

Concept

"Any organization that designs a system will produce a design whose structure is a copy of the organization's communication structure." Originally published in Conway's 1968 Datamation paper "How Do Committees Invent?". Modern operationalisation by Skelton & Pais (Team Topologies, 2019) and empirical evidence from MacCormack, Baldwin & Rusnak (HBS, 2008).

Citations

Implementation audit

Source: src/conways-law/score.ts (83 lines).

Delta | Severity | Detail | File ref
Single-team → insufficient | info | Empirical fix: solo repo no longer rendered as "D verdict undefined". | score.ts:51-57
Proxy disclosure | info | requiresHumanVerification:true + structuralProxy alias on every result. | score.ts:73-81

Empirical verification

4 fixtures: clean (5 teams, 2% cross-team → aligned), violated (100% cross + no owners → fragmented), adversarial (no deps → no coupling), ambiguous (single team → insufficient). Pass rate 4/4. Existing tests: 11.

Conclusion

Conformance: Conformant (proxy disclosed). Citations: 3/3. Fixtures: 4/4. Recommendations: Expose per-team coupling matrix for org-design dashboards.

8. Wardley Mapping

Simon Wardley, 2005–. CC BY-SA 4.0.

Concept

A strategic mapping technique: capabilities are plotted along an X axis of evolution (Genesis → Custom-Built → Product → Commodity) and a Y axis of value-chain visibility (anchored at the user). The canonical primer is Simon Wardley's online book on Medium, published under CC BY-SA 4.0, with companion learning material at learnwardleymapping.com.

Citations

Implementation audit

Source: src/wardley/score.ts (334 lines), constants.ts (72 lines).

Delta | Severity | Detail | File ref
disputed:true + 0.5 confidence | info | Empirical fix wardley-2: single regex match no longer reported at 0.82–0.97. | score.ts:160-205, constants.ts:38-72
Criticality override | info | Empirical fix wardley-1: "Custom Auth" with criticality=critical routed to custom_built. | score.ts:174-193
fileCount-only weak signal | info | Empirical fix wardley-3: 0.4 confidence + disputed. | score.ts:270-280

Empirical verification

4 fixtures: clean (deprecated → commodity, corroborated), violated (Auth + critical → custom_built override), adversarial (single name match only → disputed, conf < 0.7), ambiguous (experimental ML PoC → genesis). Pass rate 4/4. Existing tests: 20.

Conclusion

Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Expand value-chain keyword map beyond e-commerce-shaped vocabulary.

9. The Twelve-Factor App

Adam Wiggins (Heroku), 2011. CC-BY 3.0.

Concept

Twelve operational principles for SaaS apps — codebase, dependencies, config, backing services, build/release/run, processes, port binding, concurrency, disposability, dev/prod parity, logs, admin processes. Published by Adam Wiggins (then at Heroku) at 12factor.net under CC-BY 3.0. Extended by Kevin Hoffman's "Beyond the Twelve-Factor App" (O'Reilly, 2016) with three additional factors.

Citations

Implementation audit

Source: src/twelve-factor/score.ts (56 lines).

Delta | Severity | Detail | File ref
Binary file-existence proxies | drift | Each factor's status is supplied by the caller, not a deep validation. | methodology.ts:17-18
All-unknown floor (score 25) | advisory | Diverges from InsufficientSignalResult pattern used elsewhere. | score.ts:32-55

Empirical verification

4 fixtures: clean (all pass → 100, cloud-ready), violated (all fail → 0), adversarial (all warn → 50), ambiguous (all unknown → 25). Pass rate 4/4. Existing tests: 11.
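The four fixture outcomes above are consistent with a plain mean over per-factor points (pass 100, warn 50, unknown 25, fail 0). The sketch below reproduces them under that assumption. The point values are inferred from the fixtures rather than read from score.ts, the function name is hypothetical, and the null return for an empty list is only a stand-in for whatever sentinel the package uses.

```typescript
type FactorStatus = "pass" | "warn" | "unknown" | "fail";

// Point values inferred from the fixture outcomes; an assumption, not the package's table.
const POINTS: Record<FactorStatus, number> = { pass: 100, warn: 50, unknown: 25, fail: 0 };

function twelveFactorScore(statuses: FactorStatus[]): number | null {
  if (statuses.length === 0) return null; // nothing to average; placeholder for a sentinel
  const total = statuses.reduce((sum, status) => sum + POINTS[status], 0);
  return Math.round(total / statuses.length);
}

// Helper: twelve factors that all share one status.
const allTwelve = (status: FactorStatus): FactorStatus[] =>
  new Array<FactorStatus>(12).fill(status);

console.log(twelveFactorScore(allTwelve("pass")));    // 100
console.log(twelveFactorScore(allTwelve("fail")));    // 0
console.log(twelveFactorScore(allTwelve("warn")));    // 50
console.log(twelveFactorScore(allTwelve("unknown"))); // 25
```

Under this reading, the all-unknown score of 25 is simply the unknown point value averaged over twelve factors, which is why the advisory delta treats it as a floor rather than a measurement.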

Conclusion

Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Return InsufficientSignalResult when factors.length === 0.

10. Monorepo Intelligence

Industrial practice (Google, 2016 CACM) — Bazel / Turborepo / Nx / Lerna.

Concept

Cross-package intelligence for monorepos. The canonical academic reference is Potvin & Levenberg's 2016 Communications of the ACM article on Google's single-repository practice; modern open-source operationalisation lives in Bazel, Turborepo, Nx, and Lerna.

Citations

Implementation audit

Source: src/monorepo/score.ts (33 lines).

Delta | Severity | Detail | File ref
−10/violation slope | drift | Chosen for legibility; not empirically calibrated. | methodology.ts:18-19
Empty input degenerate | advisory | Could return InsufficientSignalResult for symmetry. | score.ts:23-25

Empirical verification

4 fixtures: clean (averageHealth 100), violated (two unhealthy), adversarial (25 deps → floor at 0), ambiguous (no capabilities → degenerate 0). Pass rate 4/4. Existing tests: 10.
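The linear slope, the floor, and the degenerate empty case can be sketched together. This assumes each capability's health is 100 minus 10 per violation, floored at 0, and that averageHealth is the arithmetic mean; the function names are hypothetical and the real scorer may count violations differently.

```typescript
// Per-capability health: -10 per violation, never below 0.
function capabilityHealth(violations: number): number {
  return Math.max(0, 100 - 10 * violations);
}

// Mean health across capabilities; an empty list yields the degenerate 0 flagged as advisory.
function averageHealth(violationsPerCapability: number[]): number {
  if (violationsPerCapability.length === 0) return 0;
  const total = violationsPerCapability.reduce((sum, v) => sum + capabilityHealth(v), 0);
  return total / violationsPerCapability.length;
}

console.log(capabilityHealth(25));  // 0: floored
console.log(averageHealth([0, 0])); // 100
console.log(averageHealth([0, 4])); // 80
console.log(averageHealth([]));     // 0: indistinguishable from a very unhealthy repo
```

The last line is the reason for the recommendation below: a returned 0 cannot tell "no capabilities" apart from "every capability at the floor".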

Conclusion

Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Calibrate slope against measured-vs-perceived-health data; emit InsufficientSignalResult on empty capabilities.

11. DORA (predicted)

Forsgren / Humble / Kim, 2018. State of DevOps Report series, 2014–.

Concept

The DORA team's "four keys" — Deployment Frequency, Lead Time for Changes, Change Failure Rate, Mean Time to Restore Service — published in the annual State of DevOps Report and the book Accelerate. The subpath name dora-predicted is the disclaimer: the module predicts the LEVELS from architectural signals, not the outcomes from CI/CD logs.

Citations

Implementation audit

Source: src/dora-predicted/score.ts (90 lines).

Delta | Severity | Detail | File ref
Predicts driver, not outcome | advisory | Disclosed in subpath name + honestGap. | methodology.ts:22-24
Hand-coded thresholds | drift | Coherence/cycle/drift cuts mirror dashboard parity; not derived from labelled outcome data. | score.ts:27-71

Empirical verification

4 fixtures: clean (elite), violated (low), adversarial (criticalDrifted shortcut → CFR=low), ambiguous (medium). Pass rate 4/4. Existing tests: 12.

Conclusion

Conformance: Conformant (clearly disclaimed). Citations: 3/3. Fixtures: 4/4. Recommendations: Add an analyzeDoraMeasured companion module when real CI/CD log integration exists.

12. Domain-Driven Design

Eric Evans, 2003. Vaughn Vernon, 2013.

Concept

Eric Evans' strategic design vocabulary: Bounded Contexts, Ubiquitous Language, Context Maps with relationship patterns. The original book defines nine context-map relationship patterns; Vaughn Vernon's 2013 follow-up operationalised the tactical side.

Citations

Implementation audit

Source: src/ddd/score.ts (208 lines).

Delta | Severity | Detail | File ref
6 of 9 relationship patterns | advisory | Customer-Supplier, Published Language, Separate Ways not surfaced. | score.ts:149-167
Keyword classification | drift | "Custom Auth" classifies as generic without criticality override. | score.ts:90-103

Empirical verification

4 fixtures: clean (mix of subdomains classified correctly), violated (auth + logging both generic), adversarial (critical override → core_domain), ambiguous (no-name-signal → supporting default). Pass rate 4/4. Existing tests: 15.

Conclusion

Conformance: Partial (6/9 strategic patterns). Citations: 3/3. Fixtures: 4/4. Recommendations: Add explicit shared_kernel branch with a dedicated signal.

13. C4 Model

Simon Brown, 2011–. CC BY 4.0.

Concept

Simon Brown's hierarchical architecture-diagram framework — System Context (L1), Container (L2), Component (L3), Code (L4). UML-agnostic, designed for visual communication across audiences. Reference implementation: Structurizr DSL by the same author.

Citations

Implementation audit

Source: src/c4/score.ts (76 lines).

Delta | Severity | Detail | File ref
Code (L4) out of scope | advisory | hasCode:false hard-coded. L4 is auto-generated by IDE tooling in practice. | types.ts:38, score.ts:71

Empirical verification

4 fixtures: clean (full coverage → component is highest), violated (no model → none), adversarial (containerGroup classification → API Service), ambiguous (context only → highest=context). Pass rate 4/4. Existing tests: 17.

Conclusion

Conformance: Partial by design (3 of 4 levels). Citations: 3/3. Fixtures: 4/4. Recommendations: Document non-coverage of L4 in README.

14. Auto-detect (meta-detector)

prism-metrics internal heuristic catalog (mirrors prism0x2A dashboard).

Concept

A meta-detector that classifies frameworks and architectural style from package.json dependencies and directory layout. This is not a published spec — it is an internal catalog the package documents transparently. Confidence values are hand-calibrated.

Citations

Implementation audit

Source: src/auto-detect/score.ts (444 lines).

Delta | Severity | Detail | File ref
Hand-calibrated confidence | drift | Next.js 0.97, React 0.95, etc. Not derived from a labelled corpus. | score.ts:51-169
Fixed style precedence | info | hexagonal > clean > ddd > event_driven > microservices > layered_nestjs > layered_traditional > unknown. | score.ts:346-429
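A fixed precedence with a confidence gate can be sketched as a first-match scan over an ordered list. The style labels come from the precedence row above and the 0.6 gate from the auto-4 fix recorded in the audit summary; applying one gate uniformly to every style, and the shape of the input map, are assumptions of this sketch.

```typescript
// Ordered from highest to lowest precedence, as listed in the delta table.
const STYLE_PRECEDENCE = [
  "hexagonal",
  "clean",
  "ddd",
  "event_driven",
  "microservices",
  "layered_nestjs",
  "layered_traditional",
] as const;

type DetectedStyle = (typeof STYLE_PRECEDENCE)[number];
type PrimaryStyle = DetectedStyle | "unknown";

const CONFIDENCE_GATE = 0.6; // assumed to apply to every style alike

// Returns the first style in precedence order whose confidence clears the gate.
function primaryStyle(confidences: Partial<Record<DetectedStyle, number>>): PrimaryStyle {
  for (const style of STYLE_PRECEDENCE) {
    const confidence = confidences[style] ?? 0;
    if (confidence >= CONFIDENCE_GATE) return style;
  }
  return "unknown";
}

console.log(primaryStyle({ clean: 0.6, layered_traditional: 0.9 })); // clean
console.log(primaryStyle({ hexagonal: 0.8, clean: 0.95 }));          // hexagonal
console.log(primaryStyle({}));                                       // unknown
```

Precedence deliberately beats raw confidence here: a 0.6 clean match wins over a 0.9 layered_traditional match because it sits earlier in the list.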

Empirical verification

4 fixtures: clean (Next.js + Vitest detected), violated (ports+adapters → hexagonal style), adversarial (2-of-3 layer dirs → clean at lower confidence 0.6), ambiguous (empty project → unknown). Pass rate 4/4. Existing tests: 13.

Conclusion

Conformance: Conformant to internal spec. Citations: 3/3. Fixtures: 4/4. Recommendations: Back confidence values with a labelled fixture corpus.

Glossary

Term | Definition
Capability | A coherent, named unit of behaviour in a system (e.g. "Payment Capture", "User Onboarding"). The atomic unit of analysis across all scorers.
Drift | Documented capabilities whose code state contradicts the documented intent.
Coherence score | 0–100 measure of cross-layer agreement between code and intent.
InsufficientSignalResult | The ok:false sentinel returned by scorers when the input has no usable signal. scoreToGrade() throws if called on one.
LOCKED_FORMULA | Source-code marker indicating a formula is part of the public methodology and editing it requires a paired methodology update.
Bounded Context | DDD term: a boundary inside which a domain model is internally consistent.
Structural proxy | A code-only approximation of a property whose canonical form is organisational (e.g. Conway's Law).
BLUE / AMBER / GREEN | Internal pipeline phases of the prism0x2A dashboard for which prism-metrics is the public reference implementation. Out of scope for this handbook; mentioned only for context.
Disputed (Wardley) | Flag set on a classification result when only one signal contributed; UI should render as a candidate, not a settled stage.

Reproducibility appendix

To regenerate this handbook from scratch:

cd prism-metrics
npm install
npm test               # all 286 tests should pass
# regenerate fixtures (4 per framework × 14 frameworks)
npx tsx scripts/regenerate-handbook.mjs   # equivalent to the audit script

The companion sidecar docs/handbook.evidence.json is the machine-readable mirror of this document. Its schema:

{
  meta: { generated_at, prism_metrics_version, agent_pass_id, ... },
  cross_cutting: { shared_infra, findings },
  frameworks: {
    [name]: {
      spec_summary, provenance, citations[],
      expected_outputs,
      implementation_audit: { implemented_in[], exports[], deltas[] },
      verification: { fixtures, results, false_pos_rate, false_neg_rate,
                       test_count, test_file, fixture_cases[] },
      conclusion: { conformance, citation_audit, empirical_pass_rate,
                    recommendations[] }
    }
  }
}

A CI job can diff a fresh sidecar against the committed one to detect regressions in citation status, delta count/severity, or empirical pass rate. To update a citation, change its URL in the sidecar JSON, re-fetch, and flip verification_status to verified or unverified accordingly.
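The regression check described above could look roughly like the following. This is a sketch under stated assumptions: the nesting follows the schema shown earlier, but the url field on citations, the severity field on deltas, and the string form of empirical_pass_rate are guesses at value types the schema does not spell out, and findRegressions is a hypothetical name.

```typescript
interface Citation {
  url: string; // assumed field name
  verification_status: "verified" | "unverified";
}

interface FrameworkEvidence {
  citations: Citation[];
  implementation_audit: { deltas: { severity: string }[] }; // severity field assumed
  conclusion: { empirical_pass_rate: string }; // assumed to be a string such as "4/4"
}

interface Sidecar {
  frameworks: Record<string, FrameworkEvidence>;
}

// Sorted severity list, so both delta count and severity mix are compared at once.
function severityKey(evidence: FrameworkEvidence): string {
  return evidence.implementation_audit.deltas.map((d) => d.severity).sort().join(",");
}

function findRegressions(committed: Sidecar, fresh: Sidecar): string[] {
  const problems: string[] = [];
  for (const [name, before] of Object.entries(committed.frameworks)) {
    const after = fresh.frameworks[name];
    if (after === undefined) {
      problems.push(`${name}: missing from fresh sidecar`);
      continue;
    }
    const freshStatus = new Map<string, string>(
      after.citations.map((c): [string, string] => [c.url, c.verification_status]),
    );
    for (const citation of before.citations) {
      if (citation.verification_status === "verified" && freshStatus.get(citation.url) !== "verified") {
        problems.push(`${name}: citation no longer verified: ${citation.url}`);
      }
    }
    if (severityKey(before) !== severityKey(after)) {
      problems.push(`${name}: delta count or severity changed`);
    }
    if (before.conclusion.empirical_pass_rate !== after.conclusion.empirical_pass_rate) {
      problems.push(`${name}: empirical pass rate changed`);
    }
  }
  return problems;
}

// Illustrative data only; example.test is a placeholder host.
const committedSidecar: Sidecar = {
  frameworks: {
    eda: {
      citations: [{ url: "https://example.test/source", verification_status: "verified" }],
      implementation_audit: { deltas: [{ severity: "info" }] },
      conclusion: { empirical_pass_rate: "4/4" },
    },
  },
};

console.log(findRegressions(committedSidecar, committedSidecar)); // []
```

A CI step would load both JSON files, call the function, and fail the build when the returned list is non-empty.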

Trust & Verification

Frameworks live or die by reproducibility. The numbers below come straight from npx vitest run --coverage on the code as published in 0.8.0 — the version pass-2 audited. To regenerate locally:

git clone https://github.com/dadenjo/prism-metrics
cd prism-metrics
npm install
npx vitest run --coverage

The summary lives at docs/coverage-summary.json (machine-readable) and the per-framework breakdown is below.

Per-framework coverage + test count

Framework | Tests | Lines | Branches | Test file | Audit findings closed
iso-25010 | 22 | 100.0% | 84.1% | src/iso-25010/__tests__/score.test.ts | iso-1, iso-2, iso-3, iso-4 (+ pinned by boundary tests in 0.8.0)
solid | 38 | 98.8% | 96.9% | src/solid/__tests__/score.test.ts | solid-1, solid-2, solid-5, solid-6, solid-lsp-ast
clean-arch | 15 | 100.0% | 100.0% | src/clean-arch/__tests__/score.test.ts | ca-1, ca-2
hexagonal | 17 | 100.0% | 100.0% | src/hexagonal/__tests__/score.test.ts | hex-1, hex-3
eip | 20 | 100.0% | 99.1% | src/eip/__tests__/score.test.ts | eip-1 through eip-6
eda | 22 | 100.0% | 97.7% | src/eda/__tests__/score.test.ts | eda-1, eda-2, eda-3, eda-4, eda-6 (closed in 0.8.0)
conways-law | 11 | 96.5% | 93.3% | src/conways-law/__tests__/score.test.ts | conway-1, conway-2, conway-3, conway-4
wardley | 20 | 100.0% | 97.4% | src/wardley/__tests__/score.test.ts | wardley-1, wardley-2, wardley-3, wardley-4, wardley-5
twelve-factor | 11 | 97.0% | 98.3% | src/twelve-factor/__tests__/score.test.ts | tf-1, tf-2, tf-4
monorepo | 10 | 100.0% | 100.0% | src/monorepo/__tests__/score.test.ts | mono-1, mono-2, mono-4, mono-5
dora-predicted | 12 | 95.1% | 94.3% | src/dora-predicted/__tests__/score.test.ts | dora-1, dora-3, dora-5
ddd | 15 | 100.0% | 100.0% | src/ddd/__tests__/score.test.ts | (model framework, no findings)
c4 | 17 | 99.0% | 97.6% | src/c4/__tests__/score.test.ts | c4-1, c4-2
auto-detect | 13 | 88.9% | 84.1% | src/auto-detect/__tests__/score.test.ts | auto-4 (closed in 0.8.0)
core (foundation) | 43 | 100.0% | 95.8% | 3 test files | Item 0.1, Item 0.2
TOTAL | 286 | 96.4% | 88.1% | 17 test files | 54 of 59 findings closed

How to map a claim to its test

Every framework section above ends with an "Empirical verification" subsection that references the specific behaviour the tests pin down. To trace any claim back to code:

  1. Pick the framework section (e.g. "9. The Twelve-Factor App").
  2. Read the claim in "Implementation audit" — for example "empty factors:[] returns noData=true".
  3. Open the test file referenced in the Trust & Verification table — for tf, that's src/twelve-factor/__tests__/score.test.ts.
  4. Search for the claim's keyword (e.g. noData) to find the assertion. Test names start with it(…) and contain the audit-finding ID (e.g. tf-4) where applicable.

For example, the claim that ISO-25010 returns an explicit insufficient-signal result on empty input (iso-1) is verified by the test "returns insufficient on empty input" in src/iso-25010/__tests__/score.test.ts. Fixtures live alongside in __fixtures__/empty.input.json + empty.expected.json.

Audit finding lifecycle

Findings discovered by the multi-agent audit pipeline are tracked with a stable ID (iso-3, conway-1, …). Each finding moves through up to three stages:

  1. Identification — listed in the handbook's "Implementation audit" subsection per framework with severity (CRITICAL / HIGH / MEDIUM / LOW).
  2. Fix — a PR with title fix(framework): ID — short description. The PR adds the regression test that pins the new behaviour.
  3. Acknowledgement — for findings that can't be fully closed (e.g. magic constants whose empirical study is out of scope), the framework's methodology.ts honestGap field documents the limitation explicitly.

The 5 still-open LOW-severity items (Item 0.3 InputQuality cross-cutting · dora-2 magic thresholds · solid-3 + solid-4 ISP/LSP normalisation + skippedPaths API · mono-3 sublinear slope · ddd-1 + ddd-2 keyword false-positives) are all in category 3 — acknowledged in honestGap rather than silently shipped. solid-lsp-ast was closed on 2026-06-10 via a tiered-signal contract; that bumped the closed-finding tally from 51 to 52.

Reproducing the audit

The audit itself is reproducible. The prompt template lives at docs/audit-prompt.md — spawn one agent per framework, point it at src/<framework>/ plus the primary methodology source, and collect the findings. The diff against the published handbook is the next audit's input. Per-framework agent runs are independent and parallelisable; a full audit pass costs approximately $3-5 at Claude Sonnet 4.6 rates and ~1.5 h wall-clock with parallelisation.

Audit summary & release history

This handbook is the audit record against prism-metrics 0.8.0, completed 2026-06-10. The audit spawned one autonomous research agent per framework following docs/audit-prompt.md; each agent read the source, fetched primary sources, validated citations, and verified closure claims against named regression tests.

Headline numbers

| Metric | Value | Details |
|---|---|---|
| Findings tracked | 59 | Stable IDs (e.g. iso-3, conway-1, solid-lsp-ast) across 14 frameworks + cross-cutting items |
| Findings closed | 54 of 59 | Each closure has a named regression test (file path + line + test name) referenced in the per-framework "Implementation audit" subsection |
| Findings open | 5 (LOW) | Acknowledged in each framework's honestGap; see "Still open" below |
| Tests | 286 passing | 17 test files; 96.4 % line coverage / 88.1 % branch coverage (npx vitest run --coverage) |
| Fixtures | 56 / 56 pass | 4 per framework: clean / violated / adversarial / ambiguous |
| Citations | 38 of 42 verified live | 4 sources returned HTTP 403 / TLS-expired to non-browser fetchers during the audit (alistair.cockburn.us, domainlanguage.com, learnwardleymapping.com, iso.org catalog). Content cross-confirmed via Wikipedia in every case |

Two defects found + closed in 0.8.0

| ID | Severity | Description | Fix | Regression test |
|---|---|---|---|---|
| eda-6 | latent emission bug | event_carried_state_transfer was emitted from the hasStateCarryingEvent flag alone, without checking publisherFiles > 0. A caller passing {brokerFiles:1, cqrsFiles:1, hasStateCarryingEvent:true, publisherFiles:0} would surface the pattern even though no publisher exists to ship state-carrying events. | Guard added at src/eda/score.ts:113-120: if (hasStateCarryingEvent && publisherFiles > 0). The flag is now correctly treated as a modifier on top of producer activity. | src/eda/__tests__/score.test.ts: "does NOT emit when publisherFiles=0"; "DOES emit when publisherFiles>0 + flag" |
| auto-4 | internal inconsistency | Architecture-style precedence gate used confidence ≥ 0.7 but the detection catalogue emits Clean Architecture at 0.6 for 2-of-3 layer matches. A project with domain/ + application/ but no infrastructure/ got clean_architecture in the detections array AND architectureStyle.primary = "layered_traditional", which is internally contradictory. | Gate lowered to 0.6 at src/auto-detect/score.ts:347-358 to match the catalogue. | Existing fixtures already cover this; behaviour now consistent. |
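
The eda-6 guard can be illustrated with a short sketch. The four field names and the guard condition are the ones quoted in the table above; the EdaSignals interface and the emitsEventCarriedStateTransfer function are hypothetical, and the real code at src/eda/score.ts may be structured differently.

```typescript
// Field names as quoted in the eda-6 row; the interface itself is an
// assumption made for this sketch.
interface EdaSignals {
  brokerFiles: number;
  cqrsFiles: number;
  publisherFiles: number;
  hasStateCarryingEvent: boolean;
}

// The flag alone is not enough: at least one publisher must exist to
// ship the state-carrying events before the pattern is reported.
function emitsEventCarriedStateTransfer(s: EdaSignals): boolean {
  return s.hasStateCarryingEvent && s.publisherFiles > 0;
}

// The input from the finding: flag set, but no publisher.
const withoutPublisher = emitsEventCarriedStateTransfer({
  brokerFiles: 1, cqrsFiles: 1, hasStateCarryingEvent: true, publisherFiles: 0,
});

// Same input with producer activity present.
const withPublisher = emitsEventCarriedStateTransfer({
  brokerFiles: 1, cqrsFiles: 1, hasStateCarryingEvent: true, publisherFiles: 2,
});
```

The two calls mirror the two regression tests: the first must not emit the pattern, the second must.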

Regression-test hardening

Two closures (iso-3 + iso-4) shipped in 0.6.0 without explicit boundary tests. Both were pinned in 0.8.0, alongside the tests for the new eda-6 guard:

  • iso-3: continuous performance density curve now has 4 dedicated boundary tests (density 9.9 vs 10.1, 19.9 vs 20.1, floor at 30, peak at 5)
  • iso-4: churn cap at 20 now has 2 dedicated cap-binding tests (churn 25 vs 100 produces same score; high-churn perf ≥ densityScore - 20)
  • eda-6: 2 publisher-required tests cover the new guard
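
The two iso-4 cap-binding properties can be checked against a toy model. This is a sketch under a loud assumption: it treats the churn penalty as min(churn, 20) subtracted from the density score. The handbook states only the cap value and the two tested properties, so the formula, the CHURN_CAP constant and the performanceScore function are all hypothetical.

```typescript
// Assumed model, not the package's formula: the churn penalty equals the
// churn value, capped at 20, and is subtracted from the density score.
const CHURN_CAP = 20;

function performanceScore(densityScore: number, churn: number): number {
  const churnPenalty = Math.min(churn, CHURN_CAP);
  return densityScore - churnPenalty;
}

// Property 1: once the cap binds, further churn does not change the score.
const atChurn25 = performanceScore(70, 25);
const atChurn100 = performanceScore(70, 100);

// Property 2: the score never drops more than 20 below the density score.
const lowerBoundHolds = atChurn100 >= 70 - CHURN_CAP;
```

Any formula that satisfies both properties would pass the same pair of tests; the cap is what the tests pin, not the shape of the penalty below it.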

Release history

| Version | PRs | Audit findings closed | What changed in code |
|---|---|---|---|
| 0.4.0 | #1, #2 | Item 0.1, Item 0.2, iso-1, iso-2 | core/InsufficientSignalResult + core/scanner-exclusions module; iso-25010 returns {ok:false} on empty input; security penalty curve softened from linear 15 × hits to 15 × log2(1+hits) |
| 0.4.x | #3, #4, #5 | 12 SOLID/CleanArch/Hex findings · 11 EIP/EDA findings · 9 Conway/Wardley findings | Per-principle DIP vacuous-truth guard; LSP/ISP normalised cliffs; clean-arch + hexagonal noData states; EIP/EDA exclusion contract + signal floors; Conway proxy flag + N/A for single-team; Wardley confidence + disputed flag |
| 0.5.0 | #7, #8, #9, #10 | tf-1, tf-2, mono-1, mono-2, dora-1, dora-3 | Twelve-Factor 'n/a' status + honest 'unknown'; Monorepo noData + polyglot BuildSystem; DORA insufficient guard + predicted* field renames + predictionConfidence |
| 0.6.0 | #11–#15 | c4-1, c4-2, iso-3, iso-4, tf-4, mono-4, mono-5, dora-5, solid-5 | C4 queue + client classifier collisions; ISO performance continuous curve + churn double-count fix; 9 boundary/regression tests |
| 0.7.0 | #20, #21 | solid-lsp-ast | SOLID LSP tiered signal: optional confirmedLspViolations (AST-confirmed, confidence 0.85) with substring fallback (existing, 0.65). Non-breaking |
| 0.8.0 | #23 | eda-6, auto-4; iso-3 + iso-4 regression tests retro-added | Audit completed against this version; two defects found and fixed in the same release; iso boundary tests pinned the 0.6.0 fixes |
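
The effect of the 0.4.0 security-penalty change can be seen by evaluating both curves side by side. The two formulas are the ones quoted in the 0.4.0 row; the function names are hypothetical, and how the package clamps or combines the penalty with other terms is not shown here.

```typescript
// Pre-0.4.0 curve: every hit costs the same 15 points.
const linearSecurityPenalty = (hits: number): number => 15 * hits;

// 0.4.0 curve: the marginal cost of each additional hit shrinks.
const logSecurityPenalty = (hits: number): number => 15 * Math.log2(1 + hits);

// Both curves agree at 1 hit (15 points) and diverge from there:
//   hits = 3 -> linear 45,  logarithmic 30
//   hits = 7 -> linear 105, logarithmic 45
const penaltyComparison = [1, 3, 7].map((hits) => ({
  hits,
  linear: linearSecurityPenalty(hits),
  logarithmic: logSecurityPenalty(hits),
}));
```

The logarithmic curve keeps the first hit as expensive as before while preventing a handful of repeated matches from wiping out the whole score.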

Still open (5 LOW findings, acknowledged in honestGap)

  • Item 0.3: InputQuality cross-cutting field on every *Signals type. ~3 h work, touches all 14 scorers; needs an API-design decision before implementation.
  • dora-2: Magic thresholds in DORA-predicted (coherence 80/60/40, drift 3/8) need an empirical study citation OR a sigmoid replacement with documented midpoint. Methodology, not bug.
  • solid-3, solid-4: ISP/LSP normalisation and the skippedPaths API.
  • mono-3: Sublinear slope 100 - 10·√deps instead of 100 - 10·deps. Methodology change documented in honestGap as "not derived from empirical data".
  • ddd-1, ddd-2: Classification keyword false-positives. Already acknowledged in DDD's honestGap.
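
To make the mono-3 item concrete, the two slopes can be compared numerically. The formulas are the ones quoted above; the function names and the clamp to the 0..100 range are assumptions added for this sketch, since the handbook does not say how out-of-range values are handled.

```typescript
// Assumed helper: keep scores inside 0..100. Not confirmed by the handbook.
const clampScore = (value: number): number => Math.max(0, Math.min(100, value));

// Linear slope: each dependency costs 10 points.
const linearDepsScore = (deps: number): number => clampScore(100 - 10 * deps);

// Sublinear slope (the shipped methodology): cost grows with sqrt(deps).
const sublinearDepsScore = (deps: number): number =>
  clampScore(100 - 10 * Math.sqrt(deps));

// deps = 4  -> linear 60, sublinear 80
// deps = 9  -> linear 10, sublinear 70
// deps = 25 -> linear 0 (clamped), sublinear 50
const slopeComparison = [4, 9, 25].map((deps) => ({
  deps,
  linear: linearDepsScore(deps),
  sublinear: sublinearDepsScore(deps),
}));
```

The comparison shows why the choice matters: the linear slope bottoms out at 10 dependencies, while the sublinear slope does not reach zero until 100.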

When the next audit should run

This evidence will go stale if any of the following ship without a fresh audit:

  • A new methodology source publishes (e.g. ISO 25010:2023 adds 'Safety' as the 9th characteristic — that triggers a fresh audit of iso-25010)
  • One of the 5 open findings above is implemented (the new behaviour needs an empirical-fixture row in the conformance table)
  • A consumer reports a result that doesn't match their reading of the methodology — the 'Should I trust this?' ticket is itself a signal that the handbook needs an update
  • 12+ months elapsed since this audit even without any of the above (citation rot, primary-source URL drift)
  • The 4 citation-freshness items above (TLS expiry, HTTP 403) start to compound

None of these triggers has fired as of 2026-06-10. When one does, run docs/audit-prompt.md end-to-end and produce a new audit document; the diff against this one is the input.