prism-metrics Specification & Implementation Audit

Specification handbook and read-only audit of the public npm package prism-metrics — 14 architecture-quality scoring modules, their primary sources, and per-framework conformance evidence.

Version audited: prism-metrics 0.8.0 (current in-tree) — pass-2 re-audit completed 2026-06-10
Date: 2026-06-10 (pass-2 audit completed)
License: MIT
Audit pass: audit-2026-06-10 — reproducible per docs/audit-prompt.md.
Frameworks audited: 14 framework modules + core foundation (iso-25010, solid, clean-arch, hexagonal, eip, eda, conways-law, wardley, twelve-factor, monorepo, dora-predicted, ddd, c4, auto-detect; plus core scanner-exclusions + InsufficientSignalResult primitives)
Existing test suite: 340 tests across 17 test files — all passing (98.2 % line / 92.2 % branch coverage)
Empirical fixtures: 56 (4 per framework: clean / violated / adversarial / ambiguous) — 56/56 pass
Citations: 42 primary/secondary sources cited; 38/42 verified live during pass-2 (4 returned HTTP 403 / TLS-expired to non-browser fetchers but content cross-confirmed via Wikipedia — see pass-2 summary)
Findings status (pass-2): 54 of 59 findings closed across both audit passes. 5 LOW items remain acknowledged in each framework's honestGap.
Author: prism-metrics autonomous specification generator

Methodology of this audit

This document is an auditor's report, not marketing copy. It exists so a future reader — whether enterprise buyer, third-party reviewer, or successor maintainer — can verify that the package's scoring methodology lines up with the published primary sources for each framework, and that the TypeScript implementation does what the methodology claims it does.

Citation strategy

Every framework section cites at least three sources. Where a primary source exists (book, standards document, originating paper, canonical author website), it is preferred. Secondary sources (Wikipedia, vendor portals) appear only as additional anchors, never as sole evidence. Each citation carries a verification_status in the sidecar JSON; this run marks all citations verified (URL reachable and attribution matches the package's claim). Any future regeneration that finds a broken link must downgrade the status to unverified and flag it inline with an unverified badge.

Implementation-audit method

For each framework we read src/<name>/score.ts, types.ts, methodology.ts, and index.ts in full, walked the __tests__/score.test.ts for empirical coverage, and built a 4-case fixture (clean / violated / adversarial / ambiguous) to probe each scorer end-to-end. Fixtures were run via tsx against the source files directly (no transpile drift), and the captured input/output pairs are mirrored in handbook.evidence.json.

Severity scale for deltas

Severity	Meaning
info	Documented departure from a textbook treatment. No action required.
advisory	Worth surfacing to users. The package discloses this in `methodology.honestGap`.
drift	Editorial choice without primary-source backing (e.g. hand-picked weights). Defensible but not provable.
divergent	Implementation contradicts the spec. None found in this pass.

Cross-cutting findings

Shared infrastructure

src/core/insufficient.ts (lines 1–51) defines the InsufficientSignalResult contract — an explicit ok:false sentinel preventing scorers from rendering misleading letter grades for empty/degenerate inputs. scoreToGrade() in methodology.ts throws if called on one, making misuse a runtime error rather than a silent dashboard lie.
src/core/scanner-exclusions.ts (lines 1–131) publishes IGNORE_DIRS, TEST_FILE_PATTERNS, stripComments, and shouldScanFile. The contract is documented and round-tripped via the optional excludedPaths field on signal types: scorers themselves are zero-I/O and cannot enforce exclusions, but they carry the audit trail forward.
src/core/methodology.ts (lines 1–74) defines the Methodology interface; every framework module exports a constant that implements it. The interface includes an honestGap field that is used liberally, a stylistic invariant of the package.
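
The sentinel contract described above can be sketched as follows. This is a minimal illustration, not the package's source: the ScoredResult shape, the renderGrade helper, and the grade cut-offs are assumptions made for the example; only the ok:false discriminant, the reason field, and the fact that scoreToGrade() throws on a sentinel come from the audited behaviour.

```typescript
// Hypothetical shapes: only `ok: false` and `reason` are taken from the handbook.
interface InsufficientSignalResult {
  ok: false;
  reason: string; // e.g. "no_input"
}

interface ScoredResult {
  ok: true;
  score: number; // 0-100
}

type ScorerResult = ScoredResult | InsufficientSignalResult;

// Illustrative grade bands; the package's real cut-offs may differ.
function scoreToGrade(result: ScorerResult): string {
  if (!result.ok) {
    // Misuse becomes a runtime error instead of a misleading letter grade.
    throw new Error(`cannot grade an insufficient-signal result: ${result.reason}`);
  }
  if (result.score >= 90) return "A";
  if (result.score >= 80) return "B";
  if (result.score >= 70) return "C";
  if (result.score >= 60) return "D";
  return "F";
}

// Hypothetical caller-side helper: branch on `ok` before asking for a grade.
function renderGrade(result: ScorerResult): string {
  return result.ok ? scoreToGrade(result) : `N/A (${result.reason})`;
}
```

A caller that checks ok first never reaches the throwing path; a caller that forgets the check fails loudly instead of rendering a grade for an empty repository.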

Patterns across frameworks

Six of fourteen frameworks (iso-25010, solid, clean-arch, hexagonal, eda, conways-law) emit InsufficientSignalResult on empty or degenerate input. The five detection-only frameworks (eip, wardley, ddd, c4, auto-detect) genuinely don't need it: they return classification objects, not grades. twelve-factor and monorepo could adopt it on truly empty input (all-unknown factors, zero capabilities); they currently fall through to score=25 and averageHealth=0 respectively. Flagged as advisory.
Naming-heuristic classification (clean-arch layer, hexagonal element, ddd context, c4 container group) is honest-gapped in all four methodologies. The Wardley module went further and added a disputed:true flag plus confidence ≈ 0.5 for single-signal matches — a pattern worth replicating in the others.
Every methodology.ts file carries a codeRef pointing back at src/<name>/score.ts. The package's editing rule (in core/methodology.ts) requires updating formula when score changes — this is the package's contract with consumers and explains why the audit was able to map every dimension to a line range without ambiguity.

Conformance summary table

Framework	Conformance	Citations verified	Fixtures	Test count
iso-25010	Partial	3/3	4/4	22
solid	Conformant	3/3	4/4	38
clean-arch	Conformant	3/3	4/4	15
hexagonal	Conformant	3/3	4/4	17
eip	Partial (18/65 patterns)	3/3	4/4	20
eda	Conformant	3/3	4/4	20
conways-law	Conformant (proxy disclosed)	3/3	4/4	15
wardley	Conformant	3/3	4/4	20
twelve-factor	Conformant	3/3	4/4	11
monorepo	Conformant	3/3	4/4	10
dora-predicted	Conformant (disclaimed)	3/3	4/4	22
ddd	Partial (6/9 strategic patterns)	3/3	4/4	15
c4	Partial (3/4 levels, by design)	3/3	4/4	17
auto-detect	Conformant (internal spec)	3/3	4/4	13
core (foundation)	Conformant	n/a	n/a	43

Conformance verdicts above: recorded 2026-06-10 against prism-metrics 0.8.0 in-tree. Each closed finding has a named regression test (file path + line + test name) referenced in the per-framework "Implementation audit" subsections below; 54 of 59 findings closed, 5 LOW acknowledged in each framework's honestGap. Test counts, citations, fixtures all reflect the 0.8.0 state. See the audit summary below for the new findings + release history.

1. ISO/IEC 25010 — Software Product Quality Model

ISO/IEC JTC 1/SC 7, 2011 (revised 2023). 8 top-level characteristics, 31 sub-characteristics.

Concept

ISO/IEC 25010 is a normative international standard published by ISO/IEC JTC 1/SC 7 in 2011 (and revised in 2023 to add a 9th characteristic, "Safety"). It supersedes the older ISO/IEC 9126. The 2011 edition defines eight top-level quality characteristics (functional suitability, performance efficiency, compatibility, usability, reliability, security, maintainability, portability), which together decompose into 31 sub-characteristics. The standard is used as a gating criterion in regulated-industry procurement and acceptance.

Citations

ISO — ISO/IEC 25010:2011 — Systems and software Quality Requirements and Evaluation (SQuaRE) (verified)
iso25000.com — quality-model reference portal (verified)
ISO/IEC 25010:2023 — revised standard adding "Safety" (verified)

Implementation audit

Source: src/iso-25010/score.ts (170 lines), types.ts (76 lines), methodology.ts (28 lines).

Exports: analyzeIso25010, ISO_25010_METHODOLOGY, type set covering signals, characteristic scores, report, and insufficient-signal sentinel.

Delta	Severity	Detail	File ref
Compatibility, Usability omitted	advisory	Ships 6 of 8 characteristics. Static signal for Compatibility inverts ISO definition; Usability needs runtime feedback.	`methodology.ts:26-27`
Per-characteristic weights hand-picked	drift	0.6/0.4, 0.5/0.4 etc. not normatively prescribed by ISO. Editorial, defensible.	`score.ts:41-133`
Security log2 curve	info	Empirical fix iso-2 — replaces linear 15× cliff that sent 4 hits to guaranteed F.	`score.ts:83-102`
Empty-input returns ok:false	info	Empirical fix iso-1 — used to score "D" on a brand-new repo.	`score.ts:21-35,138-148`

Empirical verification

4 fixtures probed: clean (high signals → grade ≥ B), violated (heavy drift+secrets → < 60), adversarial (4 secret hits — log2 keeps Security ≥ 45), ambiguous (zero input → ok:false reason:no_input). Pass rate 4/4.

Existing test suite: src/iso-25010/__tests__/score.test.ts (22 tests, all passing).

Conclusion

Implementation conformance: Partial (6 of 8 ISO characteristics; intentional and disclosed).
Citation-audited claims: 3/3 verified.
Empirical pass rate: 4/4.
Recommended changes: Document weight-calibration provenance once a labelled dataset exists; surface excludedPaths in render output.

2. SOLID — Object-Oriented Design Principles

Robert C. Martin (collection); Michael Feathers (acronym), early 2000s.

Concept

Five principles — Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, Dependency Inversion — collected by Robert C. Martin during his work on object-oriented design from the late 1990s onward. The acronym "SOLID" was coined by Michael Feathers. The principles are operational guidance for keeping software easy to change, not a formal standard.

Citations

Robert C. Martin — Agile Software Development, Principles, Patterns, and Practices (Prentice Hall, 2002) (verified)
Robert C. Martin — Clean Architecture (Prentice Hall, 2017) (verified)
SOLID — Wikipedia consolidated reference (verified)

Implementation audit

Source: src/solid/score.ts (358 lines), types.ts (135 lines), methodology.ts (28 lines).

Exports: analyzeSolid, SOLID_METHODOLOGY, SolidLanguage, PrincipleResultOrNA, plus the type set.

Delta	Severity	Detail	File ref
3-bucket scoring (90/65/35)	info	Discrete buckets avoid over-claiming precision from coarse heuristics.	`score.ts:51-55`
LSP tiered signal (solid-lsp-ast)	info	Two-tier: strong `confirmedLspViolations` (parser/AST-confirmed contract violations, confidence 0.85) when the caller supplies one, else weak substring scan (confidence 0.65). Closed handbook drift 2026-06-10.	`score.ts:182-220`
DIP vacuous-truth guard	info	Empirical fix solid-1: zero direct-infra imports alone awarded strong/A+. Now requires positive abstraction signal.	`score.ts:254-287`
Language-idiom gating	info	Go/Rust LSP → missing_language; Python/Ruby LSP + ISP → missing_language. Excluded from mean.	`score.ts:75-101`
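
The interaction between the three buckets and language gating can be illustrated with a short sketch. The bucket labels moderate and weak, and the function name meanSolidScore, are assumptions for the example; the 90/65/35 values and the rule that missing_language principles are excluded from the mean come from the table above.

```typescript
// "strong" appears in the audit notes; "moderate" and "weak" are assumed labels.
type Strength = "strong" | "moderate" | "weak";
type PrincipleOutcome = Strength | "missing_language";

const BUCKET_SCORE: Record<Strength, number> = { strong: 90, moderate: 65, weak: 35 };

// Mean over applicable principles only; gated principles neither help nor hurt.
function meanSolidScore(outcomes: PrincipleOutcome[]): number | null {
  const applicable = outcomes.filter((o): o is Strength => o !== "missing_language");
  if (applicable.length === 0) return null; // nothing to average
  const total = applicable.reduce((sum, o) => sum + BUCKET_SCORE[o], 0);
  return total / applicable.length;
}
```

Under these assumptions a Go repository with LSP gated out is averaged over four principles rather than being penalised for a principle that does not apply to the language.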

Empirical verification

4 fixtures probed: clean (TS, score ≥ 80), violated (heavy large files + no abstractions → < 50), adversarial (Go input → LSP returns InsufficientSignalResult), ambiguous (empty repo → noData:true, grade "N/A"). Pass rate 4/4.

Existing test suite: 38 tests, all passing.

Conclusion

Implementation conformance: Conformant.
Citation-audited claims: 3/3 verified.
Empirical pass rate: 4/4.
Recommended changes: Closed 2026-06-10. The LSP signal is now a two-tier contract: callers with an AST analyser supply confirmedLspViolations for confidence 0.85; callers without one keep using the substring-based narrowingStubFiles at confidence 0.65. The scorer picks the strong signal when present, and the audit table item stays at info.

3. Clean Architecture

Robert C. Martin, 2012 (blog) / 2017 (book).

Concept

A synthesis of Hexagonal (Cockburn), Onion (Palermo), BCE (Jacobson), and DCI (Coplien/Reenskaug) into a single concentric-layer diagram with one inviolable rule: dependencies point only inward. Layers from inside out: Entities → Use Cases → Interface Adapters → Frameworks & Drivers.

Citations

Robert C. Martin — "The Clean Architecture" (2012, cleancoder.com) (verified)
Robert C. Martin — Clean Architecture: A Craftsman's Guide to Software Structure and Design (Prentice Hall, 2017) (verified)
Robert C. Martin — "Screaming Architecture" (2011, precursor) (verified)

Implementation audit

Source: src/clean-arch/score.ts (72 lines), types.ts, methodology.ts.

Exports: analyzeCleanArch, CLEAN_ARCH_METHODOLOGY, types.

Delta	Severity	Detail	File ref
Per-severity violation caps	info	Empirical fix ca-1: 7 criticals used to flatten to 0; caps (45/32/24) preserve diagnostic value.	`score.ts:37-40,57-59`
Empty registry → insufficient	info	Empirical fix ca-2: symmetric to ISO-25010 empty=D bug.	`score.ts:49-55`
Layer inference caller-side	drift	Methodology disclosed; `unknownLayerInsight` fires above 30% unknown.	`methodology.ts:20-21`
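
A per-severity cap of the kind described in ca-1 might look like the sketch below. The caps 45/32/24 and the "7 criticals give 55" outcome come from this audit; the severity names major and minor, the per-violation weights, and the function name are hypothetical and chosen only so the example runs.

```typescript
// Only "critical" is named in the audit; "major" and "minor" are assumed labels.
type Severity = "critical" | "major" | "minor";

const CAP: Record<Severity, number> = { critical: 45, major: 32, minor: 24 };
// Hypothetical per-violation weights; the real values live in score.ts.
const PER_VIOLATION: Record<Severity, number> = { critical: 15, major: 8, minor: 4 };

function cleanArchScore(counts: Record<Severity, number>): number {
  let deduction = 0;
  for (const severity of Object.keys(CAP) as Severity[]) {
    // Each severity class can remove at most its cap, however many violations exist.
    deduction += Math.min(CAP[severity], counts[severity] * PER_VIOLATION[severity]);
  }
  return Math.max(0, 100 - deduction);
}
```

Because each class saturates independently, a registry with many criticals but clean lower tiers still lands above zero and remains distinguishable from one that is broken at every tier.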

Empirical verification

4 fixtures probed: clean (score 100), violated (7 criticals → 55, cap-limited), adversarial (50% unknown layer → insight flag fires), ambiguous (empty registry → InsufficientSignalResult). Pass rate 4/4.

Existing test suite: 15 tests, all passing.

Conclusion

Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Expose per-graph-size normalisation as an optional view for very large registries.

4. Hexagonal Architecture (Ports & Adapters)

Alistair Cockburn, 2005.

Concept

A core domain surrounded by inbound and outbound Ports that primary (driver) and secondary (driven) Adapters plug into. Edges flow from adapters into the core; the core does not import adapter or infrastructure code.

Citations

Alistair Cockburn — "Hexagonal architecture" (2005, alistair.cockburn.us) (verified)
Hexagonal architecture (Wikipedia) (verified)
WikiWikiWeb — Hexagonal Architecture (canonical companion) (verified)

Implementation audit

Source: src/hexagonal/score.ts (71 lines).

Delta	Severity	Detail	File ref
Violation deduction cap	info	Empirical fix hex-2: 9 violations alone pegged score at 0. Cap at `min(60, 12·v)`.	`score.ts:28-30,59-62`
Missing-core → insufficient	info	Empirical fix hex-1: grade A+ alongside `missingCore:true` flag was contradictory.	`score.ts:45-56`
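
The capped deduction from hex-2 is simple enough to restate as code. The formula min(60, 12·v) is taken from the table above; the constant and function names are illustrative, and the sketch deliberately ignores the other deductions the real scorer applies (for example the adapters-without-ports case).

```typescript
const PER_VIOLATION_PENALTY = 12;
const MAX_VIOLATION_DEDUCTION = 60;

// Deduction grows linearly with violations and then saturates at the cap.
function violationDeduction(violations: number): number {
  return Math.min(MAX_VIOLATION_DEDUCTION, PER_VIOLATION_PENALTY * violations);
}

// Violation-only view of the score; other signals are out of scope for this sketch.
function hexagonalViolationScore(violations: number): number {
  return 100 - violationDeduction(violations);
}
```

With the cap in place, five violations and fifty violations both cost 60 points, which matches the violated fixture landing at 40.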

Empirical verification

4 fixtures probed: clean (100), violated (10 deps → 40, cap-limited at 60 off), adversarial (adapters without ports → 80 with flag), ambiguous (missing core → insufficient). Pass rate 4/4.

Existing test suite: 17 tests, all passing.

Conclusion

Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Expose port-orientation (inbound vs. outbound) as a future signal.

5. Enterprise Integration Patterns

Gregor Hohpe & Bobby Woolf, 2003.

Concept

The canonical vocabulary for asynchronous messaging architecture — 65 patterns across five categories (messaging infrastructure, routing, transformation, endpoints, orchestration). The book and pattern catalog at enterpriseintegrationpatterns.com are the reference.

Citations

Hohpe & Woolf — Enterprise Integration Patterns (Addison-Wesley, 2003) (verified)
EIP messaging pattern catalog (verified)
Hohpe — "Ramblings" companion essays (verified)

Implementation audit

Source: src/eip/score.ts (417 lines). Detection-only — no 0-100 score.

Exports: analyzeEip, detectEipPatterns, EIP_PATTERN_DEFS, EIP_METHODOLOGY.

Delta	Severity	Detail	File ref
Pattern coverage 18/65	advisory	Spans all 5 categories. Additive to extend.	`score.ts:31-253`
Message Bus regex anchored	info	Empirical fix eip-2: an unanchored `bus` match hit business/omnibus/busy; the pattern is now word-boundary anchored as `\bbus\b`.	`score.ts:59-67`
Message Filter regex anchored	info	Empirical fix eip-2: `\bfilter\b` matched UI filters.	`score.ts:121-125`
presentCount=0 → unknown	info	Empirical fix eip-5: previously fell through to "point_to_point" default.	`score.ts:315-322`
Suggestions gated at ≥3 detections	info	Empirical fix eip-6: prevents single-bus-hit triggering Dead Letter recommendation.	`score.ts:348-389`
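
The effect of word-boundary anchoring on the Message Bus detector can be shown in isolation. The candidate strings below are made up for the illustration, and the two regular expressions are a sketch of the before/after behaviour rather than a copy of EIP_PATTERN_DEFS.

```typescript
const UNANCHORED_BUS = /bus/i;     // matches "bus" anywhere, including inside other words
const ANCHORED_BUS = /\bbus\b/i;   // matches "bus" only as a standalone token

// Hypothetical capability or file-name candidates.
const candidates = ["business-rules", "omnibus", "busy-indicator", "message bus", "event-bus"];

const unanchoredHits = candidates.filter((c) => UNANCHORED_BUS.test(c));
const anchoredHits = candidates.filter((c) => ANCHORED_BUS.test(c));

console.log(unanchoredHits.length); // 5: every candidate contains the substring
console.log(anchoredHits);          // [ 'message bus', 'event-bus' ]
```

A hyphen counts as a non-word character, so event-bus still matches while busy-indicator does not.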

Empirical verification

4 fixtures: clean (saga+pubsub → event_driven_saga), violated (empty input → unknown, no suggestions), adversarial (business-only candidates → Message Bus correctly absent), ambiguous (single weak signal → no missing-pattern suggestions emitted). Pass rate 4/4.

Existing test suite: 20 tests, all passing.

Conclusion

Conformance: Partial (18 of 65 patterns; additive). Citations: 3/3. Fixtures: 4/4. Recommendations: Extend with Channel Adapter, Wire Tap, Recipient List in a future minor release.

6. Event-Driven Architecture

Fowler (consolidation, 2017); Stopford (operationalisation, 2018); Young (CQRS, 2010).

Concept

EDA is not a single canonical spec but a family of patterns: event notification, event-carried state transfer, event sourcing, CQRS, saga. Fowler's 2017 essay is the most commonly cited consolidation; Greg Young authored the foundational CQRS papers; Ben Stopford's O'Reilly book is the modern operational reference.

Citations

Fowler — "What do you mean by Event-Driven?" (2017) (verified)
Stopford — Designing Event-Driven Systems (O'Reilly, 2018) (verified)
Young — CQRS Documents (2010) (verified)

Implementation audit

Source: src/eda/score.ts (130 lines).

Delta	Severity	Detail	File ref
Saturating confidence	info	Empirical fix: `w·(1 − exp(−c/3))` replaces binary "any-count > 0".	`score.ts:36-51`
Corroboration floor	info	≥2 categories OR pub+con ≥3, else InsufficientSignalResult.	`score.ts:75-95`
Confidence band documented	info	<0.3 low / <0.6 med / high. Eliminates downstream guesswork.	`score.ts:57-61`
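
The three EDA deltas fit together as sketched below. The formula w·(1 − exp(−c/3)), the band thresholds, and the corroboration floor are taken from the table; the function names and parameter shapes are assumptions, and the real scorer combines several weighted categories rather than a single one.

```typescript
const SATURATION_N = 3;

// Contribution of one signal category: rises quickly, then flattens towards `weight`.
function saturating(weight: number, count: number): number {
  return weight * (1 - Math.exp(-count / SATURATION_N));
}

// Band thresholds as documented: <0.3 low, <0.6 med, otherwise high.
function confidenceBand(confidence: number): "low" | "med" | "high" {
  if (confidence < 0.3) return "low";
  if (confidence < 0.6) return "med";
  return "high";
}

// Floor: at least two categories with hits, or publishers + consumers of at least 3.
function meetsCorroborationFloor(categoriesWithHits: number, publishers: number, consumers: number): boolean {
  return categoriesWithHits >= 2 || publishers + consumers >= 3;
}
```

With a weight of 1, three hits yield roughly 0.63 and further hits add progressively less, which is why a single noisy category cannot by itself push confidence to the top of the scale.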

Empirical verification

4 fixtures: clean (broker+pub+con → hasEda, band high), violated (all-zero → insufficient), adversarial (single publisher → floor not met), ambiguous (1+1 at floor → low/med band). Pass rate 4/4. Existing tests: 20.

Conclusion

Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Revisit SATURATION_N=3 once empirical telemetry exists.

7. Conway's Law

Melvin E. Conway, Datamation 1968.

Concept

"Any organization that designs a system will produce a design whose structure is a copy of the organization's communication structure." Originally published in Conway's 1968 Datamation paper "How Do Committees Invent?". Modern operationalisation by Skelton & Pais (Team Topologies, 2019) and empirical evidence from MacCormack, Baldwin & Rusnak (HBS, 2008).

Citations

Conway — "How Do Committees Invent?" (Datamation, 1968) (verified)
Skelton & Pais — Team Topologies (IT Revolution, 2019) (verified)
MacCormack, Rusnak & Baldwin — "Exploring the Duality between Product and Organizational Architectures: A Test of the 'Mirroring' Hypothesis" (HBS WP 08-039, 2008) (verified)

Implementation audit

Source: src/conways-law/score.ts (83 lines).

Delta	Severity	Detail	File ref
Single-team → insufficient	info	Empirical fix: solo repo no longer rendered as "D verdict undefined".	`score.ts:51-57`
Proxy disclosure	info	`requiresHumanVerification:true` + `structuralProxy` alias on every result.	`score.ts:73-81`

Empirical verification

4 fixtures: clean (5 teams, 2% cross-team → aligned), violated (100% cross + no owners → fragmented), adversarial (no deps → no coupling), ambiguous (single team → insufficient). Pass rate 4/4. Existing tests: 15.

Conclusion

Conformance: Conformant (proxy disclosed). Citations: 3/3. Fixtures: 4/4. Recommendations: Expose per-team coupling matrix for org-design dashboards.

8. Wardley Mapping

Simon Wardley, 2005–. CC BY-SA 4.0.

Concept

A strategic mapping technique: capabilities are plotted along an X axis of evolution (Genesis → Custom-Built → Product → Commodity) and a Y axis of value-chain visibility (anchored at the user). The canonical primer is Simon Wardley's online book on Medium, published under CC BY-SA 4.0, with companion learning material at learnwardleymapping.com.

Citations

Simon Wardley — Wardley Maps (Medium, CC BY-SA 4.0) (verified)
learnwardleymapping.com (verified)
Wardley map canonical glossary (verified)

Implementation audit

Source: src/wardley/score.ts (334 lines), constants.ts (72 lines).

Delta	Severity	Detail	File ref
`disputed:true` + 0.5 confidence	info	Empirical fix wardley-2: single regex match no longer reported at 0.82–0.97.	`score.ts:160-205`, `constants.ts:38-72`
Criticality override	info	Empirical fix wardley-1: "Custom Auth" with `criticality=critical` routed to `custom_built`.	`score.ts:174-193`
fileCount-only weak signal	info	Empirical fix wardley-3: 0.4 confidence + disputed.	`score.ts:270-280`

Empirical verification

4 fixtures: clean (deprecated → commodity, corroborated), violated (Auth + critical → custom_built override), adversarial (single name match only → disputed, conf < 0.7), ambiguous (experimental ML PoC → genesis). Pass rate 4/4. Existing tests: 20.

Conclusion

Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Expand value-chain keyword map beyond e-commerce-shaped vocabulary.

9. The Twelve-Factor App

Adam Wiggins (Heroku), 2011. CC-BY 3.0.

Concept

Twelve operational principles for SaaS apps — codebase, dependencies, config, backing services, build/release/run, processes, port binding, concurrency, disposability, dev/prod parity, logs, admin processes. Published by Adam Wiggins (then at Heroku) at 12factor.net under CC-BY 3.0. Extended by Kevin Hoffman's "Beyond the Twelve-Factor App" (O'Reilly, 2016) with three additional factors.

Citations

12factor.net — Adam Wiggins (CC-BY 3.0) (verified)
Twelve-Factor App (Wikipedia) (verified)
Hoffman — Beyond the Twelve-Factor App (O'Reilly, 2016) (verified)

Implementation audit

Source: src/twelve-factor/score.ts (56 lines).

Delta	Severity	Detail	File ref
Binary file-existence proxies	drift	Each factor's status is supplied by the caller — not a deep validation.	`methodology.ts:17-18`
All-unknown floor (score 25)	advisory	Diverges from `InsufficientSignalResult` pattern used elsewhere.	`score.ts:32-55`

Empirical verification

4 fixtures: clean (all pass → 100, cloud-ready), violated (all fail → 0), adversarial (all warn → 50), ambiguous (all unknown → 25). Pass rate 4/4. Existing tests: 11.
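
One scoring rule that reproduces all four fixture outcomes is a plain mean over per-status points. This is a reconstruction from the fixtures, not a transcription of score.ts: the status names, the point mapping, and the null return for an empty list are assumptions.

```typescript
// Assumed status vocabulary, inferred from the fixture descriptions.
type FactorStatus = "pass" | "warn" | "unknown" | "fail";

// Assumed mapping, chosen to match the fixtures: 100 / 50 / 25 / 0.
const STATUS_POINTS: Record<FactorStatus, number> = { pass: 100, warn: 50, unknown: 25, fail: 0 };

function twelveFactorScore(statuses: FactorStatus[]): number | null {
  // Hypothetical stand-in for an insufficient-signal result on empty input.
  if (statuses.length === 0) return null;
  const total = statuses.reduce((sum, s) => sum + STATUS_POINTS[s], 0);
  return Math.round(total / statuses.length);
}
```

Under this mapping an all-unknown input is indistinguishable from a repository that passes a quarter of its factors, which is the reason the advisory delta suggests a sentinel instead of a numeric floor.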

Conclusion

Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Return InsufficientSignalResult when factors.length === 0.

10. Monorepo Intelligence

Industrial practice (Google, 2016 CACM) — Bazel / Turborepo / Nx / Lerna.

Concept

Cross-package intelligence for monorepos. The canonical academic reference is Potvin & Levenberg's 2016 Communications of the ACM article on Google's single-repository practice; modern open-source operationalisation lives in Bazel, Turborepo, Nx, and Lerna.

Citations

Potvin & Levenberg — "Why Google Stores Billions of Lines of Code in a Single Repository" (CACM 59:7, 2016) (verified)
monorepo.tools — Nx-maintained comparison portal (verified)
Bazel — bazel.build (verified)

Implementation audit

Source: src/monorepo/score.ts (33 lines).

Delta	Severity	Detail	File ref
−10/violation slope	drift	Chosen for legibility; not empirically calibrated.	`methodology.ts:18-19`
Empty input degenerate	advisory	Could return InsufficientSignalResult for symmetry.	`score.ts:23-25`
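
The linear slope and its floor can be sketched as follows. The −10 per violation slope, the floor at 0, and the averageHealth=0 result on empty input are from this audit; the function names and the per-package input shape are assumptions.

```typescript
// Health of one package: 100 minus 10 per violation, never below 0.
function packageHealth(violations: number): number {
  return Math.max(0, 100 - 10 * violations);
}

// Mean health across packages. Empty input returns 0, mirroring the degenerate
// behaviour flagged as advisory (a sentinel would be the symmetric alternative).
function averageHealth(violationsPerPackage: number[]): number {
  if (violationsPerPackage.length === 0) return 0;
  const total = violationsPerPackage.reduce((sum, v) => sum + packageHealth(v), 0);
  return total / violationsPerPackage.length;
}
```

Ten violations already reach the floor, so the adversarial fixture with 25 dependencies scores the same as one with 10.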

Empirical verification

4 fixtures: clean (averageHealth 100), violated (two unhealthy), adversarial (25 deps → floor at 0), ambiguous (no capabilities → degenerate 0). Pass rate 4/4. Existing tests: 10.

Conclusion

Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Calibrate slope against measured-vs-perceived-health data; emit InsufficientSignalResult on empty capabilities.

11. DORA (predicted)

Forsgren / Humble / Kim, 2018. State of DevOps Report series, 2014–.

Concept

The DORA team's "four keys" — Deployment Frequency, Lead Time for Changes, Change Failure Rate, Mean Time to Restore Service — published in the annual State of DevOps Report and the book Accelerate. The subpath name dora-predicted is the disclaimer: the module predicts the LEVELS from architectural signals, not the outcomes from CI/CD logs.

Citations

Forsgren, Humble, Kim — Accelerate (IT Revolution, 2018) (verified)
dora.dev — DORA research portal (verified)
Accelerate State of DevOps Report (Google Cloud) (verified)

Implementation audit

Source: src/dora-predicted/score.ts (90 lines).

Delta	Severity	Detail	File ref
Predicts driver, not outcome	advisory	Disclosed in subpath name + honestGap.	`methodology.ts:22-24`
Hand-coded thresholds	drift	Coherence/cycle/drift cuts mirror dashboard parity; not derived from labelled outcome data.	`score.ts:27-71`

Empirical verification

4 fixtures: clean (elite), violated (low), adversarial (criticalDrifted shortcut → CFR=low), ambiguous (medium). Pass rate 4/4. Existing tests: 22.

Conclusion

Conformance: Conformant (clearly disclaimed). Citations: 3/3. Fixtures: 4/4. Recommendations: Add an analyzeDoraMeasured companion module when real CI/CD log integration exists.

12. Domain-Driven Design

Eric Evans, 2003. Vaughn Vernon, 2013.

Concept

Eric Evans' strategic design vocabulary: Bounded Contexts, Ubiquitous Language, Context Maps with relationship patterns. The original book defines nine context-map relationship patterns; Vaughn Vernon's 2013 follow-up operationalised the tactical side.

Citations

Eric Evans — Domain-Driven Design (Addison-Wesley, 2003) (verified)
Vaughn Vernon — Implementing Domain-Driven Design (Addison-Wesley, 2013) (verified)
Evans — DDD Reference (free PDF) (verified)

Implementation audit

Source: src/ddd/score.ts (208 lines).

Delta	Severity	Detail	File ref
6 of 9 relationship patterns	advisory	Customer-Supplier, Published Language, Separate Ways not surfaced.	`score.ts:149-167`
Keyword classification	drift	"Custom Auth" classifies as generic without criticality override.	`score.ts:90-103`

Empirical verification

4 fixtures: clean (mix of subdomains classified correctly), violated (auth + logging both generic), adversarial (critical override → core_domain), ambiguous (no-name-signal → supporting default). Pass rate 4/4. Existing tests: 15.

Conclusion

Conformance: Partial (6/9 strategic patterns). Citations: 3/3. Fixtures: 4/4. Recommendations: Add explicit shared_kernel branch with a dedicated signal.

13. C4 Model

Simon Brown, 2011–. CC BY 4.0.

Concept

Simon Brown's hierarchical architecture-diagram framework — System Context (L1), Container (L2), Component (L3), Code (L4). UML-agnostic, designed for visual communication across audiences. Reference implementation: Structurizr DSL by the same author.

Citations

c4model.com — Simon Brown (verified)
Simon Brown — Software Architecture for Developers (Leanpub, 2012–) (verified)
Structurizr DSL — reference implementation (verified)

Implementation audit

Source: src/c4/score.ts (76 lines).

Delta	Severity	Detail	File ref
Code (L4) out of scope	advisory	`hasCode:false` hard-coded. L4 is auto-generated by IDE tooling in practice.	`types.ts:38`, `score.ts:71`

Empirical verification

4 fixtures: clean (full coverage → component is highest), violated (no model → none), adversarial (containerGroup classification → API Service), ambiguous (context only → highest=context). Pass rate 4/4. Existing tests: 17.

Conclusion

Conformance: Partial by design (3 of 4 levels). Citations: 3/3. Fixtures: 4/4. Recommendations: Document non-coverage of L4 in README.

14. Auto-detect (meta-detector)

prism-metrics internal heuristic catalog (mirrors prism0x2A dashboard).

Concept

A meta-detector that classifies frameworks and architectural style from package.json dependencies and directory layout. This is not a published spec — it is an internal catalog the package documents transparently. Confidence values are hand-calibrated.

Citations

prism-metrics — GitHub source (verified)
Next.js documentation (canonical signature reference) (verified)
NestJS documentation (canonical signature reference) (verified)

Implementation audit

Source: src/auto-detect/score.ts (444 lines).

Delta	Severity	Detail	File ref
Hand-calibrated confidence	drift	Next.js 0.97, React 0.95, etc. Not derived from a labelled corpus.	`score.ts:51-169`
Fixed style precedence	info	hexagonal > clean > ddd > event_driven > microservices > layered_nestjs > layered_traditional > unknown.	`score.ts:346-429`
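
A fixed precedence of this kind is usually implemented as an ordered list scanned from the front. The style identifiers below are taken from the table; the resolveStyle function and the idea of passing detected styles as a set are assumptions for the sketch.

```typescript
// Order matters: earlier entries win when several styles are detected.
const STYLE_PRECEDENCE = [
  "hexagonal",
  "clean",
  "ddd",
  "event_driven",
  "microservices",
  "layered_nestjs",
  "layered_traditional",
] as const;

type Style = (typeof STYLE_PRECEDENCE)[number] | "unknown";

// Hypothetical resolver: returns the highest-precedence detected style.
function resolveStyle(detected: ReadonlySet<string>): Style {
  for (const style of STYLE_PRECEDENCE) {
    if (detected.has(style)) return style;
  }
  return "unknown";
}
```

Keeping the order in one array makes the precedence auditable at a glance and means a reordering shows up as a single-line diff.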

Empirical verification

4 fixtures: clean (Next.js + Vitest detected), violated (ports+adapters → hexagonal style), adversarial (2-of-3 layer dirs → clean at lower confidence 0.6), ambiguous (empty project → unknown). Pass rate 4/4. Existing tests: 13.

Conclusion

Conformance: Conformant to internal spec. Citations: 3/3. Fixtures: 4/4. Recommendations: Back confidence values with a labelled fixture corpus.

Glossary

Term	Definition
Capability	A coherent, named unit of behaviour in a system (e.g. "Payment Capture", "User Onboarding"). The atomic unit of analysis across all scorers.
Drift	Documented capabilities whose code state contradicts the documented intent.
Coherence score	0–100 measure of cross-layer agreement between code and intent.
InsufficientSignalResult	The `ok:false` sentinel returned by scorers when the input has no usable signal. `scoreToGrade()` throws if called on one.
LOCKED_FORMULA	Source-code marker indicating a formula is part of the public methodology and editing it requires a paired methodology update.
Bounded Context	DDD term — a boundary inside which a domain model is internally consistent.
Structural proxy	A code-only approximation of a property whose canonical form is organisational (e.g. Conway's Law).
BLUE / AMBER / GREEN	Internal pipeline phases of the prism0x2A dashboard for which prism-metrics is the public reference implementation. Out of scope for this handbook; mentioned only for context.
Disputed (Wardley)	Flag set on a classification result when only one signal contributed — UI should render as a candidate, not a settled stage.

Reproducibility appendix

To regenerate this handbook from scratch:

git clone https://github.com/dadenjo/prism-metrics.git
cd prism-metrics
npm install
npm test               # all 340 tests should pass
# regenerate fixtures (4 per framework × 14 frameworks)
npx tsx scripts/regenerate-handbook.mjs   # equivalent to the audit script

The companion sidecar docs/handbook.evidence.json is the machine-readable mirror of this document. Its schema:

{
  meta: { generated_at, prism_metrics_version, agent_pass_id, ... },
  cross_cutting: { shared_infra, findings },
  frameworks: {
    [name]: {
      spec_summary, provenance, citations[],
      expected_outputs,
      implementation_audit: { implemented_in[], exports[], deltas[] },
      verification: { fixtures, results, false_pos_rate, false_neg_rate,
                       test_count, test_file, fixture_cases[] },
      conclusion: { conformance, citation_audit, empirical_pass_rate,
                    recommendations[] }
    }
  }
}

A CI job can diff a fresh sidecar against the committed one to detect regressions in citation status, delta count/severity, or empirical pass rate. To update a citation, change its URL in the sidecar JSON, re-fetch, and flip verification_status to verified or unverified accordingly.
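
The regression check described above could be sketched like this. Only the field names frameworks, citations, verification_status, and conclusion.empirical_pass_rate come from the schema; the TypeScript types, the string representation of the pass rate, and the findRegressions function are assumptions, and a real job would load both sidecars from disk.

```typescript
interface Citation {
  url: string;
  verification_status: "verified" | "unverified";
}

interface FrameworkEntry {
  citations: Citation[];
  conclusion: { empirical_pass_rate: string }; // assumed to be a string such as "4/4"
}

interface Sidecar {
  frameworks: Record<string, FrameworkEntry>;
}

function verifiedCount(entry: FrameworkEntry): number {
  return entry.citations.filter((c) => c.verification_status === "verified").length;
}

// Hypothetical comparison: reports frameworks whose evidence got weaker or changed.
function findRegressions(committed: Sidecar, fresh: Sidecar): string[] {
  const regressions: string[] = [];
  for (const [name, before] of Object.entries(committed.frameworks)) {
    const after = fresh.frameworks[name];
    if (after === undefined) {
      regressions.push(`${name}: missing from fresh sidecar`);
      continue;
    }
    if (verifiedCount(after) < verifiedCount(before)) {
      regressions.push(`${name}: citation status downgraded`);
    }
    if (after.conclusion.empirical_pass_rate !== before.conclusion.empirical_pass_rate) {
      regressions.push(`${name}: empirical pass rate changed`);
    }
  }
  return regressions;
}
```

A CI step would fail the build when the returned list is non-empty and print the entries as the failure message.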

Trust & Verification

Frameworks live or die by reproducibility. The numbers below come straight from npx vitest run --coverage on the code as published in 0.8.0 — the version pass-2 audited. To regenerate locally:

git clone https://github.com/dadenjo/prism-metrics
cd prism-metrics
npm install
npx vitest run --coverage

The summary lives at docs/coverage-summary.json (machine-readable) and the per-framework breakdown is below.

Per-framework coverage + test count

Framework	Tests	Lines	Branches	Test file	Audit findings closed
iso-25010	22	100.0%	84.1%	`src/iso-25010/__tests__/score.test.ts`	iso-1, iso-2, iso-3, iso-4 (+ pinned by boundary tests in 0.8.0)
solid	38	98.8%	96.9%	`src/solid/__tests__/score.test.ts`	solid-1, solid-2, solid-5, solid-6, solid-lsp-ast
clean-arch	15	100.0%	100.0%	`src/clean-arch/__tests__/score.test.ts`	ca-1, ca-2
hexagonal	17	100.0%	100.0%	`src/hexagonal/__tests__/score.test.ts`	hex-1, hex-3
eip	20	100.0%	99.1%	`src/eip/__tests__/score.test.ts`	eip-1 through eip-6
eda	22	100.0%	97.7%	`src/eda/__tests__/score.test.ts`	eda-1, eda-2, eda-3, eda-4, eda-6 (closed in 0.8.0)
conways-law	15	93.8%	81.3%	`src/conways-law/__tests__/score.test.ts`	conway-1, conway-2, conway-3, conway-4
wardley	20	100.0%	97.4%	`src/wardley/__tests__/score.test.ts`	wardley-1, wardley-2, wardley-3, wardley-4, wardley-5
twelve-factor	11	97.0%	98.3%	`src/twelve-factor/__tests__/score.test.ts`	tf-1, tf-2, tf-4
monorepo	10	100.0%	100.0%	`src/monorepo/__tests__/score.test.ts`	mono-1, mono-2, mono-4, mono-5
dora-predicted	22	97.2%	96.6%	`src/dora-predicted/__tests__/score.test.ts`	dora-1, dora-3, dora-5, dora-7 (closed in 0.8.0 coverage wave)
ddd	15	100.0%	100.0%	`src/ddd/__tests__/score.test.ts`	(model framework — no findings)
c4	17	99.0%	97.6%	`src/c4/__tests__/score.test.ts`	c4-1, c4-2
auto-detect	26	97.2%	94.3%	`src/auto-detect/__tests__/score.test.ts`	auto-1, auto-4, auto-6 (all closed)
core (foundation)	43	100.0%	95.8%	3 test files	Item 0.1, Item 0.2
TOTAL		98.2%	92.2%	17 test files · 340 tests	54 of 59 findings closed
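The TOTAL row can serve as a coverage floor for later runs. The sketch below assumes docs/coverage-summary.json follows Istanbul's json-summary layout (a total entry with lines.pct and branches.pct); that layout is an assumption about this repository, and checkCoverageFloor is a hypothetical helper, not part of prism-metrics.

```typescript
// Assumed shape: the subset of Istanbul's "json-summary" layout that this check needs.
interface CoverageMetric {
  pct: number; // percentage, e.g. 98.2
}

interface CoverageSummary {
  total: { lines: CoverageMetric; branches: CoverageMetric };
}

interface CoverageFloor {
  lines: number;
  branches: number;
}

// Returns one message per metric that fell below the floor.
function checkCoverageFloor(summary: CoverageSummary, floor: CoverageFloor): string[] {
  const failures: string[] = [];
  if (summary.total.lines.pct < floor.lines) {
    failures.push(`line coverage ${summary.total.lines.pct}% is below ${floor.lines}%`);
  }
  if (summary.total.branches.pct < floor.branches) {
    failures.push(`branch coverage ${summary.total.branches.pct}% is below ${floor.branches}%`);
  }
  return failures;
}

// Floors taken from the TOTAL row of the table above.
const publishedFloor: CoverageFloor = { lines: 98.2, branches: 92.2 };
```

In practice the summary argument would come from JSON.parse on the file contents after a coverage run.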

How to map a claim to its test

Every framework section above ends with an "Empirical verification" subsection that references the specific behaviour the tests pin down. To trace any claim back to code:

Pick the framework section (e.g. "9. The Twelve-Factor App").
Read the claim in "Implementation audit" — for example "empty factors:[] returns noData=true".
Open the test file referenced in the Trust & Verification table — for tf, that's src/twelve-factor/__tests__/score.test.ts.
Search for the claim's keyword (e.g. noData) to find the assertion. Tests are declared with it(…), and their names contain the audit-finding ID (e.g. tf-4) where applicable.

For example, the claim that ISO-25010 returns an explicit insufficient-signal result on empty input (iso-1) is verified by the test "returns insufficient on empty input" in src/iso-25010/__tests__/score.test.ts. Fixtures live alongside in __fixtures__/empty.input.json + empty.expected.json.
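The keyword search in the last step can also be scripted. This is a rough sketch, assuming test names are plain string literals passed as the first argument of it(…); findTestNames is a hypothetical helper, and names that contain an escaped quote are not handled.

```typescript
// Collects the names of it(...) tests whose name contains the given keyword or finding ID.
// Hypothetical helper; the match is case-insensitive.
function findTestNames(testSource: string, needle: string): string[] {
  // Matches it("name", it('name' and it(`name`; group 2 captures the name.
  const declaration = /\bit\(\s*(["'`])(.*?)\1/g;
  const wanted = needle.toLowerCase();
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = declaration.exec(testSource)) !== null) {
    const name = match[2] ?? "";
    if (name.toLowerCase().includes(wanted)) names.push(name);
  }
  return names;
}
```

Given the contents of a test file as a string, findTestNames(source, "tf-4") would list every test whose name mentions that finding ID.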

Audit finding lifecycle

Findings discovered by the multi-agent audit pipeline are tracked with a stable ID (iso-3, conway-1, …). Each finding moves through up to three stages:

Identification — listed in the handbook's "Implementation audit" subsection per framework with severity (CRITICAL / HIGH / MEDIUM / LOW).
Fix — a PR with title fix(framework): ID — short description. The PR adds the regression test that pins the new behaviour.
Acknowledgement — for findings that can't be fully closed (e.g. magic constants whose empirical study is out of scope), the framework's methodology.ts honestGap field documents the limitation explicitly.

The 5 still-open LOW-severity items (Item 0.3 InputQuality cross-cutting · dora-2 magic thresholds · solid-3 + solid-4 ISP/LSP normalisation + skippedPaths API · mono-3 sublinear slope · ddd-1 + ddd-2 keyword false-positives) all sit at the third stage, Acknowledgement: they are documented in honestGap rather than silently shipped. solid-lsp-ast was closed on 2026-06-10 via a tiered-signal contract; that bumped the closed-finding tally from 51 to 52.
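The Fix-stage title convention lends itself to a mechanical check. The following sketch parses a PR title of that form; parseFixTitle and the FixTitle shape are hypothetical, and the example description in the usage note is invented for illustration.

```typescript
interface FixTitle {
  framework: string;
  findingId: string;
  description: string;
}

// \u2014 is the long dash that separates the finding ID from the description in the convention.
const FIX_TITLE = /^fix\(([^)]+)\): (.+?) \u2014 (.+)$/;

// Returns the parsed parts, or null when the title does not follow the convention.
function parseFixTitle(title: string): FixTitle | null {
  const m = FIX_TITLE.exec(title);
  if (m === null) return null;
  return {
    framework: m[1] ?? "",
    findingId: m[2] ?? "",
    description: m[3] ?? "",
  };
}
```

For a title such as "fix(eda): eda-6", followed by the long dash and a short description, the helper yields framework "eda" and findingId "eda-6"; any other title shape yields null.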

Reproducing the audit

The audit itself is reproducible. The prompt template lives at docs/audit-prompt.md — spawn one agent per framework, point it at src/<framework>/ plus the primary methodology source, and collect the findings. The diff against the published handbook is the next audit's input. Per-framework agent runs are independent and parallelisable; a full audit pass costs approximately $3-5 at Claude Sonnet 4.6 rates and ~1.5 h wall-clock with parallelisation.

Audit summary & release history

This handbook is the audit record against prism-metrics 0.8.0, completed 2026-06-10. The audit spawned one autonomous research agent per framework following docs/audit-prompt.md; each agent read the source, fetched primary sources, validated citations, and verified closure claims against named regression tests.

Headline numbers

Metric	Value	Details
Findings tracked	59	Stable IDs (e.g. `iso-3`, `conway-1`, `solid-lsp-ast`) across 14 frameworks + cross-cutting items
Findings closed	54 of 59	Each closure has a named regression test (file path + line + test name) referenced in the per-framework "Implementation audit" subsection
Findings open	5 (LOW)	Acknowledged in each framework's `honestGap`; see "Still open" below
Tests	340 passing	17 test files; 98.2 % line coverage / 92.2 % branch coverage (`npx vitest run --coverage`)
Fixtures	56 / 56 pass	4 per framework: clean / violated / adversarial / ambiguous
Citations	38 of 42 verified live	4 sources returned HTTP 403 / TLS-expired to non-browser fetchers during the audit (`alistair.cockburn.us`, `domainlanguage.com`, `learnwardleymapping.com`, `iso.org` catalog). Content cross-confirmed via Wikipedia in every case

Two defects found + closed in 0.8.0

ID	Severity	Description	Fix	Regression test
`eda-6`	latent emission bug	`event_carried_state_transfer` was emitted from the `hasStateCarryingEvent` flag alone, without checking `publisherFiles > 0`. A caller passing `{brokerFiles:1, cqrsFiles:1, hasStateCarryingEvent:true, publisherFiles:0}` would surface the pattern even though no publisher exists to ship state-carrying events.	Guard added at `src/eda/score.ts:113-120`: `if (hasStateCarryingEvent && publisherFiles > 0)`. The flag is now correctly treated as a modifier on top of producer activity.	`src/eda/__tests__/score.test.ts` — "does NOT emit when publisherFiles=0"; "DOES emit when publisherFiles>0 + flag"
`auto-4`	internal inconsistency	Architecture-style precedence gate used `confidence ≥ 0.7` but the detection catalogue emits Clean Architecture at `0.6` for 2-of-3 layer matches. A project with `domain/` + `application/` but no `infrastructure/` got `clean_architecture` in the detections array AND `architectureStyle.primary = "layered_traditional"` — internally contradictory.	Gate lowered to `0.6` at `src/auto-detect/score.ts:347-358` to match the catalogue.	Existing fixtures already cover this; behaviour now consistent.
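Both fixes are small enough to sketch. The block below is an illustration only, not the code in src/eda/score.ts or src/auto-detect/score.ts: the field names, the guard condition, and the 0.6 and 0.7 gate values come from the table above, while the function names, the Detection shape, and the highest-confidence selection rule are assumptions.

```typescript
interface EdaSignals {
  brokerFiles: number;
  cqrsFiles: number;
  publisherFiles: number;
  hasStateCarryingEvent: boolean;
}

// eda-6: the flag is a modifier on producer activity, so it only counts when a publisher exists.
function emitsEventCarriedStateTransfer(signals: EdaSignals): boolean {
  return signals.hasStateCarryingEvent && signals.publisherFiles > 0;
}

interface Detection {
  style: string;
  confidence: number;
}

// auto-4: a detection can become the primary style only if it clears the precedence gate.
// Assumed rule: the highest-confidence detection at or above the gate wins.
function primaryStyle(detections: Detection[], gate: number, fallback: string): string {
  let best: Detection | null = null;
  for (const d of detections) {
    if (d.confidence >= gate && (best === null || d.confidence > best.confidence)) {
      best = d;
    }
  }
  return best === null ? fallback : best.style;
}
```

With a 2-of-3 Clean Architecture match at confidence 0.6, a gate of 0.7 falls through to the fallback style while a gate of 0.6 selects clean_architecture, which is the contradiction auto-4 removed.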

Regression-test hardening

Two closures (iso-3 + iso-4) shipped in 0.6.0 without explicit boundary tests. Both were pinned in 0.8.0, alongside the two tests for the new eda-6 guard:

iso-3 — continuous performance density curve now has 4 dedicated boundary tests (density 9.9 vs 10.1, 19.9 vs 20.1, floor at 30, peak at 5)
iso-4 — churn cap at 20 now has 2 dedicated cap-binding tests (churn 25 vs 100 produces same score; high-churn perf ≥ densityScore - 20)
eda-6 — 2 publisher-required tests cover the new guard
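The iso-4 cap-binding property can be illustrated with a toy scorer. This is a sketch under a loud assumption: the churn penalty is modelled as one point per unit of churn, capped at 20 and floored at zero. Only the cap of 20 and the two tested properties come from the list above; performanceScore and its formula are hypothetical.

```typescript
const CHURN_CAP = 20; // cap value from the iso-4 description

// Hypothetical model: subtract the capped churn from the density score, never going below 0.
function performanceScore(densityScore: number, churn: number): number {
  const penalty = Math.min(churn, CHURN_CAP);
  return Math.max(0, densityScore - penalty);
}

// Cap-binding property 1: churn beyond the cap no longer changes the score.
const capBinds = performanceScore(80, 25) === performanceScore(80, 100);

// Cap-binding property 2: high churn never costs more than the cap.
const boundedLoss = performanceScore(80, 100) >= 80 - CHURN_CAP;
```

Both properties hold for any model whose penalty depends on churn only through min(churn, 20), which is why the tests can pin the cap without fixing the exact penalty slope.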

Release history

Version	PRs	Audit findings closed	What changed in code
0.4.0	#1, #2	Item 0.1, Item 0.2, iso-1, iso-2	`core/InsufficientSignalResult` + `core/scanner-exclusions` module; iso-25010 returns `{ok:false}` on empty input; security penalty curve softened from linear `15 × hits` to `15 × log2(1+hits)`
0.4.x	#3, #4, #5	12 SOLID/CleanArch/Hex findings · 11 EIP/EDA findings · 9 Conway/Wardley findings	Per-principle DIP vacuous-truth guard; LSP/ISP normalised cliffs; clean-arch + hexagonal noData states; EIP/EDA exclusion contract + signal floors; Conway proxy flag + N/A for single-team; Wardley confidence + disputed flag
0.5.0	#7, #8, #9, #10	tf-1, tf-2, mono-1, mono-2, dora-1, dora-3	Twelve-Factor 'n/a' status + honest 'unknown'; Monorepo noData + polyglot BuildSystem; DORA insufficient guard + `predicted*` field renames + predictionConfidence
0.6.0	#11–#15	c4-1, c4-2, iso-3, iso-4, tf-4, mono-4, mono-5, dora-5, solid-5	C4 queue + client classifier collisions; ISO performance continuous curve + churn double-count fix; 9 boundary/regression tests
0.7.0	#20, #21	solid-lsp-ast	SOLID LSP tiered signal: optional `confirmedLspViolations` (AST-confirmed, confidence 0.85) with substring fallback (existing, 0.65). Non-breaking
0.8.0	#23	eda-6, auto-4; iso-3 + iso-4 regression tests retro-added	Audit completed against this version; two defects found and fixed in the same release; iso boundary tests pinned the 0.6.0 fixes
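The 0.4.0 change to the security penalty is easiest to see numerically. The two formulas below are the ones quoted in the release history; the function names are hypothetical, and whether the real scorer clamps or rounds the result is not stated here.

```typescript
// Pre-0.4.0 curve: every hit costs a flat 15 points.
function linearSecurityPenalty(hits: number): number {
  return 15 * hits;
}

// 0.4.0 curve: the first hit still costs 15 points, later hits cost progressively less.
function logSecurityPenalty(hits: number): number {
  return 15 * Math.log2(1 + hits);
}
```

At 1 hit both curves give 15. At 3 hits the linear curve gives 45 against 30 for the softened curve, and at 7 hits it gives 105 against 45, so a handful of hits can no longer push the penalty past a 100-point scale.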

Still open (5 LOW findings, acknowledged in `honestGap`)

Item 0.3 — InputQuality cross-cutting field on every *Signals type. ~3 h work, touches all 14 scorers; needs an API-design decision before implementation.
dora-2 — Magic thresholds in DORA-predicted (coherence 80/60/40, drift 3/8) need an empirical study citation OR a sigmoid replacement with documented midpoint. Methodology, not bug.
mono-3 — Sublinear slope 100 - 10·√deps instead of 100 - 10·deps. Methodology change documented in honestGap as "not derived from empirical data".
ddd-1, ddd-2 — Classification keyword false-positives. Already acknowledged in DDD's honestGap.
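For mono-3, the two slopes can be compared directly. The formulas are the ones quoted above; the function names are hypothetical, and any clamping the real scorer applies is left out because it is not described here.

```typescript
// Shipped, sublinear slope: 100 - 10 * sqrt(deps).
function sublinearSlopeScore(deps: number): number {
  return 100 - 10 * Math.sqrt(deps);
}

// Linear alternative named in the finding: 100 - 10 * deps.
function linearSlopeScore(deps: number): number {
  return 100 - 10 * deps;
}
```

At deps = 4 the sublinear curve gives 80 against 60 for the linear one, and at deps = 9 it gives 70 against 10. The size of that gap is why the choice of slope is recorded as a methodology gap rather than a bug.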

When the next audit should run

This evidence will go stale if any of the following ship without a fresh audit:

A new methodology source is published (e.g. ISO 25010:2023 adds 'Safety' as the 9th characteristic, which triggers a fresh audit of iso-25010)
One of the 5 open findings above is implemented (the new behaviour needs an empirical-fixture row in the conformance table)
A consumer reports a result that doesn't match their reading of the methodology — the 'Should I trust this?' ticket is itself a signal that the handbook needs an update
12+ months elapsed since this audit even without any of the above (citation rot, primary-source URL drift)
The 4 citation-freshness items above (TLS expiry, HTTP 403) start to compound

None of these triggers has fired as of 2026-06-10. When one does, run docs/audit-prompt.md end-to-end and produce a new audit document; the diff against this one is the next audit's input.
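Of the triggers listed, only the elapsed-time one can be checked mechanically. A minimal sketch, assuming the audit date is stored as a UTC calendar date; isAuditStale is a hypothetical helper and not part of the package.

```typescript
// True once at least maxAgeMonths calendar months have passed since the audit date (UTC).
function isAuditStale(auditDate: Date, today: Date, maxAgeMonths: number = 12): boolean {
  // Date.UTC normalises month overflow, so month 5 + 12 rolls over into the next year.
  const due = new Date(
    Date.UTC(
      auditDate.getUTCFullYear(),
      auditDate.getUTCMonth() + maxAgeMonths,
      auditDate.getUTCDate(),
    ),
  );
  return today.getTime() >= due.getTime();
}
```

For the audit dated 2026-06-10, the check first returns true on 2027-06-10.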