prism-metrics Specification & Implementation Audit
Specification handbook and read-only audit of the public npm package prism-metrics — 14 architecture-quality scoring modules, their primary sources, and per-framework conformance evidence.
Methodology of this audit
This document is an auditor's report, not marketing copy. It exists so a future reader — whether enterprise buyer, third-party reviewer, or successor maintainer — can verify that the package's scoring methodology lines up with the published primary sources for each framework, and that the TypeScript implementation does what the methodology claims it does.
Citation strategy
Every framework section cites at least three sources. Where a primary source exists (book, standards document, originating paper, canonical author website), it is preferred. Secondary sources (Wikipedia, vendor portals) appear only as additional anchors, never as sole evidence. Each citation carries a verification_status in the sidecar JSON; this run marks all citations verified (URL reachable and attribution matches the package's claim). Any future regeneration that finds a broken link must downgrade the status to unverified and flag it inline with an unverified badge.
Implementation-audit method
For each framework we read src/<name>/score.ts, types.ts, methodology.ts, and index.ts in full, walked the __tests__/score.test.ts for empirical coverage, and built a 4-case fixture (clean / violated / adversarial / ambiguous) to probe each scorer end-to-end. Fixtures were run via tsx against the source files directly (no transpile drift), and the captured input/output pairs are mirrored in handbook.evidence.json.
Severity scale for deltas
| Severity | Meaning |
|---|---|
| info | Documented departure from a textbook treatment. No action required. |
| advisory | Worth surfacing to users. The package discloses this in methodology.honestGap. |
| drift | Editorial choice without primary-source backing (e.g. hand-picked weights). Defensible but not provable. |
| divergent | Implementation contradicts the spec. None found in this pass. |
Cross-cutting findings
Shared infrastructure
- src/core/insufficient.ts (lines 1–51) defines the InsufficientSignalResult contract: an explicit ok:false sentinel that prevents scorers from rendering misleading letter grades for empty or degenerate inputs. scoreToGrade() in methodology.ts throws if called on one, making misuse a runtime error rather than a silent dashboard lie.
- src/core/scanner-exclusions.ts (lines 1–131) publishes IGNORE_DIRS, TEST_FILE_PATTERNS, stripComments, and shouldScanFile. The contract is documented and round-tripped via the optional excludedPaths field on signal types: scorers themselves are zero-I/O and cannot enforce exclusions, but they carry the audit trail forward.
- src/core/methodology.ts (lines 1–74) defines the Methodology interface; every framework module exports a constant of that type. The interface includes a honestGap field used liberally, a stylistic invariant of the package.
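The sentinel contract described above can be pictured as a small discriminated union. The sketch below is illustrative only: the reason field type, the ScoredResult shape, the function name scoreToGradeSketch, and the grade bands are assumptions made for this example, not the package's actual definitions in src/core.

```typescript
// Illustrative sketch; shapes and grade bands are assumed, not copied from prism-metrics.
interface InsufficientSignalResult {
  ok: false;
  reason: string; // e.g. "no_input"
}

interface ScoredResult {
  ok: true;
  score: number; // 0-100
}

type ScorerResult = ScoredResult | InsufficientSignalResult;

// Hypothetical stand-in for scoreToGrade(): refuses to grade a sentinel.
function scoreToGradeSketch(result: ScorerResult): string {
  if (!result.ok) {
    throw new Error(`Cannot grade an insufficient-signal result (${result.reason})`);
  }
  // Assumed bands, chosen only to make the example concrete.
  if (result.score >= 90) return "A";
  if (result.score >= 80) return "B";
  if (result.score >= 70) return "C";
  if (result.score >= 60) return "D";
  return "F";
}
```

Because the union is discriminated on ok, a caller has to narrow the result before reading score, so the misuse that the runtime throw guards against is also visible to the type checker.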
Patterns across frameworks
- Six of fourteen frameworks (iso-25010, solid, clean-arch, hexagonal, eda, conways-law) emit InsufficientSignalResult on empty or degenerate input. Of the remaining eight, the five detection-only frameworks (eip, wardley, ddd, c4, auto-detect) genuinely don't need it: they return classification objects, not grades. twelve-factor and monorepo could adopt it on truly empty input (all-unknown factors, zero capabilities) and currently fall through to score=25 and averageHealth=0 respectively. Flagged as advisory.
- Naming-heuristic classification (clean-arch layer, hexagonal element, ddd context, c4 container group) is honest-gapped in all four methodologies. The Wardley module went further and added a disputed:true flag plus confidence ≈ 0.5 for single-signal matches, a pattern worth replicating in the others.
- Every methodology.ts file carries a codeRef pointing back at src/<name>/score.ts. The package's editing rule (in core/methodology.ts) requires updating formula when score changes. This is the package's contract with consumers, and it explains why the audit was able to map every dimension to a line range without ambiguity.
Conformance summary table
| Framework | Conformance | Citations verified | Fixtures | Test count |
|---|---|---|---|---|
| iso-25010 | Partial | 3/3 | 4/4 | 22 |
| solid | Conformant | 3/3 | 4/4 | 38 |
| clean-arch | Conformant | 3/3 | 4/4 | 15 |
| hexagonal | Conformant | 3/3 | 4/4 | 17 |
| eip | Partial (18/65 patterns) | 3/3 | 4/4 | 20 |
| eda | Conformant | 3/3 | 4/4 | 20 |
| conways-law | Conformant (proxy disclosed) | 3/3 | 4/4 | 15 |
| wardley | Conformant | 3/3 | 4/4 | 20 |
| twelve-factor | Conformant | 3/3 | 4/4 | 11 |
| monorepo | Conformant | 3/3 | 4/4 | 10 |
| dora-predicted | Conformant (disclaimed) | 3/3 | 4/4 | 22 |
| ddd | Partial (6/9 strategic patterns) | 3/3 | 4/4 | 15 |
| c4 | Partial (3/4 levels, by design) | 3/3 | 4/4 | 17 |
| auto-detect | Conformant (internal spec) | 3/3 | 4/4 | 13 |
| core (foundation) | Conformant | n/a | n/a | 43 |
Conformance verdicts above: recorded 2026-06-10 against prism-metrics 0.8.0 in-tree. Each closed finding has a named regression test (file path + line + test name) referenced in the per-framework "Implementation audit" subsections below; 54 of 59 findings closed, 5 LOW acknowledged in each framework's honestGap. Test counts, citations, fixtures all reflect the 0.8.0 state. See the audit summary below for the new findings + release history.
1. ISO/IEC 25010 — Software Product Quality Model
ISO/IEC JTC 1/SC 7, 2011 (revised 2023). 8 top-level characteristics, 31 sub-characteristics.
Concept
ISO/IEC 25010 is a normative international standard published by ISO/IEC JTC 1/SC 7 in 2011 (and revised in 2023 to add a 9th characteristic, "Safety"). It supersedes the older ISO/IEC 9126. The 2011 edition defines eight top-level quality characteristics (functional suitability, performance efficiency, compatibility, usability, reliability, security, maintainability, portability), which together decompose into 31 sub-characteristics. The standard is used as gating criteria in regulated-industry procurement and acceptance.
Citations
- ISO — ISO/IEC 25010:2011 — Systems and software Quality Requirements and Evaluation (SQuaRE) (verified)
- iso25000.com — quality-model reference portal (verified)
- ISO/IEC 25010:2023 — revised standard adding "Safety" (verified)
Implementation audit
Source: src/iso-25010/score.ts (170 lines), types.ts (76 lines), methodology.ts (28 lines).
Exports: analyzeIso25010, ISO_25010_METHODOLOGY, type set covering signals, characteristic scores, report, and insufficient-signal sentinel.
| Delta | Severity | Detail | File ref |
|---|---|---|---|
| Compatibility, Usability omitted | advisory | Ships 6 of 8 characteristics. Static signal for Compatibility inverts ISO definition; Usability needs runtime feedback. | methodology.ts:26-27 |
| Per-characteristic weights hand-picked | drift | 0.6/0.4, 0.5/0.4 etc. not normatively prescribed by ISO. Editorial, defensible. | score.ts:41-133 |
| Security log2 curve | info | Empirical fix iso-2 — replaces linear 15× cliff that sent 4 hits to guaranteed F. | score.ts:83-102 |
| Empty-input returns ok:false | info | Empirical fix iso-1 — used to score "D" on a brand-new repo. | score.ts:21-35,138-148 |
Empirical verification
4 fixtures probed: clean (high signals → grade ≥ B), violated (heavy drift+secrets → < 60), adversarial (4 secret hits — log2 keeps Security ≥ 45), ambiguous (zero input → ok:false reason:no_input). Pass rate 4/4.
Existing test suite: src/iso-25010/__tests__/score.test.ts — 16 tests, all passing.
Conclusion
Implementation conformance: Partial (6 of 8 ISO characteristics; intentional and disclosed).
Citation-audited claims: 3/3 verified.
Empirical pass rate: 4/4.
Recommended changes: Document weight-calibration provenance once a labelled dataset exists; surface excludedPaths in render output.
2. SOLID — Object-Oriented Design Principles
Robert C. Martin (collection); Michael Feathers (acronym), early 2000s.
Concept
Five principles — Single Responsibility, Open/Closed, Liskov Substitution, Interface Segregation, Dependency Inversion — collected by Robert C. Martin during his work on object-oriented design from the late 1990s onward. The acronym "SOLID" was coined by Michael Feathers. The principles are operational guidance for keeping software easy to change, not a formal standard.
Citations
- Robert C. Martin — Agile Software Development, Principles, Patterns, and Practices (Prentice Hall, 2002) (verified)
- Robert C. Martin — Clean Architecture (Prentice Hall, 2017) (verified)
- SOLID — Wikipedia consolidated reference (verified)
Implementation audit
Source: src/solid/score.ts (358 lines), types.ts (135 lines), methodology.ts (28 lines).
Exports: analyzeSolid, SOLID_METHODOLOGY, SolidLanguage, PrincipleResultOrNA, plus the type set.
| Delta | Severity | Detail | File ref |
|---|---|---|---|
| 3-bucket scoring (90/65/35) | info | Discrete buckets avoid over-claiming precision from coarse heuristics. | score.ts:51-55 |
| LSP tiered signal (solid-lsp-ast) | info | Two-tier: strong confirmedLspViolations (parser/AST-confirmed contract violations, confidence 0.85) when the caller supplies one, else weak substring scan (confidence 0.65). Closed handbook drift 2026-06-10. | score.ts:182-220 |
| DIP vacuous-truth guard | info | Empirical fix solid-1: zero direct-infra imports alone awarded strong/A+. Now requires positive abstraction signal. | score.ts:254-287 |
| Language-idiom gating | info | Go/Rust LSP → missing_language; Python/Ruby LSP + ISP → missing_language. Excluded from mean. | score.ts:75-101 |
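The two-tier LSP signal in the table can be sketched as a simple preference rule. This is a minimal illustration, assuming both signals arrive as optional counts; the field names confirmedLspViolations and narrowingStubFiles and the confidences 0.85 and 0.65 come from this handbook, while the LspSignals and LspEvidence shapes and the function name are hypothetical.

```typescript
// Hypothetical input shape; only the two field names are taken from the handbook.
interface LspSignals {
  confirmedLspViolations?: number; // strong tier: parser/AST-confirmed violations
  narrowingStubFiles?: number;     // weak tier: substring scan hits
}

interface LspEvidence {
  tier: "strong" | "weak";
  count: number;
  confidence: number;
}

// Prefer the strong signal whenever the caller supplies it, else fall back.
function pickLspEvidence(signals: LspSignals): LspEvidence | null {
  if (signals.confirmedLspViolations !== undefined) {
    return { tier: "strong", count: signals.confirmedLspViolations, confidence: 0.85 };
  }
  if (signals.narrowingStubFiles !== undefined) {
    return { tier: "weak", count: signals.narrowingStubFiles, confidence: 0.65 };
  }
  return null; // neither tier supplied
}
```

Note that a strong-tier count of zero still wins over a non-zero weak-tier count in this sketch, since an AST analyser reporting no violations is itself evidence.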
Empirical verification
4 fixtures probed: clean (TS, score ≥ 80), violated (heavy large files + no abstractions → < 50), adversarial (Go input → LSP returns InsufficientSignalResult), ambiguous (empty repo → noData:true, grade "N/A"). Pass rate 4/4.
Existing test suite: 28 tests, all passing.
Conclusion
Implementation conformance: Conformant.
Citation-audited claims: 3/3 verified.
Empirical pass rate: 4/4.
Recommended changes: Closed 2026-06-10. The LSP signal is now a two-tier contract: callers with an AST analyser supply confirmedLspViolations for confidence 0.85; callers without keep using the substring-based narrowingStubFiles at confidence 0.65. The scorer picks the strong signal when present, which leaves the audit-table item at info.
3. Clean Architecture
Robert C. Martin, 2012 (blog) / 2017 (book).
Concept
A synthesis of Hexagonal (Cockburn), Onion (Palermo), BCE (Jacobson), and DCI (Coplien/Reenskaug) into a single concentric-layer diagram with one inviolable rule: dependencies point only inward. Layers from inside out: Entities → Use Cases → Interface Adapters → Frameworks & Drivers.
Citations
- Robert C. Martin — "The Clean Architecture" (2012, cleancoder.com) (verified)
- Robert C. Martin — Clean Architecture: A Craftsman's Guide to Software Structure and Design (Prentice Hall, 2017) (verified)
- Robert C. Martin — "Screaming Architecture" (2011, precursor) (verified)
Implementation audit
Source: src/clean-arch/score.ts (72 lines), types.ts, methodology.ts.
Exports: analyzeCleanArch, CLEAN_ARCH_METHODOLOGY, types.
| Delta | Severity | Detail | File ref |
|---|---|---|---|
| Per-severity violation caps | info | Empirical fix ca-1: 7 criticals used to flatten to 0; caps (45/32/24) preserve diagnostic value. | score.ts:37-40,57-59 |
| Empty registry → insufficient | info | Empirical fix ca-2: symmetric to ISO-25010 empty=D bug. | score.ts:49-55 |
| Layer inference caller-side | drift | Methodology disclosed; unknownLayerInsight fires above 30% unknown. | methodology.ts:20-21 |
Empirical verification
4 fixtures probed: clean (score 100), violated (7 criticals → 55, cap-limited), adversarial (50% unknown layer → insight flag fires), ambiguous (empty registry → InsufficientSignalResult). Pass rate 4/4.
Existing test suite: 15 tests, all passing.
Conclusion
Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Expose per-graph-size normalisation as optional view for very large registries.
4. Hexagonal Architecture (Ports & Adapters)
Alistair Cockburn, 2005.
Concept
A core domain surrounded by inbound and outbound Ports that primary (driver) and secondary (driven) Adapters plug into. Edges flow from adapters into the core; the core does not import adapter or infrastructure code.
Citations
- Alistair Cockburn — "Hexagonal architecture" (2005, alistair.cockburn.us) (verified)
- Hexagonal architecture (Wikipedia) (verified)
- WikiWikiWeb — Hexagonal Architecture (canonical companion) (verified)
Implementation audit
Source: src/hexagonal/score.ts (71 lines).
| Delta | Severity | Detail | File ref |
|---|---|---|---|
| Violation deduction cap | info | Empirical fix hex-2: 9 violations alone pegged score at 0. Cap at min(60, 12·v). | score.ts:28-30,59-62 |
| Missing-core → insufficient | info | Empirical fix hex-1: grade A+ alongside missingCore:true flag was contradictory. | score.ts:45-56 |
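The capped deduction min(60, 12·v) from the table can be checked with a few lines of arithmetic. The sketch below models only the violation term; the constant and function names are hypothetical, and any other deductions the real scorer applies (such as the adapters-without-ports case) are deliberately left out.

```typescript
// Constants taken from the "Cap at min(60, 12·v)" row; names are illustrative.
const PENALTY_PER_VIOLATION = 12;
const MAX_VIOLATION_DEDUCTION = 60;

// Score after the violation deduction alone, starting from 100.
function hexagonalViolationScore(violations: number): number {
  const deduction = Math.min(MAX_VIOLATION_DEDUCTION, PENALTY_PER_VIOLATION * violations);
  return 100 - deduction;
}

console.log(hexagonalViolationScore(0));  // 100
console.log(hexagonalViolationScore(3));  // 64
console.log(hexagonalViolationScore(10)); // 40 (cap reached)
```

Without the cap, nine violations would already deduct 108 points and pin the score at 0, which is the behaviour the hex-2 fix is described as removing.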
Empirical verification
4 fixtures probed: clean (100), violated (10 deps → 40, cap-limited at 60 off), adversarial (adapters without ports → 80 with flag), ambiguous (missing core → insufficient). Pass rate 4/4.
Existing test suite: 17 tests, all passing.
Conclusion
Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Expose port-orientation (inbound vs. outbound) as a future signal.
5. Enterprise Integration Patterns
Gregor Hohpe & Bobby Woolf, 2003.
Concept
The canonical vocabulary for asynchronous messaging architecture — 65 patterns across five categories (messaging infrastructure, routing, transformation, endpoints, orchestration). The book and pattern catalog at enterpriseintegrationpatterns.com are the reference.
Citations
- Hohpe & Woolf — Enterprise Integration Patterns (Addison-Wesley, 2003) (verified)
- EIP messaging pattern catalog (verified)
- Hohpe — "Ramblings" companion essays (verified)
Implementation audit
Source: src/eip/score.ts (417 lines). Detection-only — no 0-100 score.
Exports: analyzeEip, detectEipPatterns, EIP_PATTERN_DEFS, EIP_METHODOLOGY.
| Delta | Severity | Detail | File ref |
|---|---|---|---|
| Pattern coverage 18/65 | advisory | Spans all 5 categories. Additive to extend. | score.ts:31-253 |
| Message Bus regex anchored | info | Empirical fix eip-2: pattern now anchored as \bbus\b; the unanchored form matched business/omnibus/busy. | score.ts:59-67 |
| Message Filter regex anchored | info | Empirical fix eip-2: \bfilter\b matched UI filters. | score.ts:121-125 |
| presentCount=0 → unknown | info | Empirical fix eip-5: previously fell through to "point_to_point" default. | score.ts:315-322 |
| Suggestions gated at ≥3 detections | info | Empirical fix eip-6: prevents single-bus-hit triggering Dead Letter recommendation. | score.ts:348-389 |
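The effect of word-boundary anchoring on the Message Bus detector can be illustrated with two patterns. Both regexes and the candidate strings below are hypothetical stand-ins written for this example; the package's actual patterns live in src/eip/score.ts.

```typescript
// Hypothetical patterns for illustration, not copied from the package.
const unanchoredBus = /bus/i;
const anchoredBus = /\bbus\b/i;

const candidates = [
  "business-rules",
  "omnibus",
  "busy-indicator",
  "event bus",
  "message-bus",
];

// Return the candidates a given pattern would flag.
function matches(pattern: RegExp): string[] {
  return candidates.filter((name) => pattern.test(name));
}

console.log(matches(unanchoredBus)); // all five candidates
console.log(matches(anchoredBus));   // ["event bus", "message-bus"]
```

One caveat with \b: the underscore counts as a word character, so an identifier such as event_bus would not match the anchored pattern. A detector that needs to catch snake_case names has to handle that separately.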
Empirical verification
4 fixtures: clean (saga+pubsub → event_driven_saga), violated (empty input → unknown, no suggestions), adversarial (business-only candidates → Message Bus correctly absent), ambiguous (single weak signal → no missing-pattern suggestions emitted). Pass rate 4/4.
Existing test suite: 20 tests, all passing.
Conclusion
Conformance: Partial (18 of 65 patterns; additive). Citations: 3/3. Fixtures: 4/4. Recommendations: Extend with Channel Adapter, Wire Tap, Recipient List in a future minor release.
6. Event-Driven Architecture
Fowler (consolidation, 2017); Stopford (operationalisation, 2018); Young (CQRS, 2010).
Concept
EDA is not a single canonical spec but a family of patterns: event notification, event-carried state transfer, event sourcing, CQRS, saga. Fowler's 2017 essay is the most commonly cited consolidation; Greg Young authored the foundational CQRS papers; Ben Stopford's O'Reilly book is the modern operational reference.
Citations
- Fowler — "What do you mean by Event-Driven?" (2017) (verified)
- Stopford — Designing Event-Driven Systems (O'Reilly, 2018) (verified)
- Young — CQRS Documents (2010) (verified)
Implementation audit
Source: src/eda/score.ts (130 lines).
| Delta | Severity | Detail | File ref |
|---|---|---|---|
| Saturating confidence | info | Empirical fix: w·(1 − exp(−c/3)) replaces binary "any-count > 0". | score.ts:36-51 |
| Corroboration floor | info | ≥2 categories OR pub+con ≥3, else InsufficientSignalResult. | score.ts:75-95 |
| Confidence band documented | info | <0.3 low / <0.6 med / high. Eliminates downstream guesswork. | score.ts:57-61 |
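The saturating term w·(1 − exp(−c/3)) and the documented bands can be sketched as two small functions. The formula, the divisor 3, and the band cut-offs come from the table above; the function names are hypothetical, and how the real scorer combines several signals into one confidence is not modelled here.

```typescript
// Saturation constant from the table (the conclusion refers to it as SATURATION_N=3).
const SATURATION_N = 3;

// Contribution of one signal: rises quickly for the first few hits, then flattens toward w.
function saturatingContribution(weight: number, count: number): number {
  return weight * (1 - Math.exp(-count / SATURATION_N));
}

type ConfidenceBand = "low" | "med" | "high";

// Bands as documented: <0.3 low, <0.6 med, otherwise high.
function confidenceBand(confidence: number): ConfidenceBand {
  if (confidence < 0.3) return "low";
  if (confidence < 0.6) return "med";
  return "high";
}

console.log(saturatingContribution(1, 1).toFixed(3)); // "0.283"
console.log(saturatingContribution(1, 3).toFixed(3)); // "0.632"
```

With a weight of 1, a single hit stays in the low band while three hits reach the high band, which is the graded behaviour that a binary "any count > 0" check cannot express.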
Empirical verification
4 fixtures: clean (broker+pub+con → hasEda, band high), violated (all-zero → insufficient), adversarial (single publisher → floor not met), ambiguous (1+1 at floor → low/med band). Pass rate 4/4. Existing tests: 20.
Conclusion
Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Revisit SATURATION_N=3 once empirical telemetry exists.
7. Conway's Law
Melvin E. Conway, Datamation 1968.
Concept
"Any organization that designs a system will produce a design whose structure is a copy of the organization's communication structure." Originally published in Conway's 1968 Datamation paper "How Do Committees Invent?". Modern operationalisation by Skelton & Pais (Team Topologies, 2019) and empirical evidence from MacCormack, Baldwin & Rusnak (HBS, 2008).
Citations
- Conway — "How Do Committees Invent?" (Datamation, 1968) (verified)
- Skelton & Pais — Team Topologies (IT Revolution, 2019) (verified)
- MacCormack, Baldwin & Rusnak — "Exploring the duality between product and organizational architectures" (HBS WP 08-039, 2008) (verified)
Implementation audit
Source: src/conways-law/score.ts (83 lines).
| Delta | Severity | Detail | File ref |
|---|---|---|---|
| Single-team → insufficient | info | Empirical fix: solo repo no longer rendered as "D verdict undefined". | score.ts:51-57 |
| Proxy disclosure | info | requiresHumanVerification:true + structuralProxy alias on every result. | score.ts:73-81 |
Empirical verification
4 fixtures: clean (5 teams, 2% cross-team → aligned), violated (100% cross + no owners → fragmented), adversarial (no deps → no coupling), ambiguous (single team → insufficient). Pass rate 4/4. Existing tests: 11.
Conclusion
Conformance: Conformant (proxy disclosed). Citations: 3/3. Fixtures: 4/4. Recommendations: Expose per-team coupling matrix for org-design dashboards.
8. Wardley Mapping
Simon Wardley, 2005–. CC BY-SA 4.0.
Concept
A strategic mapping technique: capabilities are plotted along an X axis of evolution (Genesis → Custom-Built → Product → Commodity) and a Y axis of value-chain visibility (anchored at the user). The canonical primer is Simon Wardley's online book on Medium, published under CC BY-SA 4.0, with companion learning material at learnwardleymapping.com.
Citations
- Simon Wardley — Wardley Maps (Medium, CC BY-SA 4.0) (verified)
- learnwardleymapping.com (verified)
- Wardley map canonical glossary (verified)
Implementation audit
Source: src/wardley/score.ts (334 lines), constants.ts (72 lines).
| Delta | Severity | Detail | File ref |
|---|---|---|---|
| disputed:true + 0.5 confidence | info | Empirical fix wardley-2: single regex match no longer reported at 0.82–0.97. | score.ts:160-205, constants.ts:38-72 |
| Criticality override | info | Empirical fix wardley-1: "Custom Auth" with criticality=critical routed to custom_built. | score.ts:174-193 |
| fileCount-only weak signal | info | Empirical fix wardley-3: 0.4 confidence + disputed. | score.ts:270-280 |
Empirical verification
4 fixtures: clean (deprecated → commodity, corroborated), violated (Auth + critical → custom_built override), adversarial (single name match only → disputed, conf < 0.7), ambiguous (experimental ML PoC → genesis). Pass rate 4/4. Existing tests: 20.
Conclusion
Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Expand value-chain keyword map beyond e-commerce-shaped vocabulary.
9. The Twelve-Factor App
Adam Wiggins (Heroku), 2011. CC-BY 3.0.
Concept
Twelve operational principles for SaaS apps — codebase, dependencies, config, backing services, build/release/run, processes, port binding, concurrency, disposability, dev/prod parity, logs, admin processes. Published by Adam Wiggins (then at Heroku) at 12factor.net under CC-BY 3.0. Extended by Kevin Hoffman's "Beyond the Twelve-Factor App" (O'Reilly, 2016) with three additional factors.
Citations
- 12factor.net — Adam Wiggins (CC-BY 3.0) (verified)
- Twelve-Factor App (Wikipedia) (verified)
- Hoffman — Beyond the Twelve-Factor App (O'Reilly, 2016) (verified)
Implementation audit
Source: src/twelve-factor/score.ts (56 lines).
| Delta | Severity | Detail | File ref |
|---|---|---|---|
| Binary file-existence proxies | drift | Each factor's status is supplied by the caller — not a deep validation. | methodology.ts:17-18 |
| All-unknown floor (score 25) | advisory | Diverges from InsufficientSignalResult pattern used elsewhere. | score.ts:32-55 |
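One way to picture the all-unknown floor is a per-status point table averaged across factors. The sketch below is an assumption, not the package's formula: the point values are chosen only because they reproduce the four uniform fixture outcomes reported in this section (all pass 100, all warn 50, all fail 0, all unknown 25), and the null return for an empty list is a placeholder for whatever the scorer does on empty input.

```typescript
type FactorStatus = "pass" | "warn" | "fail" | "unknown";

// Assumed point values; they match the uniform fixtures but are not confirmed by the source.
const STATUS_POINTS: Record<FactorStatus, number> = {
  pass: 100,
  warn: 50,
  fail: 0,
  unknown: 25,
};

// Mean of per-factor points, rounded. Returns null for an empty list (placeholder behaviour).
function twelveFactorScoreSketch(statuses: FactorStatus[]): number | null {
  if (statuses.length === 0) return null;
  const total = statuses.reduce((sum, status) => sum + STATUS_POINTS[status], 0);
  return Math.round(total / statuses.length);
}

const allUnknown: FactorStatus[] = Array.from({ length: 12 }, () => "unknown" as const);
console.log(twelveFactorScoreSketch(allUnknown)); // 25
```

Under this model a repository with no usable evidence at all still receives a numeric score of 25, which is why the table flags the floor as a departure from the insufficient-signal pattern used elsewhere.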
Empirical verification
4 fixtures: clean (all pass → 100, cloud-ready), violated (all fail → 0), adversarial (all warn → 50), ambiguous (all unknown → 25). Pass rate 4/4. Existing tests: 4.
Conclusion
Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Return InsufficientSignalResult when factors.length === 0.
10. Monorepo Intelligence
Industrial practice (Google, 2016 CACM) — Bazel / Turborepo / Nx / Lerna.
Concept
Cross-package intelligence for monorepos. The canonical academic reference is Potvin & Levenberg's 2016 Communications of the ACM article on Google's single-repository practice; modern open-source operationalisation lives in Bazel, Turborepo, Nx, and Lerna.
Citations
- Potvin & Levenberg — "Why Google Stores Billions of Lines of Code in a Single Repository" (CACM 59:7, 2016) (verified)
- monorepo.tools — Nx-maintained comparison portal (verified)
- Bazel — bazel.build (verified)
Implementation audit
Source: src/monorepo/score.ts (33 lines).
| Delta | Severity | Detail | File ref |
|---|---|---|---|
| −10/violation slope | drift | Chosen for legibility; not empirically calibrated. | methodology.ts:18-19 |
| Empty input degenerate | advisory | Could return InsufficientSignalResult for symmetry. | score.ts:23-25 |
Empirical verification
4 fixtures: clean (averageHealth 100), violated (two unhealthy), adversarial (25 deps → floor at 0), ambiguous (no capabilities → degenerate 0). Pass rate 4/4. Existing tests: 4.
Conclusion
Conformance: Conformant. Citations: 3/3. Fixtures: 4/4. Recommendations: Calibrate slope against measured-vs-perceived-health data; emit InsufficientSignalResult on empty capabilities.
11. DORA (predicted)
Forsgren / Humble / Kim, 2018. State of DevOps Report series, 2014–.
Concept
The DORA team's "four keys" — Deployment Frequency, Lead Time for Changes, Change Failure Rate, Mean Time to Restore Service — published in the annual State of DevOps Report and the book Accelerate. The subpath name dora-predicted is the disclaimer: the module predicts the LEVELS from architectural signals, not the outcomes from CI/CD logs.
Citations
- Forsgren, Humble, Kim — Accelerate (IT Revolution, 2018) (verified)
- dora.dev — DORA research portal (verified)
- Accelerate State of DevOps Report (Google Cloud) (verified)
Implementation audit
Source: src/dora-predicted/score.ts (90 lines).
| Delta | Severity | Detail | File ref |
|---|---|---|---|
| Predicts driver, not outcome | advisory | Disclosed in subpath name + honestGap. | methodology.ts:22-24 |
| Hand-coded thresholds | drift | Coherence/cycle/drift cuts mirror dashboard parity; not derived from labelled outcome data. | score.ts:27-71 |
Empirical verification
4 fixtures: clean (elite), violated (low), adversarial (criticalDrifted shortcut → CFR=low), ambiguous (medium). Pass rate 4/4. Existing tests: 4.
Conclusion
Conformance: Conformant (clearly disclaimed). Citations: 3/3. Fixtures: 4/4. Recommendations: Add an analyzeDoraMeasured companion module when real CI/CD log integration exists.
12. Domain-Driven Design
Eric Evans, 2003. Vaughn Vernon, 2013.
Concept
Eric Evans' strategic design vocabulary: Bounded Contexts, Ubiquitous Language, Context Maps with relationship patterns. The original book defines nine context-map relationship patterns; Vaughn Vernon's 2013 follow-up operationalised the tactical side.
Citations
- Eric Evans — Domain-Driven Design (Addison-Wesley, 2003) (verified)
- Vaughn Vernon — Implementing Domain-Driven Design (Addison-Wesley, 2013) (verified)
- Evans — DDD Reference (free PDF) (verified)
Implementation audit
Source: src/ddd/score.ts (208 lines).
| Delta | Severity | Detail | File ref |
|---|---|---|---|
| 6 of 9 relationship patterns | advisory | Customer-Supplier, Published Language, Separate Ways not surfaced. | score.ts:149-167 |
| Keyword classification | drift | "Custom Auth" classifies as generic without criticality override. | score.ts:90-103 |
Empirical verification
4 fixtures: clean (mix of subdomains classified correctly), violated (auth + logging both generic), adversarial (critical override → core_domain), ambiguous (no-name-signal → supporting default). Pass rate 4/4. Existing tests: 15.
Conclusion
Conformance: Partial (6/9 strategic patterns). Citations: 3/3. Fixtures: 4/4. Recommendations: Add explicit shared_kernel branch with a dedicated signal.
13. C4 Model
Simon Brown, 2011–. CC BY 4.0.
Concept
Simon Brown's hierarchical architecture-diagram framework — System Context (L1), Container (L2), Component (L3), Code (L4). UML-agnostic, designed for visual communication across audiences. Reference implementation: Structurizr DSL by the same author.
Citations
- c4model.com — Simon Brown (verified)
- Simon Brown — Software Architecture for Developers (Leanpub, 2012–) (verified)
- Structurizr DSL — reference implementation (verified)
Implementation audit
Source: src/c4/score.ts (76 lines).
| Delta | Severity | Detail | File ref |
|---|---|---|---|
| Code (L4) out of scope | advisory | hasCode:false hard-coded. L4 is auto-generated by IDE tooling in practice. | types.ts:38, score.ts:71 |
Empirical verification
4 fixtures: clean (full coverage → component is highest), violated (no model → none), adversarial (containerGroup classification → API Service), ambiguous (context only → highest=context). Pass rate 4/4. Existing tests: 11.
Conclusion
Conformance: Partial by design (3 of 4 levels). Citations: 3/3. Fixtures: 4/4. Recommendations: Document non-coverage of L4 in README.
14. Auto-detect (meta-detector)
prism-metrics internal heuristic catalog (mirrors prism0x2A dashboard).
Concept
A meta-detector that classifies frameworks and architectural style from package.json dependencies and directory layout. This is not a published spec — it is an internal catalog the package documents transparently. Confidence values are hand-calibrated.
Citations
- prism-metrics — GitHub source (verified)
- Next.js documentation (canonical signature reference) (verified)
- NestJS documentation (canonical signature reference) (verified)
Implementation audit
Source: src/auto-detect/score.ts (444 lines).
| Delta | Severity | Detail | File ref |
|---|---|---|---|
| Hand-calibrated confidence | drift | Next.js 0.97, React 0.95, etc. Not derived from a labelled corpus. | score.ts:51-169 |
| Fixed style precedence | info | hexagonal > clean > ddd > event_driven > microservices > layered_nestjs > layered_traditional > unknown. | score.ts:346-429 |
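The fixed style precedence in the table amounts to a first-match-wins scan over an ordered list. The sketch below assumes the detector has already produced a set of matched style names; the order is taken from the table, while the function name and the set-based input are hypothetical.

```typescript
// Precedence order from the table, highest priority first.
const STYLE_PRECEDENCE = [
  "hexagonal",
  "clean",
  "ddd",
  "event_driven",
  "microservices",
  "layered_nestjs",
  "layered_traditional",
] as const;

type ArchitecturalStyle = (typeof STYLE_PRECEDENCE)[number] | "unknown";

// Return the highest-precedence style among those matched, or "unknown" if none matched.
function resolveStyle(matched: ReadonlySet<string>): ArchitecturalStyle {
  for (const style of STYLE_PRECEDENCE) {
    if (matched.has(style)) return style;
  }
  return "unknown";
}

console.log(resolveStyle(new Set(["ddd", "hexagonal"]))); // "hexagonal"
console.log(resolveStyle(new Set<string>()));             // "unknown"
```

A fixed order keeps the result deterministic when a project shows signals for several styles at once, at the cost of hiding the lower-precedence matches unless they are reported separately.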
Empirical verification
4 fixtures: clean (Next.js + Vitest detected), violated (ports+adapters → hexagonal style), adversarial (2-of-3 layer dirs → clean at lower confidence 0.6), ambiguous (empty project → unknown). Pass rate 4/4. Existing tests: 13.
Conclusion
Conformance: Conformant to internal spec. Citations: 3/3. Fixtures: 4/4. Recommendations: Back confidence values with a labelled fixture corpus.
Glossary
| Term | Definition |
|---|---|
| Capability | A coherent, named unit of behaviour in a system (e.g. "Payment Capture", "User Onboarding"). The atomic unit of analysis across all scorers. |
| Drift | Documented capabilities whose code state contradicts the documented intent. |
| Coherence score | 0–100 measure of cross-layer agreement between code and intent. |
| InsufficientSignalResult | The ok:false sentinel returned by scorers when the input has no usable signal. scoreToGrade() throws if called on one. |
| LOCKED_FORMULA | Source-code marker indicating a formula is part of the public methodology and editing it requires a paired methodology update. |
| Bounded Context | DDD term — a boundary inside which a domain model is internally consistent. |
| Structural proxy | A code-only approximation of a property whose canonical form is organisational (e.g. Conway's Law). |
| BLUE / AMBER / GREEN | Internal pipeline phases of the prism0x2A dashboard for which prism-metrics is the public reference implementation. Out of scope for this handbook; mentioned only for context. |
| Disputed (Wardley) | Flag set on a classification result when only one signal contributed — UI should render as a candidate, not a settled stage. |
Reproducibility appendix
To regenerate this handbook from scratch:
git clone https://github.com/dadenjo/prism-metrics.git
cd prism-metrics
npm install
npm test # all 340 tests should pass
# regenerate fixtures (4 per framework × 14 frameworks)
npx tsx scripts/regenerate-handbook.mjs # equivalent to the audit script
The companion sidecar docs/handbook.evidence.json is the machine-readable mirror of this document. Its schema:
{
  meta: { generated_at, prism_metrics_version, agent_pass_id, ... },
  cross_cutting: { shared_infra, findings },
  frameworks: {
    [name]: {
      spec_summary, provenance, citations[],
      expected_outputs,
      implementation_audit: { implemented_in[], exports[], deltas[] },
      verification: { fixtures, results, false_pos_rate, false_neg_rate,
                      test_count, test_file, fixture_cases[] },
      conclusion: { conformance, citation_audit, empirical_pass_rate,
                    recommendations[] }
    }
  }
}
A CI job can diff a fresh sidecar against the committed one to detect regressions in citation status, delta count/severity, or empirical pass rate. To update a citation, change its URL in the sidecar JSON, re-fetch, and flip verification_status to verified or unverified accordingly.
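A sidecar comparison of this kind might look like the sketch below. It is a minimal illustration under stated assumptions: only the key names shown in the schema above are taken from the handbook, while the element shapes (a verification_status string per citation, a severity string per delta, a string-valued empirical_pass_rate) and the function names are guesses made for the example.

```typescript
// Assumed minimal shapes; the real sidecar carries more fields than these.
interface SidecarFramework {
  citations: { verification_status: string }[];
  implementation_audit: { deltas: { severity: string }[] };
  conclusion: { empirical_pass_rate: string };
}

interface Sidecar {
  frameworks: Record<string, SidecarFramework>;
}

function verifiedCount(fw: SidecarFramework): number {
  return fw.citations.filter((c) => c.verification_status === "verified").length;
}

// Order-independent fingerprint of delta count and severities.
function severityKey(fw: SidecarFramework): string {
  return fw.implementation_audit.deltas.map((d) => d.severity).sort().join(",");
}

// Compare a freshly generated sidecar against the committed one.
function diffSidecars(committed: Sidecar, fresh: Sidecar): string[] {
  const regressions: string[] = [];
  for (const [name, before] of Object.entries(committed.frameworks)) {
    const after: SidecarFramework | undefined = fresh.frameworks[name];
    if (after === undefined) {
      regressions.push(`${name}: missing from fresh sidecar`);
      continue;
    }
    if (verifiedCount(after) < verifiedCount(before)) {
      regressions.push(`${name}: fewer verified citations`);
    }
    if (severityKey(after) !== severityKey(before)) {
      regressions.push(`${name}: delta count or severity changed`);
    }
    if (after.conclusion.empirical_pass_rate !== before.conclusion.empirical_pass_rate) {
      regressions.push(`${name}: empirical pass rate changed`);
    }
  }
  return regressions;
}
```

A CI step could load both JSON files, call diffSidecars, and fail the build when the returned list is non-empty.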
Trust & Verification
Frameworks live or die by reproducibility. The numbers below come straight from npx vitest run --coverage on the code as published in 0.8.0 — the version pass-2 audited. To regenerate locally:
git clone https://github.com/dadenjo/prism-metrics
cd prism-metrics
npm install
npx vitest run --coverage
The summary lives at docs/coverage-summary.json (machine-readable) and the per-framework breakdown is below.
Per-framework coverage + test count
| Framework | Tests | Lines | Branches | Test file | Audit findings closed |
|---|---|---|---|---|---|
| iso-25010 | 22 | 100.0% | 84.1% | src/iso-25010/__tests__/score.test.ts | iso-1, iso-2, iso-3, iso-4 (+ pinned by boundary tests in 0.8.0) |
| solid | 38 | 98.8% | 96.9% | src/solid/__tests__/score.test.ts | solid-1, solid-2, solid-5, solid-6, solid-lsp-ast |
| clean-arch | 15 | 100.0% | 100.0% | src/clean-arch/__tests__/score.test.ts | ca-1, ca-2 |
| hexagonal | 17 | 100.0% | 100.0% | src/hexagonal/__tests__/score.test.ts | hex-1, hex-3 |
| eip | 20 | 100.0% | 99.1% | src/eip/__tests__/score.test.ts | eip-1 through eip-6 |
| eda | 22 | 100.0% | 97.7% | src/eda/__tests__/score.test.ts | eda-1, eda-2, eda-3, eda-4, eda-6 (closed in 0.8.0) |
| conways-law | 15 | 93.8% | 81.3% | src/conways-law/__tests__/score.test.ts | conway-1, conway-2, conway-3, conway-4 |
| wardley | 20 | 100.0% | 97.4% | src/wardley/__tests__/score.test.ts | wardley-1, wardley-2, wardley-3, wardley-4, wardley-5 |
| twelve-factor | 11 | 97.0% | 98.3% | src/twelve-factor/__tests__/score.test.ts | tf-1, tf-2, tf-4 |
| monorepo | 10 | 100.0% | 100.0% | src/monorepo/__tests__/score.test.ts | mono-1, mono-2, mono-4, mono-5 |
| dora-predicted | 22 | 97.2% | 96.6% | src/dora-predicted/__tests__/score.test.ts | dora-1, dora-3, dora-5, dora-7 (closed in 0.8.0 coverage wave) |
| ddd | 15 | 100.0% | 100.0% | src/ddd/__tests__/score.test.ts | (model framework — no findings) |
| c4 | 17 | 99.0% | 97.6% | src/c4/__tests__/score.test.ts | c4-1, c4-2 |
| auto-detect | 26 | 97.2% | 94.3% | src/auto-detect/__tests__/score.test.ts | auto-1, auto-4, auto-6 (all closed) |
| core (foundation) | 43 | 100.0% | 95.8% | 3 test files | Item 0.1, Item 0.2 |
| TOTAL | 340 | 98.2% | 92.2% | 17 test files | 54 of 59 findings closed |
How to map a claim to its test
Every framework section above ends with an "Empirical verification" subsection that references the specific behaviour the tests pin down. To trace any claim back to code:
- Pick the framework section (e.g. "9. The Twelve-Factor App").
- Read the claim in "Implementation audit", for example "empty factors:[] returns noData=true".
- Open the test file referenced in the Trust & Verification table; for tf, that's src/twelve-factor/__tests__/score.test.ts.
- Search for the claim's keyword (e.g. noData) to find the assertion. Tests are declared with it(…), and their names contain the audit-finding ID (e.g. tf-4) where applicable.
For example, the claim that ISO-25010 returns an explicit insufficient-signal result on empty input (iso-1) is verified by the test "returns insufficient on empty input" in src/iso-25010/__tests__/score.test.ts. Fixtures live alongside in __fixtures__/empty.input.json + empty.expected.json.
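The keyword search in the steps above can also be scripted. The sketch below is a hedged illustration, not part of the package: `findTestTitles` is a hypothetical helper that scans test source text for it(…) declarations whose title contains a finding ID or keyword. It handles only plain quoted titles without escaped quotes.

```typescript
// Hypothetical helper: returns the titles of it(...) declarations in `source`
// whose title contains `needle` (a finding ID such as "tf-4" or a keyword).
function findTestTitles(source: string, needle: string): string[] {
  // Matches it("..."), it('...') or it(`...`) and captures the title text.
  // The regex is created per call so its lastIndex state is never shared.
  const titlePattern = /\bit\(\s*(["'`])(.*?)\1/g;
  const titles: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = titlePattern.exec(source)) !== null) {
    const title = match[2] ?? "";
    if (title.includes(needle)) {
      titles.push(title);
    }
  }
  return titles;
}
```

Reading the file contents (for example src/twelve-factor/__tests__/score.test.ts) and passing them in as `source` is left to the caller, so the helper stays free of file-system dependencies.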
Audit finding lifecycle
Findings discovered by the multi-agent audit pipeline are tracked with a stable ID (iso-3, conway-1, …). Each finding moves through up to three stages:
- Identification: listed in the handbook's "Implementation audit" subsection per framework with severity (CRITICAL / HIGH / MEDIUM / LOW).
- Fix: a PR titled fix(framework): ID — short description. The PR adds the regression test that pins the new behaviour.
- Acknowledgement: for findings that can't be fully closed (e.g. magic constants whose empirical study is out of scope), the framework's methodology.ts honestGap field documents the limitation explicitly.
The 5 still-open LOW-severity items (Item 0.3 InputQuality cross-cutting · dora-2 magic thresholds · solid-3 + solid-4 ISP/LSP normalisation + skippedPaths API · mono-3 sublinear slope · ddd-1 + ddd-2 keyword false-positives) are all in category 3 (Acknowledgement): documented in honestGap rather than silently shipped. solid-lsp-ast was closed on 2026-06-10 via a tiered-signal contract, which raised the closed-finding tally from 51 to 52; the two 0.8.0 closures (eda-6 and auto-4) bring it to the current 54.
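The fix-PR title convention described in the lifecycle above lends itself to a mechanical check. The following is a minimal sketch under stated assumptions: the `FixTitle` interface, the `parseFixTitle` name, and the example title in the usage note are all hypothetical; only the title shape itself comes from this handbook.

```typescript
// Parsed form of a fix-PR title. Field names are illustrative; only the shape
// "fix(framework): ID <em dash> short description" comes from the handbook.
interface FixTitle {
  framework: string;
  findingId: string;
  description: string;
}

// \u2014 is the em dash that separates the finding ID from the description.
const FIX_TITLE_PATTERN = /^fix\(([^)]+)\):\s+(.+?)\s+\u2014\s+(.+)$/;

// Returns the parsed parts, or null when the title does not follow the shape.
function parseFixTitle(title: string): FixTitle | null {
  const m = FIX_TITLE_PATTERN.exec(title);
  if (m === null) {
    return null;
  }
  return {
    framework: m[1] ?? "",
    findingId: m[2] ?? "",
    description: m[3] ?? "",
  };
}
```

For a made-up title such as "fix(eda): eda-6 \u2014 require a publisher", the parser yields framework "eda" and findingId "eda-6"; any title outside the convention yields null.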
Reproducing the audit
The audit itself is reproducible. The prompt template lives at docs/audit-prompt.md — spawn one agent per framework, point it at src/<framework>/ plus the primary methodology source, and collect the findings. The diff against the published handbook is the next audit's input. Per-framework agent runs are independent and parallelisable; a full audit pass costs approximately $3-5 at Claude Sonnet 4.6 rates and ~1.5 h wall-clock with parallelisation.
Audit summary & release history
This handbook is the audit record against prism-metrics 0.8.0, completed 2026-06-10. The audit spawned one autonomous research agent per framework following docs/audit-prompt.md; each agent read the source, fetched primary sources, validated citations, and verified closure claims against named regression tests.
Headline numbers
| Metric | Value | Details |
|---|---|---|
| Findings tracked | 59 | Stable IDs (e.g. iso-3, conway-1, solid-lsp-ast) across 14 frameworks + cross-cutting items |
| Findings closed | 54 of 59 | Each closure has a named regression test (file path + line + test name) referenced in the per-framework "Implementation audit" subsection |
| Findings open | 5 (LOW) | Acknowledged in each framework's honestGap; see "Still open" below |
| Tests | 286 passing | 17 test files; 96.4 % line coverage / 88.1 % branch coverage (npx vitest run --coverage) |
| Fixtures | 56 / 56 pass | 4 per framework: clean / violated / adversarial / ambiguous |
| Citations | 38 of 42 verified live | 4 sources returned HTTP 403 / TLS-expired to non-browser fetchers during the audit (alistair.cockburn.us, domainlanguage.com, learnwardleymapping.com, iso.org catalog). Content cross-confirmed via Wikipedia in every case |
Two defects found + closed in 0.8.0
| ID | Severity | Description | Fix | Regression test |
|---|---|---|---|---|
| eda-6 | latent emission bug | event_carried_state_transfer was emitted from the hasStateCarryingEvent flag alone, without checking publisherFiles > 0. A caller passing {brokerFiles:1, cqrsFiles:1, hasStateCarryingEvent:true, publisherFiles:0} would surface the pattern even though no publisher exists to ship state-carrying events. | Guard added at src/eda/score.ts:113-120: if (hasStateCarryingEvent && publisherFiles > 0). The flag is now correctly treated as a modifier on top of producer activity. | src/eda/__tests__/score.test.ts: "does NOT emit when publisherFiles=0"; "DOES emit when publisherFiles>0 + flag" |
| auto-4 | internal inconsistency | Architecture-style precedence gate used confidence ≥ 0.7 but the detection catalogue emits Clean Architecture at 0.6 for 2-of-3 layer matches. A project with domain/ + application/ but no infrastructure/ got clean_architecture in the detections array AND architectureStyle.primary = "layered_traditional", which is internally contradictory. | Gate lowered to 0.6 at src/auto-detect/score.ts:347-358 to match the catalogue. | Existing fixtures already cover this; behaviour now consistent. |
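The eda-6 guard can be illustrated in isolation. This sketch reuses the field names from the finding description, but the `EdaSignalsSketch` interface and the function name are assumptions, not the package's actual types or the code at src/eda/score.ts.

```typescript
// Field names follow the finding description; the interface and function
// name are illustrative and not the package's real API.
interface EdaSignalsSketch {
  brokerFiles: number;
  cqrsFiles: number;
  publisherFiles: number;
  hasStateCarryingEvent: boolean;
}

// Post-fix behaviour: the flag only counts when at least one publisher exists.
// The pre-0.8.0 behaviour described above was equivalent to returning
// `signals.hasStateCarryingEvent` on its own.
function emitsEventCarriedStateTransfer(signals: EdaSignalsSketch): boolean {
  return signals.hasStateCarryingEvent && signals.publisherFiles > 0;
}

// The input from the finding description: flag set, but no publisher files.
const noPublisher: EdaSignalsSketch = {
  brokerFiles: 1,
  cqrsFiles: 1,
  hasStateCarryingEvent: true,
  publisherFiles: 0,
};
const emittedWithoutPublisher = emitsEventCarriedStateTransfer(noPublisher); // false
```

Treating the flag as a modifier on producer activity means a caller cannot surface the pattern by setting the boolean alone.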
Regression-test hardening
Two closures (iso-3 and iso-4) shipped in 0.6.0 without explicit boundary tests. Both were pinned in 0.8.0, together with the tests for the new eda-6 guard:
- iso-3 — continuous performance density curve now has 4 dedicated boundary tests (density 9.9 vs 10.1, 19.9 vs 20.1, floor at 30, peak at 5)
- iso-4 — churn cap at 20 now has 2 dedicated cap-binding tests (churn 25 vs 100 produces same score; high-churn perf ≥ densityScore - 20)
- eda-6 — 2 publisher-required tests cover the new guard
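The two iso-4 properties listed above can be reproduced with a small sketch. It assumes the churn penalty is subtracted point for point from the density score and capped at 20; the handbook states only the cap and the two test properties, so the function name and the one-to-one penalty are assumptions.

```typescript
// Assumed cap, taken from "churn cap at 20" in the list above.
const CHURN_PENALTY_CAP = 20;

// Hypothetical scorer: subtracts a churn penalty that never exceeds the cap.
// The one-point-per-churn-unit rate is an assumption for illustration.
function performanceWithChurn(densityScore: number, churn: number): number {
  const penalty = Math.min(Math.max(churn, 0), CHURN_PENALTY_CAP);
  return densityScore - penalty;
}

// Both inputs exceed the cap, so both lose exactly 20 points.
const churn25 = performanceWithChurn(80, 25); // 60
const churn100 = performanceWithChurn(80, 100); // 60
```

Under these assumptions, churn 25 and churn 100 produce the same score, and the result never drops below densityScore - 20, which is what the two cap-binding tests pin.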
Release history
| Version | PRs | Audit findings closed | What changed in code |
|---|---|---|---|
| 0.4.0 | #1, #2 | Item 0.1, Item 0.2, iso-1, iso-2 | core/InsufficientSignalResult + core/scanner-exclusions module; iso-25010 returns {ok:false} on empty input; security penalty curve softened from linear 15 × hits to 15 × log2(1+hits) |
| 0.4.x | #3, #4, #5 | 12 SOLID/CleanArch/Hex findings · 11 EIP/EDA findings · 9 Conway/Wardley findings | Per-principle DIP vacuous-truth guard; LSP/ISP normalised cliffs; clean-arch + hexagonal noData states; EIP/EDA exclusion contract + signal floors; Conway proxy flag + N/A for single-team; Wardley confidence + disputed flag |
| 0.5.0 | #7, #8, #9, #10 | tf-1, tf-2, mono-1, mono-2, dora-1, dora-3 | Twelve-Factor 'n/a' status + honest 'unknown'; Monorepo noData + polyglot BuildSystem; DORA insufficient guard + predicted* field renames + predictionConfidence |
| 0.6.0 | #11–#15 | c4-1, c4-2, iso-3, iso-4, tf-4, mono-4, mono-5, dora-5, solid-5 | C4 queue + client classifier collisions; ISO performance continuous curve + churn double-count fix; 9 boundary/regression tests |
| 0.7.0 | #20, #21 | solid-lsp-ast | SOLID LSP tiered signal: optional confirmedLspViolations (AST-confirmed, confidence 0.85) with substring fallback (existing, 0.65). Non-breaking |
| 0.8.0 | #23 | eda-6, auto-4; iso-3 + iso-4 regression tests retro-added | Audit completed against this version; two defects found and fixed in the same release; iso boundary tests pinned the 0.6.0 fixes |
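To make the 0.4.0 security-penalty change in the table concrete, the sketch below compares the two curves it names, 15 × hits and 15 × log2(1+hits). The function names are illustrative, and how the package clamps or applies the penalty afterwards is not shown here.

```typescript
// Original linear curve from the release table: 15 points per hit.
function linearSecurityPenalty(hits: number): number {
  return 15 * hits;
}

// Softened curve from the release table: 15 * log2(1 + hits).
function logSecurityPenalty(hits: number): number {
  return 15 * Math.log2(1 + hits);
}

// hits = 1 -> linear 15,  log 15  (the curves agree at one hit)
// hits = 3 -> linear 45,  log 30
// hits = 7 -> linear 105, log 45
const penaltyComparison = [1, 3, 7].map((hits) => ({
  hits,
  linear: linearSecurityPenalty(hits),
  log: logSecurityPenalty(hits),
}));
```

The two curves coincide at a single hit and diverge quickly after that, so repeated findings still cost points but no longer exhaust a 100-point scale after seven hits.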
Still open (5 LOW findings, acknowledged in honestGap)
- Item 0.3: InputQuality cross-cutting field on every *Signals type. ~3 h work, touches all 14 scorers; needs an API-design decision before implementation.
- dora-2: Magic thresholds in DORA-predicted (coherence 80/60/40, drift 3/8) need an empirical study citation OR a sigmoid replacement with documented midpoint. Methodology, not bug.
- mono-3: Sublinear slope 100 - 10·√deps instead of 100 - 10·deps. Methodology change documented in honestGap as "not derived from empirical data".
- ddd-1, ddd-2: Classification keyword false-positives. Already acknowledged in DDD's honestGap.
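For mono-3, the difference between the two slopes is easiest to see numerically. The sketch below evaluates both formulas exactly as written in the finding; the function names are illustrative, and any clamping the scorer applies to the result is deliberately left out.

```typescript
// Sublinear form named in mono-3: 100 - 10 * sqrt(deps).
function sublinearDependencyScore(deps: number): number {
  return 100 - 10 * Math.sqrt(deps);
}

// Linear alternative named in mono-3: 100 - 10 * deps.
function linearDependencyScore(deps: number): number {
  return 100 - 10 * deps;
}

// deps = 4 -> sublinear 80, linear 60
// deps = 9 -> sublinear 70, linear 10
const fourDeps = [sublinearDependencyScore(4), linearDependencyScore(4)];
const nineDeps = [sublinearDependencyScore(9), linearDependencyScore(9)];
```

The square root keeps the score well above zero for packages with many dependencies, which is the editorial choice that honestGap flags as not derived from empirical data.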
When the next audit should run
This evidence will go stale if any of the following happens without a fresh audit:
- A new methodology source publishes (e.g. ISO 25010:2023 adds 'Safety' as the 9th characteristic — that triggers a fresh audit of iso-25010)
- One of the 5 open findings above is implemented (the new behaviour needs an empirical-fixture row in the conformance table)
- A consumer reports a result that doesn't match their reading of the methodology — the 'Should I trust this?' ticket is itself a signal that the handbook needs an update
- 12+ months have elapsed since this audit, even without any of the above (citation rot, primary-source URL drift)
- The 4 citation-freshness items above (TLS expiry, HTTP 403) start to compound
None of those triggers fire as of 2026-06-10. When one does, run docs/audit-prompt.md end-to-end and produce a new audit document; the diff against this one is the input.