Changelog

This page mirrors the CHANGELOG.md in the repository.

All notable changes to this project will be documented here.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

0.8.6 - 2026-04-27

Added

  • occ describe [directories...] top-level CLI command and occ workspace describe [rootDir] nested command for fast directory classification — identifies whether a path is a code, office, documentation, data, or mixed workspace using manifest signals and file inventory, with confidence levels and nested project detection
  • src/workspace/describe.ts module exporting describeWorkspace() and supporting classification helpers
  • src/workspace/describe-output.ts tabular and JSON formatters for WorkspaceDescription
  • WorkspaceDescription type and classification confidence levels in src/workspace/types.ts
  • ./workspace/describe programmatic export in package.json for downstream consumers
  • test/workspace-describe.test.ts covering React/Vite classification, mixed-workspace detection, multi-category folders, and .gitignore respect

0.8.5 - 2026-04-23

Security

Fixed

  • extractOfficeText helper absorbs officeparser 6.0 → 6.1's undocumented breaking change (parseOffice() now returns a ParsedOffice object with toText() instead of a plain string). Without this shim the officeparser bump would have regressed PPTX/ODT/ODS/ODP text extraction across occ, occ doc inspect, occ slide inspect, occ table inspect, and the markdown converter
  • SECURITY.md supported-versions table refreshed — the old table still listed 0.5.x / 0.4.x while the current supported line is 0.8.x
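
The officeparser fix above comes down to normalizing two return shapes into one string. A minimal sketch of that idea follows, assuming only what the entry states (a plain string before 6.1, an object exposing toText() from 6.1 on). The names ParsedOfficeLike and toPlainText are hypothetical, and this is not the actual extractOfficeText source.

```typescript
// Hypothetical stand-in for the object shape described in the changelog entry.
interface ParsedOfficeLike {
  toText(): string;
}

// Accept either return shape and always hand callers a plain string.
function toPlainText(result: string | ParsedOfficeLike): string {
  return typeof result === "string" ? result : result.toText();
}
```

Callers that previously consumed the string directly can keep doing so as long as every parse result passes through a helper of this kind first.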

Changed

  • Transitive majors pulled in by officeparser@6.1.0: tesseract.js 6.0.1 → 7.0.0, pdfjs-dist 5.4.530 → 5.6.205. CLI and programmatic output verified unchanged against the standard fixture suite
  • package.json description and keywords refreshed to reflect the current scope (document metrics + structure + inspection + table extraction + code exploration + workspace analysis) — the old description still described the 0.1.x "scc-style summary tables" tool

0.8.4 - 2026-04-14

Added

  • overlayIgnorePatterns option on all discovery and index APIs (findFiles, discoverCodeFiles, buildCodebaseIndex, openCodeIndexStore, discoverDocuments, inspectWorkspaceDocuments, inspectWorkspaceDocumentSet, analyzeWorkspace, prepareWorkspaceContext) — overlay patterns are evaluated independently from base gitignore/caller patterns, so negation rules in overlays cannot reopen files excluded by the base matcher
  • overlayIgnorePatterns included in buildOptionFingerprint to prevent stale cache hits when overlay patterns change
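
The layering rule for overlay patterns can be illustrated with a deliberately simplified matcher. This is a sketch under stated assumptions, not the createIgnoreMatcher implementation: the evaluate and isIgnored helpers are hypothetical, and pattern matching is reduced to plain path-segment comparison rather than full gitignore semantics.

```typescript
type Verdict = "ignored" | "unignored" | "unmatched";

// Simplified matching: a pattern hits when it equals the path, is a leading
// directory of it, or is its trailing segment. Real gitignore rules are richer.
function evaluate(patterns: string[], path: string): Verdict {
  let verdict: Verdict = "unmatched";
  for (const raw of patterns) {
    const negated = raw.startsWith("!");
    const pattern = negated ? raw.slice(1) : raw;
    const hit =
      path === pattern ||
      path.startsWith(pattern + "/") ||
      path.endsWith("/" + pattern);
    if (hit) {
      verdict = negated ? "unignored" : "ignored"; // last matching rule wins
    }
  }
  return verdict;
}

// Each layer is evaluated on its own. A "!" rule in the overlay can only
// cancel earlier overlay rules; it never reopens a base exclusion.
function isIgnored(base: string[], overlay: string[], path: string): boolean {
  return (
    evaluate(base, path) === "ignored" ||
    evaluate(overlay, path) === "ignored"
  );
}
```

With base `["secrets"]` and overlay `["!secrets", "tmp"]`, a file under `secrets/` stays excluded because the overlay negation never reaches the base verdict.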

Changed

  • createIgnoreMatcher refactored to use applyIgnoreResult helper for cleaner ignore evaluation logic

0.8.3 - 2026-04-14

Added

  • --ignore-pattern <pattern> CLI flag for gitignore-style exclusion patterns across all command families (occ, occ code, occ doc references, occ workspace)
  • --content-mode <mode> flag on occ code commands to control indexed content retention (none, excerpt, full)
  • --max-file-size-bytes, --max-lines, and --no-skip-minified guardrail flags for occ code commands
  • ContentMode, CodeSkippedFile, and CodeExcerpt types in the code exploration type system
  • NDJSON-based sectioned index I/O replacing monolithic JSON serialization for code indexes, avoiding V8 string-length limits on large codebases
  • dispose() and clearCache() methods on CodeIndexStore for explicit lifecycle control
  • AbortSignal support on buildCodebaseIndex, discoverCodeFiles, and prepareWorkspaceContext (including subprocess abort forwarding)
  • Minified-file detection heuristic — skips likely minified source files during code indexing by default
  • Scoped .gitignore support — nested .gitignore files (not just root) are now respected during file discovery across all command families
  • ignorePatterns, contentMode, maxFileSizeBytes, maxLines, skipMinified, and signal options on WorkspacePrepareOptions
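
To show how the guardrail options could interact, here is a rough sketch of a skip decision. Everything beyond the option names is an assumption: the Guardrails and SkipReason types, the skipReason helper, the order of checks, and especially the line-length thresholds in looksMinified are invented for illustration and do not describe the heuristic occ actually ships.

```typescript
// Hypothetical option bag; field names follow the changelog entries above.
interface Guardrails {
  maxFileSizeBytes: number;
  maxLines: number;
  skipMinified: boolean;
}

type SkipReason = "too-large" | "too-many-lines" | "minified";

// Assumed heuristic: minified bundles tend to have very long lines.
// The thresholds (5000 and 500 characters) are made up for this sketch.
function looksMinified(text: string): boolean {
  const lines = text.split("\n");
  const longest = lines.reduce((max, line) => Math.max(max, line.length), 0);
  const average = text.length / lines.length;
  return longest > 5000 || average > 500;
}

// Returns why a file would be skipped, or null when it should be indexed.
function skipReason(
  text: string,
  sizeBytes: number,
  guardrails: Guardrails
): SkipReason | null {
  if (sizeBytes > guardrails.maxFileSizeBytes) return "too-large";
  if (text.split("\n").length > guardrails.maxLines) return "too-many-lines";
  if (guardrails.skipMinified && looksMinified(text)) return "minified";
  return null;
}
```

Returning a reason rather than a boolean makes it straightforward to report skipped files alongside the index, which is the role the skippedFiles field plays in the real types.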

Changed

  • Code index graph construction uses nodeById map for O(1) external-node lookups, replacing O(n) .some()/.find() scans
  • CodebaseIndex type now includes contentMode and skippedFiles fields; CodeCommandPayload.stats includes skippedFiles count
  • Store cache layout changed to <cacheRoot>/<contentMode>/<fingerprint>/ with NDJSON files, replacing single index.json
  • Store persistCache uses atomic temp-dir-then-rename writes to prevent corruption from interrupted builds
  • prefer-cache and ensure-fresh store strategies return existing in-memory sessions when available, avoiding redundant cache loads
  • Subprocess prepare runner uses sectioned NDJSON I/O instead of manual streamIndexToFile / readFileSync
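
The reason NDJSON sidesteps string-length limits is that each record is serialized and parsed on its own line. A minimal sketch of that round trip, with hypothetical helper names (writeNdjson, readNdjson) and an in-memory sink standing in for the real file I/O:

```typescript
// Serialize one record per line so no single string holds the whole index.
function writeNdjson<T>(records: T[], write: (line: string) => void): void {
  for (const record of records) {
    // JSON.stringify escapes newlines inside strings, so one record is one line.
    write(JSON.stringify(record) + "\n");
  }
}

// Parse line by line; blank lines (such as the trailing one) are ignored.
function readNdjson<T>(text: string): T[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as T);
}
```

In a real store the `write` callback would append to a file stream and the reader would consume the file incrementally rather than from one string.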

Fixed

  • buildOptionFingerprint now includes ignorePatterns, contentMode, maxFileSizeBytes, maxLines, and skipMinified, preventing stale cache hits when these options change
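
Why the fingerprint must cover every option that affects index contents can be seen in a small sketch. The IndexOptions shape and the fingerprintKey helper are hypothetical, and the canonical string returned here is a stand-in for whatever digest buildOptionFingerprint really produces.

```typescript
type ContentMode = "none" | "excerpt" | "full";

// Hypothetical subset of options; names follow the changelog entry above.
interface IndexOptions {
  ignorePatterns?: string[];
  contentMode?: ContentMode;
  maxFileSizeBytes?: number;
  maxLines?: number;
  skipMinified?: boolean;
}

// Build a canonical key: drop unset options and sort by option name so that
// property order never changes the result. A real store would likely hash it.
function fingerprintKey(options: IndexOptions): string {
  const entries = Object.entries(options)
    .filter(([, value]) => value !== undefined)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify(entries);
}
```

Any option left out of the key would let two differently configured builds share one cache entry, which is the stale-hit problem the fix addresses.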

0.8.2 - 2026-04-08

Changed

  • Code index onProgress now reports per-file progress during index construction instead of a single completion message

0.8.1 - 2026-04-08

Fixed

  • prepareWorkspaceContext subprocess mode now streams the code index to disk element-by-element, avoiding V8's ~512MB JSON.stringify string length limit on large codebases
  • Subprocess result file is read as a Buffer before JSON.parse, preventing string-length crashes on the read side
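
Writing a JSON array element by element keeps each JSON.stringify call small, which is the idea behind the write-side fix. A minimal sketch, assuming JSON-serializable items; the streamJsonArray helper and its `write` callback are hypothetical stand-ins for the real file stream:

```typescript
// Emit "[", then each element separated by commas, then "]".
// Only one element is ever stringified at a time.
function streamJsonArray<T>(items: T[], write: (chunk: string) => void): void {
  write("[");
  items.forEach((item, index) => {
    if (index > 0) write(",");
    write(JSON.stringify(item));
  });
  write("]");
}
```

The concatenated chunks form ordinary JSON, so the reader does not need to know the file was produced incrementally.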

0.8.0 - 2026-04-08

Added

  • prepareWorkspaceContext API — combines code indexing and document inspection in a single call with subprocess/inline execution modes and progress callbacks
  • New subpath exports: ./workspace/prepare and ./workspace/prepare-types
  • onProgress callback parameter on inspectWorkspaceDocumentSet for document inspection progress tracking
  • workspace.prepare method on the createOcc() facade

Changed

  • workspace module promoted to layer 4 in the dependency DAG (above code and inspect-commands at layer 3)

0.7.0 - 2026-04-03

Added

  • createOcc() programmatic facade — namespace-based entry point (occ.code, occ.doc, occ.sheet, occ.slide, occ.workspace) re-exporting all public APIs from a single import
  • openCodeIndexStore — persistent code index store with three cache strategies (prefer-cache, ensure-fresh, rebuild), manifest-based freshness checks, abort signal support, and progress callbacks
  • createCodeQuerySessionFromIndex — create a code query session from a pre-built index without re-building
  • New subpath export: ./code/store
  • Root "." export now points to the createOcc facade (src/index.ts) instead of the CLI entry point

Changed

  • main and types fields in package.json now point to src/index.ts (facade) instead of bin/occ.ts
  • src/code/store.ts reads version from package.json at runtime instead of hardcoding it, preventing cache-invalidation drift across releases

0.6.3 - 2026-03-27

Changed

  • typescript now ships as a direct runtime dependency so occ code and the programmatic code-exploration exports work after a normal install

Fixed

  • occ --version and other non-code command paths no longer fail when a packaged install is missing typescript; the JS/TS parser now lazy-loads it and reports a targeted reinstall error if the install is incomplete
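
The lazy-loading pattern described above can be sketched generically. The lazyModule helper is hypothetical; in occ the `load` callback would presumably wrap the actual typescript import, which is left out here so the sketch runs without that dependency.

```typescript
// Defer an expensive or optional dependency until first use, cache the
// result, and replace a raw load failure with an actionable message.
function lazyModule<T>(load: () => T, installHint: string): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      try {
        cached = { value: load() };
      } catch {
        throw new Error(installHint);
      }
    }
    return cached.value;
  };
}
```

Command paths that never call the returned getter never trigger the load, so they keep working even when the dependency is missing.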

0.6.2 - 2026-03-24

Added

  • occ workspace analyze command — workspace-level code, document, and structure analysis with a versioned JSON contract (schemaVersion: 1)
  • occ workspace documents command — per-document summaries with cross-reference and unresolved-mention detection
  • occ code analyze coupling <target> command — module-level coupling metrics (afferent/efferent coupling, instability, key classes)
  • createCodeQuerySession programmatic API — stateful session wrapping the codebase index with refresh(), all query methods, and chunking (./code/session export)
  • fusedSearch results now include excerpt, signature, containerName, and language fields for richer downstream consumers
  • inspectDocumentSummary exported as public API from doc/batch
  • New subpath exports: ./code/session, ./workspace/analyze, ./workspace/documents, ./workspace/types
  • typesVersions in package.json for CJS consumers using moduleResolution: "node"
  • main and types top-level fields in package.json
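
For readers unfamiliar with the coupling metrics named above, the conventional definitions are: afferent coupling (Ca) counts modules that depend on the target, efferent coupling (Ce) counts modules the target depends on, and instability is Ce / (Ca + Ce). The sketch below applies those textbook definitions to a plain edge list. The Edge and Coupling types and the moduleCoupling helper are hypothetical, and occ's own calculation may differ in details.

```typescript
interface Edge {
  from: string; // importing module
  to: string;   // imported module
}

interface Coupling {
  afferent: number;
  efferent: number;
  instability: number;
}

function moduleCoupling(target: string, edges: Edge[]): Coupling {
  const incoming = new Set<string>();
  const outgoing = new Set<string>();
  for (const { from, to } of edges) {
    if (from === to) continue; // self-references do not count as coupling
    if (to === target) incoming.add(from);
    if (from === target) outgoing.add(to);
  }
  const afferent = incoming.size;
  const efferent = outgoing.size;
  const total = afferent + efferent;
  // An isolated module is reported as fully stable (0) by convention here.
  return { afferent, efferent, instability: total === 0 ? 0 : efferent / total };
}
```

A module with two dependents and one dependency gets an instability of 1/3, meaning it is relatively stable.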

Changed

  • DocumentSummaryResult wrapper now carries computed markdown content for internal reuse, avoiding redundant documentToMarkdown calls during workspace document inspection

Fixed

  • analyzeModuleCoupling now uses nodeById() map for O(1) lookups instead of O(n) .find() per edge, matching all other query functions
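
The difference between the two lookup strategies is easiest to see side by side. In this sketch the GraphNode and GraphEdge shapes and both helpers are hypothetical; only the technique (build a Map once, then look up per edge) reflects the entry above.

```typescript
interface GraphNode {
  id: string;
  name: string;
}

interface GraphEdge {
  from: string;
  to: string;
}

// Build the id -> node map once: O(n).
function indexNodes(nodes: GraphNode[]): Map<string, GraphNode> {
  const byId = new Map<string, GraphNode>();
  for (const node of nodes) byId.set(node.id, node);
  return byId;
}

// Each edge now costs one O(1) lookup instead of an O(n) nodes.find(...).
function targetNames(edges: GraphEdge[], byId: Map<string, GraphNode>): string[] {
  return edges.map((edge) => byId.get(edge.to)?.name ?? "<unresolved>");
}
```

For a graph with n nodes and e edges this brings the resolution pass from O(n × e) down to O(n + e).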

0.6.1 - 2026-03-16

Added

  • occ code index command — builds and emits the full codebase index (files, symbols, edges, language capabilities) as JSON or a summary line

0.6.0 - 2026-03-16

Added

  • --show-confidence flag displays confidence levels (exact or estimated) for each metric in both tabular and JSON output
  • Tabular output annotates estimated metrics with a ~ suffix and a ~ estimated metric footnote when --show-confidence is enabled
  • JSON output includes a confidence object per file row (e.g. { "words": "exact", "pages": "estimated" }) when --show-confidence is enabled
  • Confidence merging in grouped mode: if any file in a group has an estimated metric, the group's confidence for that metric is estimated
  • ./types and ./stats subpath exports in package.json — consumers can now import ConfidenceLevel, ParseResult, StatsRow, and AggregateResult directly
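
The grouped-mode merging rule stated above ("any estimated makes the group estimated") is small enough to sketch directly. ConfidenceLevel and its two values come from the entries above; the Confidence alias and the mergeConfidence helper are hypothetical names for illustration.

```typescript
type ConfidenceLevel = "exact" | "estimated";
type Confidence = Record<string, ConfidenceLevel>;

// A metric is "exact" for the group only if every file reporting it is exact.
function mergeConfidence(rows: Confidence[]): Confidence {
  const merged: Confidence = {};
  for (const row of rows) {
    for (const [metric, level] of Object.entries(row)) {
      if (level === "estimated" || !(metric in merged)) {
        merged[metric] = level;
      }
    }
  }
  return merged;
}
```

For example, merging `{ words: "exact", pages: "estimated" }` with `{ words: "exact", pages: "exact" }` keeps `words` exact and leaves `pages` estimated.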

0.5.1 - 2026-03-16

Fixed

  • XLSX header cells in markdown conversion now escape pipe (|) and newline characters, matching the existing data row escaping
  • npm test script uses test/*.test.ts instead of test/**/*.test.ts for Node 18 compatibility (shell ** glob requires bash globstar or Node 21+)
  • Remove duplicate countWords function in src/code/chunk.ts; now imports from shared src/utils.ts
  • Add types conditions to all package.json subpath exports so TypeScript consumers using moduleResolution: "NodeNext" resolve .d.ts files correctly
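
The header-escaping fix above guards against cell text that would otherwise break a Markdown table. A minimal sketch of such escaping follows; the helper names are hypothetical, and replacing newlines with a single space is an assumption, since the entry does not say what occ substitutes.

```typescript
// Escape characters that would terminate a cell or a row in a Markdown table.
function escapeCell(value: string): string {
  return value
    .replace(/\|/g, "\\|")   // a raw pipe would start a new column
    .replace(/\r?\n/g, " "); // a raw newline would end the row (assumed: use a space)
}

// Apply the same escaping to header and data rows alike.
function toRow(cells: string[]): string {
  return "| " + cells.map(escapeCell).join(" | ") + " |";
}
```

Routing header cells and data cells through the same function is what keeps the two from drifting apart again.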

0.5.0 - 2026-03-15

Added

  • exports field with subpath imports for the code exploration module (./code/build, ./code/types, ./code/query, ./code/discover) and root entry point (.) — consumers can now use clean imports instead of fragile deep paths into dist/
  • TypeScript as an optional peerDependency (>=5.0.0) — consumers using the code exploration module programmatically can provide their own TypeScript installation

Changed

  • The exports field restricts importable entry points. Consumers relying on unlisted deep imports into dist/ will need to use the declared subpath exports instead

0.4.1 - 2026-03-14

Added

  • Barrel re-export resolution: occ code analyze calls/callers/chain now resolves call targets through index.ts barrel files
  • Zod runtime schema validation across all CLI options and data types (parsers, walker, stats, output, structure, and all inspect commands)

Changed

  • Enable noImplicitReturns and noFallthroughCasesInSwitch TypeScript compiler options for stricter type safety
  • Extract shared XLSX cell utilities to src/inspect/xlsx-cells.ts
  • Remove deprecated re-export shim from src/sheet/inspect.ts

0.4.0 - 2026-03-13

Note: All features in OCC are currently experimental. This project cannot be considered stable software yet. APIs, output formats, and command interfaces may change between minor versions.

Added

  • occ table inspect <file> — extract structured table content from DOCX, XLSX, PPTX, ODT, and ODP as JSON or tabular output, with auto-detected headers, sample row limits, merged cell support, and per-table token estimates
  • occ doc inspect <file> — document metadata, risk flags, content stats, heading structure, and content preview for DOCX and ODT
  • occ slide inspect <file> — presentation metadata, risk flags, per-slide inventory, and content preview for PPTX and ODP
  • occ sheet inspect <file> — XLSX workbook preflight with sheet inventory, schema preview, risk flags, and token estimates
  • TypeScript interfaces, type aliases, enums, and implements clauses are now indexed in occ code exploration
  • Directory targets in occ code analyze deps now aggregate imports across all files in the directory

Fixed

  • Bidirectional chain analysis: occ code analyze chain now searches both directions and labels reverse paths explicitly
  • Code inheritance lookups disambiguate correctly when interfaces and classes share the same name
  • Directory dependency matches are preserved when aggregating across multiple files

0.3.1 - 2026-03-10

Fixed

  • Upgrade xlsx from 0.18.5 to 0.20.3 (official SheetJS tarball), resolving npm vulnerability (#1 — thanks @B33pBeeps)
  • Configure XLSX.set_fs(fs) for ESM compatibility with SheetJS 0.20+

0.3.0 - 2026-03-10

Added

  • occ code command family for on-demand code exploration
  • First-class JavaScript, TypeScript, and Python exploration support
  • Automated fixture-based tests for code graph queries and output contracts

Changed

  • Improved call resolution for this, super, self, cls, and imported aliases
  • Ambiguous calls and blocked call chains now surface candidate locations
  • Dependency analysis now separates local, external, and unresolved imports

0.2.0 - 2026-03-09

Added

  • Document structure extraction — new --structure flag parses heading hierarchy from DOCX, PDF, PPTX, ODT, and ODP files, displaying a navigable tree with dotted section codes (1, 1.1, 1.2, 2, ...)
  • Structure tree output in tabular mode with indented headings, dotted leaders, and page ranges (when available)
  • Structure data in JSON output under a structures key (only when --structure is used)
  • Page-to-section mapping for PDFs via [Page N] markers
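
The dotted section codes described above (1, 1.1, 1.2, 2, ...) can be derived from heading levels with one counter per depth. This is a sketch of that numbering scheme only; the sectionCodes helper is hypothetical, levels are assumed to start at 1, and padding skipped levels with 0 is an assumption about edge cases the changelog does not cover.

```typescript
// Turn a sequence of heading levels into dotted section codes.
function sectionCodes(levels: number[]): string[] {
  const counters: number[] = [];
  return levels.map((level) => {
    // Moving back up the hierarchy discards the deeper counters.
    counters.length = Math.min(counters.length, level);
    // A jump of more than one level pads the gap with zeros (assumed behavior).
    while (counters.length < level) counters.push(0);
    counters[level - 1] = (counters[level - 1] ?? 0) + 1;
    return counters.join(".");
  });
}
```

Given levels `[1, 2, 2, 1, 2, 3]`, the codes come out as `1`, `1.1`, `1.2`, `2`, `2.1`, `2.1.1`.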

Changed

  • Migrated entire codebase to TypeScript — all source files under src/ and bin/ are now .ts with strict type checking
  • Added npm run build (compiles to dist/) and npm run dev (runs via tsx without build step)
  • Published package now ships compiled dist/ instead of raw src/
  • New dependency: turndown (HTML-to-markdown conversion for DOCX structure extraction)
  • New devDependencies: typescript, @types/node, tsx, @types/turndown

0.1.2 - 2026-03-07

Changed

  • Rename "Extra" column to "Details" for clarity
  • Remove redundant top/bottom table borders for cleaner output
  • Remove inter-row separators, keep only header and totals borders
  • Right-align numeric columns in document table
  • Apply consistent number coloring to all scc table columns
  • Make section header width match table width dynamically
  • Use ASCII-only dashes in section headers during --ci mode
  • Parsers return only populated metric fields instead of null-filled objects
  • Batch stat calls in walker for better throughput on large directories
  • Pass scc binary path explicitly instead of module-level state

Added

  • Summary line showing scan scope, word and page counts, and elapsed time for an at-a-glance overview
  • SHA-256 checksum verification for scc binary downloads in postinstall
  • Input validation for --large-file-limit (rejects NaN values)
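
Checksum verification of a downloaded binary generally means hashing the bytes and comparing against a published digest. The sketch below uses Node's built-in crypto module; the helper names are hypothetical and the postinstall script's actual structure is not shown in this changelog.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of the given bytes or UTF-8 string.
function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Compare against the expected digest, tolerating case and stray whitespace.
function verifyChecksum(data: string | Uint8Array, expectedHex: string): boolean {
  return sha256Hex(data) === expectedHex.trim().toLowerCase();
}
```

An installer would typically refuse to use (and delete) the download when this check returns false.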

Fixed

  • "No office documents found." message no longer shown when code results are present
  • Table separator width mismatch between top-mid and middle characters

0.1.1 - 2026-03-07

Changed

  • Replace ExcelJS with SheetJS (xlsx) for XLSX parsing, eliminating deprecated transitive dependencies (rimraf, fstream, inflight, lodash.isequal, glob v7)

Fixed

  • Ensure test/fixtures/ directory exists before creating test fixtures (fixes CI failure)
  • Fix workflow_dispatch trigger in docs workflow (remove invalid branches key)
  • Fix Node 22+ compatibility in release workflow (require() instead of import() with assert)
  • Update GitHub Pages deployment branch policy from master to main

0.1.0 - 2026-03-07

Added

  • CLI tool for scanning directories for office documents (DOCX, XLSX, PPTX, PDF, ODT, ODS, ODP)
  • Word count, page count, paragraph count, slide count, sheet/row/cell count extraction
  • Automatic code metrics via scc integration (vendored binary with PATH fallback)
  • Per-file (--by-file) and grouped-by-type output modes
  • JSON output (--format json) for automation
  • Extension filtering (--include-ext, --exclude-ext)
  • Directory exclusion (--exclude-dir, default: node_modules,.git)
  • .gitignore-aware file discovery (disable with --no-gitignore)
  • Sortable output (--sort: files, name, words, size)
  • File output (--output)
  • CI mode (--ci) for ASCII-only, no-color output
  • Large file skip threshold (--large-file-limit, default: 50MB)
  • Progress bar with ETA
  • Auto-download of scc binary during npm install (skip with SCC_SKIP_DOWNLOAD=1)