Changelog
This page mirrors the CHANGELOG.md in the repository.
All notable changes to this project will be documented here.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.8.6 - 2026-04-27
Added
- `occ describe [directories...]` top-level CLI command and `occ workspace describe [rootDir]` nested command for fast directory classification: identifies whether a path is a code, office, documentation, data, or mixed workspace using manifest signals and file inventory, with confidence levels and nested project detection
- `src/workspace/describe.ts` module exporting `describeWorkspace()` and supporting classification helpers
- `src/workspace/describe-output.ts` tabular and JSON formatters for `WorkspaceDescription`
- `WorkspaceDescription` type and classification confidence levels in `src/workspace/types.ts`
- `./workspace/describe` programmatic export in `package.json` for downstream consumers
- `test/workspace-describe.test.ts` covering React/Vite classification, mixed-workspace detection, multi-category folders, and `.gitignore` respect
0.8.5 - 2026-04-23
Security
- Refresh transitive dependencies to clear all `npm audit` findings (5 → 0). Downstream consumers can drop any `overrides` workarounds after bumping to `0.8.5`.
- `@xmldom/xmldom` resolved to 0.8.13 (mammoth path) and 0.9.10 (officeparser path), clearing five advisories: CDATA serialization injection (GHSA-wh4c-j3r5-mjhp), uncontrolled-recursion DoS (GHSA-2v35-w6hq-6mfw), `DocumentType` injection (GHSA-f6ww-3ggp-fr8h), processing-instruction injection (GHSA-x6wf-f3px-wcqx), comment-node injection (GHSA-j759-j44w-7fr8)
- `picomatch` 2.3.1 → 2.3.2, clearing POSIX method-injection (GHSA-3v7f-55p6-f55p) and extglob ReDoS (GHSA-c2c7-rcm5-vvqj)
- `file-type` 16.5.4 → 22.0.1 via `officeparser`, clearing ASF infinite-loop on malformed input (GHSA-5v7r-6r5c-r473)
- `yauzl` 3.2.0 removed from the tree, clearing the off-by-one advisory (GHSA-gmq8-994r-jv83)
Fixed
- `extractOfficeText` helper absorbs officeparser 6.0 → 6.1's undocumented breaking change (`parseOffice()` now returns a `ParsedOffice` object with `toText()` instead of a plain string). Without this shim the officeparser bump would have regressed PPTX/ODT/ODS/ODP text extraction across `occ`, `occ doc inspect`, `occ slide inspect`, `occ table inspect`, and the markdown converter
- `SECURITY.md` supported-versions table refreshed; the old table still listed 0.5.x / 0.4.x while the current supported line is 0.8.x
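The shim in the first Fixed entry can be pictured as a small normalizer that accepts either return shape. This is a minimal sketch, not the project's actual helper: `ParsedOfficeLike` and `toPlainText` are hypothetical names, and only the `toText()` method is taken from the entry above.

```typescript
// Hypothetical shape: only toText() comes from the changelog entry.
interface ParsedOfficeLike {
  toText(): string;
}

// Accept either the older plain-string result or the newer object result.
function toPlainText(result: string | ParsedOfficeLike): string {
  return typeof result === "string" ? result : result.toText();
}

const legacy = toPlainText("slide one");
const current = toPlainText({ toText: () => "slide one" });
console.log(legacy === current); // both shapes normalize to the same string
```

Keeping the normalization in one helper means callers never need to know which officeparser version produced the result.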
Changed
- Transitive majors pulled in by `officeparser@6.1.0`: `tesseract.js` 6.0.1 → 7.0.0, `pdfjs-dist` 5.4.530 → 5.6.205. CLI and programmatic output verified unchanged against the standard fixture suite
- `package.json` `description` and `keywords` refreshed to reflect the current scope (document metrics + structure + inspection + table extraction + code exploration + workspace analysis); the old description still described the 0.1.x "scc-style summary tables" tool
0.8.4 - 2026-04-14
Added
- `overlayIgnorePatterns` option on all discovery and index APIs (`findFiles`, `discoverCodeFiles`, `buildCodebaseIndex`, `openCodeIndexStore`, `discoverDocuments`, `inspectWorkspaceDocuments`, `inspectWorkspaceDocumentSet`, `analyzeWorkspace`, `prepareWorkspaceContext`): overlay patterns are evaluated independently from base gitignore/caller patterns, so negation rules in overlays cannot reopen files excluded by the base matcher
- `overlayIgnorePatterns` included in `buildOptionFingerprint` to prevent stale cache hits when overlay patterns change
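The "evaluated independently" rule is easiest to see with two separate matchers. The sketch below is an illustration under stated assumptions, not the library's implementation: `isIgnoredBy` and `isExcluded` are hypothetical helpers, the mini-matcher understands only exact names and `*.ext` globs, and combining the two results with a logical OR is an assumption about how the results are merged.

```typescript
// Hypothetical mini-matcher: exact names and "*.ext" globs only,
// with "!" negation and last-match-wins ordering, as in gitignore.
function isIgnoredBy(patterns: string[], path: string): boolean {
  let ignored = false;
  for (const raw of patterns) {
    const negated = raw.startsWith("!");
    const pattern = negated ? raw.slice(1) : raw;
    const matches = pattern.startsWith("*")
      ? path.endsWith(pattern.slice(1))
      : path === pattern;
    if (matches) ignored = !negated;
  }
  return ignored;
}

// Base and overlay are evaluated separately; a path is excluded if either
// matcher excludes it, so an overlay "!" rule cannot undo a base exclusion.
function isExcluded(base: string[], overlay: string[], path: string): boolean {
  return isIgnoredBy(base, path) || isIgnoredBy(overlay, path);
}

const base = ["*.log"];
const overlay = ["!debug.log", "*.tmp"];
console.log(isExcluded(base, overlay, "debug.log")); // true
console.log(isExcluded(base, overlay, "scratch.tmp")); // true
console.log(isExcluded(base, overlay, "main.ts")); // false
```

Had the overlay patterns simply been appended to the base list, `!debug.log` would have won under last-match-wins and reopened the file.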
Changed
- `createIgnoreMatcher` refactored to use `applyIgnoreResult` helper for cleaner ignore evaluation logic
0.8.3 - 2026-04-14
Added
- `--ignore-pattern <pattern>` CLI flag for gitignore-style exclusion patterns across all command families (`occ`, `occ code`, `occ doc references`, `occ workspace`)
- `--content-mode <mode>` flag on `occ code` commands to control indexed content retention (`none`, `excerpt`, `full`)
- `--max-file-size-bytes`, `--max-lines`, and `--no-skip-minified` guardrail flags for `occ code` commands
- `ContentMode`, `CodeSkippedFile`, and `CodeExcerpt` types in the code exploration type system
- NDJSON-based sectioned index I/O replacing monolithic JSON serialization for code indexes, avoiding V8 string-length limits on large codebases
- `dispose()` and `clearCache()` methods on `CodeIndexStore` for explicit lifecycle control
- `AbortSignal` support on `buildCodebaseIndex`, `discoverCodeFiles`, and `prepareWorkspaceContext` (including subprocess abort forwarding)
- Minified-file detection heuristic: skips likely minified source files during code indexing by default
- Scoped `.gitignore` support: nested `.gitignore` files (not just root) are now respected during file discovery across all command families
- `ignorePatterns`, `contentMode`, `maxFileSizeBytes`, `maxLines`, `skipMinified`, and `signal` options on `WorkspacePrepareOptions`
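The NDJSON entry above relies on serializing one record per line, so no single string ever has to hold the whole index. The following is a rough sketch of that idea only: `SymbolRecord`, `writeNdjson`, and `parseNdjson` are hypothetical names, and the real section layout is not described in this changelog.

```typescript
// Hypothetical record shape; the real index sections are not shown here.
interface SymbolRecord {
  id: string;
  file: string;
}

// Emit one JSON document per line through a caller-supplied sink,
// so each JSON.stringify call only sees a single record.
function writeNdjson<T>(records: T[], write: (line: string) => void): void {
  for (const record of records) {
    write(JSON.stringify(record) + "\n");
  }
}

// Parse NDJSON text back into records, skipping blank lines.
function parseNdjson<T>(text: string): T[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as T);
}

const symbols: SymbolRecord[] = [
  { id: "a#main", file: "a.ts" },
  { id: "b#helper", file: "b.ts" },
];

// Lines are collected in memory only for this demo; a real writer
// would hand each line to a file stream instead.
const lines: string[] = [];
writeNdjson(symbols, (line) => lines.push(line));
const restored = parseNdjson<SymbolRecord>(lines.join(""));
console.log(restored.length); // 2
```

The sink callback is a design choice for the sketch: it keeps the serializer independent of where the lines end up.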
Changed
- Code index graph construction uses `nodeById` map for O(1) external-node lookups, replacing O(n) `.some()`/`.find()` scans
- `CodebaseIndex` type now includes `contentMode` and `skippedFiles` fields; `CodeCommandPayload.stats` includes `skippedFiles` count
- Store cache layout changed to `<cacheRoot>/<contentMode>/<fingerprint>/` with NDJSON files, replacing single `index.json`
- Store `persistCache` uses atomic temp-dir-then-rename writes to prevent corruption from interrupted builds
- `prefer-cache` and `ensure-fresh` store strategies return existing in-memory sessions when available, avoiding redundant cache loads
- Subprocess prepare runner uses sectioned NDJSON I/O instead of manual `streamIndexToFile`/`readFileSync`
Fixed
- `buildOptionFingerprint` now includes `ignorePatterns`, `contentMode`, `maxFileSizeBytes`, `maxLines`, and `skipMinified`, preventing stale cache hits when these options change
0.8.2 - 2026-04-08
Changed
- Code index `onProgress` now reports per-file progress during index construction instead of a single completion message
0.8.1 - 2026-04-08
Fixed
- `prepareWorkspaceContext` subprocess mode now streams the code index to disk element-by-element, avoiding V8's ~512MB `JSON.stringify` string length limit on large codebases
- Subprocess result file is read as a Buffer before `JSON.parse`, preventing string-length crashes on the read side
0.8.0 - 2026-04-08
Added
- `prepareWorkspaceContext` API: combines code indexing and document inspection in a single call with subprocess/inline execution modes and progress callbacks
- New subpath exports: `./workspace/prepare` and `./workspace/prepare-types`
- `onProgress` callback parameter on `inspectWorkspaceDocumentSet` for document inspection progress tracking
- `workspace.prepare` method on the `createOcc()` facade
Changed
- `workspace` module promoted to layer 4 in the dependency DAG (above `code` and `inspect-commands` at layer 3)
0.7.0 - 2026-04-03
Added
- `createOcc()` programmatic facade: namespace-based entry point (`occ.code`, `occ.doc`, `occ.sheet`, `occ.slide`, `occ.workspace`) re-exporting all public APIs from a single import
- `openCodeIndexStore`: persistent code index store with three cache strategies (`prefer-cache`, `ensure-fresh`, `rebuild`), manifest-based freshness checks, abort signal support, and progress callbacks
- `createCodeQuerySessionFromIndex`: create a code query session from a pre-built index without re-building
- New subpath export: `./code/store`
- Root `"."` export now points to the `createOcc` facade (`src/index.ts`) instead of the CLI entry point
Changed
- `main` and `types` fields in `package.json` now point to `src/index.ts` (facade) instead of `bin/occ.ts`
- `src/code/store.ts` reads version from `package.json` at runtime instead of hardcoding it, preventing cache-invalidation drift across releases
0.6.3 - 2026-03-27
Changed
- `typescript` now ships as a direct runtime dependency so `occ code` and the programmatic code-exploration exports work after a normal install
Fixed
- `occ --version` and other non-code command paths no longer fail when a packaged install is missing `typescript`; the JS/TS parser now lazy-loads it and reports a targeted reinstall error if the install is incomplete
0.6.2 - 2026-03-24
Added
- `occ workspace analyze` command: workspace-level code, document, and structure analysis with a versioned JSON contract (`schemaVersion: 1`)
- `occ workspace documents` command: per-document summaries with cross-reference and unresolved-mention detection
- `occ code analyze coupling <target>` command: module-level coupling metrics (afferent/efferent coupling, instability, key classes)
- `createCodeQuerySession` programmatic API: stateful session wrapping the codebase index with `refresh()`, all query methods, and chunking (`./code/session` export)
- `fusedSearch` results now include `excerpt`, `signature`, `containerName`, and `language` fields for richer downstream consumers
- `inspectDocumentSummary` exported as public API from `doc/batch`
- New subpath exports: `./code/session`, `./workspace/analyze`, `./workspace/documents`, `./workspace/types`
- `typesVersions` in `package.json` for CJS consumers using `moduleResolution: "node"`
- `main` and `types` top-level fields in `package.json`
Changed
- `DocumentSummaryResult` wrapper now carries computed markdown content for internal reuse, avoiding redundant `documentToMarkdown` calls during workspace document inspection
Fixed
- `analyzeModuleCoupling` now uses `nodeById()` map for O(1) lookups instead of O(n) `.find()` per edge, matching all other query functions
0.6.1 - 2026-03-16
Added
- `occ code index` command: builds and emits the full codebase index (files, symbols, edges, language capabilities) as JSON or a summary line
0.6.0 - 2026-03-16
Added
- `--show-confidence` flag displays confidence levels (`exact` or `estimated`) for each metric in both tabular and JSON output
- Tabular output annotates estimated metrics with a `~` suffix and a `~ estimated metric` footnote when `--show-confidence` is enabled
- JSON output includes a `confidence` object per file row (e.g. `{ "words": "exact", "pages": "estimated" }`) when `--show-confidence` is enabled
- Confidence merging in grouped mode: if any file in a group has an estimated metric, the group's confidence for that metric is `estimated`
- `./types` and `./stats` subpath exports in `package.json`; consumers can now import `ConfidenceLevel`, `ParseResult`, `StatsRow`, and `AggregateResult` directly
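The grouped-mode merging rule above amounts to "estimated wins". A minimal sketch of that rule follows. The `ConfidenceLevel` name appears in this release, but its definition here, the `ConfidenceMap` alias, and the `mergeConfidence` function are assumptions made for illustration.

```typescript
// Assumed definition; the changelog names the type but does not show it.
type ConfidenceLevel = "exact" | "estimated";
type ConfidenceMap = Record<string, ConfidenceLevel>;

// Merge per-file confidence rows into one group-level row:
// a metric is "exact" only if every file that reports it is exact.
function mergeConfidence(rows: ConfidenceMap[]): ConfidenceMap {
  const merged: ConfidenceMap = {};
  for (const row of rows) {
    for (const metric of Object.keys(row)) {
      const level = row[metric];
      if (level === "estimated" || merged[metric] === undefined) {
        merged[metric] = level;
      }
    }
  }
  return merged;
}

const group = mergeConfidence([
  { words: "exact", pages: "exact" },
  { words: "exact", pages: "estimated" },
]);
console.log(group); // { words: 'exact', pages: 'estimated' }
```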
0.5.1 - 2026-03-16
Fixed
- XLSX header cells in markdown conversion now escape pipe (`|`) and newline characters, matching the existing data row escaping
- `npm test` script uses `test/*.test.ts` instead of `test/**/*.test.ts` for Node 18 compatibility (shell `**` glob requires bash globstar or Node 21+)
- Remove duplicate `countWords` function in `src/code/chunk.ts`; now imports from shared `src/utils.ts`
- Add `types` conditions to all `package.json` subpath exports so TypeScript consumers using `moduleResolution: "NodeNext"` resolve `.d.ts` files correctly
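The header-escaping fix matters because an unescaped `|` or line break inside a cell splits a markdown table row. The sketch below shows one plausible escaping scheme. `escapeTableCell` is a hypothetical helper, and the replacements it uses (backslash-escaped pipes, line breaks collapsed to a space) are assumptions, not necessarily what the converter emits.

```typescript
// Assumption: pipes are backslash-escaped and line breaks collapse to a space.
function escapeTableCell(value: string): string {
  return value.replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
}

const header = ["Region|Zone", "Total\n(USD)"].map(escapeTableCell);
console.log(`| ${header.join(" | ")} |`);
// | Region\|Zone | Total (USD) |
```

Running header and data cells through the same function keeps the two code paths from drifting apart again.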
0.5.0 - 2026-03-15
Added
- `exports` field with subpath imports for the code exploration module (`./code/build`, `./code/types`, `./code/query`, `./code/discover`) and root entry point (`.`); consumers can now use clean imports instead of fragile deep paths into `dist/`
- TypeScript as an optional `peerDependency` (`>=5.0.0`); consumers using the code exploration module programmatically can provide their own TypeScript installation
Changed
- The `exports` field restricts importable entry points. Consumers relying on unlisted deep imports into `dist/` will need to use the declared subpath exports instead
0.4.1 - 2026-03-14
Added
- Barrel re-export resolution: `occ code analyze calls/callers/chain` now resolves call targets through `index.ts` barrel files
- Zod runtime schema validation across all CLI options and data types (parsers, walker, stats, output, structure, and all inspect commands)
Changed
- Enable `noImplicitReturns` and `noFallthroughCasesInSwitch` TypeScript compiler options for stricter type safety
- Extract shared XLSX cell utilities to `src/inspect/xlsx-cells.ts`
- Remove deprecated re-export shim from `src/sheet/inspect.ts`
0.4.0 - 2026-03-13
Note: All features in OCC are currently experimental. This project cannot be considered stable software yet. APIs, output formats, and command interfaces may change between minor versions.
Added
- `occ table inspect <file>`: extract structured table content from DOCX, XLSX, PPTX, ODT, and ODP as JSON or tabular output, with auto-detected headers, sample row limits, merged cell support, and per-table token estimates
- `occ doc inspect <file>`: document metadata, risk flags, content stats, heading structure, and content preview for DOCX and ODT
- `occ slide inspect <file>`: presentation metadata, risk flags, per-slide inventory, and content preview for PPTX and ODP
- `occ sheet inspect <file>`: XLSX workbook preflight with sheet inventory, schema preview, risk flags, and token estimates
- TypeScript interfaces, type aliases, enums, and `implements` clauses are now indexed in `occ code` exploration
- Directory targets in `occ code analyze deps` now aggregate imports across all files in the directory
Fixed
- Bidirectional chain analysis: `occ code analyze chain` now searches both directions and labels reverse paths explicitly
- Code inheritance lookups disambiguate correctly when interfaces and classes share the same name
- Directory dependency matches are preserved when aggregating across multiple files
0.3.1 - 2026-03-10
Fixed
- Upgrade xlsx from 0.18.5 to 0.20.3 (official SheetJS tarball), resolving npm vulnerability (#1, thanks @B33pBeeps)
- Configure `XLSX.set_fs(fs)` for ESM compatibility with SheetJS 0.20+
0.3.0 - 2026-03-10
Added
- `occ code` command family for on-demand code exploration
- First-class JavaScript, TypeScript, and Python exploration support
- Automated fixture-based tests for code graph queries and output contracts
Changed
- Improved call resolution for `this`, `super`, `self`, `cls`, and imported aliases
- Ambiguous calls and blocked call chains now surface candidate locations
- Dependency analysis now separates local, external, and unresolved imports
0.2.0 - 2026-03-09
Added
- Document structure extraction: new `--structure` flag parses heading hierarchy from DOCX, PDF, PPTX, ODT, and ODP files, displaying a navigable tree with dotted section codes (1, 1.1, 1.2, 2, ...)
- Structure tree output in tabular mode with indented headings, dotted leaders, and page ranges (when available)
- Structure data in JSON output under a `structures` key (only when `--structure` is used)
- Page-to-section mapping for PDFs via `[Page N]` markers
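The dotted section codes mentioned above (1, 1.1, 1.2, 2, ...) can be derived from a flat sequence of heading levels with a running counter per depth. This is a minimal sketch of that numbering scheme, not the project's code: `sectionCodes` is a hypothetical function, and numbering a skipped level as 1 is an assumption.

```typescript
// Assign dotted codes from a flat list of heading levels (1 = top level).
function sectionCodes(levels: number[]): string[] {
  const counters: number[] = [];
  return levels.map((level) => {
    // Drop counters that belong to deeper headings than this one.
    counters.length = Math.min(counters.length, level);
    // Assumption: a skipped intermediate level is numbered 1.
    while (counters.length < level - 1) counters.push(1);
    if (counters.length === level) {
      counters[level - 1] += 1; // next sibling at this depth
    } else {
      counters.push(1); // first heading at a new depth
    }
    return counters.join(".");
  });
}

console.log(sectionCodes([1, 2, 2, 1])); // [ '1', '1.1', '1.2', '2' ]
```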
Changed
- Migrated entire codebase to TypeScript: all source files under `src/` and `bin/` are now `.ts` with strict type checking
- Added `npm run build` (compiles to `dist/`) and `npm run dev` (runs via tsx without build step)
- Published package now ships compiled `dist/` instead of raw `src/`
- New dependency: `turndown` (HTML-to-markdown conversion for DOCX structure extraction)
- New devDependencies: `typescript`, `@types/node`, `tsx`, `@types/turndown`
0.1.2 - 2026-03-07
Changed
- Rename "Extra" column to "Details" for clarity
- Remove redundant top/bottom table borders for cleaner output
- Remove inter-row separators, keep only header and totals borders
- Right-align numeric columns in document table
- Apply consistent number coloring to all scc table columns
- Make section header width match table width dynamically
- Use ASCII-only dashes in section headers during `--ci` mode
- Parsers return only populated metric fields instead of null-filled objects
- Batch stat calls in walker for better throughput on large directories
- Pass scc binary path explicitly instead of module-level state
Added
- Summary line showing scan scope, word/page counts, and elapsed time
- Word and page counts in summary line for at-a-glance utility
- SHA-256 checksum verification for scc binary downloads in postinstall
- Input validation for `--large-file-limit` (rejects NaN values)
Fixed
- "No office documents found." message no longer shown when code results are present
- Table separator width mismatch between top-mid and middle characters
0.1.1 - 2026-03-07
Changed
- Replace ExcelJS with SheetJS (xlsx) for XLSX parsing, eliminating deprecated transitive dependencies (rimraf, fstream, inflight, lodash.isequal, glob v7)
Fixed
- Ensure `test/fixtures/` directory exists before creating test fixtures (fixes CI failure)
- Fix `workflow_dispatch` trigger in docs workflow (remove invalid `branches` key)
- Fix Node 22+ compatibility in release workflow (`require()` instead of `import()` with `assert`)
- Update GitHub Pages deployment branch policy from `master` to `main`
0.1.0 - 2026-03-07
Added
- CLI tool for scanning directories for office documents (DOCX, XLSX, PPTX, PDF, ODT, ODS, ODP)
- Word count, page count, paragraph count, slide count, sheet/row/cell count extraction
- Automatic code metrics via scc integration (vendored binary with PATH fallback)
- Per-file (`--by-file`) and grouped-by-type output modes
- JSON output (`--format json`) for automation
- Extension filtering (`--include-ext`, `--exclude-ext`)
- Directory exclusion (`--exclude-dir`, default: `node_modules,.git`)
- .gitignore-aware file discovery (disable with `--no-gitignore`)
- Sortable output (`--sort`: files, name, words, size)
- File output (`--output`)
- CI mode (`--ci`) for ASCII-only, no-color output
- Large file skip threshold (`--large-file-limit`, default: 50MB)
- Progress bar with ETA
- Auto-download of scc binary during `npm install` (skip with `SCC_SKIP_DOWNLOAD=1`)