Changelog

This page mirrors the CHANGELOG.md in the repository.

All notable changes to this project will be documented here.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

0.8.12 - 2026-06-15

Added

  • Workspace corpus chunks: bundleWorkspace can now emit RAG-ready chunks alongside the rest of the bundle. New WorkspaceBundleOptions fields: includeCodeChunks (returns codeChunks: CodeChunk[] from a full-content code index), includeCorpusChunks (returns a unified, deterministically ordered corpusChunks: WorkspaceCorpusChunk[] that merges code and document chunks, sorted by relativePath/kind/chunkIndex), maxCodeChunks / maxCorpusChunks caps, and a codeChunk knob (maxTokens/overlapTokens/countTokens/files) mirroring the document chunk option. When code chunks are requested the bundle transparently builds the index at contentMode: 'full' (and reuses the persistent cache when cacheDir/store is set) while still returning a slim codeIndex by default (src/workspace/bundle.ts)
  • New exported WorkspaceCorpusChunk type (kind: 'code' | 'document', with shared chunkId/relativePath/title/content/chunkIndex fields plus kind-specific ones like symbolNames/symbolTypes/language for code and headingPath/anchor/format for documents). Re-exported from the facade and @cesarandreslopez/occ/workspace/bundle
  • New 'code-chunk' progress phase in PROGRESS_PHASES, emitted while the bundle chunks code files
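
The deterministic ordering described for corpusChunks can be pictured with a small comparator. This is a minimal sketch rather than the library's implementation: the CorpusChunkLike type and the sortCorpusChunks helper are hypothetical, and it assumes plain code-unit string comparison (so 'code' sorts before 'document'), which the changelog does not specify.

```typescript
// Hypothetical, trimmed-down shape; the real WorkspaceCorpusChunk carries more fields.
type CorpusChunkLike = {
  kind: 'code' | 'document';
  relativePath: string;
  chunkIndex: number;
};

// Compare by code unit so the order does not depend on the host locale.
function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Sort by relativePath, then kind, then chunkIndex, without mutating the input.
function sortCorpusChunks<T extends CorpusChunkLike>(chunks: readonly T[]): T[] {
  return [...chunks].sort(
    (a, b) =>
      compareStrings(a.relativePath, b.relativePath) ||
      compareStrings(a.kind, b.kind) ||
      a.chunkIndex - b.chunkIndex,
  );
}
```

Because every key is compared explicitly, two runs over the same inputs produce the same sequence regardless of the order in which chunks were collected.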

Migration notes

  • Additive and backward-compatible. All new options default off; with none set the bundle output is unchanged. Requesting code or corpus chunks builds a full-content index internally, but the returned codeIndex stays slim unless slimCodeIndex: false

0.8.11 - 2026-06-09

Added

  • Incremental code indexing: CodeIndexStore.update(changedFiles?) (and ensureFresh()) are now genuinely incremental instead of full rebuilds. They diff the cache manifest against disk, re-parse only the changed files plus the files whose import/call resolution those changes can affect, then splice the new inputs into the existing graph and reassemble. The result is golden-equivalent to a from-scratch rebuild (src/code/store.ts, src/code/incremental.ts)
  • New @cesarandreslopez/occ/code/incremental subpath exposing the incremental primitives: computeManifestDiff (manifest-vs-discovery diff), classifyChangedFiles (discovery-equivalent filtering incl. excludeDir/dotfiles/symlinks and case-only renames), findResolutionImpactedFiles, and spliceIndexInputs, plus the ManifestDiff / ManifestEntry / ChangedFileStat types (re-exported from the facade)
  • parseCodeFiles() (bounded-concurrency parse pool) and assembleCodebaseIndex() split out of buildCodebaseIndex, with DEFAULT_PARSE_CONCURRENCY and the ParseCodeFilesOptions / ParseCodeFilesResult / AssembleCodebaseIndexInput types. Exposed from the facade and @cesarandreslopez/occ/code/build
  • rankNodesCached() (src/code/rank.ts) — per-index PageRank cache keyed by a mutation fingerprint; buildRepoMap reuses it across focused-map calls so repeated maps over the same index skip recomputation. Exposed as createOcc().code.rankCached
  • chunkDocumentFromMarkdown() (+ ChunkDocumentFromMarkdownOptions) — chunk pre-converted markdown without re-parsing the source document. Exposed as createOcc().doc.chunkFromMarkdown
  • CodeQuerySession.chunk({ files }) scopes code chunking to a set of changed files
  • Workspace bundle (bundleWorkspace) gains concurrentPhases (default on, opt-out) — describe/analyze/documents/code phases run in parallel with canonical error ordering and abort-responsive progress — plus cacheDir / cacheStrategy / store options that reuse the persistent index cache instead of rebuilding in a subprocess on every call
  • mapWithConcurrency() (src/utils.ts) — ordered, abort-aware, rejection-halting concurrency utility
  • PROGRESS_PHASES / isProgressPhase and the prepareProgressToProgressEvent adapter are now exported from the facade; prepare progress events carry standard phase / currentPath fields
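
To illustrate what "ordered, abort-aware, rejection-halting" means for a concurrency utility, here is a minimal sketch. It is not the source of mapWithConcurrency(): the name mapWithConcurrencySketch, its parameter order, and the structural AbortLike type are assumptions made for the example.

```typescript
// Structural stand-in for AbortSignal so the sketch has no platform type dependency.
type AbortLike = { readonly aborted: boolean };

async function mapWithConcurrencySketch<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
  signal?: AbortLike,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  let failed = false;

  const worker = async (): Promise<void> => {
    // Stop picking up new work once any task has rejected.
    while (!failed) {
      if (signal?.aborted) {
        failed = true;
        throw Object.assign(new Error('Operation aborted'), { name: 'AbortError' });
      }
      const index = next++;
      if (index >= items.length) return;
      try {
        // Writing by index keeps results in input order.
        results[index] = await fn(items[index], index);
      } catch (error) {
        failed = true;
        throw error;
      }
    }
  };

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Tasks already in flight when a rejection happens are allowed to settle; the sketch only guarantees that no further items are started.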

Changed

  • Lazy parser loading: mammoth, xlsx, officeparser, pdf-parse, and turndown now load on first use rather than at import, cutting facade import time to ~50ms; a failed dependency load surfaces an actionable diagnostic instead of a raw module error
  • buildCodebaseIndex runs a bounded-concurrency parse pool with parallel stat passes and an operation mutex on the store, so concurrent index operations are serialized safely
  • splitByTokenBudget is now O(n) with separator-aware counts, so the per-chunk budget invariant holds under BPE tokenizers
  • Bumped the pdf-parse floor to ^1.1.4 (1.1.1 crashes under ESM dynamic import)
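
The lazy-loading pattern can be sketched as a memoized loader that converts a failed load into a more actionable error. The lazyDependency helper and the wording of the diagnostic are hypothetical; the changelog only states that loading is deferred and that failures produce an actionable message.

```typescript
// Wrap a loader so the dependency is loaded once, on first use.
function lazyDependency<T>(packageName: string, load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = load().catch((cause: unknown) => {
        cached = undefined; // allow a retry once the install has been repaired
        const detail = cause instanceof Error ? cause.message : String(cause);
        throw new Error(
          `Could not load parser dependency "${packageName}". ` +
            `Try reinstalling dependencies. (${detail})`,
        );
      });
    }
    return cached;
  };
}

// Usage idea: nothing is imported until the first call.
// const loadTurndown = lazyDependency('turndown', () => import('turndown'));
```

Caching the promise rather than the resolved module means concurrent first calls share a single load.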

Fixed

  • Cache manifests now record guardrail-skipped files (minified / oversized), so ensureFresh() no longer rebuilds the index forever on repositories that contain them
  • Ref-counted the console.log suppression used during PDF parsing, fixing a permanent monkey-patch leak when multiple PDFs are parsed concurrently
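
The console.log fix is easier to follow with a sketch of reference-counted suppression. With a naive save-and-restore per caller, a second concurrent caller saves the already-patched no-op as the "original" and restores it last, leaving the patch in place permanently. The suppressConsoleLog helper below is hypothetical and only illustrates the counting idea.

```typescript
let suppressDepth = 0;
let savedLog: typeof console.log | undefined;

// Returns a release function; console.log is restored only when the last holder releases.
function suppressConsoleLog(): () => void {
  if (suppressDepth === 0) {
    savedLog = console.log;
    console.log = () => {};
  }
  suppressDepth++;

  let released = false;
  return () => {
    if (released) return; // releasing twice must not decrement twice
    released = true;
    suppressDepth--;
    if (suppressDepth === 0 && savedLog) {
      console.log = savedLog;
      savedLog = undefined;
    }
  };
}
```

Callers would typically invoke the release function in a finally block so that a thrown parse error still restores logging.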

Migration notes

  • Additive and backward-compatible. update() / ensureFresh() keep the same signatures and simply do less work; the default tokenizer, output formats, and CLI surface are unchanged. concurrentPhases defaults on but can be disabled for strictly sequential progress. The only dependency change is the pdf-parse floor bump

0.8.10 - 2026-05-29

Added

  • Query- and path-focused repo maps for occ code map / occ code pack. New flags --query <text>, --focus-path <path> (repeatable), and --focus-depth <n> (default 1) bias the map toward task-relevant files instead of only the globally most-depended-upon ones. The global weighted PageRank (rankNodes) is unchanged; when a focus is active, buildRepoMap computes a per-file relevance score and blends it with the structural rank (combined = global * 0.35 + focus * 0.65) to re-order admission within the token budget. --query matches normalized paths, symbol names, signatures, jsdoc, and (capped) excerpt/content; --focus-path boosts exact-file and directory matches plus their import/call/inheritance graph neighbors up to --focus-depth
  • computeFocusScores() (src/code/focus.ts) — the dependency-free scorer behind focus mode (query + path + undirected graph-neighbor signals, keyed by absolute path)
  • RepoMapOptions.focus ({ query?, paths?, graphDepth? }) plus additive result metadata: RepoMapResult.focus ({ query?, paths, matchedFiles }) and per-entry focusScore / focusReasons. Exposed through createOcc().code.map(index, { focus }) and re-exported as RepoMapFocus / RepoMapFocusResult from @cesarandreslopez/occ
  • Focus metadata is surfaced in every renderer — markdown/plain headers, <repomap> focusQuery/focusPaths/focusMatched attributes and per-<file> focusScore, and JSON
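
The blend stated above (combined = global * 0.35 + focus * 0.65) can be tried out with a few lines. This is a sketch only: the ScoredFile type, the example paths, and the assumption that both scores are normalized to the range 0..1 are illustrative, not taken from buildRepoMap.

```typescript
const GLOBAL_WEIGHT = 0.35;
const FOCUS_WEIGHT = 0.65;

// Hypothetical per-file record holding the structural rank and the focus relevance.
type ScoredFile = { path: string; global: number; focus: number };

function blendScore(file: ScoredFile): number {
  return file.global * GLOBAL_WEIGHT + file.focus * FOCUS_WEIGHT;
}

// Highest blended score first; the input array is left untouched.
function orderByBlendedScore(files: readonly ScoredFile[]): ScoredFile[] {
  return [...files].sort((a, b) => blendScore(b) - blendScore(a));
}

const orderedForFocus = orderByBlendedScore([
  { path: 'src/core.ts', global: 0.9, focus: 0.0 }, // blended: about 0.315
  { path: 'src/auth/login.ts', global: 0.2, focus: 0.8 }, // blended: about 0.59
]);
```

In this example the focused file overtakes the globally central one, which is the re-ordering effect the focus flags are meant to produce.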

Migration notes

  • Purely additive and opt-in. With no focus flags (or an empty focus selector) the map is byte-for-byte identical to before, and the global PageRank remains the default for broad-overview use. The new fields are optional, so existing programmatic callers are unaffected

0.8.9 - 2026-05-29

Added

  • occ code map and occ code pack CLI subcommands (src/code/command.ts) — token-budgeted, importance-ranked repository maps. map emits the top symbol signatures per file (Aider-style, cheap structural context); pack emits file content, optionally compressed to the architecturally-significant section. Both rank the graph (see below), then greedily admit the highest-ranked files until the token budget is hit, shrinking content or shedding low-rank symbols (partial admission) instead of dropping a file that nearly fits. Flags: --map-tokens <n> (alias --token-budget, default 4096), --mode map|pack, --map-format markdown|xml|json|plain, --compress (pack mode), --no-bias-exports, --max-symbols <n>, and --tokenizer heuristic|o200k_base|cl100k_base
  • buildRepoMap() and the RepoMapResult / RepoMapEntry / RepoMapSymbol types (src/code/map.ts) — the programmatic core behind occ code map/pack, with an injectable countTokens so a real tokenizer can enforce an exact budget. Exposed as createOcc().code.map and re-exported from @cesarandreslopez/occ
  • rankNodes() (src/code/rank.ts) — weighted PageRank over imports/calls/inherits/implements edges, with per-pair call-weight capping and an exported-symbol seed bias, rolling per-symbol scores up into file scores (RankResult / RankedNode / RankedFile). Exposed as createOcc().code.rank and re-exported from @cesarandreslopez/occ
  • Pluggable token counting (src/tokens.ts) — a Tokenizer interface with HeuristicTokenizer (zero-dependency, language-aware default) and BpeTokenizer (lazily loads gpt-tokenizer, caches encoding tables per encoding, no global state). createTokenizer(name) and resolveTokenizerName(value) resolve heuristic / o200k_base / cl100k_base. New runtime dependency gpt-tokenizer@^3.4.0
  • renderRepoMap() (src/code/output.ts) — renders a RepoMapResult as markdown, xml, json, or plain
  • test/code-map.test.ts and test/tokenizer.test.ts — cover PageRank convergence/normalization/ranking order/export bias, greedy budget enforcement and truncation, repo-map format rendering, and the heuristic/BPE tokenizers (including an exact o200k_base budget case)
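
The greedy admission with partial admission described for occ code map/pack can be sketched as follows. This is a simplified illustration, not buildRepoMap(): the types, the admitWithinBudget name, and the cost model (path plus per-symbol token counts, with symbols assumed to be pre-sorted by rank) are all assumptions.

```typescript
type RankedFileSketch = { path: string; score: number; symbols: string[] };
type AdmittedFile = { path: string; symbols: string[]; truncated: boolean };

// Heuristic estimate (about four characters per token); a real tokenizer can be injected.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function admitWithinBudget(
  files: readonly RankedFileSketch[],
  budget: number,
  countTokens: (text: string) => number = estimateTokens,
): AdmittedFile[] {
  const admitted: AdmittedFile[] = [];
  let used = 0;
  const byRank = [...files].sort((a, b) => b.score - a.score);

  for (const file of byRank) {
    const kept: string[] = [];
    let cost = countTokens(file.path);
    for (const symbol of file.symbols) {
      const next = countTokens(symbol);
      if (used + cost + next > budget) break; // shed the remaining lower-rank symbols
      kept.push(symbol);
      cost += next;
    }
    if (kept.length === 0) continue; // nothing from this file fits
    admitted.push({ path: file.path, symbols: kept, truncated: kept.length < file.symbols.length });
    used += cost;
  }
  return admitted;
}
```

A file that nearly fits is admitted with fewer symbols and marked truncated instead of being dropped outright.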

Changed

  • The default-scan token budgeter (applyTokenBudget in src/cli.ts) is now async and tokenizer-pluggable, and the default scan gained a top-level --tokenizer heuristic|o200k_base|cl100k_base flag so --token-budget truncation can use an exact BPE count instead of the heuristic
  • The import-DAG checker (scripts/check-imports.mjs) now permits the Layer 3 code module to depend on the Layer 0 tokens module, keeping the architecture invariant satisfied for budget-accurate repo maps

Migration notes

  • Purely additive: existing callers see no behavior change. The new commands, flags, and facade methods (createOcc().code.map, createOcc().code.rank) are opt-in, and the default tokenizer remains the zero-dependency heuristic — pass --tokenizer o200k_base (or cl100k_base) only when you want an exact BPE-budgeted map

0.8.8 - 2026-05-08

Added

  • bundleWorkspace() (src/workspace/bundle.ts) — single-call workspace bundle that fans out across describeWorkspace, analyzeWorkspace, inspectWorkspaceDocumentSet, previewCodebaseSize, buildCodebaseIndexIsolated, and chunkDocument, returning a versioned WorkspaceBundle (schemaVersion: 1) with description, analysis, documents, codePreview, codeIndex (slim by default), documentChunks, a hierarchical outline (root → projects → top modules → documents → sections), per-symbol codeDocumentReferences (regex symbol matches in markdown content), and a unified errors array. Configurable via includeDescription/includeAnalysis/includeDocuments/includeCode/includeDocumentChunks/includeCodeDocumentReferences/maxDocumentFiles/maxReferenceFiles/maxCodeFiles/maxCodeBytes/maxDocumentChunks/maxCodeDocumentReferences/slimCodeIndex/contentMode and the existing ignore/overlay knobs. Exposed as createOcc().workspace.bundle and on the @cesarandreslopez/occ/workspace/bundle subpath
  • chunkDocument() (src/doc/chunk.ts) — heading-aware, token-budgeted document chunker. Converts DOCX/PDF/PPTX/XLSX/ODT/ODS/ODP/MD/MDX/TXT/RST/AsciiDoc to markdown, splits along the heading tree from extractFromMarkdown, and packs each section into chunks under maxTokens (default 800) with configurable overlapTokens (default 80) and an injectable countTokens. Each DocumentChunk carries chunkId, anchor slug, headingPath, startLine/endLine, tokenEstimate, and wordCount. Exposed as createOcc().doc.chunk and on the @cesarandreslopez/occ/doc/chunk subpath
  • summarizeModule() and toMermaid() (src/code/query.ts) — summarizeModule(index, modulePath, { maxClasses, maxFunctions, maxEdges }) returns a ModuleSummary (coupling + key classes + key functions ranked by call activity + import edges + exported API). toMermaid(index, kind, target, { maxNodes, maxEdges }) renders Mermaid diagrams for 'import-graph', 'class-hierarchy', and 'call-graph'. Both methods are also exposed on CodeQuerySession and re-exported from @cesarandreslopez/occ. analyzeModuleCoupling now treats '.' / '' as "the whole repository" and additionally matches by relativePath exact equality and moduleName, fixing single-file modules and root-module queries
  • tsconfig.json / jsconfig.json compilerOptions.paths resolution for TS/JS/Vue imports (src/code/languages.ts:resolveTsconfigImport) — wildcard and exact patterns, baseUrl honored, results cached per repo. Imports with ? / # query/fragment suffixes are stripped before resolution (Vite-style ?raw, ?url, ?worker). Asset specifiers (.css, .scss, .svg, .png, .wasm, ...) are now classified as external rather than unresolved, removing a long tail of false unresolved import edges from frontend repos
  • Symbol position metadata on parsed symbols and graph nodes — ParsedSymbol/CodeNode now carry optional endLine, startColumn, endColumn (TS compiler API powered for TS/JS/Vue). Useful for IDE-style navigation and slicing source ranges out of content: 'full' indexes
  • NormalizedSymbolKind union ('file' | 'module' | 'function' | 'method' | 'class' | 'interface' | 'type' | 'enum' | 'variable' | 'parameter' | 'other') and toNormalizedSymbolKind(type, containerName) helper — folds function with a container into method, type-alias into type, etc., for downstream consumers that want LSP-style symbol kinds without re-implementing the mapping. Exported from the facade
  • Builtin call-noise filter (src/code/build.ts:isBuiltinNoiseCall) — drops false-positive calls edges to standard-library globals (Array, JSON, Promise, console, setTimeout, fetch, ...) and ubiquitous member methods (map, filter, then, push, forEach, ...) when no local symbol shadows them. Calls qualified by this/self/cls/super are preserved. Significantly reduces graph noise in TS/JS repos
  • health() (src/health.ts) — lightweight liveness probe returning { available, version, capabilities }. Exposed as createOcc().health and on the @cesarandreslopez/occ/health subpath
  • OccAbortError / OCC_ABORTED / isOccAbortError (src/errors.ts) — typed abort error replacing the previous DOMException('...', 'AbortError') usage in abortIfNeeded, buildCodebaseIndexIsolated, and prepareWorkspaceContext. The new error keeps name: 'AbortError' for duck-typed compatibility, but adds error.code === 'OCC_ABORTED' and survives instanceof across forked subprocesses (where DOMException does not). Exposed on the @cesarandreslopez/occ/errors subpath
  • Document discovery now includes prose formats by default — discoverDocumentSet() / discoverDocuments() accept md, markdown, mdx, txt, rst, adoc, asciidoc alongside the seven office formats, and the new includeDataFiles: true flag also pulls in yaml, yml, json, jsonc, toml. documentToMarkdown() reads these raw text formats directly. inspectWorkspaceDocumentSet and bundleWorkspace forward the flag, so workspaces with markdown-only docs are no longer empty
  • New discoverDocumentSet() API alongside discoverDocuments() — same signature, but returns { documents, skipped } where skipped carries { path, reason, size } entries for over-size files (including the actual byte count) and EACCES/other I/O failures (previously silently dropped)
  • tryParseSlimIndex(value) — non-throwing variant of parseSlimIndex returning CodebaseIndexSlim | undefined, for IPC and persistence boundaries where validation is best-effort. Exported from the facade and @cesarandreslopez/occ/code/slim
  • CodeIndexStore.update(changedFiles?, options?): explicit refresh hook (currently equivalent to refresh(); it takes a changedFiles argument for forward compatibility). This lets consumers signal incremental change without re-deriving freshness from manifests
  • ProgressPhase gains 'bundle', 'description', 'stats', 'code-preview', 'document-chunk', and 'outline' for the new workspace bundle pipeline. ProgressEvent gains optional scope, currentPath, bytesProcessed, totalBytes, startedAt, and elapsedMs fields so progress consumers can render rich UIs without needing a sidecar event stream. inspectWorkspaceDocumentSet and the workspace pipeline now emit currentPath on every event
  • sanitizeForkExecArgv() (src/utils.ts) and integration in buildCodebaseIndexIsolated and prepareWorkspaceContext — strips --input-type from process.execArgv before forking the JS runners, so OCC works under loaders like tsx / node --import tsx that set --input-type=module on the parent (which previously crashed the child runner with Module did not self-register)
  • Programmatic subpath exports (with matching typesVersions): @cesarandreslopez/occ/doc/chunk, @cesarandreslopez/occ/workspace/bundle, @cesarandreslopez/occ/table/types, @cesarandreslopez/occ/health, @cesarandreslopez/occ/errors
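
The execArgv sanitizing mentioned above can be sketched as a small filter. The stripInputType function is a hypothetical stand-in for sanitizeForkExecArgv(), whose exact behavior is not shown here; handling the space-separated form is an extra precaution assumed for the example.

```typescript
// Remove --input-type (in either "--input-type=module" or "--input-type module" form)
// from an argument list before it is passed on to a forked child process.
function stripInputType(execArgv: readonly string[]): string[] {
  const result: string[] = [];
  for (let i = 0; i < execArgv.length; i++) {
    const arg = execArgv[i];
    if (arg === '--input-type') {
      i++; // also skip the separate value that follows
      continue;
    }
    if (arg.startsWith('--input-type=')) continue;
    result.push(arg);
  }
  return result;
}

// Usage idea with Node's child_process.fork (runnerPath is a placeholder):
// fork(runnerPath, [], { execArgv: stripInputType(process.execArgv) });
```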

Changed

  • src/errors.ts is a new Layer 0 module and src/health.ts is a new top-level module; both registered in the import-DAG checker (scripts/check-imports.mjs) so the architecture invariant continues to hold (Checked 86 files, 0 violations)
  • analyzeModuleCoupling widens "module belongs to this path" matching beyond dirPrefix — exact relativePath equality and moduleName equality are now also accepted, fixing coupling reports for top-level single-file modules
  • inspectWorkspaceDocumentSet switched off findFiles and now consumes discoverDocumentSet directly, picking up the new prose/data formats, the skipped reporting, and the per-event currentPath enrichment without behavior change for existing callers (default still 50 docs, includeMarkdown still defaults to false)

Migration notes

  • Existing callers see no behavior change: discoverDocuments() keeps its array shape, all new options are opt-in, and the import-DAG plus type-check plus 208-test suite pass. To opt into the new bundle/chunk/health/errors paths, use the named exports from the facade (createOcc().workspace.bundle, createOcc().doc.chunk, createOcc().health) or the new subpath exports
  • Code that checks error instanceof DOMException to detect aborts should switch to isOccAbortError(error), or to checking error.name === 'AbortError' / error.code === 'OCC_ABORTED'. The previous DOMException instances would have failed instanceof across subprocess boundaries anyway
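
For callers that cannot import the helper, the duck-typed check from the migration note can be written by hand. The looksLikeAbort function below is a hypothetical stand-in that only mirrors the two properties named above; prefer the exported isOccAbortError where it is available.

```typescript
// Duck-typed abort check: works on plain objects, so it survives process boundaries
// where instanceof checks against a class would fail.
function looksLikeAbort(error: unknown): boolean {
  if (typeof error !== 'object' || error === null) return false;
  const candidate = error as { name?: unknown; code?: unknown };
  return candidate.name === 'AbortError' || candidate.code === 'OCC_ABORTED';
}
```

A typical call site would catch the error, return quietly when looksLikeAbort(error) is true, and rethrow otherwise.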

0.8.7 - 2026-05-08

Added

  • Vue Single-File Component support — new vue parser type registered alongside typescript/python/go/rust/generic. parseCodeFile now extracts <script>/<script setup> blocks via @vue/compiler-sfc, parses them as TypeScript with original line offsets preserved, and surfaces the SFC as an exported component class (named from defineOptions({ name }), the name: option, or filename PascalCase). Local-import resolution learned .vue and index.vue candidates
  • previewCodebaseSize() (src/code/preview.ts) — discovers and stats files without parsing to estimate codebase size by language and report exceedsBudget against maxFiles/maxBytes thresholds. Exposed on the facade as code.previewSize and re-exported from @cesarandreslopez/occ and @cesarandreslopez/occ/code/preview
  • buildCodebaseIndexIsolated() (src/code/isolated.ts + src/code/isolated-runner.ts) — runs buildCodebaseIndex in a forked subprocess, streams progress over IPC, and returns the result via a sectioned NDJSON tmp file (avoiding structured-clone of large indexes). Forwards AbortSignal through to the child. Exposed as code.buildIndexIsolated on the facade
  • Slim index variant (src/code/slim.ts) — slimifyIndex(index) produces a CodebaseIndexSlim with contentMode: 'none' (drops content/lines/excerpt from every parsed file and rewrites capabilities[*].content to false). Pair with parseSlimIndex(value) / validateSlimIndex(value) to round-trip across boundaries. buildCodebaseIndexIsolated({ slim: true }) returns a slim index directly. Exposed under @cesarandreslopez/occ/code/slim
  • Code-index budget controls on BuildCodebaseOptions and CodeIndexStoreOptions: maxFiles, maxBytes, and onBudgetExceeded: 'throw' | 'truncate'. 'throw' (default) raises the new CodeIndexBudgetExceededError (code OCC_CODE_INDEX_BUDGET_EXCEEDED) carrying a structured budget field; 'truncate' keeps as many files as fit and reports the result via the new optional index.truncated: IndexTruncation field (reason, keptFiles, droppedFiles, totalFiles, totalBytes). The fingerprint hash now includes these fields so cached indexes invalidate on budget changes
  • Token-based chunking on chunkCodebase() and chunkFromIndex(): CodeChunkOptions now accepts maxTokens, overlapTokens, and countTokens alongside the existing word-based knobs. The default token estimator is Math.ceil(length / 4); pass countTokens for a tokenizer-accurate count
  • openChunkCodeIndexStore(options) (alias openChunkStore) — convenience factory that opens a CodeIndexStore pinned to contentMode: 'full' so chunkFromIndex() works without re-specifying the mode. Exposed as code.openChunkStore on the facade
  • fusedSearch excerpt expansion — node excerpts now include up to 21 surrounding lines (~600 chars) instead of a single 140-char line, with blank lines collapsed and a fallback to file excerpt or signature
  • workspace describe enrichments — WorkspaceDescriptionProject now reports entryPoints, scripts, buildSystem (vite/webpack/turbo/nx/tsup/rollup/esbuild/tsc), testFramework (vitest/jest/mocha/ava/tap/playwright/cypress), and platforms (electron/tauri/mobile/capacitor). signals gained hasCode/hasDocuments/hasOfficeDocuments/hasTables/hasNotebooks presence flags. WorkspaceDescription.recommendedCalls (typed by the new WorkspaceRecommendedCall schema) suggests programmatic facade calls keyed to the detected primary type — e.g., code.previewSize + code.buildIndexIsolated for coding projects
  • Strongly typed ProgressPhase union in src/progress-event.ts: ProgressEvent.phase is now a ProgressPhase (no longer string), enumerating every phase emitted across the build, chunk, store, workspace, and inspect pipelines. Exported from the facade
  • @cesarandreslopez/occ/code/preview, @cesarandreslopez/occ/code/isolated, and @cesarandreslopez/occ/code/slim programmatic subpath exports in package.json (with matching typesVersions paths) for downstream consumers that want narrow imports
  • test/contextful-integration.test.ts — integration tests covering previewCodebaseSize, buildCodebaseIndexIsolated (full + slim), CodeIndexBudgetExceededError, truncation behavior, slim round-trip via parseSlimIndex, openChunkCodeIndexStore, fused-search excerpt expansion, and describeWorkspace recommended-call output
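
The default estimator quoted above, Math.ceil(length / 4), and the idea of an injectable countTokens can be sketched together. The packLines helper is hypothetical and much simpler than the real chunkers: it has no overlap handling and lets a single oversized line form its own chunk.

```typescript
type CountTokens = (text: string) => number;

// The default estimate named in the changelog: roughly four characters per token.
const defaultCountTokens: CountTokens = (text) => Math.ceil(text.length / 4);

// Pack lines into chunks whose joined text stays within maxTokens where possible.
function packLines(
  lines: readonly string[],
  maxTokens: number,
  countTokens: CountTokens = defaultCountTokens,
): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  for (const line of lines) {
    // Count the joined candidate so separators are included in the budget.
    const candidate = [...current, line].join('\n');
    if (current.length > 0 && countTokens(candidate) > maxTokens) {
      chunks.push(current.join('\n'));
      current = [line];
    } else {
      current.push(line);
    }
  }
  if (current.length > 0) chunks.push(current.join('\n'));
  return chunks;
}
```

Passing a tokenizer-backed countTokens in place of the default changes where the chunk boundaries fall without touching the packing logic.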

Changed

  • chunkFromIndex() error message now points to the new openChunkStore(...) factory when the index was built with a non-full content mode
  • @vue/compiler-sfc@^3.5.34 is a new runtime dependency required by the Vue SFC parser

Migration notes

  • Existing callers see no behavior change: budget controls default to no limit, chunking still defaults to word-based sizing, and code indexes built without maxFiles/maxBytes keep the same fingerprint as before. To opt into the new isolation/slim/preview paths, use the named exports from the facade (createOcc().code.{previewSize,buildIndexIsolated,openChunkStore,slimifyIndex}) or the new subpath exports

0.8.6 - 2026-04-27

Added

  • occ describe [directories...] top-level CLI command and occ workspace describe [rootDir] nested command for fast directory classification — identifies whether a path is a code, office, documentation, data, or mixed workspace using manifest signals and file inventory, with confidence levels and nested project detection
  • src/workspace/describe.ts module exporting describeWorkspace() and supporting classification helpers
  • src/workspace/describe-output.ts tabular and JSON formatters for WorkspaceDescription
  • WorkspaceDescription type and classification confidence levels in src/workspace/types.ts
  • ./workspace/describe programmatic export in package.json for downstream consumers
  • test/workspace-describe.test.ts covering React/Vite classification, mixed-workspace detection, multi-category folders, and .gitignore respect

0.8.5 - 2026-04-23

Security

Fixed

  • extractOfficeText helper absorbs officeparser 6.0 → 6.1's undocumented breaking change (parseOffice() now returns a ParsedOffice object with toText() instead of a plain string). Without this shim the officeparser bump would have regressed PPTX/ODT/ODS/ODP text extraction across occ, occ doc inspect, occ slide inspect, occ table inspect, and the markdown converter
  • SECURITY.md supported-versions table refreshed — the old table still listed 0.5.x / 0.4.x while the current supported line is 0.8.x

Changed

  • Transitive majors pulled in by officeparser@6.1.0: tesseract.js 6.0.1 → 7.0.0, pdfjs-dist 5.4.530 → 5.6.205. CLI and programmatic output verified unchanged against the standard fixture suite
  • package.json description and keywords refreshed to reflect the current scope (document metrics + structure + inspection + table extraction + code exploration + workspace analysis) — the old description still described the 0.1.x "scc-style summary tables" tool

0.8.4 - 2026-04-14

Added

  • overlayIgnorePatterns option on all discovery and index APIs (findFiles, discoverCodeFiles, buildCodebaseIndex, openCodeIndexStore, discoverDocuments, inspectWorkspaceDocuments, inspectWorkspaceDocumentSet, analyzeWorkspace, prepareWorkspaceContext) — overlay patterns are evaluated independently from base gitignore/caller patterns, so negation rules in overlays cannot reopen files excluded by the base matcher
  • overlayIgnorePatterns included in buildOptionFingerprint to prevent stale cache hits when overlay patterns change
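
Why independent evaluation matters can be shown with a deliberately simplified matcher. The functions below are hypothetical and support only literal paths, directory prefixes, and "!" negation; they are not createIgnoreMatcher and do not implement full gitignore semantics.

```typescript
// Simplified gitignore-style evaluation: the last matching rule wins and "!" negates.
function isIgnoredBy(patterns: readonly string[], path: string): boolean {
  let ignored = false;
  for (const raw of patterns) {
    const negated = raw.startsWith('!');
    const pattern = negated ? raw.slice(1) : raw;
    const prefix = pattern.endsWith('/') ? pattern : `${pattern}/`;
    if (path === pattern || path.startsWith(prefix)) ignored = !negated;
  }
  return ignored;
}

// Base and overlay are evaluated separately, so a file is excluded if either
// list excludes it; an overlay negation cannot undo a base exclusion.
function isExcluded(
  path: string,
  basePatterns: readonly string[],
  overlayPatterns: readonly string[],
): boolean {
  return isIgnoredBy(basePatterns, path) || isIgnoredBy(overlayPatterns, path);
}
```

If the two lists were concatenated into one, an overlay rule such as !dist/keep.js would reopen a file that the base rule dist/ had excluded; evaluating them separately prevents that.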

Changed

  • createIgnoreMatcher refactored to use applyIgnoreResult helper for cleaner ignore evaluation logic

0.8.3 - 2026-04-14

Added

  • --ignore-pattern <pattern> CLI flag for gitignore-style exclusion patterns across all command families (occ, occ code, occ doc references, occ workspace)
  • --content-mode <mode> flag on occ code commands to control indexed content retention (none, excerpt, full)
  • --max-file-size-bytes, --max-lines, and --no-skip-minified guardrail flags for occ code commands
  • ContentMode, CodeSkippedFile, and CodeExcerpt types in the code exploration type system
  • NDJSON-based sectioned index I/O replacing monolithic JSON serialization for code indexes, avoiding V8 string-length limits on large codebases
  • dispose() and clearCache() methods on CodeIndexStore for explicit lifecycle control
  • AbortSignal support on buildCodebaseIndex, discoverCodeFiles, and prepareWorkspaceContext (including subprocess abort forwarding)
  • Minified-file detection heuristic — skips likely minified source files during code indexing by default
  • Scoped .gitignore support — nested .gitignore files (not just root) are now respected during file discovery across all command families
  • ignorePatterns, contentMode, maxFileSizeBytes, maxLines, skipMinified, and signal options on WorkspacePrepareOptions

Changed

  • Code index graph construction uses nodeById map for O(1) external-node lookups, replacing O(n) .some()/.find() scans
  • CodebaseIndex type now includes contentMode and skippedFiles fields; CodeCommandPayload.stats includes skippedFiles count
  • Store cache layout changed to <cacheRoot>/<contentMode>/<fingerprint>/ with NDJSON files, replacing single index.json
  • Store persistCache uses atomic temp-dir-then-rename writes to prevent corruption from interrupted builds
  • prefer-cache and ensure-fresh store strategies return existing in-memory sessions when available, avoiding redundant cache loads
  • Subprocess prepare runner uses sectioned NDJSON I/O instead of manual streamIndexToFile / readFileSync

Fixed

  • buildOptionFingerprint now includes ignorePatterns, contentMode, maxFileSizeBytes, maxLines, and skipMinified, preventing stale cache hits when these options change
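
The stale-cache problem comes down to which options feed the cache key. A minimal sketch of an order-independent option fingerprint follows; stableStringify, fnv1a, and fingerprintOptions are hypothetical names, and the non-cryptographic FNV-1a hash is used only to keep the example dependency-free. It is not a claim about what buildOptionFingerprint uses.

```typescript
// Serialize with sorted keys so that property order does not change the result.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((item) => stableStringify(item)).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>)
      .filter(([, v]) => v !== undefined)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`).join(',')}}`;
  }
  return JSON.stringify(value) ?? 'null';
}

// 32-bit FNV-1a over UTF-16 code units, rendered as 8 hex digits.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

function fingerprintOptions(options: Record<string, unknown>): string {
  return fnv1a(stableStringify(options));
}
```

Any option left out of the serialized object cannot influence the key, which is exactly how a changed contentMode or ignorePatterns value could previously hit a stale cache entry.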

0.8.2 - 2026-04-08

Changed

  • Code index onProgress now reports per-file progress during index construction instead of a single completion message

0.8.1 - 2026-04-08

Fixed

  • prepareWorkspaceContext subprocess mode now streams the code index to disk element-by-element, avoiding V8's ~512MB JSON.stringify string length limit on large codebases
  • Subprocess result file is read as a Buffer before JSON.parse, preventing string-length crashes on the read side
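
The write-side fix can be illustrated with an element-by-element JSON array writer. This is a sketch under assumptions: writeJsonArray and its write callback are hypothetical, and items are assumed to be JSON-serializable. The point is that only one element is stringified at a time, so no single string has to hold the whole index.

```typescript
// Emit a JSON array piece by piece through a caller-supplied sink.
function writeJsonArray<T>(items: Iterable<T>, write: (chunk: string) => void): void {
  write('[');
  let first = true;
  for (const item of items) {
    if (!first) write(',');
    write(JSON.stringify(item)); // one element at a time, never the whole array
    first = false;
  }
  write(']');
}

// Usage idea with Node's fs module (fd is an already opened file descriptor):
// writeJsonArray(nodes, (chunk) => fs.writeSync(fd, chunk));
```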

0.8.0 - 2026-04-08

Added

  • prepareWorkspaceContext API — combines code indexing and document inspection in a single call with subprocess/inline execution modes and progress callbacks
  • New subpath exports: ./workspace/prepare and ./workspace/prepare-types
  • onProgress callback parameter on inspectWorkspaceDocumentSet for document inspection progress tracking
  • workspace.prepare method on the createOcc() facade

Changed

  • workspace module promoted to layer 4 in the dependency DAG (above code and inspect-commands at layer 3)

0.7.0 - 2026-04-03

Added

  • createOcc() programmatic facade — namespace-based entry point (occ.code, occ.doc, occ.sheet, occ.slide, occ.workspace) re-exporting all public APIs from a single import
  • openCodeIndexStore — persistent code index store with three cache strategies (prefer-cache, ensure-fresh, rebuild), manifest-based freshness checks, abort signal support, and progress callbacks
  • createCodeQuerySessionFromIndex — create a code query session from a pre-built index without re-building
  • New subpath export: ./code/store
  • Root "." export now points to the createOcc facade (src/index.ts) instead of the CLI entry point

Changed

  • main and types fields in package.json now point to src/index.ts (facade) instead of bin/occ.ts
  • src/code/store.ts reads version from package.json at runtime instead of hardcoding it, preventing cache-invalidation drift across releases

0.6.3 - 2026-03-27

Changed

  • typescript now ships as a direct runtime dependency so occ code and the programmatic code-exploration exports work after a normal install

Fixed

  • occ --version and other non-code command paths no longer fail when a packaged install is missing typescript; the JS/TS parser now lazy-loads it and reports a targeted reinstall error if the install is incomplete

0.6.2 - 2026-03-24

Added

  • occ workspace analyze command — workspace-level code, document, and structure analysis with a versioned JSON contract (schemaVersion: 1)
  • occ workspace documents command — per-document summaries with cross-reference and unresolved-mention detection
  • occ code analyze coupling <target> command — module-level coupling metrics (afferent/efferent coupling, instability, key classes)
  • createCodeQuerySession programmatic API — stateful session wrapping the codebase index with refresh(), all query methods, and chunking (./code/session export)
  • fusedSearch results now include excerpt, signature, containerName, and language fields for richer downstream consumers
  • inspectDocumentSummary exported as public API from doc/batch
  • New subpath exports: ./code/session, ./workspace/analyze, ./workspace/documents, ./workspace/types
  • typesVersions in package.json for CJS consumers using moduleResolution: "node"
  • main and types top-level fields in package.json
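
For readers unfamiliar with the coupling metrics named above, the conventional definition of instability is efferent / (afferent + efferent). The sketch below applies that textbook formula to a list of import edges; the couplingFor function, the ImportEdge type, and the choice to count distinct external files are assumptions, not a description of how occ code analyze coupling computes its numbers.

```typescript
type ImportEdge = { from: string; to: string };

function couplingFor(modulePrefix: string, edges: readonly ImportEdge[]) {
  const inside = (file: string): boolean => file.startsWith(modulePrefix);
  const dependents = new Set<string>(); // external files that import the module
  const dependencies = new Set<string>(); // external files the module imports
  for (const edge of edges) {
    if (inside(edge.to) && !inside(edge.from)) dependents.add(edge.from);
    if (inside(edge.from) && !inside(edge.to)) dependencies.add(edge.to);
  }
  const afferent = dependents.size;
  const efferent = dependencies.size;
  const total = afferent + efferent;
  // 0 means maximally stable (only depended upon), 1 means maximally unstable.
  return { afferent, efferent, instability: total === 0 ? 0 : efferent / total };
}
```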

Changed

  • DocumentSummaryResult wrapper now carries computed markdown content for internal reuse, avoiding redundant documentToMarkdown calls during workspace document inspection

Fixed

  • analyzeModuleCoupling now uses nodeById() map for O(1) lookups instead of O(n) .find() per edge, matching all other query functions

0.6.1 - 2026-03-16

Added

  • occ code index command — builds and emits the full codebase index (files, symbols, edges, language capabilities) as JSON or a summary line

0.6.0 - 2026-03-16

Added

  • --show-confidence flag displays confidence levels (exact or estimated) for each metric in both tabular and JSON output
  • Tabular output annotates estimated metrics with a ~ suffix and a ~ estimated metric footnote when --show-confidence is enabled
  • JSON output includes a confidence object per file row (e.g. { "words": "exact", "pages": "estimated" }) when --show-confidence is enabled
  • Confidence merging in grouped mode: if any file in a group has an estimated metric, the group's confidence for that metric is estimated
  • ./types and ./stats subpath exports in package.json — consumers can now import ConfidenceLevel, ParseResult, StatsRow, and AggregateResult directly
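
The grouped-mode merging rule above can be sketched as a simple fold: a metric is estimated for the group as soon as any file reports it as estimated. The ConfidenceLevel name comes from the exports listed above, but its definition here, the ConfidenceMap alias, and the mergeConfidence helper are assumptions made for illustration.

```typescript
// Assumed definition, based on the two levels the changelog names.
type ConfidenceLevel = "exact" | "estimated";

// Hypothetical per-file confidence object, e.g. { words: "exact", pages: "estimated" }.
type ConfidenceMap = Record<string, ConfidenceLevel>;

function mergeConfidence(rows: ConfidenceMap[]): ConfidenceMap {
  const merged: ConfidenceMap = {};
  for (const row of rows) {
    for (const [metric, level] of Object.entries(row)) {
      // "estimated" always wins; "exact" only fills a metric not seen yet.
      if (level === "estimated" || !(metric in merged)) {
        merged[metric] = level;
      }
    }
  }
  return merged;
}
```

For example, merging { words: "exact", pages: "exact" } with { words: "exact", pages: "estimated" } keeps words as exact and downgrades pages to estimated.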

0.5.1 - 2026-03-16

Fixed

  • XLSX header cells in markdown conversion now escape pipe (|) and newline characters, matching the existing data row escaping
  • npm test script uses test/*.test.ts instead of test/**/*.test.ts for Node 18 compatibility (shell ** glob requires bash globstar or Node 21+)
  • Remove duplicate countWords function in src/code/chunk.ts; now imports from shared src/utils.ts
  • Add types conditions to all package.json subpath exports so TypeScript consumers using moduleResolution: "NodeNext" resolve .d.ts files correctly
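
To illustrate the XLSX header escaping fix above: an unescaped pipe or newline in a cell breaks a Markdown table row. The sketch below is a minimal illustration, not the project's code; the helper names are hypothetical, and replacing newlines with a space is an assumption (the changelog does not say how newlines are escaped).

```typescript
// Hypothetical helper: make a cell value safe inside a Markdown table row.
function escapeMarkdownCell(value: string): string {
  return value
    .replace(/\|/g, "\\|")   // a literal pipe would otherwise start a new column
    .replace(/\r?\n/g, " "); // assumption: collapse line breaks to a space
}

// Apply the same escaping to header cells as to data cells.
function markdownHeaderRow(headers: string[]): string {
  const cells = headers.map(escapeMarkdownCell);
  const divider = cells.map(() => "---");
  return `| ${cells.join(" | ")} |\n| ${divider.join(" | ")} |`;
}
```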

0.5.0 - 2026-03-15

Added

  • exports field with subpath imports for the code exploration module (./code/build, ./code/types, ./code/query, ./code/discover) and root entry point (.) — consumers can now use clean imports instead of fragile deep paths into dist/
  • TypeScript as an optional peerDependency (>=5.0.0) — consumers using the code exploration module programmatically can provide their own TypeScript installation

Changed

  • The exports field restricts importable entry points. Consumers relying on unlisted deep imports into dist/ will need to use the declared subpath exports instead

0.4.1 - 2026-03-14

Added

  • Barrel re-export resolution: occ code analyze calls/callers/chain now resolves call targets through index.ts barrel files
  • Zod runtime schema validation across all CLI options and data types (parsers, walker, stats, output, structure, and all inspect commands)

Changed

  • Enable noImplicitReturns and noFallthroughCasesInSwitch TypeScript compiler options for stricter type safety
  • Extract shared XLSX cell utilities to src/inspect/xlsx-cells.ts
  • Remove deprecated re-export shim from src/sheet/inspect.ts

0.4.0 - 2026-03-13

Note: All features in OCC are currently experimental. This project cannot be considered stable software yet. APIs, output formats, and command interfaces may change between minor versions.

Added

  • occ table inspect <file> — extract structured table content from DOCX, XLSX, PPTX, ODT, and ODP as JSON or tabular output, with auto-detected headers, sample row limits, merged cell support, and per-table token estimates
  • occ doc inspect <file> — document metadata, risk flags, content stats, heading structure, and content preview for DOCX and ODT
  • occ slide inspect <file> — presentation metadata, risk flags, per-slide inventory, and content preview for PPTX and ODP
  • occ sheet inspect <file> — XLSX workbook preflight with sheet inventory, schema preview, risk flags, and token estimates
  • TypeScript interfaces, type aliases, enums, and implements clauses are now indexed in occ code exploration
  • Directory targets in occ code analyze deps now aggregate imports across all files in the directory

Fixed

  • Bidirectional chain analysis: occ code analyze chain now searches both directions and labels reverse paths explicitly
  • Code inheritance lookups disambiguate correctly when interfaces and classes share the same name
  • Directory dependency matches are preserved when aggregating across multiple files

0.3.1 - 2026-03-10

Fixed

  • Upgrade xlsx from 0.18.5 to 0.20.3 (official SheetJS tarball), resolving npm vulnerability (#1 — thanks @B33pBeeps)
  • Configure XLSX.set_fs(fs) for ESM compatibility with SheetJS 0.20+

0.3.0 - 2026-03-10

Added

  • occ code command family for on-demand code exploration
  • First-class JavaScript, TypeScript, and Python exploration support
  • Automated fixture-based tests for code graph queries and output contracts

Changed

  • Improved call resolution for this, super, self, cls, and imported aliases
  • Ambiguous calls and blocked call chains now surface candidate locations
  • Dependency analysis now separates local, external, and unresolved imports

0.2.0 - 2026-03-09

Added

  • Document structure extraction — new --structure flag parses heading hierarchy from DOCX, PDF, PPTX, ODT, and ODP files, displaying a navigable tree with dotted section codes (1, 1.1, 1.2, 2, ...)
  • Structure tree output in tabular mode with indented headings, dotted leaders, and page ranges (when available)
  • Structure data in JSON output under a structures key (only when --structure is used)
  • Page-to-section mapping for PDFs via [Page N] markers
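
The dotted section codes described above (1, 1.1, 1.2, 2, ...) can be derived from heading levels with a stack of counters. The sketch below is one possible approach under stated assumptions: the Heading shape and assignSectionCodes helper are hypothetical, and clamping skipped levels (for example a level-3 heading directly under level 1) to the next available depth is a choice made here, not documented OCC behavior.

```typescript
// Hypothetical heading shape for illustration only.
interface Heading {
  level: number; // 1 for top-level headings, 2 for subsections, ...
  text: string;
}

function assignSectionCodes(headings: Heading[]): string[] {
  const counters: number[] = []; // counters[i] is the current number at depth i + 1
  return headings.map((heading) => {
    // Clamp so a skipped level nests only one step deeper than its parent.
    const depth = Math.min(Math.max(1, heading.level), counters.length + 1);
    if (counters.length >= depth) {
      counters.length = depth; // drop counters of deeper, now-closed sections
      counters[depth - 1] = (counters[depth - 1] ?? 0) + 1;
    } else {
      counters.push(1); // first heading at a new, deeper level
    }
    return counters.join(".");
  });
}
```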

Changed

  • Migrated entire codebase to TypeScript — all source files under src/ and bin/ are now .ts with strict type checking
  • Added npm run build (compiles to dist/) and npm run dev (runs via tsx without build step)
  • Published package now ships compiled dist/ instead of raw src/
  • New dependency: turndown (HTML-to-markdown conversion for DOCX structure extraction)
  • New devDependencies: typescript, @types/node, tsx, @types/turndown

0.1.2 - 2026-03-07

Changed

  • Rename "Extra" column to "Details" for clarity
  • Remove redundant top/bottom table borders for cleaner output
  • Remove inter-row separators, keep only header and totals borders
  • Right-align numeric columns in document table
  • Apply consistent number coloring to all scc table columns
  • Make section header width match table width dynamically
  • Use ASCII-only dashes in section headers during --ci mode
  • Parsers return only populated metric fields instead of null-filled objects
  • Batch stat calls in walker for better throughput on large directories
  • Pass scc binary path explicitly instead of module-level state

Added

  • Summary line showing scan scope, word/page counts, and elapsed time
  • Word and page counts in summary line for at-a-glance utility
  • SHA-256 checksum verification for scc binary downloads in postinstall
  • Input validation for --large-file-limit (rejects NaN values)
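
As an illustration of the checksum verification added above, the sketch below hashes downloaded bytes with Node's built-in crypto module and compares the result against a known digest. The helper names are hypothetical and the postinstall script may be structured differently.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of the given bytes or string.
function sha256Hex(data: Uint8Array | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Hypothetical check: accept the download only if its digest matches the
// expected value (compared case-insensitively, ignoring stray whitespace).
function verifyChecksum(data: Uint8Array | string, expectedHex: string): boolean {
  return sha256Hex(data) === expectedHex.trim().toLowerCase();
}
```

A caller would typically refuse to install the binary and report an error when verifyChecksum returns false.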

Fixed

  • "No office documents found." message no longer shown when code results are present
  • Table separator width mismatch between top-mid and middle characters

0.1.1 - 2026-03-07

Changed

  • Replace ExcelJS with SheetJS (xlsx) for XLSX parsing, eliminating deprecated transitive dependencies (rimraf, fstream, inflight, lodash.isequal, glob v7)

Fixed

  • Ensure test/fixtures/ directory exists before creating test fixtures (fixes CI failure)
  • Fix workflow_dispatch trigger in docs workflow (remove invalid branches key)
  • Fix Node 22+ compatibility in release workflow (require() instead of import() with assert)
  • Update GitHub Pages deployment branch policy from master to main

0.1.0 - 2026-03-07

Added

  • CLI tool for scanning directories for office documents (DOCX, XLSX, PPTX, PDF, ODT, ODS, ODP)
  • Word count, page count, paragraph count, slide count, sheet/row/cell count extraction
  • Automatic code metrics via scc integration (vendored binary with PATH fallback)
  • Per-file (--by-file) and grouped-by-type output modes
  • JSON output (--format json) for automation
  • Extension filtering (--include-ext, --exclude-ext)
  • Directory exclusion (--exclude-dir, default: node_modules,.git)
  • .gitignore-aware file discovery (disable with --no-gitignore)
  • Sortable output (--sort: files, name, words, size)
  • File output (--output)
  • CI mode (--ci) for ASCII-only, no-color output
  • Large file skip threshold (--large-file-limit, default: 50MB)
  • Progress bar with ETA
  • Auto-download of scc binary during npm install (skip with SCC_SKIP_DOWNLOAD=1)