Changelog
This page mirrors the CHANGELOG.md in the repository.
All notable changes to this project will be documented here.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.8.12] - 2026-06-15
Added
- Workspace corpus chunks — `bundleWorkspace` can now emit RAG-ready chunks alongside the rest of the bundle. New `WorkspaceBundleOptions` fields: `includeCodeChunks` (returns `codeChunks: CodeChunk[]` from a full-content code index), `includeCorpusChunks` (returns a unified, deterministically ordered `corpusChunks: WorkspaceCorpusChunk[]` that merges code and document chunks, sorted by `relativePath`/`kind`/`chunkIndex`), `maxCodeChunks`/`maxCorpusChunks` caps, and a `codeChunk` knob (`maxTokens`/`overlapTokens`/`countTokens`/`files`) mirroring the document `chunk` option. When code chunks are requested the bundle transparently builds the index at `contentMode: 'full'` (and reuses the persistent cache when `cacheDir`/`store` is set) while still returning a slim `codeIndex` by default (`src/workspace/bundle.ts`)
- New exported `WorkspaceCorpusChunk` type (`kind: 'code' | 'document'`, with shared `chunkId`/`relativePath`/`title`/`content`/`chunkIndex` fields plus kind-specific ones like `symbolNames`/`symbolTypes`/`language` for code and `headingPath`/`anchor`/`format` for documents). Re-exported from the facade and `@cesarandreslopez/occ/workspace/bundle`
- New `'code-chunk'` progress phase in `PROGRESS_PHASES`, emitted while the bundle chunks code files
Migration notes
- Additive and backward-compatible. All new options default off; with none set the bundle output is unchanged. Requesting code or corpus chunks builds a full-content index internally, but the returned `codeIndex` stays slim unless `slimCodeIndex: false`
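The ordering described for `corpusChunks` in this release (by `relativePath`, then `kind`, then `chunkIndex`) can be pictured as a three-key comparator. The sketch below is an illustration only, not the code in `src/workspace/bundle.ts`: `CorpusChunkKey`, `compareText`, and `compareCorpusChunks` are hypothetical names, the type models only the sort keys, and the locale-independent string comparison is an assumption.

```typescript
// Simplified stand-in for WorkspaceCorpusChunk; only the sort keys are modeled.
type CorpusChunkKey = {
  relativePath: string;
  kind: "code" | "document";
  chunkIndex: number;
};

// Plain code-unit comparison keeps the order locale-independent (an assumption).
function compareText(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical comparator: relativePath first, then kind, then chunkIndex.
function compareCorpusChunks(a: CorpusChunkKey, b: CorpusChunkKey): number {
  return (
    compareText(a.relativePath, b.relativePath) ||
    compareText(a.kind, b.kind) ||
    a.chunkIndex - b.chunkIndex
  );
}

const unsortedChunks: CorpusChunkKey[] = [
  { relativePath: "src/b.ts", kind: "code", chunkIndex: 0 },
  { relativePath: "docs/a.md", kind: "document", chunkIndex: 1 },
  { relativePath: "docs/a.md", kind: "document", chunkIndex: 0 },
];

// Copy before sorting so the input order is left untouched.
const sortedChunks = unsortedChunks.slice().sort(compareCorpusChunks);
```

Because every key takes part in the comparison, two runs over the same chunk set produce the same sequence regardless of the order in which chunks were collected.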
[0.8.11] - 2026-06-09
Added
- Incremental code indexing — `CodeIndexStore.update(changedFiles?)` (and `ensureFresh()`) are now genuinely incremental instead of full rebuilds. They diff the cache manifest against disk, re-parse only the changed files plus the files whose import/call resolution those changes can affect, then splice the new inputs into the existing graph and reassemble — golden-equivalent to a from-scratch rebuild (`src/code/store.ts`, `src/code/incremental.ts`)
- New `@cesarandreslopez/occ/code/incremental` subpath exposing the incremental primitives: `computeManifestDiff` (manifest-vs-discovery diff), `classifyChangedFiles` (discovery-equivalent filtering incl. `excludeDir`/dotfiles/symlinks and case-only renames), `findResolutionImpactedFiles`, and `spliceIndexInputs`, plus the `ManifestDiff`/`ManifestEntry`/`ChangedFileStat` types (re-exported from the facade)
- `parseCodeFiles()` (bounded-concurrency parse pool) and `assembleCodebaseIndex()` split out of `buildCodebaseIndex`, with `DEFAULT_PARSE_CONCURRENCY` and the `ParseCodeFilesOptions`/`ParseCodeFilesResult`/`AssembleCodebaseIndexInput` types. Exposed from the facade and `@cesarandreslopez/occ/code/build`
- `rankNodesCached()` (`src/code/rank.ts`) — per-index PageRank cache keyed by a mutation fingerprint; `buildRepoMap` reuses it across focused-map calls so repeated maps over the same index skip recomputation. Exposed as `createOcc().code.rankCached`
- `chunkDocumentFromMarkdown()` (+ `ChunkDocumentFromMarkdownOptions`) — chunk pre-converted markdown without re-parsing the source document. Exposed as `createOcc().doc.chunkFromMarkdown`
- `CodeQuerySession.chunk({ files })` scopes code chunking to a set of changed files
- Workspace bundle (`bundleWorkspace`) gains `concurrentPhases` (default on, opt-out) — describe/analyze/documents/code phases run in parallel with canonical error ordering and abort-responsive progress — plus `cacheDir`/`cacheStrategy`/`store` options that reuse the persistent index cache instead of rebuilding in a subprocess on every call
- `mapWithConcurrency()` (`src/utils.ts`) — ordered, abort-aware, rejection-halting concurrency utility
- `PROGRESS_PHASES`/`isProgressPhase` and the `prepareProgressToProgressEvent` adapter are now exported from the facade; prepare progress events carry standard `phase`/`currentPath` fields
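The manifest-versus-disk comparison behind incremental indexing can be sketched in a few lines. This is a hedged illustration, not the implementation of `computeManifestDiff`: the changelog does not show the `ManifestEntry` shape, so the `size`/`mtimeMs` fields, the `ManifestEntrySketch`/`ManifestDiffSketch` types, and the `diffManifest` function are all assumptions made for the example.

```typescript
// Hypothetical entry shape; the real ManifestEntry fields are not shown in this changelog.
type ManifestEntrySketch = { size: number; mtimeMs: number };
type ManifestDiffSketch = { added: string[]; changed: string[]; removed: string[] };

// Compare what the cache recorded against what discovery currently sees on disk.
function diffManifest(
  manifest: Record<string, ManifestEntrySketch>,
  disk: Record<string, ManifestEntrySketch>,
): ManifestDiffSketch {
  const diff: ManifestDiffSketch = { added: [], changed: [], removed: [] };
  for (const path of Object.keys(disk).sort()) {
    const cached = manifest[path];
    const current = disk[path];
    if (cached === undefined) {
      diff.added.push(path); // on disk but never indexed
    } else if (cached.size !== current.size || cached.mtimeMs !== current.mtimeMs) {
      diff.changed.push(path); // indexed, but the stat no longer matches
    }
  }
  for (const path of Object.keys(manifest).sort()) {
    if (disk[path] === undefined) diff.removed.push(path); // indexed but gone from disk
  }
  return diff;
}

const manifestDiff = diffManifest(
  { "a.ts": { size: 1, mtimeMs: 10 }, "b.ts": { size: 2, mtimeMs: 10 } },
  { "a.ts": { size: 1, mtimeMs: 20 }, "c.ts": { size: 3, mtimeMs: 10 } },
);
```

Only the `added` and `changed` paths (plus whatever files their import/call resolution can affect) would then need re-parsing, which is where the saving over a full rebuild comes from.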
Changed
- Lazy parser loading — `mammoth`, `xlsx`, `officeparser`, `pdf-parse`, and `turndown` now load on first use rather than at import, cutting facade import time to ~50ms; a failed dependency load surfaces an actionable diagnostic instead of a raw module error
- `buildCodebaseIndex` runs a bounded-concurrency parse pool with parallel `stat` passes and an operation mutex on the store, so concurrent index operations are serialized safely
- `splitByTokenBudget` is now O(n) with separator-aware counts, so the per-chunk budget invariant holds under BPE tokenizers
- Bumped the `pdf-parse` floor to `^1.1.4` (1.1.1 crashes under ESM dynamic import)
Fixed
- Cache manifests now record guardrail-skipped files (minified / oversized), so `ensureFresh()` no longer rebuilds the index forever on repositories that contain them
- Ref-counted the `console.log` suppression used during PDF parsing, fixing a permanent monkey-patch leak when multiple PDFs are parsed concurrently
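The ref-counting idea in the last fix can be shown in isolation. The sketch below is an assumption about the general technique, not the code used in OCC's PDF parser: `suppressConsoleLog`, `suppressionDepth`, and `savedLog` are hypothetical names.

```typescript
// Ref-counted suppression: patch on the first acquire, restore only on the last release.
let suppressionDepth = 0;
let savedLog: typeof console.log | undefined;

function suppressConsoleLog(): () => void {
  if (suppressionDepth === 0) {
    savedLog = console.log;
    console.log = () => {}; // swallow output while at least one holder is active
  }
  suppressionDepth += 1;

  let released = false;
  return () => {
    if (released) return; // releasing twice must not unbalance the counter
    released = true;
    suppressionDepth -= 1;
    if (suppressionDepth === 0 && savedLog) {
      console.log = savedLog;
      savedLog = undefined;
    }
  };
}
```

Without the counter, two overlapping callers can each save the other's no-op as the "original", so the real `console.log` is never restored. With it, only the outermost acquire saves and only the final release restores.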
Migration notes
- Additive and backward-compatible. `update()`/`ensureFresh()` keep the same signatures and simply do less work; the default tokenizer, output formats, and CLI surface are unchanged. `concurrentPhases` defaults on but can be disabled for strictly sequential progress. The only dependency change is the `pdf-parse` floor bump
0.8.10 - 2026-05-29
Added
- Query- and path-focused repo maps for `occ code map`/`occ code pack`. New flags `--query <text>`, `--focus-path <path>` (repeatable), and `--focus-depth <n>` (default 1) bias the map toward task-relevant files instead of only the globally most-depended-upon ones. The global weighted PageRank (`rankNodes`) is unchanged; when a focus is active, `buildRepoMap` computes a per-file relevance score and blends it with the structural rank (`combined = global * 0.35 + focus * 0.65`) to re-order admission within the token budget. `--query` matches normalized paths, symbol names, signatures, jsdoc, and (capped) excerpt/content; `--focus-path` boosts exact-file and directory matches plus their import/call/inheritance graph neighbors up to `--focus-depth`
- `computeFocusScores()` (`src/code/focus.ts`) — the dependency-free scorer behind focus mode (query + path + undirected graph-neighbor signals, keyed by absolute path)
- `RepoMapOptions.focus` (`{ query?, paths?, graphDepth? }`) plus additive result metadata: `RepoMapResult.focus` (`{ query?, paths, matchedFiles }`) and per-entry `focusScore`/`focusReasons`. Exposed through `createOcc().code.map(index, { focus })` and re-exported as `RepoMapFocus`/`RepoMapFocusResult` from `@cesarandreslopez/occ`
- Focus metadata is surfaced in every renderer — markdown/plain headers, `<repomap>` `focusQuery`/`focusPaths`/`focusMatched` attributes and per-`<file>` `focusScore`, and JSON
Migration notes
- Purely additive and opt-in. With no focus flags (or an empty focus selector) the map is byte-for-byte identical to before, and the global PageRank remains the default for broad-overview use. The new fields are optional, so existing programmatic callers are unaffected
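The blend used when a focus is active (`combined = global * 0.35 + focus * 0.65`) is easy to check by hand. The sketch below applies that formula to two made-up files; the `FileScoreSketch` type, the function names, and the assumption that both inputs are on a comparable 0 to 1 scale are illustrative and not taken from `src/code/focus.ts`.

```typescript
// Hypothetical per-file inputs; both scores are assumed to be on a 0..1 scale.
type FileScoreSketch = { path: string; globalRank: number; focusScore: number };

// Weights as stated in this release: 35% structural rank, 65% focus relevance.
function combinedScore(globalRank: number, focusScore: number): number {
  return globalRank * 0.35 + focusScore * 0.65;
}

// Order files by the blended score, highest first, without mutating the input.
function orderByCombined(files: FileScoreSketch[]): string[] {
  return files
    .slice()
    .sort(
      (a, b) =>
        combinedScore(b.globalRank, b.focusScore) - combinedScore(a.globalRank, a.focusScore),
    )
    .map((file) => file.path);
}

const focusOrder = orderByCombined([
  { path: "src/core.ts", globalRank: 0.9, focusScore: 0.0 },
  { path: "src/auth/login.ts", globalRank: 0.2, focusScore: 0.8 },
]);
```

Here the heavily depended-upon `src/core.ts` scores 0.315 while the focus-matched `src/auth/login.ts` scores 0.59, so the task-relevant file is admitted first even though its structural rank is lower.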
0.8.9 - 2026-05-29
Added
- `occ code map` and `occ code pack` CLI subcommands (`src/code/command.ts`) — token-budgeted, importance-ranked repository maps. `map` emits the top symbol signatures per file (Aider-style, cheap structural context); `pack` emits file content, optionally compressed to the architecturally-significant section. Both rank the graph (see below), then greedily admit the highest-ranked files until the token budget is hit, shrinking content or shedding low-rank symbols (partial admission) instead of dropping a file that nearly fits. Flags: `--map-tokens <n>` (alias `--token-budget`, default 4096), `--mode map|pack`, `--map-format markdown|xml|json|plain`, `--compress` (pack mode), `--no-bias-exports`, `--max-symbols <n>`, and `--tokenizer heuristic|o200k_base|cl100k_base`
- `buildRepoMap()` and the `RepoMapResult`/`RepoMapEntry`/`RepoMapSymbol` types (`src/code/map.ts`) — the programmatic core behind `occ code map`/`pack`, with an injectable `countTokens` so a real tokenizer can enforce an exact budget. Exposed as `createOcc().code.map` and re-exported from `@cesarandreslopez/occ`
- `rankNodes()` (`src/code/rank.ts`) — weighted PageRank over `imports`/`calls`/`inherits`/`implements` edges, with per-pair call-weight capping and an exported-symbol seed bias, rolling per-symbol scores up into file scores (`RankResult`/`RankedNode`/`RankedFile`). Exposed as `createOcc().code.rank` and re-exported from `@cesarandreslopez/occ`
- Pluggable token counting (`src/tokens.ts`) — a `Tokenizer` interface with `HeuristicTokenizer` (zero-dependency, language-aware default) and `BpeTokenizer` (lazily loads `gpt-tokenizer`, caches encoding tables per encoding, no global state). `createTokenizer(name)` and `resolveTokenizerName(value)` resolve `heuristic`/`o200k_base`/`cl100k_base`. New runtime dependency `gpt-tokenizer@^3.4.0`
- `renderRepoMap()` (`src/code/output.ts`) — renders a `RepoMapResult` as `markdown`, `xml`, `json`, or `plain`
- `test/code-map.test.ts` and `test/tokenizer.test.ts` — cover PageRank convergence/normalization/ranking order/export bias, greedy budget enforcement and truncation, repo-map format rendering, and the heuristic/BPE tokenizers (including an exact `o200k_base` budget case)
Changed
- The default-scan token budgeter (`applyTokenBudget` in `src/cli.ts`) is now async and tokenizer-pluggable, and the default scan gained a top-level `--tokenizer heuristic|o200k_base|cl100k_base` flag so `--token-budget` truncation can use an exact BPE count instead of the heuristic
- The import-DAG checker (`scripts/check-imports.mjs`) now permits the Layer 3 `code` module to depend on the Layer 0 `tokens` module, keeping the architecture invariant satisfied for budget-accurate repo maps
Migration notes
- Purely additive: existing callers see no behavior change. The new commands, flags, and facade methods (`createOcc().code.map`, `createOcc().code.rank`) are opt-in, and the default tokenizer remains the zero-dependency heuristic — pass `--tokenizer o200k_base` (or `cl100k_base`) only when you want an exact BPE-budgeted map
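Greedy, rank-ordered admission under a token budget with an injectable counter can be sketched as follows. This is a deliberately reduced illustration and not `buildRepoMap`: it skips a file that does not fit instead of shrinking it or shedding symbols, the `RankedFileSketch` type and `admitByBudget` function are hypothetical, and `roughTokens` is a stand-in counter rather than OCC's `HeuristicTokenizer`.

```typescript
type RankedFileSketch = { path: string; rank: number; text: string };

// Stand-in counter (characters / 4, rounded up); the real tokenizers are not modeled here.
const roughTokens = (text: string): number => Math.ceil(text.length / 4);

// Admit files from highest to lowest rank while the running total stays within budget.
function admitByBudget(
  files: RankedFileSketch[],
  budget: number,
  countTokens: (text: string) => number = roughTokens,
): string[] {
  const admitted: string[] = [];
  let used = 0;
  const byRank = files.slice().sort((a, b) => b.rank - a.rank);
  for (const file of byRank) {
    const cost = countTokens(file.text);
    if (used + cost > budget) continue; // simplification: skip rather than partially admit
    admitted.push(file.path);
    used += cost;
  }
  return admitted;
}

const admittedPaths = admitByBudget(
  [
    { path: "src/util.ts", rank: 0.1, text: "x".repeat(8) },
    { path: "src/index.ts", rank: 0.9, text: "x".repeat(20) },
    { path: "src/big.ts", rank: 0.5, text: "x".repeat(40) },
  ],
  10,
);
```

Passing a different `countTokens` changes only how the cost is measured, which is the role the injectable counter plays when an exact BPE budget is wanted.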
0.8.8 - 2026-05-08
Added
- `bundleWorkspace()` (`src/workspace/bundle.ts`) — single-call workspace bundle that fans out across `describeWorkspace`, `analyzeWorkspace`, `inspectWorkspaceDocumentSet`, `previewCodebaseSize`, `buildCodebaseIndexIsolated`, and `chunkDocument`, returning a versioned `WorkspaceBundle` (`schemaVersion: 1`) with `description`, `analysis`, `documents`, `codePreview`, `codeIndex` (slim by default), `documentChunks`, a hierarchical `outline` (root → projects → top modules → documents → sections), per-symbol `codeDocumentReferences` (regex symbol matches in markdown content), and a unified `errors` array. Configurable via `includeDescription`/`includeAnalysis`/`includeDocuments`/`includeCode`/`includeDocumentChunks`/`includeCodeDocumentReferences`/`maxDocumentFiles`/`maxReferenceFiles`/`maxCodeFiles`/`maxCodeBytes`/`maxDocumentChunks`/`maxCodeDocumentReferences`/`slimCodeIndex`/`contentMode` and the existing ignore/overlay knobs. Exposed as `createOcc().workspace.bundle` and on the `@cesarandreslopez/occ/workspace/bundle` subpath
- `chunkDocument()` (`src/doc/chunk.ts`) — heading-aware, token-budgeted document chunker. Converts DOCX/PDF/PPTX/XLSX/ODT/ODS/ODP/MD/MDX/TXT/RST/AsciiDoc to markdown, splits along the heading tree from `extractFromMarkdown`, and packs each section into chunks under `maxTokens` (default 800) with configurable `overlapTokens` (default 80) and an injectable `countTokens`. Each `DocumentChunk` carries `chunkId`, anchor slug, `headingPath`, `startLine`/`endLine`, `tokenEstimate`, and `wordCount`. Exposed as `createOcc().doc.chunk` and on the `@cesarandreslopez/occ/doc/chunk` subpath
- `summarizeModule()` and `toMermaid()` (`src/code/query.ts`) — `summarizeModule(index, modulePath, { maxClasses, maxFunctions, maxEdges })` returns a `ModuleSummary` (coupling + key classes + key functions ranked by call activity + import edges + exported API). `toMermaid(index, kind, target, { maxNodes, maxEdges })` renders Mermaid diagrams for `'import-graph'`, `'class-hierarchy'`, and `'call-graph'`. Both methods are also exposed on `CodeQuerySession` and re-exported from `@cesarandreslopez/occ`
- `analyzeModuleCoupling` now treats `'.'`/`''` as "the whole repository" and additionally matches by `relativePath` exact equality and `moduleName`, fixing single-file modules and root-module queries
- `tsconfig.json`/`jsconfig.json` `compilerOptions.paths` resolution for TS/JS/Vue imports (`src/code/languages.ts:resolveTsconfigImport`) — wildcard and exact patterns, `baseUrl` honored, results cached per repo. `?`/`#` query/fragment suffixes are stripped from import specifiers before resolution (Vite-style `?raw`, `?url`, `?worker`). Asset specifiers (`.css`, `.scss`, `.svg`, `.png`, `.wasm`, ...) are now classified as `external` rather than `unresolved`, removing a long tail of false `unresolved` import edges from frontend repos
- Symbol position metadata on parsed symbols and graph nodes — `ParsedSymbol`/`CodeNode` now carry optional `endLine`, `startColumn`, `endColumn` (TS compiler API powered for TS/JS/Vue). Useful for IDE-style navigation and slicing source ranges out of `content: 'full'` indexes
- `NormalizedSymbolKind` union (`'file' | 'module' | 'function' | 'method' | 'class' | 'interface' | 'type' | 'enum' | 'variable' | 'parameter' | 'other'`) and `toNormalizedSymbolKind(type, containerName)` helper — folds `function` with a container into `method`, `type-alias` into `type`, etc., for downstream consumers that want LSP-style symbol kinds without re-implementing the mapping. Exported from the facade
- Builtin call-noise filter (`src/code/build.ts:isBuiltinNoiseCall`) — drops false-positive `calls` edges to standard-library globals (`Array`, `JSON`, `Promise`, `console`, `setTimeout`, `fetch`, ...) and ubiquitous member methods (`map`, `filter`, `then`, `push`, `forEach`, ...) when no local symbol shadows them. Calls qualified by `this`/`self`/`cls`/`super` are preserved. Significantly reduces graph noise in TS/JS repos
- `health()` (`src/health.ts`) — lightweight liveness probe returning `{ available, version, capabilities }`. Exposed as `createOcc().health` and on the `@cesarandreslopez/occ/health` subpath
- `OccAbortError`/`OCC_ABORTED`/`isOccAbortError` (`src/errors.ts`) — typed abort error replacing the previous `DOMException('...', 'AbortError')` usage in `abortIfNeeded`, `buildCodebaseIndexIsolated`, and `prepareWorkspaceContext`. The new error keeps `name: 'AbortError'` for duck-typed compatibility, but adds `error.code === 'OCC_ABORTED'` and survives `instanceof` across forked subprocesses (where `DOMException` does not). Exposed on the `@cesarandreslopez/occ/errors` subpath
- Document discovery now includes prose formats by default — `discoverDocumentSet()`/`discoverDocuments()` accept `md`, `markdown`, `mdx`, `txt`, `rst`, `adoc`, `asciidoc` alongside the seven office formats, and the new `includeDataFiles: true` flag also pulls in `yaml`, `yml`, `json`, `jsonc`, `toml`. `documentToMarkdown()` reads these raw text formats directly. `inspectWorkspaceDocumentSet` and `bundleWorkspace` forward the flag, so workspaces with markdown-only docs are no longer empty
- New `discoverDocumentSet()` API alongside `discoverDocuments()` — same signature, but returns `{ documents, skipped }` where `skipped` carries `{ path, reason, size }` entries for over-size files (including the actual byte count) and `EACCES`/other I/O failures (previously silently dropped)
- `tryParseSlimIndex(value)` — non-throwing variant of `parseSlimIndex` returning `CodebaseIndexSlim | undefined`, for IPC and persistence boundaries where validation is best-effort. Exported from the facade and `@cesarandreslopez/occ/code/slim`
- `CodeIndexStore.update(changedFiles?, options?)` — explicit refresh hook (currently equivalent to `refresh()`; takes a `changedFiles` argument for forward compatibility). Lets consumers signal incremental change without re-deriving freshness from manifests
- `ProgressPhase` gains `'bundle'`, `'description'`, `'stats'`, `'code-preview'`, `'document-chunk'`, and `'outline'` for the new workspace bundle pipeline. `ProgressEvent` gains optional `scope`, `currentPath`, `bytesProcessed`, `totalBytes`, `startedAt`, and `elapsedMs` fields so progress consumers can render rich UIs without needing a sidecar event stream. `inspectWorkspaceDocumentSet` and the workspace pipeline now emit `currentPath` on every event
- `sanitizeForkExecArgv()` (`src/utils.ts`) and integration in `buildCodebaseIndexIsolated` and `prepareWorkspaceContext` — strips `--input-type` from `process.execArgv` before forking the JS runners, so OCC works under loaders like `tsx`/`node --import tsx` that set `--input-type=module` on the parent (which previously crashed the child runner with `Module did not self-register`)
- Programmatic subpath exports (with matching `typesVersions`): `@cesarandreslopez/occ/doc/chunk`, `@cesarandreslopez/occ/workspace/bundle`, `@cesarandreslopez/occ/table/types`, `@cesarandreslopez/occ/health`, `@cesarandreslopez/occ/errors`
Changed
- `src/errors.ts` is a new Layer 0 module and `src/health.ts` is a new top-level module; both are registered in the import-DAG checker (`scripts/check-imports.mjs`) so the architecture invariant continues to hold (`Checked 86 files, 0 violations`)
- `analyzeModuleCoupling` widens "module belongs to this path" matching beyond `dirPrefix` — exact `relativePath` equality and `moduleName` equality are now also accepted, fixing coupling reports for top-level single-file modules
- `inspectWorkspaceDocumentSet` switched off `findFiles` and now consumes `discoverDocumentSet` directly, picking up the new prose/data formats, the `skipped` reporting, and the per-event `currentPath` enrichment without behavior change for existing callers (default still 50 docs, `includeMarkdown` still defaults to `false`)
Migration notes
- Existing callers see no behavior change: `discoverDocuments()` keeps its array shape, all new options are opt-in, and the import-DAG plus type-check plus 208-test suite pass. To opt into the new bundle/chunk/health/errors paths, use the named exports from the facade (`createOcc().workspace.bundle`, `createOcc().doc.chunk`, `createOcc().health`) or the new subpath exports
- Code that checks `error instanceof DOMException` to detect aborts should switch to `isOccAbortError(error)` or `error.name === 'AbortError'`/`error.code === 'OCC_ABORTED'`. The previous `DOMException` instances would have failed `instanceof` across subprocess boundaries anyway
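The field-based abort check recommended in the migration note can be illustrated without importing the library. In the sketch below, `AbortErrorSketch` is a local stand-in for `OccAbortError` and `looksLikeOccAbort` is a hypothetical guard; neither is the actual export from `@cesarandreslopez/occ/errors`, and real code would call `isOccAbortError(error)` instead.

```typescript
// Local stand-in for OccAbortError: same name/code contract as described in this release.
class AbortErrorSketch extends Error {
  readonly code = "OCC_ABORTED";
  constructor(message = "Operation aborted") {
    super(message);
    this.name = "AbortError";
  }
}

// Duck-typed guard: inspects fields rather than using instanceof, so it also matches
// plain objects rebuilt from serialized errors on the far side of a process boundary.
function looksLikeOccAbort(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const candidate = error as { code?: unknown };
  return candidate.code === "OCC_ABORTED";
}

const fromSameProcess = looksLikeOccAbort(new AbortErrorSketch());
const fromSerialized = looksLikeOccAbort({ name: "AbortError", code: "OCC_ABORTED" });
const unrelated = looksLikeOccAbort(new Error("disk full"));
```

Checking `error.name === 'AbortError'` instead would additionally match aborts raised by other APIs, which may or may not be what a caller wants.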
0.8.7 - 2026-05-08
Added
- Vue Single-File Component support — new `vue` parser type registered alongside `typescript`/`python`/`go`/`rust`/`generic`. `parseCodeFile` now extracts `<script>`/`<script setup>` blocks via `@vue/compiler-sfc`, parses them as TypeScript with original line offsets preserved, and surfaces the SFC as an exported component class (named from `defineOptions({ name })`, the `name:` option, or filename PascalCase). Local-import resolution learned `.vue` and `index.vue` candidates
- `previewCodebaseSize()` (`src/code/preview.ts`) — discovers and stats files without parsing to estimate codebase size by language and report `exceedsBudget` against `maxFiles`/`maxBytes` thresholds. Exposed on the facade as `code.previewSize` and re-exported from `@cesarandreslopez/occ` and `@cesarandreslopez/occ/code/preview`
- `buildCodebaseIndexIsolated()` (`src/code/isolated.ts` + `src/code/isolated-runner.ts`) — runs `buildCodebaseIndex` in a forked subprocess, streams progress over IPC, and returns the result via a sectioned NDJSON tmp file (avoiding structured-clone of large indexes). Forwards `AbortSignal` through to the child. Exposed as `code.buildIndexIsolated` on the facade
- Slim index variant (`src/code/slim.ts`) — `slimifyIndex(index)` produces a `CodebaseIndexSlim` with `contentMode: 'none'` (drops `content`/`lines`/`excerpt` from every parsed file and rewrites `capabilities[*].content` to `false`). Pair with `parseSlimIndex(value)`/`validateSlimIndex(value)` to round-trip across boundaries. `buildCodebaseIndexIsolated({ slim: true })` returns a slim index directly. Exposed under `@cesarandreslopez/occ/code/slim`
- Code-index budget controls on `BuildCodebaseOptions` and `CodeIndexStoreOptions` — `maxFiles`, `maxBytes`, and `onBudgetExceeded: 'throw' | 'truncate'`. `'throw'` (default) raises the new `CodeIndexBudgetExceededError` (code `OCC_CODE_INDEX_BUDGET_EXCEEDED`) carrying a structured `budget` field; `'truncate'` keeps as many files as fit and reports the result via the new optional `index.truncated: IndexTruncation` field (`reason`, `keptFiles`, `droppedFiles`, `totalFiles`, `totalBytes`). The fingerprint hash now includes these fields so cached indexes invalidate on budget changes
- Token-based chunking on `chunkCodebase()` and `chunkFromIndex()` — `CodeChunkOptions` now accepts `maxTokens`, `overlapTokens`, and `countTokens` alongside the existing word-based knobs. Default token estimator is `Math.ceil(length / 4)`; pass `countTokens` for a tokenizer-accurate count
- `openChunkCodeIndexStore(options)` (alias `openChunkStore`) — convenience factory that opens a `CodeIndexStore` pinned to `contentMode: 'full'` so `chunkFromIndex()` works without re-specifying the mode. Exposed as `code.openChunkStore` on the facade
- `fusedSearch` excerpt expansion — node excerpts now include up to 21 surrounding lines (~600 chars) instead of a single 140-char line, with blank lines collapsed and a fallback to file excerpt or signature
- `workspace describe` enrichments — `WorkspaceDescriptionProject` now reports `entryPoints`, `scripts`, `buildSystem` (vite/webpack/turbo/nx/tsup/rollup/esbuild/tsc), `testFramework` (vitest/jest/mocha/ava/tap/playwright/cypress), and `platforms` (electron/tauri/mobile/capacitor). `signals` gained `hasCode`/`hasDocuments`/`hasOfficeDocuments`/`hasTables`/`hasNotebooks` presence flags. `WorkspaceDescription.recommendedCalls` (typed by the new `WorkspaceRecommendedCall` schema) suggests programmatic facade calls keyed to the detected primary type — e.g., `code.previewSize` + `code.buildIndexIsolated` for coding projects
- Strongly typed `ProgressPhase` union in `src/progress-event.ts` — `ProgressEvent.phase` is now a `ProgressPhase` (no longer `string`), enumerating every phase emitted across the build, chunk, store, workspace, and inspect pipelines. Exported from the facade
- `@cesarandreslopez/occ/code/preview`, `@cesarandreslopez/occ/code/isolated`, and `@cesarandreslopez/occ/code/slim` programmatic subpath exports in `package.json` (with matching `typesVersions` paths) for downstream consumers that want narrow imports
- `test/contextful-integration.test.ts` — integration tests covering `previewCodebaseSize`, `buildCodebaseIndexIsolated` (full + slim), `CodeIndexBudgetExceededError`, truncation behavior, slim round-trip via `parseSlimIndex`, `openChunkCodeIndexStore`, fused-search excerpt expansion, and `describeWorkspace` recommended-call output
Changed
- `chunkFromIndex()` error message now points to the new `openChunkStore(...)` factory when the index was built with a non-`full` content mode
- `@vue/compiler-sfc@^3.5.34` is a new runtime dependency required by the Vue SFC parser
Migration notes
- Existing callers see no behavior change: budget controls default to no limit, chunking still defaults to word-based sizing, and code indexes built without `maxFiles`/`maxBytes` keep the same fingerprint as before. To opt into the new isolation/slim/preview paths, use the named exports from the facade (`createOcc().code.{previewSize,buildIndexIsolated,openChunkStore,slimifyIndex}`) or the new subpath exports
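The difference between the `'throw'` and `'truncate'` budget modes introduced in this release can be sketched with a small stand-alone function. Everything here is an assumption for illustration: `applyBudget`, `FileStatSketch`, and `BudgetSketch` are hypothetical, the plain `Error` stands in for `CodeIndexBudgetExceededError`, and admitting files in input order is a guess rather than documented behavior.

```typescript
type FileStatSketch = { path: string; bytes: number };
type BudgetSketch = {
  maxFiles?: number;
  maxBytes?: number;
  onBudgetExceeded?: "throw" | "truncate";
};

// Keep files in input order while both limits hold; report how many were left out.
function applyBudget(
  files: FileStatSketch[],
  budget: BudgetSketch,
): { kept: FileStatSketch[]; dropped: number } {
  const maxFiles = budget.maxFiles ?? Infinity; // no limit unless one is given
  const maxBytes = budget.maxBytes ?? Infinity;
  const kept: FileStatSketch[] = [];
  let bytes = 0;
  for (const file of files) {
    if (kept.length + 1 > maxFiles || bytes + file.bytes > maxBytes) continue;
    kept.push(file);
    bytes += file.bytes;
  }
  const dropped = files.length - kept.length;
  // 'throw' is the default mode, mirroring the entry above.
  if (dropped > 0 && (budget.onBudgetExceeded ?? "throw") === "throw") {
    throw new Error(`budget exceeded: ${dropped} file(s) do not fit`);
  }
  return { kept, dropped };
}

const budgetFiles: FileStatSketch[] = [
  { path: "a.ts", bytes: 100 },
  { path: "b.ts", bytes: 500 },
  { path: "c.ts", bytes: 50 },
];
const truncated = applyBudget(budgetFiles, { maxBytes: 200, onBudgetExceeded: "truncate" });
```

With no limits set the function keeps every file and never throws, which matches the migration note that budget controls default to no limit.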
0.8.6 - 2026-04-27
Added
- `occ describe [directories...]` top-level CLI command and `occ workspace describe [rootDir]` nested command for fast directory classification — identifies whether a path is a code, office, documentation, data, or mixed workspace using manifest signals and file inventory, with confidence levels and nested project detection
- `src/workspace/describe.ts` module exporting `describeWorkspace()` and supporting classification helpers
- `src/workspace/describe-output.ts` tabular and JSON formatters for `WorkspaceDescription`
- `WorkspaceDescription` type and classification confidence levels in `src/workspace/types.ts`
- `./workspace/describe` programmatic export in `package.json` for downstream consumers
- `test/workspace-describe.test.ts` covering React/Vite classification, mixed-workspace detection, multi-category folders, and `.gitignore` respect
0.8.5 - 2026-04-23
Security
- Refresh transitive dependencies to clear all `npm audit` findings (5 → 0). Downstream consumers can drop any `overrides` workarounds after bumping to `0.8.5`.
- `@xmldom/xmldom` resolved to 0.8.13 (mammoth path) and 0.9.10 (officeparser path), clearing five advisories: CDATA serialization injection (GHSA-wh4c-j3r5-mjhp), uncontrolled-recursion DoS (GHSA-2v35-w6hq-6mfw), `DocumentType` injection (GHSA-f6ww-3ggp-fr8h), processing-instruction injection (GHSA-x6wf-f3px-wcqx), comment-node injection (GHSA-j759-j44w-7fr8)
- `picomatch` 2.3.1 → 2.3.2, clearing POSIX method-injection (GHSA-3v7f-55p6-f55p) and extglob ReDoS (GHSA-c2c7-rcm5-vvqj)
- `file-type` 16.5.4 → 22.0.1 via `officeparser`, clearing ASF infinite-loop on malformed input (GHSA-5v7r-6r5c-r473)
- `yauzl` 3.2.0 removed from the tree, clearing the off-by-one advisory (GHSA-gmq8-994r-jv83)
Fixed
- `extractOfficeText` helper absorbs officeparser 6.0 → 6.1's undocumented breaking change (`parseOffice()` now returns a `ParsedOffice` object with `toText()` instead of a plain string). Without this shim the officeparser bump would have regressed PPTX/ODT/ODS/ODP text extraction across `occ`, `occ doc inspect`, `occ slide inspect`, `occ table inspect`, and the markdown converter
- `SECURITY.md` supported-versions table refreshed — the old table still listed 0.5.x / 0.4.x while the current supported line is 0.8.x
Changed
- Transitive majors pulled in by `officeparser@6.1.0`: `tesseract.js` 6.0.1 → 7.0.0, `pdfjs-dist` 5.4.530 → 5.6.205. CLI and programmatic output verified unchanged against the standard fixture suite
- `package.json` `description` and `keywords` refreshed to reflect the current scope (document metrics + structure + inspection + table extraction + code exploration + workspace analysis) — the old description still described the 0.1.x "scc-style summary tables" tool
0.8.4 - 2026-04-14
Added
- `overlayIgnorePatterns` option on all discovery and index APIs (`findFiles`, `discoverCodeFiles`, `buildCodebaseIndex`, `openCodeIndexStore`, `discoverDocuments`, `inspectWorkspaceDocuments`, `inspectWorkspaceDocumentSet`, `analyzeWorkspace`, `prepareWorkspaceContext`) — overlay patterns are evaluated independently from base gitignore/caller patterns, so negation rules in overlays cannot reopen files excluded by the base matcher
- `overlayIgnorePatterns` included in `buildOptionFingerprint` to prevent stale cache hits when overlay patterns change
Changed
- `createIgnoreMatcher` refactored to use `applyIgnoreResult` helper for cleaner ignore evaluation logic
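The "evaluated independently" rule for overlay patterns in this release amounts to OR-ing two separate verdicts. The sketch below shows that idea with a toy matcher; it is not `createIgnoreMatcher` and does not implement gitignore syntax. `isIgnoredBy` and `isIgnored` are hypothetical names, and the prefix/suffix matching is a deliberate simplification.

```typescript
// Toy matcher, not gitignore: "*.ext" matches by suffix, anything else by path prefix.
// A leading "!" negates, and the last matching pattern within one list wins.
function isIgnoredBy(patterns: string[], path: string): boolean {
  let ignored = false;
  for (const raw of patterns) {
    const negated = raw.startsWith("!");
    const pattern = negated ? raw.slice(1) : raw;
    const matches = pattern.startsWith("*")
      ? path.endsWith(pattern.slice(1))
      : path.startsWith(pattern);
    if (matches) ignored = !negated;
  }
  return ignored;
}

// Base and overlay lists are judged separately and then OR-ed, so an overlay "!" rule
// can only un-ignore something the overlay itself ignored.
function isIgnored(base: string[], overlay: string[], path: string): boolean {
  return isIgnoredBy(base, path) || isIgnoredBy(overlay, path);
}

const stillExcluded = isIgnored(["dist/"], ["!dist/keep.js"], "dist/keep.js");
const overlayReopened = isIgnored([], ["*.log", "!debug.log"], "debug.log");
```

Had the two lists been concatenated into one matcher, the overlay's `!dist/keep.js` would have been the last matching rule and would have reopened a file the base list excluded.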
0.8.3 - 2026-04-14
Added
- `--ignore-pattern <pattern>` CLI flag for gitignore-style exclusion patterns across all command families (`occ`, `occ code`, `occ doc references`, `occ workspace`)
- `--content-mode <mode>` flag on `occ code` commands to control indexed content retention (`none`, `excerpt`, `full`)
- `--max-file-size-bytes`, `--max-lines`, and `--no-skip-minified` guardrail flags for `occ code` commands
- `ContentMode`, `CodeSkippedFile`, and `CodeExcerpt` types in the code exploration type system
- NDJSON-based sectioned index I/O replacing monolithic JSON serialization for code indexes, avoiding V8 string-length limits on large codebases
- `dispose()` and `clearCache()` methods on `CodeIndexStore` for explicit lifecycle control
- `AbortSignal` support on `buildCodebaseIndex`, `discoverCodeFiles`, and `prepareWorkspaceContext` (including subprocess abort forwarding)
- Minified-file detection heuristic — skips likely minified source files during code indexing by default
- Scoped `.gitignore` support — nested `.gitignore` files (not just root) are now respected during file discovery across all command families
- `ignorePatterns`, `contentMode`, `maxFileSizeBytes`, `maxLines`, `skipMinified`, and `signal` options on `WorkspacePrepareOptions`
Changed
- Code index graph construction uses `nodeById` map for O(1) external-node lookups, replacing O(n) `.some()`/`.find()` scans
- `CodebaseIndex` type now includes `contentMode` and `skippedFiles` fields; `CodeCommandPayload.stats` includes `skippedFiles` count
- Store cache layout changed to `<cacheRoot>/<contentMode>/<fingerprint>/` with NDJSON files, replacing single `index.json`
- Store `persistCache` uses atomic temp-dir-then-rename writes to prevent corruption from interrupted builds
- `prefer-cache` and `ensure-fresh` store strategies return existing in-memory sessions when available, avoiding redundant cache loads
- Subprocess prepare runner uses sectioned NDJSON I/O instead of manual `streamIndexToFile`/`readFileSync`
Fixed
- `buildOptionFingerprint` now includes `ignorePatterns`, `contentMode`, `maxFileSizeBytes`, `maxLines`, and `skipMinified`, preventing stale cache hits when these options change
0.8.2 - 2026-04-08
Changed
- Code index `onProgress` now reports per-file progress during index construction instead of a single completion message
0.8.1 - 2026-04-08
Fixed
- `prepareWorkspaceContext` subprocess mode now streams the code index to disk element-by-element, avoiding V8's ~512MB `JSON.stringify` string length limit on large codebases
- Subprocess result file is read as a Buffer before `JSON.parse`, preventing string-length crashes on the read side
0.8.0 - 2026-04-08
Added
- `prepareWorkspaceContext` API — combines code indexing and document inspection in a single call with subprocess/inline execution modes and progress callbacks
- New subpath exports: `./workspace/prepare` and `./workspace/prepare-types`
- `onProgress` callback parameter on `inspectWorkspaceDocumentSet` for document inspection progress tracking
- `workspace.prepare` method on the `createOcc()` facade
Changed
- `workspace` module promoted to layer 4 in the dependency DAG (above `code` and `inspect-commands` at layer 3)
0.7.0 - 2026-04-03
Added
- `createOcc()` programmatic facade — namespace-based entry point (`occ.code`, `occ.doc`, `occ.sheet`, `occ.slide`, `occ.workspace`) re-exporting all public APIs from a single import
- `openCodeIndexStore` — persistent code index store with three cache strategies (`prefer-cache`, `ensure-fresh`, `rebuild`), manifest-based freshness checks, abort signal support, and progress callbacks
- `createCodeQuerySessionFromIndex` — create a code query session from a pre-built index without re-building
- New subpath export: `./code/store`
- Root `"."` export now points to the `createOcc` facade (`src/index.ts`) instead of the CLI entry point
Changed
- `main` and `types` fields in `package.json` now point to `src/index.ts` (facade) instead of `bin/occ.ts`
- `src/code/store.ts` reads version from `package.json` at runtime instead of hardcoding it, preventing cache-invalidation drift across releases
0.6.3 - 2026-03-27
Changed
- `typescript` now ships as a direct runtime dependency so `occ code` and the programmatic code-exploration exports work after a normal install
Fixed
- `occ --version` and other non-code command paths no longer fail when a packaged install is missing `typescript`; the JS/TS parser now lazy-loads it and reports a targeted reinstall error if the install is incomplete
0.6.2 - 2026-03-24
Added
- `occ workspace analyze` command — workspace-level code, document, and structure analysis with a versioned JSON contract (`schemaVersion: 1`)
- `occ workspace documents` command — per-document summaries with cross-reference and unresolved-mention detection
- `occ code analyze coupling <target>` command — module-level coupling metrics (afferent/efferent coupling, instability, key classes)
- `createCodeQuerySession` programmatic API — stateful session wrapping the codebase index with `refresh()`, all query methods, and chunking (`./code/session` export)
- `fusedSearch` results now include `excerpt`, `signature`, `containerName`, and `language` fields for richer downstream consumers
- `inspectDocumentSummary` exported as public API from `doc/batch`
- New subpath exports: `./code/session`, `./workspace/analyze`, `./workspace/documents`, `./workspace/types`
- `typesVersions` in `package.json` for CJS consumers using `moduleResolution: "node"`
- `main` and `types` top-level fields in `package.json`
Changed
- `DocumentSummaryResult` wrapper now carries computed markdown content for internal reuse, avoiding redundant `documentToMarkdown` calls during workspace document inspection
Fixed
- `analyzeModuleCoupling` now uses `nodeById()` map for O(1) lookups instead of O(n) `.find()` per edge, matching all other query functions
0.6.1 - 2026-03-16
Added
- `occ code index` command — builds and emits the full codebase index (files, symbols, edges, language capabilities) as JSON or a summary line
0.6.0 - 2026-03-16
Added
- `--show-confidence` flag displays confidence levels (`exact` or `estimated`) for each metric in both tabular and JSON output
- Tabular output annotates estimated metrics with a `~` suffix and a `~ estimated metric` footnote when `--show-confidence` is enabled
- JSON output includes a `confidence` object per file row (e.g. `{ "words": "exact", "pages": "estimated" }`) when `--show-confidence` is enabled
- Confidence merging in grouped mode: if any file in a group has an estimated metric, the group's confidence for that metric is `estimated`
- `./types` and `./stats` subpath exports in `package.json` — consumers can now import `ConfidenceLevel`, `ParseResult`, `StatsRow`, and `AggregateResult` directly
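The grouped-mode merging rule from this release (one estimated file makes the whole group estimated for that metric) can be written as a small fold. The sketch below is illustrative only: `ConfidenceSketch`, `ConfidenceMapSketch`, and `mergeConfidence` are hypothetical names rather than the exported `ConfidenceLevel` type or any OCC function.

```typescript
type ConfidenceSketch = "exact" | "estimated";
type ConfidenceMapSketch = Record<string, ConfidenceSketch>;

// Fold per-file confidence maps into one per-group map.
// "estimated" is sticky: once a metric is estimated it never returns to "exact".
function mergeConfidence(rows: ConfidenceMapSketch[]): ConfidenceMapSketch {
  const merged: ConfidenceMapSketch = {};
  for (const row of rows) {
    for (const metric of Object.keys(row)) {
      if (row[metric] === "estimated" || merged[metric] === undefined) {
        merged[metric] = row[metric];
      }
    }
  }
  return merged;
}

const groupConfidence = mergeConfidence([
  { words: "exact", pages: "estimated" },
  { words: "exact", pages: "exact" },
]);
```

For the two rows above the group ends up with `words` exact and `pages` estimated, because only one file needed to estimate its page count.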
0.5.1 - 2026-03-16¶
Fixed¶
- XLSX header cells in markdown conversion now escape pipe (`|`) and newline characters, matching the existing data row escaping
- `npm test` script uses `test/*.test.ts` instead of `test/**/*.test.ts` for Node 18 compatibility (shell `**` glob requires bash globstar or Node 21+)
- Remove duplicate `countWords` function in `src/code/chunk.ts`; now imports from shared `src/utils.ts`
- Add `types` conditions to all `package.json` subpath exports so TypeScript consumers using `moduleResolution: "NodeNext"` resolve `.d.ts` files correctly
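To show why the header-cell escaping fix matters, here is a minimal sketch of escaping cell text before it goes into a markdown table row. The function names are hypothetical, and replacing newlines with a space is an assumption: the changelog states that newlines are escaped but not how.

```typescript
// Escape characters that would break a markdown table cell.
// Assumption: newlines are collapsed to a single space.
function escapeTableCell(value: string): string {
  return value
    .replace(/\|/g, "\\|") // a raw pipe would start a new column
    .replace(/\r?\n/g, " "); // a raw newline would end the table row
}

// Apply the same escaping to header cells as to data cells.
function toHeaderRow(headers: string[]): string {
  return `| ${headers.map(escapeTableCell).join(" | ")} |`;
}
```

Without the escaping, a header such as `Q1|Q2` would render as two columns and shift every data row beneath it.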
0.5.0 - 2026-03-15¶
Added¶
- `exports` field with subpath imports for the code exploration module (`./code/build`, `./code/types`, `./code/query`, `./code/discover`) and root entry point (`.`): consumers can now use clean imports instead of fragile deep paths into `dist/`
- TypeScript as an optional `peerDependency` (`>=5.0.0`): consumers using the code exploration module programmatically can provide their own TypeScript installation
Changed¶
- The `exports` field restricts importable entry points. Consumers relying on unlisted deep imports into `dist/` will need to use the declared subpath exports instead
0.4.1 - 2026-03-14¶
Added¶
- Barrel re-export resolution: `occ code analyze calls/callers/chain` now resolves call targets through `index.ts` barrel files
- Zod runtime schema validation across all CLI options and data types (parsers, walker, stats, output, structure, and all inspect commands)
Changed¶
- Enable `noImplicitReturns` and `noFallthroughCasesInSwitch` TypeScript compiler options for stricter type safety
- Extract shared XLSX cell utilities to `src/inspect/xlsx-cells.ts`
- Remove deprecated re-export shim from `src/sheet/inspect.ts`
0.4.0 - 2026-03-13¶
Note: All features in OCC are currently experimental. This project cannot be considered stable software yet. APIs, output formats, and command interfaces may change between minor versions.
Added¶
- `occ table inspect <file>`: extract structured table content from DOCX, XLSX, PPTX, ODT, and ODP as JSON or tabular output, with auto-detected headers, sample row limits, merged cell support, and per-table token estimates
- `occ doc inspect <file>`: document metadata, risk flags, content stats, heading structure, and content preview for DOCX and ODT
- `occ slide inspect <file>`: presentation metadata, risk flags, per-slide inventory, and content preview for PPTX and ODP
- `occ sheet inspect <file>`: XLSX workbook preflight with sheet inventory, schema preview, risk flags, and token estimates
- TypeScript interfaces, type aliases, enums, and `implements` clauses are now indexed in `occ code` exploration
- Directory targets in `occ code analyze deps` now aggregate imports across all files in the directory
Fixed¶
- Bidirectional chain analysis: `occ code analyze chain` now searches both directions and labels reverse paths explicitly
- Code inheritance lookups disambiguate correctly when interfaces and classes share the same name
- Directory dependency matches are preserved when aggregating across multiple files
0.3.1 - 2026-03-10¶
Fixed¶
- Upgrade xlsx from 0.18.5 to 0.20.3 (official SheetJS tarball), resolving npm vulnerability (#1 — thanks @B33pBeeps)
- Configure `XLSX.set_fs(fs)` for ESM compatibility with SheetJS 0.20+
0.3.0 - 2026-03-10¶
Added¶
- `occ code` command family for on-demand code exploration
- First-class JavaScript, TypeScript, and Python exploration support
- Automated fixture-based tests for code graph queries and output contracts
Changed¶
- Improved call resolution for `this`, `super`, `self`, `cls`, and imported aliases
- Ambiguous calls and blocked call chains now surface candidate locations
- Dependency analysis now separates local, external, and unresolved imports
0.2.0 - 2026-03-09¶
Added¶
- Document structure extraction: new `--structure` flag parses heading hierarchy from DOCX, PDF, PPTX, ODT, and ODP files, displaying a navigable tree with dotted section codes (1, 1.1, 1.2, 2, ...)
- Structure tree output in tabular mode with indented headings, dotted leaders, and page ranges (when available)
- Structure data in JSON output under a `structures` key (only when `--structure` is used)
- Page-to-section mapping for PDFs via `[Page N]` markers
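The dotted section codes mentioned above (1, 1.1, 1.2, 2, ...) can be derived from a flat sequence of heading levels. The sketch below is one plausible way to do that; `sectionCodes` is a hypothetical name, and clamping skipped levels (for example an h3 directly under an h1) to the next available depth is an assumption, not documented behavior.

```typescript
// Turn a flat list of heading levels (1 = top level) into dotted codes.
function sectionCodes(levels: number[]): string[] {
  const counters: number[] = [];
  return levels.map((level) => {
    // Assumption: a heading can be at most one level deeper than the
    // current nesting, so skipped levels are clamped.
    const depth = Math.min(Math.max(1, level), counters.length + 1);
    if (depth > counters.length) {
      counters.push(0); // entering a new, deeper level
    } else {
      counters.length = depth; // leaving deeper levels: drop their counters
    }
    counters[depth - 1] = (counters[depth - 1] ?? 0) + 1;
    return counters.join(".");
  });
}
```

Calling `sectionCodes([1, 2, 2, 1])` yields `["1", "1.1", "1.2", "2"]`, matching the numbering pattern given in the entry.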
Changed¶
- Migrated entire codebase to TypeScript: all source files under `src/` and `bin/` are now `.ts` with strict type checking
- Added `npm run build` (compiles to `dist/`) and `npm run dev` (runs via tsx without build step)
- Published package now ships compiled `dist/` instead of raw `src/`
- New dependency: `turndown` (HTML-to-markdown conversion for DOCX structure extraction)
- New devDependencies: `typescript`, `@types/node`, `tsx`, `@types/turndown`
0.1.2 - 2026-03-07¶
Changed¶
- Rename "Extra" column to "Details" for clarity
- Remove redundant top/bottom table borders for cleaner output
- Remove inter-row separators, keep only header and totals borders
- Right-align numeric columns in document table
- Apply consistent number coloring to all scc table columns
- Make section header width match table width dynamically
- Use ASCII-only dashes in section headers during `--ci` mode
- Parsers return only populated metric fields instead of null-filled objects
- Batch stat calls in walker for better throughput on large directories
- Pass scc binary path explicitly instead of module-level state
Added¶
- Summary line showing scan scope, word/page counts, and elapsed time
- Word and page counts in summary line for at-a-glance utility
- SHA-256 checksum verification for scc binary downloads in postinstall
- Input validation for `--large-file-limit` (rejects NaN values)
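As an illustration of the SHA-256 checksum verification listed above, the following sketch hashes downloaded bytes with Node's built-in `node:crypto` module and compares the result with an expected hex digest. The function names are hypothetical, and how the postinstall script obtains the expected digest is not covered by this changelog.

```typescript
import { createHash } from "node:crypto";

// Compute the lowercase hex SHA-256 digest of some downloaded bytes.
function sha256Hex(data: Uint8Array | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Compare against an expected digest, tolerating surrounding whitespace
// and uppercase hex in the expected value.
function verifyChecksum(data: Uint8Array | string, expectedHex: string): boolean {
  return sha256Hex(data) === expectedHex.trim().toLowerCase();
}
```

A caller would typically refuse to install or execute the binary when `verifyChecksum` returns `false`.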
Fixed¶
- "No office documents found." message no longer shown when code results are present
- Table separator width mismatch between top-mid and middle characters
0.1.1 - 2026-03-07¶
Changed¶
- Replace ExcelJS with SheetJS (xlsx) for XLSX parsing, eliminating deprecated transitive dependencies (rimraf, fstream, inflight, lodash.isequal, glob v7)
Fixed¶
- Ensure `test/fixtures/` directory exists before creating test fixtures (fixes CI failure)
- Fix `workflow_dispatch` trigger in docs workflow (remove invalid `branches` key)
- Fix Node 22+ compatibility in release workflow (`require()` instead of `import()` with `assert`)
- Update GitHub Pages deployment branch policy from `master` to `main`
0.1.0 - 2026-03-07¶
Added¶
- CLI tool for scanning directories for office documents (DOCX, XLSX, PPTX, PDF, ODT, ODS, ODP)
- Word count, page count, paragraph count, slide count, sheet/row/cell count extraction
- Automatic code metrics via scc integration (vendored binary with PATH fallback)
- Per-file (`--by-file`) and grouped-by-type output modes
- JSON output (`--format json`) for automation
- Extension filtering (`--include-ext`, `--exclude-ext`)
- Directory exclusion (`--exclude-dir`, default: `node_modules,.git`)
- .gitignore-aware file discovery (disable with `--no-gitignore`)
- Sortable output (`--sort`: files, name, words, size)
- File output (`--output`)
- CI mode (`--ci`) for ASCII-only, no-color output
- Large file skip threshold (`--large-file-limit`, default: 50MB)
- Progress bar with ETA
- Auto-download of scc binary during `npm install` (skip with `SCC_SKIP_DOWNLOAD=1`)