Filtering
OCC has two filtering models:
- the default occ [directories...] scan flow, which filters document discovery
- the occ code flow, which filters repository code discovery
Some flags overlap between the two flows, but the flag sets are not identical.
Extension Filtering
Include Specific Extensions
Use --include-ext to scan only specific file types:
# Only PDF and Word files
occ --include-ext pdf,docx docs/
# Only spreadsheets
occ --include-ext xlsx,ods docs/
Exclude Specific Extensions
Use --exclude-ext to skip specific file types:
# Skip Excel files
occ --exclude-ext xlsx docs/
# Skip all presentation formats
occ --exclude-ext pptx,odp docs/
Note
--include-ext and --exclude-ext can be combined. When both are specified, inclusions are applied first, then exclusions are removed from that set.
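The ordering described in the note can be sketched as a two-step filter. This is a minimal illustration, not OCC's implementation: `extensionOf` and `filterByExtension` are hypothetical names, and the case-insensitive comparison and the "empty include list means no restriction" rule are assumptions.

```typescript
// Hypothetical helper: returns the lowercased extension without the dot.
function extensionOf(path: string): string {
  const base = path.slice(path.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  // Dotfiles such as ".gitignore" and names without a dot have no extension here.
  return dot > 0 ? base.slice(dot + 1).toLowerCase() : '';
}

// Hypothetical helper: models "inclusions first, then exclusions".
function filterByExtension(
  paths: string[],
  includeExt: string[] = [],
  excludeExt: string[] = [],
): string[] {
  const include = new Set(includeExt.map((e) => e.toLowerCase()));
  const exclude = new Set(excludeExt.map((e) => e.toLowerCase()));
  // Step 1: keep only included extensions (assumption: empty list keeps everything).
  const included = paths.filter(
    (p) => include.size === 0 || include.has(extensionOf(p)),
  );
  // Step 2: remove excluded extensions from that set.
  return included.filter((p) => !exclude.has(extensionOf(p)));
}

const keptByExtension = filterByExtension(
  ['docs/a.pdf', 'docs/b.docx', 'docs/c.xlsx'],
  ['pdf', 'docx', 'xlsx'],
  ['xlsx'],
);
console.log(keptByExtension);
```

Under this reading, an extension that appears in both lists ends up excluded, because the exclusion step runs last.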
Directory Exclusion
Use --exclude-dir to skip entire directories (comma-separated):
# Default exclusions
occ docs/ # already skips node_modules and .git
# Add more exclusions
occ --exclude-dir node_modules,.git,vendor,build,dist docs/
The default excluded directories are node_modules and .git. Specifying --exclude-dir replaces the defaults, so include them if you still want them excluded.
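The "replace, not merge" behavior can be modeled in a few lines. This is a sketch under stated assumptions: `resolveExcludedDirs` is a hypothetical helper, not part of OCC, and the trimming of whitespace around entries is an assumption.

```typescript
// Defaults as stated in the text above.
const DEFAULT_EXCLUDED_DIRS = ['node_modules', '.git'];

// Hypothetical helper: a supplied flag value replaces the defaults entirely.
function resolveExcludedDirs(flagValue?: string): string[] {
  if (flagValue === undefined) {
    return [...DEFAULT_EXCLUDED_DIRS];
  }
  return flagValue
    .split(',')
    .map((d) => d.trim())
    .filter((d) => d.length > 0);
}

// No flag: the defaults apply.
console.log(resolveExcludedDirs());
// Flag given without the defaults: node_modules and .git are scanned again.
console.log(resolveExcludedDirs('vendor,build'));
// Flag given with the defaults repeated: everything listed is skipped.
console.log(resolveExcludedDirs('node_modules,.git,vendor,build'));
```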
.gitignore Integration
By default, OCC respects .gitignore rules — files matched by your .gitignore patterns are skipped during file discovery.
To disable this behavior and scan all files:
Ignore Patterns
Use --ignore-pattern to apply additional gitignore-style exclusion patterns. This flag is repeatable:
# Skip all PDFs and anything under drafts/
occ --ignore-pattern "*.pdf" --ignore-pattern "drafts/" docs/
Patterns follow gitignore syntax and are applied after .gitignore rules.
Programmatic Overlay Patterns
The programmatic API supports overlayIgnorePatterns — patterns evaluated independently from the base gitignore and caller patterns. Negation rules in overlays cannot reopen files excluded by the base matcher:
import { discoverCodeFiles } from '@cesarandreslopez/occ/code/discover';

const files = await discoverCodeFiles('./my-repo', {
  ignorePatterns: ['*.test.ts'],
  overlayIgnorePatterns: ['src/generated/**'],
});
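One way to read "evaluated independently" is that the base matcher and the overlay matcher each produce their own verdict, and a file is skipped if either one ignores it. The sketch below illustrates that reading only. `makeMatcher` and `isSkipped` are hypothetical names, and the pattern support is deliberately tiny (suffix rules such as `*.ts`, directory prefixes such as `dir/`, exact paths, and `!` negation with last-match-wins ordering), so it is not real gitignore semantics and not OCC's code.

```typescript
type Matcher = (path: string) => boolean;

// Simplified matcher for illustration: the last matching rule decides.
function makeMatcher(patterns: string[]): Matcher {
  return (path) => {
    let ignored = false;
    for (const raw of patterns) {
      const negated = raw.startsWith('!');
      const pattern = negated ? raw.slice(1) : raw;
      const hit = pattern.startsWith('*.')
        ? path.endsWith(pattern.slice(1)) // "*.test.ts" matches by suffix
        : pattern.endsWith('/')
          ? path.startsWith(pattern) // "dir/" matches by prefix
          : path === pattern; // anything else must match exactly
      if (hit) ignored = !negated;
    }
    return ignored;
  };
}

// Either matcher can exclude a file. An overlay negation can only undo
// earlier overlay rules; it never cancels a base exclusion.
function isSkipped(path: string, base: Matcher, overlay: Matcher): boolean {
  return base(path) || overlay(path);
}

const baseMatcher = makeMatcher(['*.test.ts']);
const overlayMatcher = makeMatcher([
  'src/generated/',
  '!src/generated/keep.ts', // reopens a file closed by the overlay itself
  '!src/a.test.ts', // has no effect: the base matcher already excludes it
]);

console.log(isSkipped('src/a.test.ts', baseMatcher, overlayMatcher)); // true
console.log(isSkipped('src/generated/keep.ts', baseMatcher, overlayMatcher)); // false
console.log(isSkipped('src/generated/api.ts', baseMatcher, overlayMatcher)); // true
```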
Large File Limit
Files exceeding a size threshold are automatically skipped. The default limit is 50 MB.
# Increase to 100 MB
occ --large-file-limit 100 docs/
# Lower to 10 MB
occ --large-file-limit 10 docs/
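The size check amounts to a threshold comparison that also keeps track of what was skipped. The sketch below is illustrative only: `FileEntry` and `partitionBySize` are hypothetical names, 1 MB is assumed to be 1024 * 1024 bytes, and "exceeding" is read as strictly greater than the limit. OCC's exact conventions are not stated here.

```typescript
interface FileEntry {
  path: string;
  sizeBytes: number;
}

// Assumption: binary megabytes.
const BYTES_PER_MB = 1024 * 1024;

// Hypothetical helper: splits files into kept and skipped by size.
function partitionBySize(
  files: FileEntry[],
  limitMb: number = 50,
): { kept: FileEntry[]; skipped: FileEntry[] } {
  const limitBytes = limitMb * BYTES_PER_MB;
  const kept: FileEntry[] = [];
  const skipped: FileEntry[] = [];
  for (const file of files) {
    // A file exactly at the limit is kept under this reading.
    (file.sizeBytes > limitBytes ? skipped : kept).push(file);
  }
  return { kept, skipped };
}

const sizeResult = partitionBySize(
  [
    { path: 'docs/small.pdf', sizeBytes: 2 * BYTES_PER_MB },
    { path: 'docs/huge.pdf', sizeBytes: 80 * BYTES_PER_MB },
  ],
  50,
);
console.log(sizeResult.kept.length, sizeResult.skipped.length);
```

Keeping the skipped entries in a separate list makes it straightforward to report a skipped-file count after the scan.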
When files are skipped, OCC reports the count at the bottom of the output:
occ code Filtering
occ code does not use --include-ext, --exclude-ext, or --large-file-limit. Instead, it discovers supported code files under the repo root selected by --path.
The code exploration commands do support:
- --exclude-dir to skip directories like dist, coverage, or generated code
- --no-gitignore to disable .gitignore filtering
Example:
This distinction matters when you are comparing the two command families: the default scan is document-format oriented, while occ code is repository-root oriented.