Filtering
OCC provides several mechanisms to control which files are scanned.
Extension Filtering
Include Specific Extensions
Use --include-ext to scan only specific file types:
# Only PDF and Word files
occ --include-ext pdf,docx docs/
# Only spreadsheets
occ --include-ext xlsx,ods docs/
Exclude Specific Extensions
Use --exclude-ext to skip specific file types:
# Skip Excel files
occ --exclude-ext xlsx docs/
# Skip all presentation formats
occ --exclude-ext pptx,odp docs/
Note
--include-ext and --exclude-ext can be combined. When both are specified, the include list is applied first, and any extensions on the exclude list are then removed from that set.
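The combination rule amounts to a set operation. The Python sketch below only models that rule; `select_extensions` is a hypothetical helper, not part of OCC, and treating an omitted include list as "keep every extension" is an assumption rather than documented behavior.

```python
def select_extensions(available, include=None, exclude=None):
    """Model of the documented rule: apply inclusions first, then remove exclusions.

    Hypothetical helper, not OCC's code. Assumes that omitting `include`
    keeps every available extension.
    """
    if include is None:
        selected = set(available)
    else:
        selected = set(available) & set(include)
    # Exclusions are removed from whatever the include step kept.
    return selected - set(exclude or ())


# Mirrors: occ --include-ext pdf,docx,xlsx --exclude-ext xlsx docs/
result = select_extensions(
    available={"pdf", "docx", "xlsx", "pptx"},
    include={"pdf", "docx", "xlsx"},
    exclude={"xlsx"},
)
print(sorted(result))  # ['docx', 'pdf']
```

In this model, an extension that appears in both lists is always excluded, because the exclusion step runs last.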
Directory Exclusion
Use --exclude-dir with a comma-separated list to skip entire directories:
# Default exclusions
occ docs/ # already skips node_modules and .git
# Add more exclusions
occ --exclude-dir node_modules,.git,vendor,build,dist docs/
The default excluded directories are node_modules and .git. Specifying --exclude-dir replaces the defaults, so include them if you still want them excluded.
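The replace-not-extend behavior is easy to get wrong, so here is a small Python model of it. `effective_excluded_dirs` and `DEFAULT_EXCLUDED_DIRS` are hypothetical names used for illustration; the sketch assumes only what is stated above: the two defaults, and that a supplied value replaces them.

```python
DEFAULT_EXCLUDED_DIRS = ("node_modules", ".git")  # defaults stated in the text


def effective_excluded_dirs(exclude_dir=None):
    """Return the set of directories skipped for a given --exclude-dir value.

    Hypothetical model, not OCC's code: a supplied value replaces the
    defaults instead of extending them.
    """
    if exclude_dir is None:
        return set(DEFAULT_EXCLUDED_DIRS)
    # Split the comma-separated value, ignoring empty entries.
    return {name for name in exclude_dir.split(",") if name}


print(sorted(effective_excluded_dirs()))          # ['.git', 'node_modules']
print(sorted(effective_excluded_dirs("vendor")))  # ['vendor']
print(sorted(effective_excluded_dirs("node_modules,.git,vendor")))
# ['.git', 'node_modules', 'vendor']
```

Under this model, passing only `vendor` means node_modules and .git would be scanned again, which is why the earlier example lists them explicitly.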
.gitignore Integration
By default, OCC respects .gitignore rules: files matched by your .gitignore patterns are skipped during file discovery.
To disable this behavior and scan all files:
Large File Limit
Files larger than a size threshold are skipped automatically. The default limit is 50 MB. Use --large-file-limit to change it; the value is given in megabytes.
# Increase to 100 MB
occ --large-file-limit 100 docs/
# Lower to 10 MB
occ --large-file-limit 10 docs/
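As a rough illustration of the threshold check, the Python sketch below converts a limit in megabytes to bytes and compares it with a file size. `is_skipped` is a hypothetical function, not OCC's implementation. Two details are assumptions, since the text above does not specify them: that 1 MB means 1024 * 1024 bytes, and that a file exactly at the limit is kept.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def is_skipped(size_bytes, limit_mb=50):
    """Return True if a file of `size_bytes` exceeds the limit.

    Hypothetical model, not OCC's code. Only files strictly larger than
    the limit are treated as skipped (assumption).
    """
    return size_bytes > limit_mb * BYTES_PER_MB


sixty_mb = 60 * BYTES_PER_MB
print(is_skipped(sixty_mb))                # True with the default 50 MB limit
print(is_skipped(sixty_mb, limit_mb=100))  # False once the limit is raised to 100 MB
```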
When files are skipped, OCC reports the count at the bottom of the output: