
Filtering

OCC provides several ways to control which files are scanned: by file extension, by directory, through .gitignore rules, and by file size.

Extension Filtering

Include Specific Extensions

Use --include-ext with a comma-separated list of extensions to scan only those file types:

# Only PDF and Word files
occ --include-ext pdf,docx docs/

# Only spreadsheets
occ --include-ext xlsx,ods docs/

Exclude Specific Extensions

Use --exclude-ext with a comma-separated list of extensions to skip those file types:

# Skip Excel files
occ --exclude-ext xlsx docs/

# Skip all presentation formats
occ --exclude-ext pptx,odp docs/

Note

--include-ext and --exclude-ext can be combined. When both are specified, inclusions are applied first, then exclusions are removed from that set.
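The include-then-exclude order can be modeled with a small POSIX shell sketch. The helpers `has_ext` and `filter_by_ext` are hypothetical, and the matching details (the extension is the text after the last dot, compared case-sensitively) are assumptions rather than documented OCC behavior; the sketch only illustrates the order in which the two filters apply.

```shell
# has_ext EXT LIST: succeed if EXT is one of the entries in the comma-separated LIST.
has_ext() {
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# filter_by_ext INCLUDE EXCLUDE FILE...: print each FILE that survives both filters.
# An empty INCLUDE keeps everything; an empty EXCLUDE removes nothing.
# Hypothetical model of the documented order, not OCC's implementation.
filter_by_ext() (
  include=$1
  exclude=$2
  shift 2
  for file in "$@"; do
    # Assumption: the extension is the text after the last dot, matched case-sensitively.
    ext=${file##*.}
    # Step 1: apply the inclusions.
    if [ -n "$include" ] && ! has_ext "$ext" "$include"; then
      continue
    fi
    # Step 2: remove exclusions from whatever step 1 kept.
    if [ -n "$exclude" ] && has_ext "$ext" "$exclude"; then
      continue
    fi
    printf '%s\n' "$file"
  done
)
```

For example, `filter_by_ext pdf,docx,xlsx xlsx report.pdf notes.docx budget.xlsx deck.pptx` keeps report.pdf and notes.docx: budget.xlsx passes the include step but is then removed, and deck.pptx never passes the include step.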

Directory Exclusion

Use --exclude-dir with a comma-separated list of directory names to skip entire directories:

# Default exclusions
occ docs/   # already skips node_modules and .git

# Add more exclusions (repeat node_modules and .git, since this replaces the defaults)
occ --exclude-dir node_modules,.git,vendor,build,dist docs/

The default excluded directories are node_modules and .git. Specifying --exclude-dir replaces the defaults, so include them if you still want them excluded.
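To approximate which files remain after directory exclusion, standard `find` with `-prune` can produce a preview. `list_without_dirs` is a hypothetical helper, and it assumes OCC skips a directory whose name matches at any depth in the tree, which the text above does not state. Because it takes the complete list, it also mirrors the replace-the-defaults behavior: pass node_modules,.git yourself if you want them skipped.

```shell
# list_without_dirs ROOT DIRS: print regular files under ROOT, pruning every
# directory whose name appears in the comma-separated DIRS list.
# Assumption: names are matched at any depth (OCC's exact rule is not documented here).
list_without_dirs() (
  root=$1
  dirs=$2
  set -f   # no glob expansion while splitting DIRS
  IFS=,
  # Rebuild the positional parameters as: -name DIR1 -o -name DIR2 ...
  set --
  for name in $dirs; do
    if [ "$#" -gt 0 ]; then
      set -- "$@" -o
    fi
    set -- "$@" -name "$name"
  done
  if [ "$#" -eq 0 ]; then
    find "$root" -type f
  else
    # Prune matching directories; print every other regular file.
    find "$root" -type d \( "$@" \) -prune -o -type f -print
  fi
)
```

For example, `list_without_dirs docs node_modules,.git,vendor,build,dist` previews the file set for the second command above.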

.gitignore Integration

By default, OCC respects .gitignore rules — files matched by your .gitignore patterns are skipped during file discovery.

To disable this behavior and scan all files:

occ --no-gitignore docs/
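To see which files your ignore rules would hide, git itself can list them. This is only an approximation: it assumes OCC interprets the patterns the same way git does, and `preview_gitignored` is a hypothetical helper name. git's `--exclude-standard` also reads .git/info/exclude and the global excludes file, which OCC may or may not consult.

```shell
# preview_gitignored DIR: list untracked files under DIR that git would ignore.
# DIR must be inside a git work tree; paths are printed relative to DIR.
preview_gitignored() (
  cd "$1" || exit 1
  # --others: untracked files; --ignored: keep only those matching ignore rules;
  # --exclude-standard: use .gitignore, .git/info/exclude and the global excludes file.
  git ls-files --others --ignored --exclude-standard
)
```

For example, `preview_gitignored docs` lists the files under docs/ that git treats as ignored.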

Large File Limit

Files larger than a size limit are skipped automatically. The default limit is 50 MB; use --large-file-limit to set a different limit in MB:

# Increase to 100 MB
occ --large-file-limit 100 docs/

# Lower to 10 MB
occ --large-file-limit 10 docs/
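To preview which files a given limit would skip, a POSIX `find` sketch can compare sizes in bytes. `preview_large_files` is a hypothetical helper, and it assumes OCC treats 1 MB as 1,048,576 bytes and skips only files strictly larger than the limit. Neither detail is stated above, so change the multiplier to 1000000 if OCC uses decimal megabytes.

```shell
# preview_large_files DIR [LIMIT_MB]: print files under DIR larger than LIMIT_MB
# (default 50, matching OCC's documented default).
# Assumptions: 1 MB = 1048576 bytes; "exceeding" means strictly greater than.
preview_large_files() (
  dir=$1
  limit_mb=${2:-50}
  # POSIX find: -size +Nc is true for files of more than N bytes.
  find "$dir" -type f -size +"$((limit_mb * 1048576))"c
)
```

For example, `preview_large_files docs 10` lists the files that `occ --large-file-limit 10 docs/` would be expected to skip under these assumptions.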

When files exceed the limit, OCC reports how many were skipped at the bottom of the output, for example:

3 file(s) skipped (use --large-file-limit to adjust)