Poppler Tools
Windows builds: oschwartz10612/poppler-windows (Poppler binaries packaged for Windows, with dependencies)
# basic metadata
pdfinfo.exe suspicious.pdf
# fast malicious indicators (/JavaScript /OpenAction /Launch /EmbeddedFile /URI)
python pdfid.py suspicious.pdf
# search suspicious objects
python pdf-parser.py suspicious.pdf --search javascript
# dump specific object
python pdf-parser.py suspicious.pdf --object 12
# list embedded files
pdfdetach.exe -list suspicious.pdf
# extract all embedded files
pdfdetach.exe -saveall suspicious.pdf
# extract specific embedded file
pdfdetach.exe -save 1 suspicious.pdf
# quick strings for URLs / cmd / powershell (findstr matches any of the space-separated terms)
strings suspicious.pdf | findstr /i "http cmd powershell"
# hash extracted payload
Get-FileHash extracted.bin -Algorithm SHA256
# inspect magic bytes (-Count requires PowerShell 6 or later)
Format-Hex extracted.bin -Count 16
# expected headers:
# PDF = 25 50 44 46
# ZIP = 50 4B
# EXE = 4D 5A
# search for obfuscated JS
python pdf-parser.py suspicious.pdf --search eval
python pdf-parser.py suspicious.pdf --search unescape
python pdf-parser.py suspicious.pdf --search fromCharCode
# document statistics (object counts by type)
python pdf-parser.py suspicious.pdf --stats
# decompress streams
python pdf-parser.py suspicious.pdf --filter --object 12
# extract raw stream
python pdf-parser.py suspicious.pdf --raw --object 12
# safe workflow:
# pdfinfo -> pdfid -> pdf-parser -> pdfdetach -> hash -> inspect extracted payload
| Finding | Severity |
|---|---|
| Embedded JavaScript present | High |
| /OpenAction present | High |
| /Launch action present | Critical |
| Embedded executable | Critical |
| Obfuscated JavaScript | Very High |
| External URI auto-trigger | High |
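The table can be turned into a small lookup for triage reports. This is a minimal sketch, not a standard scoring scheme: the finding identifiers are hypothetical names for the rows above, and the ordering High < Very High < Critical is an assumption the table does not state explicitly.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumed ordering: High < Very High < Critical.
    HIGH = 1
    VERY_HIGH = 2
    CRITICAL = 3


# Keys are hypothetical identifiers for the rows of the table above.
SEVERITY_BY_FINDING = {
    "embedded_javascript": Severity.HIGH,
    "open_action": Severity.HIGH,
    "launch_action": Severity.CRITICAL,
    "embedded_executable": Severity.CRITICAL,
    "obfuscated_javascript": Severity.VERY_HIGH,
    "external_uri_auto_trigger": Severity.HIGH,
}


def overall_severity(findings):
    """Return the highest severity among the findings, or None if there are none."""
    levels = [SEVERITY_BY_FINDING[name] for name in findings]
    return max(levels) if levels else None
```

For example, `overall_severity(["open_action", "obfuscated_javascript"])` yields `Severity.VERY_HIGH`, since the most severe finding drives the overall rating.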
Suspicious Patterns:
- 100-800 KB file size (typical of weaponized PDFs)
- Embedded JavaScript
- Action and content keys: /OpenAction, /AA (additional actions), /Launch, /URI, /EmbeddedFile, /XFA
- Recently created & modified timestamps
- Very small document with high object count
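The "small document, high object count" pattern can be approximated by counting indirect object headers against file size. A rough sketch follows; `object_density` is a hypothetical helper, no threshold is implied, and objects packed inside compressed object streams (/ObjStm) will not be counted.

```python
import re

# Matches an indirect object header such as "12 0 obj".
OBJ_HEADER = re.compile(rb"\b\d+\s+\d+\s+obj\b")


def object_density(data: bytes):
    """Return (object_count, objects_per_KiB) for raw PDF bytes."""
    if not data:
        return 0, 0.0
    count = len(OBJ_HEADER.findall(data))
    return count, count / (len(data) / 1024)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        count, density = object_density(handle.read())
    print(f"{count} objects, {density:.2f} per KiB")
```

Comparing the density against known-good documents from the same source is probably more useful than any fixed cutoff.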
Identify
file sample.pdf
Confirm:
- Is it actually PDF?
- Or renamed EXE?
- Or polyglot file?
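These three questions can be checked directly from the leading bytes. The sketch below uses the magic values listed earlier in these notes (PDF = 25 50 44 46, ZIP = 50 4B, EXE = 4D 5A); `identify` is a hypothetical helper, and the 1024-byte search window reflects the commonly cited tolerance of some readers for a header that does not start at offset 0, so treat it as an assumption.

```python
MAGIC = {
    b"%PDF": "pdf",  # 25 50 44 46
    b"PK": "zip",    # 50 4B
    b"MZ": "exe",    # 4D 5A
}


def identify(data: bytes) -> dict:
    """Classify the leading bytes and look for a PDF header that is not at offset 0."""
    leading = next(
        (name for magic, name in MAGIC.items() if data.startswith(magic)),
        "unknown",
    )
    # Assumption: a "%PDF-" marker within the first 1024 bytes may still be parsed as a PDF.
    pdf_offset = data.find(b"%PDF-", 0, 1024)
    return {
        "leading_type": leading,
        "pdf_header_offset": pdf_offset,      # -1 when no header was found
        "possible_polyglot": pdf_offset > 0,  # header present, but something else comes first
    }
```

A result with `leading_type` of `exe` or `zip` and a positive `pdf_header_offset` suggests a renamed or polyglot file and deserves a closer look before any viewer touches it.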
Enumerate
pdfinfo sample.pdf
exiftool sample.pdf
Look for:
- Author
- Creator
- Producer
- CreationDate vs ModDate delta
- Suspicious Producer strings
- Blank or random metadata
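The CreationDate vs ModDate delta can be computed from the raw PDF date strings (the `D:YYYYMMDDHHmmSS+HH'mm'` form, which `pdfinfo -rawdates` should print). This is a sketch under assumptions: `parse_pdf_date` and `date_delta_seconds` are hypothetical helpers, and a missing timezone is treated as UTC.

```python
import re
from datetime import datetime, timedelta, timezone

PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse a raw PDF date string into a timezone-aware datetime."""
    match = PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    # Missing fields fall back to the start of the period (month 1, day 1, 00:00:00).
    year, month, day, hour, minute, second = (
        int(part) if part is not None else default
        for part, default in zip(match.groups()[:6], (0, 1, 1, 0, 0, 0))
    )
    sign, off_hours, off_minutes = match.group(8), match.group(9), match.group(10)
    offset = timedelta(0)  # assumption: no offset (or "Z") means UTC
    if sign:
        offset = timedelta(hours=int(off_hours), minutes=int(off_minutes or 0))
        if sign == "-":
            offset = -offset
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone(offset))


def date_delta_seconds(creation: str, modification: str) -> float:
    """Seconds between CreationDate and ModDate (negative if ModDate is earlier)."""
    return (parse_pdf_date(modification) - parse_pdf_date(creation)).total_seconds()
```

A delta of a few seconds, or a negative one, lines up with the "created and modified within seconds" red flag, while differing offsets between the two fields can hint at a timezone anomaly.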
Extract Safely (No Execution)
pdfdetach -list sample.pdf
If embedded files exist:
pdfdetach -saveall sample.pdf
If you want full object extraction:
pdf-parser.py sample.pdf
Hunt
Common malicious PDF patterns:
- /JavaScript
- /JS
- /OpenAction
- /Launch
- /URI
- /EmbeddedFile
- /AA
- eval(unescape(
- Hex-encoded blobs
- Base64 payloads
# Structural hunt
strings sample.pdf | grep -E "/(JavaScript|JS|OpenAction|Launch|URI|EmbeddedFile|AA)"
# JavaScript hunt
strings sample.pdf | grep -i javascript
strings sample.pdf | grep -i eval
strings sample.pdf | grep -i unescape
# URL hunt
strings sample.pdf | grep -i http
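Plain `strings | grep` misses names that are hex-escaped (for example `/J#61vaScript`), so a small scanner that normalizes names first can complement the commands above. This is a sketch in the spirit of pdfid, not a replacement for it: `hunt` and `normalize_name` are hypothetical helpers, and like `strings` it cannot see inside compressed streams. Path segments of URLs are also tokenized as names, so a URL ending in `/URI` would inflate that count.

```python
import re
from collections import Counter

KEYWORDS = ("/JavaScript", "/JS", "/OpenAction", "/Launch", "/URI", "/EmbeddedFile", "/AA")

NAME = re.compile(rb"/[^\s/<>\[\]()%{}]+")       # a PDF name token such as /OpenAction
HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")   # "#61" inside a name stands for "a"
URL = re.compile(rb"https?://[^\s<>()\"']+")


def normalize_name(name: bytes) -> str:
    """Decode #xx escapes so obfuscated names compare equal to their plain form."""
    decoded = HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), name)
    return decoded.decode("latin-1")


def hunt(data: bytes):
    """Return (keyword hit counts, sorted unique URLs) for raw PDF bytes."""
    counts = Counter(normalize_name(token) for token in NAME.findall(data))
    keyword_hits = {key: counts[key] for key in KEYWORDS if counts[key]}
    urls = sorted({url.decode("latin-1") for url in URL.findall(data)})
    return keyword_hits, urls
```

Matching is case-sensitive because PDF names are, which differs from the `grep -i` commands; a hit that only appears after normalization is itself a sign of obfuscation.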
Metadata
exiftool sample.pdf
Check:
- Author
- Creator
- Producer
- Creation vs modification time delta
- Embedded file references
Red flags:
- Random author
- Blank metadata
- Created and modified within seconds
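- Suspicious producer (a non-Adobe generator is not malicious by itself, but one that does not fit the document's claimed origin is worth noting)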
- Timezone anomalies
Embedded Files
pdfdetach -list sample.pdf
If files were extracted:
sha256sum extracted_file
file extracted_file
strings extracted_file | less
binwalk extracted_file
Look for:
- Embedded EXE
- Embedded HTA
- Embedded JS
- Embedded Office document
- Packed PE indicators
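Where `sha256sum` is not available, the hashing step can be done with the Python standard library, and byte entropy gives a rough hint of packing. A minimal sketch: `sha256_file` and `shannon_entropy` are hypothetical helpers, and reading high entropy (close to 8 bits per byte) as a packed-PE indicator is a common heuristic, not a rule.

```python
import hashlib
import math
from collections import Counter


def sha256_file(path, chunk_size=65536):
    """Hash a file in chunks so large payloads are never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniformly distributed bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())
```

Neither function parses or executes the payload, so both are safe to run on an extracted file before deeper inspection.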
JavaScript Execution Indicators
Search for:
strings sample.pdf | grep -i openaction
strings sample.pdf | grep -i launch
strings sample.pdf | grep -i uri
Look for:
- /OpenAction
- /Launch
- app.launchURL
- this.exportDataObject
- Automatic triggers on open
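The combination that matters is an automatic trigger plus something for it to run. The sketch below flags that pairing in raw bytes; `execution_indicators` is a hypothetical helper, the match is a plain case-sensitive substring test, and content inside compressed streams has to be decompressed first (for example with `pdf-parser.py --filter`) before it can be seen.

```python
TRIGGERS = (b"/OpenAction", b"/AA")                    # fire without user interaction
ACTIONS = (b"/JS", b"/JavaScript", b"/Launch", b"/URI")
JS_CALLS = (b"app.launchURL", b"this.exportDataObject")


def execution_indicators(data: bytes) -> dict:
    """Report which triggers, action types, and JavaScript calls appear in the data."""

    def present(needles):
        return sorted(needle.decode("ascii") for needle in needles if needle in data)

    triggers = present(TRIGGERS)
    actions = present(ACTIONS)
    js_calls = present(JS_CALLS)
    return {
        "triggers": triggers,
        "actions": actions,
        "js_calls": js_calls,
        # A trigger together with any action or call suggests execution on open.
        "auto_execution_likely": bool(triggers and (actions or js_calls)),
    }
```

A document that contains only `/URI` links, with no trigger, comes back with `auto_execution_likely` set to `False`, which matches the distinction between a clickable link and an automatic one.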