Operator On The Wire

PDF

Poppler Tools

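Windows builds: oschwartz10612/poppler-windows (Poppler binaries packaged for Windows with dependencies). pdfid.py and pdf-parser.py, used below, are Didier Stevens' PDF tools and are not part of Poppler.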

# basic metadata  
pdfinfo.exe suspicious.pdf  
  
# fast malicious indicators (/JavaScript /OpenAction /Launch /EmbeddedFile /URI)  
python pdfid.py suspicious.pdf  
  
# search suspicious objects  
python pdf-parser.py suspicious.pdf --search javascript  
  
# dump specific object  
python pdf-parser.py suspicious.pdf --object 12  
  
# list embedded files  
pdfdetach.exe -list suspicious.pdf  
  
# extract all embedded files  
pdfdetach.exe -saveall suspicious.pdf  
  
# extract specific embedded file  
pdfdetach.exe -save 1 suspicious.pdf  
  
# quick strings for URLs / cmd / powershell  
strings suspicious.pdf  
  
# hash extracted payload  
Get-FileHash extracted.bin -Algorithm SHA256  
  
# inspect magic bytes (-Count requires PowerShell 6 or later)  
Format-Hex extracted.bin -Count 16  
  
# expected headers:  
# PDF = 25 50 44 46  
# ZIP = 50 4B  
# EXE = 4D 5A  
  
# search for obfuscated JS (--search does not look inside stream contents)  
python pdf-parser.py suspicious.pdf --search eval  
python pdf-parser.py suspicious.pdf --search unescape  
python pdf-parser.py suspicious.pdf --search fromCharCode  
  
# show document statistics (object counts per type)  
python pdf-parser.py suspicious.pdf --stats  
  
# decompress streams  
python pdf-parser.py suspicious.pdf --filter --object 12  
  
# extract raw stream  
python pdf-parser.py suspicious.pdf --raw --object 12  
  
# safe workflow:  
# pdfinfo -> pdfid -> pdf-parser -> pdfdetach -> hash -> inspect extracted payload
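
The "decompress streams" step can also be approximated without pdf-parser. The following is a minimal sketch, assuming the streams of interest use FlateDecode (zlib) and are not encrypted; `inflate_streams` and `STREAM_RE` are hypothetical names, and the regex is a rough heuristic rather than a real PDF parser.

```python
import re
import zlib

# Heuristic: grab the raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return the inflated body of every stream that zlib can decompress."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates a stream whose trailing checksum is cut off.
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not FlateDecode, encrypted, or corrupt: skipped in this sketch.
            continue
    return results
```

Nothing here executes the content; it only returns bytes that can then be searched for JavaScript or URLs.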

Finding                     | Severity
Embedded JavaScript present | High
/OpenAction present         | High
/Launch action present      | Critical
Embedded executable         | Critical
Obfuscated JavaScript       | Very High
External URI auto-trigger   | High
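
To turn the findings above into a single verdict, one option is to report the highest severity observed. This is a small sketch: the severity labels are copied from the list above, while the numeric ranking in `RANK` and the function name `overall_severity` are assumptions made for illustration.

```python
# Severity labels as listed above.
SEVERITY = {
    "Embedded JavaScript present": "High",
    "/OpenAction present": "High",
    "/Launch action present": "Critical",
    "Embedded executable": "Critical",
    "Obfuscated JavaScript": "Very High",
    "External URI auto-trigger": "High",
}

# Assumed ordering: High < Very High < Critical.
RANK = {"High": 1, "Very High": 2, "Critical": 3}


def overall_severity(findings):
    """Return the highest severity among known findings, or None if there are none."""
    levels = [SEVERITY[name] for name in findings if name in SEVERITY]
    return max(levels, key=RANK.__getitem__, default=None)
```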

Suspicious Patterns:

  • Weaponized PDF of roughly 100–800 KB
  • Embedded JavaScript
  • /OpenAction
  • /AA (additional actions)
  • /Launch
  • /URI
  • /EmbeddedFile
  • /XFA
  • Recently created & modified timestamps
  • Very small document with high object count

Identify

file sample.pdf

Confirm:

  • Is it actually a PDF?
  • Or a renamed EXE?
  • Or a polyglot file?
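
Where `file` is not available, the same three questions can be approximated from the leading bytes. The sketch below uses the PDF, ZIP, and EXE signatures and treats a `%PDF-` header that appears after offset 0 as a polyglot hint. The 1024-byte search window and the names `identify` and `looks_like_polyglot` are assumptions for illustration.

```python
# Leading-byte signatures: PDF = 25 50 44 46, ZIP = 50 4B, EXE = 4D 5A.
MAGIC = {b"%PDF": "pdf", b"PK": "zip", b"MZ": "exe"}


def identify(data: bytes, window: int = 1024):
    """Return (type by leading bytes, offset of the first %PDF- header or -1)."""
    leading = next(
        (name for sig, name in MAGIC.items() if data.startswith(sig)), "unknown"
    )
    pdf_offset = data.find(b"%PDF-", 0, window)
    return leading, pdf_offset


def looks_like_polyglot(data: bytes) -> bool:
    """A PDF header that is present but not at offset 0 deserves a closer look."""
    _, pdf_offset = identify(data)
    return pdf_offset > 0
```

A result such as `("exe", 64)` means the file starts like a PE but also carries a PDF header, which is worth treating as hostile until shown otherwise.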

Enumerate

pdfinfo sample.pdf
exiftool sample.pdf

Look for:

  • Author
  • Creator
  • Producer
  • CreationDate vs ModDate delta
  • Suspicious Producer strings
  • Blank or random metadata

Extract Safely (No Execution)

pdfdetach -list sample.pdf

If embedded files exist:

pdfdetach -saveall sample.pdf

If you want full object extraction:

pdf-parser.py sample.pdf

Hunt

Common malicious PDF patterns:

  • /JavaScript
  • /JS
  • /OpenAction
  • /Launch
  • /URI
  • /EmbeddedFile
  • /AA
  • eval(
  • unescape(
  • Hex-encoded blobs
  • Base64 payloads

# Structural hunt (anchored on the leading slash so bare "AA" or "URI" text does not match)
strings sample.pdf | grep -E "/(JavaScript|JS|OpenAction|Launch|URI|EmbeddedFile|AA)\b"

# JavaScript hunt
strings sample.pdf | grep -i javascript
strings sample.pdf | grep -i eval
strings sample.pdf | grep -i unescape

# URL hunt
strings sample.pdf | grep -i http
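
The structural hunt can also be expressed as a per-name count, which makes it easier to compare samples. This is a rough sketch in the spirit of the grep commands above, not a replacement for pdfid: it scans raw bytes only, so names hidden inside compressed streams or written with `#xx` hex escapes are missed. `NAMES` and `count_names` are hypothetical names.

```python
import re

# PDF names from the hunt list above.
NAMES = ["JavaScript", "JS", "OpenAction", "Launch", "URI", "EmbeddedFile", "AA"]


def count_names(pdf_bytes: bytes) -> dict[str, int]:
    """Count occurrences of each /Name in the raw file bytes."""
    counts = {}
    for name in NAMES:
        # The lookahead rejects longer names, so "/AA" does not match "/AAPL".
        pattern = re.compile(rb"/" + name.encode("ascii") + rb"(?![A-Za-z0-9])")
        counts[name] = len(pattern.findall(pdf_bytes))
    return counts
```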

Metadata

exiftool sample.pdf

Check:

  • Author
  • Creator
  • Producer
  • Creation vs modification time delta
  • Embedded file references

Red flags:

  • Random author
  • Blank metadata
  • Created and modified within seconds
  • Unexpected producer string (non-Adobe generators are common in legitimate files too, so weigh this with other indicators)
  • Timezone anomalies
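
The "created and modified within seconds" and timezone checks are easier to apply once the raw PDF date strings are parsed. The sketch below handles the `D:YYYYMMDDHHmmSS+HH'mm'` form; it assumes the raw strings are available (for example from `pdfinfo -rawdates`), treats a missing timezone as UTC, and defaults omitted fields to their lowest value. `parse_pdf_date` and `timestamp_delta` are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYY[MM[DD[HH[mm[SS]]]]] followed by an optional Z or +HH'mm' / -HH'mm'.
DATE_RE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(\d{2})?'?(\d{2})?'?)?"
)


def parse_pdf_date(value: str) -> datetime:
    """Parse a raw PDF date string into a timezone-aware datetime."""
    match = DATE_RE.match(value)
    if not match:
        raise ValueError(f"not a PDF date string: {value!r}")
    defaults = (0, 1, 1, 0, 0, 0)  # year is always present; the rest may be omitted
    year, month, day, hour, minute, second = (
        int(group) if group else default
        for group, default in zip(match.groups()[:6], defaults)
    )
    sign, off_h, off_m = match.group(7), match.group(8), match.group(9)
    offset = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
    if sign == "-":
        offset = -offset
    # Assumption: no timezone marker means UTC.
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone(offset))


def timestamp_delta(creation: str, modification: str) -> timedelta:
    """Return modification time minus creation time."""
    return parse_pdf_date(modification) - parse_pdf_date(creation)
```

A delta of a few seconds combined with differing UTC offsets between the two fields covers two of the red flags in one check.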

Embedded Files

pdfdetach -list sample.pdf

If files were extracted:

sha256sum extracted_file
file extracted_file
strings extracted_file | less
binwalk extracted_file

Look for:

  • Embedded EXE
  • Embedded HTA
  • Embedded JS
  • Embedded Office document
  • Packed PE indicators
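
On a host without `sha256sum` or `strings`, the hashing and string-extraction steps above can be sketched as follows. The functions work on bytes already read from disk and never execute them; `sha256_hex`, `printable_strings`, and the minimum length of 4 are assumptions for illustration.

```python
import hashlib
import re


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the payload as a hex string."""
    return hashlib.sha256(data).hexdigest()


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]
```

Typical use would be `data = open("extracted_file", "rb").read()` followed by both calls, then filtering the strings for URLs, `cmd`, or `powershell`.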


JavaScript Execution Indicators

Search for:

strings sample.pdf | grep -i openaction
strings sample.pdf | grep -i launch
strings sample.pdf | grep -i uri

Look for:

  • /OpenAction
  • /Launch
  • app.launchURL
  • this.exportDataObject
  • Automatic triggers on open
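
Once JavaScript has been pulled out of an object or decompressed stream, the API-level indicators can be checked directly in the script text. The sketch below is a plain case-insensitive substring check over tokens named in this page; `JS_INDICATORS` and `js_indicators` are hypothetical names, and a hit is a lead to review, not proof of malice.

```python
# Tokens taken from the indicators and obfuscation searches in this page.
JS_INDICATORS = ("app.launchURL", "exportDataObject", "eval(", "unescape(", "fromCharCode")


def js_indicators(js_source: str) -> list[str]:
    """Return the indicator tokens that appear in the script, ignoring case."""
    lowered = js_source.lower()
    return [token for token in JS_INDICATORS if token.lower() in lowered]
```

Heavily obfuscated scripts may split or encode these names, so an empty result does not clear the sample.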