Hack 102 The Danger Of PDF Files

Today’s “hack” focuses on the danger of PDF files which float around the internet. We’ll look specifically into how you can protect yourself from malicious content which might (or might not) be hidden. By hidden, I mean embedded inside the .pdf file itself.

How do I know if a PDF ebook contains malicious JavaScript or other dangerous code? The short answer is that you have to take a closer look. To do so, you need a Python script and access to a computer that runs Linux. OS X also has Python installed, and if you use Windows, I recommend that you install Python 3 and work through a few programming tutorials. You can learn a lot in a few days, but for now we worry about the danger of PDF files you get from the net.
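To give a feel for what “taking a closer look” means, here is a rough sketch that counts a handful of PDF names in the raw bytes of a file. It is not the code of any existing tool. The list of names and the function names are my own assumptions for illustration, and a simple scan like this will miss names that are obfuscated or sit inside compressed streams.

```python
import re

# Names to count. This short list is an assumption chosen for illustration.
SUSPICIOUS_NAMES = [b"/JS", b"/JavaScript", b"/AA", b"/OpenAction",
                    b"/Launch", b"/EmbeddedFile"]


def count_names(data: bytes) -> dict:
    """Count how often each name appears as a whole PDF name in raw bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # A name must not be followed by a letter or digit, so that
        # /AA does not also match a longer name that starts with /AA.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode()] = len(re.findall(pattern, data))
    return counts


def scan_file(path: str) -> dict:
    """Read a file in binary mode and count the names in it."""
    with open(path, "rb") as handle:
        return count_names(handle.read())
```

Calling scan_file with the path to an ebook returns a dictionary of counts. Anything above zero is a reason to look more closely, not proof of malware.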

Where to get the pdfid.py Python script

Easy. A quick search will point you in the right direction, which is the code author’s website.

Once you open that web page, press Ctrl + F to open your web browser’s search feature and enter this into the search field without the quotes: “pdfid_v0_2_5.zip”

I assume that you use Linux but all operating systems follow a very similar pattern. Extract the zip file into a directory and then go to that directory by entering this into your terminal:

cd /home/youruserid/Downloads/pdfid-master (press enter).
Type “ls -l” to list the files inside the pdfid-master directory. You should see the following text:

pdfid-master]$ ls -l
total 36
drwxr-xr-x 2 me me 4096 May 27 2016 img
drwxr-xr-x 2 me me 4096 Dec 1 15:21 pdfid
-rw-r--r-- 1 me me 3487 May 27 2016 README.md
-rw-r--r-- 1 me me 311 May 27 2016 setup.py

Inside the pdfid directory, you have a few more files and the one we need is called pdfid.py which is the actual script.

Before you go any further, you need to know where your PDF ebooks are because the path to the PDF files needs to be entered correctly. To help you find the correct path, just open a file browser like Dolphin (Linux KDE) and navigate to the ebooks. Then look at the top to see the path.
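If you would rather let Python print the full paths for you, a small sketch like the following does the job. The function name list_pdfs is my own, and it only looks at files directly inside the directory you give it, not in subdirectories.

```python
from pathlib import Path


def list_pdfs(directory: str) -> list:
    """Return the absolute paths of all .pdf files directly inside directory."""
    folder = Path(directory).expanduser().resolve()
    return sorted(
        str(path)
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

For example, list_pdfs("~/Downloads/ebookdir") returns a list of paths that you can copy straight into the terminal.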

OK, back to the Python script. Change into the pdfid directory first (cd pdfid). Then, to execute the pdfid.py script, type this command into the terminal:

python pdfid.py /home/yourid/Downloads/ebookdir/ebooktitle.pdf

Press enter and wait a moment. If you typed everything right, then you will see output similar to this:

PDF Header: %PDF-1.4
obj 12246
endobj 12246
stream 4725
endstream 4666
xref 1
trailer 1
startxref 1
/Page 281
/Encrypt 0
/ObjStm 0
/JS 3
/JavaScript 0
/AA 1
/OpenAction 0
/AcroForm 0
/JBIG2Decode 0
/RichMedia 0
/Launch 0
/EmbeddedFile 0
/XFA 0
/Colors > 2^24 0

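The noteworthy entries from the output are /Encrypt, /JS and /JavaScript and, depending on your PDF file, the action entries /AA and /OpenAction. In the sample above, /JS 3 and /AA 1 are the counts that deserve a closer look.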
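If you check many books, reading every report by eye gets tedious. The sketch below parses a saved report and returns only the watched entries with a count above zero. The watch list and the function names are my own assumptions. The optional number in parentheses that the pattern tolerates is also an assumption about the report format, and it is ignored.

```python
import re

# Entries worth a second look. This watch list is an assumption.
WATCH = ("/Encrypt", "/JS", "/JavaScript", "/AA", "/OpenAction",
         "/Launch", "/EmbeddedFile")

# A report line is a name followed by a count, e.g. "/JS 3".
LINE = re.compile(r"^\s*(.+?)\s+(\d+)(?:\(\d+\))?\s*$")


def parse_report(text: str) -> dict:
    """Turn the plain-text report into a {name: count} dictionary."""
    counts = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:  # the "PDF Header" line does not end in a count and is skipped
            counts[match.group(1)] = int(match.group(2))
    return counts


def flagged(counts: dict) -> dict:
    """Keep only the watched entries whose count is above zero."""
    return {name: counts[name] for name in WATCH if counts.get(name, 0) > 0}
```

To use it, redirect the script output into a text file, read that file in Python and pass its content to parse_report. For the sample report above, flagged returns the /JS and /AA entries.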

What to do with a PDF file that looks suspicious

Again, assuming you use Linux or have at least live-booted into a capable Linux distribution from a USB stick, the terminal is your friend. To get rid of hidden malware or JavaScript, type this command exactly as I show you here. You don’t have to be root to do this, so just copy-paste or type this command:

pdf2ps NameTitleOfBook.pdf - | ps2pdf - NewNameTitleOfBook.pdf

Press enter and wait. Depending on how fast your computer crunches numbers, this process can take a few minutes. If you are not sure whether it is done, simply right-click the old file and the new file (the one you are writing to) and select “Properties” from the pop-up menu. Look at the file size and wait some more. Once the conversion is finished, you will see the new file change from a blank white placeholder to the actual book cover. Compare the file size again. If you removed a few megabytes, then you probably dodged a bullet. Open the new file and check that everything looks right, and if it does, delete the old file.
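The size comparison can also be done in Python instead of the file browser. This is only a convenience sketch and the function name bytes_saved is my own. Keep in mind that the conversion rewrites the whole file, so a size change on its own is a hint rather than proof.

```python
import os


def bytes_saved(original: str, cleaned: str) -> int:
    """Return how many bytes smaller the cleaned file is than the original.

    A negative result means the cleaned file is larger.
    """
    return os.path.getsize(original) - os.path.getsize(cleaned)
```

Call it with the old and the new file name once the conversion has finished.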

Do this to all of your books and delete the originals. Reboot your computer and read away.
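For a whole shelf of books, typing the conversion by hand for every file gets old quickly. Here is a minimal sketch that runs the same kind of pipeline from Python, assuming the Ghostscript tools pdf2ps and ps2pdf are installed and on your PATH. The “.clean.pdf” naming and the function names are my own choices. The script does not delete anything, so you can still check each new file first.

```python
import subprocess
from pathlib import Path


def cleaned_name(pdf_path: str) -> Path:
    """Build the output name, e.g. book.pdf becomes book.clean.pdf."""
    source = Path(pdf_path)
    return source.with_name(source.stem + ".clean.pdf")


def convert(pdf_path: str) -> Path:
    """Pipe pdf2ps into ps2pdf and return the path of the new file."""
    target = cleaned_name(pdf_path)
    to_ps = subprocess.Popen(["pdf2ps", pdf_path, "-"], stdout=subprocess.PIPE)
    to_pdf = subprocess.Popen(["ps2pdf", "-", str(target)], stdin=to_ps.stdout)
    to_ps.stdout.close()  # so pdf2ps notices if ps2pdf exits early
    to_pdf.wait()
    to_ps.wait()
    if to_ps.returncode != 0 or to_pdf.returncode != 0:
        raise RuntimeError(f"conversion failed for {pdf_path}")
    return target


def convert_all(directory: str) -> list:
    """Convert every .pdf in directory that is not already a cleaned copy."""
    results = []
    for path in sorted(Path(directory).expanduser().glob("*.pdf")):
        if not path.name.endswith(".clean.pdf"):
            results.append(convert(str(path)))
    return results
```

Running convert_all on your ebook directory leaves the originals in place next to the new files.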

A word of warning

There is an old saying that the best things in life are free. Don’t be fooled. On the net, nothing is free except open source software and material which is published under the GPL or MIT license. It’s easy to find some warez site and download valuable ebooks. We have all done it, but you never know who made them and why. There could be another malicious file on your system which was placed somewhere during the unzip or unrar process. On top of that, PDF files that contain JavaScript can access links on your computer. I personally don’t trust any file that floats around on the net and is up for grabs just like that.

Live booting from a USB goes a long way while reading PDF files or, if you have VirtualBox installed, then a throw-away Linux install that serves the purpose of “looking/testing something” will do as well. You have been warned.

Last but not least, remember that privacy is dead and has been for many years. If you have time, read my other article on how my web browser reveals my identity, which explains how our web browsers identify our every move on the net. That includes the Tor browser.
Got questions or comments? Fire away.
