Document Metadata Extractor 20 Credits

Extract metadata from various types of online documents

Sample Report

Here is a Document Metadata Extractor sample report:

  • Includes metadata from various types of documents (doc, docx, pdf, xls, etc.)
  • The extracted metadata can contain:
    • Document author
    • Software type and version
    • Company name
    • Creation date, etc.

Document Metadata Extractor - Use Cases

Extracts metadata from public documents such as pdf, doc, xls, ppt, docx, pptx and xlsx files. The metadata may contain the author name, username, company name, software version, document path, creation date, etc.

Fingerprint Target Organization

Metadata such as author, company name and software type can be used to gather information about an organization's internal structure and IT environment. This information can then be used in further attacks.
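As a rough illustration of this kind of fingerprinting, the sketch below tallies software and company values across metadata records that have already been extracted. The function name `fingerprint` and the record layout (a list of dicts with optional `software` and `company` keys) are hypothetical, chosen for this example; they are not the tool's actual output format.

```python
from collections import Counter


def fingerprint(records):
    """Count how often each software value and company name appears.

    `records` is assumed to be a list of dicts with optional 'software'
    and 'company' keys (a hypothetical layout for this sketch).
    Missing or empty values are skipped.
    """
    software = Counter(r["software"] for r in records if r.get("software"))
    companies = Counter(r["company"] for r in records if r.get("company"))
    return software, companies
```

Calling `most_common()` on the returned counters lists the editor versions and company names seen most often across the collected documents.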

Mount Phishing Attacks

The information contained in document metadata can easily be used to mount phishing attacks against the target user and their organization.

Trace a User's Activity

By correlating the metadata of multiple documents, it is possible to track a user's professional activity, including the type of work they have done, the internal departments they have worked in and possibly the names of their colleagues (document co-authors).
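Correlating documents by author can be sketched as building an index from each author to the documents they appear in and to the other names listed on those documents. The function name `coauthor_index` and the input layout (a list of `(url, [author, ...])` pairs) are hypothetical assumptions for this illustration.

```python
from collections import defaultdict


def coauthor_index(documents):
    """Map each author to their documents and to their co-authors.

    `documents` is assumed to be a list of (url, [author, ...]) pairs,
    a hypothetical layout chosen for this sketch. Blank names are ignored.
    """
    docs_by_author = defaultdict(set)
    coauthors = defaultdict(set)
    for url, authors in documents:
        names = {a.strip() for a in authors if a and a.strip()}
        for name in names:
            docs_by_author[name].add(url)
            # Everyone else listed on the same document counts as a co-author.
            coauthors[name].update(names - {name})
    return dict(docs_by_author), dict(coauthors)
```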

Technical Details

The Metadata Extractor connects to the target URL, parses the HTML page and downloads the document(s) it finds. It then extracts all the metadata they contain.
The tool can extract metadata from multiple documents at once: if the target URL points to a web page that contains links to documents, all of them are downloaded and searched for metadata.
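The crawling step described above can be sketched with the Python standard library: fetch the page, collect `<a href>` links whose path ends in a document extension, and resolve them against the page URL. This is a minimal illustration under stated assumptions (only `<a>` tags are inspected, and the extension list mirrors the types named on this page); the names `fetch`, `DocumentLinkParser` and `find_document_links` are hypothetical, not the tool's actual implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Assumption: the document types named on this page.
DOC_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx")


def fetch(url, timeout=10):
    """Download a URL and return the raw response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


class DocumentLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links that point to documents."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        # Check the path only, so a query string does not hide the extension.
        if urlparse(absolute).path.lower().endswith(DOC_EXTENSIONS):
            if absolute not in self.links:
                self.links.append(absolute)


def find_document_links(html, base_url):
    """Return the unique document links found in an HTML string, in page order."""
    parser = DocumentLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

A caller would decode `fetch(page_url)` to text, pass it to `find_document_links`, and then `fetch` each returned URL for metadata extraction.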

Parameter: Document(s) URL
Description: The URL of the document(s) to download and parse for metadata. If the URL points to a web page that contains links to multiple documents, all of them are downloaded and their metadata is extracted.

Example URLs:
  • http://www.adobe.com/devnet/pdf/pdf_reference.html
  • https://www.cisco.com/assets/events/deep_drive.pptx
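The parameter accepts either a direct document link (like the second example) or a page that links to documents (like the first). One simple way to tell them apart is to look at the extension of the URL path, as in this sketch. The heuristic and the name `classify_target` are assumptions for illustration; a real implementation might inspect the HTTP Content-Type header instead.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: the document types named on this page.
DOC_EXTENSIONS = {".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def classify_target(url):
    """Return 'document' if the URL path ends in a known document extension, else 'page'."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return "document" if suffix in DOC_EXTENSIONS else "page"
```

With the example URLs above, the `.pptx` link is classified as a document and the `.html` link as a page to crawl.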

How it works

What is document metadata?

Whenever you create or modify a document (PDF, Office, etc.), the editor application automatically embeds information about it in the file: the document author, creation date, modification date, the type and version of the editor software (e.g. Microsoft Office 2013), the path on disk where the document was saved, the company name, etc.
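In PDF files, dates such as the creation date are stored as strings of the form `D:YYYYMMDDHHmmSSOHH'mm'`, where every field after the year is optional and `O` is `+`, `-` or `Z`. The sketch below converts such a string into a Python `datetime`; the name `parse_pdf_date` is hypothetical, and the parser assumes reasonably well-formed input rather than handling every variant found in the wild.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSSOHH'mm' ; the "D:" prefix and all fields after the year are optional.
_PDF_DATE = re.compile(
    r"(?:D:)?(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<sign>[Zz+\-])(?:(?P<tzh>\d{2})'?(?:(?P<tzm>\d{2})'?)?)?)?$"
)


def parse_pdf_date(value):
    """Convert a PDF date string (e.g. a /CreationDate value) to a datetime."""
    match = _PDF_DATE.match(value.strip())
    if not match:
        raise ValueError(f"not a PDF date: {value!r}")
    g = match.groupdict()
    tz = None  # no offset given: the result is a naive datetime
    if g["sign"] in ("Z", "z"):
        tz = timezone.utc
    elif g["sign"] in ("+", "-"):
        offset = timedelta(hours=int(g["tzh"] or 0), minutes=int(g["tzm"] or 0))
        tz = timezone(offset if g["sign"] == "+" else -offset)
    return datetime(
        int(g["year"]), int(g["month"] or 1), int(g["day"] or 1),
        int(g["hour"] or 0), int(g["minute"] or 0), int(g["second"] or 0),
        tzinfo=tz,
    )
```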

The metadata that gets saved is not standardized: it depends on the application that creates or edits the document, on the document type, and on whether the author has manually removed it.
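As one concrete case, Office Open XML files (docx, xlsx, pptx) are ZIP archives whose metadata usually lives in `docProps/core.xml` (author, last modifier, dates) and `docProps/app.xml` (application name and version, company). The sketch below reads a few of these fields using only the Python standard library. The name `ooxml_metadata` and the output keys are hypothetical; this is an illustration, not how this tool is implemented, and it ignores the older binary formats (doc, xls, ppt) and PDF, which store metadata differently.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# (output key, archive member, path of the child element) for each field.
FIELDS = [
    ("author", "docProps/core.xml", "dc:creator"),
    ("last_modified_by", "docProps/core.xml", "cp:lastModifiedBy"),
    ("created", "docProps/core.xml", "dcterms:created"),
    ("modified", "docProps/core.xml", "dcterms:modified"),
    ("application", "docProps/app.xml", "ep:Application"),
    ("app_version", "docProps/app.xml", "ep:AppVersion"),
    ("company", "docProps/app.xml", "ep:Company"),
]


def ooxml_metadata(path_or_file):
    """Return the metadata fields present in an OOXML document (path or file object)."""
    result = {}
    with zipfile.ZipFile(path_or_file) as archive:
        names = set(archive.namelist())
        roots = {}
        for member in ("docProps/core.xml", "docProps/app.xml"):
            if member in names:
                roots[member] = ET.fromstring(archive.read(member))
        for key, member, path in FIELDS:
            root = roots.get(member)
            if root is None:
                continue  # this part is missing, e.g. stripped by the author
            element = root.find(path, NS)
            if element is not None and element.text:
                result[key] = element.text.strip()
    return result
```

Fields that are absent or empty are simply left out of the result, which matches the point above that the saved metadata varies from document to document.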

How to find public documents exposed on websites?

The easiest way to find the URLs of public documents is to use search engines such as Google, Bing or Yahoo, which have already crawled and indexed public websites.
For instance, to find documents of a given type on a given website with Google, you can combine the site: and filetype: operators, as in site:example.com filetype:pdf.
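A small helper can generate one such query per document type for a target domain by combining Google's `site:` and `filetype:` operators. The function name `document_search_queries` and the default type list are assumptions for illustration; the resulting strings are meant to be pasted into the search engine by hand.

```python
# Assumption: the document types named on this page; adjust as needed.
DEFAULT_TYPES = ("pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx")


def document_search_queries(domain, filetypes=DEFAULT_TYPES):
    """Build one query per file type, e.g. 'site:example.com filetype:pdf'."""
    domain = domain.strip().lower()
    return [f"site:{domain} filetype:{ext.lstrip('.').lower()}" for ext in filetypes]
```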

Alternatively, you can browse the website manually for documents, note the URLs of the interesting ones and use them with this tool to extract metadata.

How can attackers use document metadata?

Attackers can use the metadata embedded in documents in multiple scenarios. Here are some examples:

  • Author names can be used to mount phishing attacks against the company's employees.
  • Usernames can be used in brute-force authentication attacks against the company's external-facing applications (webmail, VPN, blogs, etc.).
  • Software type and version are useful for mapping the technologies an organization uses internally. Further attacks can be tailored against these technologies.
  • A recent creation/modification date could indicate that the author still works for the company.
  • Other custom metadata may reveal additional interesting information.