This is the URL of the document(s) that will be parsed for metadata extraction.

The URL can point to a single document or to a web page which contains links (href) to multiple documents.
Example: http://site.com/documents/




About the Document Metadata Extractor

Extracts metadata from public documents such as: pdf, doc, xls, ppt, docx, pptx, xlsx.
The metadata may contain: author name, username, company name, software version, document path, creation date, etc.

The Metadata Extractor connects to the target URL, downloads the document(s) found, parses them and extracts all metadata identified.
The tool can extract metadata from multiple documents at once if the target URL points to a web page that links to the wanted documents; every linked document is searched for metadata.
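
The link-discovery step described above can be sketched roughly as follows. This is only an illustration, not the tool's actual implementation: the names `DocumentLinkParser` and `find_document_links` are hypothetical, and the sketch assumes the page's HTML has already been downloaded and that documents are recognized purely by their file extension.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Extensions taken from the list of supported document types above.
DOC_EXTENSIONS = (".pdf", ".doc", ".xls", ".ppt", ".docx", ".pptx", ".xlsx")


class DocumentLinkParser(HTMLParser):
    """Hypothetical helper: collects href values that point to documents."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.document_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                # Compare only the path, so query strings are ignored.
                path = urlparse(absolute).path.lower()
                if path.endswith(DOC_EXTENSIONS):
                    self.document_urls.append(absolute)


def find_document_links(html, base_url):
    """Return absolute URLs of all linked documents found in the HTML."""
    parser = DocumentLinkParser(base_url)
    parser.feed(html)
    return parser.document_urls
```

Each URL returned by `find_document_links` would then be downloaded and parsed for metadata individually.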


Tool parameters


What is document metadata?

Whenever you create or modify a document (PDF, Office, etc.), the editor application automatically embeds information inside the document about the document author, creation date, modification date, the type and version of the editor software (e.g. Microsoft Office 2013), the path on disk where it was saved, company name, etc.

The type of saved metadata is not standardized. It depends on the application that creates or edits the document, on the type of document, and on whether the metadata was manually removed by the document author.
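
To make this concrete, here is a minimal sketch of reading a few metadata fields from an Office Open XML file (docx, xlsx, pptx). It assumes the file is a ZIP archive whose core properties are stored at `docProps/core.xml`, which is where Microsoft Office typically writes them; other formats such as pdf or the legacy doc format need different parsers. The function name `read_core_properties` and the selection of fields are illustrative assumptions, not part of the tool.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative subset of fields: output label -> XML element name.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(file_or_path):
    """Return the metadata fields present in docProps/core.xml, if any."""
    with zipfile.ZipFile(file_or_path) as archive:
        try:
            xml_bytes = archive.read("docProps/core.xml")
        except KeyError:
            # The part is missing, e.g. the metadata was stripped.
            return {}
    root = ET.fromstring(xml_bytes)
    result = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NAMESPACES)
        if element is not None and element.text:
            result[label] = element.text
    return result
```

Fields that the editor did not write, or that the author removed, are simply absent from the returned dictionary.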


How to find public documents exposed on websites?

The easiest way to find URLs to public documents is to use search engines such as Google, Bing, Yahoo, etc. They usually have this information already because they have crawled most public websites.
For instance, to find various types of documents with Google, you can use search expressions that combine the site: and filetype: operators.
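
As a minimal sketch, such expressions can be generated by combining Google's `site:` and `filetype:` operators for each supported document type. The function name `build_search_queries` and the domain `example.com` are placeholders chosen for illustration.

```python
# Document types taken from the list of supported formats above.
DOC_TYPES = ["pdf", "doc", "xls", "ppt", "docx", "pptx", "xlsx"]


def build_search_queries(domain, doc_types=DOC_TYPES):
    """Build one search expression per document type for the given domain."""
    return [f"site:{domain} filetype:{ext}" for ext in doc_types]


for query in build_search_queries("example.com"):
    print(query)
```

Each printed line, such as `site:example.com filetype:pdf`, can be pasted into the search box; the result URLs can then be used as targets for this tool.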

Of course, the alternative would be to manually browse the website for documents, note the URLs of the interesting ones and use them with this tool to extract metadata.


How can document metadata be used by attackers?

The metadata embedded inside documents can be used by attackers in multiple scenarios. Here are some examples:

  • Author names can be used to mount phishing attacks against the company's employees.
  • Usernames can be used to try brute-force authentication attacks against the company's external-facing applications (webmail, VPN, blogs, etc.).
  • Software type and version are useful for mapping the technologies used internally by an organization. Further attacks can be tailored against these technologies.
  • A recent document creation/modification date could indicate that the author still works for the company.
  • Other custom metadata may reveal additional interesting information.
