Extracts metadata from public documents such as PDF, DOC, XLS, PPT, DOCX, PPTX and XLSX files. The metadata may contain the author name, username, company name, software version, document path, creation date, etc.
Metadata such as Author, Company Name and Software Type can be used to collect information about the internal structure of an organization and about its IT environment. This information can be used in further attacks.
The information contained in document metadata can easily be used to mount phishing attacks against the target user and their organization.
By following the metadata from multiple documents, it is possible to track the professional activity of a user, including the type of work they have done, the internal departments they have worked in and possibly the names of their colleagues (document co-authors).
Document(s) URL: The URL of the document(s) that will be downloaded and parsed for metadata. If the URL points to a web page that contains links to multiple documents, all of them will be downloaded and parsed.
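The link-following behaviour described above can be pictured with a minimal sketch. It is not the tool's actual implementation: the names `LinkCollector`, `is_document_url` and `find_document_urls` are hypothetical, and the sketch assumes documents are recognised purely by file extension. It only finds the document URLs in a page that has already been fetched; downloading is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Extensions taken from the list of supported document types.
DOC_EXTENSIONS = (".pdf", ".doc", ".xls", ".ppt", ".docx", ".pptx", ".xlsx")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def is_document_url(url):
    """True if the URL path ends with one of the supported extensions."""
    return urlparse(url).path.lower().endswith(DOC_EXTENSIONS)


def find_document_urls(page_url, html):
    """Return absolute document URLs: the URL itself, or the links in its HTML."""
    if is_document_url(page_url):
        return [page_url]
    parser = LinkCollector()
    parser.feed(html)
    absolute = (urljoin(page_url, href) for href in parser.hrefs)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(u for u in absolute if is_document_url(u)))
```

Relative links are resolved against the page URL, and the extension check looks only at the URL path, so a query string such as `?v=2` does not hide a document.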
Whenever you create or modify a document (PDF, Office, etc.), the editor application automatically embeds information inside the document about the document author, creation date, modification date, the type and version of the editor software (e.g. Microsoft Office 2013), the path on disk where it was saved, the company name, etc.
The metadata that gets saved is not standardized: it depends on the application that creates or edits the document, on the type of document, and on whether the metadata was manually removed by the document author.
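To make this concrete for one family of formats: DOCX, XLSX and PPTX files are ZIP archives, and their properties normally live in the `docProps/core.xml` and `docProps/app.xml` parts. The sketch below reads a few of those fields with the Python standard library. It is an illustration only, not how the tool itself works; the function name `extract_ooxml_metadata` and the output keys are assumptions made for this example, and PDF or legacy DOC/XLS/PPT files would need different parsing.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the OOXML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Archive part -> {output key (hypothetical name): element to look up}.
FIELDS = {
    "docProps/core.xml": {
        "author": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "created": "dcterms:created",
        "modified": "dcterms:modified",
    },
    "docProps/app.xml": {
        "application": "ep:Application",
        "app_version": "ep:AppVersion",
        "company": "ep:Company",
    },
}


def extract_ooxml_metadata(data):
    """Return the metadata fields found in the bytes of a docx/xlsx/pptx file."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = set(archive.namelist())
        for part, fields in FIELDS.items():
            if part not in names:
                continue  # The part may be absent if metadata was stripped.
            root = ET.fromstring(archive.read(part))
            for key, path in fields.items():
                node = root.find(path, NS)
                if node is not None and node.text:
                    found[key] = node.text
    return found
```

Only fields that are actually present end up in the result, which mirrors the point above: what you get back depends on the application that wrote the file and on whether the author removed anything.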
The easiest way to find URLs to public documents is to use search engines such as Google, Bing, Yahoo, etc. They already have this information because they have crawled and indexed most public websites.
For instance, to find various types of documents with Google, you can use search expressions such as the following:
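As one illustration, the sketch below builds such expressions from the `site:` and `filetype:` search operators, which Google supports. The domain `example.com` and the function name `build_document_queries` are placeholders chosen for this example, and the exact expressions you use may differ.

```python
# Document types from the list of supported formats.
DOC_TYPES = ["pdf", "doc", "xls", "ppt", "docx", "pptx", "xlsx"]


def build_document_queries(domain, doc_types=DOC_TYPES):
    """Return one 'site:<domain> filetype:<type>' query per document type."""
    return [f"site:{domain} filetype:{doc_type}" for doc_type in doc_types]


# Print one search expression per document type for a placeholder domain.
for query in build_document_queries("example.com"):
    print(query)
```

Each printed line, for example `site:example.com filetype:pdf`, can be pasted into the search box; the URLs in the results can then be used as the document URL input.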
The metadata information embedded inside documents can be used in multiple scenarios by hackers. Here are some examples: