Search Engine Hacking – Manual and Automation
Introduction:
We are all familiar with the Google, Yahoo, and Bing search engines; they need no introduction. We use them regularly to answer our day-to-day queries.
Google and other search engines use automated programs called spiders or crawlers. These search engines also maintain a large index of keywords and of the locations where those words can be found. Powerful crawling and indexing features make these search engines not only useful but also open the door for hackers to identify vulnerable targets over the internet. This is called Search Engine Hacking.
Search Engine Hacking involves using advanced operator-based searching to identify exploitable targets and sensitive data using the search engines.
In this article, we learn to use various Google search operators to identify vulnerable targets over the Internet and also check out a new tool that can be used to automate this process.
Special Search Characters:
Google search engine provides its users with various special search characters for advanced searching. See a partial list below:
- Quotes [“search query”]: Quotes are used to search for a specific phrase or set of words.
E.g. The query [“The monk who sold his Ferrari”] will search for the specific phrase: The monk who sold his Ferrari.
- Minus Sign [-]: The minus sign tells Google search engine to exclude the word that follows the minus operator.
E.g. [-red apple] will display the search results which will exclude the word red.
- Tilde operator [~]: Adding a tilde operator in front of a word will search for results containing that word as well as its synonyms.
E.g. [~jokes] will display search results which will include the word jokes as well as its synonyms like funny, humor, etc.
- OR operator or vertical bar [|]: Using OR (in uppercase) or the vertical bar between two or more keywords tells Google to search for pages that contain any of the words.
E.g. [Android OR Apple] will display search results containing either of the words.
- Asterisk operator [*]: The asterisk is a wildcard symbol that allows a search engine such as Google to fill in that space with any text string. You can also use it within double quotes for more precise searches.
E.g. The query [“today is * day”] will display search results like “today is a good day” or “today is mother’s day”, etc.
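The special characters above can be combined in a single query. As a rough sketch (the helper names `phrase`, `exclude`, `any_of` and `compose` are our own, not part of any Google library), the following Python functions assemble such query strings:

```python
def phrase(text):
    """Wrap text in double quotes for an exact-phrase search."""
    return f'"{text}"'


def exclude(word):
    """Prefix a word with the minus sign so results omit it."""
    return f"-{word}"


def any_of(*words):
    """Join keywords with the uppercase OR operator."""
    return " OR ".join(words)


def compose(*parts):
    """Join the non-empty parts of a query with single spaces."""
    return " ".join(part for part in parts if part)


# Exact phrase with a wildcard, excluding one word
print(compose(phrase("today is * day"), exclude("monday")))
# Either keyword, excluding one word
print(compose(any_of("Android", "Apple"), exclude("red")))
```

The output is only a string; it still has to be typed or pasted into the search box.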
Basic Searching Techniques:
Google search engine provides various operators to customize our search results.
The basic syntax of a Google advanced operator is
operator:search_term
The list below provides some of the key operators useful in creating search queries to retrieve valuable information from the web.
- Intitle operator:
The query [intitle:keyword] in the search engine will return pages containing the keyword in the title.
E.g. 1: The query [intitle:Google] will return all the web pages containing Google in the title.
E.g. 2: Google Hacking using intitle operator
Using the query [intitle:”Index of”] will return all the web pages containing “Index of” in the title. This can be used to identify whether directory listing (which displays a list of a directory’s contents) is enabled on the web server.

- Site operator:
The query [site:www.site.com] narrows a search to a particular site, domain or sub-domain.
E.g. 1: The query [news site:yahoo.com] will search for the keyword “news” on the site and the sub-domains of Yahoo.com.
E.g. 2: Google Hacking – Information gathering on sub domains
The query [site:yahoo.com] will display search results containing all the sub-domains of yahoo.com. This operator is useful for gathering information on the sub-domains of a specific target site.
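Once the result URLs of a [site:] query have been collected, the sub-domains can be pulled out of them programmatically. The snippet below is a minimal sketch using Python's standard `urllib.parse`; the function name `subdomains` and the sample URLs are made up for illustration and are not real search results.

```python
from urllib.parse import urlparse


def subdomains(urls, domain):
    """Return the sorted, de-duplicated hostnames in urls that belong to domain."""
    found = set()
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Keep the domain itself and anything ending in ".domain"
        if host == domain or host.endswith("." + domain):
            found.add(host)
    return sorted(found)


# Hypothetical result URLs, for illustration only
results = [
    "https://news.yahoo.com/world",
    "https://finance.yahoo.com/quote/X",
    "https://news.yahoo.com/tech",
    "https://example.org/yahoo.com",
]
print(subdomains(results, "yahoo.com"))
```

The last sample URL is skipped because only its path, not its hostname, contains the target domain.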
- Inurl operator:
The query [inurl:keyword] in the search engine will return pages containing the keyword in the URL.
E.g. 1 – The query [inurl:contactus site:www.MySite.com] will search for pages on MySite whose URL contains the word “contactus”.
E.g. 2 – Google Hacking – Looking for Admin Portals
The query [inurl:admin.php] will search for websites that might have admin login pages. These pages attract hackers, who might brute-force the login page to gain access to the admin interface.
- Cache operator:
Google keeps a snapshot of the pages it has crawled. The query [cache:URL] in the search engine displays Google’s cached version of that page.
E.g. – The query [cache:www.yahoo.com] will display the cached page of the website Yahoo.com. This directive can be useful for gathering information from previously cached pages.
Another very useful website that can be used to obtain the cached pages is http://archive.org/
This website stores snapshots of websites in a calendar format, and can be used to view a page as it appeared on a previous date. The screenshot below displays an archived page of Yahoo.com dated 9 Feb 2010.
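Archived pages can also be addressed directly by URL. The sketch below assumes the commonly used Wayback Machine URL layout `https://web.archive.org/web/<timestamp>/<url>`, where a date-only timestamp leads to the snapshot nearest that date; the function name `wayback_url` is our own.

```python
from datetime import date


def wayback_url(site, day):
    """Build a Wayback Machine URL for the snapshot nearest to `day`."""
    # The timestamp is written as YYYYMMDD
    return f"https://web.archive.org/web/{day:%Y%m%d}/{site}"


print(wayback_url("http://www.yahoo.com/", date(2010, 2, 9)))
```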
- Filetype operator:
The query [filetype:extension] searches for files with a particular file extension. Google can search for many different types of files, such as pdf, doc, rtf, ppt, xls, etc.
E.g. The query [filetype:pdf site:yahoo.com] will return all the links to pdf files found on Yahoo.com.
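The operator:search_term syntax lends itself to a small query builder. The following is a sketch only: `operator`, `build_query` and `search_url` are hypothetical helper names, and the search URL simply uses the familiar `https://www.google.com/search?q=` form with the query URL-encoded.

```python
from urllib.parse import urlencode


def operator(name, value):
    """Format one operator:search_term pair, quoting values that contain spaces."""
    if " " in value:
        value = f'"{value}"'
    return f"{name}:{value}"


def build_query(keywords="", **operators):
    """Combine operator pairs and optional free keywords into one query string."""
    parts = [operator(name, value) for name, value in operators.items()]
    if keywords:
        parts.append(keywords)
    return " ".join(parts)


def search_url(query):
    """Return a browser search URL with the query safely URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query(filetype="pdf", site="yahoo.com")
print(query)
print(search_url(query))
print(build_query(intitle="Index of"))
```

Keyword arguments keep their order, so the operators appear in the query in the order they were passed.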
Google Hacking through keyword search
Let’s look at some of the keyword searches and the operators that can be used to build search queries to carry out Google Hacking.
- Digging Google for Configuration Files:
Configuration files are used to configure the initial settings for some computer programs. An attacker with access to a configuration file can gain a detailed understanding of how the deployed program is set up.
E.g. a Google query like [filetype:ini inurl:ws_ftp.ini] would retrieve the configuration file used by the WS_FTP client program as shown in the screenshot below:

- Digging Google for Log Files:
Web servers log information such as IP addresses, timestamps, HTTP requests, usernames, and passwords to log files. These log files are usually stored on the server with the extension .log and may be accessible over the internet due to inadequate protection.
E.g. a Google query like [filetype:log cron.log] would retrieve the UNIX cron log as shown in the screenshot below:
- Digging Google for database information leaked from web applications:
Google hackers search Google for pieces of database information leaked from vulnerable servers. This information can be used to identify a vulnerable target and launch a more sophisticated attack against it.
E.g. a Google query like [filetype:inc intext:mysql_connect] will retrieve .inc files that contain the MySQL user credentials and other function details used to connect to the database.

- Digging Google for leakage of information through error messages:
Information leaked through error messages is very useful for information gathering and for launching further attacks on websites. If the application lacks exception/error handling mechanisms, its error messages might leak sensitive details such as database details, stack traces, etc.
E.g. a Google query like [intitle:”Apache Tomcat” “Error Report”] will display search results containing the Apache Tomcat error messages.

We have briefly covered the directives that can be used to carry out search engine hacking. Manually trying out each of these directives can be cumbersome. To automate the process of search engine hacking and retrieve juicy information, we make use of automated tools.
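Before turning to dedicated tools, note that the simplest form of automation is generating the queries themselves. The sketch below collects the example queries from this article in a dictionary and scopes each one to a target with the site operator; the names `DORKS` and `dorks_for` and the domain `example.com` are placeholders of our own.

```python
# Example queries taken from the sections above, keyed by what they look for
DORKS = {
    "directory listing": 'intitle:"Index of"',
    "admin portal": "inurl:admin.php",
    "WS_FTP config": "filetype:ini inurl:ws_ftp.ini",
    "cron log": "filetype:log cron.log",
    "MySQL include file": "filetype:inc intext:mysql_connect",
    "Tomcat error page": 'intitle:"Apache Tomcat" "Error Report"',
}


def dorks_for(site):
    """Scope every example query to one target site using the site operator."""
    return {label: f"site:{site} {query}" for label, query in DORKS.items()}


for label, query in dorks_for("example.com").items():
    print(f"{label}: {query}")
```

Such a list should only be run against sites you are authorized to assess.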
Automated tools available for Google Hacking:
- Gooscan – Gooscan is a tool that automates queries against Google search appliances, but with a twist. These particular queries are designed to find potential vulnerabilities on web pages.
Ref:
http://www.securitytube-tools.net/index.php@title=Gooscan.html
- SiteDigger – SiteDigger searches Google’s cache to look for vulnerabilities, errors, configuration issues, proprietary information, and interesting security nuggets on web sites.
Ref:
http://www.mcafee.com/in/downloads/free-tools/sitedigger.aspx
- Wikto – This is a multipurpose tool developed by Sensepost which can be used for automating Google Hacking.
The above tools are useful for Google Hacking. However, let’s look at a newer tool called Search Diggity, which provides a graphical user interface and is useful for retrieving a lot of information from both the Bing and Google search engines.
Search Diggity:
It is Stach & Liu’s MS Windows GUI application that serves as a front-end to the most recent versions of the Diggity tools:
- GoogleDiggity
- BingDiggity
- Bing LinkFromDomainDiggity
- CodeSearchDiggity
- DLPDiggity
- FlashDiggity
- MalwareDiggity
- PortScanDiggity
- SHODANDiggity
- BingBinaryMalwareSearch
- NotInMyBackYard Diggity
More information on these modules can be found here:
http://www.stachliu.com/resources/tools/google-hacking-diggity-project/attack-tools/
Let’s explore a few of the above key modules of interest to learn about the art of search engine hacking.
GoogleDiggity:
The GoogleDiggity tool automates the Google Hacking process. It queries the search engine using the Google JSON/ATOM Custom Search API to identify vulnerabilities and information disclosures.
The Google search engine uses a bot detection technique. As a result, querying Google directly with automated tools for Google hacking is restricted. This is overcome with the use of the Google JSON/ATOM Custom Search API, which uses an API key. A user can register for an API key against a valid Gmail account and get 100 free requests per day. Additional queries are available at a cost (Google charges $5 per 1000 queries).
The tool provides a well-structured interface that allows the user to:
- Select the search queries from the list
- Feed the API key
- Specify the target site/domain/IP address
- Click the Scan button to kick off the scan, etc.
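To give a feel for what such a tool does under the hood, here is a minimal sketch of preparing a request to the Custom Search JSON API and reading links out of a response. It is not GoogleDiggity's code. The endpoint and the `key`, `cx` (search engine ID), `q` and `start` parameters follow Google's public API documentation; the function names are ours, and the quota and price constants are simply the figures quoted above, which may have changed. No network call is made here.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"
FREE_QUERIES_PER_DAY = 100   # figure quoted in this article
USD_PER_1000_QUERIES = 5.0   # figure quoted in this article


def request_url(api_key, engine_id, query, start=1):
    """Build the GET URL for one page of results."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_links(response):
    """Pull result links out of a decoded JSON response (a dict)."""
    return [item["link"] for item in response.get("items", [])]


def daily_cost(queries):
    """Estimate the cost in USD of running `queries` requests in one day."""
    billable = max(0, queries - FREE_QUERIES_PER_DAY)
    return billable * USD_PER_1000_QUERIES / 1000


# "YOUR_KEY" and "YOUR_CX" are placeholders, not working credentials
print(request_url("YOUR_KEY", "YOUR_CX", "site:example.com inurl:admin.php"))
print(daily_cost(1100))
```

Fetching the URL with any HTTP client and passing the decoded JSON to `extract_links` would complete the loop.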
Bing Diggity:
Similar to GoogleDiggity, BingDiggity is a Bing search engine hacking tool. It utilizes the Bing 2.0 API (which allows 1000 results per query) and Stach & Liu’s newly developed Bing Hacking Database (BHDB) to find vulnerabilities and sensitive information disclosures related to your organization that are exposed via Microsoft’s Bing search engine.
The tool provides a well-structured interface that allows the user to:
- Select the search queries specific to Bing search engine from the list
- Feed the API key
- Specify the target site/domain/IP address
- Click the Scan button to kick off the scan, etc.
DLPDiggity:
DLPDiggity is a data loss prevention tool that leverages Google/Bing to identify exposures of sensitive info (e.g. SSNs, credit card numbers, etc.) via common document formats such as .doc, .xls, and .pdf. First, GoogleDiggity and BingDiggity are used to locate and download files belonging to target domains/sites on the Internet. Then, DLPDiggity is used to analyze those downloaded files for sensitive information disclosures.
DLPDiggity utilizes IFilters (an IFilter is a plugin that allows the Windows Indexing Service and the newer Windows Desktop Search to index different file formats so that they become searchable) to search through the actual contents of files, as opposed to just the meta-data. Using .NET regular expressions, DLPDiggity can find almost any type of sensitive data within common document file formats.
Over the last few years, there has been a tremendous increase in the volume of office documents that have been indexed and made searchable by Google and Bing. DLPDiggity taps into that in order to find documents containing sensitive information.
The tool provides a well-structured interface that allows the user to:
- Select, from the list, the DLPDiggity search queries used to dig the Google/Bing search engines for documents
- Select the regular expressions that will be used to search through the documents in the target directory for leaks of sensitive information such as SSNs and credit card numbers
- Click the Search button to analyze the documents
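The regular-expression step can be illustrated in a few lines. DLPDiggity itself uses .NET regular expressions and its own rule set; the Python sketch below is only an approximation of the idea, with two simplified patterns of our own plus a Luhn checksum to discard digit runs that cannot be valid card numbers.

```python
import re

# Simplified, illustrative patterns (not DLPDiggity's actual rules)
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text):
    """Return (kind, match) pairs for SSN-like and card-like strings in text."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_ok(digits):
            hits.append(("card", m.group().strip()))
    return hits


# 4111 1111 1111 1111 is a well-known test number; the SSN is a dummy value
sample = "SSN 123-45-6789, card 4111 1111 1111 1111, bogus 4111 1111 1111 1112"
print(find_sensitive(sample))
```

The text would first have to be extracted from the downloaded documents, which is the job IFilters perform in the real tool.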
FlashDiggity:
FlashDiggity automates Google searching/downloading/decompiling/analysis of SWF files to identify Flash vulnerabilities and information disclosures.
FlashDiggity first leverages the GoogleDiggity tool in order to identify Adobe Flash SWF applications for target domains via Google searches, such as ext:swf. Next, the tool is used to download all of the SWF files in bulk for analysis. The SWF files are disassembled back to their original ActionScript source code, and then analyzed for code-based vulnerabilities.
The tool provides a well-structured interface that allows the user to:
- Select, from the list, the FlashDiggity search queries used to dig the Google search engine for SWF files
- Select the regular expressions that will be used to search through the ActionScript of decompiled SWF Flash files for code-based vulnerabilities and information disclosures
- Click the Search button to decompile and analyze the SWF files
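The analysis stage amounts to running regular expressions over decompiled ActionScript. The sketch below assumes the source has already been decompiled to text by some other tool; the three patterns and the sample code are invented for illustration and are not FlashDiggity's actual rules.

```python
import re

# Illustrative patterns only; a real rule set would be far larger
PATTERNS = {
    "hardcoded credential": re.compile(
        r"(?i)\b(?:password|passwd|secret)\w*\s*(?::\s*\w+\s*)?=\s*[\"'][^\"']+[\"']"
    ),
    "ExternalInterface call": re.compile(r"\bExternalInterface\.call\s*\("),
    "embedded URL": re.compile(r"https?://[^\s\"']+"),
}


def scan_actionscript(source):
    """Return (line_number, label) for every pattern hit in the source text."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


# Made-up decompiled ActionScript, for illustration only
sample = '''var password:String = "s3cret";
ExternalInterface.call("alert", param);
loader.load(new URLRequest("http://example.com/data.xml"));'''

for lineno, label in scan_actionscript(sample):
    print(f"line {lineno}: {label}")
```

Each finding is only a lead; a flagged line still needs manual review to confirm an actual vulnerability.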