Search Engine Hacking – Manual and Automation
Introduction:
We are all familiar with the Google, Yahoo and Bing search engines; they need no introduction. We use them every day to answer our day-to-day queries.
Google and other search engines use automated programs called spiders or crawlers to discover web pages. They also maintain a large index of keywords and the locations where those keywords can be found. These crawling and indexing features make search engines powerful, but they also open the door for hackers to identify vulnerable targets on the Internet. This is called Search Engine Hacking.
Search Engine Hacking involves using advanced, operator-based searches to identify exploitable targets and sensitive data through search engines.
In this article, we will learn to use various Google search operators to identify vulnerable targets on the Internet. We will also look at a new tool that can automate this process.
Special Search Characters:
Google search engine provides its users with various special search characters for advanced searching. See a partial list below:
- Quotes [“search query”]: Quotes are used to search for an exact phrase or set of words.
E.g. The query [“The monk who sold his Ferrari”] will search for the exact phrase “The monk who sold his Ferrari”.
- Minus Sign [-]: The minus sign tells the Google search engine to exclude the word that immediately follows it.
E.g. [-red apple] will display results for apple that do not contain the word red.
- Tilde operator [~]: Adding a tilde in front of a word searches for results containing that word as well as its synonyms. (Google has since retired this operator.)
E.g. [~jokes] will display search results that include the word jokes as well as synonyms such as funny, humor, etc.
- OR operator or vertical bar [|]: Using OR (in uppercase) or the vertical bar between two or more keywords tells Google to search for pages that contain any of those words.
E.g. [Android OR Apple] will display search results containing either of the words.
- Asterisk operator [*]: The asterisk is a wildcard that lets the search engine fill in the blank with any word or words. It is most useful inside double quotes, where it produces more precise searches.
E.g. The query [“today is * day”] will display search results like “today is a good day” or “today is mother’s day”, etc.
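To see how these special characters combine, here is a minimal Python sketch that assembles a query string from its parts. The helper name `build_query` and its parameters are hypothetical and exist only for illustration. Google only ever sees the final string.

```python
def build_query(phrase=None, include=(), exclude=(), any_of=()):
    """Compose a Google query string from the special search characters.

    phrase  -> wrapped in double quotes for an exact-phrase match
    include -> plain keywords, added as-is
    any_of  -> joined with the uppercase OR operator
    exclude -> each word prefixed with the minus sign
    """
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')
    parts.extend(include)
    if any_of:
        parts.append(" OR ".join(any_of))
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


print(build_query(phrase="today is * day"))          # "today is * day"
print(build_query(include=["apple"], exclude=["red"]))  # apple -red
print(build_query(any_of=["Android", "Apple"]))       # Android OR Apple
```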
Basic Searching Techniques:
Google search engine provides various operators to customize our search results.
The basic syntax of a Google advanced operator is:
operator:search_term
The list below provides some of the key operators useful in creating search queries to retrieve valuable information from the web.
- Intitle operator:
The query [intitle:keyword] in the search engine will return pages containing the keyword in the title.
E.g. 1: The query [intitle:Google] will return all the web pages containing Google in the title.
E.g. 2: Google Hacking using intitle operator
The query [intitle:“Index of”] will return web pages with “Index of” in the title. This can be used to identify web servers that have Directory Listing enabled. (Directory Listing displays the contents of a directory when it has no index page.)
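When auditing your own web server, you can check a fetched page directly for the same “Index of” signature. The sketch below assumes the page HTML is already in a string and relies on a simple title heuristic. `looks_like_directory_listing` is a hypothetical helper. Servers that format their listings differently will not be caught.

```python
import re

# Auto-generated listings on servers such as Apache and nginx typically use a
# title like "Index of /path", the same string the intitle: query looks for.
TITLE_RE = re.compile(r"<title>\s*(.*?)\s*</title>", re.IGNORECASE | re.DOTALL)


def looks_like_directory_listing(html: str) -> bool:
    """Return True if the page title starts with 'Index of' (a heuristic)."""
    match = TITLE_RE.search(html)
    return bool(match) and match.group(1).lower().startswith("index of")
```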
- Site operator:
The query [site:www.site.com] narrows a search to a particular site, domain or sub-domain.
E.g. 1: The query [news site:yahoo.com] will search for the keyword “news” on the site and the sub-domains of Yahoo.com.
E.g. 2: Google Hacking – Information gathering on sub domains
The query [site:yahoo.com] will display indexed pages from yahoo.com and its sub-domains, which reveals many of the target's sub-domains. This operator is useful for gathering information on the sub-domains of a specific target site.
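Once you have a set of result URLs for a [site:] query, you can pull out the distinct sub-domains with Python's standard `urllib.parse`. The URLs might be copied from the results page or returned by a search API. This is a sketch, and `subdomains_of` is a hypothetical helper. It only finds hostnames that actually appear in the results.

```python
from urllib.parse import urlparse


def subdomains_of(urls, domain):
    """Collect the distinct hostnames under `domain` from a list of result URLs."""
    domain = domain.lower().lstrip(".")
    found = set()
    for url in urls:
        host = urlparse(url).hostname or ""  # .hostname is returned lowercased
        # Accept the domain itself or a true sub-domain (not e.g. "notyahoo.com").
        if host == domain or host.endswith("." + domain):
            found.add(host)
    return sorted(found)
```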
- Inurl operator:
The query [inurl:keyword] in the search engine will return pages containing the keyword in the URL.
E.g. 1 – The query [inurl:contactus site:www.MySite.com] will search MySite for pages whose URL contains the word “contactus”.
E.g. 2 – Google Hacking – Looking for Admin Portals
The query [inurl:admin.php] will search for websites that might expose admin login pages. Such pages attract attackers, who may try to brute-force the login page to gain access to the admin interface.
- Cache operator:
Google keeps a snapshot of each page it crawls. The query [cache:URL] displays Google’s cached version of that page.
E.g. – The query [cache:www.yahoo.com] will display Google’s cached copy of Yahoo.com. This directive is useful for gathering information from previously cached versions of a page.
Another very useful website for obtaining cached pages is http://archive.org/ (the Wayback Machine).
It stores snapshots of websites in a calendar format, so you can view a site's pages as they appeared on a previous date. The screenshot below displays a cached page of Yahoo.com dated 9 Feb 2010.
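The Internet Archive also offers a Wayback Machine availability endpoint (archive.org/wayback/available). It returns the snapshot closest to a given date as JSON. The sketch below only builds the request URL and parses a response. The helper names are hypothetical. The response layout (`archived_snapshots.closest`) is assumed from the endpoint's documented format, so verify it before relying on it.

```python
import json
from urllib.parse import urlencode

# Internet Archive Wayback Machine availability endpoint.
WAYBACK_API = "https://archive.org/wayback/available"


def wayback_request_url(site: str, timestamp: str) -> str:
    """Build a request for the snapshot closest to `timestamp` (YYYYMMDD...)."""
    return WAYBACK_API + "?" + urlencode({"url": site, "timestamp": timestamp})


def closest_snapshot(response_text: str):
    """Return the URL of the closest archived snapshot, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

To fetch the response, pass the URL from `wayback_request_url("yahoo.com", "20100209")` to `urllib.request.urlopen`, decode the body, and then hand it to `closest_snapshot`.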
- Filetype operator:
The query [filetype:extension] searches for files with a particular file extension. Google can search for many different file types, such as pdf, doc, rtf, ppt, xls, etc.
E.g. The query [filetype:pdf site:yahoo.com] will return all the links to pdf files found on Yahoo.com.
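Every operator follows the same operator:search_term pattern, so you can generate queries like the ones above instead of typing them. A minimal sketch, assuming a hypothetical `dork` helper that quotes multi-word values so the operator covers the whole phrase:

```python
def dork(*terms: str, **operators: str) -> str:
    """Join free-text terms with operator:value pairs such as site: or filetype:.

    Values containing spaces are quoted so the operator applies to the whole phrase.
    Keyword-argument order is preserved (Python 3.7+), so the output is predictable.
    """
    parts = list(terms)
    for name, value in operators.items():
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    return " ".join(parts)


print(dork(filetype="pdf", site="yahoo.com"))  # filetype:pdf site:yahoo.com
print(dork(intitle="Index of"))                # intitle:"Index of"
```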
Google Hacking through keyword search:
Let’s look at some of the keyword searches and the operators that can be used to build search queries to carry out Google Hacking.
- Digging Google for Configuration Files:
Configuration files store the initial settings for computer programs. An attacker with access to a configuration file can gain a detailed understanding of how the program is deployed.
For example, a Google query like [filetype:ini inurl:ws_ftp.ini] would retrieve the configuration file used by the WS_FTP client program, as shown in the screenshot below:
- Digging Google for Log Files:
Web servers record information such as IP addresses, timestamps, HTTP requests, usernames and passwords in log files. These log files usually have the extension .log on the server side and may be accessible over the Internet due to inadequate protection.
For example, a Google query like [filetype:log cron.log] would retrieve the UNIX cron log, as shown in the screenshot below:
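Defenders should check their own access logs for credentials passed in URLs before those files ever become reachable. Here is a sketch, assuming Apache-style request lines and an illustrative, non-exhaustive list of parameter names. It reports only the parameter name, so the secret itself is not echoed again.

```python
import re

# Query-string parameters that often carry credentials (an assumed, partial list).
SENSITIVE_PARAM = re.compile(r'[?&](user(?:name)?|pass(?:word)?|pwd|token)=[^&\s"]+', re.IGNORECASE)


def find_leaked_params(log_lines):
    """Yield (line_number, parameter_name) for request lines exposing sensitive parameters."""
    for number, line in enumerate(log_lines, start=1):
        for match in SENSITIVE_PARAM.finditer(line):
            yield number, match.group(1).lower()
```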
- Digging Google for database leakage information from web applications:
Google hackers search Google for pieces of database information leaked from vulnerable servers. This information can be used to identify a vulnerable target and launch a more sophisticated attack against it.
For example, a Google query like [filetype:inc intext:mysql_connect] will retrieve .inc files that contain MySQL user credentials and other details used to connect to the database.
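You can run a similar check over your own code base to find connection credentials hard-coded in include files. This sketch assumes the legacy PHP mysql_connect() call (removed in PHP 7) with string-literal arguments. The regular expression and `hardcoded_mysql_users` are illustrative. They will miss credentials passed through variables or constants.

```python
import re

# mysql_connect("host", "user", "password") with all three arguments as string
# literals; group 1 captures the user name (the password is deliberately not kept).
MYSQL_CONNECT = re.compile(
    r"""mysql_connect\s*\(\s*['"][^'"]*['"]\s*,\s*['"]([^'"]*)['"]\s*,\s*['"][^'"]*['"]""",
    re.IGNORECASE,
)


def hardcoded_mysql_users(source: str):
    """Return the user names passed as string literals to mysql_connect()."""
    return [m.group(1) for m in MYSQL_CONNECT.finditer(source)]
```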
- Digging Google for leakage of information through error messages:
Information leaked through error messages is very useful for gathering information and launching further attacks against websites. If an application lacks exception/error handling mechanisms, its error messages might leak sensitive details such as database details and stack traces.
E.g. a Google query like [intitle:“Apache Tomcat” “Error Report”] will display search results containing Apache Tomcat error messages.
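To test whether one of your own pages leaks this kind of detail, you can look for tell-tale strings in the HTTP response body. The signature list below is an illustrative assumption, not a complete catalogue. It covers a Tomcat error report, a Java stack trace line, and a MySQL syntax error. Matching is case-insensitive.

```python
# Lowercased substrings that commonly appear in verbose error pages (illustrative only).
ERROR_SIGNATURES = {
    "tomcat_error_report": ("apache tomcat", "error report"),
    "java_stack_trace": ("exception", "\tat "),
    "mysql_error": ("you have an error in your sql syntax",),
}


def leaked_error_types(body: str):
    """Return the names of signatures whose substrings all occur in `body`."""
    lowered = body.lower()
    return sorted(
        name
        for name, needles in ERROR_SIGNATURES.items()
        if all(needle in lowered for needle in needles)
    )
```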
We have briefly discussed the directives that can be used to carry out search engine hacking. Manually trying out each of these directives can be a cumbersome task, so we use automated tools to automate search engine hacking and retrieve juicy information.
Automated tools available for Google Hacking:
- Gooscan – Gooscan is a tool that automates queries against Google search appliances, but with a twist: these particular queries are designed to find potential vulnerabilities on web pages.
Ref:
http://www.securitytube-tools.net/index.php@title=Gooscan.html
- SiteDigger – SiteDigger searches Google’s cache to look for vulnerabilities, errors, configuration issues, proprietary information, and interesting security nuggets on web sites.
Ref:
http://www.mcafee.com/in/downloads/free-tools/sitedigger.aspx
- Wikto – This is a multipurpose tool developed by SensePost which can be used for automating Google Hacking.
The tools above are useful for Google Hacking. However, let’s look at a new tool called Search Diggity, which provides a graphical user interface and is useful for retrieving a lot of information from both the Bing and Google search engines.
Search Diggity:
It is Stach & Liu’s MS Windows GUI application that serves as a front-end to the most recent versions of the Diggity tools:
- GoogleDiggity
- BingDiggity
- Bing LinkFromDomainDiggity
- CodeSearchDiggity
- DLPDiggity
- FlashDiggity
- MalwareDiggity
- PortScanDiggity
- SHODANDiggity
- BingBinaryMalwareSearch
- NotInMyBackYard Diggity
More information on these modules can be found here:
http://www.stachliu.com/resources/tools/google-hacking-diggity-project/attack-tools/
Let’s explore a few of the key modules above to learn about the art of search engine hacking.
GoogleDiggity:
The Google Diggity tool automates the Google Hacking process. It queries the search engine using the Google JSON/ATOM Custom Search API to identify vulnerabilities and information disclosures.
The Google search engine uses bot detection, so Google blocks automated Google hacking tools that query it directly. This is overcome with the Google JSON/ATOM Custom Search API, which uses an API key. A user can register for an API key with a valid Gmail account and get 100 free requests per day. Additional queries are available at a cost (Google charges $5 per 1000 queries).
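Here is a rough sketch of what such a tool does behind the GUI. The Python below builds a Custom Search JSON API request, extracts result links, and estimates the daily cost from the figures quoted above. The endpoint and its `key`, `cx` (search engine ID) and `q` parameters belong to Google's API. Note that the API needs a search engine ID as well as the key. The helper names are hypothetical, and you should check current quotas and prices in Google's documentation.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"
FREE_QUERIES_PER_DAY = 100     # free tier quoted in the text
PRICE_PER_1000_QUERIES = 5.00  # USD, as quoted in the text


def search_url(api_key: str, engine_id: str, query: str) -> str:
    """Build a Custom Search JSON API request URL; `cx` is the search engine ID."""
    return API_ENDPOINT + "?" + urlencode({"key": api_key, "cx": engine_id, "q": query})


def result_links(response_text: str):
    """Extract result URLs from a JSON response ('items' is absent when nothing matched)."""
    return [item["link"] for item in json.loads(response_text).get("items", [])]


def daily_cost(queries: int) -> float:
    """Estimated cost in USD for one day's queries beyond the free tier."""
    billable = max(0, queries - FREE_QUERIES_PER_DAY)
    return billable * PRICE_PER_1000_QUERIES / 1000
```

For example, `daily_cost(1100)` bills the 1000 queries beyond the free 100 and comes to 5.0.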
The tool provides a well-structured interface that allows the user to:
- Select the search queries from the list
- Feed the API key
- Specify the target site/domain/IP address
- Start the scan with the Scan button, etc.
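The workflow above can be sketched as a small loop: pick queries, scope them to a target, and run them within the daily quota. `run_scan` is hypothetical. The search call is passed in as a function, so you can try the loop offline and point it at any search API wrapper later. The default limit of 100 mirrors the free daily quota mentioned earlier.

```python
from typing import Callable, Dict, List


def run_scan(queries: List[str], target: str,
             fetch: Callable[[str], List[str]],
             daily_limit: int = 100) -> Dict[str, List[str]]:
    """Scope each selected query to `target` with site: and collect result links.

    `fetch` takes a query string and returns result URLs; it is injected so
    the loop does not depend on any particular search API. At most
    `daily_limit` queries are sent.
    """
    results = {}
    for query in queries[:daily_limit]:
        scoped = f"{query} site:{target}"
        results[scoped] = fetch(scoped)
    return results
```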
Bing Diggity:
Similar to GoogleDiggity, Bing Diggity is a Bing search engine hacking tool. It utilizes the Bing 2.0 API (which allows 1000 results per query) and Stach & Liu’s newly developed Bing Hacking Database (BHDB). Together, these find vulnerabilities and sensitive information disclosures related to your organization that are exposed via Microsoft’s Bing search engine.
The tool provides a well-structured interface that allows the user to:
- Select the search queries specific to the Bing search engine from the list
- Feed the API key
- Specify the target site/domain/IP address
- Start the scan with the Scan button, etc.
DLPDiggity:
DLPDiggity is a data loss prevention tool that leverages Google/Bing to identify exposures of sensitive information (e.g. SSNs, credit card numbers, etc.) via common document formats such as .doc, .xls, and .pdf. First, GoogleDiggity and BingDiggity are used to locate and download files belonging to target domains/sites on the Internet. Then, DLPDiggity is used to analyze those downloaded files for sensitive information disclosures.
DLPDiggity utilizes IFilters (an IFilter is a plugin that allows the Windows Indexing Service and the newer Windows Desktop Search to index different file formats so that they become searchable) to search through the actual contents of files, as opposed to just the metadata. Using .NET regular expressions, DLPDiggity can find almost any type of sensitive data within common document file formats.
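The regular-expression side of DLP scanning can be illustrated in Python. The patterns below are assumptions for illustration: US-style SSNs and card numbers of 13 to 16 digits. They are written for Python's `re` module rather than .NET. Candidate card numbers are filtered with the Luhn checksum to cut down false positives.

```python
import re

# Illustrative patterns: SSN as 123-45-6789, and 13-16 digit card numbers
# optionally separated by single spaces or hyphens.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to discard digit runs that cannot be card numbers."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    for i, digit in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_sensitive(text: str):
    """Return {'ssn': [...], 'card': [...]} matches found in extracted document text."""
    cards = [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
    return {"ssn": SSN_RE.findall(text), "card": cards}
```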
Over the last few years, there has been a tremendous increase in the volume of office documents that have been indexed and made searchable by Google and Bing. DLPDiggity taps into that in order to find documents containing sensitive information.
The tool provides a well-structured interface that allows the user to:
- Select the DLPDiggity search queries from the list, which are used to query the Google/Bing search engines for documents.
- Select the regular expressions that will be used to search the documents in the target directory for leaks of sensitive information such as SSNs and credit card numbers.
- Analyze the documents with the Search button
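The second step is sweeping a directory of downloaded files with the selected expressions, which might look like this. It is a sketch under stated assumptions. `scan_directory` and the two patterns are hypothetical. It reads only plain-text files, since binary formats such as .doc and .pdf need a text extractor first (the job IFilters do on Windows).

```python
import re
from pathlib import Path

# Hypothetical pattern set; in DLPDiggity the user selects these from a list.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),
}


def scan_directory(root, patterns=PATTERNS, suffixes=(".txt", ".csv")):
    """Map each matching file (relative to `root`) to the pattern names that hit.

    Only plain-text files with the given suffixes are read.
    """
    root = Path(root)
    report = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            hits = sorted(name for name, rx in patterns.items() if rx.search(text))
            if hits:
                report[path.relative_to(root).as_posix()] = hits
    return report
```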
FlashDiggity:
FlashDiggity automates Google searching/downloading/decompiling/analysis of SWF files to identify Flash vulnerabilities and information disclosures.
FlashDiggity first leverages the GoogleDiggity tool in order to identify Adobe Flash SWF applications for target domains via Google searches, such as ext:swf. Next, the tool is used to download all of the SWF files in bulk for analysis. The SWF files are disassembled back to their original ActionScript source code, and then analyzed for code-based vulnerabilities.
The tool provides a well-structured interface that allows the user to:
- Select the FlashDiggity search queries from the list, which are used to query the Google search engine for SWF files
- Select the regular expressions that will be used to search through the ActionScript of decompiled SWF Flash files for code-based vulnerabilities and information disclosures.
- Decompile and analyze the SWF files with the Search button
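After decompilation, the analysis step is essentially pattern matching over ActionScript source. The sketch assumes an external decompiler has already turned the SWF into text. The rule names and patterns are illustrative choices: wildcard `Security.allowDomain`, URL-navigation and `ExternalInterface` calls, and hard-coded passwords. A hit only marks a line for manual review, not a confirmed vulnerability.

```python
import re

# Illustrative ActionScript patterns that usually deserve a closer look.
RULES = {
    "wildcard_allowDomain": re.compile(r"Security\.allowDomain\(\s*['\"]\*['\"]\s*\)"),
    "getURL_call": re.compile(r"\bgetURL\s*\("),
    "navigateToURL_call": re.compile(r"\bnavigateToURL\s*\("),
    "ExternalInterface_call": re.compile(r"\bExternalInterface\.call\s*\("),
    # password = "x", password: "x" or a typed declaration password:String = "x"
    "hardcoded_password": re.compile(
        r"password\w*\s*(?::\s*\w+\s*)?[:=]\s*['\"][^'\"]+['\"]", re.IGNORECASE
    ),
}


def audit_actionscript(source: str):
    """Return (line_number, rule_name) pairs for decompiled source lines that hit a rule."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for name, rx in RULES.items():
            if rx.search(line):
                findings.append((number, name))
    return findings
```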