ASP.NET SEO: SEO Hack --- HTML Injection and Nofollow

Black hat SEO mainly refers to search engine optimization carried out by "not very ethical" means (let's call it that for now!).

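1. Injection attacks, including SQL injection and HTML injection. I often see discussions about defending against SQL injection, but many people do not take HTML injection seriously enough. To show what HTML injection does, we mimic a common guestbook feature.
First, add two attributes to the page directive: EnableEventValidation="false" ValidateRequest="false". This is essential; readers can try leaving them out to see what happens.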

 <%@ Page Language="C#" AutoEventWireup="true"  CodeFile="Default.aspx.cs" Inherits="_Default" EnableEventValidation="false" ValidateRequest="false" %>

Then the page markup and the code-behind are as follows:


        <asp:TextBox ID="txtInput" runat="server" Height="95px" Width="405px" TextMode="MultiLine"></asp:TextBox>
        <asp:Button ID="btnSubmit" runat="server" Text="Simple Submit"
            onclick="btnSubmit_Click" />
        <asp:Label ID="lblShow" runat="server"></asp:Label>

    protected void btnSubmit_Click(object sender, EventArgs e)
    {
        this.lblShow.Text = this.txtInput.Text;
    }

The program is simple: it just displays whatever the user typed. Run it, enter our malicious code, and submit.

 <p>Sanitizing <img src="INVALID-IMAGE" onerror='location.href="http://too.much.spam/"'>!</p>

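The page automatically redirects to http://too.much.spam/! This is so-called "HTML injection". Once the page is rendered to the client, the browser parses it as an ordinary HTML page; when it reaches the img tag above, the image fails to load and the JavaScript in the onerror handler runs.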

To prevent this kind of attack, the simplest approach in ASP.NET is to HTML-encode the input. Change the code-behind to:

    protected void btnSubmit_Click(object sender, EventArgs e)
    {
        this.lblShow.Text = this.Server.HtmlEncode(this.txtInput.Text);
    }

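Now when we run the code again, the source is displayed on the page verbatim and is not executed. Why? Look at the source of the output page:
<span id="lblShow">&lt;p&gt;Sanitizing &lt;img src=&quot;INVALID-IMAGE&quot; onerror='location.href=&quot;http://too.much.spam/&quot;'&gt;!&lt;/p&gt;</span>
From this we can read off the following mappings:
< -- &lt; (less than)
> -- &gt; (greater than)
" -- &quot; (quotation mark)
So the JavaScript cannot execute, yet on the page we still see the script text exactly as it was typed.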
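
The mapping can also be checked outside a page. The following is a minimal console sketch, assuming System.Net.WebUtility.HtmlEncode as a stand-in for Server.HtmlEncode; the class name EncodeDemo and the sample input are made up for illustration.

```csharp
using System;
using System.Net;

public static class EncodeDemo
{
    // Returns the HTML-encoded form of untrusted input.
    // Assumption: WebUtility.HtmlEncode is used here because it does not
    // need a Page or HttpContext, unlike Server.HtmlEncode.
    public static string Encode(string input)
    {
        return WebUtility.HtmlEncode(input);
    }

    public static void Main()
    {
        string input = "<img src=\"x\" onerror=\"alert(1)\">";

        // prints: &lt;img src=&quot;x&quot; onerror=&quot;alert(1)&quot;&gt;
        Console.WriteLine(Encode(input));
    }
}
```

Because the angle brackets and quotation marks become entities, the browser renders the text instead of building an img element from it.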

But that is not the end of the story. In the real world, input may contain not only malicious code but also content like this:

 <span style=" color:blue">Black hat</span> SEO mainly refers to search engine optimization carried out by <span style=" color:blue">"not very ethical"</span> means (let's call it that for now!).

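We want the text shown in blue, but after encoding that obviously no longer works. So we need more precise filtering. This is also the practical reason why we set EnableEventValidation="false" ValidateRequest="false" earlier.
The first approach I thought of was to encode the whole content and then restore the HTML tags we allow. That is quite safe, but I ran into many problems when implementing it, which was frustrating. (If anyone has working code for this approach, please share it.)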
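
For the simplest case, tags without attributes, the encode-then-restore idea can be sketched as follows. This is only an illustration under stated assumptions: the class name EncodeThenRestore and the tag list are hypothetical, WebUtility.HtmlEncode stands in for Server.HtmlEncode, and tags that carry attributes (such as the span with a style above) are deliberately not restored, since validating attributes is where this approach gets hard.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class EncodeThenRestore
{
    // Hypothetical whitelist: only attribute-free tags are restored.
    private const string AllowedTags = "b|i|p|ul|li|pre|blockquote";

    public static string Clean(string input)
    {
        // Step 1: encode everything, so no markup survives.
        string encoded = WebUtility.HtmlEncode(input);

        // Step 2: turn exact matches such as &lt;b&gt; or &lt;/b&gt;
        // back into real tags; anything else stays encoded.
        return Regex.Replace(encoded,
                             "&lt;(/?)(" + AllowedTags + ")&gt;",
                             "<$1$2>",
                             RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // prints: <b>bold</b> &lt;script&gt;alert(1)&lt;/script&gt;
        Console.WriteLine(Clean("<b>bold</b> <script>alert(1)</script>"));
    }
}
```

The appeal of this direction is that it fails closed: a tag the pattern does not recognize simply stays encoded and is displayed as text.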
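Let me first describe another approach:
First, extract the tags, such as <span style=" color:blue">, </span> and <script >; replacement is limited to the content between the < and > of a tag.
Then take every tag name, attribute name and attribute value, and replace anything that is forbidden. Malicious code can appear in the following forms:
Tag name: <script </script
Attribute in a tag: <span onclick
Attribute value: <img onerror="javascript:"
Finally, we replace all the "malicious words":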


using System;
using System.Text.RegularExpressions;

/// <summary>
/// Sanitize contains functionality to remove unaccepted tags or attributes
/// </summary>
public static class Sanitize
{
// list of accepted/harmless tags (in lower case)
private static string[] allowedTags =
    { "p", "h1", "b", "i", "a", "ul", "li", "pre", "hr", "blockquote", "img" };

// list of attributes that need to be sanitized
private static string badAttributes =
    "onerror|onmousemove|onmouseout|onmouseover|" +
     "onkeypress|onkeydown|onkeyup|javascript:";

// sanitizes the HTML code in $inputHTML
public static string FixTags(string inputHtml)
{
    // define the match evaluator
    // MatchEvaluator is a delegate; here it invokes the fixTag method
    MatchEvaluator fixThisLink = new MatchEvaluator(Sanitize.fixTag);

    // process each tag in the input string
    string fixedHtml = Regex.Replace(inputHtml,         // the string to process
                                     "(<.*?>)",         // regex: note the "?", which makes ".*" non-greedy (lazy)
                                     fixThisLink,       // delegate instance passed as an argument
                                     RegexOptions.IgnoreCase);
    // i.e. every part of inputHtml that matches "(<.*?>)" (a tag wrapped in < >) is processed by fixThisLink

    // return the "fixed" input string
    return fixedHtml;
}

// remove tag if is not in the list of allowed tags
private static string fixTag(Match tagMatch)
{
    string tag = tagMatch.Value;

    // extract the tag name, such as "a" or "h1"
    Match m = Regex.Match(tag,
                          @"</?(?<tagName>[^\s/]*)[>\s/]",
                          RegexOptions.IgnoreCase);
    string tagName = m.Groups["tagName"].Value.ToLower();

    // if the tag isn't in the list of allowed tags, it should be removed
    if (Array.IndexOf(allowedTags, tagName) < 0)
    {
      return "";
    }

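    // remove bad attributes from the tag
    string fixedTag = Regex.Replace(tag,
                        "(" + Sanitize.badAttributes + @")(\s*)(?==)",    // note "(?==)": a positive lookahead for "="
                        "SANITIZED", RegexOptions.IgnoreCase);

    // "javascript:" occurs inside attribute values, so it is never followed by "=";
    // neutralize it with a separate replacement
    fixedTag = Regex.Replace(fixedTag, "javascript:", "SANITIZED", RegexOptions.IgnoreCase);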

    // return the altered tag
    return fixedTag;
}
}

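Note the two advanced regular expression features used in the code: non-greedy (lazy) matching and positive lookahead; see a regular expression reference for details.
Here we can see how powerful regular expressions are: a superb tool for string manipulation!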
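
The two regex features can be seen in isolation in the small sketch below. The class name RegexDemo, its method names and the sample strings are made up for illustration; only the patterns mirror the ones used in Sanitize.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexDemo
{
    // Greedy: ".*" grabs as much as possible, up to the LAST ">".
    public static string FirstGreedy(string s)
    {
        return Regex.Match(s, "<.*>").Value;
    }

    // Lazy: ".*?" grabs as little as possible, up to the FIRST ">".
    public static string FirstLazy(string s)
    {
        return Regex.Match(s, "<.*?>").Value;
    }

    // Positive lookahead: "(?==)" requires an "=" to follow but does not
    // consume it, so the "=" survives the replacement.
    public static string NeutralizeOnError(string tag)
    {
        return Regex.Replace(tag, @"onerror\s*(?==)", "SANITIZED",
                             RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(FirstGreedy("<b>bold</b> text"));   // prints: <b>bold</b>
        Console.WriteLine(FirstLazy("<b>bold</b> text"));     // prints: <b>

        // prints: <img SANITIZED='x()' alt="onerror">
        Console.WriteLine(NeutralizeOnError("<img onerror ='x()' alt=\"onerror\">"));
    }
}
```

The last call shows why the lookahead matters: the word onerror inside the alt value is left alone because no "=" follows it.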

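2. Besides injection attacks, another technique you must use is nofollow. Because of Google's link-value algorithm, we all want high-value links pointing to our site to raise its ranking. A simple way is to open a blog on another site (such as Sina) and add a link in it that points to our own site. But if we are Sina, we certainly do not want others doing this (after all, we do not know whether the sites they link to are good or bad, and a junk site would drag us down with it). On the other hand, we do not want to ban links altogether (for example by simply encoding them so that they stop working), because many links may just be internal links, and a directly clickable link gives a better user experience.
To solve this problem, Google proposed adding the keyword nofollow to the link, as shown below:
<a rel="nofollow" href="http://too.much.spam">cool link</a>
This way the link is still clickable but carries no link value, that is, Google will not treat it as your endorsement or recommendation of the target site. Let's see whether cnblogs (博客园) does this ... heh, apparently not, how generous. It is also said that Google will gradually reduce the weight of link value, but that is a rumor, so never mind.
Here is the code: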


using System;
using System.Text.RegularExpressions;

/// <summary>
/// NoFollow contains the functionality to add rel=nofollow to untrusted links
/// </summary>
public static class NoFollow
{
// the white list of domains (in lower case)
private static string[] whitelist =
     { "seoasp", "www.seoegghead.com", "www.cristiandarie.ro" };

// finds all the links in the input string and processes them using fixLink
public static string FixLinks(string input)
{
    // define the match evaluator
    MatchEvaluator fixThisLink = new MatchEvaluator(NoFollow.fixLink);

    // fix the links in the input string
    string fixedInput = Regex.Replace(input,
                                      "(<a.*?>)",
                                      fixThisLink,
                                      RegexOptions.IgnoreCase);

    // return the "fixed" input string
    return fixedInput;
}

// receives a Regex match that contains a link such as
// <a href="http://too.much.spam/"> and adds rel=nofollow if needed
private static string fixLink(Match linkMatch)
{
    // retrieve the link from the received Match
    string singleLink = linkMatch.Value;

    // if the link already has rel=nofollow, return it back as it is
    if (Regex.IsMatch(singleLink,
                      @"rel\s*?=\s*?['""]?.*?nofollow.*?['""]?",
                      RegexOptions.IgnoreCase))
    {
      return singleLink;
    }

    // use a named group to extract the URL from the link
    Match m = Regex.Match(singleLink,
                          @"href\s*?=\s*?['""]?(?<url>[^'""]*)['""]?",
                          RegexOptions.IgnoreCase);
    string url = m.Groups["url"].Value;

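    // if the URL isn't an absolute http:// or https:// URL, assume it's a local link
    // (TryCreate also avoids the exception that new Uri(url) throws on a malformed URL)
    Uri uri;
    if (!Uri.TryCreate(url, UriKind.Absolute, out uri) ||
        (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps))
    {
      return singleLink;
    }

    // extract the host name (such as www.cristiandarie.ro) from the URL
    string host = uri.Host.ToLower();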

    // if the host is in the whitelist, don't alter it
    if (Array.IndexOf(whitelist, host) >= 0)
    {
      return singleLink;
    }

    // if the URL already has a rel attribute, change its value to nofollow
    string newLink = Regex.Replace(singleLink,
             @"(?<a>rel\s*=\s*(?<b>['""]?))((?<c>[^'""\s]*|[^'""]*))(?<d>['""]?)?",
             "${a}nofollow${d}",
             RegexOptions.IgnoreCase);

    // if the string had a rel attribute that we changed, return the new link
    if (newLink != singleLink)
    {
      return newLink;
    }

    // if we reached this point, we need to add rel=nofollow to our link
    newLink = Regex.Replace(singleLink, "<a", @"<a rel=""nofollow""",
                            RegexOptions.IgnoreCase);
    return newLink;
}
}
