Did you ever wish you had more control over how your content is indexed and presented as search results? In SharePoint 2013, you have the option to add a callout from the content processing component to your own web service so you can manipulate the content before it gets indexed. This makes it possible to create new ways to pivot over the search results. Here are some examples of what you could do:

  • Normalize the case for refinable managed properties to make the refiners look cleaner.

  • Create new refiners by extracting data from unstructured text.

  • Calculate new refiners based on managed property values.

This feature is similar to the pipeline extensibility stage in FAST Search for SharePoint 2010, but there are some differences:

  • In FAST Search for SharePoint 2010, the code was executed in a sandbox, whereas SharePoint 2013 lets you put the code in a web service. This reduces the overhead and lets you run the code anywhere.
  • The web service processes and returns managed properties, not crawled properties. Managed properties correspond to what actually gets indexed and are therefore easier to manage.
  • You can define a trigger to limit the set of items that are processed by the web service. This will optimize the overall performance if you only want to process a subset of the content.

In this blog post, we’ll go through a basic scenario where we have a list of popular movies. The metadata is not consistent with regard to casing, and there is limited metadata to use for pivoting over the search results.

We will walk you through how you can go from this search experience …

… to the one below, without modifying the original content. Notice that the list Title and the Director refiner have been normalized to title case, and that there is now a new refiner called “YearsSinceRelease”.

To understand what’s possible to achieve with the content enrichment web service, it is important to have a good conceptual understanding of what goes on during content processing, and where the web service callout takes place in relation to everything else.

This blog post goes into detail on what happens during content processing and how the web service callout fits into the overall picture. We’ll then show you how to create a sample list and create a web service that manipulates the data in the list to create new search refiners.

What happens during content processing

The content processing component receives crawled properties from the crawler component and outputs managed properties to the index component, but what goes on in between?

Inside the content processing component, there are “flows” that process one crawled item at a time. When an item has been indexed, a callback is sent back to the crawler to acknowledge whether the item is searchable or not. The success or failure of an item can then be inspected by the administrator in the Search Administration (navigate to Crawl Log and then to Error Breakdown).

The flow is the specification for how the crawled item should be processed to prepare it for indexing. The flow has branches that handle different operations, like inserts, deletes, and partial updates. The branches that handle deletions and partial updates do not have a web service callout.

The main branch of the flow handles insertion of new and updated documents and contains different stages that extract information from the crawled item and create managed properties.

At the start of the flow, new crawled properties are registered in the Search administration database. This is followed by a stage that parses binary document formats such as Office or PDF documents. During document parsing, there is a callout to IFilters for certain document types. Document parsing adds new crawled properties from the parsers. A crawled property cannot be mapped to a managed property until it has been registered in the admin database. After all crawled properties have been collected, another stage maps crawled properties to managed properties according to the Search schema. All stages after this one work on managed properties only.

The next stage processes security descriptors and converts them to the internal format used by the index component. Automatic language detection takes place before the web service callout, and sets the value of the languages managed property. After the web service callout, there is processing related to people search that creates phonetic name variations.

Next, word breaking is done on all managed properties that are marked for word breaking in the Search schema. This is followed by entity extraction and other stages, like metadata extraction and document summarization. Links that are discovered within the document are written to the Analytics reporting database for later processing. Finally, the managed properties are indexed.

To summarize, it is important to note that the web service callout can only read managed properties. Any crawled property value that the web service needs as input must first be mapped to a managed property. The web service callout can only access managed properties that exist before the callout takes place, not managed properties that are set further down in the flow. The web service callout can pass managed properties back to the flow, but only if they are part of the Search schema.
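The ordering rule can be pictured with a small, self-contained PowerShell sketch. The stage names are our own shorthand for the stages described above (abbreviated, and not SharePoint identifiers), and Test-VisibleToCallout is a hypothetical helper used only for illustration:

```powershell
# Shorthand names for the stages described above, in flow order.
# These are illustrative labels, not identifiers used by SharePoint.
$stages = @(
    'RegisterCrawledProperties',
    'DocumentParsing',
    'MapCrawledToManagedProperties',
    'SecurityDescriptors',
    'LanguageDetection',
    'WebServiceCallout',
    'PeopleSearchPhonetics',
    'WordBreaking',
    'EntityExtraction',
    'Indexing'
)

# Hypothetical helper: output from a stage is visible to the callout
# only if that stage runs earlier in the flow than the callout itself.
function Test-VisibleToCallout {
    param([string]$Stage)
    $calloutIndex = [array]::IndexOf($stages, 'WebServiceCallout')
    $stageIndex = [array]::IndexOf($stages, $Stage)
    return ($stageIndex -ge 0) -and ($stageIndex -lt $calloutIndex)
}

Test-VisibleToCallout 'LanguageDetection'   # True
Test-VisibleToCallout 'EntityExtraction'    # False
```

In other words, the languages managed property is available as input to the web service, while anything produced by entity extraction is not.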

The “Popular Movie” web service

Let’s go through a full scenario that covers everything from creating the content and the web service, to customizing the search experience with new refiners. The first thing we’ll do is to create a list of some all-time popular movies with information about the release year, director, and whether the film received an Oscar. We’ll then create a web service that does a couple of things to improve the search experience:

  • Calculates the years since the movie was released by looking at the ReleaseYear managed property

  • Normalizes the Director managed property to title case

  • Creates a time stamp for when the web service processed the item
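Before writing the service, it can help to try the three transformations in the shell. The following is a minimal sketch, not the web service itself: Convert-MovieProperties is a hypothetical helper, the hashtable merely stands in for one crawled item, and we assume the invariant culture for title casing (the service shown later uses the current thread culture).

```powershell
# Hypothetical helper (not part of the web service): applies the three
# transformations to a hashtable that stands in for one crawled item.
function Convert-MovieProperties {
    param(
        [hashtable]$Item,
        [int]$CurrentYear = (Get-Date).Year
    )
    # Assumption: invariant culture, so the result does not depend on the machine locale.
    $textInfo = [System.Globalization.CultureInfo]::InvariantCulture.TextInfo

    $result = @{}
    # ToTitleCase leaves all-uppercase words untouched, so lower-case the input first.
    $result['Director'] = $textInfo.ToTitleCase($Item['Director'].ToLowerInvariant())
    # Years elapsed between the release year and the current year.
    $result['YearsSinceRelease'] = $CurrentYear - $Item['ReleaseYear']
    # Time stamp for when the item was processed.
    $result['ModifiedByWebService'] = Get-Date
    return $result
}

$movie = @{ Director = 'QUENTIN tarantino'; ReleaseYear = 1994 }
$enriched = Convert-MovieProperties -Item $movie -CurrentYear 2013
$enriched['Director']           # Quentin Tarantino
$enriched['YearsSinceRelease']  # 19
```

Passing the current year as a parameter keeps the calculation easy to check by hand; the real service reads it from the clock.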

These are the steps that we’ll walk you through:

  • Create a sample list

  • Create the web service

  • Configure crawled and managed properties

  • Crawl and search the content

Create a sample list

Open the SharePoint 2013 Management Shell as administrator. Make sure that the user you are logged on as is a member of the WSS_ADMIN_WPG group, because we need write access to SharePoint resources.

Type the following commands to create a new list:

$SPWeb = Get-SPWeb http://myserver 

$CustomListTemplate = $SPWeb.ListTemplates | where {$_.Name -eq "Custom List"}

$listUrl = "PopularMovies"

$description = "A list of popular movies"

$lists = $SPWeb.Lists

$lists.Add($listUrl, $description, $CustomListTemplate)

You can now browse to your list at http://myserver/lists/PopularMovies. Next, get a reference to the new list:

$list = $lists | where {$_.Title -eq "PopularMovies"}  

Below are the commands to create the different fields (columns) in the list.

$spFieldType = [Microsoft.SharePoint.SPFieldType]::Text
$list.Fields.Add("Director", $spFieldType, $false)

$spFieldType = [Microsoft.SharePoint.SPFieldType]::Integer
$list.Fields.Add("ReleaseYear", $spFieldType, $false)

$spFieldType = [Microsoft.SharePoint.SPFieldType]::Boolean
$list.Fields.Add("WonOscar", $spFieldType, $false)

$spFieldType = [Microsoft.SharePoint.SPFieldType]::DateTime
$list.Fields.Add("ReleaseDate", $spFieldType, $false)

$list.Update()

If you want the newly created fields to be part of the default list view, you can add them to the default view. This is an example of how we did it for the Director field.

$spView = $SPWeb.GetViewFromUrl("/Lists/PopularMovies/AllItems.aspx")

$spField = $list.Fields["Director"]

$spView.ViewFields.Add($spField)

$spView.Update()

Now we are ready to start populating the list. Each list item represents a movie, and we start off with “Pulp Fiction”.

$spListItem1 = $list.AddItem()

$spListItem1["Title"] = "Pulp Fiction"

$spListItem1["ReleaseYear"] = 1994

$spListItem1["Director"] = "Quentin Tarantino"

$date = Get-Date "10/21/1994"
$spListItem1["ReleaseDate"] = $date

$spListItem1.Update()

The screen shot below shows the list after we have added a few more movies:

The Title and the Director have been added with inconsistent casing to show how we can fix this automatically in the web service.
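Adding the remaining movies follows the same pattern. Here is a sketch that assumes the PopularMovies list from the previous steps exists on http://myserver; the two rows are illustrative sample data with deliberately inconsistent casing, and they are not necessarily the movies shown in the original screenshot.

```powershell
# Assumes the PopularMovies list created earlier exists on http://myserver.
$SPWeb = Get-SPWeb http://myserver
$list = $SPWeb.Lists["PopularMovies"]

# Illustrative rows only; the casing is inconsistent on purpose so that
# the web service has something to normalize.
$movies = @(
    @{ Title = "the godfather"; Director = "FRANCIS FORD COPPOLA"; ReleaseYear = 1972; ReleaseDate = [datetime]"1972-03-24" },
    @{ Title = "INCEPTION"; Director = "christopher nolan"; ReleaseYear = 2010; ReleaseDate = [datetime]"2010-07-16" }
)

foreach ($movie in $movies) {
    $item = $list.AddItem()
    # Copy every key of the hashtable into the list column of the same name.
    foreach ($field in $movie.Keys) {
        $item[$field] = $movie[$field]
    }
    $item.Update()
}

$SPWeb.Dispose()
```

Keeping the rows in a table of hashtables makes it easy to add more movies later without repeating the AddItem and Update calls.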

Create the web service

Now let’s create a web service that can read the list we created, and create some new managed properties that we can use as refiners.

using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.Office.Server.Search.ContentProcessingEnrichment;
using Microsoft.Office.Server.Search.ContentProcessingEnrichment.PropertyTypes;
using System.Globalization;
using System.Threading;

namespace PopularMovieService
{
    public class PopularMovieService : IContentProcessingEnrichmentService
    {
        // Define variables to hold the managed properties that the
        // web service will populate.
        private Property<Int64> NewIntegerMP = new Property<Int64>();
        private Property<DateTime> NewDateTimeMP = new Property<DateTime>();
        private readonly ProcessedItem processedItemHolder =
            new ProcessedItem { ItemProperties = new List<AbstractProperty>() };

        public ProcessedItem ProcessItem(Item item)
        {
            // Start from an empty result for every item.
            processedItemHolder.ItemProperties.Clear();

            // Iterate over all managed properties passed to the web service.
            foreach (var property in item.ItemProperties)
            {
                var s = property as Property<string>;
                if (s != null)
                {
                    // The value of the text managed property is the
                    // string in title case.
                    CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
                    TextInfo textInfo = cultureInfo.TextInfo;
                    string normalizedString = textInfo.ToTitleCase(s.Value.ToLower());
                    s.Value = normalizedString;
                    processedItemHolder.ItemProperties.Add(s);
                }

                var l = property as Property<Int64>;
                if (l != null)
                {
                    // The value of the new integer managed property is the
                    // number of years since the release year.
                    int CurrentYear = DateTime.Now.Year;
                    NewIntegerMP.Name = "YearsSinceRelease";
                    NewIntegerMP.Value = CurrentYear - l.Value;
                    processedItemHolder.ItemProperties.Add(NewIntegerMP);
                }
            }

            // Set the time for when the properties were added by the
            // web service. This is added once per item, after the loop.
            NewDateTimeMP.Name = "ModifiedByWebService";
            NewDateTimeMP.Value = DateTime.Now;
            processedItemHolder.ItemProperties.Add(NewDateTimeMP);

            return processedItemHolder;
        }
    }
}

Configure crawled and managed properties

When you create a list in SharePoint, the column names are picked up by the crawler as crawled properties. The name of the crawled property is the same as the list column name, but with “ows_” in front, so “ReleaseYear” becomes “ows_ReleaseYear”. There are several ways a new crawled property can be registered as part of the Search schema: by crawling or by adding it programmatically, for example, through a Windows PowerShell cmdlet. In this blog post, we stick to Windows PowerShell whenever possible.

$ssa = Get-SPEnterpriseSearchServiceApplication

$cp = New-SPEnterpriseSearchMetadataCrawledProperty -SearchApplication $ssa -Category SharePoint -Name "ows_ReleaseYear" -IsNameEnum $false -PropSet "00130329-0000-0130-c000-000000131346" -VariantType 0

$mp = New-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa -Name "ReleaseYear" -Type 2 -Queryable $True

$mp.Refinable = $True

$mp.Update()

New-SPEnterpriseSearchMetadataMapping -SearchApplication $ssa -ManagedProperty $mp -CrawledProperty $cp

We need to do the same for the other managed properties (Director, WonOscar, and ReleaseDate). For easy reference, these are the types to use when creating the managed property: Text = 1, Integer = 2, DateTime = 4, and Boolean (YesNo) = 5.
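The repetition can be folded into a loop. This is a sketch under a few assumptions: the farm has a single Search service application, and the remaining crawled properties can reuse the category, property set, and variant type from the ReleaseYear example above (adjust these if your schema differs).

```powershell
# Assumes a single Search service application in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Column name -> managed property type (Text = 1, DateTime = 4, YesNo = 5).
$columns = @{ Director = 1; WonOscar = 5; ReleaseDate = 4 }

foreach ($name in $columns.Keys) {
    # Assumption: same category, property set, and variant type as for ows_ReleaseYear.
    $cp = New-SPEnterpriseSearchMetadataCrawledProperty -SearchApplication $ssa -Category SharePoint -Name "ows_$name" -IsNameEnum $false -PropSet "00130329-0000-0130-c000-000000131346" -VariantType 0

    # Create the managed property and make it usable as a refiner.
    $mp = New-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa -Name $name -Type $columns[$name] -Queryable $true
    $mp.Refinable = $true
    $mp.Update()

    # Map the crawled property to the managed property.
    New-SPEnterpriseSearchMetadataMapping -SearchApplication $ssa -ManagedProperty $mp -CrawledProperty $cp
}
```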

We also need to create the new managed properties that the web service populates:

$mp = New-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa -Name "ModifiedByWebService" -Type 4 -Queryable $True

$mp.Refinable = $True

$mp.Update()

$mp = New-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa -Name "YearsSinceRelease" -Type 2 -Queryable $True

$mp.Refinable = $True

$mp.Update()

Crawl and search the content

Before starting the crawl, we must first enable and configure the web service callout. This is done in Windows PowerShell.

$config = New-SPEnterpriseSearchContentEnrichmentConfiguration

$config.Endpoint = "http://localhost:817/PopularMovieService.svc"

$config.InputProperties = "Director", "Title", "ReleaseYear"

$config.OutputProperties = "Director", "Title", "YearsSinceRelease", "ModifiedByWebService"

Set-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa -ContentEnrichmentConfiguration $config
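To check what was stored, you can read the configuration back. A short sketch, again assuming a single Search service application in the farm:

```powershell
# Assumes a single Search service application in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Show the content enrichment configuration currently stored for the
# Search service application (endpoint, input and output properties).
Get-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa

# To turn the callout off again, remove the configuration:
# Remove-SPEnterpriseSearchContentEnrichmentConfiguration -SearchApplication $ssa
```

The removal is left commented out so that running the snippet does not disable the callout by accident.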

We are now ready to kick off a full crawl of the list we created, and observe what gets sent to the web service.
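The crawl can also be started from the shell. The sketch below assumes that the list lives in a site covered by the default content source, which is named "Local SharePoint sites" in a default installation; change the name if your content source differs.

```powershell
# Assumes a single Search service application in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# "Local SharePoint sites" is the default content source name; adjust as needed.
$sourceName = "Local SharePoint sites"
$source = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity $sourceName
$source.StartFullCrawl()

# Poll until the crawl has finished. The status is re-read on every pass
# because the object returned earlier is a snapshot.
do {
    Start-Sleep -Seconds 10
    $source = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity $sourceName
} while ($source.CrawlStatus -ne "Idle")
```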

If you don’t already have an Enterprise Search Center, you can create one using the site collection template called “Enterprise Search Center” under the Enterprise tab.

To make it easier to see the effect of the web service, we’ve customized the search center to show refiners for the managed properties that the web service modifies.

To show the new refiners, we will edit the result template in the search site collection. First we do a query to see the result page with refiners. If we search for “Pulp Fiction”, this will show a refiner for Modified Date, which is one of the out-of-the-box refiners. Click the tools icon at the upper-right, and then click Edit Page.

Click Edit Web Part for the Refinement Web Part. This brings up a configuration panel, which lets you choose refiners for the Web Part. Add all the new managed properties as refiners.

We now need to check in and publish our changes.

If we now try out a query for “ReleaseYear>0”, we get back all three results with refiners, just like the screen shot at the beginning of the blog post.

Looking for more information?

Check out the MSDN documentation for content enrichment. There is also a how-to article that gives you an example of how to create a content enrichment web service in Visual Studio.

Finally, look out for our upcoming blog posts that will provide best practices for debugging the content enrichment service and show you how to use WCF Routing to do content-based routing and load balancing.

Original article:

http://blogs.msdn.com/b/sharepointdev/archive/2012/11/13/customize-the-sharepoint-2013-search-experience-with-a-content-enrichment-web-service.aspx
