In computer science, a skip list is a data structure that allows fast search within an ordered sequence of elements. Fast search is made possible by maintaining a linked hierarchy of subsequences, each skipping over fewer elements. Searching starts in the sparsest subsequence until two consecutive elements have been found, one smaller and one larger than the element searched for. Via the linked hierarchy these two elements link to elements of the next sparsest subsequence where searching is continued until finally we are searching in the full sequence. The elements that are skipped over may be chosen probabilistically.[2][3]

 

Properties:

  Consists of several levels.
  All keys appear in level 1
  Each level is a sorted list.
  If key x appears in level i, then it also appears in all levels below i
  An element in level i points (via down pointer) to the element with the same key in the level below.
  In each level the keys -∞ and +∞ appear as sentinels. (In our implementation, INT_MIN and INT_MAX.)
  Top points to the smallest element in the highest level.
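
The properties above can be sketched as a stack of sorted linked lists whose cells carry a right pointer and a down pointer. This is only an illustration under assumptions: the names Cell, build and find are hypothetical, keys are assumed to be finite numbers, and the heights are supplied by hand rather than chosen at random.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    key: float
    right: Optional["Cell"] = None   # next cell on the same level
    down: Optional["Cell"] = None    # cell with the same key one level below


def build(heights: Dict[int, int]) -> Cell:
    """Build every level from a non-empty {key: height} mapping; return `top`."""
    below: Dict[float, Cell] = {}    # cells of the level just built, by key
    top = None
    for level in range(1, max(heights.values()) + 1):
        present = sorted(k for k, h in heights.items() if h >= level)
        keys = [-math.inf] + present + [math.inf]   # sentinels on every level
        cells = [Cell(k, down=below.get(k)) for k in keys]
        for left, right in zip(cells, cells[1:]):
            left.right = right       # each level is a sorted linked list
        below = {c.key: c for c in cells}
        top = cells[0]               # smallest element of the highest level so far
    return top


def find(top: Cell, key: float) -> bool:
    """Move right while the next key is not too large, otherwise move down."""
    cell = top
    while cell is not None:
        while cell.right.key <= key:
            cell = cell.right
        if cell.key == key:
            return True
        cell = cell.down
    return False


top = build({3: 1, 6: 3, 7: 1, 9: 2, 12: 1})
```

Here key 6 has height 3, so it appears on levels 1 to 3, and the top level holds only the two sentinels and 6.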
 
Cost:
  The expected number of levels is O(log n), where n is the number of elements.
  The expected time for insert/delete/find is O(log n).
  The expected size (number of cells) is O(n).
 
Skip List
  Type:         List
  Invented:     1989
  Invented by:  W. Pugh

Time complexity in big O notation
            Average     Worst case
  Space     O(n)        O(n log n)[1]
  Search    O(log n)    O(n)[1]
  Insert    O(log n)    O(n)
  Delete    O(log n)    O(n)

Description

A skip list is built in layers. The bottom layer is an ordinary ordered linked list. Each higher layer acts as an "express lane" for the lists below, where an element in layer i appears in layer i+1 with some fixed probability p (two commonly used values for p are 1/2 and 1/4). On average, each element appears in 1/(1-p) lists, and the tallest element (usually a special head element at the front of the skip list) appears in log_{1/p} n lists.
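
The promotion rule above amounts to repeated coin flips with success probability p. A minimal sketch, assuming a hypothetical helper named random_level and a cap max_level that the text does not mention:

```python
import random


def random_level(rng: random.Random, p: float = 0.5, max_level: int = 32) -> int:
    """Number of lists a new element joins: 1 plus the run of successful flips."""
    level = 1
    while level < max_level and rng.random() < p:
        level += 1
    return level


rng = random.Random(2024)                     # fixed seed, only for repeatability
samples = [random_level(rng) for _ in range(100_000)]
mean_height = sum(samples) / len(samples)     # should be near 1 / (1 - p) = 2
```

With p = 1/2 the sample mean should land close to 1/(1-p) = 2; with p = 1/4 it should land close to 4/3, which is the storage side of the trade-off.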

A search for a target element begins at the head element in the top list, and proceeds horizontally until the current element is greater than or equal to the target. If the current element is equal to the target, it has been found. If the current element is greater than the target, or the search reaches the end of the linked list, the procedure is repeated after returning to the previous element and dropping down vertically to the next lower list. The expected number of steps in each linked list is at most 1/p, which can be seen by tracing the search path backwards from the target until reaching an element that appears in the next higher list or reaching the beginning of the current list. Therefore, the total expected cost of a search is (log_{1/p} n)/p, which is O(log n) when p is a constant. By choosing different values of p, it is possible to trade search costs against storage costs.
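
The search cost can be written out as the number of lists times the expected steps per list. The block below is a sketch of that arithmetic only; the symbol L(n) for the number of lists is an assumed name, not one used in the text.

```latex
\[
  L(n) = \log_{1/p} n, \qquad
  \mathbb{E}[\text{steps per list}] \le \frac{1}{p}, \qquad
  \mathbb{E}[\text{search cost}] \le \frac{1}{p}\,\log_{1/p} n
    = \frac{\ln n}{p\,\ln(1/p)} = O(\log n).
\]
\[
  p = \tfrac12:\ \ 2\log_2 n, \qquad
  p = \tfrac14:\ \ 4\log_4 n = 2\log_2 n .
\]
```

Under this bound the two common choices of p give the same search cost, while p = 1/4 stores 1/(1-p) = 4/3 pointers per element on average instead of 2.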

Implementation details

Figure: Inserting elements into a skip list

The elements used for a skip list can contain more than one pointer since they can participate in more than one list.

Insertions and deletions are implemented much like the corresponding linked-list operations, except that "tall" elements must be inserted into or deleted from more than one linked list.
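
Insertion and deletion can be sketched with one forward pointer per list in each node. This is a minimal illustration, not a reference implementation: the class and method names, the max_height cap, and the choice to ignore duplicate keys are all assumptions made for the example.

```python
import random
from typing import List, Optional


class Node:
    def __init__(self, key, height: int):
        self.key = key
        self.next: List[Optional["Node"]] = [None] * height  # one pointer per list


class SkipList:
    def __init__(self, p: float = 0.5, max_height: int = 16, seed: Optional[int] = None):
        self.p, self.max_height = p, max_height
        self.rng = random.Random(seed)
        self.head = Node(None, max_height)    # the head takes part in every list

    def _predecessors(self, key) -> List[Node]:
        """Last node with a smaller key on every level, found top-down."""
        preds = [self.head] * self.max_height
        node = self.head
        for level in reversed(range(self.max_height)):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            preds[level] = node
        return preds

    def contains(self, key) -> bool:
        node = self._predecessors(key)[0].next[0]
        return node is not None and node.key == key

    def insert(self, key) -> None:
        preds = self._predecessors(key)
        found = preds[0].next[0]
        if found is not None and found.key == key:
            return                            # already present; keys stay unique
        height = 1
        while height < self.max_height and self.rng.random() < self.p:
            height += 1
        node = Node(key, height)
        for level in range(height):           # splice into each list it joins
            node.next[level] = preds[level].next[level]
            preds[level].next[level] = node

    def delete(self, key) -> bool:
        preds = self._predecessors(key)
        node = preds[0].next[0]
        if node is None or node.key != key:
            return False
        for level in range(len(node.next)):   # unlink from each list it is in
            preds[level].next[level] = node.next[level]
        return True

    def __iter__(self):
        node = self.head.next[0]
        while node is not None:
            yield node.key
            node = node.next[0]
```

A "tall" node is simply one whose next array is longer, so the same splice or unlink step runs once per list the node belongs to.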

Θ(n) operations, which force us to visit every node in ascending order (such as printing the entire list), provide the opportunity to perform a behind-the-scenes derandomization of the level structure of the skip list in an optimal way, bringing the skip list to O(log n) search time. (Choose the level of the i'th finite node to be 1 plus the number of times we can repeatedly divide i by 2 before it becomes odd. Also, i=0 for the negative-infinity header, as we have the usual special case of choosing the highest possible level for negative and/or positive infinite nodes.) However, this also allows someone to know where all of the higher-than-level-1 nodes are and delete them.
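
The level rule in the parenthetical can be sketched as a small function. The name derandomized_level is hypothetical, and the header case (i=0) is left out because the text gives it the highest level by special case.

```python
def derandomized_level(i: int) -> int:
    """Level of the i'th finite node (i >= 1): 1 plus the number of times 2 divides i."""
    if i < 1:
        raise ValueError("i must be a positive node index")
    level = 1
    while i % 2 == 0:
        i //= 2
        level += 1
    return level


levels = [derandomized_level(i) for i in range(1, 9)]
# levels == [1, 2, 1, 3, 1, 2, 1, 4]
```

Every second node reaches level 2, every fourth reaches level 3, and so on, which is what makes the resulting structure behave like a balanced binary search.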

Alternatively, we could make the level structure quasi-random in the following way:

    make all nodes level 1
    j ← 1
    while the number of nodes at level j > 1 do
        for each i'th node at level j do
            if i is odd
                if i is not the last node at level j
                    randomly choose whether to promote it to level j+1
                else
                    do not promote
                end if
            else if i is even and node i-1 was not promoted
                promote it to level j+1
            end if
        repeat
        j ← j + 1
    repeat
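
The pseudocode above can be sketched in Python as follows. The function name quasi_random_levels is hypothetical, positions within a level are counted from 1 as in the pseudocode, and a fair coin is assumed for the random choice.

```python
import random
from typing import List


def quasi_random_levels(n: int, rng: random.Random) -> List[int]:
    """Return a level for each of n nodes, following the pseudocode."""
    levels = [1] * n                       # make all nodes level 1
    current = list(range(n))               # indices of the nodes at level j
    j = 1
    while len(current) > 1:
        promoted: List[int] = []
        prev_promoted = False
        for i, node in enumerate(current, start=1):
            if i % 2 == 1:                 # odd position: coin flip unless last
                promote = i != len(current) and rng.random() < 0.5
            else:                          # even position: promote iff i-1 was not
                promote = not prev_promoted
            if promote:
                levels[node] = j + 1
                promoted.append(node)
            prev_promoted = promote
        current = promoted
        j += 1
    return levels


levels = quasi_random_levels(16, random.Random(0))
```

Exactly one node of each odd-even pair is promoted, so each level has half as many nodes (rounded down) as the one below, whatever the coin flips are.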

Like the derandomized version, quasi-randomization is only done when there is some other reason to be running a Θ(n) operation (which visits every node).

The advantage of this quasi-randomness is that it doesn't give away nearly as much level-structure related information to an adversarial user as the de-randomized one. This is desirable because an adversarial user who is able to tell which nodes are not at the lowest level can pessimize performance by simply deleting higher-level nodes. The search performance is still guaranteed to be logarithmic.

It would be tempting to make the following "optimization": in the part which says "for each i'th node at level j", forget about doing a coin flip for each even-odd pair. Just flip a coin once to decide whether to promote only the even ones or only the odd ones. Instead of O(n log n) coin flips, there would only be O(log n) of them. Unfortunately, this gives the adversarial user a 50/50 chance of being correct upon guessing that all of the even-numbered nodes (among the ones at level 1 or higher) are higher than level one. This is despite the property that the adversary has a very low probability of guessing that a particular node is at level N for some integer N.

A skip list does not provide the same absolute worst-case performance guarantees as more traditional balanced tree data structures, because it is always possible (though with very low probability) that the coin flips used to build the skip list will produce a badly balanced structure. However, skip lists work well in practice, and the randomized balancing scheme has been argued to be easier to implement than the deterministic balancing schemes used in balanced binary search trees. Skip lists are also useful in parallel computing, where insertions can be done in different parts of the skip list in parallel without any global rebalancing of the data structure. Such parallelism can be especially advantageous for resource discovery in an ad hoc wireless network because a randomized skip list can be made robust to the loss of any single node.[4]

There has been some evidence that skip lists have worse real-world performance and space requirements than B trees due to memory locality and other issues.[5]

Indexable skiplist

As described above, a skiplist is capable of fast O(log n) insertion and removal of values from a sorted sequence, but it has only slow O(n) lookups of values at a given position in the sequence (e.g. return the 500th value); however, with a minor modification the speed of random-access indexed lookups can be improved to O(log n).

For every link, also store the width of the link. The width is defined as the number of bottom layer links being traversed by each of the higher layer "express lane" links.

For example, here are the widths of the links in a skiplist with ten nodes:

   1                               10
o---> o---------------------------------------------------------> o    Top level
   1           3              2                    5
o---> o---------------> o---------> o---------------------------> o    Level 3
   1        2        1        2                    5
o---> o---------> o---> o---------> o---------------------------> o    Level 2
   1     1     1     1     1     1     1     1     1     1     1
o---> o---> o---> o---> o---> o---> o---> o---> o---> o---> o---> o    Bottom level
Head  1st   2nd   3rd   4th   5th   6th   7th   8th   9th   10th  NIL
      Node  Node  Node  Node  Node  Node  Node  Node  Node  Node

Notice that the width of a higher-level link is the sum of the component links below it (e.g. the width 10 link spans the links of widths 3, 2 and 5 immediately below it). Consequently, the sum of all widths is the same on every level (10 + 1 = 1 + 3 + 2 + 5 = 1 + 2 + 1 + 2 + 5).

To index the skiplist and find the i'th value, traverse the skiplist while counting down the widths of each traversed link. Descend a level whenever the upcoming width would be too large.

For example, to find the node in the fifth position (Node 5), traverse a link of width 1 at the top level. Now four more steps are needed but the next width on this level is ten which is too large, so drop one level. Traverse one link of width 3. Since another step of width 2 would be too far, drop down to the bottom level. Now traverse the final link of width 1 to reach the target running total of 5 (1+3+1).

function lookupByPositionIndex(i)          # i is a zero-based position
    node ← head
    i ← i + 1                              # don't count the head as a step
    for level from top to bottom do
        while i ≥ node.width[level] do     # if next step is not too far
            i ← i - node.width[level]      # subtract the current width
            node ← node.next[level]        # traverse forward at the current level
        repeat
    repeat
    return node.value
end function

This method of implementing indexing is detailed in Section 3.4 Linear List Operations in "A skip list cookbook" by William Pugh.
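
The lookup can be tried on the ten-node example with a short Python sketch. Several things here are assumptions made for illustration: the names Node, build and lookup_by_position, the use of an infinite width in place of an explicit NIL node, the zero-based position argument, and the range check at the end.

```python
import math
from typing import List, Optional


class Node:
    def __init__(self, value, height: int):
        self.value = value
        self.next: List[Optional["Node"]] = [None] * height
        self.width: List[float] = [math.inf] * height   # inf: no onward link here


def build(heights: List[int]) -> Node:
    """Build a list holding 1..n, where heights[k-1] is the height of node k."""
    top = max(heights)
    head = Node(None, top)
    nodes = [head] + [Node(k, h) for k, h in enumerate(heights, start=1)]
    last = [0] * top                        # position of the latest node per level
    for pos in range(1, len(nodes)):
        for level in range(len(nodes[pos].next)):
            prev = nodes[last[level]]
            prev.next[level] = nodes[pos]
            prev.width[level] = pos - last[level]   # bottom-level links spanned
            last[level] = pos
    return head


def lookup_by_position(head: Node, i: int):
    """Return the value at zero-based position i."""
    if i < 0:
        raise IndexError("position out of range")
    node = head
    i += 1                                  # don't count the head as a step
    for level in reversed(range(len(head.next))):
        while i >= node.width[level]:       # next step is not too far
            i -= node.width[level]          # subtract the current width
            node = node.next[level]         # traverse forward at this level
    if i != 0:
        raise IndexError("position out of range")
    return node.value


# Heights matching the diagram: node 1 is on all four levels, nodes 4 and 6
# reach Level 3, node 3 reaches Level 2, the rest are bottom-level only.
head = build([4, 1, 2, 3, 1, 3, 1, 1, 1, 1])
fifth = lookup_by_position(head, 4)         # follows widths 1, 3, 1 as in the text
```

Storing an infinite width on a node's last link means the loop never steps past the end, so an out-of-range position is detected by the leftover count rather than by reaching NIL.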

History

Skip lists were first described in 1990 by William Pugh.[2]

To quote the author:

Skip lists are a probabilistic data structure that seem likely to supplant balanced trees as the implementation method of choice for many applications. Skip list algorithms have the same asymptotic expected time bounds as balanced trees and are simpler, faster and use less space.

Usages

List of applications and frameworks that use skip lists:

Skip lists are also used in distributed applications (where the nodes represent physical computers, and pointers represent network connections) and for implementing highly scalable concurrent priority queues with less lock contention,[7] or even without locking,[8][9][10] as well as lockless concurrent dictionaries.[11] There are also several US patents for using skip lists to implement (lockless) priority queues and concurrent dictionaries.[citation needed]

References

  1. http://www.cs.uwaterloo.ca/research/tr/1993/28/root2side.pdf
  2. Pugh, W. (1990). "Skip lists: A probabilistic alternative to balanced trees". Communications of the ACM 33 (6): 668. doi:10.1145/78973.78977
  3. Deterministic skip lists
  4. Shah, Gauri; Aspnes, James (December 2003). Distributed Data Structures for Peer-to-Peer Systems (PDF). Retrieved 2008-09-23.
  5. http://resnet.uoregon.edu/~gurney_j/jmpc/skiplist.html
  6. "Redis ordered set implementation".
  7. Shavit, N.; Lotan, I. (2000). "Skiplist-based concurrent priority queues". Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000. p. 263. doi:10.1109/IPDPS.2000.845994. ISBN 0-7695-0574-0
  8. Sundell, H.; Tsigas, P. (2003). "Fast and lock-free concurrent priority queues for multi-thread systems". Proceedings International Parallel and Distributed Processing Symposium. p. 11. doi:10.1109/IPDPS.2003.1213189. ISBN 0-7695-1926-1
  9. Fomitchev, M.; Ruppert, E. (2004). "Lock-free linked lists and skip lists". Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing - PODC '04. p. 50. doi:10.1145/1011767.1011776. ISBN 1581138024
  10. Bajpai, R.; Dhara, K. K.; Krishnaswamy, V. (2008). "QPID: A Distributed Priority Queue with Item Locality". 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications. p. 215. doi:10.1109/ISPA.2008.90. ISBN 978-0-7695-3471-8
  11. Sundell, H. K.; Tsigas, P. (2004). "Scalable and lock-free concurrent dictionaries". Proceedings of the 2004 ACM symposium on Applied computing - SAC '04. p. 1438. doi:10.1145/967900.968188. ISBN 1581138121
