Tutorial on training Skip-Thought vectors for sentence feature extraction.

1. Request and download the training dataset. 

  The dataset used for Skip-Thought vectors is [BookCorpus]: http://yknzhu.wixsite.com/mbweb

  First, you should send an email to the author of the paper and ask for a download link to the dataset. Then you will download the following files:

  

  Unzip these files into the current folder, as shown in the sketch below.
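  For reference, here is a minimal Python sketch for unpacking the archives. The archive names below are placeholders (the actual file names depend on what the authors send you), so adjust them to match.

  import os
  import tarfile
  import zipfile

  # Placeholder names -- replace with the archives you actually received.
  archives = ["books_part1.tar.gz", "books_part2.zip"]

  for name in archives:
      if not os.path.exists(name):
          print("missing:", name)
          continue
      if name.endswith((".tar.gz", ".tgz")):
          with tarfile.open(name, "r:gz") as tf:
              tf.extractall(".")       # unpack into the current folder
      elif name.endswith(".zip"):
          with zipfile.ZipFile(name) as zf:
              zf.extractall(".")       # unpack into the current folder
      print("extracted:", name)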

2. Download the TensorFlow version of the code.   

  Follow the instructions at: https://github.com/tensorflow/models/tree/master/research/skip_thoughts

  Then you will see the processing output as follows:

         

  [Attention] When you install Bazel, you need to install the software but should not update it; otherwise, it may show errors in the following steps.

3. Install the required packages; a quick import check is sketched below. 
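  The skip_thoughts code needs TensorFlow plus a few Python packages; the list below (NumPy, SciPy, NLTK, gensim) is my assumption, so double-check it against the repository README. A minimal sketch to verify that everything imports before running the encoder:

  import importlib

  # Assumed package list -- confirm against the skip_thoughts README.
  required = ["tensorflow", "numpy", "scipy", "nltk", "gensim"]

  for name in required:
      try:
          module = importlib.import_module(name)
          print("OK     ", name, getattr(module, "__version__", "(no version attribute)"))
      except ImportError as err:
          print("MISSING", name, "->", err)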

4. Encode sentences:

  Run the following Python script.

  

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import os.path
import scipy.spatial.distance as sd
import pdb

from skip_thoughts import configuration
from skip_thoughts import encoder_manager

print("==>> Skip-Thought Vector ")

# Set paths to the pre-trained model files.
VOCAB_FILE = "./skip_thoughts/pretrained/skip_thoughts_bi_2017_02_16/vocab.txt"
EMBEDDING_MATRIX_FILE = "./skip_thoughts/pretrained/skip_thoughts_bi_2017_02_16/embeddings.npy"
CHECKPOINT_PATH = "./skip_thoughts/model/train/model.ckpt-78581"

# Set up the encoder. Here we are using a single unidirectional model.
# To use a bidirectional model as well, call load_model() again with
# configuration.model_config(bidirectional_encoder=True) and paths to the
# bidirectional model's files. The encoder will use the concatenation of
# all loaded models.
print("==>> loading the pre-trained models ... ")
encoder = encoder_manager.EncoderManager()
encoder.load_model(configuration.model_config(),
                   vocabulary_file=VOCAB_FILE,
                   embedding_matrix_file=EMBEDDING_MATRIX_FILE,
                   checkpoint_path=CHECKPOINT_PATH)
print("==>> Done !")

# The sentence(s) to encode.
data = [' This is my second attempt to the tensorflow version skip_thought_vectors ... ']
print("==>> the given sentence is: ", data)

# Generate a Skip-Thought vector for each sentence in the list.
encodings = encoder.encode(data)
print("==>> the sentence feature is: ", encodings)  # The output feature is 2400-dimensional.

# Drop into the debugger to inspect the returned vectors.
pdb.set_trace()
print("==>> the encodings[0] is: ", encodings[0])

wangxiao@AHU:/media/wangxiao/49cd8079-e619-4e4b-89b1-15c86afb5102/skip_thought_vector_onlineModels$ python run_sentence_feature_extraction.py
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
==>> Skip-Thought Vector
==>> loading the pre-trained models ...
WARNING:tensorflow:From ./skip_thoughts/skip_thoughts_model.py:360: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
2018-05-13 21:36:27.670186: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
==>> Done !
==>> the given sentence is: [' This is my second attempt to the tensorflow version skip_thought_vectors ... ']
==>> the sentence feature is: [[-0.00676637 0.01928637 -0.01759908 ..., 0.00851333 0.00875245 -0.0040213 ]]
> ./run_sentence_feature_extraction.py(48)<module>()
-> print("==>> the encodings[0] is: ", encodings[0])
(Pdb) x = encodings[0]
(Pdb) x.size
2400
(Pdb)

As we can see from the terminal output above, the output feature vector is 2400-dimensional.   
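Once sentences are encoded, the 2400-D vectors can be compared directly, which is what the scipy.spatial.distance import (sd) in the script is for. Below is a minimal sketch, assuming the encoder has already been loaded as above, that ranks a few made-up candidate sentences by cosine distance to a query sentence.

# Minimal sketch: rank candidate sentences by cosine distance to a query.
# Assumes `encoder` has been loaded exactly as in the script above.
import scipy.spatial.distance as sd

query = ["the movie was surprisingly good ."]
candidates = ["i really enjoyed this film .",
              "the weather is cold today .",
              "what a terrible waste of time ."]

query_vec = encoder.encode(query)            # one 2400-D vector
candidate_vecs = encoder.encode(candidates)  # one 2400-D vector per sentence

# Cosine distance between the query and every candidate (smaller = more similar).
distances = sd.cdist(query_vec, candidate_vecs, metric="cosine")[0]

for dist, sentence in sorted(zip(distances, candidates)):
    print("%.4f  %s" % (dist, sentence))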

...
