Tutorials on training the Skip-thoughts vectors for features extraction of sentence. 

1. Send emails and download the training dataset. 

  the dataset used in skip_thoughts vectors is from [BookCorpus]: http://yknzhu.wixsite.com/mbweb

  first, you should send a email to the auther of this paper and ask for the link of this dataset. Then you will download the following files:

  

  unzip these files in the current folders.

2. Open and download the tensorflow version code.   

  Do as the following links: https://github.com/tensorflow/models/tree/master/research/skip_thoughts

  Then, you will see the processing as follows:

         

  [Attention]  when you install the bazel, you need to install this software, but do not update it. Or, it may shown you some errors in the following operations.

3. Install the packages needed. 

4. Encoding Sentences :

  run the following py files.

  

 from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import os.path
import scipy.spatial.distance as sd
from skip_thoughts import configuration
from skip_thoughts import encoder_manager
import pdb print("==>> Skip-Thought Vector ") # Set paths to the model.
VOCAB_FILE = "./skip_thoughts/pretrained/skip_thoughts_bi_2017_02_16/vocab.txt"
EMBEDDING_MATRIX_FILE = "./skip_thoughts/pretrained/skip_thoughts_bi_2017_02_16/embeddings.npy"
CHECKPOINT_PATH = "./skip_thoughts/model/train/model.ckpt-78581"
# The following directory should contain files rt-polarity.neg and rt-polarity.pos. # Set up the encoder. Here we are using a single unidirectional model.
# To use a bidirectional model as well, call load_model() again with
# configuration.model_config(bidirectional_encoder=True) and paths to the
# bidirectional model's files. The encoder will use the concatenation of
# all loaded models. print("==>> loading the pre-trained models ... ") encoder = encoder_manager.EncoderManager()
encoder.load_model(configuration.model_config(),
vocabulary_file=VOCAB_FILE,
embedding_matrix_file=EMBEDDING_MATRIX_FILE,
checkpoint_path=CHECKPOINT_PATH) print("==>> Done !") # Load the movie review dataset.
data = [' This is my second attempt to the tensorflow version skip_thought_vectors ... '] print("==>> the given sentence is: ", data) # Generate Skip-Thought Vectors for each sentence in the dataset.
encodings = encoder.encode(data) print("==>> the sentence feature is: ", encodings) ## the output feature is 2400 dimension.

wangxiao@AHU:/media/wangxiao/49cd8079-e619-4e4b-89b1-15c86afb5102/skip_thought_vector_onlineModels$ python run_sentence_feature_extraction.py
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
==>> Skip-Thought Vector
==>> loading the pre-trained models ...
WARNING:tensorflow:From ./skip_thoughts/skip_thoughts_model.py:360: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
2018-05-13 21:36:27.670186: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
==>> Done !
==>> the given sentence is: [' This is my second attempt to the tensorflow version skip_thought_vectors ... ']
==>> the sentence feature is: [[-0.00676637 0.01928637 -0.01759908 ..., 0.00851333 0.00875245 -0.0040213 ]]
> ./run_sentence_feature_extraction.py(48)<module>()
-> print("==>> the encodings[0] is: ", encodings[0])
(Pdb) x = encodings[0]
(Pdb) x.size
2400
(Pdb)

as we can see from above terminal, the output feature vector is 2400-D.   

...

Tutorials on training the Skip-thoughts vectors for features extraction of sentence.的更多相关文章

  1. Training Deep Neural Networks

    http://handong1587.github.io/deep_learning/2015/10/09/training-dnn.html  //转载于 Training Deep Neural ...

  2. Computer Vision Tutorials from Conferences (2) -- ECCV

    ECCV 2012 (http://eccv2012.unifi.it/program/tutorials/) Vision Applications on Mobile using OpenCVGa ...

  3. awesome-nlp

    awesome-nlp  A curated list of resources dedicated to Natural Language Processing Maintainers - Keon ...

  4. [C3] Andrew Ng - Neural Networks and Deep Learning

    About this Course If you want to break into cutting-edge AI, this course will help you do so. Deep l ...

  5. [C2P2] Andrew Ng - Machine Learning

    ##Linear Regression with One Variable Linear regression predicts a real-valued output based on an in ...

  6. pytorch做seq2seq注意力模型的翻译

    以下是对pytorch 1.0版本 的seq2seq+注意力模型做法语--英语翻译的理解(这个代码在pytorch0.4上也可以正常跑): # -*- coding: utf-8 -*- " ...

  7. [C2P3] Andrew Ng - Machine Learning

    ##Advice for Applying Machine Learning Applying machine learning in practice is not always straightf ...

  8. 论文翻译——Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

    Abstract Semantic word spaces have been very useful but cannot express the meaning of longer phrases ...

  9. 第五章第四周习题: Transformers Architecture with TensorFlow

    目录 Transformer Network Packages 1 - Positional Encoding 1.1 - Sine and Cosine Angles Exercise 1 - ge ...

随机推荐

  1. Gitlab注册时报错:There was an error with the reCAPTCHA. Please solve the reCAPTCHA again.

    今天注册时碰到以下问题: 上面的错误是因为注册时有一个google的验证码需要输入.但是中国无法访问google,因此无法访问并输入该验证码导致. 解决方案: FanQiang或者通过Github登陆 ...

  2. UBuntu sudo 命令 :xxx is not in the sudoers file. This incident will be reported.

    [1]分析问题 提示内容翻译成中文即:用户XXX(一般是新添加的用户名称)没有权限使用sudo. 解决方法修改新用户的权限,具体操作即修改一下/etc/sudoers文件. [2]切换至root用户模 ...

  3. [DeploymentService:290066]Error occurred while downloading files from admin server for deployment request "0". Underlying error is: "null"

    weblogic 莫名无法启动: <Apr , :: PM CST> <Error> <Deployer> <BEA-> <Failed to i ...

  4. Error resolving version for plugin 'org.codehaus.mojo:tomcat-maven-plugin'

    从 SNV 导入新工程后,启动工程,但 Maven 报错: Error resolving version for plugin 'org.codehaus.mojo:tomcat-maven-plu ...

  5. SQL中的 group by 1, order by 1 语句

    看到group by 1,2 和 order by 1, 2.看不懂,google,搜到了Stack Overflow 上有回答 What does SQL clause “GROUP BY 1” m ...

  6. RocketMQ 顺序消费只消费一次 坑

    rocketMq实现顺序消费的原理 produce在发送消息的时候,把消息发到同一个队列(queue)中,消费者注册消息监听器为MessageListenerOrderly,这样就可以保证消费端只有一 ...

  7. Python爬虫与数据图表的实现

    要求: 1. 参考教材实例20,编写Python爬虫程序,获取江西省所有高校的大学排名数据记录,并打印输出. 2. 使用numpy和matplotlib等库分析数据,并绘制南昌大学.华东交通大学.江西 ...

  8. The Little Prince-12/07

    The Little Prince-12/07 "My little man, where do you come from? What is this ‘where I live,‘ of ...

  9. The Little Prince-11/30

    The Little Prince-11/30 Today, I have a meeting in our department. I sincerely hope that all of my d ...

  10. 用Intellij IDEA建mybatis案例

    用IDEA建mybatis案例 环境准备: 首先,建库建表(最好用navicat或sqlpro直接导) 然后打开IDEA, 1. java--->javaEE---> java app-- ...