Tutorial on training Skip-Thought vectors for sentence feature extraction.
1. Request and download the training dataset.
The dataset used for Skip-Thought vectors is [BookCorpus]: http://yknzhu.wixsite.com/mbweb
First, send an email to the authors of the paper and ask for the download link for this dataset. You can then download the following files:

Unzip these files in the current folder.
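The unzip step can also be scripted. The following is a minimal sketch, assuming the downloaded files are ordinary archives (zip, tar, tar.gz, and so on) sitting in one folder; the helper name `unpack_all` is hypothetical and not part of the Skip-Thoughts code.

```python
import shutil
from pathlib import Path


def unpack_all(folder="."):
    """Unpack every archive in `folder` that shutil knows how to handle.

    Returns the names of the archives that were unpacked.
    """
    folder = Path(folder)
    # Extensions shutil can unpack on this system, e.g. ".zip", ".tar.gz".
    known = tuple(
        ext for _, extensions, _ in shutil.get_unpack_formats() for ext in extensions
    )
    unpacked = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.name.endswith(known):
            # Extract next to the archive, i.e. into the same folder.
            shutil.unpack_archive(str(path), str(folder))
            unpacked.append(path.name)
    return unpacked
```

Calling `unpack_all(".")` from the download folder extracts everything in place. Files compressed with plain gzip (not tar) are not covered by this sketch.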
2. Download the TensorFlow version of the code.
Follow the instructions at: https://github.com/tensorflow/models/tree/master/research/skip_thoughts
You will then see the following:
[Attention] When you install Bazel, install it but do not update it afterwards. Otherwise, you may run into errors in the following steps.
3. Install the packages needed.
- Bazel (instructions)
- TensorFlow (instructions)
- NumPy (instructions)
- scikit-learn (instructions)
- Natural Language Toolkit (NLTK)
  - First install NLTK (instructions)
  - Then install the NLTK data (instructions)
- gensim (instructions)
  - Only required if you will be expanding your vocabulary with the word2vec model.
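Before moving on, it can help to confirm that the Python packages above are importable and to note their versions, since version mismatches can cause errors at run time. The following is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`); the helper name `check_packages` is hypothetical. Bazel is a separate build tool and is not covered by this check.

```python
import importlib.util
from importlib import metadata

# Import name -> distribution name (the two differ for scikit-learn).
REQUIRED = {
    "tensorflow": "tensorflow",
    "numpy": "numpy",
    "sklearn": "scikit-learn",
    "nltk": "nltk",
    "gensim": "gensim",  # optional: only needed for word2vec vocabulary expansion
}


def check_packages(required=REQUIRED):
    """Return {import_name: version string, or None if the package is missing}."""
    report = {}
    for module_name, dist_name in required.items():
        if importlib.util.find_spec(module_name) is None:
            report[module_name] = None  # not importable
            continue
        try:
            report[module_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but installed under a different distribution name.
            report[module_name] = "unknown"
    return report
```

Printing `check_packages()` shows at a glance which packages are missing (`None`) and which versions are installed.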
4. Encoding sentences:
Run the following Python file.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import os.path
import scipy.spatial.distance as sd
from skip_thoughts import configuration
from skip_thoughts import encoder_manager
import pdb

print("==>> Skip-Thought Vector ")

# Set paths to the model.
VOCAB_FILE = "./skip_thoughts/pretrained/skip_thoughts_bi_2017_02_16/vocab.txt"
EMBEDDING_MATRIX_FILE = "./skip_thoughts/pretrained/skip_thoughts_bi_2017_02_16/embeddings.npy"
CHECKPOINT_PATH = "./skip_thoughts/model/train/model.ckpt-78581"
# The following directory should contain files rt-polarity.neg and rt-polarity.pos.

# Set up the encoder. Here we are using a single unidirectional model.
# To use a bidirectional model as well, call load_model() again with
# configuration.model_config(bidirectional_encoder=True) and paths to the
# bidirectional model's files. The encoder will use the concatenation of
# all loaded models.
print("==>> loading the pre-trained models ... ")
encoder = encoder_manager.EncoderManager()
encoder.load_model(configuration.model_config(),
                   vocabulary_file=VOCAB_FILE,
                   embedding_matrix_file=EMBEDDING_MATRIX_FILE,
                   checkpoint_path=CHECKPOINT_PATH)
print("==>> Done !")

# Sentences to encode (a single example sentence here).
data = [' This is my second attempt to the tensorflow version skip_thought_vectors ... ']
print("==>> the given sentence is: ", data)

# Generate Skip-Thought Vectors for each sentence in the dataset.
encodings = encoder.encode(data)
print("==>> the sentence feature is: ", encodings)
# The output feature is 2400-dimensional.
wangxiao@AHU:/media/wangxiao/49cd8079-e619-4e4b-89b1-15c86afb5102/skip_thought_vector_onlineModels$ python run_sentence_feature_extraction.py
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
==>> Skip-Thought Vector
==>> loading the pre-trained models ...
WARNING:tensorflow:From ./skip_thoughts/skip_thoughts_model.py:360: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
2018-05-13 21:36:27.670186: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
==>> Done !
==>> the given sentence is: [' This is my second attempt to the tensorflow version skip_thought_vectors ... ']
==>> the sentence feature is: [[-0.00676637 0.01928637 -0.01759908 ..., 0.00851333 0.00875245 -0.0040213 ]]
> ./run_sentence_feature_extraction.py(48)<module>()
-> print("==>> the encodings[0] is: ", encodings[0])
(Pdb) x = encodings[0]
(Pdb) x.size
2400
(Pdb)
As we can see from the terminal output above, the output feature vector is 2400-dimensional.
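Once sentences are encoded, a common next step is to compare their feature vectors, for example by cosine similarity. The following is a minimal sketch in pure Python, assuming each row of `encodings` is treated as a flat sequence of floats; the function name `cosine_similarity` is hypothetical, and the toy 3-D vectors only stand in for the real 2400-D features.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (lists or 1-D arrays)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Toy vectors standing in for two rows of `encodings`.
a = [1.0, 0.0, 0.0]
b = [1.0, 1.0, 0.0]
print(cosine_similarity(a, a))  # identical direction
print(cosine_similarity(a, b))  # 45 degrees apart, roughly 0.707
```

With real features, the call would be `cosine_similarity(encodings[i], encodings[j])` for two encoded sentences. For many sentences at once, a vectorized routine such as `scipy.spatial.distance.cdist` with the `"cosine"` metric is likely a better fit.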