先说结论

  • tf.feature_column.input_layer()的api,会对传入的feature_columns进行排序,并不是按照输入顺序进行组织,排序依据基于feature_column的name(tf生成的,类似于'u_wu211_indicator', 'u_wu215_indicator', 'r_rsp113_indicator', 'u_wu211_X_u_wu215_indicator'这种。
  • 关键代码:
for column in sorted(feature_columns, key=lambda x: x.name):
ordered_columns.append(column)
  • 代码验证:
In [31]: [x.name for x in sorted( fcs, key=lambda x: x.name)]
Out[31]:
['r_rsp113_indicator',
'u_wu211_X_u_wu215_indicator',
'u_wu211_indicator',
'u_wu215_indicator']

表现

In [24]: u_wu211 = tf.feature_column.categorical_column_with_vocabulary_list(key='u_wu211', vocabulary_list=['0','1','2'])
...: u_wu215 = tf.feature_column.categorical_column_with_vocabulary_list(key='u_wu215', vocabulary_list=['00s','10s','90s'])
...: r_rsp113 = tf.feature_column.categorical_column_with_vocabulary_list(key='r_rsp113', vocabulary_list=['0','-1','1'])
...: u_wu211_u_wu215_cross = tf.feature_column.crossed_column(keys = [u_wu211, u_wu215], hash_bucket_size=3)
...: print(tf.feature_column.input_layer(tfeatures, [tf.feature_column.indicator_column(u_wu211)]))
...: print(tf.feature_column.input_layer(tfeatures, [tf.feature_column.indicator_column(u_wu215)]))
...: print(tf.feature_column.input_layer(tfeatures, [tf.feature_column.indicator_column(r_rsp113)]))
...: print(tf.feature_column.input_layer(tfeatures, [tf.feature_column.indicator_column(u_wu211_u_wu215_cross)]))
...: print(tf.feature_column.input_layer(tfeatures, [tf.feature_column.indicator_column(u_wu211),
...: tf.feature_column.indicator_column(u_wu215),
...: tf.feature_column.indicator_column(r_rsp113),
...: tf.feature_column.indicator_column(u_wu211_u_wu215_cross)
...: ]))
...:
tf.Tensor(
[[1. 0. 0.]
[0. 0. 1.]], shape=(2, 3), dtype=float32)
tf.Tensor(
[[0. 0. 0.]
[1. 0. 0.]], shape=(2, 3), dtype=float32)
tf.Tensor(
[[0. 1. 0.]
[0. 1. 0.]], shape=(2, 3), dtype=float32)
tf.Tensor(
[[0. 0. 1.]
[0. 0. 1.]], shape=(2, 3), dtype=float32)
tf.Tensor(
[[0. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 1. 0. 0. 1. 1. 0. 0.]], shape=(2, 12), dtype=float32)
  • 由第一条sample举例:期望得到的是u_wu211 + u_wu215 + r_rsp113 + u_wu211_u_wu215_cross

    • 即:[1. 0. 0.] + [0. 0. 0.] + [0. 1. 0.] + [0. 0. 1.]
    • 但得到的却是:[0. 1. 0.] + [0. 0. 1.] + [1. 0. 0.] + [0. 0. 0.],也就是['r_rsp113', 'u_wu211_u_wu215_cross', 'u_wu211', 'u_wu215']

文档描述

    feature_columns: An iterable containing the FeatureColumns to use as inputs
to your model. All items should be instances of classes derived from
`_DenseColumn` such as `numeric_column`, `embedding_column`,
`bucketized_column`, `indicator_column`. If you have categorical features,
you can wrap them with an `embedding_column` or `indicator_column`.
  • feature_columns参数接收一个:包含模型中使用到的FeatureColumns的一个迭代器,列表中的项目都应该是_DenseColumn类的实例化对象,例如numeric_column, embedding_column, bucketized_column, indicator_column.如果是标签类别的特征,需要用embedding_column or indicator_column转换一下。
  • 其中并未解释特征顺序相关问题。

源码探究

  • tf.feature_column.input_layer
@tf_export(v1=['feature_column.input_layer'])
def input_layer(features,
feature_columns,
weight_collections=None,
trainable=True,
cols_to_vars=None,
cols_to_output_tensors=None):
"""Returns a dense `Tensor` as input layer based on given `feature_columns`. Generally a single example in training data is described with FeatureColumns.
At the first layer of the model, this column oriented data should be converted
to a single `Tensor`. Example: ``python
price = numeric_column('price')
keywords_embedded = embedding_column(
categorical_column_with_hash_bucket("keywords", 10K), dimensions=16)
columns = [price, keywords_embedded, ...]
features = tf.io.parse_example(..., features=make_parse_example_spec(columns))
dense_tensor = input_layer(features, columns)
for units in [128, 64, 32]:
dense_tensor = tf.compat.v1.layers.dense(dense_tensor, units, tf.nn.relu)
prediction = tf.compat.v1.layers.dense(dense_tensor, 1)
`` Args:
features: A mapping from key to tensors. `_FeatureColumn`s look up via these
keys. For example `numeric_column('price')` will look at 'price' key in
this dict. Values can be a `SparseTensor` or a `Tensor` depends on
corresponding `_FeatureColumn`.
feature_columns: An iterable containing the FeatureColumns to use as inputs
to your model. All items should be instances of classes derived from
`_DenseColumn` such as `numeric_column`, `embedding_column`,
`bucketized_column`, `indicator_column`. If you have categorical features,
you can wrap them with an `embedding_column` or `indicator_column`.
weight_collections: A list of collection names to which the Variable will be
added. Note that variables will also be added to collections
`tf.GraphKeys.GLOBAL_VARIABLES` and `ops.GraphKeys.MODEL_VARIABLES`.
trainable: If `True` also add the variable to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
cols_to_vars: If not `None`, must be a dictionary that will be filled with a
mapping from `_FeatureColumn` to list of `Variable`s. For example, after
the call, we might have cols_to_vars =
{_EmbeddingColumn(
categorical_column=_HashedCategoricalColumn(
key='sparse_feature', hash_bucket_size=5, dtype=tf.string),
dimension=10): [<tf.Variable 'some_variable:0' shape=(5, 10),
<tf.Variable 'some_variable:1' shape=(5, 10)]}
If a column creates no variables, its value will be an empty list.
cols_to_output_tensors: If not `None`, must be a dictionary that will be
filled with a mapping from '_FeatureColumn' to the associated
output `Tensor`s. Returns:
A `Tensor` which represents input layer of a model. Its shape
is (batch_size, first_layer_dimension) and its dtype is `float32`.
first_layer_dimension is determined based on given `feature_columns`. Raises:
ValueError: if an item in `feature_columns` is not a `_DenseColumn`.
"""
return _internal_input_layer(
features,
feature_columns,
weight_collections=weight_collections,
trainable=trainable,
cols_to_vars=cols_to_vars,
cols_to_output_tensors=cols_to_output_tensors)
  • _internal_input_layer

def _internal_input_layer(features,
feature_columns,
weight_collections=None,
trainable=True,
cols_to_vars=None,
scope=None,
cols_to_output_tensors=None,
from_template=False):
"""See input_layer. `scope` is a name or variable scope to use.""" feature_columns = _normalize_feature_columns(feature_columns)
for column in feature_columns:
if not isinstance(column, _DenseColumn):
raise ValueError(
'Items of feature_columns must be a _DenseColumn. '
'You can wrap a categorical column with an '
'embedding_column or indicator_column. Given: {}'.format(column))
weight_collections = list(weight_collections or [])
if ops.GraphKeys.GLOBAL_VARIABLES not in weight_collections:
weight_collections.append(ops.GraphKeys.GLOBAL_VARIABLES)
if ops.GraphKeys.MODEL_VARIABLES not in weight_collections:
weight_collections.append(ops.GraphKeys.MODEL_VARIABLES) def _get_logits(): # pylint: disable=missing-docstring
builder = _LazyBuilder(features)
output_tensors = []
ordered_columns = []
for column in sorted(feature_columns, key=lambda x: x.name):
ordered_columns.append(column)
with variable_scope.variable_scope(
None, default_name=column._var_scope_name): # pylint: disable=protected-access
tensor = column._get_dense_tensor( # pylint: disable=protected-access
builder,
weight_collections=weight_collections,
trainable=trainable)
num_elements = column._variable_shape.num_elements() # pylint: disable=protected-access
batch_size = array_ops.shape(tensor)[0]
output_tensor = array_ops.reshape(
tensor, shape=(batch_size, num_elements))
output_tensors.append(output_tensor)
if cols_to_vars is not None:
# Retrieve any variables created (some _DenseColumn's don't create
# variables, in which case an empty list is returned).
cols_to_vars[column] = ops.get_collection(
ops.GraphKeys.GLOBAL_VARIABLES,
scope=variable_scope.get_variable_scope().name)
if cols_to_output_tensors is not None:
cols_to_output_tensors[column] = output_tensor
_verify_static_batch_size_equality(output_tensors, ordered_columns)
return array_ops.concat(output_tensors, 1) # If we're constructing from the `make_template`, that by default adds a
# variable scope with the name of the layer. In that case, we dont want to
# add another `variable_scope` as that would break checkpoints.
if from_template:
return _get_logits()
else:
with variable_scope.variable_scope(
scope, default_name='input_layer', values=features.values()):
return _get_logits()
  • 两处需要注意:

    • 在_get_logits中,_LazyBuilder对重复引用的特征做了去重,并且延迟初始化
    • 另外在添加特征中,引入了一个排序,基于feature_column的name(tf生成的,类似于'u_wu211_indicator', 'u_wu215_indicator', 'r_rsp113_indicator', 'u_wu211_X_u_wu215_indicator'这种。
    • 代码如下:
  def _get_logits():  # pylint: disable=missing-docstring
builder = _LazyBuilder(features)
output_tensors = []
ordered_columns = []
for column in sorted(feature_columns, key=lambda x: x.name):
ordered_columns.append(column)
with variable_scope.variable_scope(
None, default_name=column._var_scope_name): # pylint: disable=protected-access

结论验证

In [29]: fcs = [tf.feature_column.indicator_column(u_wu211),
...: tf.feature_column.indicator_column(u_wu215),
...: tf.feature_column.indicator_column(r_rsp113),
...: tf.feature_column.indicator_column(u_wu211_u_wu215_cross)
...: ] In [30]: sorted( fcs, key=lambda x: x.name)
Out[30]:
[IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='r_rsp113', vocabulary_list=('0', '-1', '1'), dtype=tf.string, default_value=-1, num_oov_buckets=0)),
IndicatorColumn(categorical_column=CrossedColumn(keys=(VocabularyListCategoricalColumn(key='u_wu211', vocabulary_list=('0', '1', '2'), dtype=tf.string, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='u_wu215', vocabulary_list=('00s', '10s', '90s'), dtype=tf.string, default_value=-1, num_oov_buckets=0)), hash_bucket_size=3, hash_key=None)),
IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='u_wu211', vocabulary_list=('0', '1', '2'), dtype=tf.string, default_value=-1, num_oov_buckets=0)),
IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='u_wu215', vocabulary_list=('00s', '10s', '90s'), dtype=tf.string, default_value=-1, num_oov_buckets=0))]
In [31]: [x.name for x in sorted( fcs, key=lambda x: x.name)]
Out[31]:
['r_rsp113_indicator',
'u_wu211_X_u_wu215_indicator',
'u_wu211_indicator',
'u_wu215_indicator']
  • 期望结果:
  • ['r_rsp113_indicator', 'u_wu211_X_u_wu215_indicator', 'u_wu211_indicator', 'u_wu215_indicator']
    • 即: [0. 1. 0.] + [0. 0. 1.] + [1. 0. 0.] + [0. 0. 0.]
    • [0. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0.]
    • 与预期一致。

tf.feature_column.input_layer 特征顺序问题的更多相关文章

  1. tensorflow feature_column踩坑合集

    踩坑内容包含以下 feature_column的输入输出类型,用一个数据集给出demo feature_column接estimator feature_column接Keras feature_co ...

  2. tf.estimator

    estimator同keras是tensorflow的高级API.在tensorflow1.13以上,estimator已经作为一个单独的package从tensorflow分离出来了.estimat ...

  3. 使用movielens数据集动手实现youtube推荐候选集生成

    综述 之前在博客中总结过nce损失和YouTuBe DNN推荐;但大多都还是停留在理论层面,没有实践经验.所以笔者想借由此文继续深入探索YouTuBe DNN推荐,另外也进一步总结TensorFlow ...

  4. CTR学习笔记&代码实现5-深度ctr模型 DeepCrossing -> DCN

    之前总结了PNN,NFM,AFM这类两两向量乘积的方式,这一节我们换新的思路来看特征交互.DeepCrossing是最早在CTR模型中使用ResNet的前辈,DCN在ResNet上进一步创新,为高阶特 ...

  5. tensorflow创建自定义 Estimator

    https://www.tensorflow.org/guide/custom_estimators?hl=zh-cn 创建自定义 Estimator 本文档介绍了自定义 Estimator.具体而言 ...

  6. 4. Tensorflow的Estimator实践原理

    1. Tensorflow高效流水线Pipeline 2. Tensorflow的数据处理中的Dataset和Iterator 3. Tensorflow生成TFRecord 4. Tensorflo ...

  7. 创建自定义 Estimator

    ref 本文档介绍了自定义 Estimator.具体而言,本文档介绍了如何创建自定义 Estimator 来模拟预创建的 Estimator DNNClassifier 在解决鸢尾花问题时的行为.要详 ...

  8. TensorFlow低阶API(一)—— 简介

    简介 本文旨在知道您使用低级别TensorFlow API(TensorFlow Core)开始编程.您可以学习执行以下操作: 管理自己的TensorFlow程序(tf.Graph)和TensorFl ...

  9. 【推荐算法工程师技术栈系列】分布式&数据库--tensorflow

    目录 TensorFlow 高阶API Dataset(tf.data) Estimator(tf.estimator) FeatureColumns(tf.feature_column) tf.nn ...

  10. CTR学习笔记&代码实现1-深度学习的前奏LR->FFM

    CTR学习笔记系列的第一篇,总结在深度模型称王之前经典LR,FM, FFM模型,这些经典模型后续也作为组件用于各个深度模型.模型分别用自定义Keras Layer和estimator来实现,哈哈一个是 ...

随机推荐

  1. 关于 import 和 import static

    import 嘛,就是导包.比如说java的一些自带的包,例如 import java.lang.Matn: 又或者我们自己做的包,例如 import com.link.testImport; 一些实 ...

  2. 知乎问题:如何说服技术老大用 Redis ?

    这个问题很微妙,可能这位同学内心深处,觉得 Redis 是所有应用缓存的标配. 缓存的世界很广阔,对于应用系统来讲,我们经常将缓存划分为本地缓存和分布式缓存. 本地缓存 :应用中的缓存组件,缓存组件和 ...

  3. Node.js躬行记(28)——Cypress自动化测试实践

    最近在研究如何提升项目质量,提炼了许多个用于自测的测试用例,但是每次修改后,都手工测试,成本太高,于是就想到了自动化测试. 在一年前已将 Cypress 集成到管理后台的项目中,不过没有投入到实践中. ...

  4. sourcetree和git无法识别新增文件

    在工程中新建文件,但是git和sourcetree无法识别,我是用的是Xcode添加的文件和图片,全都无法识别.例如,新建一个类文件,.h和.m都是别不出来,但是工程文件显示已经添加相对应的类,所以肯 ...

  5. 2021-03-19:给定一个二维数组matrix,其中的值不是0就是1,返回全部由1组成的最大子矩形,内部有多少个1。

    2021-03-19:给定一个二维数组matrix,其中的值不是0就是1,返回全部由1组成的最大子矩形,内部有多少个1. 福大大 答案2021-03-19: 按行遍历二维数组,构造直方图. 单调栈,大 ...

  6. 2021-09-28:合并区间。以数组 intervals 表示若干个区间的集合,其中单个区间为 intervals[i] = [starti, endi] 。请你合并所有重叠的区间,并返回一个不重叠

    2021-09-28:合并区间.以数组 intervals 表示若干个区间的集合,其中单个区间为 intervals[i] = [starti, endi] .请你合并所有重叠的区间,并返回一个不重叠 ...

  7. vue全家桶进阶之路2:JavaScript

    JavaScript(简称"JS")是当前最流行.应用最广泛的客户端脚本语言,用来在网页中添加一些动态效果与交互功能,在 Web 开发领域有着举足轻重的地位.JavaScript ...

  8. 8张图带你全面了解kafka的核心机制

    前言 kafka是目前企业中很常用的消息队列产品,可以用于削峰.解耦.异步通信.特别是在大数据领域中应用尤为广泛,主要得益于它的高吞吐量.低延迟,在我们公司的解决方案中也有用到.既然kafka在企业中 ...

  9. UCOS II源码分析二

    最近大家都沉浸在找到实习的快乐中,自我充电的时间相对减少了,今天重拾一下ucosii的学习,记录如下: 上一篇文章分析了ucosii源码文件组织结构以及简单介绍了各个文件夹里对应文件的功能,要是对uc ...

  10. Flutter热更新技术探索

    一,需求背景: APP发布到市场后,难免会遇到严重的BUG阻碍用户使用,因此有在不发布新版本APP的情况下使用热更新技术立即修复BUG需求.原生APP(例如:Android & IOS)的热更 ...