以前在POD里跑起来,没问题的示例代码。

移到jupyter中,多给两个GPU,有时运行就会爆出这个错误:

于是,按网上的意见,暂时加了个使用GPU的指定,

暂时搞定。

如下红色部分。

import timeit
import os
import tensorflow as tf
import numpy as np
from tensorflow.keras.datasets.cifar10 import load_data

os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

def model():
    x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])
    y = tf.placeholder(tf.float32, shape=[None, 10])
    rate = tf.placeholder(tf.float32)
    # convolutional layer 1
    conv_1 = tf.layers.conv2d(x, 32, [3, 3], padding='SAME', activation=tf.nn.relu)
    max_pool_1 = tf.layers.max_pooling2d(conv_1, [2, 2], strides=2, padding='SAME')
    drop_1 = tf.layers.dropout(max_pool_1, rate=rate)
    # convolutional layer 2
    conv_2 = tf.layers.conv2d(drop_1, 64, [3, 3], padding="SAME", activation=tf.nn.relu)
    max_pool_2 = tf.layers.max_pooling2d(conv_2, [2, 2], strides=2, padding="SAME")
    drop_2 = tf.layers.dropout(max_pool_2, rate=rate)
    # convolutional layers 3
    conv_3 = tf.layers.conv2d(drop_2, 128, [3, 3], padding="SAME", activation=tf.nn.relu)
    max_pool_3 = tf.layers.max_pooling2d(conv_3, [2, 2], strides=2, padding="SAME")
    drop_3 = tf.layers.dropout(max_pool_3, rate=rate)
    # fully connected layer 1
    flat = tf.reshape(drop_3, shape=[-1, 4 * 4 * 128])
    fc_1 = tf.layers.dense(flat, 80, activation=tf.nn.relu)
    drop_4 = tf.layers.dropout(fc_1 , rate=rate)
    # fully connected layer 2 or the output layers
    fc_2 = tf.layers.dense(drop_4, 10)
    output = tf.nn.relu(fc_2)
    # accuracy
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(output, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    # loss
    loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=output, labels=y))
    # optimizer
    optimizer = tf.train.AdamOptimizer(1e-4, beta1=0.9, beta2=0.999, epsilon=1e-8).minimize(loss)
    return x, y, rate, accuracy, loss, optimizer

def one_hot_encoder(y):
    ret = np.zeros(len(y) * 10)
    ret = ret.reshape([-1, 10])
    for i in range(len(y)):
        ret[i][y[i]] = 1
    return (ret)

def train(x_train, y_train, sess, x, y, rate, optimizer, accuracy, loss):
    batch_size = 128
    y_train_cls = one_hot_encoder(y_train)
    start = end = 0
    for i in range(int(len(x_train) / batch_size)):
        if (i + 1) % 100 == 1:
            start = timeit.default_timer()
        batch_x = x_train[i * batch_size:(i + 1) * batch_size]
        batch_y = y_train_cls[i * batch_size:(i + 1) * batch_size]
        _, batch_loss, batch_accuracy = sess.run([optimizer, loss, accuracy], feed_dict={x:batch_x, y:batch_y, rate:0.4})
        if (i + 1) % 100 == 0:
            end = timeit.default_timer()
            print("Time:", end-start, "s the loss is ", batch_loss, " and the accuracy is ", batch_accuracy * 100, "%")

def test(x_test, y_test, sess, x, y, rate, accuracy, loss):
    batch_size = 64
    y_test_cls = one_hot_encoder(y_test)
    global_loss = 0
    global_accuracy = 0
    for t in range(int(len(x_test) / batch_size)):
        batch_x = x_test[t * batch_size : (t + 1) * batch_size]
        batch_y = y_test_cls[t * batch_size : (t + 1) * batch_size]
        batch_loss, batch_accuracy = sess.run([loss, accuracy], feed_dict={x:batch_x, y:batch_y, rate:1})
        global_loss += batch_loss
        global_accuracy += batch_accuracy
    global_loss = global_loss / (len(x_test) / batch_size)
    global_accuracy = global_accuracy / (len(x_test) / batch_size)
    print("In Test Time, loss is ", global_loss, ' and the accuracy is ', global_accuracy)

EPOCH = 100
(x_train, y_train), (x_test, y_test) = load_data()
print("There is ", len(x_train), " training images and ", len(x_test), " images")
x, y, rate, accuracy, loss, optimizer = model()
sess = tf.Session()
sess.run(tf.global_variables_initializer())

for i in range(EPOCH):
    print("Train on epoch ", i ," start")
    train(x_train, y_train, sess, x, y, rate, optimizer, accuracy, loss)
    test(x_train, y_train, sess, x, y, rate, accuracy, loss)

tensorflow运行时错误:服务似乎挂掉了,但是会立刻重启的.的更多相关文章

  1. TensorFlow Serving-TensorFlow 服务

    TensorFlow服务是一个用于服务机器学习模型的开源软件库.它处理机器学习的推断方面,在培训和管理他们的生命周期后采取模型,通过高性能,引用计数的查找表为客户端提供版本化访问. 可以同时提供多个模 ...

  2. linux 编写定时任务,查询服务是否挂掉

    shell 脚本 #!/bin/bash a=`netstat -unltp|grep fdfs|wc -l` echo "$a" if [ "$a" -ne ...

  3. 平时服务正常,突然挂了,怎么重启都起不来,查看日志Insufficient space for shared memory file 内存文件空间不足

    Java HotSpot(TM) 64-Bit Server VM warning: Insufficient space for shared memory file:   /tmp/hsperfd ...

  4. nodejs-Cluster模块

    JavaScript 标准参考教程(alpha) 草稿二:Node.js Cluster模块 GitHub TOP Cluster模块 来自<JavaScript 标准参考教程(alpha)&g ...

  5. 踩坑踩坑之Flask+ uWSGI + Tensorflow的Web服务部署

    一.简介 作为算法开发人员,在算法模块完成后,拟部署Web服务以对外提供服务,从而将算法模型落地应用.本文针对首次基于Flask + uWSGI + Tensorflow + Nginx部署Web服务 ...

  6. Dubbo框架中的应用(两)--服务治理

    Dubbo服务治理了看法 watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvbGlzaGVoZQ==/font/5a6L5L2T/fontsize/400/fi ...

  7. 【深度解析】Google第二代深度学习引擎TensorFlow开源

    作者:王嘉俊 王婉婷 TensorFlow 是 Google 第二代深度学习系统,今天宣布完全开源.TensorFlow 是一种编写机器学习算法的界面,也可以编译执行机器学习算法的代码.使用 Tens ...

  8. linux下监视进程 崩溃挂掉后自动重启的shell脚本

    如何保证服务一直运行?如何保证即使服务挂掉了也能自动重启?在写服务程序时经常会碰到这样的问题.在Linux系统中,强大的shell就可以很灵活的处理这样的事务. 下面的shell通过一个while-d ...

  9. java 服务治理办法

    在大规模服务化之前.应用可能仅仅是通过RMI或Hessian等工具.简单的暴露和引用远程服务,通过配置服务的URL地址进行调用.通过F5等硬件进行负载均衡. (1) 当服务越来越多时.服务URL配置管 ...

随机推荐

  1. 微信小程序登录那些事

    最近团队在开发一款小程序,都是新手,一边看文档,一边开发.在开发中会遇到各种问题,今天把小程序登录这块的流程整理下,做个记录. 小程序的登录跟平时自己APP这种登录验证还不太一样,多了一个角色,那就是 ...

  2. IOI 2013 袋熊(线段树+分块+决策单调性)

    题意 http://www.ioi2013.org/wp-content/uploads/tasks/day1/wombats/Wombats%20zh%20(CHN).pdf 思路 ​ 我们设矩形的 ...

  3. Reflector调试dll功能

    Reflector不仅仅是一个反编译工具,之前用Resharper,把这个给忽略了,这个Reflector还有一个调试dll功能, 在调试时反编译代码,会生成对应的pdb文件,就可以进行dll源码调试 ...

  4. Tomcat基础操作

    1.在WebApps ROOT目录里,如果删除过ROOT从新创建,放置index.html,index.jsp即可访问. 2.修改默认8080端口,打开server.xml,将8080端口修改为80即 ...

  5. ElasticSearch6.3.2源码分析之节点连接实现

    ElasticSearch6.3.2源码分析之节点连接实现 这篇文章主要分析ES节点之间如何维持连接的.在开始之前,先扯一下ES源码阅读的一些心得:在使用ES过程中碰到某个问题,想要深入了解一下,可源 ...

  6. Java中List集合去除重复数据的六种方法

    1. 循环list中的所有元素然后删除重复 public static List removeDuplicate(List list) { for ( int i = 0 ; i < list. ...

  7. Git系列四之在本地服务器搭建gitlab仓库管理(centeros环境下)

    1.Git仓库管理 现在本地已经创建了git仓库,又在gitlab上创建了一个git仓库,并且让这两个仓库进行远程同步,这样gitlab仓库既可以备份也可以与他人协作管理远程仓库以及根据需要推送或拉取 ...

  8. 手写LRU实现

    完整基于 Java 的代码参考如下 class DLinkedNode { String key; int value; DLinkedNode pre; DLinkedNode post; } LR ...

  9. PIESDKDoNet二次开发配置注意事项

    在安装完PIESDK进行二次开发的过程中会遇到下面几种常见的开发配置问题,就写一个文档总结一下. 1.    新建项目无PIESDK模板问题 关于新建项目时候,找不到下图中的PIEMainApplic ...

  10. because its MIME type ('text/html') is not a supported stylesheet MIME type, and strict MIME checkin

    1 前言 浏览器报错误(chrome和firefox都会):because its MIME type ('text/html') is not a supported stylesheet MIME ...