Very low GPU usage during training in TensorFlow
I am trying to train a simple multi-layer perceptron for a 10-class image classification task, which is part of the assignment for the Udacity deep-learning course. To be more precise, the task is to classify letters rendered from various fonts (the dataset is called notMNIST).
The code I ended up with looks fairly simple, but no matter what I always get very low GPU usage during training. I measure the load with GPU-Z and it shows 25-30%.
Here is my current code:
graph = tf.Graph()
with graph.as_default():
    tf.set_random_seed(52)

    # dataset definition
    dataset = Dataset.from_tensor_slices({'x': train_data, 'y': train_labels})
    dataset = dataset.shuffle(buffer_size=20000)
    dataset = dataset.batch(128)
    iterator = dataset.make_initializable_iterator()
    sample = iterator.get_next()
    x = sample['x']
    y = sample['y']

    # actual computation graph
    keep_prob = tf.placeholder(tf.float32)
    is_training = tf.placeholder(tf.bool, name='is_training')

    fc1 = dense_batch_relu_dropout(x, 1024, is_training, keep_prob, 'fc1')
    fc2 = dense_batch_relu_dropout(fc1, 300, is_training, keep_prob, 'fc2')
    fc3 = dense_batch_relu_dropout(fc2, 50, is_training, keep_prob, 'fc3')
    logits = dense(fc3, num_classes, 'logits')

    with tf.name_scope('accuracy'):
        accuracy = tf.reduce_mean(
            tf.cast(tf.equal(tf.argmax(y, 1), tf.argmax(logits, 1)), tf.float32),
        )
        accuracy_percent = 100 * accuracy

    with tf.name_scope('loss'):
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))

    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        # ensures that we execute the update_ops before performing the train_op
        # (needed for batch normalization)
        train_op = tf.train.AdamOptimizer(learning_rate=1e-3, epsilon=1e-3).minimize(loss)

with tf.Session(graph=graph) as sess:
    tf.global_variables_initializer().run()
    step = 0
    epoch = 0
    while True:
        sess.run(iterator.initializer, feed_dict={})
        while True:
            step += 1
            try:
                sess.run(train_op, feed_dict={keep_prob: 0.5, is_training: True})
            except tf.errors.OutOfRangeError:
                logger.info('End of epoch #%d', epoch)
                break

        # end of epoch
        train_l, train_ac = sess.run(
            [loss, accuracy_percent],
            feed_dict={x: train_data, y: train_labels, keep_prob: 1, is_training: False},
        )
        test_l, test_ac = sess.run(
            [loss, accuracy_percent],
            feed_dict={x: test_data, y: test_labels, keep_prob: 1, is_training: False},
        )
        logger.info('Train loss: %f, train accuracy: %.2f%%', train_l, train_ac)
        logger.info('Test loss: %f, test accuracy: %.2f%%', test_l, test_ac)
        epoch += 1
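The helper functions dense_batch_relu_dropout and dense are not shown above; a minimal sketch of what they might look like, assuming the tf.contrib.layers API (the exact layer composition is my assumption):

def dense(x, size, scope):
    # plain fully-connected layer with a linear activation
    return tf.contrib.layers.fully_connected(x, size, activation_fn=None, scope=scope)

def dense_batch_relu_dropout(x, size, is_training, keep_prob, scope):
    # fully-connected -> batch norm -> ReLU -> dropout
    with tf.variable_scope(scope):
        h = tf.contrib.layers.fully_connected(x, size, activation_fn=None, scope='dense')
        # batch_norm registers its moving-average updates in tf.GraphKeys.UPDATE_OPS,
        # which is why the training code wraps train_op in control_dependencies
        h = tf.contrib.layers.batch_norm(h, center=True, scale=True,
                                         is_training=is_training, scope='bn')
        h = tf.nn.relu(h)
        return tf.nn.dropout(h, keep_prob)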
Here's what I have tried so far:
- I changed the input pipeline from a simple feed_dict to tensorflow.contrib.data.Dataset. As far as I understood, it is supposed to take care of the efficiency of the input, e.g. load data in a separate thread, so there should be no bottleneck associated with the input.
- I collected traces as suggested here: https://github.com/tensorflow/tensorflow/issues/1824#issuecomment-225754659 (see the tracing sketch after this list). However, these traces didn't show anything interesting: >90% of the train step is matmul operations.
- I changed the batch size. When I change it from 128 to 512 the load increases from ~30% to ~38%; when I increase it further to 2048, the load goes to ~45%. I have 6 GB of GPU memory and the dataset is single-channel 28x28 images. Am I supposed to use such a big batch size? Should I increase it further?
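For reference, the tracing recipe from that GitHub issue looks roughly like this (a sketch; sess, train_op and the placeholders come from the code above):

from tensorflow.python.client import timeline

# run a single training step with full tracing enabled
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess.run(train_op,
         feed_dict={keep_prob: 0.5, is_training: True},
         options=run_options,
         run_metadata=run_metadata)

# export a trace that can be opened in chrome://tracing
trace = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.json', 'w') as f:
    f.write(trace.generate_chrome_trace_format())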
Generally speaking, should I worry about the low load? Is it a sign that I am training inefficiently?
Here's a GPU-Z screenshot with 128 images in the batch. You can see the low load with occasional spikes to 100%, which happen when I measure accuracy on the entire dataset after each epoch.
MNIST-sized networks are tiny and it's hard to achieve high GPU (or CPU) efficiency for them, so I think 30% is not unusual for your application. You will get higher computational efficiency with a larger batch size, meaning you can process more examples per second, but you will also get lower statistical efficiency, meaning you need to process more examples in total to reach the target accuracy. So it's a trade-off. For tiny character models like yours, the statistical efficiency drops off very quickly after a batch size of 100, so it's probably not worth trying to grow the batch size for training. For inference, you should use the largest batch size you can.
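To put numbers on the computational side of this trade-off, you can time raw throughput (examples per second) at each batch size instead of watching GPU load; a minimal helper for that (hypothetical, not part of the original answer):

import time

def examples_per_second(sess, train_op, feed_dict, batch_size, num_steps=100):
    # warm-up step, so one-time graph setup is excluded from the timing;
    # assumes the dataset iterator has enough batches left for the run
    sess.run(train_op, feed_dict=feed_dict)
    start = time.time()
    for _ in range(num_steps):
        sess.run(train_op, feed_dict=feed_dict)
    return num_steps * batch_size / (time.time() - start)

If examples per second keep climbing at 512 and 2048 but the number of epochs needed to reach your target accuracy climbs too, you are seeing exactly the computational-vs-statistical trade-off described above.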