python - Problems implementing CRNN with CNTK


I'm quite new to machine learning, and as a learning exercise I'm trying to implement a convolutional recurrent neural network in CNTK to recognize variable-length text in an image. The basic idea is to take the output of a CNN, make a sequence out of it and feed it to an RNN, then use the CTC loss function. I followed the 'CNTK 208: Training acoustic model with connectionist temporal classification (CTC) criteria' tutorial, which shows the basics of CTC usage. Unfortunately, during training the network converges to outputting only blank labels and nothing else, because for some reason that gives the smallest loss.

I'm feeding the network images with dimensions (1, 32, 96), and I generate them on the fly to show random letters. As labels I give it a sequence of one-hot encoded letters, with the blank required by CTC at index 0 (this is all numpy arrays, because I use custom data loading). It turns out that for the forward_backward() function to work I need to make sure both of its inputs use the same dynamic axis with the same length. I achieve that by making the label string the same length as the network output, and by using to_sequence_like() in the code below (I don't know how to do this better; a side effect of using to_sequence_like() here is that I need to pass dummy label data when evaluating the model).
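For reference, a minimal numpy sketch of how such a label could be one-hot encoded with the blank at index 0. The padding scheme (repeating the first character until the label matches the network's output length) is my assumption, inferred from the ground-truth strings in the training log below; `encode_label` and the constants are hypothetical names, not from the actual data loader:

```python
import numpy as np

alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
num_output_classes = len(alphabet) + 1  # class 0 is reserved for the CTC blank
output_len = 23                         # length of the network's output sequence

def encode_label(text, out_len=output_len):
    # Pad by repeating the first character so the label sequence has the
    # same length as the network output (so forward_backward() sees
    # matching dynamic axes). This padding scheme is an assumption.
    padded = text[0] * (out_len - len(text)) + text
    onehot = np.zeros((out_len, num_output_classes), dtype=np.float32)
    for t, ch in enumerate(padded):
        onehot[t, alphabet.index(ch) + 1] = 1.0  # +1 because class 0 is blank
    return onehot
```

Each row of the result is a one-hot vector of size 37, and the blank class never appears in the label itself.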

```python
import cntk as C

alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
input_dim_model = (1, 32, 96)    # images are 96 x 32 with 1 channel of color (gray)
num_output_classes = len(alphabet) + 1
ltsm_hidden = 256

def bidirectionalLTSM(features, nHidden, nOut):
    a = C.layers.Recurrence(C.layers.LSTM(nHidden))(features)
    b = C.layers.Recurrence(C.layers.LSTM(nHidden), go_backwards=True)(features)
    c = C.splice(a, b)
    r = C.layers.Dense(nOut)(c)
    return r

def create_model_rnn(features):
    h = features
    h = bidirectionalLTSM(h, ltsm_hidden, ltsm_hidden)
    h = bidirectionalLTSM(h, ltsm_hidden, num_output_classes)
    return h

def create_model_cnn(features):
    with C.layers.default_options(init=C.glorot_uniform(), activation=C.relu):
        h = features

        h = C.layers.Convolution2D(filter_shape=(3,3),
                                   num_filters=64,
                                   strides=(1,1),
                                   pad=True, name='conv_0')(h)

        #more layers...

        h = C.layers.BatchNormalization(name="batchnorm_6")(h)

        return h

x = C.input_variable(input_dim_model, name="x")
label = C.sequence.input((num_output_classes), name="y")

def create_model(features):
    #Composite(x: Tensor[1,32,96]) -> Tensor[512,1,23]
    a = create_model_cnn(features)

    a = C.reshape(a, (512, 23))
    #Composite(x: Tensor[1,32,96]) -> Tensor[23,512]
    a = C.swapaxes(a, 0, 1)

    #is there a better way to convert to a sequence and still be compatible with forward_backward()?
    #Composite(x: Tensor[1,32,96], y: Sequence[Tensor[37]]) -> Sequence[Tensor[512]]
    a = C.to_sequence_like(a, label)

    #Composite(x: Tensor[1,32,96], y: Sequence[Tensor[37]]) -> Sequence[Tensor[37]]
    a = create_model_rnn(a)

    return a

#Composite(x: Tensor[1,32,96], y: Sequence[Tensor[37]]) -> Sequence[Tensor[37]]
z = create_model(x)

#LabelsToGraph(y: Sequence[Tensor[37]]) -> Sequence[Tensor[37]]
graph = C.labels_to_graph(label)

#Composite(y: Sequence[Tensor[37]], x: Tensor[1,32,96]) -> np.float32
criteria = C.forward_backward(C.labels_to_graph(label), z, blankTokenId=0)

err = C.edit_distance_error(z, label, squashInputs=True, tokensToIgnore=[0])
lr = C.learning_rate_schedule(0.01, C.UnitType.sample)
learner = C.adadelta(z.parameters, lr)

progress_printer = C.logging.progress_print.ProgressPrinter(50, first=10, tag='Training')
trainer = C.Trainer(z, (criteria, err), learner, progress_writers=[progress_printer])

#some more custom code ...
#below is how I'm feeding the data

while True:
    x1, y1 = custom_datareader.next_minibatch()
    #x1 is a list of numpy arrays containing training images
    #y1 is a list of numpy arrays with one-hot encoded labels

    trainer.train_minibatch({x: x1, label: y1})
```
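The readable strings in the training log below look like a per-frame argmax collapsed in the usual CTC way. A minimal sketch of such a greedy (best-path) decode, assuming class 0 is the blank and class k maps to alphabet[k - 1]; the function name is hypothetical:

```python
import numpy as np

alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"

def greedy_ctc_decode(posteriors):
    # Take the best class per frame, collapse consecutive repeats,
    # then drop blanks (class 0) -- standard CTC best-path decoding.
    best = np.argmax(posteriors, axis=1)
    chars = []
    prev = -1
    for k in best:
        if k != prev and k != 0:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)
```

A network that emits only blanks therefore decodes to the empty string, which is exactly the failure mode shown in the later minibatches of the log.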

The network converges quickly, although not to what I want (on the left side is the network output, on the right the labels I'm giving it):

```
 Minibatch[  11-  50]: loss = 3.506087 * 58880, metric = 176.23% * 58880;
lllll--55leym---------- => lllll--55leym----------, gt: aaaaaaaaaaaaaaaaaaaayox
-------bbccaqqqyyyryy-q => -------bbccaqqqyyyryy-q, gt: aaaaaaaaaaaaaaaaaaajpta
tt22yye------yqqqtll--- => tt22yye------yqqqtll---, gt: tttttttttttttttttttyliy
ceeeeeeee----eqqqqqqe-q => ceeeeeeee----eqqqqqqe-q, gt: sssssssssssssssssssskht
--tc22222al55a5qqqaa--q => --tc22222al55a5qqqaa--q, gt: cccccccccccccccccccaooa
yyyyyyiqaaacy---------- => yyyyyyiqaaacy----------, gt: cccccccccccccccccccxyty
mcccyya----------y---qq => mcccyya----------y---qq, gt: ppppppppppppppppppptjnj
ylncyyyy--------yy--t-y => ylncyyyy--------yy--t-y, gt: sssssssssssssssssssyusl
tt555555ccc------------ => tt555555ccc------------, gt: jjjjjjjjjjjjjjjjjjjmyss
-------eeeaadaaa------5 => -------eeeaadaaa------5, gt: fffffffffffffffffffciya
eennnnemmtmmy--------qy => eennnnemmtmmy--------qy, gt: tttttttttttttttttttajdn
-rcqqqqaaaacccccycc8--q => -rcqqqqaaaacccccycc8--q, gt: aaaaaaaaaaaaaaaaaaaixvw
------33e-bfaaaaa------ => ------33e-bfaaaaa------, gt: uuuuuuuuuuuuuuuuuuupfyq
r----5t5y5aaaaa-------- => r----5t5y5aaaaa--------, gt: fffffffffffffffffffapap
deeeccccc2qqqm888zl---t => deeeccccc2qqqm888zl---t, gt: hhhhhhhhhhhhhhhhhhhlvjx

 Minibatch[  51- 100]: loss = 1.616731 * 73600, metric = 100.82% * 73600;
----------------------- => -----------------------, gt: kkkkkkkkkkkkkkkkkkkakyw
----------------------- => -----------------------, gt: ooooooooooooooooooopwtm
----------------------- => -----------------------, gt: jjjjjjjjjjjjjjjjjjjqpny
----------------------- => -----------------------, gt: iiiiiiiiiiiiiiiiiiidspr
----------------------- => -----------------------, gt: fffffffffffffffffffatyp
----------------------- => -----------------------, gt: vvvvvvvvvvvvvvvvvvvmccf
----------------------- => -----------------------, gt: dddddddddddddddddddsfyo
----------------------- => -----------------------, gt: yyyyyyyyyyyyyyyyyyylaph
----------------------- => -----------------------, gt: kkkkkkkkkkkkkkkkkkkacay
----------------------- => -----------------------, gt: uuuuuuuuuuuuuuuuuuujuqs
----------------------- => -----------------------, gt: sssssssssssssssssssovjp
----------------------- => -----------------------, gt: vvvvvvvvvvvvvvvvvvvibma
----------------------- => -----------------------, gt: vvvvvvvvvvvvvvvvvvvaajt
----------------------- => -----------------------, gt: tttttttttttttttttttdhfo
----------------------- => -----------------------, gt: yyyyyyyyyyyyyyyyyyycmbh

 Minibatch[ 101- 150]: loss = 0.026177 * 73600, metric = 100.00% * 73600;
----------------------- => -----------------------, gt: iiiiiiiiiiiiiiiiiiiavoo
----------------------- => -----------------------, gt: lllllllllllllllllllaara
----------------------- => -----------------------, gt: pppppppppppppppppppmufu
----------------------- => -----------------------, gt: sssssssssssssssssssaacd
----------------------- => -----------------------, gt: uuuuuuuuuuuuuuuuuuujulx
----------------------- => -----------------------, gt: vvvvvvvvvvvvvvvvvvvoaqy
----------------------- => -----------------------, gt: dddddddddddddddddddvjmr
----------------------- => -----------------------, gt: oooooooooooooooooooxlvl
----------------------- => -----------------------, gt: dddddddddddddddddddqqlo
----------------------- => -----------------------, gt: wwwwwwwwwwwwwwwwwwwwrvx
----------------------- => -----------------------, gt: pppppppppppppppppppxuxi
----------------------- => -----------------------, gt: bbbbbbbbbbbbbbbbbbbkbqv
----------------------- => -----------------------, gt: ppppppppppppppppppplpha
----------------------- => -----------------------, gt: dddddddddddddddddddilol
----------------------- => -----------------------, gt: dddddddddddddddddddqnwf
```

My question is how to get the network to learn to output proper captions. I'll add that I managed to train a model using the same technique in PyTorch, so it's unlikely that the images or labels are the problem. Also, is there a better way to convert the output of the convolutional layers to a sequence with a dynamic axis so that I can still use the forward_backward() function?

CNTK learners are fed the aggregated gradient by default, to accommodate distributed training and variable minibatch sizes. However, the aggregated gradient does not work the same for adagrad-style learners such as adadelta. Please try use_mean_gradient=True:

```python
learner = C.adadelta(z.parameters, lr, use_mean_gradient=True)
```
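To illustrate the difference, here is a toy numpy sketch with made-up per-sample gradients (not CNTK code): the default aggregated gradient is the per-sample sum, so its magnitude grows with the minibatch size, which skews learners like adadelta that accumulate squared-gradient statistics; use_mean_gradient=True feeds the mean instead.

```python
import numpy as np

# Hypothetical per-sample gradients for a minibatch of 4 samples.
per_sample = np.array([[0.2, -0.1],
                       [0.4,  0.0],
                       [0.1,  0.3],
                       [0.3, -0.2]])

aggregated = per_sample.sum(axis=0)   # what CNTK hands learners by default
mean_grad  = per_sample.mean(axis=0)  # what use_mean_gradient=True hands them

# The aggregated gradient is minibatch_size times larger, so any
# squared-gradient accumulator grows roughly minibatch_size**2 times
# faster, changing the effective step size of adagrad-style learners.
```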
