python - How to reuse VGG19 for image classification in Keras?
I am trying to understand how to reuse VGG19 (or other architectures) in order to improve my small image classification model. I am classifying images (in this case paintings) into 3 classes (let's say, paintings from the 15th, 16th and 17th centuries). I have quite a small dataset: 1800 training examples per class, with 250 per class in the validation set.
I have the following implementation:
    from keras.preprocessing.image import ImageDataGenerator
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D
    from keras.layers import Activation, Dropout, Flatten, Dense
    from keras import backend as K
    from keras.callbacks import ModelCheckpoint
    from keras.regularizers import l2, l1
    from keras.models import load_model

    # set proper image ordering for TensorFlow
    K.set_image_dim_ordering('th')

    batch_size = 32

    # this is the augmentation configuration we will use for training
    train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

    # this is the augmentation configuration we will use for testing:
    # only rescaling
    test_datagen = ImageDataGenerator(rescale=1./255)

    # this is a generator that will read pictures found in
    # subfolders of 'data/train', and indefinitely generate
    # batches of augmented image data
    train_generator = train_datagen.flow_from_directory(
        'c://keras//train_set_paintings//',  # this is the target directory
        target_size=(150, 150),  # all images will be resized to 150x150
        batch_size=batch_size,
        class_mode='categorical')

    # this is a similar generator, for validation data
    validation_generator = test_datagen.flow_from_directory(
        'c://keras//validation_set_paintings//',
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode='categorical')

    model = Sequential()
    model.add(Conv2D(16, (3, 3), input_shape=(3, 150, 150)))
    model.add(Activation('relu'))  # also tried LeakyReLU, no improvements
    model.add(MaxPooling2D(pool_size=(2, 3), data_format="channels_first"))
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 3), data_format="channels_first"))
    model.add(Flatten())
    model.add(Dense(64, kernel_regularizer=l2(.01)))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(3))
    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',  # also tried SGD, it doesn't perform as well as adam
                  metrics=['accuracy'])

    fbestmodel = 'best_model_final_paintings.h5'
    best_model = ModelCheckpoint(fbestmodel, verbose=0, save_best_only=True)

    hist = model.fit_generator(
        train_generator,
        steps_per_epoch=2000 // batch_size,
        epochs=100,
        validation_data=validation_generator,
        validation_steps=200 // batch_size,
        callbacks=[best_model],
        workers=8  # the CPU generation is run in parallel to the GPU training
    )

    print("maximum train accuracy:", max(hist.history["acc"]))
    print("maximum train accuracy on epoch:", hist.history["acc"].index(max(hist.history["acc"]))+1)
    print("maximum validation accuracy:", max(hist.history["val_acc"]))
    print("maximum validation accuracy on epoch:", hist.history["val_acc"].index(max(hist.history["val_acc"]))+1)

I have managed to keep it rather balanced in terms of overfitting.

If I make the architecture deeper, it either overfits a lot or, if I regularize it more strictly, jumps around like insane, even reaching 100% at one point.
I have also tried using BatchNormalization, but then the model doesn't learn at all: it doesn't go over 50% accuracy on the training set. I tried it with and without dropout.
I am looking for other ways of improving the model, other than changing the architecture too much. One of the options I see is reusing an existing architecture with its weights and plugging it into my model. But I can't find any real examples of how to do it. I am mostly following this blog post: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
It talks about reusing VGG19 to improve the accuracy, but it doesn't really explain how it is done. Are there any other examples to follow? How would I adapt it to my current implementation? I found a full model architecture, but running it is not possible on my hardware, so I am looking for a way of reusing an already trained model with its weights and then adapting it to my problem.
Also, I don't understand the concept behind the "bottleneck features" that the blog talks about in the VGG part. I would be glad if someone could explain it.
You should definitely try out transfer learning (a search for "transfer learning Keras" turns up plenty of tutorials on the subject). Essentially, TL is the fine-tuning of a network that was pre-trained on some big dataset (most commonly ImageNet) with new classification layers. The idea behind it is that you want to keep all the good features learned in the lower levels of the network (because there's a high probability your images also have those features) and just learn a new classifier on top of those features. This tends to work well, especially if you have small datasets that don't allow a full training of the network from scratch (it's also much faster than a full training).
Please note that there are several ways to do TL (and I encourage you to research the topic to find what suits you best). In my applications, I simply init the network with the weights taken from a public ImageNet checkpoint, remove the last layers and train from there (with a low-enough learning rate, or you'll mess up the low-level features you actually want to keep). This approach allows data augmentation.
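As a rough sketch of that approach in Keras (not taken from the blog post): the function below loads VGG19 without its ImageNet classifier through `keras.applications.VGG19` and puts a new 3-class head on top. The head's layer sizes, the SGD settings, the 150x150 channels-last input shape and the name `build_finetune_model` are my own assumptions, so adjust them to your setup.

```python
from keras.applications import VGG19
from keras.layers import Dense, Dropout, Flatten
from keras.models import Model
from keras.optimizers import SGD


def build_finetune_model(num_classes=3, input_shape=(150, 150, 3), weights="imagenet"):
    """Hypothetical helper: VGG19 convolutional base plus a new classifier."""
    # Convolutional part of VGG19 without its 1000-class ImageNet classifier.
    base = VGG19(include_top=False, weights=weights, input_shape=input_shape)

    # New classification head (the sizes are arbitrary choices for illustration).
    x = Flatten()(base.output)
    x = Dense(64, activation="relu")(x)
    x = Dropout(0.5)(x)
    outputs = Dense(num_classes, activation="softmax")(x)

    model = Model(inputs=base.input, outputs=outputs)
    model.compile(
        loss="categorical_crossentropy",
        # Low learning rate so the pre-trained features are not destroyed.
        # It is passed positionally because its keyword name differs across
        # Keras versions.
        optimizer=SGD(1e-4, momentum=0.9),
        metrics=["accuracy"],
    )
    return model
```

The returned model can be trained with augmenting generators like the ones in the question. Note that the question's script switches Keras to channels-first ordering, in which case the input shape would need to be `(3, 150, 150)` instead.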
Another approach is using bottlenecks. In this context, a bottleneck, also called an embedding in other contexts, is the internal representation of one of your input samples at a certain depth level in the network. Rephrasing that, you can see a bottleneck at level N as the output of the network stopped after N layers. Why is this useful? Because you can precompute the bottlenecks for all your samples using a pre-trained network and then simulate the training of only the last layers of the network, without having to recompute all the (expensive) parts of the network up to the bottleneck point.
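In Keras this could look roughly like the sketch below, assuming VGG19 from `keras.applications` as the pre-trained network and channels-last images. The helper names (`build_feature_extractor`, `compute_bottlenecks`, `build_head`) and the head's layer sizes are made up for illustration.

```python
from keras.applications import VGG19
from keras.layers import Dense, Dropout, Flatten, Input
from keras.models import Model


def build_feature_extractor(input_shape=(150, 150, 3), weights="imagenet"):
    # VGG19 without its ImageNet classifier: everything up to the bottleneck point.
    return VGG19(include_top=False, weights=weights, input_shape=input_shape)


def compute_bottlenecks(extractor, images, batch_size=32):
    # One forward pass per sample, done once. The resulting array can be
    # cached on disk (for example with numpy.save) and reused for every epoch.
    return extractor.predict(images, batch_size=batch_size)


def build_head(bottleneck_shape, num_classes=3):
    # Small classifier that is trained only on the precomputed bottlenecks.
    inputs = Input(shape=bottleneck_shape)
    x = Flatten()(inputs)
    x = Dense(64, activation="relu")(x)
    x = Dropout(0.5)(x)
    outputs = Dense(num_classes, activation="softmax")(x)
    head = Model(inputs=inputs, outputs=outputs)
    head.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return head
```

Training then amounts to calling `head.fit(bottlenecks, labels, ...)` on the cached array, which is cheap because the VGG19 layers are never run again. Keras also ships `keras.applications.vgg19.preprocess_input` for preparing images the way the ImageNet weights expect.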
A simplified example
Let's say you have a network with the following structure:

    in -> A -> B -> C -> D -> E -> out

where in and out are the input and output layers and the others are any type of layer you might have in a network. Let's also say that you found, published somewhere, a checkpoint of the network pre-trained on ImageNet. ImageNet has 1000 classes, none of which you need. So you'll throw away the final layer (the classifier) of the network. The other layers, however, contain features you want to keep. Let E be the classifier layer in our example.
Taking samples from your dataset, you feed them to in and collect the matching bottleneck value as the output of layer D. You do this once for all samples in your dataset. The collection of bottlenecks is the new dataset you'll use to train your new classifier.
You build a dummy network with the following structure:

    bottleneck_in -> E' -> out

You now train this network as you normally would, but instead of feeding samples from your dataset, you feed the matching bottleneck from the bottleneck dataset. Note that by doing so you save the computation of all layers from A to D, but this way you can't apply any data augmentation during training (of course, you can still do it while building the bottlenecks, but you'll have lots of data to store).
Finally, to build your final classifier, your network architecture will be

    in -> A -> B -> C -> D -> E' -> out

with the weights of A to D taken from the public checkpoint and the weights of E' resulting from your training.
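To make the three steps concrete, here is a toy NumPy version of the same walkthrough. Everything in it is an assumption for illustration: layers A to D are stand-ins with fixed random weights (no real checkpoint), the data is random, and E' is a single softmax layer trained with plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pre-trained layers A..D: fixed weights that are never updated.
frozen = [rng.normal(scale=0.5, size=(8, 8)) for _ in "ABCD"]


def bottleneck(x):
    # Output of layer D: run the input through the frozen layers A..D.
    for w in frozen:
        x = np.tanh(x @ w)
    return x


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Toy dataset: 60 samples, 8 features, 3 classes (random, illustration only).
x = rng.normal(size=(60, 8))
y_onehot = np.eye(3)[rng.integers(0, 3, size=60)]

# Step 1: compute every bottleneck once and keep the result.
feats = bottleneck(x)

# Step 2: train E' (a softmax layer) on the bottlenecks only; A..D are not run again.
w_e = np.zeros((8, 3))
b_e = np.zeros(3)
for _ in range(200):
    probs = softmax(feats @ w_e + b_e)
    grad = (probs - y_onehot) / len(x)  # gradient of the mean cross-entropy
    w_e -= 0.1 * feats.T @ grad
    b_e -= 0.1 * grad.sum(axis=0)


# Step 3: the final classifier is A..D followed by the trained E'.
def predict(x_new):
    return softmax(bottleneck(x_new) @ w_e + b_e)
```

The point of the sketch is that the training loop only ever touches `feats`, `w_e` and `b_e`; the frozen layers are evaluated once up front and again only at prediction time.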