Cloud-based Computer Vision for Computer-Aided Diagnosis


Over the last decades, information retrieval and content-based image classification have been studied by the scientific community as sophisticated alternatives to often ineffective keyword searches and textual evidence. A key application of this technology is Computer-Aided Diagnosis (CAD): procedures that assist medical personnel in interpreting medical images. Automated screening for certain diseases, such as Diabetic Retinopathy and Skin Cancer, is a pressing need, as the incidence of those diseases is increasing much faster than the number of specialists trained to perform the screening [1], [2].

Deep Learning Architectures (DLA), such as Deep Neural Networks [3] and Deep Belief Networks [4], are among the most prominent solutions for pattern recognition in images. However, those solutions require estimating huge numbers of parameters, which in turn demands large training sets and substantial computational resources.

The most advanced representations based upon the Bag of Visual Words (BoVW) model [5], [6] are competing with DLA. Those cutting-edge representations are also “deep” in a sense, since they are based upon many layers of feature extraction. However, most of the time, there is no learning involved in the lower levels. This makes BoVW models less flexible than DLA, but also much less demanding in terms of computing resources and annotated data. Cutting-edge BoVW representations [7], [8], [9] result in very large feature vectors (up to hundreds of thousands of dimensions) that we collectively call Jumbo Vectors. Although a complete analytical justification of both DLA and BoVW for image classification is still lacking, the high dimensionalities of the latter (in combination with the capacity limitation of Support Vector Machines) seem to be a necessary, but by no means sufficient, condition for their success.
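To make the BoVW idea concrete, here is a minimal sketch of its most basic form: each local descriptor is hard-assigned to its nearest visual word and the assignments are pooled into a normalized histogram. The codebook and descriptors are toy values chosen for illustration, and the function names are hypothetical; this is not the BossaNova or Fisher Vector code referred to in this proposal.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bovw_histogram(descriptors: List[Vector], codebook: List[Vector]) -> List[float]:
    """Hard-assign each descriptor to its nearest codeword, then sum-pool
    the assignments into an L1-normalized histogram (one bin per codeword)."""
    counts = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(d, codebook[k]))
        counts[nearest] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy example (assumed values): 3 visual words, 4 local descriptors in 2-D.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.2), (1.1, 0.8), (4.7, 5.3)]
histogram = bovw_histogram(descriptors, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

This basic histogram has only one dimension per codeword. The representations cited in [7], [8], [9] keep richer statistics per codeword, which is where the much larger dimensionalities mentioned above come from.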


Our aim is to advance the state of the art in CAD for the screening of pathologies based upon medical images. Our target applications are the early screening of Melanoma and Diabetic Retinopathy.

Melanoma is the leading cause of death due to skin cancer. Its prognosis is very good when it is found early, but deteriorates rapidly as the disease progresses; therefore, early screening is critical [10].

Diabetic Retinopathy is a leading cause of blindness worldwide. Early diagnosis also plays a critical role in the expected treatment outcomes [1].

More often than not, the limiting factor for CAD research is the availability of annotated data. For both studies we have cooperated with medical personnel to secure enough data to train and test the classification models. The Retinopathy datasets, described in [1], are already publicly available. For the Melanoma screening we have secured two datasets: one with 747 dermatoscopy images, 187 of them melanomas, and another with 437 clinical images, 125 of them melanomas.

It is clear that such small datasets are largely insufficient to estimate the millions of parameters involved in DLA. In our preliminary experiments on the melanoma datasets, the Jumbo Vector approach showed encouraging results, while a modest 4-layer network failed completely due to extreme overfitting. The alternative we want to explore to make DLA competitive with Jumbo Vectors is to employ other annotated data, such as the standard computer vision datasets PASCAL VOC [11], ImageNet [12], ImageCLEF [13] and MediaEval [14], to train the lower layers of the architecture, in a transfer learning scheme [15]. The Amazon Web Services (AWS) Grant is therefore an excellent opportunity to explore the competitiveness of DLA in a transfer learning scheme. We also intend to explore parallel implementations of the Jumbo Vector methods, which likewise have many costly but embarrassingly parallel processing steps.
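The transfer learning scheme described above can be sketched as follows: the lower layer is kept frozen (standing in for weights learned on a large source dataset), and only a small top layer is fitted on the scarce target data. Everything here is an illustrative assumption: the "pretrained" weights are hard-coded toy values, the top layer is a plain logistic regression trained by stochastic gradient descent, and the function names are hypothetical. It is not the architecture or the Theano code planned for the project.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]


def frozen_features(x: Sequence[float], lower_weights: List[List[float]]) -> List[float]:
    """Lower layer: fixed linear map followed by ReLU. Never updated here."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in lower_weights]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_top_layer(samples: List[Sample], lower_weights: List[List[float]],
                    epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit only the top logistic layer on features from the frozen lower layer."""
    top_w = [0.0] * len(lower_weights)
    top_b = 0.0
    # Features are computed once, since the lower layer does not change.
    feats = [(frozen_features(x, lower_weights), y) for x, y in samples]
    for _ in range(epochs):
        for h, y in feats:
            p = sigmoid(sum(w * v for w, v in zip(top_w, h)) + top_b)
            err = p - y  # gradient of the log-loss w.r.t. the pre-activation
            top_w = [w - lr * err * v for w, v in zip(top_w, h)]
            top_b -= lr * err
    return top_w, top_b


def predict(x: Sequence[float], lower_weights: List[List[float]],
            top_w: List[float], top_b: float) -> int:
    h = frozen_features(x, lower_weights)
    return int(sigmoid(sum(w * v for w, v in zip(top_w, h)) + top_b) >= 0.5)


# Assumed stand-in for weights learned on a large source dataset.
pretrained = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
# Tiny annotated target set (toy values): label 1 when the first coordinate is positive.
target = [((1.0, 0.2), 1), ((0.8, -0.5), 1), ((-1.0, 0.3), 0), ((-0.7, -0.4), 0)]

top_w, top_b = train_top_layer(target, pretrained)
predictions = [predict(x, pretrained, top_w, top_b) for x, _ in target]
print(predictions)
```

Because only the top layer is trained, the number of parameters estimated from the target data equals the number of frozen features plus one bias, which is the point of the scheme when annotated data are scarce.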

Expected outcomes (24-month timeframe)

  1. Deep Learning Architectures implementation [16] in the AWS environment. Most of the necessary code is already available, although the parameterization work is delicate;
  2. Parallel implementation of the Fisher Vectors [17] and BossaNova [7] frameworks in the AWS environment. The complete (sequential) source code is already implemented, and a parallel implementation is under way;
  3. Tests on the Melanoma screening application (annotated datasets already acquired);
  4. Tests on the Diabetic Retinopathy application (annotated datasets already acquired);
  5. A comprehensive, original study of transfer learning for DLA. We expect this study to have great potential for scientific impact.
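The parallel implementation in item 2 relies on per-image processing steps being independent of one another. A minimal sketch of that pattern is shown below, assuming a hypothetical per-image encoder (`encode_image`, here a simple mean-pooling stand-in, not Fisher Vectors or BossaNova). A thread pool is used only to keep the example self-contained; for CPU-bound encoders, a process pool or one worker per cloud instance would be the natural substitute.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

Descriptors = Sequence[Sequence[float]]


def encode_image(descriptors: Descriptors) -> List[float]:
    """Hypothetical per-image step: mean-pool the local descriptors."""
    n = len(descriptors)
    dim = len(descriptors[0])
    return [sum(d[i] for d in descriptors) / n for i in range(dim)]


def encode_dataset(images: List[Descriptors], workers: int = 4) -> List[List[float]]:
    """Encode every image independently; map() preserves the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_image, images))


# Toy data (assumed values): two images with 2-D local descriptors.
images = [
    [(0.0, 2.0), (2.0, 4.0)],
    [(1.0, 1.0), (3.0, 5.0), (5.0, 3.0)],
]
encoded = encode_dataset(images)
sequential = [encode_image(img) for img in images]
print(encoded)  # [[1.0, 3.0], [3.0, 3.0]]
```

Since no image depends on another, the parallel result matches the sequential one, and the work can be split across as many workers as there are images.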


[1]  Pires, R.; Jelinek, H. F.; Wainer, J.; Goldenstein, S.; Valle, E.; Rocha, A. Assessing the Need for Referral in Automatic Diabetic Retinopathy Detection. IEEE Transactions on Biomedical Engineering.

[2]  Situ, N.; Yuan, X.; Chen, J.; Zouridakis, G. Malignant melanoma detection by Bag-of-Features classification. 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), pp. 3110-3113, 2008.

[3]  Krizhevsky, A.; Sutskever, I.; Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems 25, MIT Press, Cambridge, MA, 2012.

[4]  Hinton, G. E. Deep Belief Networks. 2009.

[5]  Sivic, J.; Zisserman, A. Video Google: A text retrieval approach to object matching in videos. International Conference on Computer Vision (ICCV), 2003.

[6]  Csurka, G.; Bray, C.; Dance, C.; Fan, L. Visual categorization with bags of keypoints. Workshop on Statistical Learning in Computer Vision, European Conference on Computer Vision (ECCV), pp. 1–22, 2004.

[7]  Avila, S.; Thome, N.; Cord, M.; Valle, E.; Araújo, A. Pooling in Image Representation: The Visual Codeword Point of View.  Computer Vision and Image Understanding (CVIU), volume 117, issue 5, p. 453-465, 2013.

[8]  Zhou, X. et al. Image Classification Using Super-Vector Coding of Local Image Descriptors. European Conference on Computer Vision – ECCV, pp. 141-154, 2010.

[9]  Jegou, H.; Douze, M.; Schmid, C.; Pérez, P. Aggregating local descriptors into a compact image representation. Computer Vision and Pattern Recognition – CVPR, 2010.

[10]  Jerant, A. F.; Johnson, J. T.; Sheridan, C. D.; Caffrey, T. J. Early detection and treatment of skin cancer. American Family Physician, 62(2):357–368, 2000.

[11]  The PASCAL Visual Object Classes,

[12]  ImageNet,

[13]  ImageCLEF – The CLEF Cross Language Image Retrieval Track,

[14]  MediaEval Benchmarking Initiative for Multimedia Evaluation,

[15]  Pan, S. J.; Yang, Q. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345-1359, 2010.

[16]  Bastien, F. et al. Theano: new features and speed improvements. arXiv:1211.5590v1 [cs.SC], Canada, 2012.

[17]  Perronnin, F.; Dance, C. Fisher kernels on visual vocabularies for image categorization. Computer Vision and Pattern Recognition – CVPR, 2007.
