WO2013022688A1 - Automated detection of diagnostically relevant regions in pathology images - Google Patents


Info

Publication number
WO2013022688A1
Authority
WO
WIPO (PCT)
Prior art keywords
stain
portions
tissue
image
tissue stained
Prior art date
Application number
PCT/US2012/049275
Other languages
French (fr)
Inventor
Claus Bahlmann
Amar H. PATEL
Jeffrey P. Johnson
Jie Ni
Andrei-chakib CHEKKOURY-IDRISSI
Parmeshwar Khurd
Ali Kamen
Leo Grady
Original Assignee
Siemens Healthcare Diagnostics Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens Healthcare Diagnostics Inc. filed Critical Siemens Healthcare Diagnostics Inc.
Publication of WO2013022688A1 publication Critical patent/WO2013022688A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/69Microscopic objects, e.g. biological cells or cellular parts
    • G06V20/695Preprocessing, e.g. image segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10056Microscopic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30024Cell structures in vitro; Tissue sections in vitro
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/513Sparse representations

Definitions

  • the staining method may include the application of hemalum, which includes a complex of aluminum ions and oxidized haematoxylin. The hemalum colors the nuclei of cells purple or blue.
  • the nuclear staining is followed by counterstaining with an aqueous or alcoholic solution of eosin Y, which colors eosinophilic structures, including cytoplasm, pink, red or orange.
  • distributions of these components may be characterized.
  • the difference between these colors may be accentuated by a (linear) color transform into two channels, which may be called H and E.
  • each channel amplifies one of the stains (hematoxylin or eosin) while suppressing the other (eosin or hematoxylin, respectively).
  • the method may determine dominant purple and non-purple pixel values from the data and subsequently determine the main axes for the transform orthogonal to those values.
  • FIG. 4 shows the axes of dominant pixel values (400 and 401) and the transformation axes (402, 403 and 404) for an example.
  • automated processing may be based on the distribution of nuclei pixels and cytoplasm pixels.
  • the level of pixels may be selected rather than higher abstraction levels, such as shape information, to achieve greater computational speed.
  • the descriptor may be based on a histogram of observed levels in the pair of H and E channels. For example, a histogram matching method may be used. In another example, a sparse representation of uniformly distributed percentile ranks may be used.
  • referring to the sparse representation, for example, nine percentile ranks (at 10%, 20%, ..., 90%) or eleven percentile ranks (at 0%, 10%, 20%, ..., 100%) may be used. One of ordinary skill in the art would appreciate that different numbers of percentile ranks may be used. The percentile ranks correspond to cuts of the cumulative histogram (compare FIG. 1) with the corresponding percentile levels on the ordinate.
  • FIG. 5 is an exemplary percentile descriptor showing the rank values obtained via sorting or by cumulative histogramming for one channel.
  • FIG. 5 plots a normalized cumulative histogram as a function of intensity (here for the E channel). The descriptor takes values from the abscissa at locations where the cumulative histogram cuts the respective percentile levels.
  • the percentile values may be combined into an eighteen-dimensional feature vector, and a supervised classifier, such as a linear SVM, may be trained for the classification task.
  • the training may be performed using known methods such as LIBSVM (see, for example, Chang and Lin, "LIBSVM: A library for support vector machines," ACM).
  • LIBSVM is a library for support vector machines. LIBSVM trains on a data set to obtain a model and uses the model to predict labels for a testing data set.
  • FIG. 6 is an exemplary image of a virtual slide including classified regions, including ground truth relevant regions, e.g., 601, ground truth irrelevant regions, e.g., 602, and classified relevant regions 603.
  • FIG. 6 is an exemplary output of the classification 204 (see FIG. 2), which may be displayed for analysis and diagnosis.
  • a computationally efficient method identifies regions of diagnostic relevance in histopathology virtual slides with high accuracy.
  • This method can serve as a fast triaging or pruning step in CAD-based cancer detection or digital pathology workstations, thereby improving computation and system response time by an order of magnitude.
  • Computational efficiency is achieved by local pixel-based analysis and a sparse color distribution descriptor. Experiments indicate high accuracy and a 10 times speedup potential for the intended application scenarios.
  • embodiments of the present disclosure may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof.
  • a software application program is tangibly embodied on a non-transitory computer-readable storage medium, such as a program storage device or computer-readable storage medium, with an executable program stored thereon.
  • the application program may be uploaded to, and executed by, a machine comprising any suitable architecture.
  • a computer system (block 701) for detecting diagnostically relevant regions in pathology images includes, inter alia, a CPU (block 702), a memory (block 703) and an input/output (I/O) interface (block 704).
  • the computer system (block 701) is generally coupled through the I/O interface (block 704) to a display (block 705) and various input devices (block 706) such as a mouse, keyboard, medical scanners, power equipment, etc.
  • the display (block 705) may be implemented to display the classification results.
  • the support circuits can include circuits such as cache, power supplies, clock circuits, and a communications bus.
  • the memory (block 703) can include random access memory (RAM), read only memory (ROM), disk drive, tape drive, etc., or a combination thereof.
  • the present invention can be implemented as a module (block 707) of the CPU or a routine stored in memory (block 703) and executed by the CPU (block 702) to process input data (block 708), e.g., including the training datasets.
  • the data may include image information from a camera, which may be stored to memory (block 703).
  • the computer system (block 701) is a general purpose computer system that becomes a specific purpose computer system when executing the routine of the present disclosure.
  • the computer platform (block 701) also includes an operating system and micro instruction code.
  • the various processes and functions described herein may either be part of the micro instruction code or part of the application program (or a combination thereof) which is executed via the operating system.
  • various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.
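The percentile-rank descriptor described above can be sketched in a few lines; the function names and the nearest-rank convention below are assumptions for illustration, not taken from the disclosure:

```python
def percentile_descriptor(channel, ranks=(10, 20, 30, 40, 50, 60, 70, 80, 90)):
    """Intensity values of one channel at the given percentile ranks.

    Equivalent to cutting the normalized cumulative histogram at the
    percentile levels on the ordinate and reading off the abscissa.
    """
    values = sorted(channel)
    n = len(values)
    # Nearest-rank percentile: the value below which roughly `r` percent
    # of the observations fall.
    return [values[min(n - 1, max(0, round(r / 100.0 * (n - 1))))] for r in ranks]

def feature_vector(h_channel, e_channel):
    # Nine percentile values per channel, concatenated into the
    # eighteen-dimensional descriptor fed to the classifier.
    return percentile_descriptor(h_channel) + percentile_descriptor(e_channel)
```

With nine ranks per channel, concatenating the H and E descriptors yields the eighteen-dimensional vector referred to above.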

Abstract

A method for distinguishing between different tissue types imaged in a virtual slide includes receiving an image of a tissue (200), wherein the tissue has been treated with a first stain and a second stain, dividing the image into a plurality of image patches (201), accentuating a difference between portions of the tissue stained by the first stain and portions of the tissue stained by the second stain to generate a plurality of preprocessed image patches (202), extracting a plurality of feature descriptors from each of the preprocessed image patches (203) according to a distribution of the portions of the tissue stained by the first stain and the portions of the tissue stained by the second stain, and classifying each of the image patches according to the respective feature descriptors (204).

Description

Attorney Docket No.: 2011P17483WO
AUTOMATED DETECTION OF DIAGNOSTICALLY RELEVANT REGIONS IN
PATHOLOGY IMAGES
CROSS-REFERENCE TO RELATED APPLICATION [0001] This is a non-provisional application claiming the benefit of U.S. provisional application serial number 61/515,421 filed on August 5, 2011, the contents of which are incorporated by reference herein in their entirety.
BACKGROUND
1. Technical Field
[0002] The present disclosure relates to image analysis, and more particularly to a method for detecting diagnostically relevant regions in medical images.
2. Discussion of Related Art
[0003] In the field of disease pathology, histopathology is the examination of tissue in the study of the manifestations of disease. Typically, a histological section of a specimen is placed onto a glass slide for study. In some cases, this section may be imaged to generate a virtual slide.
[0004] Virtual slides from H&E (hematoxylin & eosin) stained digital histopathology, such as illustrated in FIG. 1, are typically several GigaBytes (GByte) in size. The analysis of virtual slides by pathologists and computer algorithms is often limited by the technologies currently available for digital pathology workstations as described by Patterson et al., "Barriers and facilitators to adoption of soft copy interpretation from the user perspective: Lessons learned from filmless radiology for slideless pathology" J. Pathol. Inform. 2(1), 2011, E. Krupinski, "Virtual slide telepathology workstation-of-the-future: lessons learned from teleradiology," Sem Diag. Path. 26, pp. 194-205, 2009, and Johnson et al., "Using a visual discrimination model for the detection of compression artifacts in virtual pathology images," IEEE Trans. Med. Imaging 30(2), pp. 306-314, 2011.
[0005] Methods for Computer Aided Diagnosis (CAD) for histopathology based cancer detection and grading are discussed in Khurd et al., "Computer-aided gleason grading of prostate cancer histopathological images using texton forests," in Proceedings of the 2010 IEEE international conference on Biomedical imaging: from nano to Macro, ISBI' 10, pp. 636-639, (Piscataway, NJ, USA), 2010 and Khurd et al., "Network cycle features:
Application to computer-aided gleason grading of prostate cancer histopathological images," in ISBI, pp. 1632-1636, 2011. Further CAD methods are described by Naik et al.,
"Automated gland and nuclei segmentation for grading of prostate and breast cancer histopathology," in ISBI, pp. 284-287, 2008, Huang and Lee, "Automatic classification for pathological prostate images based on fractal analysis," IEEE Trans. Med. Imaging 28(7), pp. 1037-1050, 2009, and Tabesh et al., "Multifeature prostate cancer diagnosis and gleason grading of histological images," IEEE Trans. Med. Imaging 26(10), pp. 1366-1378, 2007.
BRIEF SUMMARY
[0006] According to an exemplary embodiment of the present disclosure, a computationally efficient method for analyzing H&E stained digital pathology slides may distinguish diagnostically relevant regions from irrelevant regions.
[0007] According to an exemplary embodiment of the present disclosure, a method for distinguishing between different tissue types imaged in a virtual slide includes receiving an image of a tissue, wherein the tissue has been treated with a first stain and a second stain, dividing the image into a plurality of image patches, accentuating a difference between portions of the tissue stained by the first stain and portions of the tissue stained by the second stain to generate a plurality of preprocessed image patches, extracting a plurality of feature descriptors from each of the preprocessed image patches according to a distribution of the portions of the tissue stained by the first stain and the portions of the tissue stained by the second stain, and classifying each of the image patches according to the respective feature descriptors, the method characterized by the extraction of the feature descriptors, wherein a sparse representation of each of the preprocessed image patches is generated as a histogram of the feature descriptors in a plurality of uniformly distributed percentile ranks.
[0008] Each of the image patches may be rectangular.
[0009] The difference between the portions of the tissue stained by the first stain and the portions of the tissue stained by the second stain may be accentuated by a linear color transform into two channels, wherein the two channels correspond to the portions of the tissue stained by the first stain and the portions of the tissue stained by the second stain, respectively.
[0010] The linear color transform may amplify a color of the first stain and suppress a color of the second stain in each of the two channels.
[0011] The method may further include determining dominant purple and non-purple pixel values. The method may further include determining a plurality of axes corresponding to the dominant purple and non-purple pixel values.
[0012] The feature descriptors may be determined at a level of pixel data of the image. The feature descriptors may include a first descriptor corresponding to nuclei pixels and a second descriptor corresponding cytoplasm pixels.
[0013] According to an exemplary embodiment of the present disclosure, a data processing system for distinguishing between different tissue types includes a memory device storing an image of a tissue, wherein the tissue has been treated with a first stain and a second stain, and a processor configured to distinguish between different tissue types by dividing the image into a plurality of image patches. The processor accentuates a difference between portions of the tissue stained by the first stain and portions of the tissue stained by the second stain to generate a plurality of preprocessed image patches, extracts a plurality of feature descriptors from each of the preprocessed image patches according to a distribution of the portions of the tissue stained by the first stain and the portions of the tissue stained by the second stain, and classifies each of the image patches according to the respective feature descriptors.
[0014] According to an exemplary embodiment of the present disclosure, a method for distinguishing between different tissue types imaged in a virtual slide includes receiving an image of a tissue, wherein the tissue has been treated with a first stain and a second stain, dividing the image into a plurality of image patches, accentuating a difference between portions of the tissue stained by the first stain and portions of the tissue stained by the second stain to generate a plurality of preprocessed image patches, extracting a plurality of feature descriptors from each of the preprocessed image patches according to a distribution of the portions of the tissue stained by the first stain and the portions of the tissue stained by the second stain, and classifying each of the image patches according to the respective feature descriptors, wherein the image of the tissue is displayed including an indication of an image patch classified as relevant.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] Preferred embodiments of the present disclosure will be described below in more detail, with reference to the accompanying drawings:
[0016] FIG. 1 is an exemplary virtual slide for a breast biopsy specimen; [0017] FIG. 2 is a flow diagram of a detection method according to an exemplary
embodiment of the present disclosure;
[0018] FIG. 3 is a flow diagram of a detection method according to an exemplary
embodiment of the present disclosure;
[0019] FIG. 4 is a graph showing a color transform according to an exemplary embodiment of the present disclosure;
[0020] FIG. 5 is a graph showing a percentile descriptor according to an exemplary embodiment of the present disclosure;
[0021] FIG. 6 is an exemplary image of a virtual slide including classified regions according to an exemplary embodiment of the present disclosure; and
[0022] FIG. 7 is a diagram of a computer system for performing a detection method according to an exemplary embodiment of the present disclosure.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0023] According to an embodiment of the present disclosure, H&E (hematoxylin & eosin) stained digital pathology slides may be analyzed, wherein diagnostically relevant regions are distinguished from diagnostically irrelevant regions.
[0024] The ability to detect diagnostically relevant regions of a medical image can speed Computer Aided Diagnosis (CAD) for histopathology based cancer detection and grading through triage-like preprocessing and pruning. Further, the ability to distinguish between different regions can improve the response time for an interactive digital pathology workstation, even in the case of GByte-plus sized histopathology slides, for example, through controlling adaptive compression or prioritization algorithms. Moreover, the ability to distinguish diagnostically relevant regions can support the detection and grading workflow for expert pathologists in a semi-automated diagnosis, thereby increasing throughput and accuracy.
[0025] According to an embodiment of the present disclosure, a statistical characterization of tissue components may be indicative of pathology. For example, a pathologist's decision about malignancy vs. benignancy, based on components such as nuclei, tubules, and cytoplasm, may be informed by the identification of tissue components based on different statistical characterizations. According to an embodiment of the present disclosure, visual descriptors that capture the distribution of color intensities observed for nuclei and cytoplasm may be used to visualize the statistical characterization. A model for distinguishing between statistics of relevant regions and irrelevant regions may be learned from annotated data, and an inference may be performed via linear classification.
[0026] According to an embodiment of the present disclosure, virtual slides from H&E stained digital histopathology, such as illustrated in FIG. 1 may be analyzed. The analysis includes automatically identifying diagnostically relevant regions in such slides and discarding the irrelevant regions.
[0027] According to an embodiment of the present disclosure, a triage-like preprocessing context may be used with high detection accuracy (e.g., about 100%), while false positive detection is low. In a case where a detection method is applied to an entire large virtual slide, e.g., for pruning, computational speed is high, with additional improvement available through the use of hardware speedup, e.g., cluster or GPU processing.
[0028] Referring now to an exemplary detection method according to an embodiment of the present disclosure, the virtual slides are breast biopsy specimens scanned using a DMetrix scanner in the Arizona Telemedicine Program. Slide images are sampled at 0.47 μm/pixel. For a slide with 1 to 4 cm of tissue, a single 40X objective scan yields 1 to 5 GB of uncompressed RGB image data (the RGB color model includes Red, Green, and Blue color components). FIG. 1 shows an example of a virtual slide 100 having a resolution of about 40000 x 30000 pixels. Two close-up views, 101 and 102, show examples of different tissue regions that were classified by an expert pathologist as relevant and irrelevant, respectively, to the diagnosis of breast cancer.
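The scale of the tiling implied by these numbers can be sketched as follows. The non-overlapping grid and the dropping of partial edge patches are assumptions; the 256 x 256 pixel patch size is taken from the subdivision example given later in this disclosure.

```python
def patch_origins(width, height, patch=256):
    """Top-left corners of non-overlapping square patches tiling a slide.

    Partial patches at the right/bottom edges are dropped for simplicity;
    padding or clipping them instead is an implementation choice not
    specified here.
    """
    return [(x, y)
            for y in range(0, height - patch + 1, patch)
            for x in range(0, width - patch + 1, patch)]

# A 40000 x 30000 pixel slide tiles into 156 x 117 = 18252 full
# 256 x 256 patches, each of which is then processed independently.
origins = patch_origins(40000, 30000)
```

Because each patch is processed independently, this tiling is also what makes cluster or GPU parallelization straightforward.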
[0029] More particularly, FIG. 1 shows an exemplary 3.6 GByte virtual slide 100 for a breast biopsy specimen with two close-up views, 101 and 102, of diagnostically relevant and irrelevant regions, respectively. The difference between the two samples can be clearly seen in the number of indicative elements such as nuclei, tubules, cytoplasm, etc.
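As a sanity check, the 3.6 GByte figure for the example slide follows directly from its pixel dimensions; the short sketch below (plain Python, no assumed APIs) reproduces the arithmetic.

```python
# Uncompressed size of the example virtual slide: 40000 x 30000 pixels,
# 3 bytes per pixel (one byte each for the R, G, and B color components).
width_px, height_px, bytes_per_pixel = 40000, 30000, 3
size_bytes = width_px * height_px * bytes_per_pixel
size_gbytes = size_bytes / 1e9  # 3.6e9 bytes, i.e. the quoted 3.6 GBytes
```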
[0030] Diagnostically relevant regions, e.g., 101, may be distinguished by a large amount of epithelial nuclei and tubule formation, whereas irrelevant regions, e.g., 102, are dominated by cytoplasm tissue. In H&E stained images, these tissue components are stained dark purple (nuclei) and pink (cytoplasm and the extracellular connective tissue).
[0031] Pathologists typically start by visually scanning a virtual slide to identify the most diagnostically relevant tissue. According to an embodiment of the present disclosure, an automated detection method (see FIG. 2) includes receiving an image (e.g., virtual slide) 200 and subdividing the virtual slide 201. Here the virtual slide may be subdivided into square image patches (e.g., 256 x 256 pixels, corresponding to 120 x 120 micrometers (µm)). The distribution of nuclei and cytoplasm may be modeled. More particularly, the detection method may employ a combination of color preprocessing 202, extraction of feature descriptors 203, and classification based on machine learning 204, as is illustrated in FIG. 2. In FIG. 2, an image region is transformed into H and E color channels 202, and percentile feature descriptors are extracted 203 and classified with a linear Support Vector Machine (SVM) classifier 204.
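The subdivision step 201 can be sketched in a few lines. The helper below is a hypothetical illustration only; the function name and the choice to discard edge remainders are assumptions, not part of the disclosed method.

```python
import numpy as np

def subdivide(slide, patch_size=256):
    """Split an H x W x 3 slide array into non-overlapping square patches.

    Edge remainders that do not fill a complete patch are discarded here
    for simplicity; a production system might instead pad or overlap.
    """
    h, w = slide.shape[:2]
    return [slide[y:y + patch_size, x:x + patch_size]
            for y in range(0, h - patch_size + 1, patch_size)
            for x in range(0, w - patch_size + 1, patch_size)]

# A toy 512 x 768 "slide" yields a 2 x 3 grid, i.e. six 256 x 256 patches.
patches = subdivide(np.zeros((512, 768, 3), dtype=np.uint8))
```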
[0032] FIG. 3 is another flow diagram of an automated detection method. In FIG. 3 it may be observed that the method takes the image patches as input (301) and that H&E channels are determined from each image patch (302). Percentile features are determined for each channel (303) and the percentile features are classified as relevant or irrelevant using a linear SVM classification (304), for example, based on the distribution of nuclei pixels and cytoplasm pixels as described herein.
[0033] Referring to the color representation, in H&E stained specimens, nuclei appear purple and cytoplasm appears pink. The staining method may include the application of hemalum, a complex of aluminum ions and oxidized haematoxylin. The hemalum colors nuclei of cells purple or blue. The nuclear staining is followed by counterstaining with an aqueous or alcoholic solution of eosin Y, which colors eosinophilic structures, including cytoplasm, pink, red or orange.
[0034] In a detection method according to an embodiment of the present disclosure, distributions of these components may be characterized. The difference between these colors may be accentuated by a (linear) color transform into two channels, which may be called H and E. Each channel amplifies one stain (hematoxylin or eosin, respectively) while suppressing the other. More generally, the method may determine dominant purple and non-purple pixel values from the data and subsequently determine main axes for the transform orthogonal to those. FIG. 4 shows the axes of dominant pixel values (400 and 401) and the transformation axes (402, 403 and 404) for an example.
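A minimal sketch of such a two-channel stain separation follows. The fixed stain direction vectors are illustrative assumptions (values in the style of standard H&E color deconvolution); the patent instead derives its transform axes from the dominant purple and non-purple pixel values of the data at hand.

```python
import numpy as np

# Illustrative H&E stain directions in optical-density space; the disclosed
# method derives its axes from the data (dominant purple / non-purple
# values), so treat these numbers as placeholders for the sketch.
STAINS = np.array([[0.65, 0.70, 0.29],   # hematoxylin-like (purple)
                   [0.07, 0.99, 0.11]])  # eosin-like (pink)

def he_channels(rgb):
    """Unmix an N x 3 array of RGB pixels (0-255) into H and E intensities.

    Converting to optical density makes stain mixing approximately linear;
    the pseudo-inverse then yields two channels in which one stain is
    amplified while the other is suppressed.
    """
    od = -np.log((np.asarray(rgb, dtype=float) + 1.0) / 256.0)
    return od @ np.linalg.pinv(STAINS)   # N x 2 array: columns are H, E

# A purple-ish pixel should load more strongly on the H channel than on E.
h_val, e_val = he_channels([[120, 60, 140]])[0]
```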
[0035] Referring to the descriptor 203 and classifier 204, in a detection method according to an embodiment of the present disclosure, automated processing may be based on the distribution of nuclei pixels and cytoplasm pixels. The pixel level may be selected rather than higher abstraction levels, such as shape information, to achieve greater computational speed.
[0036] The descriptor may be based on a histogram of observed levels in the pair of H and E channels. For example, a histogram matching method may be used. In another example, a sparse representation of uniformly distributed percentile ranks may be used. [0037] Referring to the sparse representation, for example, nine percentile ranks (at 10%, 20%, ..., 90%) or eleven percentile ranks (at 0%, 10%, 20%, ..., 100%) may be used. One of ordinary skill in the art would appreciate that different numbers of percentile ranks may be used. The percentile ranks correspond to cuts of the cumulative histogram (compare FIG. 5) with the corresponding percentile levels on the ordinate. FIG. 5 is an exemplary percentile descriptor showing the rank values obtained via sorting or by cumulative histogramming for one channel. FIG. 5 plots a normalized cumulative histogram as a function of intensity (here for the E channel). The descriptor takes values from the abscissa at locations where the cumulative histogram cuts the respective percentile levels.
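The nine-rank variant of the sparse representation reduces, per channel, to reading off the intensity at which the normalized cumulative histogram crosses each percentile level; NumPy's percentile function performs exactly this cut. A minimal sketch (the function name is an assumption):

```python
import numpy as np

def percentile_descriptor(channel, ranks=(10, 20, 30, 40, 50, 60, 70, 80, 90)):
    """Nine-value sparse descriptor for one stain channel: the intensities
    at uniformly distributed percentile ranks, i.e. the abscissa values
    where the normalized cumulative histogram crosses each level."""
    return np.percentile(np.asarray(channel, dtype=float).ravel(), ranks)

# For uniformly spread intensities 0..100, the 10th percentile falls at 10
# and the median (50th percentile) at 50.
desc = percentile_descriptor(np.arange(101))
```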
[0038] The percentile values may be combined into an eighteen dimensional feature vector, and a supervised classifier such as a linear SVM may be trained for the classification task. SVM is a universal learning algorithm based on statistical learning theory. Learning is the process of selecting the best mapping function from a set of mapping models parameterized by a set of parameters. Given a finite sample data set (x_i, y_i) for i = 1, 2, ..., N, where x_i ∈ R^d is a d-dimensional input (feature) vector and y_i ∈ {−1, 1} is a class label, the objective is to estimate a mapping function f : x → y in order to classify future test samples.
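To make the classification step concrete, the sketch below trains a linear SVM on synthetic stand-ins for the eighteen-dimensional percentile vectors (nine ranks x two channels). scikit-learn's LinearSVC is used here as a convenient liblinear-based substitute for LIBSVM; the data, the class separation, and the parameters are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic 18-dimensional "percentile" features: irrelevant patches are
# drawn around 0, relevant patches shifted by 1.5 in every dimension
# (an assumed, easily separable toy distribution).
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 18)),
               rng.normal(1.5, 1.0, size=(200, 18))])
y = np.array([-1] * 200 + [1] * 200)   # class labels y_i in {-1, 1}

clf = LinearSVC(C=1.0).fit(X, y)       # linear SVM, as in step 204
train_acc = clf.score(X, y)            # well-separated data: near 1.0
```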
[0039] The training may be performed using known methods such as LIBSVM (see for example, Chang and Lin, "LIBSVM: A library for support vector machines," ACM
Transactions on Intelligent Systems and Technology 2, pp. 27:1-27:27, 2011).
[0040] LIBSVM is a library for support vector machines. LIBSVM trains a data set to obtain a model and uses the model to predict information of a testing data set. Support Vector
Machine (SVM) formulations supported in LIBSVM include C-support vector classification
(C-SVC), ν-support vector classification (ν-SVC), distribution estimation (one-class SVM), ε-support vector regression (ε-SVR), and ν-support vector regression (ν-SVR). [0041] FIG. 6 is an exemplary image of a virtual slide including classified regions, including ground truth relevant regions, e.g., 601, ground truth irrelevant regions, e.g., 602, and classified relevant regions 603. FIG. 6 is an exemplary output of the classification 204 (see FIG. 2), which may be displayed for analysis and diagnosis.
[0042] An implementation of the exemplary detection method is described here in view of two types of experiments: first, on a set of 589 cropped patches that have been labeled by pathologists as relevant (256 count) or irrelevant (333 count), respectively; and second, on 5 full virtual slides of 1-5 GBytes, where pathologists have selectively marked areas of relevance and irrelevance.
[0043] The classification on the cropped patches was evaluated using ten-fold cross validation. The obtained equal error rate (i.e., false positive rate = false negative rate) is 1.3%, or, using a different point on a ROC (Receiver Operating Characteristic) curve, a 4.3% false positive and 0% false negative rate. The latter is notable because a CAD based malignancy detection using the described exemplary method as a pruning tool would not degrade detection performance while still increasing speed.
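The equal error rate reported above is the ROC operating point at which the false positive and false negative rates coincide; a small threshold-sweep helper (hypothetical, not part of the disclosure) makes the definition concrete.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Sweep a decision threshold over classifier scores and return the
    rate at the operating point where the false positive rate and false
    negative rate are closest; labels are 1 = relevant, 0 = irrelevant."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels)
    best_gap, best_rate = np.inf, None
    for t in np.unique(scores):
        fp = np.mean(scores[labels == 0] >= t)   # irrelevant called relevant
        fn = np.mean(scores[labels == 1] < t)    # relevant called irrelevant
        if abs(fp - fn) < best_gap:
            best_gap, best_rate = abs(fp - fn), (fp + fn) / 2.0
    return best_rate

# Perfectly separable toy scores yield a 0% equal error rate.
eer = equal_error_rate([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
```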
[0044] Computational speed was measured at 10 milliseconds (ms) for a 256 x 256 patch on a standard laptop and scales to approximately 150 seconds on a 1GB slide, which is orders of magnitude faster than texton based approaches for histopathology analysis.
[0045] According to an embodiment of the present disclosure, a computationally efficient method identifies regions of diagnostic relevance in histopathology virtual slides with high accuracy. This method can serve as a fast triaging or pruning step in CAD based cancer detection or digital pathology workstations, thereby improving computation and system response time by an order of magnitude. Computational efficiency is achieved by local pixel-based analysis and a sparse color distribution descriptor. Experiments indicate high accuracy and a 10 times speedup potential for the intended application scenarios. [0046] It is to be understood that embodiments of the present disclosure may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. In one embodiment, a software application program is tangibly embodied on a non-transitory computer-readable storage medium, such as a program storage device or computer-readable storage medium, with an executable program stored thereon. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture.
[0047] Referring to FIG. 7, according to an embodiment of the present disclosure, a computer system (block 701) for detecting diagnostically relevant regions in pathology images includes, inter alia, a CPU (block 702), a memory (block 703) and an input/output (I/O) interface (block 704). The computer system (block 701) is generally coupled through the I/O interface (block 704) to a display (block 705) and various input devices (block 706) such as a mouse, keyboard, medical scanners, power equipment, etc. The display (block 705) may be implemented to display the classification results. The support circuits can include circuits such as cache, power supplies, clock circuits, and a communications bus. The memory (block 703) can include random access memory (RAM), read only memory (ROM), disk drive, tape drive, etc., or a combination thereof. The present invention can be implemented as a module (block 707) of the CPU or a routine stored in memory (block 703) and executed by the CPU (block 702) to process input data (block 708), e.g., including the training datasets. For example, the data may include image information from a camera, which may be stored to memory (block 703). As such, the computer system (block 701) is a general purpose computer system that becomes a specific purpose computer system when executing the routine of the present disclosure.
[0048] The computer platform (block 701) also includes an operating system and micro instruction code. The various processes and functions described herein may either be part of the micro instruction code or part of the application program (or a combination thereof) which is executed via the operating system. In addition, various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.
[0049] It is to be further understood that, because some of the constituent system components and method steps depicted in the accompanying figures may be implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the manner in which the system is programmed. Given the teachings of the present disclosure provided herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present disclosure.
[0050] Having described embodiments for a method for detecting diagnostically relevant regions in pathology images, it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in embodiments of the present disclosure that are within the scope and spirit thereof.

Claims

CLAIMS What is claimed is:
1. A method for distinguishing between different tissue types imaged in a virtual slide, the method comprising:
receiving an image of a tissue, wherein the tissue has been treated with a first stain and a second stain (200);
dividing the image into a plurality of image patches (201);
accentuating a difference between portions of the tissue stained by the first stain and portions of the tissue stained by the second stain to generate a plurality of preprocessed image patches (202);
extracting a plurality of feature descriptors from each of the preprocessed image patches according to a distribution of the portions of the tissue stained by the first stain and the portions of the tissue stained by the second stain (203); and
classifying each of the image patches according to the respective feature descriptors
(204),
the method characterized by the extraction of the feature descriptors, wherein a sparse representation of each of the preprocessed image patches is generated as a histogram of the feature descriptors in a plurality of uniformly distributed percentile ranks.
2. The method of claim 1, wherein each of the image patches is rectangular.
3. The method of claim 1, wherein each of the image patches is square.
4. The method of claim 1, wherein the difference between the portions of the tissue stained by the first stain and the portions of the tissue stained by the second stain is accentuated by a linear color transform into two channels, wherein the two channels correspond to the portions of the tissue stained by the first stain and the portions of the tissue stained by the second stain, respectively.
5. The method of claim 4, wherein the linear color transform amplifies a color of the first stain and suppresses a color of the second stain in each of the two channels.
6. The method of claim 4, further comprising determining dominant purple and non-purple pixel values.
7. The method of claim 6, further comprising determining a plurality of axes corresponding to the dominant purple and non-purple pixel values.
8. The method of claim 1, wherein the feature descriptors are determined at a level of pixel data of the image.
9. The method of claim 1, wherein the feature descriptors include a first descriptor corresponding to nuclei pixels and a second descriptor corresponding to cytoplasm pixels.
10. A data processing system for distinguishing between different tissue types, the system comprising:
a memory device (603) storing an image of a tissue, wherein the tissue has been treated with a first stain and a second stain (200); and
a processor (602) configured to distinguish between different tissue types by dividing the image into a plurality of image patches (201), accentuating a difference between portions of the tissue stained by the first stain and portions of the tissue stained by the second stain to generate a plurality of preprocessed image patches (202); extracting a plurality of feature descriptors from each of the preprocessed image patches according to a distribution of the portions of the tissue stained by the first stain and the portions of the tissue stained by the second stain (203); and classifying each of the image patches according to the respective feature descriptors (204).
11. The system of claim 10, wherein the processor is configured to accentuate the difference between the portions of the tissue stained by the first stain and the portions of the tissue stained by the second stain in each image patch by a linear color transform into two channels, wherein the two channels correspond to the portions of the tissue stained by the first stain and the portions of the tissue stained by the second stain, respectively.
12. The system of claim 11, wherein the linear color transform amplifies a color of the first stain and suppresses a color of the second stain in each of the two channels.
13. The system of claim 11, wherein the processor is configured to determine dominant purple and non-purple pixel values.
14. The system of claim 13, wherein the processor is configured to determine a plurality of axes corresponding to the dominant purple and non-purple pixel values.
15. The system of claim 10, wherein the feature descriptors are determined by the processor at a level of pixel data of the image.
16. The system of claim 10, wherein the feature descriptors include a first descriptor corresponding to nuclei pixels and a second descriptor corresponding to cytoplasm pixels.
17. A method for distinguishing between different tissue types imaged in a virtual slide, the method comprising:
receiving an image of a tissue, wherein the tissue has been treated with a first stain and a second stain (200);
dividing the image into a plurality of image patches (201);
accentuating a difference between portions of the tissue stained by the first stain and portions of the tissue stained by the second stain to generate a plurality of preprocessed image patches (202);
extracting a plurality of feature descriptors from each of the preprocessed image patches according to a distribution of the portions of the tissue stained by the first stain and the portions of the tissue stained by the second stain (203); and
classifying each of the image patches according to the respective feature descriptors (204), wherein the image of the tissue is displayed including an indication of an image patch classified as relevant.
18. The method of claim 17, wherein the difference between the portions of the tissue stained by the first stain and the portions of the tissue stained by the second stain is accentuated by a linear color transform into two channels, wherein the two channels correspond to the portions of the tissue stained by the first stain and the portions of the tissue stained by the second stain, respectively.
19. The method of claim 18, wherein the linear color transform amplifies a color of the first stain and suppresses a color of the second stain in each of the two channels.
20. The method of claim 19, further comprising:
determining dominant purple and non-purple pixel values; and
determining a plurality of axes corresponding to the dominant purple and non-purple pixel values.
PCT/US2012/049275 2011-08-05 2012-08-02 Automated detection of diagnostically relevant regions in pathology images WO2013022688A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161515421P 2011-08-05 2011-08-05
US61/515,421 2011-08-05

Publications (1)

Publication Number Publication Date
WO2013022688A1 true WO2013022688A1 (en) 2013-02-14

Family

ID=47668814

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/049275 WO2013022688A1 (en) 2011-08-05 2012-08-02 Automated detection of diagnostically relevant regions in pathology images

Country Status (1)

Country Link
WO (1) WO2013022688A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016201186A1 (en) * 2015-06-11 2016-12-15 University Of Pittsburgh-Of The Commonwealth System Of Higher Education Systems and methods for finding regions of interest in hematoxylin and eosin (h&e) stained tissue images and quantifying intratumor cellular spatial heterogeneity in multiplexed/hyperplexed fluorescence tissue images
WO2017051187A1 (en) * 2015-09-23 2017-03-30 Pathxl Limited Image processing method & apparatus for normalisaton and artefact correction
GB2542765A (en) * 2015-09-23 2017-04-05 Pathxl Ltd Method and apparatus for tissue recognition
CN109564683A (en) * 2016-09-13 2019-04-02 株式会社日立高新技术 Diagnostic imaging auxiliary device, diagnostic imaging householder method and assaying system
CN111819569A (en) * 2018-03-07 2020-10-23 谷歌有限责任公司 Virtual staining of tissue slice images
WO2022126923A1 (en) * 2020-12-18 2022-06-23 平安科技(深圳)有限公司 Asc-us diagnosis result identification method and apparatus, computer device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020186875A1 (en) * 2001-04-09 2002-12-12 Burmer Glenna C. Computer methods for image pattern recognition in organic material
US20050025357A1 (en) * 2003-06-13 2005-02-03 Landwehr Val R. Method and system for detecting and classifying objects in images, such as insects and other arthropods
US20050165290A1 (en) * 2003-11-17 2005-07-28 Angeliki Kotsianti Pathological tissue mapping
US20060040302A1 (en) * 2000-07-26 2006-02-23 David Botstein Methods of classifying, diagnosing, stratifying and treating cancer patients and their tumors
US20100329529A1 (en) * 2007-10-29 2010-12-30 The Trustees Of The University Of Pennsylvania Computer assisted diagnosis (cad) of cancer using multi-functional, multi-modal in-vivo magnetic resonance spectroscopy (mrs) and imaging (mri)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060040302A1 (en) * 2000-07-26 2006-02-23 David Botstein Methods of classifying, diagnosing, stratifying and treating cancer patients and their tumors
US20020186875A1 (en) * 2001-04-09 2002-12-12 Burmer Glenna C. Computer methods for image pattern recognition in organic material
US20050025357A1 (en) * 2003-06-13 2005-02-03 Landwehr Val R. Method and system for detecting and classifying objects in images, such as insects and other arthropods
US20050165290A1 (en) * 2003-11-17 2005-07-28 Angeliki Kotsianti Pathological tissue mapping
US20100329529A1 (en) * 2007-10-29 2010-12-30 The Trustees Of The University Of Pennsylvania Computer assisted diagnosis (cad) of cancer using multi-functional, multi-modal in-vivo magnetic resonance spectroscopy (mrs) and imaging (mri)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016201186A1 (en) * 2015-06-11 2016-12-15 University Of Pittsburgh-Of The Commonwealth System Of Higher Education Systems and methods for finding regions of interest in hematoxylin and eosin (h&e) stained tissue images and quantifying intratumor cellular spatial heterogeneity in multiplexed/hyperplexed fluorescence tissue images
US11376441B2 (en) 2015-06-11 2022-07-05 University of Pittsburgh—of the Commonwealth System of Higher Education Systems and methods for finding regions of in interest in hematoxylin and eosin (HandE) stained tissue images and quantifying intratumor cellular spatial heterogeneity in multiplexed/hyperplexed fluorescence tissue
US10755138B2 (en) 2015-06-11 2020-08-25 University of Pittsburgh—of the Commonwealth System of Higher Education Systems and methods for finding regions of interest in hematoxylin and eosin (H and E) stained tissue images and quantifying intratumor cellular spatial heterogeneity in multiplexed/hyperplexed fluorescence tissue images
CN107924457A (en) * 2015-06-11 2018-04-17 匹兹堡大学高等教育联邦体系 For the area-of-interest in lookup hematoxylin and the organization chart picture of eosin (H & E) dyeing in multiplexing/super composite fluorescence organization chart picture and quantify the system and method for intra-tumor cell spaces heterogeneity
US10671832B2 (en) 2015-09-23 2020-06-02 Koninklijke Philips N.V. Method and apparatus for tissue recognition
US10573002B2 (en) 2015-09-23 2020-02-25 Koninklijke Philips N.V. Image processing method and apparatus for normalisation and artefact correction
CN108140236A (en) * 2015-09-23 2018-06-08 皇家飞利浦有限公司 For normalizing image processing method and device with artifact correction
GB2542765A (en) * 2015-09-23 2017-04-05 Pathxl Ltd Method and apparatus for tissue recognition
WO2017051187A1 (en) * 2015-09-23 2017-03-30 Pathxl Limited Image processing method & apparatus for normalisaton and artefact correction
CN109564683A (en) * 2016-09-13 2019-04-02 株式会社日立高新技术 Diagnostic imaging auxiliary device, diagnostic imaging householder method and assaying system
EP3514755A4 (en) * 2016-09-13 2020-04-29 Hitachi High-Technologies Corporation Image diagnostic assistance device, image diagnostic assistance method, and sample analysis system
US11176668B2 (en) 2016-09-13 2021-11-16 Hitachi High-Tech Corporation Image diagnosis assisting apparatus, image diagnosis assisting method and sample analyzing system
CN109564683B (en) * 2016-09-13 2023-07-04 株式会社日立高新技术 Image diagnosis support device, image diagnosis support method, and sample analysis system
CN111819569A (en) * 2018-03-07 2020-10-23 谷歌有限责任公司 Virtual staining of tissue slice images
US11783603B2 (en) 2018-03-07 2023-10-10 Verily Life Sciences Llc Virtual staining for tissue slide images
CN111819569B (en) * 2018-03-07 2023-10-17 威里利生命科学有限责任公司 Virtual staining of tissue slice images
WO2022126923A1 (en) * 2020-12-18 2022-06-23 平安科技(深圳)有限公司 Asc-us diagnosis result identification method and apparatus, computer device, and storage medium

Similar Documents

Publication Publication Date Title
Bejnordi et al. Stain specific standardization of whole-slide histopathological images
CN109791693B (en) Digital pathology system and related workflow for providing visualized whole-slice image analysis
JP6660313B2 (en) Detection of nuclear edges using image analysis
Bahlmann et al. Automated detection of diagnostically relevant regions in H&E stained digital pathology slides
US20200388033A1 (en) System and method for automatic labeling of pathology images
US8600143B1 (en) Method and system for hierarchical tissue analysis and classification
JP5315411B2 (en) Mitotic image detection device and counting system, and method for detecting and counting mitotic images
US20170091937A1 (en) Methods and systems for assessing risk of breast cancer recurrence
Kothari et al. Eliminating tissue-fold artifacts in histopathological whole-slide images for improved image-based prediction of cancer grade
CN110909756A (en) Convolutional neural network model training method and device for medical image recognition
CN112435243A (en) Automatic analysis system and method for full-slice digital pathological image
WO2019048954A1 (en) Tissue staining quality determination
US20090161928A1 (en) System and method for unsupervised detection and gleason grading of prostate cancer whole mounts using nir fluorscence
Gandomkar et al. Computer-based image analysis in breast pathology
Ragothaman et al. Unsupervised segmentation of cervical cell images using gaussian mixture model
WO2013022688A1 (en) Automated detection of diagnostically relevant regions in pathology images
CN112380900A (en) Deep learning-based cervical fluid-based cell digital image classification method and system
CN110853005A (en) Immunohistochemical membrane staining section diagnosis method and device
CN108416379A (en) Method and apparatus for handling cervical cell image
Nateghi et al. Maximized inter-class weighted mean for fast and accurate mitosis cells detection in breast cancer histopathology images
WO2013019856A1 (en) Automated malignancy detection in breast histopathological images
US20220335736A1 (en) Systems and methods for automatically classifying cell types in medical images
Kanwal et al. Quantifying the effect of color processing on blood and damaged tissue detection in whole slide images
CN112990214A (en) Medical image feature recognition prediction model
CN110838094B (en) Pathological section dyeing style conversion method and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12822948

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12822948

Country of ref document: EP

Kind code of ref document: A1