
Automatic Ischemic Stroke Lesion Segmentation from Computed Tomography Perfusion Images by Image Synthesis and Attention-Based Deep Neural Networks

Ischemic stroke lesion segmentation from Computed Tomography Perfusion (CTP) images is important for accurate diagnosis of stroke in acute care units. However, it is challenged by the low image contrast and resolution of the perfusion parameter maps, in addition to the complex appearance of the lesion. To deal with this problem, we propose a novel framework based on synthesized pseudo Diffusion-Weighted Imaging (DWI) from perfusion parameter maps to obtain better image quality for more accurate segmentation. Our framework consists of three components based on Convolutional Neural Networks (CNNs) and is trained end-to-end. First, a feature extractor is used to obtain both a low-level and a high-level compact representation of the raw spatiotemporal Computed Tomography Angiography (CTA) images. Second, a pseudo DWI generator takes as input the concatenation of CTP perfusion parameter maps and our extracted features to obtain the synthesized pseudo DWI. To achieve better synthesis quality, we propose a hybrid loss function that pays more attention to lesion regions and encourages high-level contextual consistency. Finally, we segment the lesion region from the synthesized pseudo DWI, where the segmentation network is based on switchable normalization and channel calibration for better performance. Experimental results showed that our framework achieved the top performance in the ISLES 2018 challenge, and that: 1) our method using synthesized pseudo DWI outperformed methods segmenting the lesion from perfusion parameter maps directly; 2) the feature extractor exploiting additional spatiotemporal CTA images led to better synthesized pseudo DWI quality and higher segmentation accuracy; and 3) the proposed loss functions and network structure improved the pseudo DWI synthesis and lesion segmentation performance.
The proposed framework has the potential to improve the diagnosis and treatment of ischemic stroke where access to real DWI scanning is limited.

Keywords: Ischemic stroke lesion, computed tomography perfusion, image synthesis, segmentation, deep learning

Corresponding author: Guotai Wang (guotai.wang@uestc.edu.cn). Equal contribution.
Preprint submitted to Elsevier, July 8, 2020. arXiv:2007.03294v1 [eess.IV], 7 Jul 2020.

1. Introduction

Stroke is the most common cerebrovascular disease and one of the primary causes of mortality and long-term disability worldwide (Kissela et al., 2012). Ischemic stroke is the most common type of stroke and accounts for 75-85% of all stroke cases. It is caused by an obstruction of the cerebral blood supply, which leads to tissue hypoxia (under-perfusion) and tissue death within a few hours. The stages of stroke can be classified into acute (0 to 24 h), sub-acute (24 h to 2 weeks) and chronic (more than 2 weeks) (González et al., 2011). Early diagnosis and treatment in the acute stage are critical for the recovery of the stroke patient, and medical imaging is important for the detection and quantitative assessment of stroke lesions, as well as for selecting eligible patients for thrombolysis or thrombectomy (Zaharchuk et al., 2012). Among different medical imaging methods, Magnetic Resonance Imaging (MRI) sequences such as Fluid-Attenuated Inversion Recovery (FLAIR), T1 weighted, T2 weighted, and Diffusion-Weighted Imaging (DWI) are preferred imaging modalities for ischemic stroke lesions due to their good soft tissue contrast. In particular, DWI is considered the most sensitive method for the detection of early acute stroke (Mezzapesa et al., 2006). However, MR imaging including DWI is relatively slow and often not accessible for acute stroke patients. Alternatively, Computed Tomography Perfusion (CTP) imaging offers insights into cerebral hemodynamics and enables differentiation of the salvageable penumbra from the irrevocably damaged infarct core (Donahue and Wintermark, 2015). CTP has advantages in speed and cost, leading to higher availability in acute care units (Gillebert et al., 2014). In CTP imaging, a sequence of Computed Tomography Angiography (CTA) images (i.e., spatiotemporal 4D images) is acquired during the perfusion process, from which perfusion parameter maps such as Cerebral Blood Flow (CBF), Cerebral Blood Volume (CBV), Mean Transit Time (MTT) and Time to Peak (TTP, or Tmax) are derived to help identify ischemic stroke lesions. Examples of perfusion parameter maps of two ischemic stroke patients are shown in Fig. 1.

Segmentation of stroke lesions from medical images can provide quantitative measurements of the lesion region, which is important for quantitative treatment decision procedures. Manual segmentation of the lesion is time-consuming with low inter-rater agreement, whereas automatic stroke lesion segmentation is more efficient and has the potential to provide more reliable and reproducible segmentation results (Maier et al., 2017).

Considering the limited speed and availability of MRI for acute stroke patients, we aim to segment ischemic stroke lesions automatically from CTP perfusion parameter maps, which has the potential to improve the timely diagnosis and treatment of ischemic stroke. However, this task is very difficult, and the segmentation accuracy is confronted with many challenges. First, the appearance of stroke lesions varies considerably at different times, even within the same clinical stage of stroke (González et al., 2011). Second, the lesions vary widely in location, shape, size and appearance in the brain, as shown in Fig. 1. Some lesions may be aligned with the vascular supply territories while others may not. Some small lesions are only a few millimeters in size, while some large lesions may cover a complete hemisphere (Maier et al., 2017). The intensity within the lesion region is not homogeneous, and other stroke-like pathologies may lead to false positives in the segmentation result. Third, compared with DWI, the perfusion parameter maps (CBF, CBV, MTT and Tmax) are noisy and have a lower spatial resolution, making it difficult to accurately identify the boundary of stroke lesions, as demonstrated in Fig. 1. In addition, the raw spatiotemporal 4D CTA images contain useful information about the ischemic stroke lesion but have a large data size. Using the perfusion parameter maps alone without considering the raw spatiotemporal CTA images may limit the segmentation accuracy, while directly taking the raw spatiotemporal CTA images as input for lesion segmentation increases the computational cost. Therefore, extracting compact and useful features from the raw spatiotemporal CTA images is desirable for efficient and accurate ischemic stroke lesion segmentation.

Although automatic segmentation of ischemic stroke lesions has been widely studied, most existing methods were proposed to deal with multi-modal MR images (Maier et al., 2017; Winzeck et al., 2018). Only a few works have been reported on ischemic stroke lesion segmentation from CTP images (Gillebert et al., 2014; Yahiaoui and Bessaid, 2016; Abulnaga and Rubin, 2018). Traditional methods such as template-based methods (Gillebert et al., 2014) and fuzzy C-Means (Yahiaoui and Bessaid, 2016) are challenged by the complex appearance of stroke lesions. Recently, deep learning methods have achieved state-of-the-art performance for many medical image segmentation tasks (Shen et al., 2017) and have been applied to ischemic stroke lesion segmentation from CTP images (Pinheiro et al., 2018; Abulnaga and Rubin, 2018; Vikas Kumar Anand et al., 2018). However, due to the above-mentioned challenges, it remains difficult to segment the lesions directly from the perfusion parameter maps.

Inspired by the fact that ischemic stroke lesions are easier to identify and segment in DWI than in perfusion parameter maps, it is desirable to synthesize pseudo DWI images from perfusion parameter maps to help the segmentation task. Though many methods have been proposed for general medical image synthesis (Frangi et al., 2018), synthesizing images with lesions is still not well addressed (Roy et al., 2010), as it is challenged by the complex variation of pathological lesions among patients. In particular, synthesizing pseudo DWI images of ischemic stroke lesions from CTP images has rarely been investigated.

This work is a substantial extension of our preliminary conference publication (Song and Huang, 2018), which won the MICCAI 2018 Ischemic Stroke Lesion Segmentation (ISLES) challenge (http://www.isles-challenge.org). In this paper, we provide a detailed description and an in-depth discussion of our segmentation framework and validate it with extensive experiments. The contributions of our work are summarized as follows.

First, we propose a novel framework for automatic ischemic stroke lesion segmentation from CTP images based on synthesized pseudo DWI. Compared with using only CTP perfusion parameter maps, our framework additionally exploits raw spatiotemporal CTA images for higher pseudo DWI synthesis quality and lesion segmentation accuracy. Second, to make more efficient use of the raw spatiotemporal CTA images, we propose a feature extractor that automatically obtains a more compact and high-level representation of the CTA images, which helps to reduce the required memory and computational time and improves the performance of our segmentation method. Third, we propose a novel method to synthesize pseudo DWI images with ischemic stroke lesions. We employ a high-level similarity loss function to encourage the pseudo DWI to be close to the ground truth in terms of both local details and global context, and propose an attention-guided synthesis strategy so that the generator focuses more on the lesion, which benefits the final segmentation. Last but not least, to segment lesions from our synthesized pseudo DWI, we propose a Convolutional Neural Network (CNN) with channel calibration and Switchable Normalization (SN) (Luo et al., 2018) that is suitable for small training batch sizes, and combine it with a novel attention-based and hardness-aware loss function that helps to obtain more accurate segmentation of ischemic stroke lesions. Experimental results show that our method achieved state-of-the-art performance in the ISLES 2018 challenge and outperformed both direct segmentation from CTP perfusion parameter maps and contemporary image synthesis-based methods for ischemic stroke lesion segmentation from CTP images (Liu, 2018).

2. Related Works

2.1. Ischemic Stroke Lesion Segmentation

Segmentation of ischemic stroke lesions from medical images has attracted increasing attention in recent years (Rekik et al., 2012; Maier et al., 2017), and most works focus on segmentation from MR images. For example, the ISLES 2015-2017 challenges aimed at ischemic stroke lesion segmentation from multi-modal MR images including T1, T1-contrast, FLAIR and DWI sequences (Maier et al., 2017; Winzeck et al., 2018).

[Figure 1 panels: rows (a) and (b); columns CTA (Time 1), CTA (Time 2), CBF, CBV, MTT, Tmax, DWI]
Figure 1: Examples of CTP and DWI images of two patients with ischemic stroke lesions. Columns 1-2: CTA images at different time points during perfusion. Columns 3-6: perfusion parameter maps. Column 7: lesions delineated in DWI images. Note that we aim to segment the lesions from perfusion parameter maps, and DWI is not available at test time in our study.
Some early works have used a range of methods for this seg- For example, Burgos et al. (2014) synthesized CT images from mentation task, such as Markov random field model (Kabir MRI through a multi-atlas information propagation scheme. et al., 2007), level set (Feng et al., 2015), random forest (Mitra Bahrami et al. (2016) used dictionary learning to synthesis 7T- et al., 2014) and support vector machine (Maier et al., 2014). like images from 3T MRI. Jog et al. (2017) used regression However, their accuracy is challenged by the complicated seg- random forest to synthesize T2 and FLAIR images from T1 im- mentation problem (Maier et al., 2015). Recently, deep learn- ages. Deep learning methods have also been increasingly used ing has been increasingly used for ischemic stroke lesion seg- for medical image synthesis (Ker et al., 2017), such as deep mentation with better performance. For example, Kamnitsas neural network-based synthesis methods (Nguyen et al., 2015) et al. (2017) proposed a dual pathway 3D CNN combined with and deep adversarial learning-based approaches (Nie et al., fully connected Conditional Random Field (CRF) for brain le- 2018). However, most of existing works deal with general sion segmentation. Cui et al. (2019) proposed an adapted mean cross-modality image synthesis and have not well investigated teacher model to learn from a combination of annotated and the more challenging problem of synthesizing medical images unannotated MR images for the segmentation task. Dolz et al. with pathological lesions. Roy et al. (2010) used an atlas-based (2018) combined DWI and CTP to segment ischemic stroke le- method to synthesize FLAIR images with white matter lesions. sions and used a densely connected UNet with Inception mod- Chartsias et al. (2017) proposed a CNN for synthesizing multi- ules (Szegedy et al., 2016) to handle the variation of lesion size. modal MR images of brain lesions. 
The e ectiveness of these Despite their good performance, these methods rely on MRI methods for pseudo DWI synthesis from CTP perfusion param- and cannot be directly applied to stroke lesion segmentation eter maps of stroke lesions has rarely been demonstrated. from CTP images. There have been few works on the challenging task of seg- 3. Method mentation of ischemic stroke lesion from CTA or CTP per- fusion parameter maps (Rekik et al., 2012). Some early The proposed framework for ischemic stroke lesion segmen- works used histogram-based classifiers (Rekik et al., 2012) or tation from CTP images is depicted in Fig. 2. Due to the large template-based voxel-wise comparison (Gillebert et al., 2014) inter-slice spacing (9.48 mm in average) of the experimental to deal with this problem. Yahiaoui and Bessaid (2016) used images, the proposed method operates on 2D slices. It con- a multi-scale contrast enhancement algorithm and fuzzy C- sists of a feature extractor, a pseudo DWI generator and a final Means for this task. Recently, Abulnaga and Rubin (2018) lesion segmenter. First, to eciently deal with the large raw used CNNs with pyramid pooling to combine global and lo- spatiotemporal CTA images and reduce the computational re- cal contextual information for this task, where a focal loss was quirements, we design a high-level feature extractor that uses a employed to enable the CNNs to focus more on hard samples. CNN to obtain a compact representation of the raw spatiotem- However, due to the lower signal-to-noise ratio of CTP perfu- poral CTA images. Additionally, we make use of a temporal sion parameter maps compared with DWI, it remains challeng- Maximal Intensity Projection (MIP) of the CTA images as a ing to automatically segment the ischemic stroke lesion from low-level feature. Then, these features are concatenated with CTP images. the perfusion parameter maps to serve as the input of the pseudo 2.2. 
Cross-Modality Medical Image Synthesis DWI generator, which obtains a pseudo DWI image with better A range of works have investigated the problem of synthesiz- contrast between the lesion and the background. To improve ing medical images from another modality (Frangi et al., 2018). the synthesis quality near lesion regions, we use a high-level 3 K1 n o similarity-based loss function and enable the generator to pay T = max t j K  t < T; H q (t k) = 0 (3) more attention to the lesion. Finally, a segmenter takes the k=0 pseudo DWI image as input and produces a segmentation of the ischemic stroke lesion, where a CNN using channel cali- whereH () is the Heaviside function that obtains 0 for negative bration and switchable normalization trained with an attention- inputs and 1 for positive inputs. q (t) is the first derivative of based and hardness-aware loss function is proposed to improve q(t), and K is a positive integer value which is 5 in this paper. the performance. The three components are trained end-to-end. Therefore, T is defined as the earliest time point where the first Details of these components will be described in the following. derivative of q(t) keeps positive for its following K consecutive time points, and T is defined as the latest time point where the first derivative of q(t) keeps negative for its preceding K consecutive time points. Fig. 3 shows the curve of q(t) with T 3.1. Feature Extraction from Raw Spatiotemporal CTA Images and T in two cases. In CTP imaging, the raw spatiotemporal CTA images have We extract the frames between T and T and obtain a tem- s e been transformed into a simplified feature representation in porally cropped subsequence that corresponds to the perfusion terms of perfusion parameter maps including CBF, CBV, MTT stage of the raw spatiotemporal CTA image. As the duration of and Tmax. 
Though these parameter maps are useful for detec- the perfusion stage has a variation among di erent subjects, the tion of the stroke lesion, they are not a complete representation temporally cropped subsequence can have di erent time point of the perfusion information in the raw spatiotemporal CTA im- numbers along the temporal axis. To deal with this problem and ages. Therefore, we do not ignore the raw spatiotemporal CTA to reduce the computational cost, we uniformly down-sample images and try to mine some additional features that are useful the temporally cropped subsequence along the temporal axis in the segmentation task. into a fixed time point number of C . The temporally cropped Let I(x; y; z; t) represent a raw spatiotemporal CTA image ob- and down-sampled CTA image is referred to as I , which is tained during the perfusion, where t 2 [0; 1; 2; :::; T 1] and T is used as the input of a CNN for high-level feature extraction. the total number of time points. Considering that the raw spa- Let C  D H W represent the size of I , where D, H and tiotemporal CTA image has a large data size due to a large value W represent the spatial depth, height and width of the input 4D of T , we use a feature extractor to obtain an additional low-level image I respectively. We treat I as a multi-channel 3D volume feature and a compact and high-level representation of the raw and use a 2D CNN for high-level feature extraction from each spatiotemporal CTA image to make an ecient use of it. The slice, as the images have a large inter-slice spacing (9.48 mm feature extraction method is shown in Fig. 2. We extract both in average) in this study. Specifically, we used the UNet (Ron- a manually designed low-level feature and a high-level feature neberger et al., 2015) for the high-level feature extraction due that is automatically learned by a CNN. 
to its good performance in a range of tasks (Abdulkadir et al., First, the maximal intensity value of a voxel during perfu- 2016; Li et al., 2018; Isensee et al., 2018). The UNet consists of sion may contain information related to the ischemic stroke le- an encoding path and a decoding path. The encoding path uses sion (Murayama et al., 2018). Therefore, in addition to the stan- convolution and down-sampling through max-pooling layers to dard perfusion parameter maps, we apply a Maximal Intensity obtain features at di erent scales with reduced spatial resolu- Projection (MIP) along the temporal axis to I to obtain a low- tion, and the decoding path uses up-sampling (deconvolution) layers to recover the spatial resolutions. We set the output chan- level feature map F : nel of the extractor CNN to 1. Let F denote the CNN’s output and it has a size of 1 D H  W , which is a high-level repre- F = max I(x; y; z; t) (1) sentation of the input spatiotemporal CTA image I . F =  (I ;  ) (4) h e e Second, we use a CNN to extract high-level features of the raw spatiotemporal CTA image due to CNNs’ good per- where  represents the feature extraction network and  de- e e formance in automatic feature extraction (Shen et al., 2017). notes the set of parameters of the network. Though the start and end time points of perfusion do not a ect the MIP image in theory, they are important for the high-level 3.2. Pseudo DWI Synthesis from CTP Images feature extractor, as the CNN is designed to take the frames Inspired by recent works on CNN-based image synthesis during the perfusion as input. To reject frames that are not per- with state-of-the-art performance (Frangi et al., 2018), we fused in the raw spatiotemporal CTA image, we need first to also use CNNs to generate pseudo DWI images, and select detect these two time points. We define a curve of accumulated UNet (Ronneberger et al., 2015) as the backbone network intensity over time as q(t) = I(x; y; z; t). 
Let T and T x;y;z s e structure due to its good performance. Di erently from pre- denote the estimated start and end time points of the perfusion vious works that synthesized pseudo DWI images only from respectively. They are determined by the following rules: CTP perfusion parameter maps including CBF, CBV, MTT and K1 TMax (Liu, 2018), we additionally take advantage of the ex- n X o T = min t j 0  t < T K; H q (t + k) = K (2) tracted low-level and high-level features (F and F ) so that l h k=0 more information from the raw spatiotemporal CTA image can 4 Feature extraction Loss function Perfusion stage Temporal detection resampling Temporal MIP Weight map 𝐴 Real DWI Weight map 𝐴 Ground truth Spatiotemporal CTA Feature extraction Image synthesis Segmentation loss 𝐿 loss 𝐿 loss 𝐿 Feature concatenation CBF CBV MTT Tmax Φ Φ Low-level High-level Pseudo DWI Segmentation Perfusion parameter maps feature 𝐹 feature 𝐹 Figure 2: Illustration of the proposed framework for ischemic stroke lesion segmentation from CTP images. We extract additional low-level features based on temporal MIP and high-level features based on a CNN from raw spatiotemporal CTA images, and concatenate them with perfusion parameter maps. The concatenated images are used to generate pseudo DWI, from which the lesion is finally segmented.  ,  and  are three CNNs for high-level feature extraction, e g s pseudo DWI generation and lesion segmentation, respectively. 1.5𝑒 Conv + BN + ReLU 1.5𝑒 Adaptive average pooling (1 x 8) 1𝑒 𝑇 𝑇 𝑇 𝑇 Adaptive average pooling (8 x 1) 1𝑒 Reshape and concatenation 5𝑒 5𝑒 Figure 4: Structure of the encoder  to obtain a high-level representation of Time step 𝑡 Time step 𝑡 an input image. The convolution kernels have a size of 33 and a stride of 22. (a) (b) Figure 3: Illustration of start time (T ) and end time (T ) detection of the per- s e where is a weighting parameter for the contextual loss and A fusion stage. is a spatial weight map. 
jjjj andjjjj are the L2-norm and L1- 2 1 norm respectively. As we follow the common practice of using the Peak Signal-to-Noise Ratio (PSNR) that is related to Mean help to improve the quality of the synthesized pseudo DWI. Let Square Error (MSE) as one of the metrics to evaluate the image F represent the concatenation of CBF, CBV, MTT and TMax. quality, here L2-norm is used for pixel-level loss so that mini- The input of our generator is a concatenation of F , F and F o l h mizing the L2-norm corresponds to maximizing the PSNR. On and thus it has six channels. The generated pseudo DWI can be the other hand, as L1-norm treats each element equally while represented as: L2-norm assigns higher weights (i.e., by squaring) to larger pre- I =  (F ; F ; F ;  ) (5) g g o l h g diction errors that may be caused by outliers, L1-norm has a higher robustness than L2-norm (Ghosh et al., 2017). There- where  represents the pseudo DWI generation network and fore, we use L1-norm for the high-level contextual loss.  is denotes its parameter set. a CNN-based encoder with a parameter set  and it converts Let I represent the DWI ground truth for synthesis. To train I and I into their high-level and compact (i.e., low dimen- g d the generator  so that it can focus on the lesion region and sional) representations, respectively. As ` () operates on in- the output I has a high-level similarity to the ground truth I , g d dividual voxel-wise predictions and does not guarantee global we propose a novel loss function L (I ; I ) that combines a low- g g d and high-level consistency, ` () based on the encoder  helps h c level weighted pixel-wise loss ` (I ; I ) and a high-level contex- l g d to overcome this problem by encouraging closeness between tual loss ` (I ; I ): h g d the lower dimensional non-linear projections of I and I . 
Our g d encoder  consists of five convolutional layers and two adap- L (I ; I ) = ` (I ; I ) + ` (I ; I ) (6) g g d l g d h g d tive average pooling layers, and its output is a vector of length 16. Details of  are shown in Fig. 4. ` (I ; I ) = jjA (I I )jj (7) l g d g d 2 As our final goal is to segment the ischemic stroke lesion, a good synthesis quality around the lesion region is desirable. ` (I ; I ) = jj (I ;  )  (I ;  )jj (8) h g d c g c c d c 1 Therefore, we use the voxel-wise weight map A to make the 𝑞𝑡 𝑞𝑡 1 x 256 x 256 32 x 128 x 128 64 x 64 x 64 64 x 32 x 32 64 x 16 x 16 64 x 8 x 8 64 x 8 x 1 64 x 1 x 8 64 x 16 for segmentation. We use an SE block after each convolution block in the encoding path of the UNet (Ronneberger et al., 2015). The proposed network is referred to as SLNet, which is Skip connection shown in Fig. 5. (Conv + SN + ReLU) × 2 To deal with the large range of the ischemic stroke lesion size SE block and challenging training samples for the segmentation task, we 1 × 1 Conv propose a novel hybrid loss function to train the segmentation Maxpooling Deconvolution network. Let Y denote the one-hot ground truth label with chan- c c nel number C. We use P and Y to denote the probability of i i Figure 5: The proposed SLNet for ischemic stroke lesion segmentation with voxel i belonging to class c in the prediction output and the Switchable Normalization (SN) and Squeeze-and-Excitation (SE) blocks. ground truth respectively. The proposed loss function is a com- bination of a weighted cross entropy loss function L and a WCE hardness-aware generalized Dice loss function L : generator pay more attention to the lesion region and less at- HGD tention to the background. Let F denote the set of lesion fore- L (P; Y ) = L (P; Y; A) + L (P; Y ) (11) s WCE HGD ground voxels, and Eud(i;F ) denote the shortest Euclidean dis- tance between a voxel i and F . 
We use A to represent the weight of voxel i in the weight map A: P P N C c c A Y log P i c i i L (P; Y; A) = (12) > WCE P w; if i 2 F N A = (9) i exp(Eud(i;F )=D) 0:5 + ; otherwise exp(Eud(i;F )=D)+1 where w  1 is the weight for foreground voxels and D is a L (P; Y ) = log 1 L (P; Y ) (13) HGD GD positive parameter that controls the sharpness of the weight for background voxels. A decays gradually with the increase of P P C N c c Eud(i;F ), i.e., the weights for voxels that are further from the m Y P c i i i L (P; Y ) = 1 2 (14) P P lesion region are lower. An example of A is shown in Fig. 2. GD C N c c m (Y + P ) c i i i 3.3. SLNet: Stroke Lesion Segmentation Network with Switch- where N is the number of voxels. A is a voxel-wise weight able Normalization and Channel Calibration map, and we use the same one as defined in Eq. 9, which drives Our segmentation network takes the synthesized pseudo the segmentation network to pay more attention to the lesion DWI image I as input and outputs a binary segmentation of region than the background. L is the generalized Dice loss GD the ischemic stroke lesion. Let  represent the segmentation that automatically balances di erent classes by defining a class- N c 2 network and  denote its parameter set. The segmentation net- wise weight m = 1=( Y ) (Sudre et al., 2017). Inspired i i work’s output probability map is formatted as: by the focal loss (Lin et al., 2017) that automatically penalizes hard samples in object detection tasks, we uselog(1 L ) in GD P =  (I ;  ) (10) s g s Eq. 11 that has the same monotonicity as L but gets higher GD gradient values for large L values, so that our segmentation GD where P has C channels and C equals to the class number, loss function is also aware of hard image samples. which is 2 in our binary segmentation task. We select the UNet structure (Ronneberger et al., 2015) as the backbone and extend 3.4. End-to-End Training it in two aspects to obtain a better performance. 
First, we replace Batch Normalization (BN) layers with The overall pipeline of our feature extractor  , pseudo DWI switchable normalization (Luo et al., 2018) layers, which learn generator  , image context encoder  and the final segmenta- g c to automatically select suitable normalizers for di erent nor- tion network  can be jointly trained in an end-to-end fashion. malization layers of a CNN. Compared with traditional batch The overall loss function for training is therefore defined as: normalization, switchable normalization is more robust to a wide range of batch sizes and more suitable for small batch L = L (P; Y ) + L (I ; I ) + L (F ; I ) (15) s g g d e h d sizes (Luo et al., 2018). In our segmentation task, the large in- put patches and dense feature maps take a lot of memory, which where and are weighting parameters. The segmentation limits the batch size to a small number. Therefore, switchable loss function L (P; Y ) is defined in Eq. 11 and the pseudo DWI normalization is preferred to batch normalization. Second, as synthesis loss function L (I ; I ) is defined in Eq. 6. To ob- g g d di erent channels in a feature map may have di erent impor- tain better synthesized pseudo DWI and lesion segmentation tance, we use a Squeeze-and-Excitation (SE) block (Hu et al., results, we add an extra explicit supervision on F that is the 2018) based on channel attention to calibrate channel-wise fea- output of the feature extractor  . Therefore, we introduce a ture responses. The SE block explicitly models inter-channel loss L (F ; I ) = L (F ; I ) to encourage the similarity between e h d g h d dependencies by learning an attention weight for each channel F and I . The end-to-end training will update  ,  ,  and h d e g c s so that the network relies more on the most important channels simultaneously. 6 4. Experiments and Results negative respectively. 4.1. 
Data and Implementation HD = max max d(s; G); max d(g; S ) (17) s2S g2G We used the dataset from ISLES challenge 2018 to validate our segmentation framework. The ISLES 2018 dataset includes X X CTP scanning of 103 patients in two centers who presented AS S D = d(s; G) + d(g; S ) (18) jSj +jGj within 8 hours of stroke onset. For the CTP scanning, a contrast s2S g2G agent was administered to the patient and then sequential CTA where S and G denote the set of surface points of a segmen- images were acquired 1-2 seconds apart. Then the perfusion tation result and the ground truth respectively. d(s; G) is the parameter maps CBF, CBV, MTT and Tmax were derived from shortest Euclidean distance between a point s 2 S and all the the raw spatiotemporal CTA images. An MRI DWI scanning points in G. was obtained within 3 hours after the CTP scanning for each patient. The intra-slice pixel spacing ranged from 0.80 mm 4.2. Ablation Studies 0.80 mm to 1.04 mm 1.04 mm, with a slice size of 256 256. The inter-slice spacing ranged from 4.0 mm to 12.0 mm with a We first conducted ablation studies to validate di erent com- mean value of 9.48 mm. The slice number ranged from 2 to 22 ponents of our segmentation framework. Since the ground truth with a mean value of 5.34, and the time point number for CTA segmentations of ISLES testing images were not available for ranged from 43 to 64 with a mean value of 47.18. For high- participants, we split the ocial ISLES training set at patient level feature extraction, all the CTA images were temporally level into our local training, validation and testing sets, which cropped and down-sampled with an output time point number contained images from 65, 6 and 23 scannings respectively. In of C = 6. For preprocessing, intensity values in each DWI vol- e this section, we report the experimental results obtained from ume were scaled to (0, 1) based on the minimal value and the our local testing images. 99-th percentile. 
Manual delineation of the stroke lesion on the DWI images by an expert was used as the segmentation ground truth. The training set consisted of 94 CTP and DWI scans from 63 patients. The testing set consisted of 62 CTP scans from 40 patients, for which the DWI images were not provided to participants of the challenge.

Our segmentation framework was implemented with PyTorch (https://pytorch.org) on an NVIDIA TITAN X GPU with 12 GB memory. The weights of all networks were initialized by the Xavier method (Glorot and Bengio, 2010), and the networks were trained with the RMSprop optimizer (Tieleman and Hinton, 2012) for 300 epochs with a batch size of 5. The learning rate was initialized as 0.002 and reduced by a factor of 0.2 after 180 epochs. The parameter setting was: 1.0, 1.0 and 1.2 for the three loss-weighting coefficients, w = 1.5 and D = 50.

To quantitatively evaluate the quality of the generated pseudo DWI images, we measured the Structural Similarity (SSIM) and the Peak Signal-to-Noise Ratio (PSNR) between the DWI ground truth and the generated pseudo DWI. Both metrics were calculated globally (i.e., in the entire image region) and locally (i.e., in the region around the ground truth lesion). The local SSIM and PSNR assess our method's ability to generate high-quality lesion regions in a pseudo DWI image.

For quantitative evaluation of the segmentation accuracy, we use Precision, Recall, Dice score, Hausdorff Distance (HD) and Average Symmetric Surface Distance (ASSD):

$$\mathrm{Dice} = \frac{2TP}{2TP + FN + FP} \qquad (16)$$

where TP, FP and FN are the numbers of true positive, false positive and false negative voxels, respectively.

$$\mathrm{HD} = \max\left\{\max_{s \in S} d(s, G),\; \max_{g \in G} d(g, S)\right\} \qquad (17)$$

$$\mathrm{ASSD} = \frac{1}{|S| + |G|}\left(\sum_{s \in S} d(s, G) + \sum_{g \in G} d(g, S)\right) \qquad (18)$$

where S and G denote the sets of surface points of a segmentation result and the ground truth, respectively, and d(s, G) is the shortest Euclidean distance between a point s ∈ S and all the points in G.
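Eqs. 16-18 can be made concrete with the NumPy sketch below, which computes Dice on binary masks and HD and ASSD on surface point sets. The surface definition (foreground voxels with at least one background face-neighbour), the brute-force distance matrix and all function names are assumptions for illustration; established evaluation tools may extract surfaces differently and scale better to large volumes.

```python
import numpy as np


def dice(pred, gt):
    """Eq. 16: Dice = 2TP / (2TP + FN + FP) for binary masks."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    denom = 2 * tp + fn + fp
    return 1.0 if denom == 0 else 2.0 * tp / denom


def surface_points(mask, spacing):
    """Physical coordinates of foreground voxels with a background face-neighbour.

    Assumption: the surface is the set of boundary voxels under face
    connectivity; other surface definitions are possible.
    """
    mask = np.asarray(mask, bool)
    padded = np.pad(mask, 1, constant_values=False)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    interior = np.ones_like(mask)
    for axis in range(mask.ndim):
        for shift in (-1, 1):
            # Neighbour values along this axis; padding keeps borders False.
            interior &= np.roll(padded, shift, axis=axis)[core]
    boundary = mask & ~interior
    return np.argwhere(boundary) * np.asarray(spacing, float)


def directed_distances(a, b):
    """For each point in a, the shortest Euclidean distance to the set b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def hausdorff(S, G):
    """Eq. 17: symmetric Hausdorff distance between surface point sets."""
    return max(directed_distances(S, G).max(), directed_distances(G, S).max())


def assd(S, G):
    """Eq. 18: average symmetric surface distance."""
    total = directed_distances(S, G).sum() + directed_distances(G, S).sum()
    return total / (len(S) + len(G))


gt = np.zeros((10, 10), dtype=bool)
gt[2:7, 2:7] = True                     # 5 x 5 square lesion
pred = np.zeros_like(gt)
pred[2:7, 3:8] = True                   # same square shifted by one column
S = surface_points(pred, spacing=(1.0, 1.0))
G = surface_points(gt, spacing=(1.0, 1.0))
print(dice(pred, gt), hausdorff(S, G), assd(S, G))
```

In the example the prediction is the ground-truth square shifted by one pixel, so every surface point lies within one pixel of the other surface (HD = 1) and half of the surface points coincide exactly (ASSD = 0.5), while the overlap gives Dice = 0.8.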
4.2. Ablation Studies

We first conducted ablation studies to validate different components of our segmentation framework. Since the ground truth segmentations of the ISLES testing images were not available to participants, we split the official ISLES training set at the patient level into our local training, validation and testing sets, which contained images from 65, 6 and 23 scans, respectively. In this section, we report the experimental results obtained on our local testing images.

4.2.1. Comparison of Different Loss Functions for Pseudo DWI Synthesis

First, we investigated the effect of different loss functions on pseudo DWI synthesis from the perfusion parameter maps F_o, i.e., the concatenation of CBF, CBV, MTT and Tmax. The proposed loss function L_g (Eq. 6), based on a weighted L2 loss and a high-level contextual loss (Eq. 8), is referred to as w-L2 + ℓ_h1. It is compared with 1) L1 loss, where ℓ in Eq. 7 is defined as the L1 norm with A = 1 for every voxel; 2) L2 loss as defined in Eq. 7 with A = 1 for every voxel; 3) w-L2 loss, i.e., Eq. 7 with the weight coefficients defined in Eq. 9; 4) adversarial training with Generative Adversarial Networks, referred to as GAN; 5) L2 + GAN, which combines the L2 loss and the GAN loss; and 6) w-L2 + ℓ_h2, a variant of the proposed L_g with ℓ_h based on the L2 norm. For the GAN method, we used the LSGAN framework proposed by Mao et al. (2017) and a multi-scale discriminator (Ting-Chun Wang et al., 2018) to guide the generator (i.e., UNet) to produce realistic local details and global appearance.

Fig. 6 shows a visual comparison of pseudo DWI generated by UNet trained with different loss functions, where the input images were the perfusion parameter maps (F_o) for all variants. The synthesized pseudo DWI images are shown in the second row. It can be observed that L1 and L2 obtained similar results with ambiguous lesion boundaries, whereas w-L2 and w-L2 + ℓ_h1 both help to obtain clearer lesion boundaries. The results of GAN and L2 + GAN are less smoothed, but include some large artifacts, as highlighted by the light blue arrows. We additionally investigated the effect of the synthesized pseudo DWI images on segmentation, where we used the standard cross entropy loss to train a segmentation model (i.e., UNet (Ronneberger et al., 2015)) with each type of synthesized pseudo DWI images, respectively. The last row in Fig. 6 shows that the segmentation based on the synthesized pseudo DWI images obtained by w-L2 + ℓ_h1 is more accurate than the others, as highlighted by the red arrows.

[Figure 6 image. Panels: input maps CBF, CBV, MTT and Tmax; real DWI and segmentation from real DWI; results of L1, L2, GAN, L2 + GAN, w-L2 and the two w-L2 + ℓ_h variants.]

Figure 6: Visual comparison of pseudo DWI synthesis results (second row) obtained by different loss functions and their effect on segmentation (third row). First row: the concatenation of perfusion parameter maps (F_o) was used as the input of UNet for synthesis. w-L2: weighted L2 loss defined in Eq. 7. w-L2 + ℓ_h1: proposed hybrid loss based on Eq. 6 and Eq. 8. Light blue arrows highlight artifacts obtained by GAN-based methods, and red arrows highlight the segmentation differences. Green and yellow curves show the segmentation result and the ground truth, respectively.

For quantitative evaluation, the global and local SSIM and PSNR of the results obtained by the different synthesis loss functions, together with the Dice scores of the corresponding segmentation results, are presented in Table 1. The proposed w-L2 + ℓ_h1 loss function obtains higher local SSIM and Dice than the others.

Table 1: Quantitative evaluation of different training loss functions for pseudo DWI synthesis and their effect on segmentation. The concatenation of the CTP perfusion parameter maps (F_o) was used as the input for synthesis.

| Loss | Global SSIM | Local SSIM | Global PSNR (dB) | Local PSNR (dB) | Dice (%) |
|---|---|---|---|---|---|
| L1 | 0.82±0.11 | 0.47±0.16 | 19.36±4.11 | 13.60±4.25 | 49.45±21.20 |
| L2 | 0.83±0.11 | 0.51±0.17 | 19.41±3.63 | 13.82±4.18 | 50.04±19.38 |
| GAN | 0.81±0.11 | 0.37±0.17 | 17.57±4.27 | 13.45±4.75 | 41.53±25.08 |
| L2 + GAN | 0.78±0.12 | 0.52±0.14 | 18.30±3.91 | 13.22±4.52 | 48.77±20.73 |
| w-L2 | 0.83±0.09 | 0.53±0.15 | 19.43±3.42 | 13.99±4.01 | 50.95±21.03 |
| w-L2 + ℓ_h1 | 0.83±0.10 | 0.54±0.13 | 19.26±3.32 | 13.20±3.38 | 51.25±17.43 |
| w-L2 + ℓ_h2 | 0.83±0.11 | 0.53±0.15 | 19.22±3.40 | 13.80±3.41 | 51.02±21.41 |
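The global and local PSNR values in Table 1 can be illustrated with the sketch below. PSNR is computed as 10 log10(MAX^2 / MSE), with MAX = 1 assuming intensities scaled to (0, 1) as in the preprocessing described in Section 4.1. The text does not specify the exact "region around the ground truth lesion", so the lesion bounding box enlarged by a fixed margin is a hypothetical choice, as are the function names.

```python
import numpy as np


def psnr(ref, test, data_range=1.0):
    """PSNR in dB, assuming intensities scaled to [0, data_range]."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)


def lesion_box(mask, margin):
    """Slices of the lesion bounding box enlarged by `margin` voxels (hypothetical choice)."""
    coords = np.argwhere(mask)
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + margin + 1, mask.shape)
    return tuple(slice(a, b) for a, b in zip(lo, hi))


real = np.zeros((32, 32))
real[10:14, 10:14] = 1.0               # bright "lesion" in the real DWI
fake = real.copy()
fake[10:14, 10:14] = 0.9               # lesion synthesized 0.1 too dark
mask = real > 0.5

box = lesion_box(mask, margin=2)
print(psnr(real, fake))                # global PSNR
print(psnr(real[box], fake[box]))      # local PSNR around the lesion
```

In this toy example the same lesion error gives about 38.1 dB globally but about 26.0 dB locally, because the error-free background dilutes the mean squared error in the global measure; this illustrates why local PSNR can be much lower than global PSNR, as also seen in Table 1.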
4.2.2. Effect of Feature Extractor on Pseudo DWI Synthesis

To investigate the effect of our feature extractor on the synthesized pseudo DWI, we compared the quality of pseudo DWI images generated from different inputs: 1) the standard CTP perfusion parameter maps (F_o) only, i.e., without using our feature extractor; 2) the concatenation of F_o and our extracted low-level feature F_l defined in Eq. 1; 3) the concatenation of F_o, F_l and F'_h, where F'_h denotes the high-level feature obtained by the CNN-based feature extractor trained without explicit supervision, i.e., L_e is not used; and 4) the concatenation of F_o, F_l and F_h, where F_h is the high-level feature obtained by the feature extractor trained with explicit supervision through L_e. We used the proposed loss function L_g (i.e., w-L2 + ℓ_h1) to train the synthesis network. To additionally investigate how these synthesized results affect the segmentation, we used the standard cross entropy loss to train a UNet (Ronneberger et al., 2015) with each type of synthesized pseudo DWI images, respectively.

Fig. 7 shows a visual comparison of pseudo DWI synthesized from different input images. It can be observed that using the additional F_l and F_h helps to improve the local details of the synthesized pseudo DWI, and the result obtained by the concatenation of F_o, F_l and F_h with explicit supervision has better image quality than the other variants, as highlighted by the green arrows. Table 2 presents a quantitative comparison between these inputs for pseudo DWI synthesis and the downstream segmentation. Using the additional low-level feature F_l improves the global and local SSIM and PSNR compared with using the CTP perfusion parameter maps F_o only. The high-level feature F_h extracted by the CNN and the explicit supervision by L_e can further improve the SSIM and PSNR values, which demonstrates that the proposed feature extractor, by making use of the raw spatiotemporal CTA images, helps to obtain better synthesized pseudo DWI images. Fig. 7 and Table 2 also show that synthesis based on F_o, F_l and F_h leads to higher segmentation accuracy than the other variants.

[Figure 7 image. Panel inputs: F_o; F_o, F_l; F_o, F_l, F'_h; F_o, F_l, F_h; real DWI.]

Figure 7: Visual comparison of pseudo DWI synthesized (top row) from different input images and their effect on segmentation (bottom row). F_o: concatenation of perfusion parameter maps. F_l: MIP of spatiotemporal CTA images. F'_h and F_h are the high-level features obtained by the CNN-based feature extractor trained without and with explicit supervision through L_e, respectively. The proposed hybrid loss function L_g defined in Eq. 6 was used for training. Green arrows highlight local differences of the pseudo DWI, and red arrows highlight the segmentation difference. Green and yellow curves show the segmentation result and the ground truth, respectively.

Table 2: Quantitative evaluation of different inputs for pseudo DWI synthesis and their effect on segmentation. F_o: concatenation of perfusion parameter maps. F_l: MIP of spatiotemporal CTA images. F'_h and F_h are the high-level features obtained by the CNN-based feature extractor trained without and with explicit supervision through L_e, respectively. The proposed hybrid loss function L_g was used for training.

| Input | Global SSIM | Local SSIM | Global PSNR (dB) | Local PSNR (dB) | Dice (%) |
|---|---|---|---|---|---|
| F_o | 0.83±0.10 | 0.54±0.13 | 19.26±3.32 | 13.20±3.38 | 51.25±17.43 |
| F_o, F_l | 0.84±0.12 | 0.56±0.15 | 20.01±3.69 | 13.90±4.04 | 53.94±14.39 |
| F_o, F_l, F'_h | 0.84±0.11 | 0.58±0.16 | 20.16±3.97 | 14.05±4.04 | 54.61±20.19 |
| F_o, F_l, F_h | 0.85±0.12 | 0.59±0.12 | 20.02±3.51 | 14.11±3.98 | 55.10±16.20 |
| Real DWI | - | - | - | - | 72.17±19.54 |
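The low-level feature F_l in Table 2 is described as a MIP of the spatiotemporal CTA images. Assuming it is a plain voxel-wise maximum over the time axis (Eq. 1 may define further details that are not reproduced here), it can be sketched as below; the array layout (T, D, H, W), the toy sizes and the function name `temporal_mip` are hypothetical.

```python
import numpy as np


def temporal_mip(cta, time_axis=0):
    """Voxel-wise maximum intensity projection over the time axis.

    cta: array with a time axis, e.g. shape (T, D, H, W).
    Returns an array with the time axis removed.
    """
    return np.max(cta, axis=time_axis)


rng = np.random.default_rng(0)
T, D, H, W = 47, 5, 64, 64             # roughly the mean time points and slice count
cta = rng.random((T, D, H, W)).astype(np.float32)
cta[20, 2, 30, 30] = 5.0               # contrast peak at one voxel and time point
f_l = temporal_mip(cta)
print(f_l.shape, f_l[2, 30, 30])
```

Because the projection removes the time axis, the result has the same spatial size as the perfusion parameter maps, so it can be concatenated with F_o along the channel axis regardless of the number of CTA time points (43 to 64 in this dataset).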
4.2.3. Comparison of Different Networks for Segmentation

To investigate the effect of network structure on our ischemic stroke lesion segmentation task, we compared our proposed SLNet with 1) SLNet w/o SE, where the SE blocks are not used in SLNet; 2) SLNet w/o SN, where the switchable normalization layers in SLNet are replaced with traditional batch normalization layers; 3) the Fully Convolutional Network (FCN) (Long et al., 2015); 4) UNet (Ronneberger et al., 2015); 5) Recurrent Residual UNet (R2UNet) (Alom et al., 2018); and 6) Residual UNet (ResUNet) (Xiao et al., 2018). We trained these networks with the CTP perfusion parameter maps F_o as input and used the cross entropy loss function for training.

Fig. 8 shows a visual comparison of segmentation results obtained by these networks, where the lesions are shown with the corresponding real DWI images for better visualization. It can be observed that it is challenging for all these networks to obtain a very accurate segmentation of the ischemic stroke lesion. However, the results of our SLNet have a better overlap with the ground truth than the others. In the first row, the difference between the networks is relatively small. In the second row, SLNet w/o SE, SLNet w/o SN and UNet obtained more under-segmentation than SLNet, while FCN, R2UNet and ResUNet obtained more over-segmentation than SLNet. A quantitative comparison between these networks is shown in Table 3. The proposed SLNet achieved the highest average Dice score and Recall among all the compared networks, while SLNet w/o SE achieved slightly better HD and ASSD.

Table 3: Quantitative evaluation of different networks for ischemic stroke lesion segmentation. SLNet: the proposed network for ischemic stroke lesion segmentation. The concatenation of the CTP perfusion parameter maps (F_o) was used as the input, and the cross entropy loss function was used for training.

| Network | Parameters (M) | Precision (%) | Recall (%) | Dice (%) | HD (mm) | ASSD (mm) |
|---|---|---|---|---|---|---|
| FCN (Long et al., 2015) | 18.64 | 52.69±25.28 | 53.10±33.47 | 45.75±24.59 | 48.75±30.83 | 3.78±5.37 |
| UNet (Ronneberger et al., 2015) | 31.04 | 63.50±22.22 | 48.50±24.55 | 49.94±19.51 | 21.52±13.84 | 2.54±3.34 |
| R2UNet (Alom et al., 2018) | 39.09 | 51.85±17.99 | 60.05±21.14 | 52.34±16.62 | 26.74±15.04 | 2.56±2.21 |
| ResUNet (Xiao et al., 2018) | 81.91 | 66.46±19.32 | 52.36±24.12 | 52.49±18.66 | 23.50±15.48 | 2.22±2.04 |
| SLNet | 33.84 | 51.20±22.00 | 64.20±23.99 | 54.45±21.23 | 23.97±17.61 | 2.74±3.44 |
| SLNet (w/o SE) | 31.04 | 69.00±22.90 | 51.54±19.81 | 53.41±14.31 | 21.26±12.32 | 2.16±1.90 |
| SLNet (w/o SN) | 33.84 | 68.14±21.08 | 48.32±23.31 | 52.01±20.69 | 21.50±15.63 | 2.57±2.78 |

[Figure 8 image. Legend: segmentation, ground truth. Columns: SLNet, SLNet w/o SE, SLNet w/o SN, FCN, UNet, R2UNet, ResUNet.]

Figure 8: Visual comparison of different networks for ischemic stroke lesion segmentation. The concatenation of the CTP perfusion parameter maps (F_o) was used as the input of the CNNs, and the cross entropy loss function was used for training. For better visualization, the segmentation results are shown with the real DWI images.

4.2.4. Comparison of Different Training Loss Functions for Segmentation

We also investigated the effect of different training loss functions for the segmentation network. We refer to our proposed weighted cross entropy loss with hardness-aware generalized Dice loss as L_WCE + L_HGD and compare it with 1) the cross entropy loss L_CE; 2) the Dice loss L_DICE (Milletari et al., 2016); 3) the generalized Dice loss L_GD (Sudre et al., 2017); 4) the hardness-weighted L_GD, which is defined in Eq. 13 and referred to as L_HGD; and 5) a variant of the proposed loss that does not pay attention to the lesion foreground (i.e., A is 1 for every voxel), which is referred to as L_CE + L_HGD. We used each of these loss functions to train our SLNet to segment the ischemic stroke lesion from the CTP perfusion parameter maps F_o.

Quantitative evaluation results of these segmentation loss functions are listed in Table 4. It can be observed that the combination of L_CE and L_HGD outperforms using a single loss of L_CE or L_HGD. By enabling the network to focus more on the lesion region through L_WCE + L_HGD, the Recall and Dice values are improved. Our proposed L_WCE + L_HGD achieved the highest average Dice score of 59.37%, a large improvement over the 54.45% achieved by the L_CE baseline.

Table 4: Quantitative evaluation of different training loss functions for ischemic stroke lesion segmentation based on our proposed SLNet. The concatenation of perfusion parameter maps (F_o) was used as the input. L_CE: cross entropy loss. L_WCE: weighted cross entropy loss. L_DICE: Dice loss. L_GD: generalized Dice loss. L_HGD: hardness-aware generalized Dice loss.

| Loss function | Precision (%) | Recall (%) | Dice (%) | HD (mm) | ASSD (mm) |
|---|---|---|---|---|---|
| L_CE | 51.20±22.00 | 64.20±23.99 | 54.45±21.23 | 23.97±17.61 | 2.74±3.44 |
| L_DICE | 67.48±25.25 | 45.45±21.81 | 51.57±20.98 | 24.43±18.48 | 2.47±2.56 |
| L_GD | 55.07±22.10 | 62.13±17.09 | 54.98±17.16 | 36.87±29.61 | 3.45±3.37 |
| L_HGD | 52.40±20.71 | 66.83±18.98 | 55.30±17.78 | 30.86±20.10 | 2.81±2.61 |
| L_CE + L_HGD | 57.31±22.87 | 66.37±17.85 | 57.82±17.33 | 21.58±12.69 | 1.96±1.84 |
| L_WCE + L_HGD | 55.20±20.99 | 73.51±17.48 | 59.37±15.73 | 22.29±13.67 | 1.90±2.05 |
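For reference, the generalized Dice loss L_GD (Sudre et al., 2017) compared in Table 4 can be sketched in NumPy as below. This is the standard formulation with inverse-squared-volume class weights; it is not the hardness-aware L_HGD of Eq. 13, which is not reproduced here, and the function name and the smoothing term `eps` are assumptions.

```python
import numpy as np


def generalized_dice_loss(probs, onehot, eps=1e-6):
    """Generalized Dice loss (Sudre et al., 2017).

    probs, onehot: arrays of shape (L, N) with predicted probabilities and
    one-hot ground truth for L classes over N voxels.
    Each class is weighted by the inverse square of its ground-truth volume,
    so the small lesion class is not swamped by the background.
    """
    probs = np.asarray(probs, float)
    onehot = np.asarray(onehot, float)
    w = 1.0 / (onehot.sum(axis=1) ** 2 + eps)          # per-class weights
    intersect = (w * (onehot * probs).sum(axis=1)).sum()
    union = (w * (onehot + probs).sum(axis=1)).sum()
    return 1.0 - 2.0 * intersect / (union + eps)


n_vox, n_lesion = 100, 5
gt = np.zeros(n_vox, dtype=int)
gt[:n_lesion] = 1                                      # 5% lesion voxels
onehot = np.stack([gt == 0, gt == 1]).astype(float)    # shape (2, N)
perfect = onehot.copy()
all_background = np.stack([np.ones(n_vox), np.zeros(n_vox)])
print(generalized_dice_loss(perfect, onehot))
print(generalized_dice_loss(all_background, onehot))
```

Predicting background everywhere is 95% voxel-accurate in this toy example, yet the loss stays above 0.9 because the lesion class carries a much larger weight than the background.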
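4.2.5. Effect of Feature Extractor and Pseudo DWI Generator on Segmentation

With our proposed feature extraction and image synthesis method, we evaluate the value of our pseudo DWI generated from F_o, F_l and F_h for ischemic stroke lesion segmentation, where this pseudo DWI is referred to as DWI^{o,l,h}. We compared segmentation from DWI^{o,l,h} with segmentation from 1) the raw CTA images that were temporally cropped and down-sampled (as described in Section 3.1); 2) the CTP perfusion parameter maps F_o; 3) DWI^o, i.e., the pseudo DWI generated from F_o; 4) DWI^{o,l}, i.e., the pseudo DWI generated from F_o and F_l; and 5) the concatenation of F_o and DWI^{o,l,h}. We used each of these settings for end-to-end training, where the overall loss function in Eq. 15 combined with our SLNet was used for segmentation. We also compared DWI^{o,l,h} with its variant DWI^{o,l,h}(s), for which our feature extractor, pseudo DWI generator and segmentation network were trained sequentially rather than end-to-end. Additionally, we trained SLNet with real DWI images to investigate the gap between segmentation from synthesized pseudo DWI images and from real DWI images.

Fig. 9 presents a visual comparison between ischemic stroke lesion segmentation results from different input images, which shows that the results segmented from our synthesized pseudo DWI images are better than those of the other variants. Table 5 presents the quantitative evaluation results. It shows that using DWI^o generated from the CTP perfusion parameter maps leads to a slightly decreased segmentation accuracy. By using the additional features F_l and F_h extracted from the raw spatiotemporal CTA images for synthesis, DWI^{o,l} and DWI^{o,l,h} each lead to an improvement of the Dice score. Table 5 shows that using DWI^{o,l,h} outperformed the other variants. The average Dice scores for segmentation from the original CTA images, the perfusion parameter maps (i.e., F_o), the synthesized pseudo DWI based on our proposed method (i.e., DWI^{o,l,h}) and the real DWI are 56.10%, 59.37%, 62.11% and 79.72%, respectively. The corresponding Hausdorff Distance values are 25.25 mm, 22.29 mm, 19.27 mm and 15.90 mm, respectively. We found that adding F_o to DWI^{o,l,h} leads to a reduced segmentation performance compared with using DWI^{o,l,h} only. This is because using F_o performs worse than using DWI^{o,l,h}, and a combination of them only obtains a segmentation accuracy above that of using F_o and below that of using DWI^{o,l,h}. It can be observed from Table 5 that DWI^{o,l,h} and DWI^{o,l,h}(s) obtained very close segmentation accuracy in terms of Dice. However, DWI^{o,l,h} achieved smaller HD and ASSD values than DWI^{o,l,h}(s).

As the ischemic stroke lesions vary largely in size, we investigated the segmentation performance at different lesion scales. We divided the local testing set into three groups: 1) 9 images with small lesions (< 10 CC), 2) 10 images with medium lesions (10-50 CC) and 3) 4 images with large lesions (> 50 CC). For evaluation, we additionally measured the Relative Volume Error (RVE):

$$\mathrm{RVE} = \frac{|V_g - V_s|}{V_g}$$

where V_g and V_s are the volumes of a ground truth lesion and the segmented lesion, respectively. Table 5 shows that DWI^{o,l,h} obtained a lower average RVE value than the others except for the real DWI. Fig. 10 shows the distributions of Dice and RVE in these three groups. The average Dice values achieved by our proposed method (i.e., DWI^{o,l,h}) for these three groups were 59.50%, 68.87% and 56.44%, respectively. The lower performance in the small and large groups indicates that it remains difficult for the proposed method to deal with extreme cases with small and very large lesions.

[Figure 9 image. Columns: CTA, F_o, DWI^o, DWI^{o,l}, DWI^{o,l,h}, DWI^{o,l,h}(s), F_o + DWI^{o,l,h}, real DWI.]

Figure 9: Visual comparison of ischemic stroke lesion segmentation from different input images. Yellow and green curves show the segmentation and the ground truth, respectively. F_o: CTP perfusion parameter maps. DWI^o, DWI^{o,l} and DWI^{o,l,h} are pseudo DWI images generated from F_o, (F_o, F_l) and (F_o, F_l, F_h), respectively. DWI^{o,l,h}(s) is a variant of DWI^{o,l,h} where our feature extractor, pseudo DWI generator and segmentation network were trained sequentially rather than end-to-end. For better visualization, the segmentation results are shown with the real DWI images.

Table 5: Quantitative comparison of ischemic stroke lesion segmentation from different input images. F_o: perfusion parameter maps. DWI^{o,l,h} is our proposed method with pseudo DWI synthesized from (F_o, F_l, F_h) as shown in Fig. 2. DWI^{o,l,h}(s) is a variant of DWI^{o,l,h} where our feature extractor, pseudo DWI generator and segmentation network were trained sequentially rather than end-to-end. The results are based on our proposed SLNet and the loss function L_s defined in Eq. 11.

| Input | Precision (%) | Recall (%) | Dice (%) | HD (mm) | ASSD (mm) | RVE |
|---|---|---|---|---|---|---|
| CTA | 65.63±23.08 | 58.10±18.63 | 56.10±14.22 | 25.25±16.60 | 2.45±2.26 | 0.73±0.74 |
| F_o | 55.20±20.99 | 73.51±17.48 | 59.37±15.73 | 22.29±13.67 | 1.90±2.05 | 0.83±1.27 |
| DWI^o | 53.68±21.27 | 74.37±16.11 | 58.32±15.74 | 19.83±13.10 | 1.90±2.09 | 0.99±1.63 |
| DWI^{o,l} | 58.70±22.32 | 71.04±14.75 | 60.49±16.26 | 22.79±16.90 | 1.99±2.07 | 0.83±1.34 |
| DWI^{o,l,h} | 61.97±21.98 | 69.52±17.89 | 62.11±17.18 | 19.27±13.17 | 1.76±2.10 | 0.68±1.36 |
| DWI^{o,l,h}(s) | 57.05±20.76 | 77.80±13.56 | 62.23±15.47 | 20.86±15.05 | 1.84±1.97 | 0.91±1.60 |
| F_o + DWI^{o,l,h} | 59.06±22.25 | 71.30±15.46 | 60.54±17.17 | 22.42±19.58 | 2.07±2.95 | 0.83±1.33 |
| Real DWI | 85.07±19.35 | 77.34±15.03 | 79.72±15.53 | 15.90±14.13 | 1.35±2.77 | 0.24±0.24 |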
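The RVE and the lesion-size grouping above can be sketched as follows. Volumes are converted to CC (1 CC = 1000 mm^3) from the voxel count and the voxel spacing; assigning volumes of exactly 10 and 50 CC to the medium group, the toy spacing (1 mm in-plane with the mean 9.48 mm slice spacing) and the function names are assumptions.

```python
import numpy as np


def volume_cc(mask, spacing_mm):
    """Lesion volume in cubic centimetres (1 CC = 1000 mm^3)."""
    return np.count_nonzero(mask) * float(np.prod(spacing_mm)) / 1000.0


def relative_volume_error(v_gt, v_seg):
    """RVE = |V_g - V_s| / V_g."""
    return abs(v_gt - v_seg) / v_gt


def size_group(v_gt):
    """Lesion-scale group by ground-truth volume; boundary handling is an assumption."""
    if v_gt < 10.0:
        return "small"
    if v_gt <= 50.0:
        return "medium"
    return "large"


spacing = (1.0, 1.0, 9.48)              # hypothetical (x, y, slice) spacing in mm
gt = np.zeros((64, 64, 5), dtype=bool)
gt[20:40, 20:40, 1:4] = True            # 1200 voxels
seg = np.zeros_like(gt)
seg[22:40, 20:40, 1:4] = True           # 1080 voxels: under-segmentation
v_g, v_s = volume_cc(gt, spacing), volume_cc(seg, spacing)
rve = relative_volume_error(v_g, v_s)
print(round(v_g, 3), round(v_s, 3), round(rve, 3), size_group(v_g))
```

Because RVE is normalized by the ground-truth volume, the same absolute volume error weighs far more for a small lesion than for a large one.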
4.3. Comparison with Other ISLES Participants

We also trained our proposed method with the entire ISLES 2018 training set, and submitted the segmentation results of the ISLES 2018 testing set to the online evaluation platform (https://www.smir.ch/ISLES/Start2018) for quantitative evaluation. According to the ISLES 2018 leaderboard (listed in the 'Results' section of http://www.isles-challenge.org), our method achieved the top performance among 62 teams. Table 6 lists the quantitative evaluation results of the top five methods for ISLES 2018, where our method outperformed the others with an average Dice score of 0.51. Liu (2018) also used a CNN to generate pseudo DWI for segmentation, but only from CTP perfusion parameter maps with GAN, and the achieved Dice and Recall are lower than ours. The other three methods segmented the ischemic stroke lesion from CTP perfusion parameter maps directly. Chen et al. (http://www.isles-challenge.org/articles/Yu_Chen.pdf) used an ensemble of multiple networks combined with several data augmentation methods. Hu et al. (http://www.isles-challenge.org/articles/Xiaojun_Hu.pdf) proposed a multi-level 3D refinement module trained with curriculum learning. Clerigues et al. (http://www.isles-challenge.org/articles/albert.pdf) also used an ensemble of multiple networks, and employed a patch sampling strategy to alleviate class imbalance.

Table 6: Quantitative comparison of the top five methods on the ISLES 2018 testing set.

| Method | Dice | Precision | Recall |
|---|---|---|---|
| Ours | 0.51±0.31 | 0.55±0.36 | 0.55±0.34 |
| Liu (2018) | 0.49±0.31 | 0.56±0.37 | 0.53±0.33 |
| Chen et al. | 0.48±0.32 | 0.59±0.38 | 0.46±0.33 |
| Hu et al. | 0.47±0.31 | 0.56±0.37 | 0.47±0.33 |
| Garcia et al. | 0.47±0.31 | 0.56±0.37 | 0.47±0.33 |

[Figure 10 image. Dice and RVE plotted for each input type in three panels: (a) small lesions (< 10 CC), (b) medium lesions (10 to 50 CC), (c) large lesions (> 50 CC).]

Figure 10: Dice and RVE for lesions at three scales segmented from different types of images. F_o: perfusion parameter maps. DWI^{o,l,h} is our proposed method with pseudo DWI synthesized from (F_o, F_l, F_h) as shown in Fig. 2. DWI^{o,l,h}(s) is a variant of DWI^{o,l,h} where our feature extractor, pseudo DWI generator and segmentation network were trained sequentially rather than end-to-end. The results are based on our proposed SLNet and the loss function L_s.

5. Discussion and Conclusion

Due to the low contrast and low resolution of CTP perfusion parameter maps, it is challenging to use these images directly for ischemic stroke lesion segmentation. Transferring the perfusion parameter maps to pseudo DWI images via image synthesis is a promising approach for the segmentation task, as DWI images have a better contrast between the lesion and the background and are used for obtaining the ground truth ischemic stroke lesion region. The ISLES 2018 finalists and our experiments showed that pseudo DWI-based segmentation methods outperformed direct segmentation from perfusion parameter maps.

The quality of the synthesized pseudo DWI images has a large impact on the segmentation performance. A good contrast with enhanced and preserved lesion information in the pseudo DWI is important for good segmentation results. Though deep learning for image synthesis has achieved very good performance in other tasks (Frangi et al., 2018), the synthesis of pseudo DWI with ischemic stroke lesions in this study is still challenging due to the low quality of the perfusion parameter maps and the small number of training images. To alleviate this problem, we used two strategies. First, we exploited information in the raw spatiotemporal CTA images by extracting low-level and high-level features in addition to the perfusion parameter maps. This helps to obtain higher pseudo DWI quality and higher segmentation accuracy than using perfusion parameter maps only, as demonstrated in Table 2 and Table 5. From Fig. 7 and Table 2, we find that using an explicit supervision on the feature extractor leads to some improvement of segmentation accuracy, but the difference was not significant. This is expected, as the explicit supervision serves as a deep supervision: when it is not used, the feature extractor can still be updated through the overall loss function, and the deep supervision mainly helps to improve the convergence during training. Second, we designed a weighted loss function that pays attention to the lesion region so that the quality of the generated lesion is emphasized. It is combined with a high-level contextual loss function that encourages global and high-level consistency between the generated pseudo DWI and the ground truth DWI. Results in Table 1 show that this leads to an improvement of the local SSIM around the lesion region. However, we found that our synthesized pseudo DWI images are still not as good as the real DWI images. For example, Table 1 and Table 2 indicate that the PSNR values are not very high. This is mainly because the high-frequency components in the real DWI images are not well synthesized, as shown in Fig. 6 and Fig. 7. The high-frequency components are related to local fine-grained details, noise and some artifacts. As demonstrated by Xu et al. (2019), CNNs capture low-frequency components at the early stage of training, and then capture high-frequency components and tend to overfit at the late stage of training. During training with our relatively small dataset, we used the best-performing checkpoint on the validation set for testing to minimize the risk of under-fitting or over-fitting. As an incidental effect, we found that the synthesized pseudo DWI images from that checkpoint did not have many high-frequency components. Further improving the pseudo DWI quality is a promising way to obtain better segmentation results. As the synthesized pseudo DWI and real DWI can be regarded as coming from two different domains, domain adaptation methods (Perone et al., 2019) could be used in the future to obtain better segmentation performance with pseudo DWI.

For segmentation networks, by using switchable normalization and the SE block based on channel attention, the segmentation Dice and Recall are improved with a marginal increase in the number of parameters, as shown in Table 3. The loss function for training the segmentation network also has a large impact on the segmentation performance. Our weighted cross entropy loss function L_WCE pays more attention to the lesion region and helps to alleviate the imbalance between the foreground and the background. The hardness-aware generalized Dice loss L_HGD automatically gives higher weights to harder samples. A combination of L_WCE and L_HGD considers pixel-wise and region-level accuracy simultaneously, which leads to better Dice, Recall and ASSD than the other variants, as shown in Table 4. It should be noted that the Hausdorff distance of our results is still high. Hausdorff distance-based loss functions (Kervadec et al., 2019a) or high-level constraints (Oktay et al., 2018) are potential solutions to this problem.

Our high-level feature extraction, pseudo DWI generation and lesion segmentation modules are trained end-to-end, so that they are updated simultaneously and adapt to each other with a high coherence. This makes the training process more efficient than training these modules sequentially. Results in Fig. 9 and Table 5 show that the end-to-end training also benefits the final segmentation performance. However, a drawback of end-to-end training is that these modules become less portable, as a change of the segmentation network requires the whole system to be trained again. Sequential training would make the system more modular and is preferred in a scenario where some of these modules often need to be replaced. For example, the segmentation network could then be replaced when more training images become available, without retraining the feature extractor and the pseudo DWI generator. In this paper, as the training set was small and fixed during the study, we chose the end-to-end training strategy due to its efficiency and better segmentation performance.

Comparing Table 5 and Table 6, we observe a performance drop between our local testing set and the official testing set of ISLES 2018. This indicates some overfitting of the proposed method, which could be attributed to a couple of reasons. First, the training set was relatively small and each image only contained 5.34 slices on average. Second, our method relies on image synthesis as an intermediate step, and there might be a domain shift between synthesized pseudo DWI images and real DWI images. The two steps of synthesis and segmentation are prone to accumulating prediction errors, which increases the possibility of overfitting. To deal with this problem, advanced data augmentation methods (Abdulkadir et al., 2016; Frid-Adar et al., 2018) and additional regularizations such as auxiliary tasks (Myronenko, 2018) and volume constraints (Kervadec et al., 2019b) could be potential approaches. Fig. 10 shows that the proposed method did not segment large lesions well, mainly because the large lesion group contained only a few cases (i.e., 4 images for testing), which is too few for a statistically reliable evaluation of that group. In the future, a larger dataset could be used for a better evaluation.

In conclusion, to deal with the problem of ischemic stroke lesion segmentation from CTP images, we propose a novel framework using synthesized pseudo DWI images for better segmentation results. We propose a feature extractor that obtains both a low-level and a high-level compact representation of the raw spatiotemporal CTA images, and combine them with the CTP perfusion parameter maps for better pseudo DWI synthesis quality. We also propose to pay more attention to the lesion region and to encourage high-level similarity for the synthesis of pseudo DWI with stroke lesions. A network with switchable normalization and channel calibration, trained with a hardness-aware generalized Dice loss, is proposed for the final segmentation from the synthesized pseudo DWI. Extensive experimental results on the ISLES 2018 dataset showed that our method using synthesized pseudo DWI outperformed methods using CTA images or perfusion parameter maps directly for ischemic stroke lesion segmentation, and demonstrated that our feature extractor helps to obtain better synthesized pseudo DWI quality, which leads to higher segmentation accuracy. The proposed automatic segmentation framework has the potential to improve the timely diagnosis and treatment of ischemic stroke, especially in acute care units with limited availability of DWI scanning.

6. Acknowledgements

This work was supported by the National Natural Science Foundation of China [81771921, 61901084].

References

Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O., 2016. 3D U-Net: Learning dense volumetric segmentation from sparse annotation, in: MICCAI, pp. 424–432.
Abulnaga, S.M., Rubin, J., 2018. Ischemic stroke lesion segmentation in CT perfusion scans using pyramid pooling and focal loss, in: Int. MICCAI Brainlesion Work., pp. 352–363.
Alom, M.Z., Hasan, M., Yakopcic, C., Taha, T.M., Asari, V.K., 2018. Recurrent residual convolutional neural network based on U-Net (R2U-Net) for medical image segmentation. arXiv Prepr. arXiv1802.06955.
Bahrami, K., Shi, F., Zong, X., Shin, H.W., An, H., Shen, D., 2016. Reconstruction of 7T-like images from 3T MRI. IEEE Trans. Med. Imaging 35, 2085–2097.
Burgos, N., Cardoso, M.J., Thielemans, K., Modat, M., Pedemonte, S., Dickson, J., Barnes, A., Ahmed, R., Mahoney, C.J., Schott, J.M., Duncan, J.S., Atkinson, D., Arridge, S.R., Hutton, B.F., Ourselin, S., 2014. Attenuation correction synthesis for hybrid PET-MR scanners: application to brain studies. IEEE Trans. Med. Imaging 33, 2332–2341.
Chartsias, A., Joyce, T., Giuffrida, M.V., Tsaftaris, S.A., 2017. Multimodal MR synthesis via modality-invariant latent representation. IEEE Trans. Med. Imaging 37, 803–814.
Cui, W., Liu, Y., Li, Y., Guo, M., Li, Y., Li, X., Wang, T., Zeng, X., Ye, C., 2019. Semi-supervised brain lesion segmentation with an adapted mean teacher model, in: IPMI, pp. 554–565.
Dolz, J., Ben Ayed, I., Desrosiers, C., 2018. Dense multi-path U-net for ischemic stroke lesion segmentation in multiple image modalities, in: Int. MICCAI Brainlesion Work., pp. 271–282.
Donahue, J., Wintermark, M., 2015. Perfusion CT and acute stroke imaging: foundations, applications, and literature review. J. Neuroradiol. 42, 21–29.
Feng, C., Zhao, D., Huang, M., 2015. Segmentation of ischemic stroke lesions in multi-spectral MR images using weighting suppressed FCM and three phase level set, in: Int. Work. Brainlesion Glioma, Mult. Sclerosis, Stroke Trauma. Brain Inj., pp. 233–245.
Frangi, A.F., Tsaftaris, S.A., Prince, J.L., 2018. Simulation and synthesis in medical imaging. IEEE Trans. Med. Imaging 37, 673–679.
Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan, H., 2018. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321, 321–
Ghosh, A., Kumar, H., Sastry, P.S., 2017. Robust loss functions under label noise for deep neural networks, in: AAAI, pp. 1919–1925.
Gillebert, C.R., Humphreys, G.W., Mantini, D., 2014. Automated delineation of stroke lesions using brain CT images. NeuroImage Clin. 4, 540–548.
Glorot, X., Bengio, Y., 2010. Understanding the difficulty of training deep feedforward neural networks, in: AISTATS, pp. 249–256.
González, R.G., Hirsch, J.A., Lev, M.H., Schaefer, P.W., Schwamm, L.H., 2011. Acute ischemic stroke: imaging and intervention. Springer, Berlin, Heidelberg.
Hu, J., Shen, L., Sun, G., 2018. Squeeze-and-excitation networks, in: CVPR, pp. 7132–7141.
Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., Maier-Hein, K.H., 2018. No new-net, in: Int. MICCAI Brainlesion Work., pp. 234–244.
Jog, A., Carass, A., Roy, S., Pham, D.L., Prince, J.L., 2017. Random forest regression for magnetic resonance image synthesis. Med. Image Anal. 35, 475–488.
Kabir, Y., Dojat, M., Scherrer, B., Forbes, F., Garbay, C., 2007. Multimodal MRI segmentation of ischemic stroke lesions, in: EMBS, pp. 1595–1598.
Kamnitsas, K., Ledig, C., Newcombe, V.F.J., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D., Glocker, B., 2017. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 36, 61–78.
Ker, J., Wang, L., Rao, J., Lim, T., 2017. Deep learning applications in medical image analysis. IEEE Access 6, 9375–9389.
Kervadec, H., Bouchtiba, J., Desrosiers, C., Granger, E., Dolz, J., Ayed, I.B., 2019a. Boundary loss for highly unbalanced segmentation, in: Int. Conf. Med. Imaging with Deep Learn., pp. 285–296.
Kervadec, H., Dolz, J., Tang, M., Granger, E., Boykov, Y., Ben Ayed, I., 2019b. Constrained-CNN losses for weakly supervised segmentation. Med. Image Anal. 54, 88–99.
…sumi, T., Fujii, K., Katada, K., Toyama, H., 2018. Preliminary study of time maximum intensity projection computed tomography imaging for the detection of early ischemic change in patient with acute ischemic stroke. Medicine (Baltimore). 97, e9906.
Myronenko, A., 2018. 3D MRI brain tumor segmentation using autoencoder regularization, in: Int. MICCAI Brainlesion Work., pp. 349–356.
Nguyen, H.V., Zhou, K., Vemulapalli, R., 2015. Cross-domain synthesis of medical images using efficient location-sensitive deep network, in: MICCAI, pp. 677–684.
Nie, D., Trullo, R., Lian, J., Wang, L., Petitjean, C., Ruan, S., Wang, Q., Shen, D., 2018. Medical image synthesis with deep convolutional adversarial networks. IEEE Trans. Biomed. Eng. 65, 2720–2730.
Oktay, O., Ferrante, E., Kamnitsas, K., Heinrich, M., Bai, W., Caballero, J., Cook, S., Marvao, A.D., Dawes, T., Regan, D.O., Kainz, B., Glocker, B., Rueckert, D., 2018. Anatomically constrained neural networks (ACNN): Application to cardiac image enhancement and segmentation. IEEE Trans. Med. Imaging 37, 384–395.
Perone, C.S., Ballester, P., Barros, R.C., Cohen-Adad, J., 2019. Unsupervised domain adaptation for medical imaging segmentation with self-ensembling. Neuroimage 194, 1–11.
Pinheiro, G.R., Voltoline, R., Bento, M., Rittner, L., 2018. V-Net and U-Net for ischemic stroke lesion segmentation in a small dataset of perfusion data, in: Int. MICCAI Brainlesion Work., pp. 301–309.
Rekik, I., Allassonnière, S., Carpenter, T.K., Wardlaw, J.M., 2012.
Medical Kissela, B.M., Khoury, J.C., Alwell, K., Moomaw, C.J., Woo, D., Adeoye, O., image analysis methods in MR/CT-imaged acute-subacute ischemic stroke Flaherty, M.L., Khatri, P., Ferioli, S., De Los Rios La Rosa, F., Broderick, lesion: Segmentation, prediction and insights into dynamic evolution simu- J.P., Kleindorfer, D.O., 2012. Age at stroke: Temporal trends in stroke inci- lation models. A critical appraisal. NeuroImage Clin. 1, 164–178. dence in a large, biracial population. Neurology 79, 1781–1787. Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: Convolutional networks Li, X., Chen, H., Qi, X., Dou, Q., Fu, C.W., Heng, P.A., 2018. H-DenseUNet: for biomedical image segmentation, in: MICCAI, pp. 234–241. Hybrid densely connected UNet for liver and liver tumor segmentation from Roy, S., Carass, A., Shiee, N., Pham, D.L., Prince, J.L., 2010. MR contrast CT volumes. IEEE Trans. Med. Imaging 37, 2663–2674. synthesis for lesion segmentation, in: ISBI, IEEE. pp. 932–935. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollar, P., 2017. Focal loss for dense Shen, D., Wu, G., Suk, H.I., 2017. Deep learning in medical image analysis. object detection, in: ICCV, pp. 2980–2988. Annu. Rev. Biomed. Eng. 19, 221–248. Liu, P., 2018. Stroke lesion segmentation with 2D novel CNN pipeline and Song, T., Huang, N., 2018. Integrated extractor, generator and segmentor for novel loss function, in: Int. MICCAI Brainlesion Work., pp. 253–262. ischemic stroke lesion segmentation, in: Int. MICCAI Brainlesion Work., Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks for pp. 310–318. semantic segmentation, in: CVPR, pp. 3431–3440. Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Cardoso, M.J., 2017. Gen- Luo, P., Ren, J., Peng, Z., Zhang, R., Li, J., 2018. Di erentiable learning- eralised Dice overlap as a deep learning loss function for highly unbalanced to-normalize via switchable normalization. arXiv Prepr. arXiv1806.10779 segmentations, in: Deep Learn. Med. 
Image Anal. Multimodal Learn. Clin. . Decis. Support, pp. 240–248. Maier, O., Menze, B.H., von der Gablentz, J., Hani, ¨ L., Heinrich, M.P., Szegedy, C., Vanhoucke, V., Io e, S., Shlens, J., Wojna, Z., 2016. Rethinking Liebrand, M., Winzeck, S., Basit, A., Bentley, P., Chen, L., Christiaens, D., the Inception Architecture for Computer Vision, in: CVPR, pp. 2818–2826. Dutil, F., Egger, K., Feng, C., Glocker, B., Gotz, ¨ M., Haeck, T., Halme, H.L., Tieleman, T., Hinton, G., 2012. Lecture 6.5-RMSProp, COURSERA: Neural Havaei, M., Iftekharuddin, K.M., Jodoin, P.M., Kamnitsas, K., Kellner, E., networks for machine learning. Technical Report. University of Toronto. Korvenoja, A., Larochelle, H., Ledig, C., Lee, J.H., Maes, F., Mahmood, Q., Ting-Chun Wang, Liu, M.Y., Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Maier-Hein, K.H., McKinley, R., Muschelli, J., Pal, C., Pei, L., Rangarajan, Catanzaro, 2018. High-resolution image synthesis and semantic manipu- J.R., Reza, S.M., Robben, D., Rueckert, D., Salli, E., Suetens, P., Wang, lation with conditional GANs, in: CVPR, pp. 8798–8807. C.W., Wilms, M., Kirschke, J.S., Kramer ¨ , U.M., Munte, ¨ T.F., Schramm, P., Vikas Kumar Anand, Khened, M., Alex, V., Krishnamurthi, G., 2018. Fully Wiest, R., Handels, H., Reyes, M., 2017. ISLES 2015 - A public evaluation automatic segmentation for ischemic stroke using CT perfusion maps, in: benchmark for ischemic stroke lesion segmentation from multispectral MRI. Int. MICCAI Brainlesion Work., pp. 328–334. Med. Image Anal. 35, 250–269. Winzeck, S., Hakim, A., McKinley, R., Pinto, J.A., Alves, V., Silva, C., Pisov, Maier, O., Schroder ¨ , C., Forkert, N.D., Martinetz, T., Handels, H., 2015. Clas- M., Krivov, E., Belyaev, M., Monteiro, M., Oliveira, A., Choi, Y., Paik, sifiers for ischemic stroke lesion segmentation: a comparison study. PLoS M.C., Kwon, Y., Lee, H., Kim, B.J., Won, J.H., Islam, M., Ren, H., Robben, One 10, e0145118. 
D., Suetens, P., Gong, E., Niu, Y., Xu, J., Pauly, J.M., Lucas, C., Heinrich, Maier, O., Wilms, M., von der Gablentz, J., Kramer ¨ , U., Handels, H., 2014. Is- M.P., Rivera, L.C., Castillo, L.S., Daza, L.A., Beers, A.L., Arbelaezs, P., chemic stroke lesion segmentation in multi-spectral MR images with support Maier, O., Chang, K., Brown, J.M., Kalpathy-Cramer, J., Zaharchuk, G., vector machine classifiers, in: SPIE Med. Imaging 2014 Comput. Diagnosis, Wiest, R., Reyes, M., 2018. ISLES 2016 and 2017-benchmarking ischemic p. 903504. stroke lesion outcome prediction based on multispectral MRI. Front. Neurol. Mao, X., Li, Q., Xie, H., Lau, R.Y.K., Wang, Z., 2017. Least squares generative 9, 679. adversarial networks, in: ICCV, pp. 2794–2802. Xiao, X., Lian, S., Luo, Z., Li, S., 2018. Weighted Res-UNet for high- Mezzapesa, D.M., Petruzzellis, M., Lucivero, V., Prontera, M., Tinelli, A., San- quality retina vessel segmentation, in: Int. Conf. Inf. Technol. Med. Educ., cilio, M., Carella, A., Federico, F., 2006. Multimodal MR examination in Hangzhou. pp. 327–331. acute ischemic stroke. Neuroradiology 48, 238–246. Xu, Z.Q.J., Zhang, Y., Xiao, Y., 2019. Training behavior of deep neural network Milletari, F., Navab, N., Ahmadi, S.A., 2016. V-Net: Fully convolutional neural in frequency domain, in: ICONIP, pp. 264–274. networks for volumetric medical image segmentation, in: IC3DV, pp. 565– Yahiaoui, A.F.Z., Bessaid, A., 2016. Segmentation of ischemic stroke area from 571. CT brain images. ISIVC , 13–17. Mitra, J., Bourgeat, P., Fripp, J., Ghose, S., Rose, S., Salvado, O., Connelly, Zaharchuk, G., El Mogy, I.S., Fischbein, N.J., Albers, G.W., 2012. Comparison A., Campbell, B., Palmer, S., Sharma, G., Christensen, S., Carey, L., 2014. of arterial spin labeling and bolus perfusion-weighted imaging for detecting Lesion segmentation from multimodal MRI using random forest following mismatch in acute stroke. Stroke 43, 1843–1848. ischemic stroke. Neuroimage 98, 324–335. 
Murayama, K., Suzuki, S., Matsukiyo, R., Takenaka, A., Hayakawa, M., Tsut- http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Computing Research Repository arXiv (Cornell University)

Automatic Ischemic Stroke Lesion Segmentation from Computed Tomography Perfusion Images by Image Synthesis and Attention-Based Deep Neural Networks


ISSN: 1361-8415
DOI: 10.1016/j.media.2020.101787

Abstract

Ischemic stroke lesion segmentation from Computed Tomography Perfusion (CTP) images is important for accurate diagnosis of stroke in acute care units. However, it is challenged by the low image contrast and resolution of the perfusion parameter maps, in addition to the complex appearance of the lesion. To deal with this problem, we propose a novel framework based on pseudo Diffusion-Weighted Imaging (DWI) synthesized from perfusion parameter maps to obtain better image quality for more accurate segmentation. Our framework consists of three components based on Convolutional Neural Networks (CNNs) and is trained end-to-end. First, a feature extractor is used to obtain both a low-level and a high-level compact representation of the raw spatiotemporal Computed Tomography Angiography (CTA) images. Second, a pseudo DWI generator takes as input the concatenation of CTP perfusion parameter maps and our extracted features to obtain the synthesized pseudo DWI. To achieve better synthesis quality, we propose a hybrid loss function that pays more attention to lesion regions and encourages high-level contextual consistency. Finally, we segment the lesion region from the synthesized pseudo DWI, where the segmentation network is based on switchable normalization and channel calibration for better performance. Experimental results showed that our framework achieved the top performance on the ISLES 2018 challenge, and that: 1) our method using synthesized pseudo DWI outperformed methods segmenting the lesion from perfusion parameter maps directly; 2) the feature extractor exploiting additional spatiotemporal CTA images led to better synthesized pseudo DWI quality and higher segmentation accuracy; and 3) the proposed loss functions and network structure improved the pseudo DWI synthesis and lesion segmentation performance. The proposed framework has the potential to improve the diagnosis and treatment of ischemic stroke where access to real DWI scanning is limited.
Keywords: Ischemic stroke lesion, computed tomography perfusion, image synthesis, segmentation, deep learning

Corresponding author. Email address: guotai.wang@uestc.edu.cn (Guotai Wang)
Equal contribution
Preprint submitted to Elsevier, July 8, 2020
arXiv:2007.03294v1 [eess.IV] 7 Jul 2020

1. Introduction

Stroke is the most common cerebrovascular disease and one of the primary causes of mortality and long-term disability worldwide (Kissela et al., 2012). Ischemic stroke is the most common type of stroke, accounting for 75-85% of all stroke cases. It is caused by an obstruction of the cerebral blood supply, which leads to tissue hypoxia (under-perfusion) and tissue death within a few hours. The stages of stroke can be classified into acute (0 to 24 h), sub-acute (24 h to 2 weeks) and chronic (>2 weeks) (González et al., 2011). Early diagnosis and treatment in the acute stage are critical for the recovery of stroke patients. Medical imaging is important for the detection and quantitative assessment of stroke lesions, as well as for selecting eligible patients for thrombolysis or thrombectomy (Zaharchuk et al., 2012).

Among different medical imaging methods, Magnetic Resonance Imaging (MRI) sequences such as Fluid-Attenuated Inversion Recovery (FLAIR), T1-weighted, T2-weighted and Diffusion-Weighted Imaging (DWI) are the preferred imaging modalities for ischemic stroke lesions due to their good soft tissue contrast. In particular, DWI is considered the most sensitive method for detecting early acute stroke (Mezzapesa et al., 2006). However, MR imaging, including DWI, is relatively slow and often not accessible for acute stroke patients. Alternatively, Computed Tomography Perfusion (CTP) imaging offers insights into cerebral hemodynamics and enables differentiation of the salvageable penumbra from the irrevocably damaged infarct core (Donahue and Wintermark, 2015). CTP has advantages in speed and cost, leading to higher availability in acute care units (Gillebert et al., 2014). In CTP imaging, a sequence of Computed Tomography Angiography (CTA) images (i.e., spatiotemporal 4D images) is acquired during the perfusion process. Perfusion parameter maps such as Cerebral Blood Flow (CBF), Cerebral Blood Volume (CBV), Mean Transit Time (MTT) and Time to Peak (TTP, or Tmax) are then derived from this sequence to help identify ischemic stroke lesions. Examples of perfusion parameter maps of two ischemic stroke patients are shown in Fig. 1.

Segmentation of stroke lesions from medical images can provide quantitative measurements of the lesion region, which is important for quantitative treatment decision procedures. Manual segmentation of the lesion is time-consuming and has low inter-rater agreement. Automatic stroke lesion segmentation is more efficient and has the potential to provide more reliable and reproducible results (Maier et al., 2017).

Considering the limited speed and availability of MRI for acute stroke patients, we aim to segment ischemic stroke lesions automatically from CTP perfusion parameter maps, which has the potential to improve the diagnosis and treatment of ischemic stroke in a timely fashion. However, this task is very difficult, and several challenges limit the segmentation accuracy. First, the appearance of stroke lesions varies considerably over time, even within the same clinical stage of stroke (González et al., 2011). Second, the lesions vary widely in location, shape, size and appearance in the brain, as shown in Fig. 1. Some lesions may be aligned with the vascular supply territories while others may not. Some small lesions are only a few millimeters in size, whereas some large lesions may cover a complete hemisphere (Maier et al., 2017). The intensity is not homogeneous within the lesion region, and other stroke-like pathologies may lead to false positives in the segmentation result. Third, compared with DWI, the perfusion parameter maps (CBF, CBV, MTT and Tmax) are noisy and have a lower spatial resolution, making it difficult to accurately identify the boundary of stroke lesions, as demonstrated in Fig. 1. In addition, the raw spatiotemporal 4D CTA images contain useful information about the ischemic stroke lesion but have a large data size. Using the perfusion parameter maps alone, without considering the raw spatiotemporal CTA images, may limit the segmentation accuracy. Conversely, directly taking the raw spatiotemporal CTA images as input for lesion segmentation increases the computational cost. Therefore, extracting compact and useful features from the raw spatiotemporal CTA images is desirable for efficient and accurate ischemic stroke lesion segmentation.

Although automatic segmentation of ischemic stroke lesions has been widely studied, most existing methods were proposed for multi-modal MR images (Maier et al., 2017; Winzeck et al., 2018). Only a few works have been reported on ischemic stroke lesion segmentation from CTP images (Gillebert et al., 2014; Yahiaoui and Bessaid, 2016; Abulnaga and Rubin, 2018). Traditional methods such as template-based methods (Gillebert et al., 2014) and fuzzy C-Means (Yahiaoui and Bessaid, 2016) are challenged by the complex appearance of stroke lesions. Recently, deep learning methods have achieved state-of-the-art performance in many medical image segmentation tasks (Shen et al., 2017). They have also been applied to ischemic stroke lesion segmentation from CTP images (Pinheiro et al., 2018; Abulnaga and Rubin, 2018; Vikas Kumar Anand et al., 2018). However, due to the challenges mentioned above, it remains difficult to segment the lesions directly from the perfusion parameter maps.

Ischemic stroke lesions are easier to identify and segment in DWI than in perfusion parameter maps. This motivates synthesizing pseudo DWI images from perfusion parameter maps to help the segmentation task. Though many methods have been proposed for general medical image synthesis (Frangi et al., 2018), synthesizing images with lesions is still not well addressed (Roy et al., 2010), because of the complex variation of pathological lesions among patients. In particular, synthesizing pseudo DWI images of ischemic stroke lesions from CTP images has rarely been investigated.

This work is a substantial extension of our preliminary conference publication (Song and Huang, 2018), which won the MICCAI 2018 Ischemic Stroke Lesion Segmentation (ISLES) challenge¹. In this paper, we provide a detailed description and in-depth discussion of our segmentation framework and validate it with extensive experiments. The contributions of our work are summarized as follows.

First, we propose a novel framework for automatic ischemic stroke lesion segmentation from CTP images based on synthesized pseudo DWI. Compared with using only CTP perfusion parameter maps, our framework additionally exploits raw spatiotemporal CTA images for higher pseudo DWI synthesis quality and lesion segmentation accuracy. Second, to make more efficient use of the raw spatiotemporal CTA images, we propose a feature extractor that automatically obtains a more compact and high-level representation of the CTA images. This helps to reduce the required memory and computational time and improves the performance of our segmentation method. Third, we propose a novel method to synthesize pseudo DWI images with ischemic stroke lesions. We employ a high-level similarity loss function to encourage the pseudo DWI to be close to the ground truth in terms of both local details and global context. We also propose an attention-guided synthesis strategy so that the generator focuses more on the lesion, which benefits the final segmentation. Last but not least, to segment lesions from our synthesized pseudo DWI, we propose a Convolutional Neural Network (CNN) with channel calibration and Switchable Normalization (SN) (Luo et al., 2018) that is suitable for small training batch sizes. We combine it with a novel attention-based and hardness-aware loss function that helps to obtain more accurate segmentation of ischemic stroke lesions. Experimental results show that our method achieved state-of-the-art performance on the ISLES 2018 challenge. It outperformed direct segmentation from CTP perfusion parameter maps as well as contemporary image synthesis-based methods for ischemic stroke lesion segmentation from CTP images (Liu, 2018).

2. Related Works

2.1. Ischemic Stroke Lesion Segmentation

Segmentation of ischemic stroke lesions from medical images has attracted increasing attention in recent years (Rekik et al., 2012; Maier et al., 2017), and most of these works focus on segmentation from MR images. For example, the ISLES 2015-2017 challenges aimed at ischemic stroke lesion segmentation from multi-modal MR images including T1, T1-contrast, FLAIR and DWI sequences (Maier et al., 2017; Winzeck et al., 2018).

¹ http://www.isles-challenge.org

[Figure 1: two patients, rows (a) and (b); columns: CTA (Time 1), CTA (Time 2), CBF, CBV, MTT, Tmax, DWI]

Figure 1: Examples of CTP and DWI images of two patients with ischemic stroke lesions. Columns 1-2: CTA images at different time points during perfusion. Columns 3-6: perfusion parameter maps. Column 7: lesions delineated in DWI images. Note that we aim to segment the lesions from perfusion parameter maps, and DWI is not available at test time in our study.
Some early works have used a range of methods for this seg- For example, Burgos et al. (2014) synthesized CT images from mentation task, such as Markov random field model (Kabir MRI through a multi-atlas information propagation scheme. et al., 2007), level set (Feng et al., 2015), random forest (Mitra Bahrami et al. (2016) used dictionary learning to synthesis 7T- et al., 2014) and support vector machine (Maier et al., 2014). like images from 3T MRI. Jog et al. (2017) used regression However, their accuracy is challenged by the complicated seg- random forest to synthesize T2 and FLAIR images from T1 im- mentation problem (Maier et al., 2015). Recently, deep learn- ages. Deep learning methods have also been increasingly used ing has been increasingly used for ischemic stroke lesion seg- for medical image synthesis (Ker et al., 2017), such as deep mentation with better performance. For example, Kamnitsas neural network-based synthesis methods (Nguyen et al., 2015) et al. (2017) proposed a dual pathway 3D CNN combined with and deep adversarial learning-based approaches (Nie et al., fully connected Conditional Random Field (CRF) for brain le- 2018). However, most of existing works deal with general sion segmentation. Cui et al. (2019) proposed an adapted mean cross-modality image synthesis and have not well investigated teacher model to learn from a combination of annotated and the more challenging problem of synthesizing medical images unannotated MR images for the segmentation task. Dolz et al. with pathological lesions. Roy et al. (2010) used an atlas-based (2018) combined DWI and CTP to segment ischemic stroke le- method to synthesize FLAIR images with white matter lesions. sions and used a densely connected UNet with Inception mod- Chartsias et al. (2017) proposed a CNN for synthesizing multi- ules (Szegedy et al., 2016) to handle the variation of lesion size. modal MR images of brain lesions. 
The e ectiveness of these Despite their good performance, these methods rely on MRI methods for pseudo DWI synthesis from CTP perfusion param- and cannot be directly applied to stroke lesion segmentation eter maps of stroke lesions has rarely been demonstrated. from CTP images. There have been few works on the challenging task of seg- 3. Method mentation of ischemic stroke lesion from CTA or CTP per- fusion parameter maps (Rekik et al., 2012). Some early The proposed framework for ischemic stroke lesion segmen- works used histogram-based classifiers (Rekik et al., 2012) or tation from CTP images is depicted in Fig. 2. Due to the large template-based voxel-wise comparison (Gillebert et al., 2014) inter-slice spacing (9.48 mm in average) of the experimental to deal with this problem. Yahiaoui and Bessaid (2016) used images, the proposed method operates on 2D slices. It con- a multi-scale contrast enhancement algorithm and fuzzy C- sists of a feature extractor, a pseudo DWI generator and a final Means for this task. Recently, Abulnaga and Rubin (2018) lesion segmenter. First, to eciently deal with the large raw used CNNs with pyramid pooling to combine global and lo- spatiotemporal CTA images and reduce the computational re- cal contextual information for this task, where a focal loss was quirements, we design a high-level feature extractor that uses a employed to enable the CNNs to focus more on hard samples. CNN to obtain a compact representation of the raw spatiotem- However, due to the lower signal-to-noise ratio of CTP perfu- poral CTA images. Additionally, we make use of a temporal sion parameter maps compared with DWI, it remains challeng- Maximal Intensity Projection (MIP) of the CTA images as a ing to automatically segment the ischemic stroke lesion from low-level feature. Then, these features are concatenated with CTP images. the perfusion parameter maps to serve as the input of the pseudo 2.2. 
Cross-Modality Medical Image Synthesis DWI generator, which obtains a pseudo DWI image with better A range of works have investigated the problem of synthesiz- contrast between the lesion and the background. To improve ing medical images from another modality (Frangi et al., 2018). the synthesis quality near lesion regions, we use a high-level 3 K1 n o similarity-based loss function and enable the generator to pay T = max t j K  t < T; H q (t k) = 0 (3) more attention to the lesion. Finally, a segmenter takes the k=0 pseudo DWI image as input and produces a segmentation of the ischemic stroke lesion, where a CNN using channel cali- whereH () is the Heaviside function that obtains 0 for negative bration and switchable normalization trained with an attention- inputs and 1 for positive inputs. q (t) is the first derivative of based and hardness-aware loss function is proposed to improve q(t), and K is a positive integer value which is 5 in this paper. the performance. The three components are trained end-to-end. Therefore, T is defined as the earliest time point where the first Details of these components will be described in the following. derivative of q(t) keeps positive for its following K consecutive time points, and T is defined as the latest time point where the first derivative of q(t) keeps negative for its preceding K consecutive time points. Fig. 3 shows the curve of q(t) with T 3.1. Feature Extraction from Raw Spatiotemporal CTA Images and T in two cases. In CTP imaging, the raw spatiotemporal CTA images have We extract the frames between T and T and obtain a tem- s e been transformed into a simplified feature representation in porally cropped subsequence that corresponds to the perfusion terms of perfusion parameter maps including CBF, CBV, MTT stage of the raw spatiotemporal CTA image. As the duration of and Tmax. 
Though these parameter maps are useful for detec- the perfusion stage has a variation among di erent subjects, the tion of the stroke lesion, they are not a complete representation temporally cropped subsequence can have di erent time point of the perfusion information in the raw spatiotemporal CTA im- numbers along the temporal axis. To deal with this problem and ages. Therefore, we do not ignore the raw spatiotemporal CTA to reduce the computational cost, we uniformly down-sample images and try to mine some additional features that are useful the temporally cropped subsequence along the temporal axis in the segmentation task. into a fixed time point number of C . The temporally cropped Let I(x; y; z; t) represent a raw spatiotemporal CTA image ob- and down-sampled CTA image is referred to as I , which is tained during the perfusion, where t 2 [0; 1; 2; :::; T 1] and T is used as the input of a CNN for high-level feature extraction. the total number of time points. Considering that the raw spa- Let C  D H W represent the size of I , where D, H and tiotemporal CTA image has a large data size due to a large value W represent the spatial depth, height and width of the input 4D of T , we use a feature extractor to obtain an additional low-level image I respectively. We treat I as a multi-channel 3D volume feature and a compact and high-level representation of the raw and use a 2D CNN for high-level feature extraction from each spatiotemporal CTA image to make an ecient use of it. The slice, as the images have a large inter-slice spacing (9.48 mm feature extraction method is shown in Fig. 2. We extract both in average) in this study. Specifically, we used the UNet (Ron- a manually designed low-level feature and a high-level feature neberger et al., 2015) for the high-level feature extraction due that is automatically learned by a CNN. 
to its good performance in a range of tasks (Abdulkadir et al., First, the maximal intensity value of a voxel during perfu- 2016; Li et al., 2018; Isensee et al., 2018). The UNet consists of sion may contain information related to the ischemic stroke le- an encoding path and a decoding path. The encoding path uses sion (Murayama et al., 2018). Therefore, in addition to the stan- convolution and down-sampling through max-pooling layers to dard perfusion parameter maps, we apply a Maximal Intensity obtain features at di erent scales with reduced spatial resolu- Projection (MIP) along the temporal axis to I to obtain a low- tion, and the decoding path uses up-sampling (deconvolution) layers to recover the spatial resolutions. We set the output chan- level feature map F : nel of the extractor CNN to 1. Let F denote the CNN’s output and it has a size of 1 D H  W , which is a high-level repre- F = max I(x; y; z; t) (1) sentation of the input spatiotemporal CTA image I . F =  (I ;  ) (4) h e e Second, we use a CNN to extract high-level features of the raw spatiotemporal CTA image due to CNNs’ good per- where  represents the feature extraction network and  de- e e formance in automatic feature extraction (Shen et al., 2017). notes the set of parameters of the network. Though the start and end time points of perfusion do not a ect the MIP image in theory, they are important for the high-level 3.2. Pseudo DWI Synthesis from CTP Images feature extractor, as the CNN is designed to take the frames Inspired by recent works on CNN-based image synthesis during the perfusion as input. To reject frames that are not per- with state-of-the-art performance (Frangi et al., 2018), we fused in the raw spatiotemporal CTA image, we need first to also use CNNs to generate pseudo DWI images, and select detect these two time points. We define a curve of accumulated UNet (Ronneberger et al., 2015) as the backbone network intensity over time as q(t) = I(x; y; z; t). 
Let T and T x;y;z s e structure due to its good performance. Di erently from pre- denote the estimated start and end time points of the perfusion vious works that synthesized pseudo DWI images only from respectively. They are determined by the following rules: CTP perfusion parameter maps including CBF, CBV, MTT and K1 TMax (Liu, 2018), we additionally take advantage of the ex- n X o T = min t j 0  t < T K; H q (t + k) = K (2) tracted low-level and high-level features (F and F ) so that l h k=0 more information from the raw spatiotemporal CTA image can 4 Feature extraction Loss function Perfusion stage Temporal detection resampling Temporal MIP Weight map 𝐴 Real DWI Weight map 𝐴 Ground truth Spatiotemporal CTA Feature extraction Image synthesis Segmentation loss 𝐿 loss 𝐿 loss 𝐿 Feature concatenation CBF CBV MTT Tmax Φ Φ Low-level High-level Pseudo DWI Segmentation Perfusion parameter maps feature 𝐹 feature 𝐹 Figure 2: Illustration of the proposed framework for ischemic stroke lesion segmentation from CTP images. We extract additional low-level features based on temporal MIP and high-level features based on a CNN from raw spatiotemporal CTA images, and concatenate them with perfusion parameter maps. The concatenated images are used to generate pseudo DWI, from which the lesion is finally segmented.  ,  and  are three CNNs for high-level feature extraction, e g s pseudo DWI generation and lesion segmentation, respectively. 1.5𝑒 Conv + BN + ReLU 1.5𝑒 Adaptive average pooling (1 x 8) 1𝑒 𝑇 𝑇 𝑇 𝑇 Adaptive average pooling (8 x 1) 1𝑒 Reshape and concatenation 5𝑒 5𝑒 Figure 4: Structure of the encoder  to obtain a high-level representation of Time step 𝑡 Time step 𝑡 an input image. The convolution kernels have a size of 33 and a stride of 22. (a) (b) Figure 3: Illustration of start time (T ) and end time (T ) detection of the per- s e where is a weighting parameter for the contextual loss and A fusion stage. is a spatial weight map. 
Let $F_o$ represent the concatenation of CBF, CBV, MTT and TMax. The input of our generator is a concatenation of $F_o$, $F_l$ and $F_h$, and thus it has six channels. The generated pseudo DWI can be represented as:

$$I_g = \Phi_g(F_o, F_l, F_h; \theta_g) \tag{5}$$

where $\Phi_g$ represents the pseudo DWI generation network and $\theta_g$ denotes its parameter set.

Let $I_d$ represent the DWI ground truth for synthesis. To train the generator $\Phi_g$ so that it focuses on the lesion region and its output $I_g$ has a high-level similarity to the ground truth $I_d$, we propose a novel loss function $L_g(I_g, I_d)$ that combines a low-level weighted pixel-wise loss $\ell_l(I_g, I_d)$ and a high-level contextual loss $\ell_h(I_g, I_d)$:

$$L_g(I_g, I_d) = \ell_l(I_g, I_d) + \alpha\, \ell_h(I_g, I_d) \tag{6}$$

$$\ell_l(I_g, I_d) = \big\lVert A \odot (I_g - I_d) \big\rVert_2 \tag{7}$$

$$\ell_h(I_g, I_d) = \big\lVert \Phi_c(I_g; \theta_c) - \Phi_c(I_d; \theta_c) \big\rVert_1 \tag{8}$$

where $\alpha$ is a weighting parameter for the contextual loss and $A$ is a spatial weight map. $\lVert\cdot\rVert_2$ and $\lVert\cdot\rVert_1$ are the L2-norm and L1-norm, respectively. Because we follow the common practice of using the Peak Signal-to-Noise Ratio (PSNR), which is related to the Mean Square Error (MSE), as one of the metrics to evaluate image quality, the L2-norm is used for the pixel-level loss, so that minimizing the L2-norm corresponds to maximizing the PSNR. On the other hand, the L1-norm treats each element equally, while the L2-norm assigns higher weights (by squaring) to larger prediction errors that may be caused by outliers, so the L1-norm is more robust than the L2-norm (Ghosh et al., 2017). Therefore, we use the L1-norm for the high-level contextual loss. $\Phi_c$ is a CNN-based encoder with a parameter set $\theta_c$ that converts $I_g$ and $I_d$ into their high-level and compact (i.e., low-dimensional) representations, respectively. As $\ell_l(\cdot)$ operates on individual voxel-wise predictions and does not guarantee global and high-level consistency, $\ell_h(\cdot)$, based on the encoder $\Phi_c$, helps to overcome this problem by encouraging closeness between the lower-dimensional non-linear projections of $I_g$ and $I_d$. Our encoder $\Phi_c$ consists of five convolutional layers and two adaptive average pooling layers, and its output is a vector of length 16. Details of $\Phi_c$ are shown in Fig. 4.
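To make Eqs. 6-8 concrete, here is a minimal sketch of the synthesis loss on flattened images. It is illustrative only: the weight map $A$ (Eq. 9) and the encoder outputs $\Phi_c(I_g)$ and $\Phi_c(I_d)$ are assumed to be computed elsewhere and passed in as plain lists, and the function names are hypothetical. In a framework such as PyTorch the same arithmetic would run on tensors so that gradients reach $\Phi_g$ and $\Phi_c$.

```python
import math
from typing import Sequence

# Images are flattened to 1D lists of voxel intensities for brevity.


def weighted_l2(A: Sequence[float], I_g: Sequence[float], I_d: Sequence[float]) -> float:
    """Eq. 7: L2 norm of the voxel-wise weighted difference A * (I_g - I_d)."""
    return math.sqrt(sum((a * (g - d)) ** 2 for a, g, d in zip(A, I_g, I_d)))


def contextual_l1(code_g: Sequence[float], code_d: Sequence[float]) -> float:
    """Eq. 8: L1 distance between encoder codes Phi_c(I_g) and Phi_c(I_d).
    The encoder itself is not implemented; codes are passed in precomputed."""
    return sum(abs(a - b) for a, b in zip(code_g, code_d))


def synthesis_loss(A, I_g, I_d, code_g, code_d, alpha: float = 1.0) -> float:
    """Eq. 6: L_g = l_l + alpha * l_h (alpha defaults to 1.0 only for illustration)."""
    return weighted_l2(A, I_g, I_d) + alpha * contextual_l1(code_g, code_d)


if __name__ == "__main__":
    # One voxel with weight 2 and error -2 gives l_l = 4; codes differ by 1 and 2, so l_h = 3.
    print(synthesis_loss([2.0], [1.0], [3.0], [1.0, 2.0], [2.0, 0.0]))  # 7.0
```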
As our final goal is to segment the ischemic stroke lesion, a good synthesis quality around the lesion region is desirable. Therefore, we use the voxel-wise weight map $A$ to make the generator pay more attention to the lesion region and less attention to the background. Let $\mathcal{F}$ denote the set of lesion foreground voxels, and $Eud(i, \mathcal{F})$ denote the shortest Euclidean distance between a voxel $i$ and $\mathcal{F}$. We use $A_i$ to represent the weight of voxel $i$ in the weight map $A$:

$$A_i = \begin{cases} w, & \text{if } i \in \mathcal{F} \\[4pt] 0.5 + \dfrac{\exp\big(-Eud(i, \mathcal{F})/D\big)}{\exp\big(-Eud(i, \mathcal{F})/D\big) + 1}, & \text{otherwise} \end{cases} \tag{9}$$

where $w \ge 1$ is the weight for foreground voxels and $D$ is a positive parameter that controls the sharpness of the weight for background voxels. $A_i$ decays gradually as $Eud(i, \mathcal{F})$ increases, i.e., voxels further from the lesion region receive lower weights. An example of $A$ is shown in Fig. 2.
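The weight map of Eq. 9 can be sketched on a small 2D grid as follows. This is a hypothetical helper, not the paper's code: it measures $Eud(i, \mathcal{F})$ in voxel units by brute force (on real volumes a Euclidean distance transform with the physical voxel spacing would be the natural choice), and it returns 0.5 for background voxels when the mask contains no lesion, which is an assumption because Eq. 9 does not cover that case.

```python
import math
from typing import List, Sequence


def weight_map(mask: Sequence[Sequence[int]], w: float = 1.5, D: float = 50.0) -> List[List[float]]:
    """Sketch of Eq. 9 on a 2D grid; mask[r][c] == 1 marks lesion voxels.
    Defaults follow the reported setting w = 1.5 and D = 50."""
    fg = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    A = []
    for r, row in enumerate(mask):
        out = []
        for c, v in enumerate(row):
            if v:
                out.append(w)  # foreground voxel
            elif not fg:
                out.append(0.5)  # assumption: no lesion, use the limit value 0.5
            else:
                # Brute-force Eud(i, F) in voxel units.
                d = min(math.hypot(r - fr, c - fc) for fr, fc in fg)
                e = math.exp(-d / D)
                out.append(0.5 + e / (e + 1.0))  # decreases towards 0.5 with distance
        A.append(out)
    return A


if __name__ == "__main__":
    for row in weight_map([[0, 0, 0], [0, 1, 0], [0, 0, 0]]):
        print([round(a, 4) for a in row])
```

Background weights stay in the interval (0.5, 1.0), so distant background voxels are down-weighted but never ignored, while lesion voxels get the larger weight $w$.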
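3.3. SLNet: Stroke Lesion Segmentation Network with Switchable Normalization and Channel Calibration

Our segmentation network takes the synthesized pseudo DWI image $I_g$ as input and outputs a binary segmentation of the ischemic stroke lesion. Let $\Phi_s$ represent the segmentation network and $\theta_s$ denote its parameter set. The segmentation network's output probability map is formulated as:

$$P = \Phi_s(I_g; \theta_s) \tag{10}$$

where $P$ has $C$ channels and $C$ equals the number of classes, which is 2 in our binary segmentation task. We select the UNet structure (Ronneberger et al., 2015) as the backbone and extend it in two aspects to obtain better performance. First, we replace Batch Normalization (BN) layers with switchable normalization (Luo et al., 2018) layers, which learn to automatically select suitable normalizers for different normalization layers of a CNN. Compared with traditional batch normalization, switchable normalization is more robust to a wide range of batch sizes and more suitable for small batch sizes (Luo et al., 2018). In our segmentation task, the large input patches and dense feature maps take a lot of memory, which limits the batch size to a small number. Therefore, switchable normalization is preferred to batch normalization. Second, as different channels in a feature map may have different importance, we use a Squeeze-and-Excitation (SE) block (Hu et al., 2018) based on channel attention to calibrate channel-wise feature responses. The SE block explicitly models inter-channel dependencies by learning an attention weight for each channel, so that the network relies more on the most important channels for segmentation. We use an SE block after each convolution block in the encoding path of the UNet (Ronneberger et al., 2015). The proposed network is referred to as SLNet, and it is shown in Fig. 5.

Figure 5: The proposed SLNet for ischemic stroke lesion segmentation with Switchable Normalization (SN) and Squeeze-and-Excitation (SE) blocks. (Legend: skip connection; (Conv + SN + ReLU) × 2; SE block; 1 × 1 Conv; max-pooling; deconvolution.)

To deal with the large range of ischemic stroke lesion sizes and challenging training samples for the segmentation task, we propose a novel hybrid loss function to train the segmentation network. Let $Y$ denote the one-hot ground truth label with $C$ channels. We use $P_i^c$ and $Y_i^c$ to denote the probability of voxel $i$ belonging to class $c$ in the prediction output and the ground truth, respectively. The proposed loss function is a combination of a weighted cross entropy loss function $L_{WCE}$ and a hardness-aware generalized Dice loss function $L_{HGD}$:

$$L_s(P, Y) = L_{WCE}(P, Y, A) + L_{HGD}(P, Y) \tag{11}$$

$$L_{WCE}(P, Y, A) = -\frac{\sum_{i}^{N} \sum_{c}^{C} A_i\, Y_i^c \log P_i^c}{\sum_{i}^{N} A_i} \tag{12}$$

$$L_{HGD}(P, Y) = -\log\big(1 - L_{GD}(P, Y)\big) \tag{13}$$

$$L_{GD}(P, Y) = 1 - 2\, \frac{\sum_{c}^{C} m_c \sum_{i}^{N} Y_i^c P_i^c}{\sum_{c}^{C} m_c \sum_{i}^{N} \big(Y_i^c + P_i^c\big)} \tag{14}$$

where $N$ is the number of voxels. $A$ is a voxel-wise weight map, and we use the same one as defined in Eq. 9, which drives the segmentation network to pay more attention to the lesion region than to the background. $L_{GD}$ is the generalized Dice loss, which automatically balances different classes by defining a class-wise weight $m_c = 1 / \big(\sum_i^N Y_i^c\big)^2$ (Sudre et al., 2017). Inspired by the focal loss (Lin et al., 2017), which automatically penalizes hard samples in object detection tasks, we use $-\log(1 - L_{GD})$ in Eq. 11. It has the same monotonicity as $L_{GD}$ but yields higher gradient values for large $L_{GD}$ values, so that our segmentation loss function is also aware of hard image samples.

3.4. End-to-End Training

The overall pipeline of our feature extractor $\Phi_e$, pseudo DWI generator $\Phi_g$, image context encoder $\Phi_c$ and final segmentation network $\Phi_s$ can be jointly trained in an end-to-end fashion. The overall loss function for training is therefore defined as:

$$L = L_s(P, Y) + \beta\, L_g(I_g, I_d) + \gamma\, L_e(F_h, I_d) \tag{15}$$

where $\beta$ and $\gamma$ are weighting parameters. The segmentation loss function $L_s(P, Y)$ is defined in Eq. 11, and the pseudo DWI synthesis loss function $L_g(I_g, I_d)$ is defined in Eq. 6. To obtain better synthesized pseudo DWI and lesion segmentation results, we add an extra explicit supervision on $F_h$, the output of the feature extractor $\Phi_e$. Therefore, we introduce a loss $L_e(F_h, I_d) = L_g(F_h, I_d)$ to encourage similarity between $F_h$ and $I_d$. The end-to-end training updates $\theta_e$, $\theta_g$, $\theta_c$ and $\theta_s$ simultaneously.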
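The hybrid segmentation loss of Eqs. 11-14 can be written directly from its definition. The sketch below is a pure-Python illustration over per-voxel class probabilities, not the authors' implementation; the `EPS` guard and the rule of skipping classes absent from $Y$ are assumptions added to avoid $\log 0$ and division by zero. It also shows why Eq. 13 is hardness-aware: $\frac{d}{dx}\big[-\log(1 - x)\big] = \frac{1}{1 - x} \ge 1$ for $x \in [0, 1)$, so the gradient grows as $L_{GD}$ approaches 1, i.e., for hard samples.

```python
import math
from typing import Sequence

Probs = Sequence[Sequence[float]]  # indexed as [voxel][class]
EPS = 1e-7  # assumption: numerical guard, not part of Eqs. 12-14


def weighted_cross_entropy(P: Probs, Y: Probs, A: Sequence[float]) -> float:
    """Eq. 12: cross entropy weighted by the voxel-wise map A, normalized by sum(A)."""
    total = sum(a * y * math.log(max(p, EPS))
                for p_i, y_i, a in zip(P, Y, A)
                for p, y in zip(p_i, y_i))
    return -total / sum(A)


def generalized_dice_loss(P: Probs, Y: Probs) -> float:
    """Eq. 14 with class weights m_c = 1 / (sum_i Y_i^c)^2.
    Assumption: classes absent from Y are skipped to avoid division by zero."""
    num = den = 0.0
    for c in range(len(Y[0])):
        volume = sum(y[c] for y in Y)
        if volume == 0:
            continue
        m_c = 1.0 / volume ** 2
        num += m_c * sum(y[c] * p[c] for p, y in zip(P, Y))
        den += m_c * sum(y[c] + p[c] for p, y in zip(P, Y))
    return 1.0 - 2.0 * num / den


def hardness_aware_gd(P: Probs, Y: Probs) -> float:
    """Eq. 13: -log(1 - L_GD), clamped so that log(0) is never evaluated."""
    return -math.log(max(1.0 - generalized_dice_loss(P, Y), EPS))


def segmentation_loss(P: Probs, Y: Probs, A: Sequence[float]) -> float:
    """Eq. 11: L_s = L_WCE + L_HGD."""
    return weighted_cross_entropy(P, Y, A) + hardness_aware_gd(P, Y)
```

For example, with two voxels, `Y = [[1, 0], [0, 1]]`, uniform predictions `P = [[0.5, 0.5], [0.5, 0.5]]` and `A = [1.0, 1.0]`, $L_{GD} = 0.5$, both $L_{WCE}$ and $L_{HGD}$ equal $\log 2$, and the total is $2\log 2 \approx 1.386$.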
4. Experiments and Results

4.1. Data and Implementation

We used the dataset from the ISLES 2018 challenge (http://www.isles-challenge.org) to validate our segmentation framework. The ISLES 2018 dataset includes CTP scans of 103 patients from two centers who presented within 8 hours of stroke onset. For CTP scanning, a contrast agent was administered to the patient and sequential CTA images were then acquired 1-2 seconds apart. The perfusion parameter maps CBF, CBV, MTT and Tmax were then derived from the raw spatiotemporal CTA images. An MRI DWI scan was obtained within 3 hours after the CTP scan for each patient. The intra-slice pixel spacing ranged from 0.80 mm × 0.80 mm to 1.04 mm × 1.04 mm, with a slice size of 256 × 256. The inter-slice spacing ranged from 4.0 mm to 12.0 mm, with a mean value of 9.48 mm. The number of slices ranged from 2 to 22 with a mean value of 5.34, and the number of CTA time points ranged from 43 to 64 with a mean value of 47.18. For high-level feature extraction, all the CTA images were temporally cropped and down-sampled to $C_e = 6$ output time points. For preprocessing, intensity values in each DWI volume were scaled to (0, 1) based on the minimal value and the 99th percentile. Manual delineation of the stroke lesion from DWI images by an expert was used as the segmentation ground truth. The training set consisted of 94 CTP and DWI scans from 63 patients. The testing set consisted of 62 CTP scans from 40 patients, for which DWI images were not provided to participants of the challenge.

Our segmentation framework was implemented in PyTorch (https://pytorch.org) and run on an NVIDIA TITAN X GPU with 12 GB memory. The weights of all networks were initialized with the Xavier method (Glorot and Bengio, 2010) and trained with the RMSprop optimizer (Tieleman and Hinton, 2012), a batch size of 5 and 300 epochs. We initialized the learning rate to 0.002 and reduced it by a factor of 0.2 after 180 epochs. The parameter setting was: $\alpha = 1.0$, $\beta = 1.0$, $\gamma = 1.2$, $w = 1.5$ and $D = 50$.

To quantitatively evaluate the quality of the generated pseudo DWI images, we measured the Structural Similarity (SSIM) and the Peak Signal-to-Noise Ratio (PSNR) between the DWI ground truth and the generated pseudo DWI. These two metrics were calculated both globally (i.e., in the entire image region) and locally (i.e., in the region around the ground truth lesion). The local SSIM and PSNR help to assess our method's ability to generate high-quality lesion regions in a pseudo DWI image. For quantitative evaluation of the segmentation accuracy, we use Precision, Recall, Dice score, Hausdorff Distance (HD) and Average Symmetric Surface Distance (ASSD):

$$Dice = \frac{2TP}{2TP + FN + FP} \tag{16}$$

where $TP$, $FP$ and $FN$ are true positive, false positive and false negative, respectively.

$$HD = \max\Big( \max_{s \in S} d(s, G),\ \max_{g \in G} d(g, S) \Big) \tag{17}$$

$$ASSD = \frac{1}{|S| + |G|} \Big( \sum_{s \in S} d(s, G) + \sum_{g \in G} d(g, S) \Big) \tag{18}$$

where $S$ and $G$ denote the sets of surface points of a segmentation result and of the ground truth, respectively. $d(s, G)$ is the shortest Euclidean distance between a point $s \in S$ and all the points in $G$.

4.2. Ablation Studies

We first conducted ablation studies to validate different components of our segmentation framework. Since the ground truth segmentations of the ISLES testing images were not available to participants, we split the official ISLES training set at patient level into local training, validation and testing sets, which contained images from 65, 6 and 23 scans, respectively. In this section, we report the experimental results obtained on our local testing images.
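The evaluation metrics above can be sketched compactly. The following pure-Python helpers are illustrative assumptions rather than the evaluation code used for ISLES: `psnr` assumes intensities scaled to (0, 1) (hence `max_val = 1.0`), `dice` works on flattened binary masks and returns 1.0 when both masks are empty (an assumed convention), and `surface_distances` takes precomputed surface point coordinates (for example in mm) because surface extraction is not shown.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """PSNR in dB from the mean squared error."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


def dice(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Eq. 16 on flattened binary masks."""
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    fp = sum(1 for p, g in zip(pred, gt) if p and not g)
    fn = sum(1 for p, g in zip(pred, gt) if g and not p)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom  # assumption: two empty masks -> 1.0


def surface_distances(S: Sequence[Point], G: Sequence[Point]) -> Tuple[float, float]:
    """Eqs. 17-18: (HD, ASSD) with brute-force nearest-neighbour distances."""
    d_s = [min(math.dist(s, g) for g in G) for s in S]  # d(s, G) for every s in S
    d_g = [min(math.dist(g, s) for s in S) for g in G]  # d(g, S) for every g in G
    hd = max(max(d_s), max(d_g))
    assd = (sum(d_s) + sum(d_g)) / (len(S) + len(G))
    return hd, assd


if __name__ == "__main__":
    print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
    print(surface_distances([(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (3.0, 0.0)]))  # (2.0, 0.75)
```

HD reports the single worst boundary disagreement, while ASSD averages over all surface points, which is why the two can rank methods differently in the tables that follow.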
4.2.1. Comparison of Different Loss Functions for Pseudo DWI Synthesis

First, we investigated the effect of different loss functions on pseudo DWI synthesis from the perfusion parameter maps $F_o$, i.e., the concatenation of CBF, CBV, MTT and Tmax. The proposed loss function $L_g$ (Eq. 6), based on the weighted L2 loss and the high-level contextual loss (Eq. 8), is referred to as w-L2 + $\ell_{h1}$. It is compared with 1) L1 loss, i.e., $\ell_l$ in Eq. 7 defined with the L1 norm and $A = 1$ for every voxel; 2) L2 loss, i.e., Eq. 7 with $A = 1$ for every voxel; 3) w-L2 loss, i.e., Eq. 7 with weight coefficients defined in Eq. 9; 4) adversarial training with Generative Adversarial Networks, referred to as GAN; 5) L2 + GAN, which combines the L2 loss and the GAN loss; and 6) w-L2 + $\ell_{h2}$, a variant of the proposed $L_g$ with $\ell_h$ based on the L2 norm. For the GAN method, we used the LSGAN framework proposed by Mao et al. (2017) and a multi-scale discriminator (Ting-Chun Wang et al., 2018) to guide the generator (i.e., UNet) to produce realistic local details and global appearance.

Fig. 6 shows a visual comparison of pseudo DWI generated by UNet trained with different loss functions, where the input images were the perfusion parameter maps ($F_o$) for all variants. The synthesized pseudo DWI images are shown in the second row. L1 and L2 obtained similar results with ambiguous lesion boundaries. The w-L2 and w-L2 + $\ell_{h1}$ losses each help to obtain a clearer lesion boundary. The results of GAN and L2 + GAN are less smoothed, but include some large artifacts, as highlighted by the light blue arrows. We additionally investigated the effect of the synthesized pseudo DWI images on segmentation, where we used the standard cross entropy loss to train a segmentation model (i.e., UNet (Ronneberger et al., 2015)) with each type of synthesized pseudo DWI image. The last row in Fig. 6 shows that the segmentation based on pseudo DWI images synthesized with w-L2 + $\ell_{h1}$ is more accurate than the others, as highlighted by the red arrows. For quantitative evaluation, the global and local SSIM and PSNR of results obtained with different synthesis loss functions, and the Dice scores of the corresponding segmentation results, are presented in Table 1, which shows that the proposed w-L2 + $\ell_{h1}$ loss function obtains higher local SSIM and Dice than the others.

Figure 6: Visual comparison of pseudo DWI synthesis results (second row) obtained with different loss functions, and their effect on segmentation (third row). First row: the concatenation of perfusion parameter maps ($F_o$) was used as the input of UNet for synthesis. w-L2: weighted L2 loss defined in Eq. 7. w-L2 + $\ell_{h1}$: proposed hybrid loss based on Eq. 6 and Eq. 8. Light blue arrows highlight artifacts obtained by GAN-based methods, and red arrows highlight the segmentation differences. Green and yellow curves show the segmentation result and the ground truth, respectively. (Panels: CBF, CBV, MTT, Tmax, real DWI and segmentation from real DWI; L1, L2, GAN, L2 + GAN, w-L2, w-L2 + $\ell_{h1}$, w-L2 + $\ell_{h2}$.)

Table 1: Quantitative evaluation of different training loss functions for pseudo DWI synthesis and their effect on segmentation. The concatenation of the CTP perfusion parameter maps ($F_o$) was used as the input for synthesis.

| Loss | Global SSIM | Local SSIM | Global PSNR (dB) | Local PSNR (dB) | Dice (%) |
|---|---|---|---|---|---|
| L1 | 0.82 ± 0.11 | 0.47 ± 0.16 | 19.36 ± 4.11 | 13.60 ± 4.25 | 49.45 ± 21.20 |
| L2 | 0.83 ± 0.11 | 0.51 ± 0.17 | 19.41 ± 3.63 | 13.82 ± 4.18 | 50.04 ± 19.38 |
| GAN | 0.81 ± 0.11 | 0.37 ± 0.17 | 17.57 ± 4.27 | 13.45 ± 4.75 | 41.53 ± 25.08 |
| L2 + GAN | 0.78 ± 0.12 | 0.52 ± 0.14 | 18.30 ± 3.91 | 13.22 ± 4.52 | 48.77 ± 20.73 |
| w-L2 | 0.83 ± 0.09 | 0.53 ± 0.15 | 19.43 ± 3.42 | 13.99 ± 4.01 | 50.95 ± 21.03 |
| w-L2 + $\ell_{h1}$ | 0.83 ± 0.10 | 0.54 ± 0.13 | 19.26 ± 3.32 | 13.20 ± 3.38 | 51.25 ± 17.43 |
| w-L2 + $\ell_{h2}$ | 0.83 ± 0.11 | 0.53 ± 0.15 | 19.22 ± 3.40 | 13.80 ± 3.41 | 51.02 ± 21.41 |

4.2.2. Effect of Feature Extractor on Pseudo DWI Synthesis

To investigate the effect of our feature extractor on the synthesized pseudo DWI, we compared the quality of pseudo DWI images generated from different inputs: 1) the standard CTP perfusion parameter maps ($F_o$) only, i.e., without our feature extractor; 2) the concatenation of $F_o$ and our extracted low-level feature $F_l$ defined in Eq. 1; 3) the concatenation of $F_o$, $F_l$ and $F_h'$, where $F_h'$ denotes the high-level feature obtained by the CNN-based feature extractor $\Phi_e$ trained without explicit supervision, i.e., $L_e$ is not used; and 4) the concatenation of $F_o$, $F_l$ and $F_h$, where $F_h$ is the high-level feature obtained by $\Phi_e$ trained with explicit supervision through $L_e$. We used the proposed loss function $L_g$ (i.e., w-L2 + $\ell_{h1}$) to train the synthesis network. To additionally investigate how these synthesized results affect the segmentation, we used the standard cross entropy loss to train a UNet (Ronneberger et al., 2015) with each type of synthesized pseudo DWI image. Fig. 7 shows a visual comparison of pseudo DWI synthesized from different input images. Using the additional $F_l$ and $F_h$ helps to improve local details of the synthesized pseudo DWI, and the result obtained from the concatenation of $F_o$, $F_l$ and $F_h$ with explicit supervision has better image quality than the other variants, as highlighted by the green arrows. Table 2 presents a quantitative comparison between these inputs for pseudo DWI synthesis and the downstream segmentation. It shows that using the additional low-level feature $F_l$ improves global and local SSIM and PSNR compared with using the CTP perfusion parameter maps $F_o$ only. The high-level feature $F_h$ extracted by the CNN and the explicit supervision by $L_e$ can further improve the SSIM and PSNR values, which demonstrates that the proposed feature extractor, by making use of the raw spatiotemporal CTA images, helps to obtain better synthesized pseudo DWI images. Fig. 7 and Table 2 also show that synthesis based on $F_o$, $F_l$ and $F_h$ leads to higher segmentation accuracy than the other variants.

Figure 7: Visual comparison of pseudo DWI synthesized (top row) from different input images and their effect on segmentation (bottom row). $F_o$: concatenation of perfusion parameter maps. $F_l$: MIP of spatiotemporal CTA images. $F_h'$ and $F_h$ are the high-level features obtained by the CNN-based feature extractor $\Phi_e$ trained without and with explicit supervision through $L_e$, respectively. The proposed hybrid loss function $L_g$ defined in Eq. 6 was used for training. Green arrows highlight local differences of the pseudo DWI, and red arrows highlight the segmentation differences. Green and yellow curves show the segmentation result and the ground truth, respectively. (Panels: $F_o$; $F_o, F_l$; $F_o, F_l, F_h'$; $F_o, F_l, F_h$; real DWI.)

Table 2: Quantitative evaluation of different inputs for pseudo DWI synthesis and their effect on segmentation. $F_o$: concatenation of perfusion parameter maps. $F_l$: MIP of spatiotemporal CTA images. $F_h'$ and $F_h$ are the high-level features obtained by the CNN-based feature extractor $\Phi_e$ trained without and with explicit supervision through $L_e$, respectively. The proposed hybrid loss function $L_g$ was used for training.

| Input | Global SSIM | Local SSIM | Global PSNR (dB) | Local PSNR (dB) | Dice (%) |
|---|---|---|---|---|---|
| $F_o$ | 0.83 ± 0.10 | 0.54 ± 0.13 | 19.26 ± 3.32 | 13.20 ± 3.38 | 51.25 ± 17.43 |
| $F_o$, $F_l$ | 0.84 ± 0.12 | 0.56 ± 0.15 | 20.01 ± 3.69 | 13.90 ± 4.04 | 53.94 ± 14.39 |
| $F_o$, $F_l$, $F_h'$ | 0.84 ± 0.11 | 0.58 ± 0.16 | 20.16 ± 3.97 | 14.05 ± 4.04 | 54.61 ± 20.19 |
| $F_o$, $F_l$, $F_h$ | 0.85 ± 0.12 | 0.59 ± 0.12 | 20.02 ± 3.51 | 14.11 ± 3.98 | 55.10 ± 16.20 |
| Real DWI | - | - | - | - | 72.17 ± 19.54 |

4.2.3. Comparison of Different Networks for Segmentation

To investigate the effect of network structure on our ischemic stroke lesion segmentation task, we compared our proposed SLNet with 1) SLNet w/o SE, where the SE blocks are not used in SLNet; 2) SLNet w/o SN, where the switchable normalization layers in SLNet are replaced with traditional batch normalization layers; 3) the Fully Convolutional Network (FCN) (Long et al., 2015); 4) UNet (Ronneberger et al., 2015); 5) Recurrent Residual UNet (R2UNet) (Alom et al., 2018); and 6) Residual UNet (ResUNet) (Xiao et al., 2018). We trained these networks with the CTP perfusion parameter maps $F_o$ as input and used the cross entropy loss function for training.

Fig. 8 shows a visual comparison of segmentation results obtained by these networks, where the lesions are shown with the corresponding real DWI images for better visualization. Obtaining a very accurate segmentation of the ischemic stroke lesion is challenging for all these networks. However, the results of our SLNet have a better overlap with the ground truth than the others. In the first row, the difference between the networks is relatively small. In the second row, SLNet w/o SE, SLNet w/o SN and UNet obtained more under-segmentation than SLNet, while FCN, R2UNet and ResUNet obtained more over-segmentation than SLNet. A quantitative comparison between these networks is shown in Table 3. The proposed SLNet achieved the highest average Dice score and Recall among all the compared networks, while SLNet w/o SE achieved slightly better HD and ASSD.

Figure 8: Visual comparison of different networks for ischemic stroke lesion segmentation. The concatenation of the CTP perfusion parameter maps ($F_o$) was used as the input of the CNNs, and the cross entropy loss function was used for training. For better visualization, the segmentation results are shown with the real DWI images. (Columns: SLNet, SLNet w/o SE, SLNet w/o SN, FCN, UNet, R2UNet, ResUNet; curves: segmentation and ground truth.)

Table 3: Quantitative evaluation of different networks for ischemic stroke lesion segmentation. SLNet: the proposed network for ischemic stroke lesion segmentation. The concatenation of the CTP perfusion parameter maps ($F_o$) was used as the input, and the cross entropy loss function was used for training.

| Network | Parameters (M) | Precision (%) | Recall (%) | Dice (%) | HD (mm) | ASSD (mm) |
|---|---|---|---|---|---|---|
| FCN (Long et al., 2015) | 18.64 | 52.69 ± 25.28 | 53.10 ± 33.47 | 45.75 ± 24.59 | 48.75 ± 30.83 | 3.78 ± 5.37 |
| UNet (Ronneberger et al., 2015) | 31.04 | 63.50 ± 22.22 | 48.50 ± 24.55 | 49.94 ± 19.51 | 21.52 ± 13.84 | 2.54 ± 3.34 |
| R2UNet (Alom et al., 2018) | 39.09 | 51.85 ± 17.99 | 60.05 ± 21.14 | 52.34 ± 16.62 | 26.74 ± 15.04 | 2.56 ± 2.21 |
| ResUNet (Xiao et al., 2018) | 81.91 | 66.46 ± 19.32 | 52.36 ± 24.12 | 52.49 ± 18.66 | 23.50 ± 15.48 | 2.22 ± 2.04 |
| SLNet | 33.84 | 51.20 ± 22.00 | 64.20 ± 23.99 | 54.45 ± 21.23 | 23.97 ± 17.61 | 2.74 ± 3.44 |
| SLNet (w/o SE) | 31.04 | 69.00 ± 22.90 | 51.54 ± 19.81 | 53.41 ± 14.31 | 21.26 ± 12.32 | 2.16 ± 1.90 |
| SLNet (w/o SN) | 33.84 | 68.14 ± 21.08 | 48.32 ± 23.31 | 52.01 ± 20.69 | 21.50 ± 15.63 | 2.57 ± 2.78 |

4.2.4. Comparison of Different Training Loss Functions for Segmentation

We also investigated the effect of different training loss functions for the segmentation network. We refer to our proposed weighted cross entropy loss with hardness-aware generalized Dice loss as $L_{WCE} + L_{HGD}$ and compare it with 1) the cross entropy loss $L_{CE}$; 2) the Dice loss $L_{DICE}$ (Milletari et al., 2016); 3) the generalized Dice loss $L_{GD}$ (Sudre et al., 2017); 4) the hardness-weighted $L_{GD}$, defined in Eq. 13 and referred to as $L_{HGD}$; and 5) a variant of the proposed loss that does not pay extra attention to the lesion foreground (i.e., $A$ is 1 for every voxel), referred to as $L_{CE} + L_{HGD}$. We used each of these loss functions to train our SLNet to segment the ischemic stroke lesion from the CTP perfusion parameter maps $F_o$.

Quantitative evaluation results of these segmentation loss functions are listed in Table 4. The combination of $L_{CE}$ and $L_{HGD}$ outperforms using a single loss, either $L_{CE}$ or $L_{HGD}$. By enabling the network to focus more on the lesion region through $L_{WCE} + L_{HGD}$, Recall and Dice are further improved. Our proposed $L_{WCE} + L_{HGD}$ achieved the highest average Dice score of 59.37%, a large improvement over the 54.45% achieved by the $L_{CE}$ baseline.
Table 4: Quantitative evaluation of different training loss functions for ischemic stroke lesion segmentation based on our proposed SLNet. The concatenation of perfusion parameter maps ($F_o$) was used as the input. $L_{CE}$: cross entropy loss. $L_{WCE}$: weighted cross entropy loss. $L_{DICE}$: Dice loss. $L_{GD}$: generalized Dice loss. $L_{HGD}$: hardness-aware generalized Dice loss.

| Loss function | Precision (%) | Recall (%) | Dice (%) | HD (mm) | ASSD (mm) |
|---|---|---|---|---|---|
| $L_{CE}$ | 51.20 ± 22.00 | 64.20 ± 23.99 | 54.45 ± 21.23 | 23.97 ± 17.61 | 2.74 ± 3.44 |
| $L_{DICE}$ | 67.48 ± 25.25 | 45.45 ± 21.81 | 51.57 ± 20.98 | 24.43 ± 18.48 | 2.47 ± 2.56 |
| $L_{GD}$ | 55.07 ± 22.10 | 62.13 ± 17.09 | 54.98 ± 17.16 | 36.87 ± 29.61 | 3.45 ± 3.37 |
| $L_{HGD}$ | 52.40 ± 20.71 | 66.83 ± 18.98 | 55.30 ± 17.78 | 30.86 ± 20.10 | 2.81 ± 2.61 |
| $L_{CE} + L_{HGD}$ | 57.31 ± 22.87 | 66.37 ± 17.85 | 57.82 ± 17.33 | 21.58 ± 12.69 | 1.96 ± 1.84 |
| $L_{WCE} + L_{HGD}$ | 55.20 ± 20.99 | 73.51 ± 17.48 | 59.37 ± 15.73 | 22.29 ± 13.67 | 1.90 ± 2.05 |

4.2.5. Effect of Feature Extractor and Pseudo DWI Generator on Segmentation

With our proposed feature extraction and image synthesis method, we evaluate the value of our pseudo DWI generated from $F_o$, $F_l$ and $F_h$ for ischemic stroke lesion segmentation, where this pseudo DWI is referred to as $DWI_{o,l,h}$. We compared segmentation from $DWI_{o,l,h}$ with segmentation from 1) raw CTA images that were temporally cropped and down-sampled (i.e., $I$ as described in Section 3.1); 2) the CTP perfusion parameter maps $F_o$; 3) $DWI_o$, the pseudo DWI generated from $F_o$; 4) $DWI_{o,l}$, the pseudo DWI generated from $F_o$ and $F_l$; and 5) the concatenation of $F_o$ and $DWI_{o,l,h}$. We used each of these settings for end-to-end training, where the overall loss function in Eq. 15 combined with our SLNet was used for segmentation. We also compared $DWI_{o,l,h}$ with its variant $DWI_{o,l,h}(s)$, for which our $\Phi_e$, $\Phi_g$ and $\Phi_s$ were trained sequentially rather than end-to-end. Additionally, we trained SLNet with real DWI images to investigate the gap between segmentation from synthesized pseudo DWI images and from real DWI images.

Fig. 9 presents a visual comparison between ischemic stroke lesion segmentation results from different input images, which shows that the results segmented from our synthesized pseudo DWI images are better than those of the other variants. Table 5 presents the quantitative evaluation results. It shows that using $DWI_o$, generated from the CTP perfusion parameter maps alone, leads to a slightly decreased segmentation accuracy. By using the additional features $F_l$ and $F_h$ extracted from the raw spatiotemporal CTA images for synthesis, $DWI_{o,l}$ and $DWI_{o,l,h}$ each lead to an improvement of the Dice score. Table 5 shows that using $DWI_{o,l,h}$ outperformed the other variants. The average Dice scores for segmentation from the original CTA images, the perfusion parameter maps (i.e., $F_o$), the synthesized pseudo DWI based on our proposed method (i.e., $DWI_{o,l,h}$) and real DWI are 56.10%, 59.37%, 62.11% and 79.72%, respectively. The corresponding Hausdorff Distance values are 25.25 mm, 22.29 mm, 19.27 mm and 15.90 mm, respectively. We found that adding $F_o$ to $DWI_{o,l,h}$ reduces the segmentation performance compared with using $DWI_{o,l,h}$ only. This is because $F_o$ alone performs worse than $DWI_{o,l,h}$, and combining them yields a segmentation accuracy above that of $F_o$ but below that of $DWI_{o,l,h}$. Table 5 also shows that $DWI_{o,l,h}$ and $DWI_{o,l,h}(s)$ obtained very close segmentation accuracy in terms of Dice. However, $DWI_{o,l,h}$ achieved smaller HD and ASSD values than $DWI_{o,l,h}(s)$.

As ischemic stroke lesions vary greatly in size, we investigated the segmentation performance at different lesion scales. We divided the local testing set into three groups: 1) 9 images with small lesions (< 10 CC); 2) 10 images with medium lesions (10-50 CC); and 3) 4 images with large lesions (> 50 CC). For evaluation, we additionally measured the Relative Volume Error (RVE): $RVE = |V_g - V_s| / V_g$, where $V_g$ and $V_s$ are the volumes of the ground truth lesion and the segmented lesion, respectively. Table 5 shows that $DWI_{o,l,h}$ obtained a lower average RVE than all the other inputs except the real DWI. Fig. 10 shows the distributions of Dice and RVE in these three groups. The average Dice values achieved by our proposed method (i.e., $DWI_{o,l,h}$) for the three groups were 59.50%, 68.87% and 56.44%, respectively. The lower performance in the small and large groups indicates that it remains difficult for the proposed method to deal with extreme cases with small or very large lesions.

Figure 9: Visual comparison of ischemic stroke lesion segmentation from different input images. Yellow and green curves show the segmentation and the ground truth, respectively. $F_o$: CTP perfusion parameter maps. $DWI_o$, $DWI_{o,l}$ and $DWI_{o,l,h}$ are pseudo DWI images generated from $F_o$, ($F_o$, $F_l$) and ($F_o$, $F_l$, $F_h$), respectively. $DWI_{o,l,h}(s)$ is a variant of $DWI_{o,l,h}$ where our $\Phi_e$, $\Phi_g$ and $\Phi_s$ were trained sequentially rather than end-to-end. For better visualization, the segmentation results are shown with the real DWI images. (Columns: CTA, $F_o$, $DWI_o$, $DWI_{o,l}$, $DWI_{o,l,h}$, $DWI_{o,l,h}(s)$, $F_o$ + $DWI_{o,l,h}$, real DWI.)

Figure 10: Dice and RVE for lesions at three scales segmented from different types of images. $F_o$: perfusion parameter maps. $DWI_{o,l,h}$ is our proposed method with pseudo DWI synthesized from ($F_o$, $F_l$, $F_h$) as shown in Fig. 2. $DWI_{o,l,h}(s)$ is a variant of $DWI_{o,l,h}$ where our $\Phi_e$, $\Phi_g$ and $\Phi_s$ were trained sequentially rather than end-to-end. The results are based on our proposed SLNet and loss function $L_s$. (Panels: (a) small lesions (< 10 CC), (b) medium lesions (10 to 50 CC), (c) large lesions (> 50 CC).)

Table 5: Quantitative comparison of ischemic stroke lesion segmentation from different input images. $F_o$: perfusion parameter maps. $DWI_{o,l,h}$ is our proposed method with pseudo DWI synthesized from ($F_o$, $F_l$, $F_h$) as shown in Fig. 2. $DWI_{o,l,h}(s)$ is a variant of $DWI_{o,l,h}$ where our $\Phi_e$, $\Phi_g$ and $\Phi_s$ were trained sequentially rather than end-to-end. The results are based on our proposed SLNet and the loss function $L_s$ defined in Eq. 11.

| Input | Precision (%) | Recall (%) | Dice (%) | HD (mm) | ASSD (mm) | RVE |
|---|---|---|---|---|---|---|
| CTA | 65.63 ± 23.08 | 58.10 ± 18.63 | 56.10 ± 14.22 | 25.25 ± 16.60 | 2.45 ± 2.26 | 0.73 ± 0.74 |
| $F_o$ | 55.20 ± 20.99 | 73.51 ± 17.48 | 59.37 ± 15.73 | 22.29 ± 13.67 | 1.90 ± 2.05 | 0.83 ± 1.27 |
| $DWI_o$ | 53.68 ± 21.27 | 74.37 ± 16.11 | 58.32 ± 15.74 | 19.83 ± 13.10 | 1.90 ± 2.09 | 0.99 ± 1.63 |
| $DWI_{o,l}$ | 58.70 ± 22.32 | 71.04 ± 14.75 | 60.49 ± 16.26 | 22.79 ± 16.90 | 1.99 ± 2.07 | 0.83 ± 1.34 |
| $DWI_{o,l,h}$ | 61.97 ± 21.98 | 69.52 ± 17.89 | 62.11 ± 17.18 | 19.27 ± 13.17 | 1.76 ± 2.10 | 0.68 ± 1.36 |
| $DWI_{o,l,h}(s)$ | 57.05 ± 20.76 | 77.80 ± 13.56 | 62.23 ± 15.47 | 20.86 ± 15.05 | 1.84 ± 1.97 | 0.91 ± 1.60 |
| $F_o$ + $DWI_{o,l,h}$ | 59.06 ± 22.25 | 71.30 ± 15.46 | 60.54 ± 17.17 | 22.42 ± 19.58 | 2.07 ± 2.95 | 0.83 ± 1.33 |
| Real DWI | 85.07 ± 19.35 | 77.34 ± 15.03 | 79.72 ± 15.53 | 15.90 ± 14.13 | 1.35 ± 2.77 | 0.24 ± 0.24 |
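The lesion-scale analysis combines a volume computation, a size grouping and the RVE formula. The sketch below is a hypothetical illustration, not the authors' script: it derives volumes in CC from a voxel count and voxel spacing in mm (1 CC = 1000 mm³), assigns volumes of exactly 10 or 50 CC to the medium group (an assumption, since the text gives the range only as 10-50 CC), and averages Dice and RVE per group.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def lesion_volume_cc(voxel_count: int, spacing_mm: Sequence[float]) -> float:
    """Lesion volume in cubic centimetres from a voxel count and voxel spacing (mm)."""
    voxel_mm3 = 1.0
    for s in spacing_mm:
        voxel_mm3 *= s
    return voxel_count * voxel_mm3 / 1000.0  # 1 CC = 1000 mm^3


def relative_volume_error(v_gt: float, v_seg: float) -> float:
    """RVE = |V_g - V_s| / V_g."""
    return abs(v_gt - v_seg) / v_gt


def size_group(v_gt_cc: float) -> str:
    # Assumption: volumes of exactly 10 or 50 CC count as "medium".
    if v_gt_cc < 10:
        return "small"
    return "medium" if v_gt_cc <= 50 else "large"


def summarize_by_size(cases: Sequence[Tuple[float, float, float]]) -> Dict[str, Tuple[float, float]]:
    """cases: (ground-truth volume in CC, segmented volume in CC, Dice) per image.
    Returns (mean Dice, mean RVE) for every size group that occurs."""
    groups: Dict[str, List[Tuple[float, float]]] = {}
    for v_gt, v_seg, d in cases:
        groups.setdefault(size_group(v_gt), []).append((d, relative_volume_error(v_gt, v_seg)))
    return {g: (mean(x[0] for x in rows), mean(x[1] for x in rows)) for g, rows in groups.items()}


if __name__ == "__main__":
    # 1000 voxels of 1.0 x 1.0 x 10.0 mm give 10 CC, i.e. the lower edge of "medium".
    v = lesion_volume_cc(1000, (1.0, 1.0, 10.0))
    print(v, size_group(v))  # 10.0 medium
```

Because RVE divides by $V_g$, a small absolute volume error on a small lesion produces a large RVE, which is one reason the small-lesion group is hard to score well on.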
4.3. Comparison with Other ISLES Participants

We also trained our proposed method with the entire ISLES 2018 training set, and submitted the segmentation results of the ISLES 2018 testing set to the online evaluation platform (https://www.smir.ch/ISLES/Start2018) for quantitative evaluation. According to the ISLES 2018 leaderboard (listed in the 'Results' section of http://www.isles-challenge.org), our method achieved the top performance among 62 teams. Table 6 lists the quantitative evaluation results of the top five methods for ISLES 2018, where our method outperformed the others with an average Dice score of 0.51. Liu (2018) also used a CNN to generate pseudo DWI for segmentation, but only from CTP perfusion parameter maps with a GAN, and the achieved Dice and Recall are lower than ours. The other three methods segmented the ischemic stroke lesion from CTP perfusion parameter maps directly. Chen et al. (http://www.isles-challenge.org/articles/Yu_Chen.pdf) used an ensemble of multiple networks combined with several data augmentation methods. Hu et al. (http://www.isles-challenge.org/articles/Xiaojun_Hu.pdf) proposed a multi-level 3D refinement module trained with curriculum learning. Clerigues et al. (http://www.isles-challenge.org/articles/albert.pdf) also used an ensemble of multiple networks, and employed a patch sampling strategy to alleviate class imbalance.

Table 6: Quantitative comparison of the top five methods on the ISLES 2018 testing set.

| Method | Dice | Precision | Recall |
|---|---|---|---|
| Ours | 0.51 ± 0.31 | 0.55 ± 0.36 | 0.55 ± 0.34 |
| Liu (2018) | 0.49 ± 0.31 | 0.56 ± 0.37 | 0.53 ± 0.33 |
| Chen et al. | 0.48 ± 0.32 | 0.59 ± 0.38 | 0.46 ± 0.33 |
| Hu et al. | 0.47 ± 0.31 | 0.56 ± 0.37 | 0.47 ± 0.33 |
| Garcia et al. | 0.47 ± 0.31 | 0.56 ± 0.37 | 0.47 ± 0.33 |

5. Discussion and Conclusion

Due to the low contrast and low resolution of CTP perfusion parameter maps, it is challenging to use these images directly for ischemic stroke lesion segmentation. Transferring the perfusion parameter maps to pseudo DWI images via image synthesis is a promising approach for the segmentation task, as DWI images have a better contrast between the lesion and the background, and they are used to obtain the ground truth ischemic stroke lesion region. The ISLES 2018 final results and our experiments showed that pseudo DWI-based segmentation methods outperformed direct segmentation from perfusion parameter maps.

The quality of the synthesized pseudo DWI images has a large impact on the segmentation performance. A good contrast, with enhanced and preserved lesion information in the pseudo DWI, is important for good segmentation results. Although deep learning for image synthesis has achieved very good performance in other tasks (Frangi et al., 2018), synthesizing pseudo DWI with ischemic stroke lesions in this study is still challenging due to the low quality of the perfusion parameter maps and the small number of training images. To alleviate this problem, we used two strategies. First, we exploited information in the raw spatiotemporal CTA images by extracting low-level and high-level features in addition to the perfusion parameter maps. This yields higher pseudo DWI quality and higher segmentation accuracy than using perfusion parameter maps only, as demonstrated in Table 2 and Table 5. From Fig. 7 and Table 2, we find that an explicit supervision on the feature extractor leads to some improvement of segmentation accuracy, but the difference was not significant. This is expected, as the explicit supervision serves as a deep supervision: when it is not used, the feature extractor is still updated through the loss function, and the deep supervision mainly helps to improve convergence during training. Second, we designed a weighted loss function that pays attention to the lesion region so that the quality of the generated lesion is emphasized. It is combined with a high-level contextual loss function that encourages global and high-level consistency between the generated pseudo DWI and the ground truth DWI. Results in Table 1 show that this improves the local SSIM around the lesion region.

However, we found that our synthesized pseudo DWI images are still not as good as the real DWI images. For example, Table 1 and Table 2 indicate that the PSNR values are not very high. This is mainly because the high-frequency components in the real DWI images are not well synthesized, as shown in Fig. 6 and Fig. 7. The high-frequency components are related to local fine-grained details, noise and some artifacts. As demonstrated by Xu et al. (2019), CNNs capture low-frequency components at the early stage of training, and then capture high-frequency components and tend to overfit at the late stage of training. During training with our relatively small dataset, we used the best-performing checkpoint on the validation set for testing to minimize the risk of under-fitting or over-fitting. As an incidental effect, we found that the synthesized pseudo DWI images from that checkpoint did not have many high-frequency components. Further improving the pseudo DWI quality is of interest and is promising for obtaining better segmentation results. As the synthesized pseudo DWI and the real DWI can be regarded as coming from two different domains, domain adaptation methods (Perone et al., 2019) could be used in the future to obtain better segmentation performance with pseudo DWI.

For the segmentation network, using switchable normalization and the SE block based on channel attention improves the segmentation Dice and Recall with only a marginal increase in the number of parameters, as shown in Table 3. The loss function for training the segmentation network also has a large impact on the segmentation performance. Our weighted cross entropy loss function $L_{WCE}$ pays more attention to the lesion region and helps to alleviate the imbalance between the foreground and the background. The hardness-aware generalized Dice loss $L_{HGD}$ automatically gives higher weights to harder samples. A combination of $L_{WCE}$ and $L_{HGD}$ considers pixel-wise and region-level accuracy simultaneously, which leads to better Dice, Recall and

normalization and channel calibration trained with hardness-aware generalized Dice loss is proposed for the final segmentation from synthesized pseudo DWI.
Extensive experimental ASSD than the other variants as shown in Table 4. It should be results on ISLES 2018 dataset showed that our method using noticed that the Hausdor distance of our results is still high. To synthesized pseudo DWI outperformed methods using CTA im- address this problem, using Hausdor distance-based loss func- ages or perfusion parameter maps directly for ischemic stroke tions (Kervadec et al., 2019a) or high-level constraints (Oktay lesion segmentation, and demonstrated that our feature extrac- et al., 2018) are potential solutions. tor helps to obtain better synthesized pseudo DWI quality that Our high-level feature extraction, pseudo DWI generation leads to higher segmentation accuracy. The proposed automatic and lesion segmentation modules are trained end-to-end so that segmentation framework has a potential for improving diagno- they are updated simultaneously and adaptive to each other with sis and treatment of the ischemic stroke in a timely fashion, a high coherence. This makes the training process more ef- especially in acute units with limited availability of DWI scan- ficient than training these modules subsequently. Results in ning. Fig. 9 and Table 5 show that the end-to-end training also ben- efits the final segmentation performance. However, a draw- 6. Acknowledgements back of end-to-end training is that these modules become less portable as a change of the segmentation network requires the This work was supported by the National Natural Science whole system to be trained again. Subsequent training would Foundation of China funding [81771921, 61901084]. make the system more modular and is preferred in a scenario where there is a high demand for replacing some of these mod- ules. For example, the segmentation network can be replaced References when more training images become available without retrain- Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O., 2016. 3D U-Net: ing the feature extractor and pseudo DWI generator. 
In this Learning dense volumetric segmentation from sparse annotation, in: MIC- paper, as the training set had a small size and was fixed during CAI, pp. 424–432. the study, we chose the end-to-end training strategy due to its Abulnaga, S.M., Rubin, J., 2018. Ischemic stroke lesion segmentation in CT eciency and better segmentation performance. perfusion scans using pyramid pooling and focal loss, in: Int. MICCAI Brainlesion Work., pp. 352–363. Comparing Table 5 and Table 6, we observe that there is a Alom, M.Z., Hasan, M., Yakopcic, C., Taha, T.M., Asari, V.K., 2018. Recur- performance drop between our local testing set and the ocial rent residual convolutional neural network based on U-Net (R2U-Net) for testing set of ISLES 2018. This indicates some overfitting of medical image segmentation. arXiv Prepr. arXiv1802.06955 . the proposed method. The overfitting could be attributed to a Bahrami, K., Shi, F., Zong, X., Shin, H.W., An, H., Shen, D., 2016. Recon- struction of 7T-Like images from 3T MRI. IEEE Trans. Med. Imaging 35, couple of reasons. First, the training set was relatively small 2085–2097. and each image only contained 5.34 slices in average. Second, Burgos, N., Cardoso, M.J., Thielemans, K., Modat, M., Pedemonte, S., Dick- our method relies on image synthesis as an intermediate step, son, J., Barnes, A., Ahmed, R., Mahoney, C.J., Schott, J.M., Duncan, J.S., and there might be a domain shift between synthesized pseudo Atkinson, D., Arridge, S.R., Hutton, B.F., Ourselin, S., 2014. Attenuation correction synthesis for hybrid PET-MR scanners: application to brain stud- DWI images and real DWI images. The two steps of synthesis ies. IEEE Trans. Med. Imaging 33, 2332–2341. and segmentation are prone to accumulate the prediction error Chartsias, A., Joyce, T., Giu rida, M.V., Tsaftaris, S.A., 2017. Multimodal MR and possibility of overfitting. To deal with this problem, using synthesis via modality-invariant latent representation. IEEE Trans. Med. 
some advanced data augmentation methods (Abdulkadir et al., Imaging 37, 803 – 814. Cui, W., Liu, Y., Li, Y., Guo, M., Li, Y., Li, X., Wang, T., Zeng, X., Ye, C., 2016; Frid-Adar et al., 2018) and additional regularizations 2019. Semi-supervised brain lesion segmentation with an adapted mean such as auxiliary tasks (Myronenko, 2018) and volume con- teacher model, in: IPMI, pp. 554–565. straints (Kervadec et al., 2019b) could be potential approaches. Dolz, J., Ben Ayed, I., Desrosiers, C., 2018. Dense multi-path U-net for is- chemic stroke lesion segmentation in multiple image modalities, in: Int. Fig. 10 shows that the proposed method did not segment well MICCAI Brainlesion Work., pp. 271–282. on large lesions, which is mainly because the large lesion group Donahue, J., Wintermark, M., 2015. Perfusion CT and acute stroke imaging: contained only few cases (i.e., 4 images for testing), and it was foundations, applications, and literature review. J. Neuroradiol. 42, 21–29. not statistically significant to evaluate the segmentation perfor- Feng, C., Zhao, D., Huang, M., 2015. Segmentation of ischemic stroke lesions in multi-spectral MR images using weighting suppressed FCM and three mance for that group. In the future, a larger dataset could be phase level set, in: Int. Work. Brainlesion Glioma, Mult. Sclerosis, Stroke used for a better evaluation. Trauma. Brain Inj., pp. 233–245. In conclusion, to deal with the problem of ischemic stroke Frangi, A.F., Tsaftaris, S.A., Prince, J.L., 2018. Simulation and synthesis in lesion segmentation from CTP images, we propose a novel medical imaging. IEEE Trans. Med. Imaging 37, 673–679. Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan, framework using synthesized pseudo DWI images for better H., 2018. GAN-based synthetic medical image augmentation for increased segmentation results. We propose a feature extractor that ob- CNN performance in liver lesion classification. 
Neurocomputing 321, 321– tains both a low-level and a high-level compact representation of the raw spatiotemporal CTA images, and combine them with Ghosh, A., Kumar, H., Sastry, P.S., 2017. Robust loss functions under label noise for deep neural networks, in: AAAI, pp. 1919–1925. the CTP perfusion parameter maps for better pseudo DWI syn- Gillebert, C.R., Humphreys, G.W., Mantini, D., 2014. Automated delineation thesis quality. We also propose to pay more attention to the le- of stroke lesions using brain CT images. NeuroImage Clin. 4, 540–548. sion region and encourage high-level similarity for synthesis of Glorot, X., Bengio, Y., 2010. Understanding the diculty of training deep pseudo DWI with stroke lesions. A network with switchable feedforward neural networks, in: AISTATS, pp. 249–256. 13 Gonzalez, ´ R.G., Hirsch, J.A., Lev, M.H., Schaefer, P.W., Schwamm, L.H., sumi, T., Fujii, K., Katada, K., Toyama, H., 2018. Preliminary study of 2011. Acute ischemic stroke: imaging and intervention. Springer, Berlin, time maximum intensity projection computed tomography imaging for the Heidelberg. detection of early ischemic change in patient with acute ischemic stroke. Hu, J., Shen, L., Sun, G., 2018. Squeeze-and-excitation networks, in: CVPR, Medicine (Baltimore). 97, e9906. pp. 7132–7141. Myronenko, A., 2018. 3D MRI brain tumor segmentation using autoencoder Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., Maier-Hein, K.H., regularization, in: Int. MICCAI Brainlesion Work., pp. 349–356. 2018. No new-net, in: Int. MICCAI Brainlesion Work., pp. 234–244. Nguyen, H.V., Zhou, K., Vemulapalli, R., 2015. Cross-domain synthesis of Jog, A., Carass, A., Roy, S., Pham, D.L., Prince, J.L., 2017. Random forest medical images using ecient location-sensitive deep network, in: MIC- regression for magnetic resonance image synthesis. Med. Image Anal. 35, CAI, pp. 677–684. 475–488. 
Nie, D., Trullo, R., Lian, J., Wang, L., Petitjean, C., Ruan, S., Wang, Q., Shen, Kabir, Y., Dojat, M., Scherrer, B., Forbes, F., Garbay, C., 2007. Multimodal D., 2018. Medical image synthesis with deep convolutional adversarial net- MRI segmentation of ischemic stroke lesions, in: EMBS, pp. 1595–1598. works. IEEE Trans. Biomed. Eng. 65, 2720–2730. Kamnitsas, K., Ledig, C., Newcombe, V.F.J., Simpson, J.P., Kane, A.D., Oktay, O., Ferrante, E., Kamnitsas, K., Heinrich, M., Bai, W., Caballero, J., Menon, D.K., Rueckert, D., Glocker, B., 2017. Ecient multi-scale 3D Cook, S., Marvao, A.D., Dawes, T., Regan, D.O., Kainz, B., Glocker, B., CNN with fully connected CRF for accurate brain lesion segmentation. Rueckert, D., 2018. Anatomically constrained neural networks (ACNN): Med. Image Anal. 36, 61–78. Application to cardiac image enhancement and segmentation. IEEE Trans. Ker, J., Wang, L., Rao, J., Lim, T., 2017. Deep learning applications in medical Med. Imaging 37, 384–395. image analysis. IEEE Access 6, 9375 – 9389. Perone, C.S., Ballester, P., Barros, R.C., Cohen-Adad, J., 2019. Unsupervised Kervadec, H., Bouchtiba, J., Desrosiers, C., Granger, E., Dolz, J., Ayed, I.B., domain adaptation for medical imaging segmentation with self-ensembling. 2019a. Boundary loss for highly unbalanced segmentation, in: Int. Conf. Neuroimage 194, 1–11. Med. Imaging with Deep Learn., pp. 285–296. Pinheiro, G.R., Voltoline, R., Bento, M., Rittner, L., 2018. V-Net and U-Net Kervadec, H., Dolz, J., Tang, M., Granger, E., Boykov, Y., Ben Ayed, I., 2019b. for ischemic stroke lesion segmentation in a small dataset of perfusion data, Constrained-CNN losses for weakly supervised segmentation. Med. Image in: Int. MICCAI Brainlesion Work., pp. 301–309. Anal. 54, 88–99. Rekik, I., Allassonniere, ` S., Carpenter, T.K., Wardlaw, J.M., 2012. 
Medical Kissela, B.M., Khoury, J.C., Alwell, K., Moomaw, C.J., Woo, D., Adeoye, O., image analysis methods in MR/CT-imaged acute-subacute ischemic stroke Flaherty, M.L., Khatri, P., Ferioli, S., De Los Rios La Rosa, F., Broderick, lesion: Segmentation, prediction and insights into dynamic evolution simu- J.P., Kleindorfer, D.O., 2012. Age at stroke: Temporal trends in stroke inci- lation models. A critical appraisal. NeuroImage Clin. 1, 164–178. dence in a large, biracial population. Neurology 79, 1781–1787. Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: Convolutional networks Li, X., Chen, H., Qi, X., Dou, Q., Fu, C.W., Heng, P.A., 2018. H-DenseUNet: for biomedical image segmentation, in: MICCAI, pp. 234–241. Hybrid densely connected UNet for liver and liver tumor segmentation from Roy, S., Carass, A., Shiee, N., Pham, D.L., Prince, J.L., 2010. MR contrast CT volumes. IEEE Trans. Med. Imaging 37, 2663–2674. synthesis for lesion segmentation, in: ISBI, IEEE. pp. 932–935. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollar, P., 2017. Focal loss for dense Shen, D., Wu, G., Suk, H.I., 2017. Deep learning in medical image analysis. object detection, in: ICCV, pp. 2980–2988. Annu. Rev. Biomed. Eng. 19, 221–248. Liu, P., 2018. Stroke lesion segmentation with 2D novel CNN pipeline and Song, T., Huang, N., 2018. Integrated extractor, generator and segmentor for novel loss function, in: Int. MICCAI Brainlesion Work., pp. 253–262. ischemic stroke lesion segmentation, in: Int. MICCAI Brainlesion Work., Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks for pp. 310–318. semantic segmentation, in: CVPR, pp. 3431–3440. Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Cardoso, M.J., 2017. Gen- Luo, P., Ren, J., Peng, Z., Zhang, R., Li, J., 2018. Di erentiable learning- eralised Dice overlap as a deep learning loss function for highly unbalanced to-normalize via switchable normalization. arXiv Prepr. arXiv1806.10779 segmentations, in: Deep Learn. Med. 
Image Anal. Multimodal Learn. Clin. . Decis. Support, pp. 240–248. Maier, O., Menze, B.H., von der Gablentz, J., Hani, ¨ L., Heinrich, M.P., Szegedy, C., Vanhoucke, V., Io e, S., Shlens, J., Wojna, Z., 2016. Rethinking Liebrand, M., Winzeck, S., Basit, A., Bentley, P., Chen, L., Christiaens, D., the Inception Architecture for Computer Vision, in: CVPR, pp. 2818–2826. Dutil, F., Egger, K., Feng, C., Glocker, B., Gotz, ¨ M., Haeck, T., Halme, H.L., Tieleman, T., Hinton, G., 2012. Lecture 6.5-RMSProp, COURSERA: Neural Havaei, M., Iftekharuddin, K.M., Jodoin, P.M., Kamnitsas, K., Kellner, E., networks for machine learning. Technical Report. University of Toronto. Korvenoja, A., Larochelle, H., Ledig, C., Lee, J.H., Maes, F., Mahmood, Q., Ting-Chun Wang, Liu, M.Y., Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Maier-Hein, K.H., McKinley, R., Muschelli, J., Pal, C., Pei, L., Rangarajan, Catanzaro, 2018. High-resolution image synthesis and semantic manipu- J.R., Reza, S.M., Robben, D., Rueckert, D., Salli, E., Suetens, P., Wang, lation with conditional GANs, in: CVPR, pp. 8798–8807. C.W., Wilms, M., Kirschke, J.S., Kramer ¨ , U.M., Munte, ¨ T.F., Schramm, P., Vikas Kumar Anand, Khened, M., Alex, V., Krishnamurthi, G., 2018. Fully Wiest, R., Handels, H., Reyes, M., 2017. ISLES 2015 - A public evaluation automatic segmentation for ischemic stroke using CT perfusion maps, in: benchmark for ischemic stroke lesion segmentation from multispectral MRI. Int. MICCAI Brainlesion Work., pp. 328–334. Med. Image Anal. 35, 250–269. Winzeck, S., Hakim, A., McKinley, R., Pinto, J.A., Alves, V., Silva, C., Pisov, Maier, O., Schroder ¨ , C., Forkert, N.D., Martinetz, T., Handels, H., 2015. Clas- M., Krivov, E., Belyaev, M., Monteiro, M., Oliveira, A., Choi, Y., Paik, sifiers for ischemic stroke lesion segmentation: a comparison study. PLoS M.C., Kwon, Y., Lee, H., Kim, B.J., Won, J.H., Islam, M., Ren, H., Robben, One 10, e0145118. 
D., Suetens, P., Gong, E., Niu, Y., Xu, J., Pauly, J.M., Lucas, C., Heinrich, Maier, O., Wilms, M., von der Gablentz, J., Kramer ¨ , U., Handels, H., 2014. Is- M.P., Rivera, L.C., Castillo, L.S., Daza, L.A., Beers, A.L., Arbelaezs, P., chemic stroke lesion segmentation in multi-spectral MR images with support Maier, O., Chang, K., Brown, J.M., Kalpathy-Cramer, J., Zaharchuk, G., vector machine classifiers, in: SPIE Med. Imaging 2014 Comput. Diagnosis, Wiest, R., Reyes, M., 2018. ISLES 2016 and 2017-benchmarking ischemic p. 903504. stroke lesion outcome prediction based on multispectral MRI. Front. Neurol. Mao, X., Li, Q., Xie, H., Lau, R.Y.K., Wang, Z., 2017. Least squares generative 9, 679. adversarial networks, in: ICCV, pp. 2794–2802. Xiao, X., Lian, S., Luo, Z., Li, S., 2018. Weighted Res-UNet for high- Mezzapesa, D.M., Petruzzellis, M., Lucivero, V., Prontera, M., Tinelli, A., San- quality retina vessel segmentation, in: Int. Conf. Inf. Technol. Med. Educ., cilio, M., Carella, A., Federico, F., 2006. Multimodal MR examination in Hangzhou. pp. 327–331. acute ischemic stroke. Neuroradiology 48, 238–246. Xu, Z.Q.J., Zhang, Y., Xiao, Y., 2019. Training behavior of deep neural network Milletari, F., Navab, N., Ahmadi, S.A., 2016. V-Net: Fully convolutional neural in frequency domain, in: ICONIP, pp. 264–274. networks for volumetric medical image segmentation, in: IC3DV, pp. 565– Yahiaoui, A.F.Z., Bessaid, A., 2016. Segmentation of ischemic stroke area from 571. CT brain images. ISIVC , 13–17. Mitra, J., Bourgeat, P., Fripp, J., Ghose, S., Rose, S., Salvado, O., Connelly, Zaharchuk, G., El Mogy, I.S., Fischbein, N.J., Albers, G.W., 2012. Comparison A., Campbell, B., Palmer, S., Sharma, G., Christensen, S., Carey, L., 2014. of arterial spin labeling and bolus perfusion-weighted imaging for detecting Lesion segmentation from multimodal MRI using random forest following mismatch in acute stroke. Stroke 43, 1843–1848. ischemic stroke. Neuroimage 98, 324–335. 
Murayama, K., Suzuki, S., Matsukiyo, R., Takenaka, A., Hayakawa, M., Tsut-
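The overlap metrics reported in Table 5 and Table 6, and the lesion-size groups used in Fig. 10, can be computed from binary masks roughly as follows. This is a minimal NumPy sketch, not the ISLES 2018 evaluation code: it assumes RVE is the absolute volume difference divided by the ground-truth volume, assigns a lesion of exactly 50 CC to the medium group, and leaves out HD and ASSD, which need surface-distance computations. The function names are hypothetical.

```python
import numpy as np


def overlap_metrics(pred, gt, eps=1e-8):
    """Precision, Recall, Dice and RVE between two binary masks of the same shape.

    RVE is assumed here to be |V_pred - V_gt| / V_gt (an illustrative definition).
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = float(np.logical_and(pred, gt).sum())  # true-positive voxels
    n_pred = float(pred.sum())                  # predicted lesion volume in voxels
    n_gt = float(gt.sum())                      # ground-truth lesion volume in voxels
    return {
        "precision": tp / (n_pred + eps),
        "recall": tp / (n_gt + eps),
        "dice": 2.0 * tp / (n_pred + n_gt + eps),
        "rve": abs(n_pred - n_gt) / (n_gt + eps),
    }


def lesion_size_group(gt, voxel_volume_mm3):
    """Assign a ground-truth lesion to the size groups of Fig. 10 (1 CC = 1000 mm^3)."""
    volume_cc = float(np.asarray(gt, dtype=bool).sum()) * voxel_volume_mm3 / 1000.0
    if volume_cc < 10.0:
        return "small"
    if volume_cc <= 50.0:
        return "medium"
    return "large"


# Usage: a 4-voxel lesion where the prediction hits 3 voxels and adds 1 false positive.
gt = np.zeros((4, 4), dtype=bool)
gt[0, :] = True
pred = np.zeros((4, 4), dtype=bool)
pred[0, :3] = True
pred[1, 0] = True
print(overlap_metrics(pred, gt))  # precision, recall and Dice are all about 0.75; RVE is 0
```

Note that Dice and RVE capture different failure modes: a prediction with the right volume in the wrong place has RVE 0 but low Dice, which is why Fig. 10 reports both.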
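The Discussion attributes the modest PSNR of the synthesized pseudo DWI to poorly reproduced high-frequency components. PSNR is a simple function of the mean squared error, so a small sketch makes the link concrete. It assumes the dynamic range is taken from the reference image unless passed explicitly; the data range actually used for Table 1 and Table 2 is not stated in this section, and the function name is illustrative.

```python
import numpy as np


def psnr(reference, estimate, data_range=None):
    """Peak signal-to-noise ratio in dB between a reference image and an estimate."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if data_range is None:
        # Assumption: use the intensity range of the reference image.
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


# Usage: a 1-D "image" whose fine alternating detail is removed by smoothing.
real = np.array([0.0, 1.0, 0.0, 1.0])
blurred = np.full(4, 0.5)  # keeps the mean (low frequency) but loses the alternation
print(psnr(real, blurred))  # about 6.02 dB
```

The toy example shows why a synthesis that matches the low-frequency content but misses fine detail is still penalized: every voxel of the lost alternation contributes to the squared error.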
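The channel calibration in the segmentation network follows the squeeze-and-excitation (SE) idea of Hu et al. (2018): global average pooling squeezes each channel to one number, two fully connected layers with a ReLU and a sigmoid turn these into per-channel gates, and the feature map is rescaled by the gates. The NumPy sketch below shows one forward pass for a single feature map. The reduction ratio, weight shapes and function name are illustrative assumptions, and where SLNet places its SE blocks is not described in this section.

```python
import numpy as np


def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation forward pass for one feature map x of shape (C, *spatial).

    w1: (C // r, C), b1: (C // r,), w2: (C, C // r), b2: (C,) for a reduction ratio r.
    """
    spatial_axes = tuple(range(1, x.ndim))
    z = x.mean(axis=spatial_axes)               # squeeze: one descriptor per channel, (C,)
    s = np.maximum(w1 @ z + b1, 0.0)            # excitation step 1: FC + ReLU, (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))    # excitation step 2: FC + sigmoid gates in (0, 1)
    return x * s.reshape((-1,) + (1,) * len(spatial_axes))  # rescale each channel


# Usage: 8 channels, reduction ratio r = 4, random weights.
rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.normal(size=(C, 5, 5))
w1, b1 = rng.normal(size=(C // r, C)), np.zeros(C // r)
w2, b2 = rng.normal(size=(C, C // r)), np.zeros(C)
y = se_block(x, w1, b1, w2, b2)
print(y.shape)  # (8, 5, 5)
```

Each SE block adds only the two small weight matrices (about 2C²/r weights plus biases), which is why this kind of channel attention typically costs few extra parameters.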
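Switchable normalization (Luo et al., 2018) normalizes each feature map with a learned mixture of instance, layer and batch statistics: softmax-weighted importance scores combine the three means and the three variances before the usual per-channel affine transform. The sketch below shows only the training-time forward pass in NumPy; it omits the moving averages that the batch statistics need at inference time, and the argument names are illustrative rather than taken from the paper.

```python
import numpy as np


def _softmax(v):
    v = np.asarray(v, dtype=np.float64)
    e = np.exp(v - v.max())
    return e / e.sum()


def switchable_norm(x, mean_logits, var_logits, gamma, beta, eps=1e-5):
    """Training-time switchable normalization for x of shape (N, C, *spatial)."""
    spatial = tuple(range(2, x.ndim))
    ln_axes = (1,) + spatial
    bn_axes = (0,) + spatial
    # Statistics of the three normalizers, kept broadcastable against x.
    mu_in, var_in = x.mean(axis=spatial, keepdims=True), x.var(axis=spatial, keepdims=True)  # per (n, c)
    mu_ln, var_ln = x.mean(axis=ln_axes, keepdims=True), x.var(axis=ln_axes, keepdims=True)  # per n
    mu_bn, var_bn = x.mean(axis=bn_axes, keepdims=True), x.var(axis=bn_axes, keepdims=True)  # per c
    wm, wv = _softmax(mean_logits), _softmax(var_logits)  # importance weights for (IN, LN, BN)
    mean = wm[0] * mu_in + wm[1] * mu_ln + wm[2] * mu_bn
    var = wv[0] * var_in + wv[1] * var_ln + wv[2] * var_bn
    x_hat = (x - mean) / np.sqrt(var + eps)
    shape = (1, -1) + (1,) * len(spatial)  # per-channel affine parameters
    return gamma.reshape(shape) * x_hat + beta.reshape(shape)


# Usage: logits that put almost all weight on instance statistics make SN act like instance norm.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(2, 4, 6, 6))
gamma, beta = np.ones(4), np.zeros(4)
y = switchable_norm(x, [50.0, 0.0, 0.0], [50.0, 0.0, 0.0], gamma, beta)
print(np.abs(y.mean(axis=(2, 3))).max())  # close to 0
```

In a trained network the logits are learned, so each layer can settle on whichever mixture of normalizers suits it, which is useful when batches are small.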
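The segmentation loss combines a pixel-wise weighted cross entropy L_WCE with a region-level hardness-aware generalized Dice loss L_HGD. The exact weighting schemes are not given in this section, so the NumPy sketch below is only an illustration under stated assumptions: L_WCE multiplies the cross entropy of lesion voxels by a hypothetical constant fg_weight, the Dice term uses the inverse squared class-volume weights of the generalized Dice loss of Sudre et al. (2017), the per-voxel factor 1 + hardness * |p - g| is a hypothetical stand-in for the paper's hardness weighting, and the two terms are simply added. It covers a binary lesion/background problem given the lesion probability p and the ground truth g.

```python
import numpy as np


def weighted_cross_entropy(p, g, fg_weight=5.0, eps=1e-7):
    """Binary cross entropy in which lesion voxels (g == 1) count fg_weight times as much.

    fg_weight is a hypothetical value chosen for illustration.
    """
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0 - eps)
    g = np.asarray(g, dtype=np.float64)
    w = np.where(g > 0.5, fg_weight, 1.0)
    ce = -(g * np.log(p) + (1.0 - g) * np.log(1.0 - p))
    return float(np.sum(w * ce) / np.sum(w))


def generalized_dice_loss(p, g, hardness=0.0, eps=1e-7):
    """Two-class generalized Dice loss with an optional, hypothetical per-voxel hardness weight.

    hardness=0 gives the plain generalized Dice loss of Sudre et al. (2017).
    """
    p = np.asarray(p, dtype=np.float64)
    g = np.asarray(g, dtype=np.float64)
    probs = np.stack([p, 1.0 - p])      # (2, ...): lesion and background probabilities
    labels = np.stack([g, 1.0 - g])     # (2, ...): one-hot ground truth
    h = 1.0 + hardness * np.abs(p - g)  # larger for badly predicted voxels
    axes = tuple(range(1, probs.ndim))
    class_w = 1.0 / (labels.sum(axis=axes) ** 2 + eps)  # inverse squared class volume
    num = np.sum(class_w * np.sum(h * probs * labels, axis=axes))
    den = np.sum(class_w * np.sum(h * (probs + labels), axis=axes))
    return float(1.0 - 2.0 * num / (den + eps))


def segmentation_loss(p, g, fg_weight=5.0, hardness=1.0):
    """Pixel-wise term plus region-level term, added with equal weight (an assumption)."""
    return weighted_cross_entropy(p, g, fg_weight) + generalized_dice_loss(p, g, hardness)


# Usage: a confident, mostly correct prediction scores lower than an uninformative one.
g = np.zeros((4, 4))
g[1:3, 1:3] = 1.0                # a 4-voxel lesion
good = 0.9 * g + 0.05
bad = np.full_like(g, 0.5)
print(segmentation_loss(good, g) < segmentation_loss(bad, g))  # True
```

The class weights keep the small lesion class from being swamped by the background in the Dice term, while the cross-entropy term supplies dense per-voxel gradients; this mirrors the pixel-wise plus region-level motivation given in the Discussion.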

Journal: Computing Research Repository, arXiv (Cornell University)

Published: Jul 7, 2020
