Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Semi-supervised Sequence Modeling for Elastic Impedance Inversion

Semi-supervised Sequence Modeling for Elastic Impedance Inversion Citation M. Alfarraj and G. AlRegib, (2019), "Semisupervised sequence modeling for elas- tic impedance inversion," Interpretation 7: SE237-SE249. Review Accepted on: 29 May 2019 Data and Codes [GitHub Link] Bib f@articlealfarraj2019semi, title=fSemi-supervised Sequence Modeling for Elastic Impedance Inversiong, author=fAlfarraj, Motaz and AlRegib, Ghassang, journal=fInterpretationg, volume=f7g, number=f3g, pages=fSE237--SE249g, year=f2019g, publisher=Society of Exploration Geophysicists and American Association of Petroleum Geologistsg Contact motaz@gatech.edu OR alregib@gatech.edu http://ghassanalregib.info/ arXiv:1908.07849v1 [physics.geo-ph] 19 Aug 2019 Semi-supervised Sequence Modeling for Elastic Impedance Inversion 1 Motaz Alfarraj and Ghassan AlRegib for EI is a more powerful tool for reservoir characterization ABSTRACT compared to AI inversion. The goal of inversion is to infer true model parameters Recent applications of machine learning algorithms in (m 2 X ) through an indirect set of measurements d 2 Y . the seismic domain have shown great potential in dif- Mathematically, the problem can be formulated as follows ferent areas such as seismic inversion and interpreta- tion. However, such algorithms rarely enforce geophys- d = F (m) + n; (1) ical constraints; the lack of which might lead to un- where F : X ! Y is a forward operator, d is the mea- desirable results. To overcome this issue, we propose sured data, m is the model, and n 2 Y is a random a semi-supervised sequence modeling framework based variable that represents noise in the measurements. To on recurrent neural networks for elastic inversion from estimate the model from the measured data, one needs multi-angle seismic data. Speci cally, seismic traces to solve an inverse problem. The solution depends on and elastic impedance traces are modeled as time se- the nature of the forward model and observed data. In ries. Then, a neural-network-based inversion model the case of seismic inversion, and due to the non-linearity comprising convolutional and recurrent neural layers is and heterogeneity of the subsurface, the inverse problem used to invert seismic data for elastic impedance. The is highly ill-posed. In order to nd a stable solution to proposed work ow uses well-log data to guide the inver- an ill-posed problem, the problem is often regularized by sion. In addition, it utilizes seismic forward modeling to regularize the training, and to serve as a geophysi- imposing constraints on the solution space, or by incor- porating prior knowledge about the model. A classical cal constraint for the inversion. The proposed work ow approach to solve inverse problems is to set up the prob- achieves an average correlation of 98% between the es- lem as a Bayesian inference problem, and improve prior timated and target elastic impedance using 10 well-logs knowledge by optimizing for a cost function based on the for training on a synthetic dataset. data likelihood, m ^ = argmin [H (F (m); d) + C(m)] ; (2) m2X INTRODUCTION where m ^ is the estimated model, H : Y  Y ! R is Seismic inversion is a process used to estimate rock prop- an ane transform of the data likelihood, C : X ! R is a erties from seismic re ection data. For example, seismic regularization function that incorporates prior knowledge, inversion can be used to infer acoustic impedance (AI) and  2 R is regularization parameter that controls the from zero-o set seismic data, which in turn is used to es- in uence of the regularization function. timate porosity. AI is the product of P-wave velocity (V ) The solution of equation 2 in seismic inversion can be and bulk density (). An extension of AI for multi-angle sought in a stochastic or a deterministic fashion through seismic data is elastic impedance (EI) (Connolly, 1999). an optimization routine. In stochastic inversion, the out- EI is a function of P-wave velocity (V ), S-wave velocity come is a posterior probability density function; or multi- (V ), density (), and incident angle (). EI reduces to AI ple valid solutions to account for uncertainty in the data. when  = 0 (Whitcombe, 2002), and it utilizes informa- On the other hand, deterministic inversion produces an es- tion from multi-o set/angle seismic data. Thus, inverting timate (m ^ ) that maximizes the posterior probability den- sity function. The literature of seismic inversion (both de- Center for Energy and Geo Processing (CeGP), Georgia In- stitute of Technology, Atlanta, GA terministic and stochastic) is rich in various methods to 2 Alfarraj & AlRegib vey area on which an algorithm can be trained. For this formulate, regularize and solve the problem (e.g., (Duijn- reason, such algorithms must have a limited number of dam, 1988a; Doyen, 1988; Duijndam, 1988b; Ulrych et al., 2001; Buland and Omre, 2003; Tarantola, 2005; Doyen, learnable parameters and a good regularization method 2007; Bosch et al., 2010; Gholami, 2015; Azevedo and in order to prevent over- tting and to be able to gener- Soares, 2017)). alize beyond the training data. Moreover, applications of Recently, there have been several successful applications machine learning on seismic data do not often utilize or of machine learning and deep learning methods to solving enforce a physical model that can be used to check the inverse problems (Lucas et al., 2018). Moreover, machine validity of their outputs. In other words, there is a high learning and deep learning methods have been utilized in dependence on machine learning algorithms to understand the seismic domain for di erent tasks such as inversion and the inherent properties of the target outputs without ex- interpretation (AlRegib et al., 2018). For example, seismic plicitly specifying a physical model. Such dependence can inversion has been attempted using supervised-learning lead to undesirable or incorrect results, especially when data is limited. algorithms such as support vector regression (SVR) (Al- In this work, we propose a semi-supervised learning Anazi and Gates, 2012), arti cial neural networks (R oth work ow for elastic impedance (EI) inversion from multi- and Tarantola, 1994; Araya-Polo et al., 2018), committee angle seismic data using sequence modeling through a models (Gholami and Ansari, 2017), convolutional neu- combination of recurrent and convolutional neural net- ral networks (CNNs) (Das et al., 2018), and many other works. The proposed work ow learns a non-linear inverse methods (Chaki et al., 2015; Yuan and Wang, 2013; Gho- mapping from a training set consisting of well-log data lami and Ansari, 2017; Chaki et al., 2017; Mosser et al., and their corresponding seismic data. In addition, the 2018; Chaki et al., 2018). More recently, a sequence- learning is guided and regularized by a forward model as modeling-based machine learning work ow was used to es- often incorporated in classical inversion theory. The rest timate petrophysical properties from seismic data (Alfar- of this paper is organized as follows. First, we discuss the raj and AlRegib, 2018), which showed that recurrent neu- formulation of learning-based seismic inversion. Then, we ral networks are superior to feed-forward neural networks introduce recurrent neural networks as one of the main in capturing the temporal dynamics of seismic traces. In general, machine learning algorithms can be used components in the proposed work ow. Next, we discuss to invert seismic data by learning a non-linear mapping the technical details of the proposed work ow. Then, we parameterized by  2 Z  R , i.e., F : Y ! X from a present a case study to validate the proposed framework set of examples (known as a training dataset) such that: on the Marmousi 2 model (Martin et al., 2006) by invert- ing for EI using seismic data and 10 well-logs only. F (d)  m: (3) PROBLEM FORMULATION A key di erence between classical inversion methods Using neural networks, one can learn the parameter  (in and learning-based methods is the outcome. In classi- equation 3) in either a supervised manner or an unsuper- cal inversion, the outcome is a set of model parameters vised manner (Adler and Oktem, 2017). In supervised (deterministic) or a posterior probability density func- learning, the machine learning algorithm is given a set of tion (stochastic). On the other hand, learning methods measurement-model pairs (d; m) (e.g., seismic traces and produce a mapping from the measurements domain (seis- their corresponding property traces from well-logs). Then, mic) to model parameters domain (rock property). An- the algorithm learns the inverse mapping by minimizing other key di erence between the two approaches is their sensitivity to the initialization of the optimization rou- the following loss function tine. In classical inversion, the initial guess (posed as a L () := D m ;F (d ) (4) prior model) plays an important role in the convergence 1 i i of the method, and in the nal solution. On the other m 2S hand, learning-based methods are often randomly initial- ized, and prior knowledge is integrated into the objective where S is the set of available property traces from well- th function and is inferred by the learning algorithm from logs, m is the i trace in S , d is the seismic traces cor- i i the training data. Thus, learning-based inversion is less responding to m , and D is a distance measure that com- sensitive to the initial guess. On the other hand, learning- pares the estimated rock property to the true property. based inversion requires a high-quality training dataset in Namely, supervised machine learning algorithms seek a order to generate reliable results. Nevertheless, there have solution that minimizes the inversion error over the given been e orts to overcome this shortcoming of neural net- measurement-model pairs. Note that equation 4 is com- works and enable them to learn from noisy or unreliable puted only over a subset of all traces in the survey. This data (Natarajan et al., 2013). subset includes the traces for which a corresponding prop- There are many challenges, however, that might prevent erty trace is available from well-logs. machine learning algorithms from nding a proper map- Alternatively, the parameters  can be sought in an ping that can be generalized for an entire survey area. unsupervised-learning scheme where the learning algorithm One of the challenges is the lack of data from a given sur- is given a set of measurements (d) and a forward model Semi-supervised Sequence Modeling for Elastic Impedance Inversion 3 F . The algorithm then learns by minimizing the following each trace is treated as a single training sample. With a loss function, limited amount of training data, common machine learn- ing algorithms might fail to generalize beyond the train- L () := D F F (d ) ; d ; (5) 2 i i ing data because of their high data requirements. Such a requirement might be dicult to meet in practical set- tings where the number of well-logs is limited. In order which is computed over all seismic traces in the survey. to remedy this, the proposed work ow utilizes sequence The loss in equation 5 is known as data mis t. It mea- modeling to model traces as sequential data via recurrent sures the distance between the input seismic traces and neural networks. the synthesized seismograms from the estimated property traces using the forward model. Although supervised methods have been shown to be RECURRENT NEURAL NETWORKS superior to unsupervised ones in various learning tasks Despite the success of feed-forward machine learning meth- (e.g., image segmentation and object recognition), they ods, including convolutional networks, multilayer percep- often need to be trained on a large number of training trons, and support vector machines for various learning examples. In the case of the seismic inversion, the labels tasks, they have their limitations. Feed-forward methods (i.e., property traces) are scarce since they come from well- have an underlying assumption that data points are inde- logs. Unsupervised methods, on the other hand, do not pendent, which makes them fail in modeling sequentially require labeled data, and thus can be trained on all avail- dependent data such as videos, speech signals, and seismic able seismic data only. However, unsupervised learning traces. does not integrate well-logs (direct model measurements) Recurrent neural networks (RNN), on the other hand, in the learning. are a class of arti cial neural networks that are designed In this work, we propose a semi-supervised learning to capture temporal dynamics of sequential data. Un- work ow for seismic inversion that integrates both well- like feed-forward neural networks, RNNs have a hidden log data in addition to data mis t in learning the inverse state variable that can be passed between sequence sam- model. Formally, the loss function of the proposed semi- ples which allows them to capture long temporal depen- supervised work ow is written as dencies in sequential data. RNNs have been utilized to solve many problems in language modeling and natural L() :=  L () +  L () (6) 1 2 | {z } | {z } language processing (NLP)(Mikolov et al., 2010), speech property loss seismic loss and audio processing (Graves et al., 2013), video process- ing (Ma et al., 2017), petrophysical property estimation where ; 2 R are tuning parameters that govern the (Alfarraj and AlRegib, 2018), detection of natural earth- in uence of each of the property loss and seismic loss, quakes (Wiszniowski et al., 2014), and stacking velocity respectively. For example, if the input seismic data is estimation (Biswas et al., 2018). noisy, or well-log data is corrupted, the values of and A single layer feed-forward neural network produces an can be used to limit the role of the corrupted data in the output y which is a weighted sum of input features x i i learning process. The property loss is computed over the followed by an activation function (a non-linearity) like traces for which we have access to model measurements the sigmoid or hyperbolic tangent functions, i.e. y = (rock properties) form well-logs. The seismic loss , on the (Wx + b), where x and y are the input and output i i i other hand, is computed over all traces in the survey. th feature vectors of the i sample, respectively, () is the The goal of this work is to invert for elastic impedance activation function, W and b are the learnable weight (EI) using multi-o set data using semi-supervised sequence matrix and bias vector, respectively. The same equation modeling. Hence, (d) in the equation 6 is the set of all is applied to all data samples independently to produce multi-angle traces in the seismic survey, and m is the set corresponding outputs. of available EI traces from well-logs. Naturally, the size In addition to the ane transformation and the acti- of m is small compared to d. Furthermore, the vertical vation function, RNNs introduce a hidden state variable resolution of EI traces is ner than that of the seismic that is computed using the current input and the hidden traces . There are two common ways to model traces in a state variable from the previous step, learning paradigm. The rst method is to treat each data (t) (t) (t1) point in a well-log (in the z-direction) as an independent h =  W x + W h + b ; xh hh h i i i sample and try to invert for a given rock property from (7) (t) (t) the corresponding multi-angle seismic data sample. This y =  W h + b hy y i i method fails to capture the long-term temporal dynamics (t) (t) (t) of seismic traces; that is the dependency between a data where x , y and h , are the input, output, and i i i point at a given depth and the data points before it and state vectors at time step t, respectively, W's and b's are after it. An alternative approach is to treat each trace network weights and bias vectors respectively. For time (0) as a single training sample to incorporate the temporal t = 0, the hidden state variable is set to h = 0. Figure dependency. However, this approach severely limits the 1 shows a side-by-side comparison between a feed-forward amount of data from which the algorithm can learn since unit and a recurrent unit. 4 Alfarraj & AlRegib is given by the following equations, Feed-forward Network (t) (t) (t1) u = sigmoid W x + W y + b xu yu u i i i (t) (t) (t1) r = sigmoid W x + W y + b xr yr r i i i Neural (#) (#) (t) (t) (t) (t1) Network y ^ = tanh W x + b + r  W y + b xy ^ y ^ hy ^ y ^ i i 1 i i 2 (t) (t1) (t) (t) (t) y = (1 u ) y + u  y ^ i i (8) (t) (t) where u and r are the update-gate and reset-gate (#) (#) (#) i i Input y Output h State (t) vectors, respectively, y ^ is the candidate output for the current time step, W's and b's are the learnable parame- Delay ters, and the operator  is the element-wise product. Note (#) that the GRU introduces two new state variables, update- (#$%) gate u and reset-gate r, which control the ow of infor- mation from one time step to another, and thus they are Neural (#) Network (#) able to capture long-term dependency. The output of the (t) GRU at the current time step (y ) is a weighted sum of the candidate output for the current time step and the output from the previous step. Figure 2 shows a GRU net- Recurrent Network work unfolded through time. Note that all GRUs in an unfolded network share the same W and u parameters. Figure 1: An illustration of feed-forward and recurrent networks. (%) (%'() (%')) * * * # # # (%+() … … GRU GRU GRU (%) (%'() (%')) " " " When RNNs were proposed in the 1980s, they were dif- # # # cult to train because of the introduced dependency be- tween data samples that made the gradients more dicult Figure 2: Gated Recurrent Unit (GRU) unfolded through to compute. The problem was later solved using back- time. propagation through time (BPTT) algorithms (Werbos, 1990), which turns gradients into long products of terms using the chain rule. Theoretically, RNNs are supposed to METHODOLOGY learn long-term dependencies from their hidden state vari- able. However, even with BPTT, RNNs fail to learn long- Similar to all deep learning methods, RNNs require tremen- term dependencies mainly because the gradients tend to dous amounts of data to train. Given that well-log data vanish for long sequences when backpropagated through is limited in a given survey area, the number of training time. samples is limited. With such limitation, a combination of New RNN architectures have been proposed to over- regularization techniques must be used to train a learning- come the issue of vanishing gradients using gated units. based model properly and ensure it generalizes beyond Examples of such architectures are Long Short-Term Mem- the training dataset (Alfarraj and AlRegib, 2018). In ad- ory (LSTM) (Hochreiter and Schmidhuber, 1997) and the dition, the data shortage limits the number of the layers recently proposed Gated Recurrent Units (GRU) (Cho (and hence parameters) that can be used in learning-based et al., 2014). Such architectures have been shown to cap- models. Therefore, using deeper networks to capture the ture long-term dependency and perform well for various highly non-linear inverse mapping from seismic data to EI tasks such as machine translation and speech recognition. might not be feasible using supervised learning. In this work, we utilize GRUs in the proposed inversion In this work, we utilize a seismic forward model as an- work ow. other form of supervision in addition to well-log data. Al- GRUs supplement the simple RNN described above by though forward modeling is an essential block in classical incorporating reset-gate and update-gate variables which seismic inversion approaches, it has not been integrated are internal states that are used to evaluate the long-term into learning-based seismic inversion work ows. dependency and keep information from previous times Using a forward model in a learning-based inversion has only if they are needed. The forward step through a GRU two main advantages. First, it allows the incorporation of Update Semi-supervised Sequence Modeling for Elastic Impedance Inversion 5 geophysics into a machine learning paradigm to ensure where N is the total number of seismic traces in the that the outputs of the networks are obeying the physical survey, and N = jSj is the number of available well- laws. In addition, it allows the algorithm to learn from all logs. In seismic surveys, N  N , therefore, seismic loss p s traces in the seismic survey without explicitly providing is computed over many more traces than the property an EI trace (label) for each seismic trace. loss. On the other hand, property loss has access to direct model parameters (well-log data). These factors make the two losses work in collaboration to learn a stable and Proposed work ow accurate inverse model. The inverse model (F ) can, in principle, be trained us- It is important to note that the choice of the forward ing well-log data and their corresponding seismic data model is critical in the proposed work ow for two rea- only. However, as we have discussed earlier, a deep in- sons. First, the forward model must be able to synthesize verse model requires a large dataset of labeled data to at a speed comparable to the speed at which the inverse train properly which is not possible in a practical setting model processes data. Since deep learning models, in gen- where the number of well-logs is limited. Integrating a eral, are capable of processing large amounts of data in a forward model, as commonly done in classical inversion very short time with GPU technology, the forward model work ows, allows the inverse model to learn from the seis- must be fast. Second, the proposed inverse model, as all mic data without requiring their corresponding property other deep learning models, adjusts its parameters accord- traces, in addition to learning from the few available prop- ing to the gradients with respect to a de ned loss function, erty traces from well-logs. The overall proposed inversion therefore, the forward model must be di erentiable in or- work ow is shown in Figure 3. der to compute gradients with respect to the seismic loss. Therefore, in this work, we chose a convolutional forward model due to its simplicity and eciency to reduce com- Seismic Loss putation time. Other choices of the forward model are possible as long as they satisfy the two conditions stated above. Input Synthesized Inverse Model Estimated EI Forward Model Seismic Seismic Inverse Model Property Loss The proposed inverse model of the proposed work ow (shown in Figure 4) consists of four main submodules. Target EI These submodules are labeled as sequence modeling, local (from well logs) pattern analysis, upscaling, and regression. Each of the four submodules performs a di erent task in the overall Figure 3: The proposed semi-supervised inversion work- inverse model. ow. The sequence modeling submodule models temporal dy- namics of seismic traces and produces features that best The work ow in Figure 3 consists of two main modules: represent the low-frequency content of EI. The local pat- the inverse model (F ) with learnable parameters, and tern analysis submodule extracts local attributes from a forward model (F ) with no learnable parameters. The seismic traces that best model high-frequency trends of proposed work ow takes multi-angle seismic traces as in- EI trace. The upscaling submodule takes the sum of the puts, and outputs the best estimate of the corresponding features produced by the previous modules and upscales EI. Then, the forward model is used to synthesize multi- them vertically. This module is added based on the as- o set seismograms from the estimated EI. The error (data sumption that seismic data are sampled (vertically) at a mis t) is computed between the synthesized seismogram lower resolution than that of well-log data. Finally, the re- and the input seismic traces using the seismic loss. This gression submodule maps the upscaled outputs from fea- process is done for all traces in the survey. Furthermore, tures domain to target domain (i.e., EI). The details of property loss is computed between estimated and true EI each these submodule are discussed next. on traces for which we have a true EI from well-logs. The parameters of the inverse model are adjusted by combin- Sequence Modeling ing both losses as in equation 6 using a gradient-descent optimization. The sequence modeling submodule consists of a series of In this work, we chose the distance measure (D) as the bidirectional Gated Recurrent Units (GRU). Each bidi- Mean Squared Error (MSE) and = = 1. Hence, rectional GRU computes a state variable from future and equation 6 reduces to: past predictions and is equivalent to 2 GRUs where one X X models the trace from shallow to deeper layers, and the 1 1 y y 2 2 L() = km F (d )k + kd F (F (d ))k i i i i 2 2 other models the reverse trace. Assuming each multi-angle N N p s i i seismic traces have c channels (one channel for each in- m 2S (9) cident angle), the First GRU takes these c channels as Update 6 Alfarraj & AlRegib Sequence Modeling Upscaling DeconvBlock DeconvBlock Input GRU GRU GRU (kernel =+) (kernel=+) (in=2) , out=) ) (in=) , out=) ) (in=) , out=) ) (in=) , out=) ) (in=) , out=2) ) * * * * % * * * * * Seismic (stride=2 ) (stride=2 ) % * ConvBlock (kernel=+) ( in=) , out=) ) % * (dilation=, ) Output GRU Linear ConvBlock ConvBlock (in=) , out=2) ) (in=) , out=) ) (kernel=+) (kernel=+) * * * % EI (in=) , out=) ) (in=3) , out=) ) % * * * (dilation=, ) (dilation=1) ConvBlock Regression (kernel= +) (in=) , out=) ) % * (dilation=, ) Local Pattern Analysis Figure 4: The inverse model in the proposed work ow with generic hyperparameters. The hyperparameters are chosen based on the data. The number of input and output features are denoted by in and out, respectively Center sample input features and computes c temporal features based on the temporal variations of the processed traces. The Dilation = 1 next two GRUs perform a similar task on the outputs of their respective preceding GRU. The series of the three Dilation = 2 GRUs is equivalent to a 3-layer deep GRU. Deeper net- works are able to model complex input-output relation- Dilation = 3 ships that shallow networks might not capture. Moreover, deep GRUs generally produce smooth outputs. Hence, the Convolution kernel point output of the sequence modeling submodule is considered as the low-frequency trend of EI. Figure 5: An illustration of dilated convolution for multi- scale feature extraction (kernel size = 5, dilation factors Local pattern analysis dilation = 1; 2 and 3). Another submodule of the inverse model is the local pat- tern analysis submodule which consists of a set of 1-dimensional convolutional blocks with di erent dilation factors in par- mean and standard deviation. They have been shown to allel. The output features of each of the parallel con- reduce covariant shift in the learned features and speed volutional blocks are then combined using another con- up the learning. In addition, activation functions are one volutional block. Dilation refers to the spacing between of the building blocks of any neural network. They are convolution kernel points in the convolutional layers (Yu a source of non-linearity that allow the neural networks and Koltun, 2015). Multiple dilation factors of the kernel to approximate highly non-linear functions. In this work, extract multiscale features by incorporating information we chose the hyperbolic tangent function as the activation from trace samples that are direct neighbors to a refer- function. ence sample (i.e., the center sample), in addition to the Convolutional layers operate on small windows of the samples that are further from it. An illustration of dilated input trace due to their small kernel sizes. Therefore, convolution is shown in Figure 5 for a convolution kernel they capture high-frequency content. Since convolutional of size 5 and dilation factors dilation = 1; 2 and 3. layers do not have a state variable like recurrent layers to A convolutional block (ConvBlock ) consists of a convo- serve as a memory, they do poorly in estimating the low- lutional layer followed by group normalization (Wu and frequency content. The outputs of the local pattern anal- He, 2018) and an activation function. Group normaliza- ysis submodule are of very similar dimensions of those of tion scales divides the output of the convolutional layers the sequence modeling submodule. Hence, the outputs of into groups, and normalizes each group using a learned the two modules are added to obtain a full-band frequency Concatenation Semi-supervised Sequence Modeling for Elastic Impedance Inversion 7 content. 1 EI (t + t; ) EI (t; ) RC (t; ) = ; (10) 2 EI (t + t; ) + EI (t; ) upscaling where t is the time step, and EI (t; ) is the elastic impedance at time t and incident angle . EI in equation Seismic data is sampled at a lower rate than that of well- 10 refers to the normalized elastic impedance proposed logs data. The role of the upscaling submodule is to com- by Whitcombe (2002) which is computed from the elastic pensate for this resolution mismatch. This submodule properties as follows: consists of two Deconvolutional Blocks with di erent ker- nel stride. The stride controls the factor by which the a b c V (t) V (t) (t) p s inputs are upscaled. A stride of (s = 2) deconvolutional EI (t; ) = V  ; (11) p 0 V V p s 0 block produces an output that has twice the number of 0 0 the input samples (vertically). Deconvolutional layers (also known as transposed con- 2 2 a = 1 + tan c = 1 4K sin volutional or fractionally-strided convolutional layers) are where, b = 4K sin  K = s=V upscaling modules with learnable kernel parameters un- like classical interpolation methods with xed kernel pa- rameters (e.g., linear interpolation). They learn kernel and V ,V and  are P-wave velocity, S-wave velocity, and p s parameters from both feature and local spatial/temporal density, respectively, and V ,V and  are their respec- p s 0 0 0 domain. They have been used for various applications tive averages over the training sample (i.e., well-logs). It is like semantic segmentation and seismic structure labeling worth noting that the Aki-Richard's approximation of the (Noh et al., 2015; Alaudah et al., 2018). elastic impedance is only valid for incident angles that are Deconvolutional blocks in Figure 4 have a similar struc- less than 35 . Thus, in the case study, we only consider ture as the convolutional blocks introduced earlier. They valid angles for this approximation. are a series of deconvolutional layer followed by a group The seismograms are then generated by convolving RC (t; ) normalization module and an activation function. with a wavelet w(t), i.e., S(t; ) = w(t) RC (t; ); (12) Regression where  is the linear convolution operator, and S(t; ) is The nal submodule in the inverse model is the regres- the synthesized seismogram. Thus, the forward model uti- sion submodule which consists of a GRU followed by a lized in this work is a 1-dimensional convolutional forward linear mapping layer (fully-connected layer). Its role is to model that synthesized seismograms from EI traces. regress the extracted features from the other modules to the target domain (EI domain). The GRU in this mod- CASE STUDY ON MARMOUSI 2 MODEL ule is a simple 1-layer GRU that augments the upscaled outputs using global temporal features. Finally, a linear In order to validate the proposed algorithm, we chose Mar- ane transformation layer (fully-connected layer) takes mousi 2 model as a case study. Marmousi 2 model is an the output features from the GRU and maps them to the elastic extension of the original Marmousi synthetic model same number of features in the target domain, which is, that has been used for numerous studies in geophysics for in this case, the number of incident angles in the target various applications including seismic inversion, seismic EI trace. modeling, seismic imaging, and AVO analysis. The model spans 17 km in width and 3.5 km in depth with a verti- cal resolution of 1.25 m. The details of the model can be Forward Model found in Martin et al. (2006). Forward modeling is the process of synthesizing seismo- In this work, we used the elastic model (converted to grams from elastic properties of the earth (i.e., P-wave time) to generate EI and seismic data to train the model. velocity, S-wave velocity, and density) or from a function The details of the dataset generation are discussed in the of the elastic properties such as EI. In this work, and since next section. The proposed work ow is trained using all we are inverting for EI, we used a forward model that takes seismic traces in Marmousi 2 in addition to a few EI traces EI as an input, and outputs a corresponding seismogram. (training traces). The work ow is then used to invert for EI was proposed by Connolly (1999) and later normalized EI on the entire Marmousi. Since the full elastic model is by Whitcombe (2002). It is based on the Aki-Richards available, we compare our EI inversion with the true EI approximation for Zoeppritz equations (Aki and Richards, quantitatively. 1980). The Aki-Richards approximation incorporates am- plitude variations with o set/angle (AVO/AVA) based on Dataset Generation the changes of elastic properties and the incident angle. The forward model adopted in this work uses Connolly's We used the elastic model of Marmousi 2 to generate EI formulation to compute the re ection coecients from EI for 4 incident angles  = 0 ,10 ,20 , and 30 . Thus, in for di erent incident angles  as follows: Figure 4, c = 4 which represents the number of channels 1 8 Alfarraj & AlRegib di erent values and testing the performance on a valida- in the input traces. Multi-angle seismic data (a total of tion dataset using cross-validation. In this case study, we N = 2720 traces) is then generated from EI using the forward model with Ormsby wavelet (5-10-60-80 Hz) fol- choose c = 8, k = 5, d = 1, d = 3, and d = 6. 2 1 2 3 lowing the synthesis procedure in (Martin et al., 2006). The seismic traces are then downsampled (in time) by a Results and Discussion factor of 6 to simulate resolution di erence between seis- Figure 7 shows estimated EI and true EI for the entire sec- mic and well-log data. Finally, a 15 db white Gaussian tion for all four incident angles. Figure 8 shows the abso- noise is added to assess the robustness of the proposed lute di erence between the true and estimated EI. The g- work ow to noise. ures indicate that the proposed work ow estimates EI ac- To train the proposed inversion work ow, we chose 10 curately for most parts of the section with a visible lateral evenly-spaced traces for training (N = 10) as shown in jitter. Jitter e ect is expected since the proposed work- Figure 6. We assume we have access to both EI and seis- ow is based on 1-dimensional sequence modeling with no mic data for the training traces. For all other traces in the explicit spatial constraints as often done in classical in- survey, we assume we have access to seismic data only. Al- version methods. Furthermore, the noise in the seismic though using more EI traces for training would improve data reduces the similarity between neighboring seismic the inversion, in this work we use only 10 EI traces for traces, which can also cause such jitter. The shown sec- training to simulate well-log data in a practical setting. tions are the direct output of the inversion work ow with no post-processing which can reduce such artifacts. Figure 9 shows two selected traces from the section that were not in the training dataset (x = 3200 m and x = 8500 m). The trace at x = 3200 passes through an anomaly (Gas-charged sand channel) represented by an isolated and sudden transition in EI at 1:25 seconds. This anomaly causes the inverse model to incorrectly estimate EI. Since our work ow is based on bidirectional sequence Figure 6: The training EI traces overlaid on the zero-o set modeling, we expect the error to propagate to nearby sam- seismic section. ples in both directions. However, the algorithm quickly recovers a good estimate for deeper and shallower sam- ples of the trace. This quick recovery is mainly due to the Training The Inverse Model reset-gate variable in the GRU that limits the propagation First, the inverse model (neural network) is initialized of such errors in sequential data estimation. Furthermore, with random parameters. Then, randomly chosen seismic the trace at x = 8500 passes through most layers in the traces in addition to the seismic traces for which we have section which makes it a challenging trace to invert. The EI traces in the training dataset are inputted to the inverse estimated EI at x = 8500 follows the trends in true EI model to get a corresponding set of EI traces. The forward trace, including the thin layers. model is then used to synthesize seismograms from the es- Furthermore, we show scatter plots of the estimated and timated EI. Seismic loss is computed as the MSE between true EI for all incident angles in Figure 10. The shaded the synthesized seismic and the input seismic. Moreover, regions in the gure include all points that are within one property loss is computed as the MSE between the pre- standard deviation of the true EI ( ). The scatter plots EI dicted EI and the true EI trace on the training traces only. show a linear correlation between the estimated and true The total loss is computed as the sum of the two losses. EI with the majority of the estimated samples within EI Then, the gradients of the total loss are computed, and the from the true EI. parameters of the inverse model are updated accordingly. To evaluate the performance of the proposed work ow The process is repeated until convergence. quantitatively, we use two metrics that are commonly used Figure 4 shows the inverse model with no given hyper- for regression analysis. Namely, Pearson correlation coef- parameters to ensure its generalizability for data other cient (PCC), and coecient of determination (r ). PCC than the data used in this case study. These hyperpa- is de ned as: rameters are fc ; c ; s ; s ; d ; d ,kg which are parameters 1 2 1 2 1 2 T1 of the inversion model that need to be set rather than 1 1 PCC = [x(t)  ] [x ^(t)  ] ; (13) x x ^ learned. Some of these hyperparameters are completely x x ^ t=0 dictated by the data. For example, c which is the num- ber of inputs channels must be chosen as the number of where x and x ^ are the target trace and the estimated incident angles of the data. Also, s ; s must be chosen trace, respectively, and  and  are mean and standard 1 2 such that s  s is equal to the resolution mismatch fac- deviation, respectively. PCC is a measure of the linear 1 2 tor between seismic and EI data. In this case study, we correlation between the estimated and target traces. It set c = 4, s = 3, and s = 2. The rest of the hyper- is commonly used to measure the overall t between the 1 1 2 parameters are chosen by analyzing the data and trying two traces with no regard to the individual values. On Semi-supervised Sequence Modeling for Elastic Impedance Inversion 9 (a) Estimated EI ( = 0 ) (b) True EI ( = 0 ) (c) Estimated EI ( = 10 ) (d) True EI ( = 10 ) (e) Estimated EI ( = 20 ) (f) True EI ( = 20 ) (g) Estimated EI ( = 30 ) (h) True EI ( = 30 ) Figure 7: Estimated EI and true EI for Marmousi 2 model. spectively. > 0, > 0, > 0 are constants chosen the other hand, r is a goodness-of- t measure, where it to adjust the in uence of the each of three terms. From takes into account the mean squared error between the two SSIM, a single similarity value, denoted as M-SSIM, can traces. The coecient of determination (r ) is de ned as: be computed by taking the mean of all SSIM scores for all T1 2 [x(t) x ^(t)] local windows. 2 i=0 r = 1 (14) T1 2 The quantitative results are summarized in Table 1. It [x(t)  ] i=0 is evident that the estimated EI captures the overall trend In addition, we used Structural Similarity (SSIM) (Wang of true EI (average PCC  98%, average r  94%). The et al., 2004) to assess the quality of the estimated EI sec- average r score is lower than PCC since it is more sensi- tion from an image point of view. SSIM evaluates the tive to the residual sum of squares rather than the overall similarity of two images from local statistics (on local win- trend. Note that PCC and r compute scores over indi- dows) using the following equation: vidual traces, then the nal score (for each incident angle) is computed by averaging across all traces. On the other SSIM(X; X) = [l(x; x^)]  [c(x; x^)]  [s(x; x^)] ; (15) hand, M-SSIM is computed over the 2D section (image) where x and x^ are patches from estimated and target which takes both lateral and vertical variations into ac- image, respectively. l(x; y), c(x; x^), and s(x; x^) are lu- count. minance, contrast and structure comparison functions re- 10 Alfarraj & AlRegib (a)  = 0 (b)  = 10 (c)  = 20 (d)  = 30 Figure 8: Absolute di erence between True EI and estimate EI for all incident angles. (a) x = 3300 m (b) x = 8500 m Figure 9: Selected EI trace. Estimate EI is shown in red, and true EI is shown in black. Furthermore, we quantify the contribution of each of without integrating the well-logs. Thus, the results are the the two terms in the loss function in equation (6) by com- worst out of three schemes. Note that the performance of paring the inversion results for di erent values of and the unsupervised learning scheme could be improved by . The average results are reported in Table 2. using an initial smooth model and by enforcing some con- When = 0 and = 1 (unsupervised learning), the straints on the inversion as often doe in classical inversion. inverse model is learned completely from the seismic data Moreover, when = 1 and = 0 (supervised learning), Semi-supervised Sequence Modeling for Elastic Impedance Inversion 11 ues, with a wider distribution than that of PCC. This is mainly due to the fact that PCC is de ned to be in the 2 2 range [0; 1] and r is in the range (1; 1]. In addition, r is a more strict metric than PCC as it factors in the MSE between the estimated EI and true EI. Figure 11(c) shows the distribution of local SSIM scores over the entire sec- tion, with the majority of the scores in the range [0:8; 1] indicating that the estimated EI is structurally similar to the true EI from an image point of view. (a)  = 0 (b)  = 10 (a) Distribution of PCC values (c)  = 20 (d)  = 30 Figure 10: Scatter plots of the estimated and true EI for di erent values of . The shaded regions include all points that are within  of the true EI. EI Table 1: Quantitative evaluation of the proposed work ow on Marmousi 2. Metric 2 (b) Distribution of r values PCC r M-SSIM Angle = 0 0.98 0.95 0.92 = 10 0.98 0.95 0.92 = 20 0.98 0.95 0.92 = 30 0.98 0.92 0.92 Average 0.98 0.94 0.92 the inversion work ow learns from only 10 seismic traces and their corresponding EI traces from well-logs. Thus, it is expected that it results in better inversion compared (c) Distribution of SSIM values to the unsupervised scheme. However, training a deep inverse model in a supervised learning scheme requires Figure 11: The distribution of Pearson correlation coef- heavy regularization and careful selection of the train- cient (PCC), and coecient of determination (r ), and ing parameters. In addition, the learned inverse model SSIM values on Marmousi 2. might not generalize well beyond training data. Finally, when = = 1 (semi-supervised learning), the inverse model is learned from all seismic data in addition to 10 EI Implementation traces from well-logs. Hence, the semi-supervised learn- ing scheme improves the performance and regularizes the The proposed work ow was implemented in Python us- learning. ing PyTorch deep learning library (Paszke et al., 2017). Figure 11(a) shows the distribution of PCC of the es- For optimization, we used Adam (Kingma and Ba, 2014) timated EI with respect to the true EI for all traces in which is a gradient-based stochastic optimization tech- Marmousi 2 model. The gure shows that the estimated nique with adaptive learning rate that was designed specif- EI correlates very well with the true EI, with the major- ically for training deep neural networks. The codes were ity of PCC values between [0:9 1], and a spike near 1. run on a PC with Intel i7 Quad-Core CPU, and a single Similarity, Figure 11(b) shows the distribution of r val- Nvidia GeForce GTX 1080 Ti GPU. The run time of 500 12 Alfarraj & AlRegib Table 2: Quantitative evaluation of the sensitivity of the loss function with respect to its two terms. Unsupervised: learning from all seismic traces in the section, Supervised: learning from 10 seismic traces and their corresponding EI traces, Semi-supervised: learning from all seismic data in the section in addition to 10 seismic traces and their corresponding EI traces Metric PCC r M-SSIM Training Scheme Unsupervised ( = 0; = 1) 0.33 -0.45 0.77 Supervised ( = 1; = 0) 0.96 0.88 0.87 Semi-supervised ( = 1; = 1) 0.98 0.94 0.92 iterations of the GPU-accelerated algorithm was 2 min- lation of 98%. Furthermore, the applications of the pro- utes. Figure 12 shows the convergence behavior of the posed work ow are not limited to EI inversion; it can be proposed work ow over 500 training iterations. The y- easily extended to perform full elastic inversion as well as axis shows the total loss over the training dataset which property estimation for reservoir characterization. is computed as in equation 9 except for an additional normalization factor that is equal to the number of time samples per trace. Note that the loss is also computed ACKNOWLEDGMENT over normalized traces after subtracting the mean value This work is supported by the Center for Energy and and dividing by the standard deviation to ensure faster Geo Processing (CeGP) at Georgia Institute of Technol- convergence of the inversion work ow as often done for ogy and King Fahd University of Petroleum and Minerals deep learning models. The proposed work ow converges (KFUPM). in about 300 iterations. However, the loss continues to de- crease slightly. The training was terminated at after 500 iterations to avoid over- tting to the training dataset. REFERENCES Adler, J., and O. Oktem, 2017, Solving ill-posed inverse problems using iterative deep neural networks: Inverse Problems, 33, 124007. Aki, K., and P. Richards, 1980, Quantitative seismology, vol. 2. Al-Anazi, A., and I. Gates, 2012, Support vector regres- sion to predict porosity and permeability: e ect of sam- ple size: Computers & Geosciences, 39, 64{76. Alaudah, Y., S. Gao, and G. AlRegib, 2018, Learning to label seismic structures with deconvolution networks and weak labels, in SEG Technical Program Expanded Abstracts 2018: Society of Exploration Geophysicists, Figure 12: Training learning curve showing the loss func- 2121{2125. tion value over 500 iterations. Alfarraj, M., and G. AlRegib, 2018, Petrophysical prop- erty estimation from seismic data using recurrent neu- The eciency of the proposed work ow is due to the ral networks, in SEG Technical Program Expanded use of 1-dimensional modeling. In addition, Marmousi 2 Abstracts 2018: Society of Exploration Geophysicists, is a relatively small model with 2720 traces only. However, 2141{2146. the computation time of the proposed work ow will scale AlRegib, G., M. Deriche, Z. Long, H. Di, Z. Wang, Y. linearly with the number of traces in the dataset. Alaudah, M. A. Sha q, and M. Alfarraj, 2018, Subsur- The code used to reproduce the results reported in this face structure analysis using computational interpreta- manuscript is publicly available on GitHub [link to code]. tion and learning: A visual signal processing perspec- tive: IEEE Signal Processing Magazine, 35, 82{98. CONCLUSIONS Araya-Polo, M., J. Jennings, A. Adler, and T. Dahlke, In this work, we proposed an innovative semi-supervised 2018, Deep-learning tomography: The Leading Edge, machine learning work ow for elastic impedance inversion 37, 58{66. from multi-angle seismic data using recurrent neural net- Azevedo, L., and A. Soares, 2017, Geostatistical methods works. The proposed work ow was validated on the Mar- for reservoir geophysics: Springer. mousi 2 model. Although the training was carried out Biswas, R., A. Vassiliou, R. Stromberg, and M. K. Sen, on a small number of EI traces for training, the proposed 2018, Stacking velocity estimation using recurrent neu- work ow was able to estimate EI with an average corre- ral network, in SEG Technical Program Expanded Ab- Semi-supervised Sequence Modeling for Elastic Impedance Inversion 13 stracts 2018: Society of Exploration Geophysicists, Kingma, D. P., and J. Ba, 2014, Adam: A 2241{2245. method for stochastic optimization: arXiv preprint Bosch, M., T. Mukerji, and E. F. Gonzalez, 2010, Seismic arXiv:1412.6980. inversion for reservoir properties combining statistical Lucas, A., M. Iliadis, R. Molina, and A. K. Katsaggelos, rock physics and geostatistics: A review: Geophysics, 2018, Using deep neural networks for inverse problems 75, 75A165{75A176. in imaging: beyond analytical methods: IEEE Signal Buland, A., and H. Omre, 2003, Bayesian linearized avo Processing Magazine, 35, 20{36. inversion: Geophysics, 68, 185{198. Ma, C.-Y., M.-H. Chen, Z. Kira, and G. AlRegib, 2017, Chaki, S., A. Routray, and W. K. Mohanty, 2015, A novel TS-LSTM and temporal-inception: Exploiting spa- preprocessing scheme to improve the prediction of sand tiotemporal dynamics for activity recognition: arXiv fraction from seismic attributes using neural networks: preprint arXiv:1703.10667. IEEE Journal of Selected Topics in Applied Earth Ob- Martin, G. S., R. Wiley, and K. J. Marfurt, 2006, Mar- servations and Remote Sensing, 8, 1808{1820. mousi2: An elastic upgrade for marmousi: The Leading ||{, 2017, A di usion lter based scheme to denoise Edge, 25, 156{166. seismic attributes and improve predicted porosity vol- Mikolov, T., M. Kara  at, L. Burget, J. Cernocky,  and S. ume: IEEE Journal of Selected Topics in Applied Earth Khudanpur, 2010, Recurrent neural network based lan- Observations and Remote Sensing, 10, 5265{5274. guage model: Presented at the Eleventh Annual Con- ||{, 2018, Well-log and seismic data integration for ference of the International Speech Communication As- reservoir characterization: A signal processing and sociation. machine-learning perspective: IEEE Signal Processing Mosser, L., W. Kimman, J. Dramsch, S. Purves, A. De la Magazine, 35, 72{81. Fuente Briceno, ~ and G. Ganssle, 2018, Rapid seismic Cho, K., B. Van Merri enboer, D. Bahdanau, and Y. Ben- domain transfer: Seismic velocity inversion and model- gio, 2014, On the properties of neural machine trans- ing using deep generative neural networks: Presented lation: Encoder-decoder approaches: arXiv preprint at the 80th EAGE Conference and Exhibition 2018. arXiv:1409.1259. Natarajan, N., I. S. Dhillon, P. K. Ravikumar, and A. Connolly, P., 1999, Elastic impedance: The Leading Edge, Tewari, 2013, Learning with noisy labels: Advances in 18, 438{452. neural information processing systems, 1196{1204. Das, V., A. Pollack, U. Wollner, and T. Mukerji, 2018, Noh, H., S. Hong, and B. Han, 2015, Learning deconvolu- Convolutional neural network for seismic impedance tion network for semantic segmentation: Proceedings of inversion, in SEG Technical Program Expanded Ab- the IEEE international conference on computer vision, stracts 2018: Society of Exploration Geophysicists, 1520{1528. 2071{2075. Paszke, A., S. Gross, S. Chintala, G. Chanan, E. Yang, Z. Doyen, P., 2007, Seismic reservoir characterization: DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, An earth modelling perspective: EAGE publications 2017, Automatic di erentiation in pytorch: Presented Houten, 2. at the NIPS-W. Doyen, P. M., 1988, Porosity from seismic data: A geo- R oth, G., and A. Tarantola, 1994, Neural networks and statistical approach: Geophysics, 53, 1263{1275. inversion of seismic data: Journal of Geophysical Re- Duijndam, A., 1988a, Bayesian estimation in seismic in- search: Solid Earth, 99, 6753{6768. version. part i: Principles: Geophysical Prospecting, Tarantola, A., 2005, Inverse problem theory and methods 36, 878{898. for model parameter estimation: siam, 89. ||{, 1988b, Bayesian estimation in seismic inversion. Ulrych, T. J., M. D. Sacchi, and A. Woodbury, 2001, A part ii: Uncertainty analysis: Geophysical Prospecting, bayes tour of inversion: A tutorial: Geophysics, 66, 36, 899{918. 55{69. Gholami, A., 2015, Nonlinear multichannel impedance in- Wang, Z., A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, version by total-variation regularization: Geophysics, et al., 2004, Image quality assessment: from error vis- 80, R217{R224. ibility to structural similarity: IEEE transactions on Gholami, A., and H. R. Ansari, 2017, Estimation of poros- image processing, 13, 600{612. ity from seismic attributes using a committee model Werbos, P. J., 1990, Backpropagation through time: what with bat-inspired optimization algorithm: Journal of it does and how to do it: Proceedings of the IEEE, 78, Petroleum Science and Engineering, 152, 238{249. 1550{1560. Graves, A., A.-r. Mohamed, and G. Hinton, 2013, Whitcombe, D. N., 2002, Elastic impedance normaliza- Speech recognition with deep recurrent neural networks: tion: Geophysics, 67, 60{62. Acoustics, speech and signal processing (ICASSP), 2013 Wiszniowski, J., B. M. Plesiewicz, and J. Trojanowski, IEEE international conference on, IEEE, 6645{6649. 2014, Application of real time recurrent neural network Hochreiter, S., and J. Schmidhuber, 1997, LSTM can solve for detection of small natural earthquakes in poland: hard long time lag problems: Advances in neural infor- Acta Geophysica, 62, 469{485. mation processing systems, 473{479. Wu, Y., and K. He, 2018, Group normalization: Proceed- 14 Alfarraj & AlRegib ings of the European Conference on Computer Vision (ECCV), 3{19. Yu, F., and V. Koltun, 2015, Multi-scale context ag- gregation by dilated convolutions: arXiv preprint arXiv:1511.07122. Yuan, S., and S. Wang, 2013, Spectral sparse Bayesian learning re ectivity inversion: Geophysical Prospect- ing, 61, 735{746. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Electrical Engineering and Systems Science arXiv (Cornell University)

Semi-supervised Sequence Modeling for Elastic Impedance Inversion

Loading next page...
 
/lp/arxiv-cornell-university/semi-supervised-sequence-modeling-for-elastic-impedance-inversion-jpH6RXVuKz

References (43)

ISSN
2324-8858
eISSN
ARCH-3348
DOI
10.1190/INT-2018-0250.1
Publisher site
See Article on Publisher Site

Abstract

Citation M. Alfarraj and G. AlRegib, (2019), "Semisupervised sequence modeling for elas- tic impedance inversion," Interpretation 7: SE237-SE249. Review Accepted on: 29 May 2019 Data and Codes [GitHub Link] Bib f@articlealfarraj2019semi, title=fSemi-supervised Sequence Modeling for Elastic Impedance Inversiong, author=fAlfarraj, Motaz and AlRegib, Ghassang, journal=fInterpretationg, volume=f7g, number=f3g, pages=fSE237--SE249g, year=f2019g, publisher=Society of Exploration Geophysicists and American Association of Petroleum Geologistsg Contact motaz@gatech.edu OR alregib@gatech.edu http://ghassanalregib.info/ arXiv:1908.07849v1 [physics.geo-ph] 19 Aug 2019 Semi-supervised Sequence Modeling for Elastic Impedance Inversion 1 Motaz Alfarraj and Ghassan AlRegib for EI is a more powerful tool for reservoir characterization ABSTRACT compared to AI inversion. The goal of inversion is to infer true model parameters Recent applications of machine learning algorithms in (m 2 X ) through an indirect set of measurements d 2 Y . the seismic domain have shown great potential in dif- Mathematically, the problem can be formulated as follows ferent areas such as seismic inversion and interpreta- tion. However, such algorithms rarely enforce geophys- d = F (m) + n; (1) ical constraints; the lack of which might lead to un- where F : X ! Y is a forward operator, d is the mea- desirable results. To overcome this issue, we propose sured data, m is the model, and n 2 Y is a random a semi-supervised sequence modeling framework based variable that represents noise in the measurements. To on recurrent neural networks for elastic inversion from estimate the model from the measured data, one needs multi-angle seismic data. Speci cally, seismic traces to solve an inverse problem. The solution depends on and elastic impedance traces are modeled as time se- the nature of the forward model and observed data. In ries. Then, a neural-network-based inversion model the case of seismic inversion, and due to the non-linearity comprising convolutional and recurrent neural layers is and heterogeneity of the subsurface, the inverse problem used to invert seismic data for elastic impedance. The is highly ill-posed. In order to nd a stable solution to proposed work ow uses well-log data to guide the inver- an ill-posed problem, the problem is often regularized by sion. In addition, it utilizes seismic forward modeling to regularize the training, and to serve as a geophysi- imposing constraints on the solution space, or by incor- porating prior knowledge about the model. A classical cal constraint for the inversion. The proposed work ow approach to solve inverse problems is to set up the prob- achieves an average correlation of 98% between the es- lem as a Bayesian inference problem, and improve prior timated and target elastic impedance using 10 well-logs knowledge by optimizing for a cost function based on the for training on a synthetic dataset. data likelihood, m ^ = argmin [H (F (m); d) + C(m)] ; (2) m2X INTRODUCTION where m ^ is the estimated model, H : Y  Y ! R is Seismic inversion is a process used to estimate rock prop- an ane transform of the data likelihood, C : X ! R is a erties from seismic re ection data. For example, seismic regularization function that incorporates prior knowledge, inversion can be used to infer acoustic impedance (AI) and  2 R is regularization parameter that controls the from zero-o set seismic data, which in turn is used to es- in uence of the regularization function. timate porosity. AI is the product of P-wave velocity (V ) The solution of equation 2 in seismic inversion can be and bulk density (). An extension of AI for multi-angle sought in a stochastic or a deterministic fashion through seismic data is elastic impedance (EI) (Connolly, 1999). an optimization routine. In stochastic inversion, the out- EI is a function of P-wave velocity (V ), S-wave velocity come is a posterior probability density function; or multi- (V ), density (), and incident angle (). EI reduces to AI ple valid solutions to account for uncertainty in the data. when  = 0 (Whitcombe, 2002), and it utilizes informa- On the other hand, deterministic inversion produces an es- tion from multi-o set/angle seismic data. Thus, inverting timate (m ^ ) that maximizes the posterior probability den- sity function. The literature of seismic inversion (both de- Center for Energy and Geo Processing (CeGP), Georgia In- stitute of Technology, Atlanta, GA terministic and stochastic) is rich in various methods to 2 Alfarraj & AlRegib vey area on which an algorithm can be trained. For this formulate, regularize and solve the problem (e.g., (Duijn- reason, such algorithms must have a limited number of dam, 1988a; Doyen, 1988; Duijndam, 1988b; Ulrych et al., 2001; Buland and Omre, 2003; Tarantola, 2005; Doyen, learnable parameters and a good regularization method 2007; Bosch et al., 2010; Gholami, 2015; Azevedo and in order to prevent over- tting and to be able to gener- Soares, 2017)). alize beyond the training data. Moreover, applications of Recently, there have been several successful applications machine learning on seismic data do not often utilize or of machine learning and deep learning methods to solving enforce a physical model that can be used to check the inverse problems (Lucas et al., 2018). Moreover, machine validity of their outputs. In other words, there is a high learning and deep learning methods have been utilized in dependence on machine learning algorithms to understand the seismic domain for di erent tasks such as inversion and the inherent properties of the target outputs without ex- interpretation (AlRegib et al., 2018). For example, seismic plicitly specifying a physical model. Such dependence can inversion has been attempted using supervised-learning lead to undesirable or incorrect results, especially when data is limited. algorithms such as support vector regression (SVR) (Al- In this work, we propose a semi-supervised learning Anazi and Gates, 2012), arti cial neural networks (R oth work ow for elastic impedance (EI) inversion from multi- and Tarantola, 1994; Araya-Polo et al., 2018), committee angle seismic data using sequence modeling through a models (Gholami and Ansari, 2017), convolutional neu- combination of recurrent and convolutional neural net- ral networks (CNNs) (Das et al., 2018), and many other works. The proposed work ow learns a non-linear inverse methods (Chaki et al., 2015; Yuan and Wang, 2013; Gho- mapping from a training set consisting of well-log data lami and Ansari, 2017; Chaki et al., 2017; Mosser et al., and their corresponding seismic data. In addition, the 2018; Chaki et al., 2018). More recently, a sequence- learning is guided and regularized by a forward model as modeling-based machine learning work ow was used to es- often incorporated in classical inversion theory. The rest timate petrophysical properties from seismic data (Alfar- of this paper is organized as follows. First, we discuss the raj and AlRegib, 2018), which showed that recurrent neu- formulation of learning-based seismic inversion. Then, we ral networks are superior to feed-forward neural networks introduce recurrent neural networks as one of the main in capturing the temporal dynamics of seismic traces. In general, machine learning algorithms can be used components in the proposed work ow. Next, we discuss to invert seismic data by learning a non-linear mapping the technical details of the proposed work ow. Then, we parameterized by  2 Z  R , i.e., F : Y ! X from a present a case study to validate the proposed framework set of examples (known as a training dataset) such that: on the Marmousi 2 model (Martin et al., 2006) by invert- ing for EI using seismic data and 10 well-logs only. F (d)  m: (3) PROBLEM FORMULATION A key di erence between classical inversion methods Using neural networks, one can learn the parameter  (in and learning-based methods is the outcome. In classi- equation 3) in either a supervised manner or an unsuper- cal inversion, the outcome is a set of model parameters vised manner (Adler and Oktem, 2017). In supervised (deterministic) or a posterior probability density func- learning, the machine learning algorithm is given a set of tion (stochastic). On the other hand, learning methods measurement-model pairs (d; m) (e.g., seismic traces and produce a mapping from the measurements domain (seis- their corresponding property traces from well-logs). Then, mic) to model parameters domain (rock property). An- the algorithm learns the inverse mapping by minimizing other key di erence between the two approaches is their sensitivity to the initialization of the optimization rou- the following loss function tine. In classical inversion, the initial guess (posed as a L () := D m ;F (d ) (4) prior model) plays an important role in the convergence 1 i i of the method, and in the nal solution. On the other m 2S hand, learning-based methods are often randomly initial- ized, and prior knowledge is integrated into the objective where S is the set of available property traces from well- th function and is inferred by the learning algorithm from logs, m is the i trace in S , d is the seismic traces cor- i i the training data. Thus, learning-based inversion is less responding to m , and D is a distance measure that com- sensitive to the initial guess. On the other hand, learning- pares the estimated rock property to the true property. based inversion requires a high-quality training dataset in Namely, supervised machine learning algorithms seek a order to generate reliable results. Nevertheless, there have solution that minimizes the inversion error over the given been e orts to overcome this shortcoming of neural net- measurement-model pairs. Note that equation 4 is com- works and enable them to learn from noisy or unreliable puted only over a subset of all traces in the survey. This data (Natarajan et al., 2013). subset includes the traces for which a corresponding prop- There are many challenges, however, that might prevent erty trace is available from well-logs. machine learning algorithms from nding a proper map- Alternatively, the parameters  can be sought in an ping that can be generalized for an entire survey area. unsupervised-learning scheme where the learning algorithm One of the challenges is the lack of data from a given sur- is given a set of measurements (d) and a forward model Semi-supervised Sequence Modeling for Elastic Impedance Inversion 3 F . The algorithm then learns by minimizing the following each trace is treated as a single training sample. With a loss function, limited amount of training data, common machine learn- ing algorithms might fail to generalize beyond the train- L () := D F F (d ) ; d ; (5) 2 i i ing data because of their high data requirements. Such a requirement might be dicult to meet in practical set- tings where the number of well-logs is limited. In order which is computed over all seismic traces in the survey. to remedy this, the proposed work ow utilizes sequence The loss in equation 5 is known as data mis t. It mea- modeling to model traces as sequential data via recurrent sures the distance between the input seismic traces and neural networks. the synthesized seismograms from the estimated property traces using the forward model. Although supervised methods have been shown to be RECURRENT NEURAL NETWORKS superior to unsupervised ones in various learning tasks Despite the success of feed-forward machine learning meth- (e.g., image segmentation and object recognition), they ods, including convolutional networks, multilayer percep- often need to be trained on a large number of training trons, and support vector machines for various learning examples. In the case of the seismic inversion, the labels tasks, they have their limitations. Feed-forward methods (i.e., property traces) are scarce since they come from well- have an underlying assumption that data points are inde- logs. Unsupervised methods, on the other hand, do not pendent, which makes them fail in modeling sequentially require labeled data, and thus can be trained on all avail- dependent data such as videos, speech signals, and seismic able seismic data only. However, unsupervised learning traces. does not integrate well-logs (direct model measurements) Recurrent neural networks (RNN), on the other hand, in the learning. are a class of arti cial neural networks that are designed In this work, we propose a semi-supervised learning to capture temporal dynamics of sequential data. Un- work ow for seismic inversion that integrates both well- like feed-forward neural networks, RNNs have a hidden log data in addition to data mis t in learning the inverse state variable that can be passed between sequence sam- model. Formally, the loss function of the proposed semi- ples which allows them to capture long temporal depen- supervised work ow is written as dencies in sequential data. RNNs have been utilized to solve many problems in language modeling and natural L() :=  L () +  L () (6) 1 2 | {z } | {z } language processing (NLP)(Mikolov et al., 2010), speech property loss seismic loss and audio processing (Graves et al., 2013), video process- ing (Ma et al., 2017), petrophysical property estimation where ; 2 R are tuning parameters that govern the (Alfarraj and AlRegib, 2018), detection of natural earth- in uence of each of the property loss and seismic loss, quakes (Wiszniowski et al., 2014), and stacking velocity respectively. For example, if the input seismic data is estimation (Biswas et al., 2018). noisy, or well-log data is corrupted, the values of and A single layer feed-forward neural network produces an can be used to limit the role of the corrupted data in the output y which is a weighted sum of input features x i i learning process. The property loss is computed over the followed by an activation function (a non-linearity) like traces for which we have access to model measurements the sigmoid or hyperbolic tangent functions, i.e. y = (rock properties) form well-logs. The seismic loss , on the (Wx + b), where x and y are the input and output i i i other hand, is computed over all traces in the survey. th feature vectors of the i sample, respectively, () is the The goal of this work is to invert for elastic impedance activation function, W and b are the learnable weight (EI) using multi-o set data using semi-supervised sequence matrix and bias vector, respectively. The same equation modeling. Hence, (d) in the equation 6 is the set of all is applied to all data samples independently to produce multi-angle traces in the seismic survey, and m is the set corresponding outputs. of available EI traces from well-logs. Naturally, the size In addition to the ane transformation and the acti- of m is small compared to d. Furthermore, the vertical vation function, RNNs introduce a hidden state variable resolution of EI traces is ner than that of the seismic that is computed using the current input and the hidden traces . There are two common ways to model traces in a state variable from the previous step, learning paradigm. The rst method is to treat each data (t) (t) (t1) point in a well-log (in the z-direction) as an independent h =  W x + W h + b ; xh hh h i i i sample and try to invert for a given rock property from (7) (t) (t) the corresponding multi-angle seismic data sample. This y =  W h + b hy y i i method fails to capture the long-term temporal dynamics (t) (t) (t) of seismic traces; that is the dependency between a data where x , y and h , are the input, output, and i i i point at a given depth and the data points before it and state vectors at time step t, respectively, W's and b's are after it. An alternative approach is to treat each trace network weights and bias vectors respectively. For time (0) as a single training sample to incorporate the temporal t = 0, the hidden state variable is set to h = 0. Figure dependency. However, this approach severely limits the 1 shows a side-by-side comparison between a feed-forward amount of data from which the algorithm can learn since unit and a recurrent unit. 4 Alfarraj & AlRegib is given by the following equations, Feed-forward Network (t) (t) (t1) u = sigmoid W x + W y + b xu yu u i i i (t) (t) (t1) r = sigmoid W x + W y + b xr yr r i i i Neural (#) (#) (t) (t) (t) (t1) Network y ^ = tanh W x + b + r  W y + b xy ^ y ^ hy ^ y ^ i i 1 i i 2 (t) (t1) (t) (t) (t) y = (1 u ) y + u  y ^ i i (8) (t) (t) where u and r are the update-gate and reset-gate (#) (#) (#) i i Input y Output h State (t) vectors, respectively, y ^ is the candidate output for the current time step, W's and b's are the learnable parame- Delay ters, and the operator  is the element-wise product. Note (#) that the GRU introduces two new state variables, update- (#$%) gate u and reset-gate r, which control the ow of infor- mation from one time step to another, and thus they are Neural (#) Network (#) able to capture long-term dependency. The output of the (t) GRU at the current time step (y ) is a weighted sum of the candidate output for the current time step and the output from the previous step. Figure 2 shows a GRU net- Recurrent Network work unfolded through time. Note that all GRUs in an unfolded network share the same W and u parameters. Figure 1: An illustration of feed-forward and recurrent networks. (%) (%'() (%')) * * * # # # (%+() … … GRU GRU GRU (%) (%'() (%')) " " " When RNNs were proposed in the 1980s, they were dif- # # # cult to train because of the introduced dependency be- tween data samples that made the gradients more dicult Figure 2: Gated Recurrent Unit (GRU) unfolded through to compute. The problem was later solved using back- time. propagation through time (BPTT) algorithms (Werbos, 1990), which turns gradients into long products of terms using the chain rule. Theoretically, RNNs are supposed to METHODOLOGY learn long-term dependencies from their hidden state vari- able. However, even with BPTT, RNNs fail to learn long- Similar to all deep learning methods, RNNs require tremen- term dependencies mainly because the gradients tend to dous amounts of data to train. Given that well-log data vanish for long sequences when backpropagated through is limited in a given survey area, the number of training time. samples is limited. With such limitation, a combination of New RNN architectures have been proposed to over- regularization techniques must be used to train a learning- come the issue of vanishing gradients using gated units. based model properly and ensure it generalizes beyond Examples of such architectures are Long Short-Term Mem- the training dataset (Alfarraj and AlRegib, 2018). In ad- ory (LSTM) (Hochreiter and Schmidhuber, 1997) and the dition, the data shortage limits the number of the layers recently proposed Gated Recurrent Units (GRU) (Cho (and hence parameters) that can be used in learning-based et al., 2014). Such architectures have been shown to cap- models. Therefore, using deeper networks to capture the ture long-term dependency and perform well for various highly non-linear inverse mapping from seismic data to EI tasks such as machine translation and speech recognition. might not be feasible using supervised learning. In this work, we utilize GRUs in the proposed inversion In this work, we utilize a seismic forward model as an- work ow. other form of supervision in addition to well-log data. Al- GRUs supplement the simple RNN described above by though forward modeling is an essential block in classical incorporating reset-gate and update-gate variables which seismic inversion approaches, it has not been integrated are internal states that are used to evaluate the long-term into learning-based seismic inversion work ows. dependency and keep information from previous times Using a forward model in a learning-based inversion has only if they are needed. The forward step through a GRU two main advantages. First, it allows the incorporation of Update Semi-supervised Sequence Modeling for Elastic Impedance Inversion 5 geophysics into a machine learning paradigm to ensure where N is the total number of seismic traces in the that the outputs of the networks are obeying the physical survey, and N = jSj is the number of available well- laws. In addition, it allows the algorithm to learn from all logs. In seismic surveys, N  N , therefore, seismic loss p s traces in the seismic survey without explicitly providing is computed over many more traces than the property an EI trace (label) for each seismic trace. loss. On the other hand, property loss has access to direct model parameters (well-log data). These factors make the two losses work in collaboration to learn a stable and Proposed work ow accurate inverse model. The inverse model (F ) can, in principle, be trained us- It is important to note that the choice of the forward ing well-log data and their corresponding seismic data model is critical in the proposed work ow for two rea- only. However, as we have discussed earlier, a deep in- sons. First, the forward model must be able to synthesize verse model requires a large dataset of labeled data to at a speed comparable to the speed at which the inverse train properly which is not possible in a practical setting model processes data. Since deep learning models, in gen- where the number of well-logs is limited. Integrating a eral, are capable of processing large amounts of data in a forward model, as commonly done in classical inversion very short time with GPU technology, the forward model work ows, allows the inverse model to learn from the seis- must be fast. Second, the proposed inverse model, as all mic data without requiring their corresponding property other deep learning models, adjusts its parameters accord- traces, in addition to learning from the few available prop- ing to the gradients with respect to a de ned loss function, erty traces from well-logs. The overall proposed inversion therefore, the forward model must be di erentiable in or- work ow is shown in Figure 3. der to compute gradients with respect to the seismic loss. Therefore, in this work, we chose a convolutional forward model due to its simplicity and eciency to reduce com- Seismic Loss putation time. Other choices of the forward model are possible as long as they satisfy the two conditions stated above. Input Synthesized Inverse Model Estimated EI Forward Model Seismic Seismic Inverse Model Property Loss The proposed inverse model of the proposed work ow (shown in Figure 4) consists of four main submodules. Target EI These submodules are labeled as sequence modeling, local (from well logs) pattern analysis, upscaling, and regression. Each of the four submodules performs a di erent task in the overall Figure 3: The proposed semi-supervised inversion work- inverse model. ow. The sequence modeling submodule models temporal dy- namics of seismic traces and produces features that best The work ow in Figure 3 consists of two main modules: represent the low-frequency content of EI. The local pat- the inverse model (F ) with learnable parameters, and tern analysis submodule extracts local attributes from a forward model (F ) with no learnable parameters. The seismic traces that best model high-frequency trends of proposed work ow takes multi-angle seismic traces as in- EI trace. The upscaling submodule takes the sum of the puts, and outputs the best estimate of the corresponding features produced by the previous modules and upscales EI. Then, the forward model is used to synthesize multi- them vertically. This module is added based on the as- o set seismograms from the estimated EI. The error (data sumption that seismic data are sampled (vertically) at a mis t) is computed between the synthesized seismogram lower resolution than that of well-log data. Finally, the re- and the input seismic traces using the seismic loss. This gression submodule maps the upscaled outputs from fea- process is done for all traces in the survey. Furthermore, tures domain to target domain (i.e., EI). The details of property loss is computed between estimated and true EI each these submodule are discussed next. on traces for which we have a true EI from well-logs. The parameters of the inverse model are adjusted by combin- Sequence Modeling ing both losses as in equation 6 using a gradient-descent optimization. The sequence modeling submodule consists of a series of In this work, we chose the distance measure (D) as the bidirectional Gated Recurrent Units (GRU). Each bidi- Mean Squared Error (MSE) and = = 1. Hence, rectional GRU computes a state variable from future and equation 6 reduces to: past predictions and is equivalent to 2 GRUs where one X X models the trace from shallow to deeper layers, and the 1 1 y y 2 2 L() = km F (d )k + kd F (F (d ))k i i i i 2 2 other models the reverse trace. Assuming each multi-angle N N p s i i seismic traces have c channels (one channel for each in- m 2S (9) cident angle), the First GRU takes these c channels as Update 6 Alfarraj & AlRegib Sequence Modeling Upscaling DeconvBlock DeconvBlock Input GRU GRU GRU (kernel =+) (kernel=+) (in=2) , out=) ) (in=) , out=) ) (in=) , out=) ) (in=) , out=) ) (in=) , out=2) ) * * * * % * * * * * Seismic (stride=2 ) (stride=2 ) % * ConvBlock (kernel=+) ( in=) , out=) ) % * (dilation=, ) Output GRU Linear ConvBlock ConvBlock (in=) , out=2) ) (in=) , out=) ) (kernel=+) (kernel=+) * * * % EI (in=) , out=) ) (in=3) , out=) ) % * * * (dilation=, ) (dilation=1) ConvBlock Regression (kernel= +) (in=) , out=) ) % * (dilation=, ) Local Pattern Analysis Figure 4: The inverse model in the proposed work ow with generic hyperparameters. The hyperparameters are chosen based on the data. The number of input and output features are denoted by in and out, respectively Center sample input features and computes c temporal features based on the temporal variations of the processed traces. The Dilation = 1 next two GRUs perform a similar task on the outputs of their respective preceding GRU. The series of the three Dilation = 2 GRUs is equivalent to a 3-layer deep GRU. Deeper net- works are able to model complex input-output relation- Dilation = 3 ships that shallow networks might not capture. Moreover, deep GRUs generally produce smooth outputs. Hence, the Convolution kernel point output of the sequence modeling submodule is considered as the low-frequency trend of EI. Figure 5: An illustration of dilated convolution for multi- scale feature extraction (kernel size = 5, dilation factors Local pattern analysis dilation = 1; 2 and 3). Another submodule of the inverse model is the local pat- tern analysis submodule which consists of a set of 1-dimensional convolutional blocks with di erent dilation factors in par- mean and standard deviation. They have been shown to allel. The output features of each of the parallel con- reduce covariant shift in the learned features and speed volutional blocks are then combined using another con- up the learning. In addition, activation functions are one volutional block. Dilation refers to the spacing between of the building blocks of any neural network. They are convolution kernel points in the convolutional layers (Yu a source of non-linearity that allow the neural networks and Koltun, 2015). Multiple dilation factors of the kernel to approximate highly non-linear functions. In this work, extract multiscale features by incorporating information we chose the hyperbolic tangent function as the activation from trace samples that are direct neighbors to a refer- function. ence sample (i.e., the center sample), in addition to the Convolutional layers operate on small windows of the samples that are further from it. An illustration of dilated input trace due to their small kernel sizes. Therefore, convolution is shown in Figure 5 for a convolution kernel they capture high-frequency content. Since convolutional of size 5 and dilation factors dilation = 1; 2 and 3. layers do not have a state variable like recurrent layers to A convolutional block (ConvBlock ) consists of a convo- serve as a memory, they do poorly in estimating the low- lutional layer followed by group normalization (Wu and frequency content. The outputs of the local pattern anal- He, 2018) and an activation function. Group normaliza- ysis submodule are of very similar dimensions of those of tion scales divides the output of the convolutional layers the sequence modeling submodule. Hence, the outputs of into groups, and normalizes each group using a learned the two modules are added to obtain a full-band frequency Concatenation Semi-supervised Sequence Modeling for Elastic Impedance Inversion 7 content. 1 EI (t + t; ) EI (t; ) RC (t; ) = ; (10) 2 EI (t + t; ) + EI (t; ) upscaling where t is the time step, and EI (t; ) is the elastic impedance at time t and incident angle . EI in equation Seismic data is sampled at a lower rate than that of well- 10 refers to the normalized elastic impedance proposed logs data. The role of the upscaling submodule is to com- by Whitcombe (2002) which is computed from the elastic pensate for this resolution mismatch. This submodule properties as follows: consists of two Deconvolutional Blocks with di erent ker- nel stride. The stride controls the factor by which the a b c V (t) V (t) (t) p s inputs are upscaled. A stride of (s = 2) deconvolutional EI (t; ) = V  ; (11) p 0 V V p s 0 block produces an output that has twice the number of 0 0 the input samples (vertically). Deconvolutional layers (also known as transposed con- 2 2 a = 1 + tan c = 1 4K sin volutional or fractionally-strided convolutional layers) are where, b = 4K sin  K = s=V upscaling modules with learnable kernel parameters un- like classical interpolation methods with xed kernel pa- rameters (e.g., linear interpolation). They learn kernel and V ,V and  are P-wave velocity, S-wave velocity, and p s parameters from both feature and local spatial/temporal density, respectively, and V ,V and  are their respec- p s 0 0 0 domain. They have been used for various applications tive averages over the training sample (i.e., well-logs). It is like semantic segmentation and seismic structure labeling worth noting that the Aki-Richard's approximation of the (Noh et al., 2015; Alaudah et al., 2018). elastic impedance is only valid for incident angles that are Deconvolutional blocks in Figure 4 have a similar struc- less than 35 . Thus, in the case study, we only consider ture as the convolutional blocks introduced earlier. They valid angles for this approximation. are a series of deconvolutional layer followed by a group The seismograms are then generated by convolving RC (t; ) normalization module and an activation function. with a wavelet w(t), i.e., S(t; ) = w(t) RC (t; ); (12) Regression where  is the linear convolution operator, and S(t; ) is The nal submodule in the inverse model is the regres- the synthesized seismogram. Thus, the forward model uti- sion submodule which consists of a GRU followed by a lized in this work is a 1-dimensional convolutional forward linear mapping layer (fully-connected layer). Its role is to model that synthesized seismograms from EI traces. regress the extracted features from the other modules to the target domain (EI domain). The GRU in this mod- CASE STUDY ON MARMOUSI 2 MODEL ule is a simple 1-layer GRU that augments the upscaled outputs using global temporal features. Finally, a linear In order to validate the proposed algorithm, we chose Mar- ane transformation layer (fully-connected layer) takes mousi 2 model as a case study. Marmousi 2 model is an the output features from the GRU and maps them to the elastic extension of the original Marmousi synthetic model same number of features in the target domain, which is, that has been used for numerous studies in geophysics for in this case, the number of incident angles in the target various applications including seismic inversion, seismic EI trace. modeling, seismic imaging, and AVO analysis. The model spans 17 km in width and 3.5 km in depth with a verti- cal resolution of 1.25 m. The details of the model can be Forward Model found in Martin et al. (2006). Forward modeling is the process of synthesizing seismo- In this work, we used the elastic model (converted to grams from elastic properties of the earth (i.e., P-wave time) to generate EI and seismic data to train the model. velocity, S-wave velocity, and density) or from a function The details of the dataset generation are discussed in the of the elastic properties such as EI. In this work, and since next section. The proposed work ow is trained using all we are inverting for EI, we used a forward model that takes seismic traces in Marmousi 2 in addition to a few EI traces EI as an input, and outputs a corresponding seismogram. (training traces). The work ow is then used to invert for EI was proposed by Connolly (1999) and later normalized EI on the entire Marmousi. Since the full elastic model is by Whitcombe (2002). It is based on the Aki-Richards available, we compare our EI inversion with the true EI approximation for Zoeppritz equations (Aki and Richards, quantitatively. 1980). The Aki-Richards approximation incorporates am- plitude variations with o set/angle (AVO/AVA) based on Dataset Generation the changes of elastic properties and the incident angle. The forward model adopted in this work uses Connolly's We used the elastic model of Marmousi 2 to generate EI formulation to compute the re ection coecients from EI for 4 incident angles  = 0 ,10 ,20 , and 30 . Thus, in for di erent incident angles  as follows: Figure 4, c = 4 which represents the number of channels 1 8 Alfarraj & AlRegib di erent values and testing the performance on a valida- in the input traces. Multi-angle seismic data (a total of tion dataset using cross-validation. In this case study, we N = 2720 traces) is then generated from EI using the forward model with Ormsby wavelet (5-10-60-80 Hz) fol- choose c = 8, k = 5, d = 1, d = 3, and d = 6. 2 1 2 3 lowing the synthesis procedure in (Martin et al., 2006). The seismic traces are then downsampled (in time) by a Results and Discussion factor of 6 to simulate resolution di erence between seis- Figure 7 shows estimated EI and true EI for the entire sec- mic and well-log data. Finally, a 15 db white Gaussian tion for all four incident angles. Figure 8 shows the abso- noise is added to assess the robustness of the proposed lute di erence between the true and estimated EI. The g- work ow to noise. ures indicate that the proposed work ow estimates EI ac- To train the proposed inversion work ow, we chose 10 curately for most parts of the section with a visible lateral evenly-spaced traces for training (N = 10) as shown in jitter. Jitter e ect is expected since the proposed work- Figure 6. We assume we have access to both EI and seis- ow is based on 1-dimensional sequence modeling with no mic data for the training traces. For all other traces in the explicit spatial constraints as often done in classical in- survey, we assume we have access to seismic data only. Al- version methods. Furthermore, the noise in the seismic though using more EI traces for training would improve data reduces the similarity between neighboring seismic the inversion, in this work we use only 10 EI traces for traces, which can also cause such jitter. The shown sec- training to simulate well-log data in a practical setting. tions are the direct output of the inversion work ow with no post-processing which can reduce such artifacts. Figure 9 shows two selected traces from the section that were not in the training dataset (x = 3200 m and x = 8500 m). The trace at x = 3200 passes through an anomaly (Gas-charged sand channel) represented by an isolated and sudden transition in EI at 1:25 seconds. This anomaly causes the inverse model to incorrectly estimate EI. Since our work ow is based on bidirectional sequence Figure 6: The training EI traces overlaid on the zero-o set modeling, we expect the error to propagate to nearby sam- seismic section. ples in both directions. However, the algorithm quickly recovers a good estimate for deeper and shallower sam- ples of the trace. This quick recovery is mainly due to the Training The Inverse Model reset-gate variable in the GRU that limits the propagation First, the inverse model (neural network) is initialized of such errors in sequential data estimation. Furthermore, with random parameters. Then, randomly chosen seismic the trace at x = 8500 passes through most layers in the traces in addition to the seismic traces for which we have section which makes it a challenging trace to invert. The EI traces in the training dataset are inputted to the inverse estimated EI at x = 8500 follows the trends in true EI model to get a corresponding set of EI traces. The forward trace, including the thin layers. model is then used to synthesize seismograms from the es- Furthermore, we show scatter plots of the estimated and timated EI. Seismic loss is computed as the MSE between true EI for all incident angles in Figure 10. The shaded the synthesized seismic and the input seismic. Moreover, regions in the gure include all points that are within one property loss is computed as the MSE between the pre- standard deviation of the true EI ( ). The scatter plots EI dicted EI and the true EI trace on the training traces only. show a linear correlation between the estimated and true The total loss is computed as the sum of the two losses. EI with the majority of the estimated samples within EI Then, the gradients of the total loss are computed, and the from the true EI. parameters of the inverse model are updated accordingly. To evaluate the performance of the proposed work ow The process is repeated until convergence. quantitatively, we use two metrics that are commonly used Figure 4 shows the inverse model with no given hyper- for regression analysis. Namely, Pearson correlation coef- parameters to ensure its generalizability for data other cient (PCC), and coecient of determination (r ). PCC than the data used in this case study. These hyperpa- is de ned as: rameters are fc ; c ; s ; s ; d ; d ,kg which are parameters 1 2 1 2 1 2 T1 of the inversion model that need to be set rather than 1 1 PCC = [x(t)  ] [x ^(t)  ] ; (13) x x ^ learned. Some of these hyperparameters are completely x x ^ t=0 dictated by the data. For example, c which is the num- ber of inputs channels must be chosen as the number of where x and x ^ are the target trace and the estimated incident angles of the data. Also, s ; s must be chosen trace, respectively, and  and  are mean and standard 1 2 such that s  s is equal to the resolution mismatch fac- deviation, respectively. PCC is a measure of the linear 1 2 tor between seismic and EI data. In this case study, we correlation between the estimated and target traces. It set c = 4, s = 3, and s = 2. The rest of the hyper- is commonly used to measure the overall t between the 1 1 2 parameters are chosen by analyzing the data and trying two traces with no regard to the individual values. On Semi-supervised Sequence Modeling for Elastic Impedance Inversion 9 (a) Estimated EI ( = 0 ) (b) True EI ( = 0 ) (c) Estimated EI ( = 10 ) (d) True EI ( = 10 ) (e) Estimated EI ( = 20 ) (f) True EI ( = 20 ) (g) Estimated EI ( = 30 ) (h) True EI ( = 30 ) Figure 7: Estimated EI and true EI for Marmousi 2 model. spectively. > 0, > 0, > 0 are constants chosen the other hand, r is a goodness-of- t measure, where it to adjust the in uence of the each of three terms. From takes into account the mean squared error between the two SSIM, a single similarity value, denoted as M-SSIM, can traces. The coecient of determination (r ) is de ned as: be computed by taking the mean of all SSIM scores for all T1 2 [x(t) x ^(t)] local windows. 2 i=0 r = 1 (14) T1 2 The quantitative results are summarized in Table 1. It [x(t)  ] i=0 is evident that the estimated EI captures the overall trend In addition, we used Structural Similarity (SSIM) (Wang of true EI (average PCC  98%, average r  94%). The et al., 2004) to assess the quality of the estimated EI sec- average r score is lower than PCC since it is more sensi- tion from an image point of view. SSIM evaluates the tive to the residual sum of squares rather than the overall similarity of two images from local statistics (on local win- trend. Note that PCC and r compute scores over indi- dows) using the following equation: vidual traces, then the nal score (for each incident angle) is computed by averaging across all traces. On the other SSIM(X; X) = [l(x; x^)]  [c(x; x^)]  [s(x; x^)] ; (15) hand, M-SSIM is computed over the 2D section (image) where x and x^ are patches from estimated and target which takes both lateral and vertical variations into ac- image, respectively. l(x; y), c(x; x^), and s(x; x^) are lu- count. minance, contrast and structure comparison functions re- 10 Alfarraj & AlRegib (a)  = 0 (b)  = 10 (c)  = 20 (d)  = 30 Figure 8: Absolute di erence between True EI and estimate EI for all incident angles. (a) x = 3300 m (b) x = 8500 m Figure 9: Selected EI trace. Estimate EI is shown in red, and true EI is shown in black. Furthermore, we quantify the contribution of each of without integrating the well-logs. Thus, the results are the the two terms in the loss function in equation (6) by com- worst out of three schemes. Note that the performance of paring the inversion results for di erent values of and the unsupervised learning scheme could be improved by . The average results are reported in Table 2. using an initial smooth model and by enforcing some con- When = 0 and = 1 (unsupervised learning), the straints on the inversion as often doe in classical inversion. inverse model is learned completely from the seismic data Moreover, when = 1 and = 0 (supervised learning), Semi-supervised Sequence Modeling for Elastic Impedance Inversion 11 ues, with a wider distribution than that of PCC. This is mainly due to the fact that PCC is de ned to be in the 2 2 range [0; 1] and r is in the range (1; 1]. In addition, r is a more strict metric than PCC as it factors in the MSE between the estimated EI and true EI. Figure 11(c) shows the distribution of local SSIM scores over the entire sec- tion, with the majority of the scores in the range [0:8; 1] indicating that the estimated EI is structurally similar to the true EI from an image point of view. (a)  = 0 (b)  = 10 (a) Distribution of PCC values (c)  = 20 (d)  = 30 Figure 10: Scatter plots of the estimated and true EI for di erent values of . The shaded regions include all points that are within  of the true EI. EI Table 1: Quantitative evaluation of the proposed work ow on Marmousi 2. Metric 2 (b) Distribution of r values PCC r M-SSIM Angle = 0 0.98 0.95 0.92 = 10 0.98 0.95 0.92 = 20 0.98 0.95 0.92 = 30 0.98 0.92 0.92 Average 0.98 0.94 0.92 the inversion work ow learns from only 10 seismic traces and their corresponding EI traces from well-logs. Thus, it is expected that it results in better inversion compared (c) Distribution of SSIM values to the unsupervised scheme. However, training a deep inverse model in a supervised learning scheme requires Figure 11: The distribution of Pearson correlation coef- heavy regularization and careful selection of the train- cient (PCC), and coecient of determination (r ), and ing parameters. In addition, the learned inverse model SSIM values on Marmousi 2. might not generalize well beyond training data. Finally, when = = 1 (semi-supervised learning), the inverse model is learned from all seismic data in addition to 10 EI Implementation traces from well-logs. Hence, the semi-supervised learn- ing scheme improves the performance and regularizes the The proposed work ow was implemented in Python us- learning. ing PyTorch deep learning library (Paszke et al., 2017). Figure 11(a) shows the distribution of PCC of the es- For optimization, we used Adam (Kingma and Ba, 2014) timated EI with respect to the true EI for all traces in which is a gradient-based stochastic optimization tech- Marmousi 2 model. The gure shows that the estimated nique with adaptive learning rate that was designed specif- EI correlates very well with the true EI, with the major- ically for training deep neural networks. The codes were ity of PCC values between [0:9 1], and a spike near 1. run on a PC with Intel i7 Quad-Core CPU, and a single Similarity, Figure 11(b) shows the distribution of r val- Nvidia GeForce GTX 1080 Ti GPU. The run time of 500 12 Alfarraj & AlRegib Table 2: Quantitative evaluation of the sensitivity of the loss function with respect to its two terms. Unsupervised: learning from all seismic traces in the section, Supervised: learning from 10 seismic traces and their corresponding EI traces, Semi-supervised: learning from all seismic data in the section in addition to 10 seismic traces and their corresponding EI traces Metric PCC r M-SSIM Training Scheme Unsupervised ( = 0; = 1) 0.33 -0.45 0.77 Supervised ( = 1; = 0) 0.96 0.88 0.87 Semi-supervised ( = 1; = 1) 0.98 0.94 0.92 iterations of the GPU-accelerated algorithm was 2 min- lation of 98%. Furthermore, the applications of the pro- utes. Figure 12 shows the convergence behavior of the posed work ow are not limited to EI inversion; it can be proposed work ow over 500 training iterations. The y- easily extended to perform full elastic inversion as well as axis shows the total loss over the training dataset which property estimation for reservoir characterization. is computed as in equation 9 except for an additional normalization factor that is equal to the number of time samples per trace. Note that the loss is also computed ACKNOWLEDGMENT over normalized traces after subtracting the mean value This work is supported by the Center for Energy and and dividing by the standard deviation to ensure faster Geo Processing (CeGP) at Georgia Institute of Technol- convergence of the inversion work ow as often done for ogy and King Fahd University of Petroleum and Minerals deep learning models. The proposed work ow converges (KFUPM). in about 300 iterations. However, the loss continues to de- crease slightly. The training was terminated at after 500 iterations to avoid over- tting to the training dataset. REFERENCES Adler, J., and O. Oktem, 2017, Solving ill-posed inverse problems using iterative deep neural networks: Inverse Problems, 33, 124007. Aki, K., and P. Richards, 1980, Quantitative seismology, vol. 2. Al-Anazi, A., and I. Gates, 2012, Support vector regres- sion to predict porosity and permeability: e ect of sam- ple size: Computers & Geosciences, 39, 64{76. Alaudah, Y., S. Gao, and G. AlRegib, 2018, Learning to label seismic structures with deconvolution networks and weak labels, in SEG Technical Program Expanded Abstracts 2018: Society of Exploration Geophysicists, Figure 12: Training learning curve showing the loss func- 2121{2125. tion value over 500 iterations. Alfarraj, M., and G. AlRegib, 2018, Petrophysical prop- erty estimation from seismic data using recurrent neu- The eciency of the proposed work ow is due to the ral networks, in SEG Technical Program Expanded use of 1-dimensional modeling. In addition, Marmousi 2 Abstracts 2018: Society of Exploration Geophysicists, is a relatively small model with 2720 traces only. However, 2141{2146. the computation time of the proposed work ow will scale AlRegib, G., M. Deriche, Z. Long, H. Di, Z. Wang, Y. linearly with the number of traces in the dataset. Alaudah, M. A. Sha q, and M. Alfarraj, 2018, Subsur- The code used to reproduce the results reported in this face structure analysis using computational interpreta- manuscript is publicly available on GitHub [link to code]. tion and learning: A visual signal processing perspec- tive: IEEE Signal Processing Magazine, 35, 82{98. CONCLUSIONS Araya-Polo, M., J. Jennings, A. Adler, and T. Dahlke, In this work, we proposed an innovative semi-supervised 2018, Deep-learning tomography: The Leading Edge, machine learning work ow for elastic impedance inversion 37, 58{66. from multi-angle seismic data using recurrent neural net- Azevedo, L., and A. Soares, 2017, Geostatistical methods works. The proposed work ow was validated on the Mar- for reservoir geophysics: Springer. mousi 2 model. Although the training was carried out Biswas, R., A. Vassiliou, R. Stromberg, and M. K. Sen, on a small number of EI traces for training, the proposed 2018, Stacking velocity estimation using recurrent neu- work ow was able to estimate EI with an average corre- ral network, in SEG Technical Program Expanded Ab- Semi-supervised Sequence Modeling for Elastic Impedance Inversion 13 stracts 2018: Society of Exploration Geophysicists, Kingma, D. P., and J. Ba, 2014, Adam: A 2241{2245. method for stochastic optimization: arXiv preprint Bosch, M., T. Mukerji, and E. F. Gonzalez, 2010, Seismic arXiv:1412.6980. inversion for reservoir properties combining statistical Lucas, A., M. Iliadis, R. Molina, and A. K. Katsaggelos, rock physics and geostatistics: A review: Geophysics, 2018, Using deep neural networks for inverse problems 75, 75A165{75A176. in imaging: beyond analytical methods: IEEE Signal Buland, A., and H. Omre, 2003, Bayesian linearized avo Processing Magazine, 35, 20{36. inversion: Geophysics, 68, 185{198. Ma, C.-Y., M.-H. Chen, Z. Kira, and G. AlRegib, 2017, Chaki, S., A. Routray, and W. K. Mohanty, 2015, A novel TS-LSTM and temporal-inception: Exploiting spa- preprocessing scheme to improve the prediction of sand tiotemporal dynamics for activity recognition: arXiv fraction from seismic attributes using neural networks: preprint arXiv:1703.10667. IEEE Journal of Selected Topics in Applied Earth Ob- Martin, G. S., R. Wiley, and K. J. Marfurt, 2006, Mar- servations and Remote Sensing, 8, 1808{1820. mousi2: An elastic upgrade for marmousi: The Leading ||{, 2017, A di usion lter based scheme to denoise Edge, 25, 156{166. seismic attributes and improve predicted porosity vol- Mikolov, T., M. Kara  at, L. Burget, J. Cernocky,  and S. ume: IEEE Journal of Selected Topics in Applied Earth Khudanpur, 2010, Recurrent neural network based lan- Observations and Remote Sensing, 10, 5265{5274. guage model: Presented at the Eleventh Annual Con- ||{, 2018, Well-log and seismic data integration for ference of the International Speech Communication As- reservoir characterization: A signal processing and sociation. machine-learning perspective: IEEE Signal Processing Mosser, L., W. Kimman, J. Dramsch, S. Purves, A. De la Magazine, 35, 72{81. Fuente Briceno, ~ and G. Ganssle, 2018, Rapid seismic Cho, K., B. Van Merri enboer, D. Bahdanau, and Y. Ben- domain transfer: Seismic velocity inversion and model- gio, 2014, On the properties of neural machine trans- ing using deep generative neural networks: Presented lation: Encoder-decoder approaches: arXiv preprint at the 80th EAGE Conference and Exhibition 2018. arXiv:1409.1259. Natarajan, N., I. S. Dhillon, P. K. Ravikumar, and A. Connolly, P., 1999, Elastic impedance: The Leading Edge, Tewari, 2013, Learning with noisy labels: Advances in 18, 438{452. neural information processing systems, 1196{1204. Das, V., A. Pollack, U. Wollner, and T. Mukerji, 2018, Noh, H., S. Hong, and B. Han, 2015, Learning deconvolu- Convolutional neural network for seismic impedance tion network for semantic segmentation: Proceedings of inversion, in SEG Technical Program Expanded Ab- the IEEE international conference on computer vision, stracts 2018: Society of Exploration Geophysicists, 1520{1528. 2071{2075. Paszke, A., S. Gross, S. Chintala, G. Chanan, E. Yang, Z. Doyen, P., 2007, Seismic reservoir characterization: DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, An earth modelling perspective: EAGE publications 2017, Automatic di erentiation in pytorch: Presented Houten, 2. at the NIPS-W. Doyen, P. M., 1988, Porosity from seismic data: A geo- R oth, G., and A. Tarantola, 1994, Neural networks and statistical approach: Geophysics, 53, 1263{1275. inversion of seismic data: Journal of Geophysical Re- Duijndam, A., 1988a, Bayesian estimation in seismic in- search: Solid Earth, 99, 6753{6768. version. part i: Principles: Geophysical Prospecting, Tarantola, A., 2005, Inverse problem theory and methods 36, 878{898. for model parameter estimation: siam, 89. ||{, 1988b, Bayesian estimation in seismic inversion. Ulrych, T. J., M. D. Sacchi, and A. Woodbury, 2001, A part ii: Uncertainty analysis: Geophysical Prospecting, bayes tour of inversion: A tutorial: Geophysics, 66, 36, 899{918. 55{69. Gholami, A., 2015, Nonlinear multichannel impedance in- Wang, Z., A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, version by total-variation regularization: Geophysics, et al., 2004, Image quality assessment: from error vis- 80, R217{R224. ibility to structural similarity: IEEE transactions on Gholami, A., and H. R. Ansari, 2017, Estimation of poros- image processing, 13, 600{612. ity from seismic attributes using a committee model Werbos, P. J., 1990, Backpropagation through time: what with bat-inspired optimization algorithm: Journal of it does and how to do it: Proceedings of the IEEE, 78, Petroleum Science and Engineering, 152, 238{249. 1550{1560. Graves, A., A.-r. Mohamed, and G. Hinton, 2013, Whitcombe, D. N., 2002, Elastic impedance normaliza- Speech recognition with deep recurrent neural networks: tion: Geophysics, 67, 60{62. Acoustics, speech and signal processing (ICASSP), 2013 Wiszniowski, J., B. M. Plesiewicz, and J. Trojanowski, IEEE international conference on, IEEE, 6645{6649. 2014, Application of real time recurrent neural network Hochreiter, S., and J. Schmidhuber, 1997, LSTM can solve for detection of small natural earthquakes in poland: hard long time lag problems: Advances in neural infor- Acta Geophysica, 62, 469{485. mation processing systems, 473{479. Wu, Y., and K. He, 2018, Group normalization: Proceed- 14 Alfarraj & AlRegib ings of the European Conference on Computer Vision (ECCV), 3{19. Yu, F., and V. Koltun, 2015, Multi-scale context ag- gregation by dilated convolutions: arXiv preprint arXiv:1511.07122. Yuan, S., and S. Wang, 2013, Spectral sparse Bayesian learning re ectivity inversion: Geophysical Prospect- ing, 61, 735{746.

Journal

Electrical Engineering and Systems SciencearXiv (Cornell University)

Published: Aug 19, 2019

There are no references for this article.