A Machine Learning–Based Approach to Time-series Wave Identification in the Solar Wind

The Wind spacecraft has yielded several decades of high-resolution magnetic field data, a large fraction of which displays small-scale structures. In particular, the solar wind is full of wavelike fluctuations that appear in both the field magnitude and its components. The nature of these fluctuations can be tied to the properties of other structures in the solar wind, such as shocks, that have implications for the time evolution of the solar wind. As such, having a large collection of wave events would facilitate further study of the effects that these fluctuations have on solar wind evolution. Given the large volume of magnetic field data available, machine learning is the most practical approach to classifying the myriad small-scale structures observed. To this end, a subset of Wind data is labeled and used as a training set for a multibranch 1D convolutional neural network aimed at classifying circularly polarized wave modes. Using this algorithm, a preliminary statistical study of 1 yr of data is performed, yielding about 300,000 wave intervals out of about 5,000,000 solar wind intervals. The wave intervals come about more often in the fast solar wind and at higher temperatures, and the number of waves per day is highly periodic. This machine learning–based approach to wave detection has the potential to be a powerful, inexpensive way to catalog waves throughout decades of spacecraft data.

Unified Astronomy Thesaurus concepts: Solar wind (1534); Interplanetary physics (827); Space weather (2037); Space plasmas (1544); Neural networks (1933); Convolutional neural networks (1938); Alfven waves (23)

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

The Astrophysical Journal, 949:40 (13pp), 2023 June 1. Fordin et al.

1. Introduction

The solar wind is a turbulent supersonic plasma streaming away from the Sun that has a major impact on the near-space region of Earth. Understanding the important dynamics and morphology of the solar wind, especially the dissipation of fluctuations/discontinuities and subsequent heating, is a major unsolved problem (e.g., Viall & Borovsky 2020). Many of these fluctuations have been classified in terms of their wavelike properties. The exact role that waves play in influencing solar wind evolution is an active area of research, with solar wind studies examining a broad class of waves including but not limited to Alfvén and kinetic Alfvén (Unti & Neugebauer 1968; Hasegawa 1976), magnetosonic whistler (Wilson et al. 2012; Verniero et al. 2020), and ion cyclotron (Jian et al. 2009; Wicks et al. 2016). The large amount of spacecraft data available from in situ sources in the solar wind present an excellent opportunity to investigate the nature of wave modes and the roles that they play.

One such data source is the Wind spacecraft, which has taken continuous measurements of solar wind properties for 28 yr, mostly at 1 au (e.g., Wilson et al. 2021). High-resolution magnetic field data from the Magnetic Field Investigation (MFI; Lepping et al. 1995) contain a large number of wave modes that originate from different physical processes.

The cadence of the Wind MFI instrumentation allows it to measure modes with spacecraft-frame frequencies up to ∼5.5 Hz. In this range of frequencies, circularly or elliptically polarized modes are frequently observed in the solar wind, and we focus on these types of modes for this study. One such mode is the magnetosonic-whistler (MSW) mode. The MSWs are an extension of the fast magnetosonic wave above the ion cyclotron frequency (Marsch & Chang 1983). They are right-hand polarized with respect to the quasi-static magnetic field, and the plasma density oscillates in phase with the magnitude of the magnetic field. They have been seen at many locations in the solar wind from 0.1 to 1 au in data from many satellites, such as the Parker Solar Probe (Jagarlamudi et al. 2021; Mozer et al. 2021), ARTEMIS (Stansby et al. 2016; Tong et al. 2019), Wind (Wilson et al. 2017), and Cluster (Lacombe et al. 2014), among numerous others. The MSWs can be generated by a range of instabilities (Tong et al. 2019; Jagarlamudi et al. 2021). They are also generated near shocks and are associated with dissipation in collisionless shocks (Wilson et al. 2012). Based on spectral analysis of ARTEMIS data, MSWs seem to be present in the solar wind ∼10% of the time at 1 au (Tong et al. 2019), though they seem to be less common near the Sun based on Parker Solar Probe measurements (Jagarlamudi et al. 2021; Cattell et al. 2022).

Besides MSW waves, other circularly polarized wave modes near the ion cyclotron frequency are generally identified in the solar wind. Circularly polarized Alfvén/ion cyclotron waves are present in many parts of the solar wind (Jian et al. 2009; Wicks et al. 2016). Alfvén wave frequencies tend to be below the ion cyclotron frequency, lying in the frequency range of $\sim 10^{-4}$–$10^{-2}$ Hz in the plasma frame. Alfvén waves tend to be mainly antisunward-propagating (Bruno & Carbone 2013) and may heat protons (Telloni et al. 2019). Ion cyclotron waves have been observed in solar wind data (Jian et al. 2009), and the ion cyclotron resonance is thought to provide a major source of energy dissipation (Telloni et al. 2019).

Given the long duration of the Wind mission, the Wind data set represents an unprecedented opportunity to study circularly polarized wave modes at 1 au across multiple solar cycles. To examine these wave modes, statistical analysis could be used to examine the general properties of the solar wind, but given the breadth of the data, traditional methods of examining these current wave modes are overly expensive and require manual validation. For example, frequency filtering together with minimum variance analysis (MVA) is an effective approach for identifying circularly polarized wave modes, but it requires manual and visual analysis and is time-expensive (e.g., Wilson et al. 2009). Similarly, spectral analysis using wavelet transforms is robust and effective but time-consuming and not a realistic approach to apply to several decades of data (e.g., Wilson et al. 2010). Machine learning is the most practical approach to classify the myriad small-scale structures observed in the magnetic field data.

With that being said, the choice of machine-learning algorithm to use in this context is not an obvious one. Numerous machine-learning algorithms exist and have different applications. In space physics alone, various machine-learning approaches have been used to analyze diverse phenomena. For example, 2D convolutional neural networks (CNNs) have been used to identify flux rope signatures in the solar corona (dos Santos et al. 2020), generative adversarial networks have been used to examine the coronal magnetic field (Jeong et al. 2020), and self-organizing maps have been used to classify magnetic field spectra (Vech & Malaspina 2021). For wave classification, the range of wave periods and highly varying wave durations poses a challenge, as not all machine-learning algorithms generalize well to classifying time-series structures with a high degree of variance in their size and duration. One candidate for the problem of time-series wave identification is the 1D CNN (1DCNN).

Artificial neural networks pass and feed information forward using sequential layers of neurons, and 1DCNNs include one or more 1D convolutional layers in their layout. The 1D convolutional layers within a neural network are given a kernel size, which defines the length of the 1D convolution operation performed by that layer upon a given input data series (LeCun 2015). When these 1D convolutional layers are placed before feed-forward layers in a neural network, the convolutional layers effectively perform a smoothing operation on the input data series and create additional features that are passed to the feed-forward layers. In this way, convolutional layers perform feature extraction by learning their own structures corresponding to a designated kernel size (Kiranyaz et al. 2021).

In space physics, CNNs have been used to process solar wind data for space weather forecasting models that predict the SYM-H geomagnetic index (Siciliano et al. 2021) and the rate of change of the ground magnetic field (Smith et al. 2021; Pinto et al. 2022). In other fields, particularly biology, CNNs have been used in various contexts. For example, they have been used to classify heartbeats from ECG data (Acharya et al. 2017; Wu et al. 2020), automatically diagnose brain wave activity from EEG signals (Acharya et al. 2018), and diagnose structural damage in several different civil, mechanical, and aerospace systems for structural health monitoring (Abdeljaber et al. 2017). The types of signals analyzed in these examples share many similarities with Wind magnetic field data in that they are noisy time series with periodic features of interest that are of varying length.

Given the complexity of using these techniques to identify solar wind fluctuations, we focus first on identifying circularly polarized wave modes in a single high-pass filter (referred to here as coherent wave modes). We utilize a machine-learning approach based on CNNs to identify waves in time-series data from Wind magnetic field data. To this end, a subset of Wind data is labeled and used as a training set for a machine-learning algorithm aimed at classifying small-scale structures. Eventually, the goal is to generalize to wave modes that originate in intervals that require narrower bandpass filter ranges to be resolved (referred to here as complex wave modes).

2. Building the Training Set

2.1. Collecting Real Data Events

An example solar wind interval is given in Figure 1(a), in which 12 hr of magnetic field data are shown. In Figure 1(b), a 10 minute zoom of the magnetic field from the Wind MFI is shown. The magnetic field in the 10 minute zoom is primarily in the xy plane in geocentric solar ecliptic coordinates (GSE), with a $B_z$ smaller than the other two components. At the beginning of the interval, the magnetic field undergoes a sharp rotation with a small increase in the magnetic field magnitude. After the rotation, there are periods of high-frequency oscillations, eventually followed by a modest reduction in the field magnitude. The high-frequency oscillations during this interval are typically characterized by comparable fluctuations in all magnetic field components with the magnitude of B relatively constant.

Figure 1. (a) 12 hr interval of Wind magnetic field data from 2004 April 28. The spacecraft is in the solar wind above the ecliptic plane. (b) 10 minute subinterval split into 66 point (∼6 s) time windows manually classified using minimum variance techniques. Dark green time windows correspond to coherent wave time windows, light green time windows correspond to complex wave time windows, and gray time windows correspond to nonwave time windows.

As a first step toward using machine learning to classify waves, we manually divide the data into equal-length time intervals, classifying each interval as having “coherent” waves, “complex” waves, or no waves (“nonwave”). Note that for this study, we are limiting the search to circularly polarized waves, with elliptical and planar polarizations characterized as nonwave. In addition, the study is limited to waves with amplitudes larger than 0.1 nT.

To begin the analysis, a high-pass filter is performed on ∼10–20 hr of Wind magnetic field data with a frequency cutoff of 0.2 Hz. In most cases, a wavelet transform of the data is examined by eye for intervals with significant fluctuation energy. In a smaller number of cases, the subintervals were chosen randomly from solar wind data to eliminate bias. Whichever of the two methods is used, the chosen periods are then divided into intervals with 66 time values, which represents about 6 s of spacecraft data.

An MVA is performed on each of these 6 s intervals (Khrabrov & Sonnerup 1998). Multiple MVAs are performed on various subintervals chosen by hand that span at least one period of fluctuations in the fields. Circularly polarized waves are characterized by having the two largest MVA eigenvalues similar in value and much larger than the minimum MVA eigenvalue. Coherent events are those that exhibit the property that the two largest eigenvalues are within a factor of 3 of each other and at least 10 times greater than the minimum eigenvalue. Ultimately, we will use machine-learning algorithms to find these coherent wave events in the solar wind.

Intervals that do not exhibit coherent waves are further analyzed to determine if more complex wavelike structure is present. First, multiple bandpass filters are applied to the intervals. For each resultant bandpass output, an MVA is run and analyzed. If at least one bandpass interval has good eigenvalue ratios, as described in the previous paragraph, that interval is classified as a “complex wave interval” or “complex.” If no such good eigenvalue ratios exist, then the event is classified as a “nonwave interval” or “nonwave.” Note that in all cases, a visual inspection is performed of the MVA results to verify the classification.

In panel (b) of Figure 1, this wave classification method has been applied to 99 subintervals totaling 10 minutes of magnetic field data. Green intervals (numbering 49) have coherent waves, light green intervals (numbering 26) have complex waves, and gray intervals (numbering 24) are nonwave intervals. Examples of the three types of intervals are shown in Figure 2. Panels (a)–(c) are the raw time-series data. Panels (d)–(f) are the same intervals but high-pass-filtered. Panels (g)–(i) are the high-pass-filtered components in the MVA frame between the dotted gray boundaries. Panels (j) and (k) are the bandpass-filtered components in the frequency range 0.6–1 Hz for complex and nonwave intervals, respectively. Panels (l) and (m) are the bandpass-filtered coordinates in the MVA frame in the associated pink boundaries for the complex and nonwave intervals, respectively. Panels (n)–(p) are the hodograms associated with the final MVA panel in each column.

For the coherent wave in panel (a) of Figure 2, the MVA frame in panel (g) clearly shows an oscillation in $B_y$ and $B_z$, with little oscillation in $B_x$. The MVA eigenvalue ratios for this case are $\lambda_{\max}/\lambda_{\mathrm{mid}} \approx 1.8$, $\lambda_{\mathrm{mid}}/\lambda_{\min} \approx 33.6$. These parameters, together with visual identification of a nearly circular ellipse in the hodogram in panel (n), are enough to classify this interval as a coherent wave interval.

For the complex wave in panel (b) of Figure 2, there appears to be a potential wave packet, but the high-pass-filtered signal is noisy in panel (e). The MVA eigenvalue ratios for this case are $\lambda_{\max}/\lambda_{\mathrm{mid}} \approx 1.4$, $\lambda_{\mathrm{mid}}/\lambda_{\min} \approx 6.8$, with the MVA components plotted in panel (h). In order to isolate the wave modes in this window, a bandpass filter covering a narrower frequency range is required. For this case, the passband 0.6–1 Hz is chosen, and the bandpass-filtered components are plotted in panel (j). The time interval between the pink bounds has MVA eigenvalue ratios $\lambda_{\max}/\lambda_{\mathrm{mid}} \approx 1.8$, $\lambda_{\mathrm{mid}}/\lambda_{\min} \approx 71.6$. The hodogram in panel (l) also displays an ellipse with radii that are close enough to satisfy our cutoff for circular polarization, which corresponds to a complex wave interval.

For the nonwave interval in panel (c) of Figure 2, no circularly polarized wave modes are present for any choice of bandpass filter. The high-pass-filtered components plotted in panel (f) and the corresponding MVA components in panel (i) do not show any circular polarization. We choose the same passband as for the complex wave. The bandpass-filtered components are plotted in panel (k), and we evaluate the eigenvalue ratios associated with the MVA components plotted in panel (m). We find $\lambda_{\max}/\lambda_{\mathrm{mid}} \approx 10.4$, $\lambda_{\mathrm{mid}}/\lambda_{\min} \approx 19.1$ for the whole interval. If we take a shorter section of data, such as the first third of the window, we find $\lambda_{\max}/\lambda_{\mathrm{mid}} \approx 4.7$, $\lambda_{\mathrm{mid}}/\lambda_{\min} \approx 69.4$, which does not meet our definition for a circularly polarized wave interval. Furthermore, the hodogram in panel (p) does not display any notable wave polarization.

By analyzing several thousand ∼6 s solar wind intervals using this time-intensive manual selection process on data spread over the date range 1995–2018, 1274 coherent, 715 complex, and 682 nonwave examples were found. Figure 3 contains histograms of total counts by year.

2.2. Generating Synthetic Time-series Data

Due to the relatively small size of the data set obtained by the above methods, the data set used to train the neural network is augmented with synthetic time-series data that approximate both positive and negative training examples of the solar wind magnetic field. The creation of synthetic data is more time-efficient than the manual labeling method and thus is an attractive approach for quickly increasing the number of intervals available for training. Classifying by hand the 2671 data intervals displayed in Figure 3 required approximately 400 person-hours; to obtain an order of magnitude more data by hand is not feasible, as this would represent several thousand person-hours. Synthetic data have been used in other machine-learning contexts to bolster training sets, such as identifying flux rope signatures (dos Santos et al. 2020) and ECG classification (Acharya et al. 2017). Appendix A provides a detailed description of how the synthetic data are generated, including plots of representative examples. The synthetic data are meant to approximate certain structures in the solar wind, not emulate them exactly; in many cases, the noise background in the synthetic time-series data is nonphysical and does not satisfy ∇ · B = 0. However, this property is present across both positive and negative synthetic data and is not expected to bias network training. The ultimate goal as the number of real data events grows is to eventually fully replace the synthetic data in the training set.

The synthetic data consist of several different classes, all of which have a background of noise. Synthetic positive examples have a randomly oriented, circularly polarized wave mode

Figure 2. Examples of coherent wave, complex wave, and nonwave intervals. Panels (a)–(c) are the raw time-series data. Panels (d)–(f) are the same intervals but high-pass-filtered. Panels (g)–(i) are the high-pass-filtered coordinates in the MVA frame between the dotted gray boundaries.
Panels (j) and (k) are the raw time-series bandpass-filtered components for 0.6–1 Hz. Panels (l) and (m) are the bandpass-filtered coordinates in the MVA frame in the associated pink shaded regions. Panels (n)–(p) are the hodograms associated with the final MVA panel in each column.

combined with a wave envelope function. They may also contain a randomly oriented step function with a magnitude less than or equal to the amplitude of the wave. The circularly polarized wave mode may also be finite in extent inside the time window. Synthetic negative examples may have linearly polarized waves and/or a randomly oriented step function superposed on the noise background. A subset of examples contain no additional structure beyond the noise background. In total, there are 16 classes of synthetic data with 4000 training examples in each class, for a total of 64,000 synthetic data examples. Half of this is reserved for training, and half is reserved for validation during training.

Figure 3. Counts of (a) coherent, (b) complex, (c) nonwave, and (d) total coherent and nonwave events collected manually. The data are drawn from most years between 1995 and 2018 and cover multiple solar cycles. The coherent and nonwave intervals represented in panel (d) are used to train and validate the networks.

3. Network Architecture, Training, and Validation

As an initial proof of concept, we focus our effort around coherent waves in 66 point (∼6 s) intervals. The machine-learning problem therefore becomes a supervised binary classification problem with only two classes: coherent events and nonwave events. A single neuron output is sufficient to describe the classification problem, with an output of one corresponding to a coherent event and zero corresponding to a nonwave event. As such, complex wave events are excluded from the training and validation sets. However, complex events will be examined during testing in Section 4.

A priori, it is not known what type of neural network will be the most effective at finding waves. Therefore, we compare multiple types of neural networks with different layouts to classify coherent wave and nonwave intervals. As a simple test case, we consider a neural network with two fully connected and dropout layers and a single output neuron. This network achieved an average validation accuracy on real data of ∼63% across five training trials (detailed in Figure 5, described later), which is not sufficient for real-world applications. Clearly, such a simple network is not sufficient to diagnose waves in a time series where chronological progression plays a key role. To address this issue, it is necessary to add convolutional layers to the network.

In order to allow the network architecture to diagnose waves with many different periods and envelopes, we include several convolutional layers to ensure that the network architecture inherently has feature extraction at multiple scales (Gu et al. 2018). Each convolutional layer in the network has a different kernel size, which is the size of the 1D convolution operation performed over the input time series. Each convolutional layer generates a chosen number of feature maps. Feature maps contain various types of structure that differ based on the size and weight of the convolutional filter applied to the input data.

A multibranch CNN implementation provides a structure that combines several convolution operations performed in parallel on the input. A visualization of an N-branch network architecture is shown in Figure 4. Each branch of the 1DCNN begins with two convolutional layers in series that perform convolution on the input magnetic field components. The pooling layers reduce the number of parameters from the convolutional layers to prevent overfitting. The output of each pooling layer is flattened and concatenated before being fed into two fully connected dense layers and two dropout layers. The final layer is a single neuron layer whose output is the network’s prediction for the class of the input time series.

Figure 4. Layout of the five-branch neural network architecture used for coherent wave detection. Each branch contains two convolutional layers, followed by a pooling layer. Information from each branch is combined and fed into multiple fully connected and dropout layers. The choice N = 5 yields the best performance on real data and is simpler than N = 6 or 7.

We examine the performance of several iterations of an N-branch neural network. Each iteration has a different number of branches, N, with each branch utilizing different kernel sizes. To test the performance of each iteration, we make use of fivefold cross-validation, a technique used to rigorously examine the performance of different network architectures, parameter sets, and hyperparameter sets on a given data set that is commonly used in machine-learning applications (Stone 1974). Fivefold cross-validation involves splitting the training set into five partitions and performing five discrete training trials for a single network iteration, where each training trial uses one of the partitions as the training set and the other four partitions for validation.

For this fivefold validation, we keep the initialization of the random weights for each neuron in a given network iteration consistent across each of the five training trials. For each network variation and training trial, the same hyperparameters were used and are detailed in Appendix B. The fivefold cross-validation results for the neural network without convolutional layers described previously, as well as the results for networks with one, four, five, six, and seven convolutional branches, are tabulated in Figure 5, and the validation accuracy corresponding to real data in the validation set for each network iteration is plotted in Figure 6. Typical loss and accuracy plots are shown in panels (a) and (b) of Figure 17 in Appendix B. The five-branch network iteration shown in Figure 5 has the highest average validation accuracy, with an average accuracy of 93.6%. The six- and seven-branch network iterations had slightly lower average accuracies of 93.2% and 92.7%, respectively. Given that the performances of the six- and seven-branch networks do not show accuracy improvements over the five-branch network, the convolutional branches up to the fifth branch contribute features that contain useful information, while the convolutional layers in the sixth and seventh branches do not contribute meaningful new features.

Next, an important consideration is the best ratio of real to synthetic data to use in the training and the efficacy of the synthetic data in replacing real data during training. Several random training trials using a five-branch architecture were performed using three different training sets, with each containing the same synthetic data and a different amount of real data. Respectively, the three training sets contained 0% of the real data, 35% of the real data (446 coherent and 239 nonwave intervals), and 70% of the real data (892 coherent and 477 nonwave intervals). A holdout set containing 360 coherent and 205 nonwave intervals was designated and used across all three training sets to test network performance. A classification threshold of 0.5 was used to differentiate between coherent and nonwave predictions. The validation results for each network are shown in Figure 7, and loss and accuracy plots associated with the highest-accuracy network are given in Figure 17 in Appendix B. Averaged across three training experiments, the lowest-accuracy networks were trained using the fully synthetic training set, with an average accuracy of 77.8%. The second-highest accuracy set was trained with half of the real training data training set, with an average accuracy of 88.1%. The networks trained with all of the real training data performed the best, with an average accuracy of 94.0% on the validation set. The highest-accuracy network has an accuracy of 95%. Histograms of network predictions from the highest-accuracy network on the holdout set are shown for coherent events in Figure 8. It is promising that higher proportions of real data noticeably improve network accuracy; this is a sign that the network is learning meaningful features about the real data. As the proportion of real data in the training set grows, a larger number of relevant features are captured by the network, improving the accuracy of the network.

4. Testing Using Contiguous Data

While the validation results are very promising, there is still significant uncertainty concerning the utility of the network for isolating coherent wave events in a typical contiguous solar wind interval. Specifically, complex wave events were excluded from the training and holdout sets, but of course, they would be present in a real solar wind analysis. To study the response of the network to raw solar wind data, all 99 6 s intervals of panel (b) of Figure 1 were classified by the highest-accuracy network from among the random training trials. These events included a mix of coherent, complex, and nonwave events.

The results of this testing trial are tabulated in Figure 9. Using an output value of 0.5 as the boundary between coherent and nonwave, all 49 coherent events were classified as wavelike. Similarly, all 24 nonwave events were classified as nonwave. Twelve of 26 complex events were classified as wavelike.

The network output of complex waves may be problematic depending on our goals for the network in classifying solar wind data. If we desire to find all complex and coherent waves and discard nonwave intervals, clearly our network is not sufficient; there is no threshold that both includes the majority of complex events and at the same time discards most nonwave events. On the other hand, the more modest goal of finding coherent events is reasonable. A receiver operating characteristic (ROC) curve is shown in panel (a) of Figure 10. Here the true-positive rate (TPR) is defined as

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad (1)$$

and the false-positive rate (FPR) is defined as

$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}, \qquad (2)$$

where TP, FP, FN, and TN are the numbers of true-positive, false-positive, false-negative, and true-negative events, respectively. In the ROC curve, we find that a threshold of 0.57 yields the closest point to (FPR = 0, TPR = 1).

Figure 5. Fivefold cross-validation results for the five neural network configurations, detailing the total number of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN) for each training trial. The accuracy is the highest for five branches, with an average accuracy of 93.6% across the five cross-validation folds.

Figure 6. Fivefold cross-validation results for the five network configurations plotted by number of branches. The five-branch configuration has the highest average accuracy.

Figure 7. Validation results for the nine randomly trained networks, detailing the total number of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN) for each network. Hybrid 35% refers to the networks trained with synthetic data and 35% of the real data set.
Hybrid 70% refers to the networks trained with synthetic data and 70% of the real data However, this threshold does not consider the reality that for set.A classification threshold of 0.5 was used to determine the class prediction. typical long-duration solar wind measurements, it is expected The training set with the highest accuracy is the hybrid 70% set, and the that the nonwave events will outnumber the wave events by highest-accuracy network within that set achieved 95.0% validation accuracy. roughly a factor of 10. A threshold of 0.57 that optimizes the ROC curve would likely give comparable numbers of false- Figure 9, pertaining to the 10 minute data interval in Figure 1, positive and true-positive events, which is not acceptable for 94% of the coherent intervals had network predictions above the type of statistical studies planned with this data set. For that 0.95, and 0% of the nonwave intervals had network predictions reason, we choose to optimize the threshold to minimize the above 0.95. Assuming that the holdout set and 10 minute data ratio of the number of false-positive to true-positive events interval together provide an adequate representation of the while at the same time still finding a large number of true whole set of coherent and nonwave intervals in Wind data, we positives. extrapolate that a threshold value of ≈0.95 will capture nearly This goal can be achieved by increasing the classification all coherent events, some complex events, and a small number threshold of the network. In panel (b) of Figure 10, the ratio of of nonwave events relative to coherent events. the TPR to FPR for varying choices of threshold is shown. The ratio is highest near ≈1, but a threshold at or near 1 would 5. Classifying One Year exclude almost all coherent intervals. The ratio near ≈0.95 is more than double the ratio associated with 0.57. 
In Figure 8, Having demonstrated that the network can find coherent pertaining to the holdout set, 80% of the coherent wave waves, the next natural step is to apply the network to a much intervals had network predictions above 0.95, while only 2% of longer time interval and examine any patterns that emerge the nonwave intervals had network predictions above 0.95. In regarding circularly polarized waves in the solar wind. As an 7 The Astrophysical Journal, 949:40 (13pp), 2023 June 1 Fordin et al. Figure 8. Class predictions of the highest-accuracy network for (a) coherent and (b) nonwave intervals in the validation set. Using a classification threshold of 0.5, 343 out of 360 coherent and 194 out of 205 nonwave intervals were classified correctly. initial case, we study Wind magnetic data for a single year; 2005 marks the first full year during which Wind operated at the L1 point. The year has few corresponding data examples included in the training set (see Figure 9), which makes it particularly attractive for testing the network’s performance on data on which it has not been trained. The year is in the declining phase of the solar cycle about halfway between maximum and minimum. An overview of this year of data is shown in Figure 11. Averaged by day, the magnetic field components cover a wide Figure 9. Network prediction results for the 10 minute period shown in Figure 1, grouped by (a) coherent, (b) complex, and (c) nonwave intervals. The range of values, fluctuating in magnitude between 1 and 26 nT. classification threshold used is 0.5 and denoted by the thick black dashed lines. Similarly, the daily averages of the magnitude of the velocity All 49 coherent intervals and all 24 nonwave intervals were classified correctly components fluctuate regularly, with values corresponding to by the network, while 12 of 26 complex events were marked as wavelike. both the fast and slow solar wind. 
The proton density also varies substantially through the year, with averages ranging from ∼1 to ∼24 cm$^{-3}$. The antisunward velocity component and the proton density notably fluctuate with a periodicity of ∼9 days. These fluctuations appear to be oscillations between the fast and slow solar wind, possibly related to a corotating interaction region.

Figure 10. (a) ROC curve associated with three network iterations given in Figure 7. (b) Ratio of the TPR vs. FPR for varying choices of classification threshold for the highest-accuracy network (hybrid 70%, trial 2) given in Figure 7. The capability of the five-branch network to function as a binary classifier improves as real data are added to the training set, and the optimal threshold based on validation data is marked by the black dot in panel (a). However, the ratio of true-positive to false-positive events for the 0.57 threshold (denoted by the blue dotted line in panel (b)) is not sufficiently high when considering the distribution of wave vs. nonwave events in the solar wind. Instead, a threshold of 0.95 (green dotted line) yields a much higher ratio of true-positive to false-positive events while still preserving a large number of true-positive events.

Figure 11. Averages of solar wind parameters by day of the year for 2005. Shown are the (a) magnitude of magnetic field components, (b) solar wind velocity components, (c) proton density, and (d) core proton temperature.

The analysis of this year of data is the same as described in Section 2.1; the magnetic field components for the year are high-pass-filtered above 0.2 Hz and separated into 66 point intervals. Each interval was analyzed by the best-performing network from Figure 7, the results of which are shown in Figure 12. As discussed in Section 4, we choose a classification threshold of 0.95 (denoted by the vertical dashed line) to minimize false positives; this choice leads to 227,991 intervals classified as coherent out of a total of 5,040,592 intervals (∼4.5%).

Figure 12. Network prediction results for 2005. The classification threshold used is 0.95, denoted by the black dashed line. Out of a total of 5,040,592 intervals, 227,991 were marked as wavelike.

To ensure that the neural network successfully found waves, we examined and classified by hand a random selection of 100 intervals scored 0.95 and above by the network; of these, 45 were coherent, 44 were complex, and 11 were nonwave. Of these nonwave intervals, nine of 11 were either elliptically polarized waves that did not meet our definition for circular polarization or circularly polarized waves with amplitudes less than 0.1 nT. We extrapolate that ∼90% of the predictions above the classification threshold of 0.95 contain circularly polarized waves. Any discussion of statistics should therefore be considered carefully, as a relatively large number of false-positive intervals dilutes the positive predictions made by the network. However, due to the large number of positive predictions in total, the statistics of the whole set of wave predictions can be examined with a relatively high degree of confidence.

The counts of wave intervals per day, along with the Fourier transforms of the set of counts, are shown in Figure 13. In panel (a), the number of wave intervals varies greatly by day, from a minimum of seven to a maximum of 2641 intervals. The fast Fourier transform (FFT) in panel (b) reveals several peaks. The strongest is associated with the ∼9 day oscillation period discussed in the previous paragraph, with smaller peaks corresponding to periodicities of ∼6.6, ∼13.5, and ∼27 days. Some of these peaks may be harmonics of the solar rotation frequency or spacecraft incidence with the heliospheric current sheet. Beyond ∼0.18 day$^{-1}$ (∼5.5 days), there is a steady decrease in the FFT.

Figure 13. (a) Histogram of counts of coherent events in 2005 binned by day of the year. (b) Associated FFT of the histogram. The total count per day has a strong ∼9 day periodicity, along with smaller peaks corresponding to ∼6.6, ∼13.5, and ∼27 day periodicities.

An important question is in regard to the kinds of solar wind conditions that are more likely to have wave modes. Due to the relatively small number of wave intervals compared to the total number of solar wind intervals, we study probability density functions (PDFs) of solar wind parameters for both wave intervals and all intervals. Figure 14 shows PDFs and medians of several key solar wind parameters: solar wind speed, magnetic field strength, proton temperature, and proton density. The median solar wind speed for all intervals is ∼455 km s$^{-1}$, and that of the wave population is ∼540 km s$^{-1}$, an increase of ∼20% for wave intervals. Similarly, the median proton temperature of the solar wind is ∼4.1 eV, and that of the wave population is ∼9.4 eV, an increase of ∼130%. The magnetic field strength for coherent intervals is ∼0.5 nT higher than for all intervals, an ∼10% increase in the magnetic field strength. On the other hand, the proton density shows a much smaller difference between the wave events and all events, with the median proton density for each population being less than 5% larger than the median for all intervals. Relative to all intervals, wave intervals tend to occur more often in the fast solar wind and at higher temperatures and magnetic field strengths.

6. Discussion

The initial analysis presented here is a strong starting point for the automatic detection of wave modes in the solar wind.
We have demonstrated that a multibranch convolutional neural network (CNN) is effective at finding coherent circularly polarized waves that can ultimately be used to create a database of such waves in the solar wind. A cursory analysis of 1 yr of solar wind data has yielded intriguing results, in particular that coherent waves are measured more frequently in fast solar wind and at higher temperatures. Whether this is due to selection effects such as Doppler shifting or a basic change in the generating mechanism of these waves remains to be seen.

Figure 14. The PDFs for (a) solar wind speed, (b) magnetic field strength, (c) proton temperature, and (d) proton density. The blue distribution corresponds to all intervals; the red distribution corresponds to coherent intervals. Dashed lines correspond to the median of each distribution. Wave intervals tend to occur more often in the fast solar wind and at higher temperatures and magnetic field strengths compared to the rest of the solar wind.

Clearly, one next step will be to apply the neural network to a much longer time span of solar wind data. Of course, many interesting waves are complex, having multiple frequencies associated with a wave packet. We are hopeful that extension of the neural network framework from this study to allow diagnosis of complex waves will be straightforward for two reasons. First, even with a single bandpass filter, the neural network managed to identify a portion of the complex modes that were outside the scope of the implemented machine-learning problem. At the same time, the network had a low FPR for nonwave examples. This is a sign that the network is learning robustly and in a way that captures circularly polarized wave behavior in a fundamental way. Second, the method used in this paper can be generalized in a straightforward manner to complex waves by passing the time series through narrow bandpass filters.
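The idea of passing the time series through several narrow bandpass filters can be illustrated with a minimal sketch. This is not the authors' filtering code: the brick-wall FFT filter, the function names `fft_bandpass` and `filter_bank`, the 11 Hz sampling rate (inferred from 66 points spanning about 6 s), and the second band (2.5 to 3.5 Hz) are all assumptions made for illustration. Only the 0.6 to 1 Hz passband appears in the paper.

```python
import numpy as np

FS = 11.0  # assumed sampling rate in Hz (66 points ~ 6 s, Nyquist ~5.5 Hz)

def fft_bandpass(b, f_lo, f_hi, fs=FS):
    """Zero every Fourier bin outside [f_lo, f_hi] Hz along the time axis."""
    b = np.asarray(b, dtype=float)
    n = b.shape[0]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec = np.fft.rfft(b, axis=0)
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spec, n=n, axis=0)

def filter_bank(b, bands, fs=FS):
    """Return one band-limited copy of the (time, 3) field per passband."""
    return {band: fft_bandpass(b, band[0], band[1], fs) for band in bands}

# Two superposed circularly polarized waves: a "complex" interval in miniature.
t = np.arange(660) / FS
slow = np.column_stack([np.cos(2 * np.pi * 0.8 * t),
                        np.sin(2 * np.pi * 0.8 * t),
                        np.zeros_like(t)])
fast = 0.5 * np.column_stack([np.cos(2 * np.pi * 3.0 * t),
                              np.sin(2 * np.pi * 3.0 * t),
                              np.zeros_like(t)])

# (0.6, 1.0) is the passband quoted in the paper; (2.5, 3.5) is hypothetical.
bank = filter_bank(slow + fast, [(0.6, 1.0), (2.5, 3.5)])
```

Each entry of `bank` holds a single, more coherent wave that could then be cut into 66 point windows and scored by the existing classifier. A production filter would more likely use a tapered design than a brick wall, which rings on signals that are not periodic in the window.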
The use of these additional bandpass filters effectively isolates more coherent wave modes that can then be detected by the existing machine-learning approach. Ultimately, many waves in the solar wind are not circularly polarized. In principle, with a large training set of such waves, there is no obvious reason why a CNN trained to identify other wave polarizations would fail to find such waves. This approach can also be extended to other solar wind data sets. The CNNs are highly adaptable, and starting with a primarily synthetic training set allows for iterative improvement on the network without an overwhelmingly expensive manual approach.

This research was supported by NASA grant Nos. 80NSSC22K1728 and 80NSSC20K0198. We thank Brian A. Thomas for helpful discussions concerning validation of neural networks. We acknowledge high-performance computing support from Cheyenne provided by NCAR's CISL, sponsored by the NSF. This research also used NERSC resources, a U.S. DOE Office of Science User Facility operated under contract No. DE-AC02-05CH11231.

Appendix A
Synthetic Data Parameters

To adequately train the neural net to find circularly polarized waves, it is necessary to augment the observational data with synthetic data. To that end, it is important that the synthetic data have a representation of important factors present in the observational data.

Synthetic data generation combines several magnetic field sources. All synthetic intervals have a noise background generated from a power law with a randomized phase. To this noise is added one of three possibilities: a circularly polarized wave, a linearly polarized wave, or no wave. The circularly and linearly polarized waves either have a double Gaussian envelope or a “broken envelope” (a double Gaussian interrupted by a discontinuous change in amplitude, phase, and wave direction). Finally, two additional signals may be added: a logistic step function and random high-frequency noise (“spiky noise”). Note that none, one, or both of these signals may be added. Several examples of synthetic data are shown in Figure 15. These examples contain a range of structures and include both positive and negative training examples.

Figure 15. Examples of different classes of synthetic data: (a) noise without additional features, (b) spiky noise, (c) circularly polarized wave mode, (d) circularly polarized wave mode with interruption, (e) linearly polarized wave with interruption, and (f) logistic step function.

The three main pieces of synthetic time-series generation (the noise background, the wave structure, and a step function) are implemented as follows.

(i) The noise background is first generated in frequency space, and then an inverse discrete Fourier transform (DFT) is applied. A power-law noise background is applied with a randomized slope given by

$$\tilde{B}_{n,\alpha}(f) = \begin{cases} B_0 \left(|10 f|\right)^{-p} \left[\cos\theta_\alpha(f) + i \sin\theta_\alpha(f)\right], & |f| > 2/11 \\ 10^{-3}\ \mathrm{nT\,s}, & |f| = 2/11 \\ 0\ \mathrm{nT\,s}, & |f| = 0 \end{cases} \quad (\mathrm{A1})$$

for each $\alpha \in \{x, y, z\}$, where $\tilde{B}_{n,\alpha}$ is the DFT of the noise piece of the magnetic field, $B_0 \in [0.08, 60]$ nT Hz$^{p-1}$ is the amplitude of the lowest-frequency bin, $f \in [-5.5, 5.5]$ Hz is the range of frequency bins for the DFT, $p \in [-2, -1]$ is the randomized slope of the power law for a given generated example, and $\theta_\alpha(f) \in [0, 2\pi)$ rad is the randomized phase associated with each frequency; note that $\theta_\alpha(-f) = -\theta_\alpha(f)$ to ensure that the inverse DFT results in a real function. To avoid introducing circularly polarized wave modes by chance through choices of random phase, we check the MVA eigenvalue ratios for the noise and omit examples that satisfy the criteria for coherent waves.
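The noise step in item (i) can be sketched in a few lines. This is a simplified illustration, not the authors' generator: it uses a plain power law `b0 * f**(-slope)` with a positive `slope` (a sign convention chosen here), leaves out the special handling of the lowest bins in Equation (A1), and assumes an 11 Hz sampling rate. The function names `power_law_noise` and `mva_ratios` are hypothetical. The MVA check is reduced to the two eigenvalue ratios of the magnetic variance matrix; the thresholds that define a coherent wave are not reproduced here.

```python
import numpy as np

N_POINTS = 66   # samples per interval, as in the text
FS = 11.0       # assumed sampling rate in Hz (Nyquist 5.5 Hz)

def power_law_noise(rng, b0, slope, n=N_POINTS, fs=FS):
    """Return one real noise component with |B(f)| = b0 * f**(-slope)."""
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)        # non-negative frequencies only
    amp = np.zeros_like(freqs)
    amp[1:] = b0 * freqs[1:] ** (-slope)          # the f = 0 bin stays at zero
    phase = rng.uniform(0.0, 2.0 * np.pi, freqs.size)
    # irfft assumes Hermitian symmetry, i.e. theta(-f) = -theta(f),
    # so the inverse transform is real by construction.
    return np.fft.irfft(amp * np.exp(1j * phase), n=n)

def mva_ratios(b):
    """Eigenvalue ratios (max/mid, mid/min) of the 3x3 variance matrix."""
    cov = np.cov(np.asarray(b, dtype=float), rowvar=False)
    lam_min, lam_mid, lam_max = np.linalg.eigvalsh(cov)   # ascending order
    return lam_max / lam_mid, lam_mid / lam_min

rng = np.random.default_rng(0)
noise = np.column_stack(
    [power_law_noise(rng, b0=0.5, slope=1.5) for _ in range(3)]
)
noise_ratios = mva_ratios(noise)

# Reference case: an ideal circularly polarized wave in the xy plane
# with weak noise along z.
t = np.arange(N_POINTS) / FS
wave = np.column_stack([np.cos(2 * np.pi * t),
                        np.sin(2 * np.pi * t),
                        0.01 * rng.standard_normal(N_POINTS)])
wave_ratios = mva_ratios(wave)
```

For the ideal wave the first ratio is close to 1 and the second is very large, which is the signature a rejection step would look for in the noise-only examples. The amplitude scaling follows NumPy's inverse-FFT normalization and is not calibrated to nT.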
A subset of noise examples has additional power added to several higher-frequency bins,

$$\tilde{B}_{s,\alpha}(f) = \begin{cases} \tilde{B}_{n,\alpha}(f), & |f| \notin [f_{s,1}, f_{s,2}] \\ \tilde{B}_{s,0} + \tilde{B}_{n,\alpha}(f), & |f| \in [f_{s,1}, f_{s,2}] \end{cases} \quad (\mathrm{A2})$$

for each $\alpha \in \{x, y, z\}$, where $\tilde{B}_{s,0} \in [0.1, 0.75]$ nT Hz$^{p-1}$ is the randomly determined amplitude of the added power, and $f_{s,1}, f_{s,2} \in [3.4, 5.5]$ Hz form the randomly determined frequency range with added power.

(ii) If appropriate to the class of synthetic training data, a wave structure is added to the noise: circularly polarized for positive wave examples or linearly polarized for a subset of nonwave examples. Circularly polarized wave modes are generated with

$$\mathbf{B}_{\mathrm{circ}}(t) = R \begin{bmatrix} A_w \cos(2\pi f_w t - \phi_w) \\ A_w \sin(2\pi f_w t - \phi_w) \\ 0 \end{bmatrix} g_{\mathrm{env}}(t), \quad (\mathrm{A3})$$

where a 3 × 3 unitary rotation matrix $R$ is used to apply a random rotation to the vector in square brackets, $A_w \in [0.1, 2]$ nT is the randomly determined amplitude of the circularly polarized wave, $f_w \in [0.2, 5.5]$ Hz is the randomly determined frequency of the wave, $\phi_w \in [0, 2\pi)$ rad is the randomly determined phase of the wave, and $g_{\mathrm{env}}(t)$ is an envelope function, described shortly.

Linearly polarized wave modes $\mathbf{B}_{\mathrm{lin}}(t)$ are generated with

$$\mathbf{B}_{\mathrm{lin}}(t) = R \begin{bmatrix} A_w \cos(2\pi f_w t - \phi_w) \\ 0 \\ 0 \end{bmatrix} g_{\mathrm{env}}(t), \quad (\mathrm{A4})$$

with the same definitions as in Equation (A3).

The time envelope of the wave $g_{\mathrm{env}}(t)$ takes different forms depending on the class of example,

$$g_{\mathrm{env}}(t) = g_{\mathrm{env},0} \left( e^{-(t - \mu_1)^2 / 4\sigma_1^2} + e^{-(t - \mu_2)^2 / 4\sigma_2^2} \right), \quad (\mathrm{A5})$$

where $g_{\mathrm{env},0}$ normalizes the function at $t = \mu_1$; $\mu_1, \mu_2 \in [0, 6]$ s are the randomly determined envelope means; and $\sigma_1, \sigma_2 \in [3, 9]$ s are the randomly determined envelope widths. For broken wave packets,

$$\mathbf{B}_{\mathrm{circ}}(t) = \begin{cases} \mathbf{B}_{\mathrm{circ1}}(t), & t \notin [t_{p1}, t_{p2}] \\ A_p \mathbf{B}_{\mathrm{circ2}}(t), & t \in [t_{p1}, t_{p2}], \end{cases} \quad (\mathrm{A6})$$

where $t_{p1}, t_{p2}$ define the time range of the wave packet break, $\mathbf{B}_{\mathrm{circ1}}$ and $\mathbf{B}_{\mathrm{circ2}}$ are waves defined with the same amplitude and frequency but different phase and direction, and $A_p \in [0, 1]$ gives the amplitude of the wave in the vicinity of the break.

(iii) In a subset of both wave and nonwave examples, a logistic step function is added to the signal. Step functions are generated with

$$\mathbf{B}_{\mathrm{step}}(t) = R \begin{bmatrix} \dfrac{A_{\mathrm{step}}}{1 + e^{-(t - t_{\mathrm{step}})/\sigma_{\mathrm{step}}}} \\ 0 \\ 0 \end{bmatrix}, \quad (\mathrm{A7})$$

where $A_{\mathrm{step}} \in [0.1, 3]$ nT is the step amplitude, $t_{\mathrm{step}} \in [0, 6]$ s is the center of the step function, $\sigma_{\mathrm{step}} \in [0.1, 0.3]$ s determines the width of the step function, and $R$ is a randomized unitary rotation matrix.

Appendix B
Network Parameters, Hyperparameters, and Training Plots

The training process itself is dependent on a set of hyperparameters that determine every aspect of training. The set of network parameters and hyperparameters was found by manually modifying each value and testing hundreds of such sets by hand. A learning rate of $8 \times 10^{-4}$ and a batch size of 10 are used for each network. A rectified linear unit activation function is used for each layer except the output layer. A sigmoid activation function is used for each output layer. The parameters associated with each branch are outlined in Figure 16. For a given N-branch network, each of the branches from 1 to N is included in the network and structured as outlined in Figure 4. The filter sizes $s_k$ take a range of odd values between 1 and 33, and the number of feature maps $m_k$ is either 4, 16, or 32, depending on the kernel size. A kernel size of 1 corresponds to a rescaling of the input magnetic field, while higher kernel sizes correspond to smoothing of the input magnetic field.

Loss and accuracy plots are shown in panels (a) and (b) of Figure 17 for the five-branch network trained with the second fold of the training set (labeled 1DCNN 5B, Fold 2 in Figure 5). The training and validation loss are similar and decrease with training epoch before the validation loss reaches a plateau, indicating that early stopping can be implemented to reduce training time without sacrificing performance.

Loss and accuracy plots are shown in panels (c) and (d) of Figure 17 for the highest-performing network in Figure 7 as functions of training epoch. Based on the results in panels (a) and (b) of Figure 17, early stopping was implemented and conditioned on the training loss with a patience of three training epochs. The patience was determined through trial and error.

Figure 16. Table of kernel sizes $s_k$ and number of filters $m_k$ for each convolutional branch used in the neural network. A given N-branch network incorporates all branches from 1 to N.

Figure 17. (a) Training/validation loss and (b) training/validation accuracy as functions of training epoch for the five-branch network trained with the second fold of the training set (labeled 1DCNN 5B, Fold 2 in Figure 5) and (c) training/validation loss and (d) training/validation accuracy as functions of training epoch for the five-branch network trained with the highest-accuracy network in Figure 7. The training/validation losses in panel (a) are similar and smoothly decrease with training epoch before reaching a plateau in performance. Similarly, the training/validation accuracy in panel (b) smoothly increases before reaching a plateau. The plateaus in network performance indicate that the network was not overtrained, but further training beyond a certain point does not improve network performance. The random training trials represented in panels (c) and (d) incorporate early stopping. The training/validation losses in panel (c) are similar and smoothly decrease with training epoch, indicating that the network was not overtrained, and the training/validation accuracy in panel (d) smoothly increases, indicating that the network’s classification capability improved with training time.
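The early-stopping rule mentioned for panels (c) and (d), a patience of three epochs conditioned on the training loss, can be sketched without any deep-learning framework. The class name `EarlyStopping`, the helper `epochs_run`, the `min_delta` parameter, and the loss values are hypothetical and serve only to show the bookkeeping; the paper does not describe its implementation.

```python
class EarlyStopping:
    """Stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta      # smallest decrease counted as progress
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Record one epoch's loss; return True when training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(losses, patience=3):
    """Number of epochs completed before the stopper fires (or all of them)."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(losses, start=1):
        if stopper.step(loss):
            return epoch
    return len(losses)


# Made-up training losses: improvement stalls after the fourth epoch.
losses = [0.9, 0.6, 0.5, 0.45, 0.46, 0.47, 0.45, 0.40]
stopped_at = epochs_run(losses)
```

With these made-up losses the best value is reached at epoch 4, the next three epochs bring no improvement, and training halts at epoch 7 before the final value is ever seen.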
The training and validation losses are similar and smoothly decrease with training epoch, indicating that the network was not overtrained.

ORCID iDs

Samuel Fordin https://orcid.org/0000-0002-1634-9122
Michael Shay https://orcid.org/0000-0003-1861-4767
Lynn B. Wilson III https://orcid.org/0000-0002-4313-1970
Bennett Maruca https://orcid.org/0000-0002-2229-5618
Barbara J. Thompson https://orcid.org/0000-0001-6952-7343

References

Abdeljaber, O., Avci, O., Kiranyaz, S., Gabbouj, M., & Inman, D. J. 2017, JSV, 388, 154
Acharya, U. R., Oh, S. L., Hagiwara, Y., et al. 2017, Comput. Biol. Med., 89, 389
Acharya, U. R., Oh, S. L., Hagiwara, Y., Tan, J. H., & Adeli, H. 2018, Comput. Biol. Med., 100, 270
Bruno, R., & Carbone, V. 2013, LRSP, 10, 2
Cattell, C., Breneman, A., Dombeck, J., et al. 2022, ApJL, 924, L33
dos Santos, L. F. G., Narock, A., Nieves-Chinchilla, T., Nuñez, M., & Kirk, M. 2020, SoPh, 295, 131
Gu, J., Wang, Z., Kuen, J., et al. 2018, PatRe, 77, 354
Hasegawa, A. 1976, JGR, 81, 5083
Jagarlamudi, V. K., Dudok de Wit, T., Froment, C., et al. 2021, A&A, 650, A9
Jeong, H.-J., Moon, Y.-J., Park, E., & Lee, H. 2020, ApJ, 903, L25
Jian, L. K., Russell, C. T., Luhmann, J. G., et al. 2009, ApJ, 701, L105
Khrabrov, A. V., & Sonnerup, B. U. 1998, JGR, 103, 6641
Kiranyaz, S., Avci, O., Abdeljaber, O., et al. 2021, MSSP, 151, 107398
Lacombe, C., Alexandrova, O., Matteini, L., et al. 2014, ApJ, 796, 5
LeCun, Y., Bengio, Y., & Hinton, G. 2015, Natur, 521, 436
Lepping, R. P., Acuña, M. H., Burlaga, L. F., et al. 1995, SSRv, 71, 207
Marsch, E., & Chang, T. 1983, JGRA, 88, 6869
Mozer, F. S., Bonnell, J. W., Halekas, J. S., et al. 2021, ApJ, 908, 26
Pinto, V. A., Keesee, A. M., Coughlan, M., et al. 2022, FrASS, 9, 869740
Siciliano, F., Consolini, G., Tozzi, R., et al. 2021, SpWea, 19, e2020SW002589
Smith, A. W., Forsyth, C., Rae, I. J., et al. 2021, SpWea, 19, e2021SW002788
Stansby, D., Horbury, T. S., Chen, C. H. K., & Matteini, L. 2016, ApJ, 829, L16
Stone, M. 1974, J. R. Stat. Soc., B: Stat. Methodol., 36, 111
Telloni, D., Carbone, F., Bruno, R., et al. 2019, ApJ, 885, L5
Tong, Y., Vasko, I., Artemyev, A., Bale, S., & Mozer, F. 2019, ApJ, 878, 41
Unti, T. W. J., & Neugebauer, M. 1968, PhFl, 11, 563
Vech, D., & Malaspina, D. M. 2021, JGRA, 126, e29567
Verniero, J. L., Larson, D. E., Livi, R., et al. 2020, ApJS, 248, 5
Viall, N. M., & Borovsky, J. E. 2020, JGRA, 125, e26005
Wicks, R. T., Alexander, R. L., Stevens, M., et al. 2016, ApJ, 819, 6
Wilson, L., III, Koval, A., Szabo, A., et al. 2012, GeoRL, 39, L08109
Wilson, L., III, Koval, A., Szabo, A., et al. 2017, JGRA, 122, 9115
Wilson, L. B., III, Brosius, A. L., Gopalswamy, N., et al. 2021, RvGeo, 59, e2020RG000714
Wilson, L. B., III, Cattell, C. A., Kellogg, P. J., et al. 2009, JGRA, 114, A10106
Wilson, L. B., III, Cattell, C. A., Kellogg, P. J., et al. 2010, JGRA, 115, A12104
Wu, Q., Sun, Y., Yan, H., & Wu, X. 2020, Comput. Biol. Med., 121, 103800

The Astrophysical Journal, IOP Publishing

A Machine Learning–Based Approach to Time-series Wave Identification in the Solar Wind


Publisher
IOP Publishing
Copyright
© 2023. The Author(s). Published by the American Astronomical Society.
ISSN
0004-637X
eISSN
1538-4357
DOI
10.3847/1538-4357/acc8d5

Abstract

The Wind spacecraft has yielded several decades of high-resolution magnetic field data, a large fraction of which displays small-scale structures. In particular, the solar wind is full of wavelike fluctuations that appear in both the field magnitude and its components. The nature of these fluctuations can be tied to the properties of other structures in the solar wind, such as shocks, that have implications for the time evolution of the solar wind. As such, having a large collection of wave events would facilitate further study of the effects that these fluctuations have on solar wind evolution. Given the large volume of magnetic field data available, machine learning is the most practical approach to classifying the myriad small-scale structures observed. To this end, a subset of Wind data is labeled and used as a training set for a multibranch 1D convolutional neural network aimed at classifying circularly polarized wave modes. Using this algorithm, a preliminary statistical study of 1 yr of data is performed, yielding about 300,000 wave intervals out of about 5,000,000 solar wind intervals. The wave intervals come about more often in the fast solar wind and at higher temperatures, and the number of waves per day is highly periodic. This machine learning–based approach to wave detection has the potential to be a powerful, inexpensive way to catalog waves throughout decades of spacecraft data.

Unified Astronomy Thesaurus concepts: Solar wind (1534); Interplanetary physics (827); Space weather (2037); Space plasmas (1544); Neural networks (1933); Convolutional neural networks (1938); Alfven waves (23)

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

The solar wind is a turbulent supersonic plasma streaming away from the Sun that has a major impact on the near-space region of Earth. Understanding the important dynamics and morphology of the solar wind, especially the dissipation of fluctuations/discontinuities and subsequent heating, is a major unsolved problem (e.g., Viall & Borovsky 2020). Many of these fluctuations have been classified in terms of their wavelike properties. The exact role that waves play in influencing solar wind evolution is an active area of research, with solar wind studies examining a broad class of waves including but not limited to Alfvén and kinetic Alfvén (Unti & Neugebauer 1968; Hasegawa 1976), magnetosonic whistler (Wilson et al. 2012; Verniero et al. 2020), and ion cyclotron (Jian et al. 2009; Wicks et al. 2016). The large amount of spacecraft data available from in situ sources in the solar wind presents an excellent opportunity to investigate the nature of wave modes and the roles that they play. One such data source is the Wind spacecraft, which has taken continuous measurements of solar wind properties for 28 yr, mostly at 1 au (e.g., Wilson et al. 2021). High-resolution magnetic field data from the Magnetic Field Investigation (MFI; Lepping et al. 1995) contain a large number of wave modes that originate from different physical processes.

The cadence of the Wind MFI instrumentation allows it to measure modes with spacecraft-frame frequencies up to ∼5.5 Hz. In this range of frequencies, circularly or elliptically polarized modes are frequently observed in the solar wind, and we focus on these types of modes for this study. One such mode is the magnetosonic-whistler (MSW) mode. The MSWs are an extension of the fast magnetosonic wave above the ion cyclotron frequency (Marsch & Chang 1983). They are right-hand polarized with respect to the quasi-static magnetic field, and the plasma density oscillates in phase with the magnitude of the magnetic field. They have been seen at many locations in the solar wind from 0.1 to 1 au in data from many satellites, such as the Parker Solar Probe (Jagarlamudi et al. 2021; Mozer et al. 2021), ARTEMIS (Stansby et al. 2016; Tong et al. 2019), Wind (Wilson et al. 2017), and Cluster (Lacombe et al. 2014), among numerous others. The MSWs can be generated by a range of instabilities (Tong et al. 2019; Jagarlamudi et al. 2021). They are also generated near shocks and are associated with dissipation in collisionless shocks (Wilson et al. 2012). Based on spectral analysis of ARTEMIS data, MSWs seem to be present in the solar wind ∼10% of the time at 1 au (Tong et al. 2019), though they seem to be less common near the Sun based on Parker Solar Probe measurements (Jagarlamudi et al. 2021; Cattell et al. 2022).

Besides MSW waves, other circularly polarized wave modes near the ion cyclotron frequency are generally identified in the solar wind. Circularly polarized Alfvén/ion cyclotron waves are present in many parts of the solar wind (Jian et al. 2009; Wicks et al. 2016). Alfvén wave frequencies tend to be below the ion cyclotron frequency, lying in the frequency range of ∼10$^{-4}$–10$^{-2}$ Hz in the plasma frame. Alfvén waves tend to be mainly antisunward-propagating (Bruno & Carbone 2013) and may heat protons (Telloni et al. 2019). Ion cyclotron waves have been observed in solar wind data (Jian et al. 2009), and the ion cyclotron resonance is thought to provide a major source of energy dissipation (Telloni et al. 2019).

Given the long duration of the Wind mission, the Wind data set represents an unprecedented opportunity to study circularly polarized wave modes at 1 au across multiple solar cycles. To examine these wave modes, statistical analysis could be used to examine the general properties of the solar wind, but given the breadth of the data, traditional methods of examining these wave modes are overly expensive and require manual validation. For example, frequency filtering together with minimum variance analysis (MVA) is an effective approach for identifying circularly polarized wave modes, but it requires manual and visual analysis and is time-expensive (e.g., Wilson et al. 2009). Similarly, spectral analysis using wavelet transforms is robust and effective but time-consuming and not a realistic approach to apply to several decades of data (e.g., Wilson et al. 2010). Machine learning is the most practical approach to classify the myriad small-scale structures observed in the magnetic field data.

With that being said, the choice of machine-learning algorithm to use in this context is not an obvious one. Numerous machine-learning algorithms exist and have different applications. In space physics alone, various machine-learning approaches have been used to analyze diverse phenomena. For example, 2D convolutional neural networks (CNNs) have been used to identify flux rope signatures in the solar corona (dos Santos et al. 2020), generative adversarial networks have been used to examine the coronal magnetic field (Jeong et al. 2020), and self-organizing maps have been used to classify magnetic field spectra (Vech & Malaspina 2021). For wave classification, the range of wave periods and highly varying wave durations poses a challenge, as not all machine-learning algorithms generalize well to classifying time-series structures with a high degree of variance in their size and duration.

One candidate for the problem of time-series wave identification is the 1D CNN (1DCNN). Artificial neural networks pass and feed information forward using sequential layers of neurons, and 1DCNNs include one or more 1D convolutional layers in their layout. The 1D convolutional layers within a neural network are given a kernel size, which defines the length of the 1D convolution operation performed by that layer upon a given input data series (LeCun et al. 2015). When these 1D convolutional layers are placed before feed-forward layers in a neural network, the convolutional layers effectively perform a smoothing operation on the input data series and create additional features that are passed to the feed-forward layers. In this way, convolutional layers perform feature extraction by learning their own structures corresponding to a designated kernel size (Kiranyaz et al. 2021).

In space physics, CNNs have been used to process solar wind data for space weather forecasting models that predict the SYM-H geomagnetic index (Siciliano et al. 2021) and the rate of change of the ground magnetic field (Smith et al. 2021; Pinto et al. 2022). In other fields, particularly biology, CNNs have been used in various contexts. For example, they have been used to classify heartbeats from ECG data (Acharya et al. 2017; Wu et al. 2020), automatically diagnose brain wave activity from EEG signals (Acharya et al. 2018), and diagnose structural damage in several different civil, mechanical, and aerospace systems for structural health monitoring (Abdeljaber et al. 2017). The types of signals analyzed in these examples share many similarities with Wind magnetic field data in that they are noisy time series with periodic features of interest that are of varying length.

Given the complexity of using these techniques to identify solar wind fluctuations, we focus first on identifying circularly polarized wave modes in a single high-pass filter (referred to here as coherent wave modes). We utilize a machine-learning approach based on CNNs to identify waves in time-series data from Wind magnetic field data. To this end, a subset of current Wind data is labeled and used as a training set for a machine-learning algorithm aimed at classifying small-scale structures. Eventually, the goal is to generalize to wave modes that originate in intervals that require narrower bandpass filter ranges to be resolved (referred to here as complex wave modes).

2. Building the Training Set

2.1. Collecting Real Data Events

An example solar wind interval is given in Figure 1(a), in which 12 hr of magnetic field data are shown. In Figure 1(b), a 10 minute zoom of the magnetic field from the Wind MFI is shown. The magnetic field in the 10 minute zoom is primarily in the xy plane in geocentric solar ecliptic (GSE) coordinates, with a $B_z$ smaller than the other two components. At the beginning of the interval, the magnetic field undergoes a sharp rotation with a small increase in the magnetic field magnitude. After the rotation, there are periods of high-frequency oscillations, eventually followed by a modest reduction in the field magnitude. The high-frequency oscillations during this interval are typically characterized by comparable fluctuations in all magnetic field components with the magnitude of B relatively constant.

Figure 1. (a) 12 hr interval of Wind magnetic field data from 2004 April 28. The spacecraft is in the solar wind above the ecliptic plane. (b) 10 minute subinterval split into 66 point (∼6 s) time windows manually classified using minimum variance techniques. Dark green time windows correspond to coherent wave time windows, light green time windows correspond to complex wave time windows, and gray time windows correspond to nonwave time windows.

As a first step toward using machine learning to classify waves, we manually divide the data into equal-length time intervals, classifying each interval as having “coherent” waves, “complex” waves, or no waves (“nonwave”). Note that for this study, we are limiting the search to circularly polarized waves, with elliptical and planar polarizations characterized as nonwave. In addition, the study is limited to waves with amplitudes larger than 0.1 nT.

To begin the analysis, a high-pass filter is performed on ∼10–20 hr of Wind magnetic field data with a frequency cutoff of 0.2 Hz. In most cases, a wavelet transform of the data is examined by eye for intervals with significant fluctuation energy. In a smaller number of cases, the subintervals were chosen randomly from solar wind data to eliminate bias. Whichever of the two methods is used, the chosen periods are then divided into intervals with 66 time values, which represents about 6 s of spacecraft data.

An MVA is performed on each of these 6 s intervals (Khrabrov & Sonnerup 1998).

For the complex wave in panel (b) of Figure 2, there appears to be a potential wave packet, but the high-pass-filtered signal is noisy in panel (e). The MVA eigenvalue ratios for this case are $\lambda_{\max}/\lambda_{\mathrm{mid}} \approx 1.4$, $\lambda_{\mathrm{mid}}/\lambda_{\min} \approx 6.8$, with the MVA components plotted in panel (h). In order to isolate the wave modes in this window, a bandpass filter covering a narrower frequency range is required. For this case, the passband 0.6–1 Hz is chosen, and the bandpass-filtered components are plotted in panel (j). The time interval between the pink bounds has MVA eigenvalue ratios $\lambda_{\max}/\lambda_{\mathrm{mid}} \approx 1.8$, $\lambda_{\mathrm{mid}}/\lambda_{\min} \approx 71.6$. The hodogram in panel (l) also displays an ellipse with radii that are close enough to satisfy our cutoff for circular polarization, which corresponds to a complex wave interval.

For the nonwave interval in panel (c) of Figure 2, no circularly polarized wave modes are present for any choice of bandpass filter. The high-pass-filtered components plotted in panel (f) and the corresponding MVA components in panel (i) do not show any circular polarization. We choose the same passband as for the complex wave. The bandpass-filtered
Multiple MVAs are performed components are plotted in panel (k), and we evaluate the on various subintervals chosen by hand that span at least one eigenvalue ratios associated with the MVA components plotted period of fluctuations in the fields. Circularly polarized waves in panel (m).We findl l » 10.4,l l » 19.1 for max mid mid min are characterized by having the two largest MVA eigenvalues the whole interval. If we take a shorter section of data, such as similar in value and much larger than the minimum MVA the first third of the window, we find l l » 4.7, max mid eigenvalue. Coherent events are those that exhibit the property l l » 69.4, which does not meet our definition for a mid min that the two largest eigenvalues are within a factor of 3 of each circularly polarized wave interval. Furthermore, the hodogram other and at least 10 times greater than the minimum in panel (p) does not display any notable wave polarization. eigenvalue. Ultimately, we will use machine-learning algo- By analyzing several thousand ∼6 s solar wind intervals rithms to find these coherent wave events in the solar wind. using this time-intensive manual selection process on data Intervals that do not exhibit coherent waves are further spread over the date range 1995–2018, 1274 coherent, 715 analyzed to determine if more complex wavelike structure is complex, and 682 nonwave examples were found. Figure 3 present. First, multiple bandpass filters are applied to the contains histograms of total counts by year. intervals. For each resultant bandpass output, an MVA is run and analyzed. If at least one bandpass interval has good eigenvalue ratios, as described in the previous paragraph, that 2.2. 
Generating Synthetic Time-series Data interval is classified as a “complex wave interval” or Due to the relatively small size of the data set obtained by “complex.” If no such good eigenvalue ratios exist, then the the above methods, the data set used to train the neural network event is classified as a “nonwave interval” or “nonwave.” Note is augmented with synthetic time-series data that approximate that in all cases, a visual inspection is performed of the MVA both positive and negative training examples of the solar wind results to verify the classification. magnetic field. The creation of synthetic data is more time- In panel (b) of Figure 1, this wave classification method has efficient than the manual labeling method and thus is an been applied to 99 subintervals totaling 10 minutes of magnetic attractive approach for quickly increasing the number of field data. Green intervals (numbering 49) have coherent intervals available for training. Classifying by hand the 2671 waves, light green intervals (numbering 26) have complex data intervals displayed in Figure 3 required approximately 400 waves, and gray intervals (numbering 24) are nonwave person-hours; to obtain an order of magnitude more data by intervals. Examples of the three types of intervals are shown in hand is not feasible, as this would represent several thousand Figure 2. Panels (a)–(c) are the raw time-series data. Panels person-hours. Synthetic data have been used in other machine- (d)–(f) are the same intervals but high-pass-filtered. Panels (g)– learning contexts to bolster training sets, such as identifying (i) are the high-pass-filtered components in the MVA frame flux rope signatures (dos Santos et al. 2020) and ECG between the dotted gray boundaries. Panels (j) and (k) are the classification (Acharya et al. 2017). 
Appendix A provides a bandpass-filtered components in the frequency range 0.6–1Hz detailed description of how the synthetic data are generated, for complex and nonwave intervals, respectively. Panels (l) and including plots of representative examples. The synthetic data (m) are the bandpass-filtered coordinates in the MVA frame in are meant to approximate certain structures in the solar wind, the associated pink boundaries for the complex and nonwave not emulate them exactly; in many cases, the noise background intervals, respectively. Panels (n)–(p) are the hodograms in the synthetic time-series data is nonphysical and does not associated with the final MVA panel in each column. satisfy ∇ · B = 0. However, this property is present across both For the coherent wave in panel (a) of Figure 2, the MVA positive and negative synthetic data and is not expected to bias frame in panel (g) clearly shows an oscillation in B and B , network training. The ultimate goal as the number of real data y z with little oscillation in B . The MVA eigenvalue ratios for this events grows is to eventually fully replace the synthetic data in case are l l » 1.8, l l » 33.6. These para- the training set. max mid mid min meters, together with visual identification of a nearly circular The synthetic data consist of several different classes, all of ellipse in the hodogram in panel (n), are enough to classify this which have a background of noise. Synthetic positive examples interval as a coherent wave interval. have a randomly oriented, circularly polarized wave mode 3 The Astrophysical Journal, 949:40 (13pp), 2023 June 1 Fordin et al. Figure 2. Examples of coherent wave, complex wave, and nonwave intervals. Panels (a)–(c) are the raw time-series data. Panels (d)–(f) are the same intervals but high-pass-filtered. Panels (g)–(i) are the high-pass-filtered coordinates in the MVA frame between the dotted gray boundaries. 
Panels (j) and (k) are the raw time-series bandpass-filtered components for 0.6–1 Hz. Panels (l) and (m) are the bandpass-filtered coordinates in the MVA frame in the associated pink shaded regions. Panels (n)–(p) are the hodograms associated with the final MVA panel in each column. 4 The Astrophysical Journal, 949:40 (13pp), 2023 June 1 Fordin et al. combined with a wave envelope function. They may also contain a randomly oriented step function with a magnitude less than or equal to the amplitude of the wave. The circularly polarized wave mode may also be finite in extent inside the time window. Synthetic negative examples may have linearly polarized waves and/or a randomly oriented step function superposed on the noise background. A subset of examples contain no additional structure beyond the noise background. In total, there are 16 classes of synthetic data with 4000 training examples in each class, for a total of 64,000 synthetic data examples. Half of this is reserved for training, and half is reserved for validation during training. 3. Network Architecture, Training, and Validation As an initial proof of concept, we focus our effort around coherent waves in 66 point (∼6s) intervals. The machine- learning problem therefore becomes a supervised binary classification problem with only two classes: coherent events and nonwave events. A single neuron output is sufficient to describe the classification problem, with an output of one corresponding to a coherent event and zero corresponding to a nonwave event. As such, complex wave events are excluded from the training and validation sets. However, complex events will be examined during testing in Section 4. A priori, it is not known what type of neural network will be the most effective at finding waves. Therefore, we compare multiple types of neural networks with different layouts to classify coherent wave and nonwave intervals. 
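The coherent and nonwave labels used for this binary problem derive from the MVA eigenvalue criteria of Section 2.1. As a minimal illustration, and not the pipeline used in this study, the following Python sketch computes the magnetic variance matrix of one window, obtains its eigenvalues, and applies the stated criteria (two largest eigenvalues within a factor of 3 of each other and at least 10 times the smallest). All function names are hypothetical, the sampling rate of 11 samples per second is an assumption inferred from 66 points spanning about 6 s, and the closed-form eigenvalue routine is a standard trigonometric solution for symmetric 3×3 matrices chosen only to keep the example dependency-free. The visual inspection and the 0.1 nT amplitude cut are omitted.

```python
import math

def variance_matrix(bx, by, bz):
    """3x3 magnetic variance matrix M_ij = <B_i B_j> - <B_i><B_j> for one window."""
    comps = (bx, by, bz)
    n = len(bx)
    means = [sum(c) / n for c in comps]
    return [[sum((comps[i][k] - means[i]) * (comps[j][k] - means[j])
                 for k in range(n)) / n
             for j in range(3)]
            for i in range(3)]

def symmetric_eigenvalues(m):
    """Eigenvalues of a real symmetric 3x3 matrix, sorted largest first."""
    off = m[0][1] ** 2 + m[0][2] ** 2 + m[1][2] ** 2
    q = (m[0][0] + m[1][1] + m[2][2]) / 3.0
    if off == 0.0:  # matrix is already diagonal
        return sorted((m[0][0], m[1][1], m[2][2]), reverse=True)
    p = math.sqrt((sum((m[i][i] - q) ** 2 for i in range(3)) + 2.0 * off) / 6.0)
    # Shifted and scaled matrix b = (m - q I) / p, whose determinant fixes the angle
    b = [[(m[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    phi = math.acos(max(-1.0, min(1.0, det / 2.0))) / 3.0
    lam_a = q + 2.0 * p * math.cos(phi)
    lam_c = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    lam_b = 3.0 * q - lam_a - lam_c  # the trace is preserved
    return sorted((lam_a, lam_b, lam_c), reverse=True)

def mva_ratios(bx, by, bz):
    """Return (lambda_max / lambda_mid, lambda_mid / lambda_min) for one window."""
    lam_max, lam_mid, lam_min = symmetric_eigenvalues(variance_matrix(bx, by, bz))
    ratio = lambda a, b: a / b if b > 0.0 else math.inf
    return ratio(lam_max, lam_mid), ratio(lam_mid, lam_min)

def is_coherent(max_over_mid, mid_over_min):
    """Coherent-wave criterion of Section 2.1 applied to the two eigenvalue ratios."""
    return max_over_mid <= 3.0 and mid_over_min >= 10.0

# Idealized 66 point windows (assumed 11 samples per second, 1 Hz wave).
fs = 11.0
t = [k / fs for k in range(66)]
circular = ([math.cos(2 * math.pi * x) for x in t],
            [math.sin(2 * math.pi * x) for x in t],
            [0.05 * math.cos(4 * math.pi * x) for x in t])
linear = ([math.cos(2 * math.pi * x) for x in t],
          [0.05 * math.sin(4 * math.pi * x) for x in t],
          [0.05 * math.cos(4 * math.pi * x) for x in t])
print(is_coherent(*mva_ratios(*circular)))  # circularly polarized test signal
print(is_coherent(*mva_ratios(*linear)))    # linearly polarized test signal
```

Applied to the eigenvalue ratios quoted in Section 2.1, `is_coherent(1.8, 33.6)` accepts the coherent example, while `is_coherent(1.4, 6.8)` and `is_coherent(4.7, 69.4)` reject the high-pass-filtered complex example and the shortened nonwave example, respectively.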
As a simple test case, we consider a neural network with two fully connected and dropout layers and a single output neuron. This network achieved an average validation accuracy on real data of ∼63% across five training trials (detailed in Figure 5, described later), which is not sufficient for real-world applications. Clearly, such a simple network is not sufficient to diagnose waves in a time series where chronological progression plays a key role. To address this issue, it is necessary to add convolutional layers to the network.

In order to allow the network architecture to diagnose waves with many different periods and envelopes, we include several convolutional layers to ensure that the network architecture inherently has feature extraction at multiple scales (Gu et al. 2018). Each convolutional layer in the network has a different kernel size, which is the size of the 1D convolution operation performed over the input time series. Each convolutional layer generates a chosen number of feature maps. Feature maps contain various types of structure that differ based on the size and weight of the convolutional filter applied to the input data.

A multibranch CNN implementation provides a structure that combines several convolution operations performed in parallel on the input. A visualization of an N-branch network architecture is shown in Figure 4. Each branch of the 1DCNN begins with two convolutional layers in series that perform convolution on the input magnetic field components. The pooling layers reduce the number of parameters from the convolutional layers to prevent overfitting. The output of each pooling layer is flattened and concatenated before being fed into two fully connected dense layers and two dropout layers. The final layer is a single neuron layer whose output is the network’s prediction for the class of the input time series.

Figure 3. Counts of (a) coherent, (b) complex, (c) nonwave, and (d) total coherent and nonwave events collected manually. The data are drawn from most years between 1995 and 2018 and cover multiple solar cycles. The coherent and nonwave intervals represented in panel (d) are used to train and validate the networks.

Figure 4. Layout of the five-branch neural network architecture used for coherent wave detection. Each branch contains two convolutional layers, followed by a pooling layer. Information from each branch is combined and fed into multiple fully connected and dropout layers. The choice N = 5 yields the best performance on real data and is simpler than N = 6 or 7.

We examine the performance of several iterations of an N-branch neural network. Each iteration has a different number of branches, N, with each branch utilizing different kernel sizes. To test the performance of each iteration, we make use of fivefold cross-validation, a technique used to rigorously examine the performance of different network architectures, parameter sets, and hyperparameter sets on a given data set that is commonly used in machine-learning applications (Stone 1974). Fivefold cross-validation involves splitting the training set into five partitions and performing five discrete training trials for a single network iteration, where each training trial uses one of the partitions as the training set and the other four partitions for validation.

For this fivefold validation, we keep the initialization of the random weights for each neuron in a given network iteration consistent across each of the five training trials. For each network variation and training trial, the same hyperparameters were used and are detailed in Appendix B. The fivefold cross-validation results for the neural network without convolutional layers described previously, as well as the results for networks with one, four, five, six, and seven convolutional branches, are tabulated in Figure 5, and the validation accuracy corresponding to real data in the validation set for each network iteration is plotted in Figure 6. Typical loss and accuracy plots are shown in panels (a) and (b) of Figure 17 in Appendix B.

The five-branch network iteration shown in Figure 5 has the highest average validation accuracy, with an average accuracy of 93.6%. The six- and seven-branch network iterations had slightly lower average accuracies of 93.2% and 92.7%, respectively. Given that the performances of the six- and seven-branch networks do not show accuracy improvements over the five-branch network, the convolutional branches up to the fifth branch contribute features that contain useful information, while the convolutional layers in the sixth and seventh branches do not contribute meaningful new features.

Figure 5. Fivefold cross-validation results for the five neural network configurations, detailing the total number of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN) for each training trial. The accuracy is the highest for five branches, with an average accuracy of 93.6% across the five cross-validation folds.

Figure 6. Fivefold cross-validation results for the five network configurations plotted by number of branches. The five-branch configuration has the highest average accuracy.

Next, an important consideration is the best ratio of real to synthetic data to use in the training and the efficacy of the synthetic data in replacing real data during training. Several random training trials using a five-branch architecture were performed using three different training sets, with each containing the same synthetic data and a different amount of real data. Respectively, the three training sets contained 0% of the real data, 35% of the real data (446 coherent and 239 nonwave intervals), and 70% of the real data (892 coherent and 477 nonwave intervals). A holdout set containing 360 coherent and 205 nonwave intervals was designated and used across all three training sets to test network performance. A classification threshold of 0.5 was used to differentiate between coherent and nonwave predictions. The validation results for each network are shown in Figure 7, and loss and accuracy plots associated with the highest-accuracy network are given in Figure 17 in Appendix B. Averaged across three training experiments, the lowest-accuracy networks were trained using the fully synthetic training set, with an average accuracy of 77.8%. The networks with the second-highest accuracy were trained with half of the real training data, with an average accuracy of 88.1%. The networks trained with all of the real training data performed the best, with an average accuracy of 94.0% on the validation set. The highest-accuracy network has an accuracy of 95%. Histograms of network predictions from the highest-accuracy network on the holdout set are shown for coherent events in Figure 8. It is promising that higher proportions of real data noticeably improve network accuracy; this is a sign that the network is learning meaningful features about the real data. As the proportion of real data in the training set grows, a larger number of relevant features are captured by the network, improving the accuracy of the network.

Figure 7. Validation results for the nine randomly trained networks, detailing the total number of true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN) for each network. Hybrid 35% refers to the networks trained with synthetic data and 35% of the real data set. Hybrid 70% refers to the networks trained with synthetic data and 70% of the real data set. A classification threshold of 0.5 was used to determine the class prediction. The training set with the highest accuracy is the hybrid 70% set, and the highest-accuracy network within that set achieved 95.0% validation accuracy.

Figure 8. Class predictions of the highest-accuracy network for (a) coherent and (b) nonwave intervals in the validation set. Using a classification threshold of 0.5, 343 out of 360 coherent and 194 out of 205 nonwave intervals were classified correctly.

4. Testing Using Contiguous Data

While the validation results are very promising, there is still significant uncertainty concerning the utility of the network for isolating coherent wave events in a typical contiguous solar wind interval. Specifically, complex wave events were excluded from the training and holdout sets, but of course, they would be present in a real solar wind analysis. To study the response of the network to raw solar wind data, all 99 6 s intervals of panel (b) of Figure 1 were classified by the highest-accuracy network from among the random training trials. These events included a mix of coherent, complex, and nonwave events.

The results of this testing trial are tabulated in Figure 9. Using an output value of 0.5 as the boundary between coherent and nonwave, all 49 coherent events were classified as wavelike. Similarly, all 24 nonwave events were classified as nonwave. Twelve of 26 complex events were classified as wavelike.

Figure 9. Network prediction results for the 10 minute period shown in Figure 1, grouped by (a) coherent, (b) complex, and (c) nonwave intervals. The classification threshold used is 0.5 and denoted by the thick black dashed lines. All 49 coherent intervals and all 24 nonwave intervals were classified correctly by the network, while 12 of 26 complex events were marked as wavelike.

The network output of complex waves may be problematic depending on our goals for the network in classifying solar wind data. If we desire to find all complex and coherent waves and discard nonwave intervals, clearly our network is not sufficient; there is no threshold that both includes the majority of complex events and at the same time discards most nonwave events. On the other hand, the more modest goal of finding coherent events is reasonable. A receiver operating characteristic (ROC) curve is shown in panel (a) of Figure 10. Here the true-positive rate (TPR) is defined as

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \tag{1}$$

and the false-positive rate (FPR) is defined as

$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}, \tag{2}$$

where TP, FP, FN, and TN are the numbers of true-positive, false-positive, false-negative, and true-negative events, respectively. In the ROC curve, we find that a threshold of 0.57 yields the closest point to (FPR = 0, TPR = 1).

However, this threshold does not consider the reality that for typical long-duration solar wind measurements, it is expected that the nonwave events will outnumber the wave events by roughly a factor of 10. A threshold of 0.57 that optimizes the ROC curve would likely give comparable numbers of false-positive and true-positive events, which is not acceptable for the type of statistical studies planned with this data set. For that reason, we choose to optimize the threshold to minimize the ratio of the number of false-positive to true-positive events while at the same time still finding a large number of true positives.

This goal can be achieved by increasing the classification threshold of the network. In panel (b) of Figure 10, the ratio of the TPR to FPR for varying choices of threshold is shown. The ratio is highest near ≈1, but a threshold at or near 1 would exclude almost all coherent intervals. The ratio near ≈0.95 is more than double the ratio associated with 0.57. In Figure 8, pertaining to the holdout set, 80% of the coherent wave intervals had network predictions above 0.95, while only 2% of the nonwave intervals had network predictions above 0.95. In Figure 9, pertaining to the 10 minute data interval in Figure 1, 94% of the coherent intervals had network predictions above 0.95, and 0% of the nonwave intervals had network predictions above 0.95. Assuming that the holdout set and 10 minute data interval together provide an adequate representation of the whole set of coherent and nonwave intervals in Wind data, we extrapolate that a threshold value of ≈0.95 will capture nearly all coherent events, some complex events, and a small number of nonwave events relative to coherent events.

5. Classifying One Year

Having demonstrated that the network can find coherent waves, the next natural step is to apply the network to a much longer time interval and examine any patterns that emerge regarding circularly polarized waves in the solar wind. As an initial case, we study Wind magnetic data for a single year; 2005 marks the first full year during which Wind operated at the L1 point. The year has few corresponding data examples included in the training set (see Figure 9), which makes it particularly attractive for testing the network’s performance on data on which it has not been trained. The year is in the declining phase of the solar cycle about halfway between maximum and minimum.

An overview of this year of data is shown in Figure 11. Averaged by day, the magnetic field components cover a wide range of values, fluctuating in magnitude between 1 and 26 nT. Similarly, the daily averages of the magnitude of the velocity components fluctuate regularly, with values corresponding to both the fast and slow solar wind.
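As a worked illustration of Equations (1) and (2) and of the threshold choice discussed in Section 4, the following minimal Python sketch evaluates the two rates at a threshold of 0.5 from the counts in the Figure 8 caption and compares them with the fractions quoted for a threshold of 0.95 on the holdout set. The function names are hypothetical, and the 10:1 ratio of nonwave to wave intervals is the rough expectation stated in Section 4 rather than a measured value; the sketch is not part of the published analysis.

```python
import math

def rates(tp, fn, fp, tn):
    """Return (TPR, FPR) as defined in Equations (1) and (2)."""
    return tp / (tp + fn), fp / (fp + tn)

def false_per_true_positive(tpr, fpr, nonwave_per_wave):
    """Expected false positives per true positive for an assumed class imbalance."""
    return nonwave_per_wave * fpr / tpr

def distance_to_ideal(tpr, fpr):
    """Distance of an ROC point from the ideal corner (FPR = 0, TPR = 1)."""
    return math.hypot(fpr, 1.0 - tpr)

# Threshold 0.5: 343 of 360 coherent and 194 of 205 nonwave intervals correct (Figure 8).
tpr_050, fpr_050 = rates(tp=343, fn=17, fp=11, tn=194)
# Threshold 0.95: 80% of coherent and 2% of nonwave holdout intervals score above it.
tpr_095, fpr_095 = 0.80, 0.02

for label, tpr, fpr in (("0.50", tpr_050, fpr_050), ("0.95", tpr_095, fpr_095)):
    print(label,
          round(tpr, 3), round(fpr, 3),
          round(distance_to_ideal(tpr, fpr), 3),
          round(false_per_true_positive(tpr, fpr, nonwave_per_wave=10.0), 2))
```

Under these assumptions, the lower threshold lies closer to the ideal ROC corner, but with ten nonwave intervals per wave interval it would admit roughly 0.56 false positives per true positive, compared with 0.25 at a threshold of 0.95. This is the trade-off that motivates the higher threshold.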
The proton density also varies substantially through the year, with averages ranging from ∼1 to ∼24 cm$^{-3}$. The antisunward velocity component and the proton density notably fluctuate with a periodicity of ∼9 days. These fluctuations appear to be oscillations between the fast and slow solar wind, possibly related to a corotating interaction region.

Figure 10. (a) ROC curve associated with three network iterations given in Figure 7. (b) Ratio of the TPR vs. FPR for varying choices of classification threshold for the highest-accuracy network (hybrid 70%, trial 2) given in Figure 7. The capability of the five-branch network to function as a binary classifier improves as real data are added to the training set, and the optimal threshold based on validation data is marked by the black dot in panel (a). However, the ratio of true-positive to false-positive events for the 0.57 threshold (denoted by the blue dotted line in panel (b)) is not sufficiently high when considering the distribution of wave vs. nonwave events in the solar wind. Instead, a threshold of 0.95 (green dotted line) yields a much higher ratio of true-positive to false-positive events while still preserving a large number of true-positive events.

Figure 11. Averages of solar wind parameters by day of the year for 2005. Shown are the (a) magnitude of magnetic field components, (b) solar wind velocity components, (c) proton density, and (d) core proton temperature.

The analysis of this year of data is the same as described in Section 2.1; the magnetic field components for the year are high-pass-filtered above 0.2 Hz and separated into 66 point intervals. Each interval was analyzed by the best-performing network from Figure 7, the results of which are shown in Figure 12. As discussed in Section 4, we choose a classification threshold of 0.95 (denoted by the vertical dashed line) to minimize false positives; this choice leads to 227,991 intervals classified as coherent out of a total of 5,040,592 intervals (∼4.5%).

Figure 12. Network prediction results for 2005. The classification threshold used is 0.95, denoted by the black dashed line. Out of a total of 5,040,592 intervals, 227,991 were marked as wavelike.

To ensure that the neural network successfully found waves, we examined and classified by hand a random selection of 100 intervals scored 0.95 and above by the network; of these, 45 were coherent, 44 were complex, and 11 were nonwave. Of these nonwave intervals, nine of 11 were either elliptically polarized waves that did not meet our definition for circular polarization or circularly polarized waves with amplitudes less than 0.1 nT. We extrapolate that ∼90% of the predictions above the classification threshold of 0.95 contain circularly polarized waves. Any discussion of statistics should be considered carefully, therefore, as a relatively large number of false-positive intervals dilute the positive predictions made by the network. However, due to the large number of positive predictions in total, the statistics of the whole set of wave predictions can be examined with a relatively high degree of confidence.

The counts of wave intervals per day, along with the Fourier transforms of the set of counts, are shown in Figure 13. In panel (a), the number of wave intervals varies greatly by day, from a minimum of seven to a maximum of 2641 intervals. The fast Fourier transform (FFT) in panel (b) reveals several peaks. The strongest is associated with the ∼9 day oscillation period discussed in the previous paragraph, with smaller peaks corresponding to periodicities of ∼6.6, ∼13.5, and ∼27 days. Some of these peaks may be harmonics of the solar rotation frequency or spacecraft incidence with the heliospheric current sheet. Beyond ∼0.18 day$^{-1}$ (∼5.5 days), there is a steady decrease in the FFT.

Figure 13. (a) Histogram of counts of coherent events in 2005 binned by day of the year. (b) Associated FFT of the histogram. The total counts per day has a strong ∼9 day periodicity, along with smaller peaks corresponding to ∼6.6, ∼13.5, and ∼27 day periodicities.

An important question is in regard to the kinds of solar wind conditions that are more likely to have wave modes. Due to the relatively small number of wave intervals compared to the total number of solar wind intervals, we study probability density functions (PDFs) of solar wind parameters for both wave intervals and all intervals. Figure 14 shows PDFs and medians of several key solar wind parameters: solar wind speed, magnetic field strength, proton temperature, and proton density. The median solar wind velocity of the solar wind is ∼455 km s$^{-1}$, and that of the wave population is ∼540 km s$^{-1}$, an increase of ∼20% for wave intervals. Similarly, the median proton temperature of the solar wind is ∼4.1 eV, and that of the wave population is ∼9.4 eV, an increase of ∼130%. The magnetic field strength for coherent intervals is ∼0.5 nT higher than for all intervals, an ∼10% increase in the magnetic field strength. On the other hand, the proton density shows a much smaller difference between the wave events and all events, with the median proton density for each population being less than 5% larger than the median for all intervals. Relative to all intervals, wave intervals tend to occur more often in the fast solar wind and at higher temperatures and magnetic field strengths.

6. Discussion

The initial analysis presented here is a strong starting point for the automatic detection of wave modes in the solar wind.
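The periodicity analysis of the daily wave counts in Section 5 (Figure 13) can be illustrated with a minimal sketch. The following Python example is not the code used in this study: it evaluates a plain discrete Fourier transform of a synthetic, hypothetical series of daily counts with an imposed 9 day period and returns the period of the strongest spectral peak. The function name `dominant_period`, the 360 day series length, and the count values are assumptions chosen only so that the imposed period falls exactly on a frequency bin.

```python
import cmath
import math

def dominant_period(counts):
    """Return the period (in samples) of the strongest nonzero-frequency DFT peak."""
    n = len(counts)
    mean = sum(counts) / n
    centered = [c - mean for c in counts]  # remove the zero-frequency offset
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k

# Synthetic daily counts (hypothetical values): a 9 day oscillation about a mean level.
days = 360
counts = [1000.0 + 800.0 * math.sin(2.0 * math.pi * d / 9.0) for d in range(days)]
print(dominant_period(counts))
```

For real daily counts, the peak frequency will generally fall between bins, so the recovered period is limited by the frequency resolution of one cycle per record length.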
We have demonstrated that a multibranch convolutional neural network (CNN) is effective at finding coherent circularly polarized waves that can ultimately be used to create a database of such waves in the solar wind. A cursory analysis of 1 yr of solar wind data has yielded intriguing results, in particular that coherent waves are measured more frequently in fast solar wind and at higher temperatures. Whether this is due to selection effects such as Doppler shifting or a basic change in the generating mechanism of these waves remains to be seen.

Figure 14. The PDFs for (a) solar wind speed, (b) magnetic field strength, (c) proton temperature, and (d) proton density. The blue distribution corresponds to all intervals; the red distribution corresponds to coherent intervals. Dashed lines correspond to the median of each distribution. Wave intervals tend to occur more often in the fast solar wind and at higher temperatures and magnetic field strengths compared to the rest of the solar wind.

Clearly, one next step will be to apply the neural network to a much longer time span of solar wind data.

Of course, many interesting waves are complex, having multiple frequencies associated with a wave packet. We are hopeful that extension of the neural network framework from this study to allow diagnosis of complex waves will be straightforward for two reasons. First, even with a single bandpass filter, the neural network managed to identify a portion of the complex modes that were outside the scope of the implemented machine-learning problem. At the same time, the network had a low FPR for nonwave examples. This is a sign that the network is learning robustly and in a way that captures circularly polarized wave behavior in a fundamental way. Second, the method used in this paper can be generalized in a straightforward manner to complex waves by passing the time series through narrow bandpass filters.
The use of these additional bandpass filters effectively isolates more coherent wave modes that can then be detected by the existing machine-learning approach.

Ultimately, many waves in the solar wind are not circularly polarized. In principle, with a large training set of such waves, there is no obvious reason why a CNN trained to identify other wave polarizations would fail to find such waves. This approach can also be extended to other solar wind data sets. CNNs are highly adaptable, and starting with a primarily synthetic training set allows for iterative improvement of the network without an overwhelmingly expensive manual approach.

This research was supported by NASA grant Nos. 80NSSC22K1728 and 80NSSC20K0198. We thank Brian A. Thomas for helpful discussions concerning validation of neural networks. We acknowledge high-performance computing support from Cheyenne provided by NCAR's CISL, sponsored by the NSF. This research also used NERSC resources, a U.S. DOE Office of Science User Facility operated under contract No. DE-AC02-05CH11231.

Appendix A
Synthetic Data Parameters

To adequately train the neural net to find circularly polarized waves, it is necessary to augment the observational data with synthetic data. To that end, it is important that the synthetic data represent the important factors present in the observational data.

Synthetic data generation combines several magnetic field sources. All synthetic intervals have a noise background generated from a power law with a randomized phase. To this noise is added one of three possibilities: a circularly polarized wave, a linearly polarized wave, or no wave. The circularly and linearly polarized waves have either a double Gaussian envelope or a "broken envelope" (a double Gaussian interrupted by a discontinuous change in amplitude, phase, and wave direction). Finally, two additional signals may be added: a logistic step function and random high-frequency noise ("spiky noise"). Note that none, one, or both of these signals may be added. Several examples of synthetic data are shown in Figure 15. These examples contain a range of structures and include both positive and negative training examples.

Figure 15. Examples of different classes of synthetic data: (a) noise without additional features, (b) spiky noise, (c) circularly polarized wave mode, (d) circularly polarized wave mode with interruption, (e) linearly polarized wave with interruption, and (f) logistic step function.

The three main pieces of synthetic time-series generation (the noise background, the wave structure, and a step function) are implemented as follows.

(i) The noise background is first generated in frequency space, and then an inverse discrete Fourier transform (DFT) is applied. A power-law noise background is applied with a randomized slope given by

$$\tilde{B}_{n,\alpha}(f) = \begin{cases} B_0\,(|10 f|)^{-p}\,[\cos\theta_\alpha(f) + i\sin\theta_\alpha(f)], & |f| > 2/11 \\ 10^{-3}\ \mathrm{nT\,s}, & |f| = 2/11 \\ 0\ \mathrm{nT\,s}, & |f| = 0 \end{cases} \tag{A1}$$

for each $\alpha \in \{x, y, z\}$, where $\tilde{B}_{n,\alpha}$ is the DFT of the noise piece of the magnetic field, $B_0 \in [0.08, 60]$ nT Hz$^{p-1}$ is the amplitude of the lowest-frequency bin, $f \in [-5.5, 5.5]$ Hz is the range of frequency bins for the DFT, $p \in [-2, -1]$ is the randomized slope of the power law for a given generated example, and $\theta_\alpha(f) \in [0, 2\pi)$ rad is the randomized phase associated with each frequency; note that $\theta_\alpha(-f) = -\theta_\alpha(f)$ to ensure that the inverse DFT results in a real function. To avoid introducing circularly polarized wave modes by chance through choices of random phase, we check the MVA eigenvalue ratios for the noise and omit examples that satisfy the criteria for coherent waves.

A subset of noise examples has additional power added to several higher-frequency bins,

$$\tilde{B}_{s,\alpha}(f) = \begin{cases} \tilde{B}_{n,\alpha}(f), & |f| \notin [f_{s,1}, f_{s,2}] \\ \tilde{B}_{s,0} + \tilde{B}_{n,\alpha}(f), & |f| \in [f_{s,1}, f_{s,2}] \end{cases} \tag{A2}$$

for each $\alpha \in \{x, y, z\}$, where $\tilde{B}_{s,0} \in [0.1, 0.75]$ nT Hz$^{p-1}$ is the randomly determined amplitude of the added power, and $f_{s,1}, f_{s,2} \in [3.4, 5.5]$ Hz form the randomly determined frequency range with added power.

(ii) If appropriate to the class of synthetic training data, a wave structure is added to the noise: circularly polarized for positive wave examples or linearly polarized for a subset of nonwave examples. Circularly polarized wave modes are generated with

$$\mathbf{B}_{\rm circ}(t) = g_{\rm env}(t)\, R \begin{bmatrix} A\cos(2\pi f_w t - \phi_w) \\ A\sin(2\pi f_w t - \phi_w) \\ 0 \end{bmatrix}, \tag{A3}$$

where a 3 × 3 unitary rotation matrix $R$ is used to apply a random rotation to the vector in square brackets, $A \in [0.1, 2]$ nT is the randomly determined amplitude of the circularly polarized wave, $f_w \in [0.2, 5.5]$ Hz is the randomly determined frequency of the wave, $\phi_w \in [0, 2\pi)$ rad is the randomly determined phase of the wave, and $g_{\rm env}(t)$ is an envelope function, described shortly.

Linearly polarized wave modes $\mathbf{B}_{\rm lin}(t)$ are generated with

$$\mathbf{B}_{\rm lin}(t) = g_{\rm env}(t)\, R \begin{bmatrix} A\cos(2\pi f_w t - \phi_w) \\ 0 \\ 0 \end{bmatrix}, \tag{A4}$$

with the same definitions as in Equation (A3).

The time envelope of the wave $g_{\rm env}(t)$ takes different forms depending on the class of example,

$$g_{\rm env}(t) = g_{{\rm env},0}\left( e^{-\frac{(t-\mu_1)^2}{4\sigma_1^2}} + e^{-\frac{(t-\mu_2)^2}{4\sigma_2^2}} \right), \tag{A5}$$

where the constant $g_{{\rm env},0}$, an exponential function of $(\mu_1 - \mu_2)^2/4\sigma^2$, normalizes the function at $t = \mu_1$; $\mu_1, \mu_2 \in [0, 6]$ s are the randomly determined envelope means; and $\sigma_1, \sigma_2 \in [3, 9]$ s are the randomly determined envelope widths. For broken wave packets,

$$\mathbf{B}_{\rm circ}(t) = \begin{cases} \mathbf{B}_{\rm circ1}(t), & t \notin [t_{p1}, t_{p2}] \\ A_p\,\mathbf{B}_{\rm circ2}(t), & t \in [t_{p1}, t_{p2}], \end{cases} \tag{A6}$$

where $t_{p1}, t_{p2}$ define the time range of the wave packet break, $\mathbf{B}_{\rm circ1}$ and $\mathbf{B}_{\rm circ2}$ are waves defined with the same amplitude and frequency but different phase and direction, and $A_p \in [0, 1]$ gives the amplitude of the wave in the vicinity of the break.

(iii) In a subset of both wave and nonwave examples, a logistic step function is added to the signal. Step functions are generated with

$$\mathbf{B}_{\rm step}(t) = R \begin{bmatrix} \dfrac{A_{\rm step}}{1 + e^{-(t - t_{\rm step})/\sigma_{\rm step}}} \\ 0 \\ 0 \end{bmatrix}, \tag{A7}$$

where $A_{\rm step} \in [0.1, 3]$ nT is the step amplitude, $t_{\rm step} \in [0, 6]$ s is the center of the step function, $\sigma_{\rm step} \in [0.1, 0.3]$ s determines the width of the step function, and $R$ is a randomized unitary rotation matrix.

Appendix B
Network Parameters, Hyperparameters, and Training Plots

The training process itself is dependent on a set of hyperparameters that determine every aspect of training. The set of network parameters and hyperparameters was found by manually modifying each value and testing hundreds of such sets by hand. A learning rate of $8 \times 10^{-4}$ and a batch size of 10 are used for each network. A rectified linear unit activation function is used for each layer except the output layer. A sigmoid activation function is used for each output layer. The parameters associated with each branch are outlined in Figure 16. For a given N-branch network, each of the branches from 1 to N is included in the network and structured as outlined in Figure 4. The filter sizes $s_k$ take a range of odd values between 1 and 33, and the number of feature maps $m_k$ is either 4, 16, or 32, depending on the kernel size. A kernel size of 1 corresponds to a rescaling of the input magnetic field, while higher kernel sizes correspond to smoothing of the input magnetic field.

Loss and accuracy plots are shown in panels (a) and (b) of Figure 17 for the five-branch network trained with the second fold of the training set (labeled 1DCNN 5B, Fold 2 in Figure 5). The training and validation losses are similar and decrease with training epoch before the validation loss reaches a plateau, indicating that early stopping can be implemented to reduce training time without sacrificing performance.

Loss and accuracy plots are shown in panels (c) and (d) of Figure 17 for the highest-performing network in Figure 7 as functions of training epoch. Based on the results in panels (a) and (b) of Figure 17, early stopping was implemented and conditioned on the training loss with a patience of three training epochs. The patience was determined through trial and error.

Figure 16. Table of kernel sizes $s_k$ and number of filters $m_k$ for each convolutional branch used in the neural network. A given N-branch network incorporates all branches from 1 to N.

Figure 17. (a) Training/validation loss and (b) training/validation accuracy as functions of training epoch for the five-branch network trained with the second fold of the training set (labeled 1DCNN 5B, Fold 2 in Figure 5) and (c) training/validation loss and (d) training/validation accuracy as functions of training epoch for the highest-accuracy network in Figure 7. The training/validation losses in panel (a) are similar and smoothly decrease with training epoch before reaching a plateau in performance. Similarly, the training/validation accuracy in panel (b) smoothly increases before reaching a plateau. The plateaus in network performance indicate that the network was not overtrained, but further training beyond a certain point does not improve network performance. The random training trials represented in panels (c) and (d) incorporate early stopping. The training/validation losses in panel (c) are similar and smoothly decrease with training epoch, indicating that the network was not overtrained, and the training/validation accuracy in panel (d) smoothly increases, indicating that the network's classification capability improved with training time.
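The early-stopping rule described in Appendix B (stop once the monitored loss has failed to improve for a set number of epochs, here a patience of three) can be sketched as follows. This is a minimal illustration rather than the training code used in the study: the function name, the example loss values, and the choice to count any strict decrease as an improvement are assumptions.

```python
def early_stopping_epoch(losses, patience=3):
    """Return the 1-based epoch at which training would stop.

    Training stops once the monitored loss has gone `patience` consecutive
    epochs without dropping below its best value so far. If that never
    happens, the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            # New best value: reset the patience counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(losses)


# Hypothetical per-epoch training losses (illustrative values only).
example_losses = [0.70, 0.52, 0.41, 0.36, 0.37, 0.36, 0.38, 0.35, 0.34]
stop_epoch = early_stopping_epoch(example_losses, patience=3)
```

With these illustrative values the best loss is reached at epoch 4, the next three epochs bring no improvement, and training halts at epoch 7 even though later epochs would have improved slightly. This is the trade-off that the choice of patience controls.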
The training and validation losses are similar and smoothly decrease with training epoch, indicating that the network was not overtrained.

ORCID iDs

Samuel Fordin https://orcid.org/0000-0002-1634-9122
Michael Shay https://orcid.org/0000-0003-1861-4767
Lynn B. Wilson III https://orcid.org/0000-0002-4313-1970
Bennett Maruca https://orcid.org/0000-0002-2229-5618
Barbara J. Thompson https://orcid.org/0000-0001-6952-7343

References

Abdeljaber, O., Avci, O., Kiranyaz, S., Gabbouj, M., & Inman, D. J. 2017, JSV, 388, 154
Acharya, U. R., Oh, S. L., Hagiwara, Y., et al. 2017, Comput. Biol. Med., 89, 389
Acharya, U. R., Oh, S. L., Hagiwara, Y., Tan, J. H., & Adeli, H. 2018, Comput. Biol. Med., 100, 270
Bruno, R., & Carbone, V. 2013, LRSP, 10, 2
Cattell, C., Breneman, A., Dombeck, J., et al. 2022, ApJL, 924, L33
dos Santos, L. F. G., Narock, A., Nieves-Chinchilla, T., Nuñez, M., & Kirk, M. 2020, SoPh, 295, 131
Gu, J., Wang, Z., Kuen, J., et al. 2018, PatRe, 77, 354
Hasegawa, A. 1976, JGR, 81, 5083
Jagarlamudi, V. K., Dudok de Wit, T., Froment, C., et al. 2021, A&A, 650, A9
Jeong, H.-J., Moon, Y.-J., Park, E., & Lee, H. 2020, ApJ, 903, L25
Jian, L. K., Russell, C. T., Luhmann, J. G., et al. 2009, ApJ, 701, L105
Khrabrov, A. V., & Sonnerup, B. U. 1998, JGR, 103, 6641
Kiranyaz, S., Avci, O., Abdeljaber, O., et al. 2021, MSSP, 151, 107398
Lacombe, C., Alexandrova, O., Matteini, L., et al. 2014, ApJ, 796, 5
LeCun, Y., Bengio, Y., & Hinton, G. 2015, Natur, 521, 436
Lepping, R. P., Acuña, M. H., Burlaga, L. F., et al. 1995, SSRv, 71, 207
Marsch, E., & Chang, T. 1983, JGRA, 88, 6869
Mozer, F. S., Bonnell, J. W., Halekas, J. S., et al. 2021, ApJ, 908, 26
Pinto, V. A., Keesee, A. M., Coughlan, M., et al. 2022, FrASS, 9, 869740
Siciliano, F., Consolini, G., Tozzi, R., et al. 2021, SpWea, 19, e2020SW002589
Smith, A. W., Forsyth, C., Rae, I. J., et al. 2021, SpWea, 19, e2021SW002788
Stansby, D., Horbury, T. S., Chen, C. H. K., & Matteini, L. 2016, ApJ, 829, L16
Stone, M. 1974, J. R. Stat. Soc., B: Stat. Methodol., 36, 111
Telloni, D., Carbone, F., Bruno, R., et al. 2019, ApJ, 885, L5
Tong, Y., Vasko, I., Artemyev, A., Bale, S., & Mozer, F. 2019, ApJ, 878, 41
Unti, T. W. J., & Neugebauer, M. 1968, PhFl, 11, 563
Vech, D., & Malaspina, D. M. 2021, JGRA, 126, e29567
Verniero, J. L., Larson, D. E., Livi, R., et al. 2020, ApJS, 248, 5
Viall, N. M., & Borovsky, J. E. 2020, JGRA, 125, e26005
Wicks, R. T., Alexander, R. L., Stevens, M., et al. 2016, ApJ, 819, 6
Wilson, L., III, Koval, A., Szabo, A., et al. 2012, GeoRL, 39, L08109
Wilson, L., III, Koval, A., Szabo, A., et al. 2017, JGRA, 122, 9115
Wilson, L. B., III, Brosius, A. L., Gopalswamy, N., et al. 2021, RvGeo, 59, e2020RG000714
Wilson, L. B., III, Cattell, C. A., Kellogg, P. J., et al. 2009, JGRA, 114, A10106
Wilson, L. B., III, Cattell, C. A., Kellogg, P. J., et al. 2010, JGRA, 115, A12104
Wu, Q., Sun, Y., Yan, H., & Wu, X. 2020, Comput. Biol. Med., 121, 103800

Journal

The Astrophysical Journal, IOP Publishing

Published: Jun 1, 2023
