A Statistical Comparison of Line Strength Variations in Coma and Cluster Galaxies at z ~ 0.3

Lewis A. Jones, Warrick J. Couch, PASA, 15 (3), 309

Next Section: Discussion
Title/Abstract Page: A Statistical Comparison of
Previous Section: Analysis
Contents Page: Volume 15, Number 3

Results

Figure 2: This figure demonstrates the main result. It shows the distribution in line strength variations contained in each principal component for the clean Coma, noisy Coma, and high redshift data. The empty squares represent the clean Coma data, filled squares represent the noisy Coma data, and filled circles represent the high redshift sample. The dot-dashed line shows the eigenvalues for a sample of 100% uncorrelated variances. Taking the shallow trend of the Coma eigenvalues as the benchmark of components containing nearly no spectral information, it follows that, at most, two components in the Coma data are required to describe most of the line strength variation there, while the high redshift data requires four components to explain the line strength variations in that data set.

In this section we briefly address two questions: first, how many parameters are required to adequately describe the galaxy spectra, and second, what physical phenomena are associated with the new axes. Because PCA compresses the data into a small number of components, it is an ideal method for investigating the first question. Although a physical description of the new axes will often not be immediately apparent, the number of principal components that contain most of the information in the spectra tells us how many parameters are needed to fully describe the data set, in this case the galaxy line strengths.
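As a minimal sketch of the eigenvalue analysis, the eigenvalues can be expressed as percentages of the total sample variance and sorted, which is the quantity plotted in Figure 2. The assumptions are ours, not the authors' code: Python with NumPy, a PCA performed on the correlation matrix of the seven indices (equivalent to standardizing each index), and a synthetic, hypothetical sample with two hidden factors standing in for real line strengths.

```python
import numpy as np


def eigenvalue_spectrum(indices: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """PCA of line-strength indices via their correlation matrix.

    indices: array of shape (n_galaxies, n_indices).
    Returns (percent_variance, eigenvectors), both ordered by decreasing
    eigenvalue; eigenvectors are stored as columns.
    """
    # Using the correlation matrix is equivalent to standardizing each index,
    # so indices measured on different scales carry equal weight.
    corr = np.corrcoef(indices, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    evals, evecs = np.linalg.eigh(corr)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Each eigenvalue as a percentage of the total sample variance.
    percent = 100.0 * evals / evals.sum()
    return percent, evecs


# Hypothetical sample: 200 galaxies x 7 indices driven by two hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
loadings = np.array([[0.9, 0.8, 0.7, 0.1, 0.0, 0.0, 0.0],
                     [0.0, 0.1, 0.2, 0.9, 0.8, 0.7, 0.6]])
data = factors @ loadings + 0.3 * rng.normal(size=(200, 7))

percent, vecs = eigenvalue_spectrum(data)
print(np.round(percent, 1))
```

With two underlying factors, the first two components should hold most of the variance and the remaining five only a small, slowly declining share.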

Figure 2 shows the eigenvalues for the three samples: empty squares for the clean Coma data, filled squares for the noisy Coma data, and filled circles for the high redshift data. Because the components (eigenvectors) behind these eigenvalues differ from one data set to the next, the properties of the components themselves cannot be compared directly; for the moment, however, we are interested only in the number of components required to describe each data set, and analysing the data sets individually tells us that. Since each eigenvalue gives the amount of the total sample variance accounted for by its principal component, the diagram displays the distribution of variance across the new axes for each of the three samples. One feature that is clear at first glance is the difference in shape between the Coma and high redshift distributions. The first component in the Coma data contains 3 to 4 times the variance of the second (for the clean and noisy samples, respectively), while the first high redshift component contains less than 1.5 times the variance of the second. Then, while the Coma eigenvalues flatten out, the high redshift eigenvalues continue on the same, more moderate, trend as the first two until, at the fifth component, they join the already shallow trend of the Coma data.

Although the trends for the clean and noisy Coma samples are strikingly similar, which suggests that the differences between the high redshift and Coma line strengths are not due to the higher noise of the distant cluster data, the differences between the two Coma samples are also instructive. What is immediately apparent is that the variance in the first two components of the noisy data has decreased, while the remaining components have picked up some of the overall sample variance. If the Coma and high redshift samples differed only in their noise characteristics, adding noise to the Coma data should make the trend shallower for the first two components and steeper for the later ones, and hence more like the high redshift distribution. Instead, the trend between the first two components becomes even steeper in the noisy data. In adding the noise, we have added uncorrelated information to the spectra at all wavelengths. This necessarily masks some of the stronger correlations in the data, removing power from the components associated with those correlations and adding power to the components that already represent mostly uncorrelated noise. The response to added noise therefore provides a way to distinguish components that carry strong correlations from those that contain mostly noise. Using this criterion, Figure 2 tells us that the first two components of the Coma data contain most of the information and the remaining five are mostly noise.
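The noise-injection argument can be illustrated with a toy experiment. In this sketch (Python/NumPy; hypothetical data, with loadings and noise levels chosen arbitrarily rather than taken from the paper), seven indices are driven by two factors, and uncorrelated noise is then added to every index. The leading two components lose part of their share of the variance while the trailing components gain it, which is the behaviour used above to separate information-bearing components from noise.

```python
import numpy as np


def percent_variance(data: np.ndarray) -> np.ndarray:
    """Correlation-matrix eigenvalues as % of total variance, largest first."""
    evals = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    return 100.0 * evals / evals.sum()


rng = np.random.default_rng(1)
n_gal = 300
# Hypothetical "clean" sample: two factors drive seven indices.
factors = rng.normal(size=(n_gal, 2))
loadings = np.array([[0.9, 0.8, 0.7, 0.6, 0.0, 0.1, 0.0],
                     [0.0, 0.2, 0.1, 0.3, 0.9, 0.8, 0.7]])
clean = factors @ loadings + 0.2 * rng.normal(size=(n_gal, 7))

# Degrade it with extra uncorrelated noise in every index, in the spirit of
# the "noisy Coma" sample (the noise level here is arbitrary).
noisy = clean + 0.5 * rng.normal(size=clean.shape)

pv_clean = percent_variance(clean)
pv_noisy = percent_variance(noisy)
print("clean:", np.round(pv_clean, 1))
print("noisy:", np.round(pv_noisy, 1))
print("share of first two:", pv_clean[:2].sum().round(1), "->", pv_noisy[:2].sum().round(1))
```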

While we have not degraded the high redshift data and therefore do not have the same criterion available, the dot-dashed line, which shows the eigenvalues for a sample of 100% uncorrelated variances, can act as a different type of benchmark. Because the five trailing Coma components move almost uniformly toward this line when noise is added, we take their trend to define the behaviour of components dominated by the uncorrelated contributions to the sample variance. The first four high redshift components do not lie on that trend, so they are not uncorrelated noise. Taking the shape of the variance distributions together with the response of the components to added noise, we can now answer the first question: the three rich clusters at z ~ 0.3 discussed in this paper require more parameters to describe their spectral line strength variations than the nearby Coma cluster. Specifically, there are four independent parameters present in the high redshift data, compared with only two in the Coma data.
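One common way to build a benchmark for "100% uncorrelated variances" is to repeat the PCA on simulated data of the same size with no correlations at all, average the resulting eigenvalues, and count the leading components that exceed this benchmark; this is Horn's parallel analysis. It is a swapped-in technique for illustration only: the paper's own criterion is the response of the components to added noise, and the simulation recipe below is our assumption, not necessarily how the dot-dashed line in Figure 2 was computed. The data are hypothetical (Python/NumPy).

```python
import numpy as np


def percent_variance(data: np.ndarray) -> np.ndarray:
    """Correlation-matrix eigenvalues as % of total variance, largest first."""
    evals = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    return 100.0 * evals / evals.sum()


def uncorrelated_benchmark(n_gal: int, n_idx: int, n_sims: int = 200,
                           seed: int = 0) -> np.ndarray:
    """Mean % variance per component for samples with no correlations."""
    rng = np.random.default_rng(seed)
    sims = [percent_variance(rng.normal(size=(n_gal, n_idx)))
            for _ in range(n_sims)]
    return np.mean(sims, axis=0)


def n_components_above(observed: np.ndarray, benchmark: np.ndarray) -> int:
    """Number of leading components whose % variance exceeds the benchmark."""
    above = observed > benchmark
    if above.all():
        return len(above)
    # argmin on a boolean array gives the first False, i.e. the first
    # component that drops to or below the benchmark.
    return int(np.argmin(above))


# Hypothetical sample: two groups of indices, each driven by its own factor.
rng = np.random.default_rng(2)
n_gal = 200
factors = rng.normal(size=(n_gal, 2))
loadings = np.array([[1, 1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 0, 1, 1, 1]], dtype=float)
data = factors @ loadings + 0.5 * rng.normal(size=(n_gal, 7))

observed = percent_variance(data)
benchmark = uncorrelated_benchmark(n_gal, 7)
k = n_components_above(observed, benchmark)
print("observed :", np.round(observed, 1))
print("benchmark:", np.round(benchmark, 1))
print("components above benchmark:", k)
```

Note that the simulated benchmark is not flat: with a finite number of galaxies, even uncorrelated data produce a gently declining eigenvalue sequence.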

   

Index           PC1      PC2      PC3      PC4      PC5      PC6      PC7
CN          -0.3739  -0.4144   0.4610  -0.1038  -0.3148   0.3440  -0.4978
CH          -0.4629  -0.2203  -0.4818  -0.0796   0.0652   0.5490   0.4395
Ca          -0.3750   0.4977  -0.1162   0.3392   0.4646   0.1671  -0.4892
Fe          -0.0215  -0.6897   0.0263   0.2830   0.6012  -0.2850  -0.0216
Hδ           0.3179  -0.1988  -0.4740   0.5995  -0.4258   0.1316  -0.2770
Hγ           0.5059   0.0485   0.4076   0.2087   0.2724   0.6494   0.1900
C₂          -0.3845   0.1207   0.3879   0.6203  -0.2528  -0.1792   0.4543
% Variance     30.3     21.7     16.6     13.8      6.5      5.8      5.3
F test          <1%      <1%     2.5%      15%     100%     100%     100%
Table 3: Eigenvectors for the Coma Plus High-z Cluster Sample
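Because Table 3 lists all seven eigenvectors, a few of its properties can be checked directly. The sketch below (Python/NumPy; the transcription of the table and the ASCII labels Hdelta, Hgamma and C2 are ours) verifies that the eigenvectors are orthonormal to within the four-decimal rounding of the table, accumulates the percentage variances, and finds the index with the largest coefficient in each of the first two components.

```python
import numpy as np

INDICES = ["CN", "CH", "Ca", "Fe", "Hdelta", "Hgamma", "C2"]
# Rows: indices; columns: PC1..PC7 (values transcribed from Table 3).
V = np.array([
    [-0.3739, -0.4144,  0.4610, -0.1038, -0.3148,  0.3440, -0.4978],
    [-0.4629, -0.2203, -0.4818, -0.0796,  0.0652,  0.5490,  0.4395],
    [-0.3750,  0.4977, -0.1162,  0.3392,  0.4646,  0.1671, -0.4892],
    [-0.0215, -0.6897,  0.0263,  0.2830,  0.6012, -0.2850, -0.0216],
    [ 0.3179, -0.1988, -0.4740,  0.5995, -0.4258,  0.1316, -0.2770],
    [ 0.5059,  0.0485,  0.4076,  0.2087,  0.2724,  0.6494,  0.1900],
    [-0.3845,  0.1207,  0.3879,  0.6203, -0.2528, -0.1792,  0.4543],
])
PERCENT_VAR = np.array([30.3, 21.7, 16.6, 13.8, 6.5, 5.8, 5.3])

# Columns should be orthonormal, up to the 4-decimal rounding in the table.
orthonormality_error = np.abs(V.T @ V - np.eye(7)).max()

# Running total of the variance explained by the first k components.
cumulative = np.cumsum(PERCENT_VAR)

# Index with the largest |coefficient| in each of the first two components.
dominant = [INDICES[i] for i in np.abs(V[:, :2]).argmax(axis=0)]

print(f"max |V^T V - I| = {orthonormality_error:.1e}")
print("cumulative % variance:", cumulative)
print("dominant index in PC1, PC2:", dominant)
```

With the values as printed, the first two components together account for 52.0% of the variance, and the largest PC1 and PC2 coefficients belong to Hγ and Fe respectively, consistent with the discussion of the first two components in the text.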

   

                PC1 (%)                 PC2 (%)
Index     High-z    Coma      KS  High-z    Coma      KS
CN             9       1       7       7      <1      <1
CH            <1      <1      <1       9      44      <1
Ca            <1       8      <1      <1      15      <1
Fe            31       3      <1      <1      <1      <1
Hδ             2      87       1      20      63      <1
Hγ            <1      <1      <1      44      49      <1
C₂             2      <1       1      16      25      <1
Table 4: Correlations and KS Probabilities for the Combined Sample

To address the physical significance of the principal components, we turn to the second PCA run, on the combined cluster data set described in Section 3.2. This allows us to compare the distant clusters and Coma using the same set of basis vectors, the common principal components. Table 3 shows the eigenvectors for the combined run, the eigenvalues (shown as % variance), and the F test probability that the eigenvector in question could be fully explained by measurement uncertainties; a low probability therefore means high significance. The eigenvector elements in Table 3 are correlation coefficients and range from +1 for a complete positive correlation of galaxy weights with index value, through 0 for no correlation, to -1 for a complete anticorrelation; the strength of the correlation reflects how far variations in that index are responsible for the variations in the component. The F test probabilities in Table 3 show that only the first two principal components are significant at better than the 99% level. The increasing noise in the remaining components would weaken any conclusions drawn about their relationship to physical quantities, so we confine our discussion to the properties of the first two components.
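For a PCA of the correlation matrix there is a simple relation between the eigenvector elements and the correlations of weight with index value: the correlation between the weights on component j and standardized index i equals the eigenvector element v_ij multiplied by the square root of the eigenvalue λ_j. Within a single component the eigenvector elements are therefore proportional to these correlations, so their signs and relative sizes can be read as described above. The sketch below checks this numerically on a hypothetical sample (Python/NumPy; synthetic data).

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical sample: 150 galaxies x 7 indices sharing three hidden factors.
latent = rng.normal(size=(150, 3))
data = latent @ rng.normal(size=(3, 7)) + 0.5 * rng.normal(size=(150, 7))

# Standardize each index to zero mean and unit (sample) variance.
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
evals, evecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
evals, evecs = evals[::-1], evecs[:, ::-1]  # largest eigenvalue first

weights = z @ evecs  # weight of every galaxy on every component

# Correlation of the PC1 weights with each standardized index ...
r_pc1 = np.array([np.corrcoef(weights[:, 0], z[:, i])[0, 1] for i in range(7)])
# ... equals the PC1 eigenvector scaled by the square root of its eigenvalue.
predicted = evecs[:, 0] * np.sqrt(evals[0])
print(np.round(r_pc1, 3))
print(np.round(predicted, 3))
```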

Each principal component is a linear combination of the original seven spectral indices, with coefficients given by the eigenvector elements (the correlation coefficients). The index values for any galaxy can then be reconstructed exactly by multiplying each of the seven eigenvectors by an appropriate weight and summing the results; a weight therefore quantifies the importance of a given eigenvector in describing a particular galaxy spectrum. Comparing the distribution of weights in the combined sample with the index values themselves can tell us about the physical meaning of the principal components.
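The reconstruction of index values from weights can be sketched as follows, assuming standardized indices and eigenvectors stored column-wise (Python/NumPy; the data are hypothetical). The weights are the projections of each galaxy onto the eigenvectors; summing weight times eigenvector over all seven components recovers each galaxy exactly, while keeping only the first two components leaves a residual equal to the variance carried by the remaining five.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical index values for 50 galaxies x 7 indices, with some correlation.
data = rng.normal(size=(50, 7)) @ rng.normal(size=(7, 7))
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)  # standardized

_, evecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
evecs = evecs[:, ::-1]  # columns are eigenvectors, PC1 first

weights = z @ evecs  # one weight per galaxy per component

# Exact reconstruction: sum over all seven components of weight x eigenvector.
z_full = weights @ evecs.T
# Truncated reconstruction from the first two components only.
z_two = weights[:, :2] @ evecs[:, :2].T
residual = np.sum((z - z_two) ** 2)
print("max error, all 7 components:", np.abs(z - z_full).max())
print("squared residual, first 2 components:", residual)
```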

To that end, using the weights generated with the common eigenvectors, we have computed the correlations of weight versus index value for the two samples individually, as well as the two-dimensional KS test probabilities for the weight versus index value distributions of each sample. Table 4 shows those results. The correlation probabilities are given as the percentage chance that the measured correlations could arise from a random distribution, and the KS probabilities as the percentage chance that the two distributions (weight versus index value for the Coma and High-z samples) could be drawn from the same parent population. For the correlation probabilities, then, a low value indicates a highly significant correlation, and for the KS probabilities, a low value indicates that the distributions are different. One simple result we can draw from the KS probabilities in Table 4 is that the two samples are overwhelmingly different in the relationship between their line strengths and both of the first two principal components. Hence, Coma and the high redshift clusters are spectroscopically different.
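The two kinds of probability can be illustrated with simple permutation tests. This is an assumption for illustration and not necessarily how the probabilities in Table 4 were computed; analytic significance formulas may well have been used instead. In this sketch (Python/NumPy, hypothetical data), the correlation probability is the percentage of random shufflings that produce a correlation at least as strong as the observed one, and the 2D KS probability uses a quadrant-based two-dimensional KS statistic, taking every data point as a quadrant origin, with a permutation estimate of the chance that two (weight, index value) samples share a parent population.

```python
import numpy as np


def corr_probability(x, y, n_perm=999, seed=0):
    """Percent chance that a correlation at least this strong arises by chance.

    Permutation test: shuffling y destroys any real pairing with x.
    """
    rng = np.random.default_rng(seed)
    r_obs = abs(np.corrcoef(x, y)[0, 1])
    r_perm = np.array([abs(np.corrcoef(x, rng.permutation(y))[0, 1])
                       for _ in range(n_perm)])
    return 100.0 * (np.sum(r_perm >= r_obs) + 1) / (n_perm + 1)


def quadrant_fractions(pts, x0, y0):
    """Fraction of points in each of the four quadrants around (x0, y0)."""
    right = pts[:, 0] > x0
    upper = pts[:, 1] > y0
    return np.array([np.mean(right & upper), np.mean(~right & upper),
                     np.mean(~right & ~upper), np.mean(right & ~upper)])


def ks2d_statistic(a, b):
    """Largest quadrant-fraction difference, using every point as an origin."""
    d = 0.0
    for x0, y0 in np.vstack([a, b]):
        diff = quadrant_fractions(a, x0, y0) - quadrant_fractions(b, x0, y0)
        d = max(d, float(np.abs(diff).max()))
    return d


def ks2d_probability(a, b, n_perm=199, seed=0):
    """Percent chance that samples a and b share a parent population."""
    rng = np.random.default_rng(seed)
    d_obs = ks2d_statistic(a, b)
    pooled = np.vstack([a, b])
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # shuffles rows (galaxies)
        if ks2d_statistic(shuffled[:len(a)], shuffled[len(a):]) >= d_obs:
            hits += 1
    return 100.0 * (hits + 1) / (n_perm + 1)


# Hypothetical (PC weight, index value) pairs for two cluster samples.
rng = np.random.default_rng(5)
w_coma = rng.normal(size=60)
idx_coma = 0.8 * w_coma + 0.6 * rng.normal(size=60)
w_highz = rng.normal(size=50) + 1.5
idx_highz = rng.normal(size=50)

p_corr = corr_probability(w_coma, idx_coma)
p_ks = ks2d_probability(np.column_stack([w_coma, idx_coma]),
                        np.column_stack([w_highz, idx_highz]))
print(f"correlation probability: {p_corr:.1f}%")
print(f"2D KS probability:       {p_ks:.1f}%")
```

As in Table 4, a low correlation probability marks a significant correlation, and a low KS probability marks two samples that differ.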

Some of the ways in which they differ can be seen in the correlations of index value versus PC weight. For the first component, each index is well correlated in at least one of the cluster samples, and from Table 3 we see that the strongest correlation is with Hγ. However, this strong correlation may arise from the particular metallicity sensitivity of this definition of the index (see Section 3.1), which is related to its proximity to the G-band. For this reason, it is prudent to regard Hγ as more of a pseudo-metal line than a Balmer line. Regardless of the origin of the correlation of the first component with Hγ, however, we also have Hδ, which is freer from metal contamination, so we rely on that Balmer line as an indication of the general behaviour of Balmer lines.

The differences in the response of each sample to the first component are interesting. From Table 4, we see that in Coma, Hδ does not contribute to the first component (its correlation is weak), which makes the first component a general indicator of metal line strengths. For the high redshift clusters, Hδ does contribute while Fe does not, so Hδ and the light elements are important for the first component in the high redshift clusters. In the second component, the Balmer lines show only weak correlations, while both samples are strongly correlated with Fe, showing that in both environments there is at least one physical factor, unrelated to the Balmer line strengths, that drives the heavy element line strengths.

The spectroscopic differences between Coma and the high redshift sample are real, and their interpretation will provide insight into galaxy evolution in rich cluster environments. We briefly address the question of interpretation below.



© Copyright Astronomical Society of Australia 1997