UMAP Exploration In Petrology
This article explores the use of UMAP for visualizing geochemical data in petrology. It builds on the UMAP article, which does an excellent job of explaining dimension reduction, UMAP, t-SNE, and the differences between them. Here we add geological context to show how dimension reduction algorithms can be used and interpreted.
Here we use the Quartz-Alkali Feldspar-Plagioclase (QAP) diagram with randomized data points of varying composition to show how UMAP reduces the dimensions while preserving the relationships in the original dataset. With some parameter settings the result is a nearly identical mapping to the QAP diagram, showing that UMAP can preserve those relationships.
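As an illustration of what "randomized data points of varying composition" could look like, the sketch below draws random QAP compositions and converts them to the 2-D coordinates of a ternary plot. This is not the article's own code: the function names, the sampling method, and the corner layout (Q at the top, A bottom left, P bottom right) are assumptions made for the example.

```python
import math
import random


def random_qap(rng):
    """Return a random (Q, A, P) composition in percent that sums to 100."""
    # Normalising three exponential draws spreads points evenly over the triangle.
    raw = [rng.expovariate(1.0) for _ in range(3)]
    total = sum(raw)
    return tuple(100.0 * value / total for value in raw)


def ternary_to_xy(q, a, p):
    """Map a QAP composition to 2-D plot coordinates.

    Assumed layout: A corner at (0, 0), P corner at (1, 0),
    Q corner at (0.5, sqrt(3) / 2).
    """
    total = q + a + p
    q, a, p = q / total, a / total, p / total
    x = p + 0.5 * q
    y = (math.sqrt(3) / 2) * q
    return x, y


rng = random.Random(42)  # fixed seed so the example is repeatable
samples = [random_qap(rng) for _ in range(200)]
points = [ternary_to_xy(*sample) for sample in samples]
```

The three-component rows in `samples` are the kind of input a dimension reduction algorithm would receive, while `points` is the familiar ternary view it can be compared against.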
QAPF (left) and UMAP (right) are linked. Hovering highlights corresponding points. Controls for UMAP are integrated into a single panel.
Due to limitations in plotly.js we cannot show proper QAPF diagrams as dual-ternary plots, but we can show QAP. We therefore opted to show both without the dual-ternary layout so that they look visually the same.
To understand UMAP and other dimension reduction algorithms, we must first understand the relationships in the QAP diagram. From QAP we know that granite and tonalite samples do not share a common boundary, but both share boundaries with granodiorite. Increasing n_neighbors to higher values (e.g., 50+) results in UMAP preserving this relationship, while lower values (e.g., 5) do not, which also shows how parameter choices affect the results.
Therefore, granite points should plot next to granodiorite points and farther from tonalite points. UMAP projections aim to preserve these neighborhood relationships between points, but not the exact distances or geometry of the original n-dimensional space. Since our data are fairly low in complexity, with high n_neighbors and high min_dist UMAP almost recreates the original diagram visually, although rotated and squished. This is not always the case, but it demonstrates that the relative relationships can be preserved; how well they are preserved depends on the data and the parameters.
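The granite, granodiorite, and tonalite relationship can be checked numerically before any projection is made. The sketch below is a hypothetical illustration, not the article's code: it classifies random compositions using the commonly used Streckeisen limits on the plagioclase ratio P / (A + P) for rocks with 20 to 60 % quartz, then compares the distances between group centres. The `embed` helper assumes the third-party Python package umap-learn is installed and is only one possible way to produce a projection.

```python
import math
import random


def classify_qap(q, a, p):
    """Name the QAP field for quartz-rich plutonic rocks (Q between 20 and 60 %).

    Compositions outside that quartz range return None.
    """
    total = q + a + p
    q, a, p = 100 * q / total, 100 * a / total, 100 * p / total
    if not 20 <= q <= 60:
        return None
    ratio = 100 * p / (a + p)  # plagioclase share of total feldspar
    if ratio < 10:
        return "alkali feldspar granite"
    if ratio < 65:
        return "granite"
    if ratio < 90:
        return "granodiorite"
    return "tonalite"


def centroid(rows):
    """Mean composition of a group of samples."""
    return [sum(column) / len(rows) for column in zip(*rows)]


rng = random.Random(0)
groups = {"granite": [], "granodiorite": [], "tonalite": []}
for _ in range(2000):
    raw = [rng.expovariate(1.0) for _ in range(3)]
    q, a, p = (100 * value / sum(raw) for value in raw)
    name = classify_qap(q, a, p)
    if name in groups:
        groups[name].append((q, a, p))

centres = {name: centroid(rows) for name, rows in groups.items()}
d_granodiorite = math.dist(centres["granite"], centres["granodiorite"])
d_tonalite = math.dist(centres["granite"], centres["tonalite"])


def embed(rows, n_neighbors, min_dist):
    """Project rows to 2-D with umap-learn (assumed installed, imported lazily)."""
    import numpy as np
    import umap

    reducer = umap.UMAP(n_neighbors=n_neighbors, min_dist=min_dist, random_state=0)
    return reducer.fit_transform(np.asarray(rows, dtype=float))
```

In the input data the granite centre sits closer to the granodiorite centre than to the tonalite centre. Calling `embed` on the same rows with, for example, `n_neighbors=5` and then `n_neighbors=50` is one way to see whether that ordering survives the projection.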
UMAP with QAP and Intrusive/Extrusive
Here we explore UMAP with QAPF data and an added binary intrusive/extrusive label. With higher parameter values we can see that UMAP separates these two groups and preserves the QAPF relationships within each group. This shows that UMAP can preserve categorical relationships in addition to numerical ones.
Because this dataset has four QAPF dimensions and only one categorical dimension, the QAPF values carry more of the weight. With lower n_neighbors we can see that the single categorical dimension has less influence on the UMAP projection. This is best explained in Figure 3 of the original UMAP article.
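How much a single binary column matters also depends on how it is scaled relative to the composition columns. The article does not state its encoding, so the sketch below is an assumption: compositions are stored as fractions, the intrusive/extrusive label as 0 or 1, and `label_weight` is a hypothetical knob added purely to show the effect on Euclidean distance.

```python
import math


def make_features(q, a, p, f, intrusive, label_weight=1.0):
    """Combine a QAPF composition (percent) with a 0/1 intrusive flag."""
    total = q + a + p + f
    composition = [value / total for value in (q, a, p, f)]
    return composition + [label_weight * (1.0 if intrusive else 0.0)]


granite = make_features(30, 40, 30, 0, intrusive=True)
rhyolite = make_features(30, 40, 30, 0, intrusive=False)  # same composition, extrusive
tonalite = make_features(30, 2, 68, 0, intrusive=True)

label_gap = math.dist(granite, rhyolite)        # differs only in the label
composition_gap = math.dist(granite, tonalite)  # differs only in composition

# The same label difference with a smaller (hypothetical) weight on the flag.
weak_label_gap = math.dist(
    make_features(30, 40, 30, 0, True, label_weight=0.25),
    make_features(30, 40, 30, 0, False, label_weight=0.25),
)
```

With a weight of 1 the label difference is larger than the granite to tonalite difference, so neighbors are found mostly within the same intrusive or extrusive group. With a weight of 0.25 the composition dominates instead.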
UMAP with Metamorphic Mineral Assemblages
Here we explore UMAP using metamorphic mineral assemblage data. Metamorphic rocks are classified based on their mineral assemblages, which are influenced by the pressure and temperature conditions during metamorphism. We use the presence or absence of key index minerals (e.g., garnet, staurolite, kyanite, sillimanite) as categorical variables in UMAP.
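A presence/absence encoding of this kind can be sketched as follows. The sample names and assemblages are hypothetical, and the Jaccard distance is a swapped-in choice for comparing binary vectors rather than the article's stated metric (umap-learn, for instance, accepts `metric="jaccard"`).

```python
INDEX_MINERALS = ["garnet", "staurolite", "kyanite", "sillimanite"]


def encode(present):
    """Turn a set of observed minerals into a 0/1 vector."""
    return [1 if mineral in present else 0 for mineral in INDEX_MINERALS]


def jaccard_distance(u, v):
    """1 - (minerals in both samples) / (minerals in either sample)."""
    both = sum(1 for x, y in zip(u, v) if x and y)
    either = sum(1 for x, y in zip(u, v) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


# Hypothetical samples for illustration only.
garnet_schist = encode({"garnet"})
staurolite_schist = encode({"garnet", "staurolite"})
sillimanite_gneiss = encode({"garnet", "sillimanite"})

d_near = jaccard_distance(garnet_schist, staurolite_schist)
d_far = jaccard_distance(staurolite_schist, sillimanite_gneiss)
```

Samples that share more index minerals end up closer together, which is the property that lets assemblages with identical mineralogy collapse into tight clusters.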
With the mineral assemblage data, hovering over points shows that specific clusters occur based on the mineralogy. Where two clusters have two different colors (rock types), they represent the underlying continuous nature of metamorphic conditions. As this
UMAP with Metamorphic Percentage Mineral Assemblages
Here we explore UMAP using metamorphic mineral assemblage data with percentage compositions. Instead of just presence/absence of key index minerals, we use the percentage of each mineral in the rock as numerical variables in UMAP. This allows us to see how UMAP handles continuous compositional data and how it preserves relationships between samples with similar mineral percentages. With certain parameters we can observe the expected gradational relationships between mineral assemblages, reflecting metamorphic facies series.
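The gradational behaviour described above can be illustrated with a few made-up samples. The percentages below are hypothetical values chosen for the example, not data from the article. They show that, with continuous inputs, the distance from a low-grade sample grows steadily along the series instead of jumping between discrete categories.

```python
import math

MINERALS = ["garnet", "staurolite", "kyanite", "sillimanite"]

# Hypothetical modal percentages, ordered from lower to higher grade.
series = [
    {"garnet": 8, "staurolite": 2, "kyanite": 0, "sillimanite": 0},
    {"garnet": 6, "staurolite": 6, "kyanite": 3, "sillimanite": 0},
    {"garnet": 5, "staurolite": 2, "kyanite": 8, "sillimanite": 5},
    {"garnet": 4, "staurolite": 0, "kyanite": 2, "sillimanite": 14},
]


def to_vector(sample):
    """Order the percentages consistently so samples can be compared."""
    return [float(sample[mineral]) for mineral in MINERALS]


vectors = [to_vector(sample) for sample in series]

# Euclidean distance of every sample from the lowest-grade sample.
steps = [math.dist(vectors[0], vector) for vector in vectors]
```

Because `steps` increases along the series, a neighbor-based method such as UMAP would be expected to link each sample to the samples nearest it in grade, producing a gradient rather than isolated clusters.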
UMAP with Sedimentary Rocks and Dott (1964) Classification
Acknowledgements
Thanks to the original UMAP article authors for their excellent work and explanations. You can find the original code for this article on GitHub. You can view my website at jarrodburges.com.