Comparison, Search and Clustering of Spectra
Dr Michael Madden and collaborators in NUI Galway have previously completed an R&D project, Hazard-IQ, in which new machine learning methods for analysis of chemical spectra were developed and commercialised. That work has led to the development of innovative new methods for the analysis of mixtures of materials based on their chemical spectra.
Those methods were developed for classification (i.e. identification) and quantification (estimation of concentration), but in this project we are extending them to other tasks of relevance to analysis of spectral data:
- Comparison: given two spectra, evaluate on a numerical scale how similar they are to each other (e.g. 0 “difference” if completely identical, 1 if completely different: having no materials in common)
- Clustering: given a comparison function, the difference between all pairs of spectra in a database can be computed, and the results used to identify clusters; i.e. groups of entries that are similar to each other and different from the rest
- Search: given a comparison function and the spectrum of an unknown substance, a ranked list of the most similar substances in the database can be produced; furthermore, if clustering has been performed, the unknown substance can be positioned relative to the clusters.
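The three tasks above can be illustrated with a minimal sketch. It is not the project's method: it assumes spectra are non-negative intensity vectors sampled on a common axis, and it swaps in a generic cosine-based difference (0 for identical shapes, 1 for spectra with no overlapping features) and a simple threshold grouping. The function names, the threshold and the toy data are all hypothetical.

```python
import numpy as np

def spectral_difference(a, b):
    """Cosine-based difference: 0 for identical shapes, 1 for no shared features."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 1.0  # assumption: an empty spectrum shares nothing with any other
    return float(np.clip(1.0 - np.dot(a, b) / denom, 0.0, 1.0))

def pairwise_differences(spectra):
    """Difference between all pairs of spectra in a database."""
    n = len(spectra)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = spectral_difference(spectra[i], spectra[j])
    return d

def threshold_clusters(d, threshold):
    """Group entries linked by a chain of differences below the threshold."""
    n = d.shape[0]
    labels = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if d[i, j] < threshold:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels

def search(query, spectra, names, top_k=3):
    """Ranked list of the most similar database entries (smallest difference first)."""
    scored = [(spectral_difference(query, s), name) for s, name in zip(spectra, names)]
    return sorted(scored)[:top_k]

# Hypothetical toy database: two pure materials, their mixture, and an unrelated one.
names = ["material_a", "material_b", "mix_a_b", "material_c"]
spectra = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1]]

differences = pairwise_differences(spectra)
labels = threshold_clusters(differences, threshold=0.5)
ranking = search([2, 0, 0, 0], spectra, names, top_k=2)
print(labels)
print(ranking)
```

In this toy data the mixture links the two pure materials into one group, which hints at why a generic comparison function can be confounded by mixtures.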
Although standard methods for these tasks exist, we believe that by re-tooling the classification methods we have previously developed, we can achieve better performance. The reason is that comparisons fundamentally depend on the application: from the point of view of a car mechanic, a Ford Mondeo operating as a taxi cab or as a Garda car are effectively the same, whereas from the point of view of a person trying to get home from the pub, they are very different!
This project is led by Dr Michael Madden and is funded by Enterprise Ireland's Commercialisation Fund - Proof of Concept Programme, 2008-2009.
From the end-user’s perspective, our solution will provide the following benefits not currently available:
- Handling of mixtures: Standard methods are easily confounded by mixtures, the spectra of which contain features from the underlying components. Because our methods are specifically designed to work with mixtures, they will handle them more accurately.
- More accurate search: Standard search algorithms give equal weight to all parts of a spectrum, whereas our techniques give greater weight to those features that have greater discriminating power.
- Data visualisation and characterisation: Clustering techniques provide a natural way of characterising and visualising the distribution of cases in a database. However, good clustering requires a good comparison function.
- Noise handling: Noise (small random variations in a signal) can impede accurate comparison and search, particularly in high-dimensional data such as chemical spectra. We have developed a range of noise-reduction approaches that can be applied to this task.
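To make the ideas of feature weighting and noise reduction concrete, here is a small sketch under stated assumptions. It does not reproduce the techniques developed in this project: it uses a plain moving average as a stand-in noise filter, and it uses per-channel variance across the database as a hypothetical proxy for discriminating power. All function names are illustrative.

```python
import numpy as np

def moving_average(spectrum, window=5):
    """Smooth a spectrum with a moving average (a generic noise-reduction step).

    With mode="same" the input is zero-padded, so values near the two ends
    are pulled towards zero.
    """
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(spectrum, dtype=float), kernel, mode="same")

def variance_weights(spectra):
    """Hypothetical weighting: channels that vary more across the database
    receive more weight; weights are normalised to sum to 1."""
    var = np.var(np.asarray(spectra, dtype=float), axis=0)
    total = var.sum()
    if total == 0.0:
        return np.full(var.shape, 1.0 / var.size)  # fall back to equal weights
    return var / total

def weighted_distance(a, b, weights):
    """Euclidean distance in which each channel is scaled by its weight."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(weights * diff ** 2)))

# Toy database: only the middle channel differs between entries.
database = [[1, 0, 5], [1, 2, 5]]
weights = variance_weights(database)
print(weights)
print(weighted_distance([1, 0, 5], [9, 2, 7], weights))
print(moving_average([0, 0, 3, 0, 0], window=3))
```

With equal weights, the large gap in the first channel would dominate the distance; with these weights, only the channel that actually distinguishes database entries contributes.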