Description
Correspondence analysis (CA) has a special place in data science: it analyses data measured on categorical scales and produces visualizations that facilitate the interpretation and understanding of multivariate categorical data. CA is primarily a method of unsupervised learning; that is, it is designed to identify structures that are latent in the data, for example dimensions that identify the greatest differences among the observations, as well as their similarities and groupings. Inherent in CA is a measure of distance that quantifies the proximities between observations based on categorical data, and that can also lead to the formal identification of clusters. The method has found extensive applications in sociology, linguistics, archaeology and genetics. This one-day workshop will focus on its applicability and usefulness in sensometrics.
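To make the idea of a distance built into CA concrete, here is a minimal sketch of the chi-square distance between row profiles, the distance usually associated with CA. It is written in plain Python for illustration only (the workshop itself uses R). The function name `chi_square_distances` and the small product-by-attribute table are invented for this example and are not workshop material.

```python
from math import sqrt


def chi_square_distances(table):
    """Chi-square distances between the row profiles of a contingency table.

    `table` is a list of rows of non-negative counts; every row and every
    column is assumed to have a positive total.
    """
    n_cols = len(table[0])
    grand_total = sum(sum(row) for row in table)

    # Column masses: each column's share of the grand total.
    col_masses = [sum(row[j] for row in table) / grand_total for j in range(n_cols)]

    # Row profiles: each row divided by its own total.
    profiles = [[x / sum(row) for x in row] for row in table]

    # Euclidean distance between profiles, each squared difference
    # weighted by the inverse of the corresponding column mass.
    def dist(p, q):
        return sqrt(sum((p[j] - q[j]) ** 2 / col_masses[j] for j in range(n_cols)))

    return [[dist(p, q) for q in profiles] for p in profiles]


# Hypothetical counts: 3 products (rows) by 3 sensory attributes (columns).
counts = [
    [20, 10, 10],
    [10, 20, 10],
    [40, 20, 20],  # same profile as the first product, twice the total
]
distances = chi_square_distances(counts)
```

In this toy table the first and third products have proportional counts, so their profiles coincide and the distance between them is zero, whereas the second product lies at a positive distance from both. The distance compares relative frequencies, not raw totals.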
In the first part of the course, before lunch, I will explain the basic ideas of correspondence analysis, including measures of distance and the interpretation of the biplot, both at the heart of the data visualization. Some simple applications in sensometrics will be presented as well as the implementation of the method using R software.
After lunch, I will explain how CA extends to more complex data, the most important extension being multiple correspondence analysis (MCA), which treats several categorical variables simultaneously. The case of multi-way data in the CA/MCA context will also be considered, for example multi-block or multi-occasion data. Some more challenging sensometric data sets will be analysed to demonstrate the full versatility of the approach. Finally, the role of CA/MCA in supervised learning will be discussed, where a specific response variable is modelled in terms of categorical predictors.
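As a small illustration of how several categorical variables can be treated simultaneously, MCA is commonly described as CA applied to an indicator (dummy-coded) matrix with one column per category. The sketch below shows only that coding step, in plain Python. The function name `indicator_matrix` and the three assessor records are invented for this example and are not taken from the workshop data.

```python
def indicator_matrix(records):
    """Dummy-code categorical records into a 0/1 indicator matrix.

    `records` is a list of dicts that share the same keys (the variables).
    Returns (columns, rows): `columns` lists the (variable, category) pairs,
    and each row holds a 1 in the column of the category that was observed.
    """
    variables = list(records[0].keys())
    columns = [
        (var, cat)
        for var in variables
        for cat in sorted({rec[var] for rec in records})
    ]
    rows = [
        [1 if rec[var] == cat else 0 for (var, cat) in columns]
        for rec in records
    ]
    return columns, rows


# Hypothetical responses from three assessors on two categorical variables.
responses = [
    {"sweetness": "low", "texture": "crisp"},
    {"sweetness": "high", "texture": "soft"},
    {"sweetness": "low", "texture": "soft"},
]
columns, rows = indicator_matrix(responses)
```

Every row of the indicator matrix sums to the number of variables (two here), because each assessor falls into exactly one category per variable. Applying CA to such a matrix is one standard route to MCA; the analysis step itself is not shown.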