Antti Leino and Saara Hyvönen
- Published in print: 2009
- Published online: September 2012
- ISBN: 9780748640300
- eISBN: 9780748671380
- Item type: chapter
- Publisher: Edinburgh University Press
- DOI: 10.3366/edinburgh/9780748640300.003.0010
- Subject: Linguistics, Applied Linguistics and Pedagogy
Languages are traditionally subdivided into geographically distinct dialects, although any such division is just a coarse approximation of a more fine-grained variation. This underlying variation is usually visualised in the form of maps, where the distribution of various features is shown as isoglosses. Component models such as factor analysis can be used to analyse spatial distributions of a large number of different features — such as the isogloss data in a dialect atlas or the distributions of ethnological or archaeological phenomena — with the goal of finding dialects or similar cultural aggregates. However, there are several such methods, and it is not obvious how their differences affect their usability for computational dialectology. This chapter addresses this question by comparing five such methods (factor analysis, non-negative matrix factorisation, aspect Bernoulli, independent component analysis, and principal components analysis) with two data sets describing Finnish dialectal variation. There are some fundamental differences between these methods, and some of these have implications that affect the dialectological interpretation of the results.