Guided by Guru Gemma2: Exploring Tibetan's Language Relatives

Great work! Very nice to see this!

As for the accuracy of the resulting clusters, I think it all depends on the level of detail you are considering and the type of criteria taken into account for the comparison.

I am most surprised to find Armenian, a language that is not related to Tibetan. Linguistic reconstructions hypothesize that the Tibetan plateau was first populated by tribes traveling towards China, some of whom settled Tibet by climbing up from the Chengdu side, while the rest continued into China. So languages such as Kazakh, Tamasheq, and Tamazight seem to correspond, very roughly, to that hypothesis.

From the point of view of languages that are actually related, I only see Dzongkha, then Burmese. Most of the others may be geographically close to Tibet, but they are nonetheless far from the Tibetan language.

Then, the other way of comparing languages is to compare individual traits: for example, clustering all languages that have singular/plural agreement on verbs versus those that don’t, or all languages that conjugate verbs versus those that have no conjugation, or grouping together all languages that build sentences in the order “Subject Object Verb”.
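To make the trait-based comparison concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the language names (`lang_a` to `lang_e`) are hypothetical placeholders, the yes/no values are made up rather than real linguistic data, and the helper names are my own. It only shows the mechanics of grouping by one trait and ranking by the share of matching traits.

```python
from collections import defaultdict

# Toy feature table: hypothetical languages with made-up yes/no values.
# This is NOT real linguistic data; it only illustrates the mechanics.
FEATURES = {
    "lang_a": {"verb_number_agreement": False, "verb_conjugation": True, "sov_order": True},
    "lang_b": {"verb_number_agreement": False, "verb_conjugation": True, "sov_order": True},
    "lang_c": {"verb_number_agreement": True, "verb_conjugation": True, "sov_order": True},
    "lang_d": {"verb_number_agreement": True, "verb_conjugation": True, "sov_order": False},
    "lang_e": {"verb_number_agreement": False, "verb_conjugation": False, "sov_order": False},
}


def group_by_trait(features, trait):
    """Split languages into groups according to their value for a single trait."""
    groups = defaultdict(list)
    for lang, traits in features.items():
        groups[traits[trait]].append(lang)
    return dict(groups)


def shared_trait_ratio(features, a, b):
    """Fraction of traits on which languages a and b have the same value."""
    traits = features[a].keys()
    same = sum(features[a][t] == features[b][t] for t in traits)
    return same / len(traits)


def rank_by_similarity(features, target):
    """Order all other languages from most to least similar to the target."""
    others = [lang for lang in features if lang != target]
    return sorted(
        others,
        key=lambda lang: shared_trait_ratio(features, target, lang),
        reverse=True,
    )


if __name__ == "__main__":
    print(group_by_trait(FEATURES, "sov_order"))
    print(rank_by_similarity(FEATURES, "lang_a"))
```

Note that the ranking depends entirely on which traits go into the table and on the fact that each one is weighted equally here, which is exactly why the choice of criteria matters so much for the clusters you get.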

It looks to me like that is what we are seeing here: the clustered languages must share some traits with the Tibetan language. As for which traits, and how relevant they are for comparing languages, I have no idea. I would expect to find Nepali in such a comparison, but for some reason it does not appear.

The Tibetan language has integrated some features of Sanskrit through the texts that were imported from India and translated, which might explain why there are many Indian languages in your clusters.

Finally, on a side note, comparing translations of these 200 languages (instead of texts originally written in each language) introduces a HUGE bias and a large source of uncontrolled variation, which would make your results unusable in a linguistic research context.

Still, it is interesting to see what can be done with this methodology. Well done! Thank you for your efforts to keep pushing the Tibetan language forward!