Welcome! This time I want to share the solution to a problem I had a few days ago: how can you tell what language a text string is written in?
During my research I found several possible solutions. Unfortunately, some did not fully meet my needs: for example, they failed to detect certain languages, or they came at a cost.
In the end, I found a solution that works extraordinarily well and is free. To show it to you, I have created a new Console project in Visual Studio called “DetectingLanguagesDemo”.
I'm going to install a NuGet package that I was lucky to find while browsing the dozens of results in the package search. It's called “LanguageDetection.NETStandard”, by Pēteris Ņikiforovs.
NuGet\Install-Package LanguageDetection.NETStandard
Once the package has been installed, we simply create a new instance of the “LanguageDetector” class and tell it which languages to load through the “AddLanguages” method, which receives the codes of the languages we want to detect, such as English:
using LanguageDetection;
var detector = new LanguageDetector();
detector.AddLanguages(new[] { "eng" }); // three-letter ISO 639-3 code, the same form Detect returns
Alternatively, we can use the AddAllLanguages method, which loads all 53 supported languages:
detector.AddAllLanguages();
Finally, we call the Detect method, passing it the text string, in this case “This library is perfect for what I need”; it returns the language of the string. I will wrap the call in a Console.WriteLine to see the result, and run the application.
string text = "This library is perfect for what I need";
Console.WriteLine(detector.Detect(text)); // print the detected language code
You can see that the program prints the code eng, which refers to the English language. Next, I am going to change the text to the same sentence in Spanish, “Esta biblioteca es perfecta para lo que necesito”, and run the program again.
The result is the code spa, which corresponds to the Spanish language. I am going to run one last test with the same sentence, this time translated into French: “Cette bibliothèque répond parfaitement à mes besoins”.
Once again, we get the code fra, corresponding to the French language.
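To avoid editing the string and re-running the program for each language, the three tests can be combined into one run. The following is a minimal sketch that uses only the AddAllLanguages and Detect calls shown in this post; the null fallback is an assumption on my part, in case Detect returns nothing when it cannot decide on a language.

```csharp
using System;
using LanguageDetection; // from the LanguageDetection.NETStandard NuGet package

var detector = new LanguageDetector();
detector.AddAllLanguages(); // load every language profile bundled with the package

// The three sample sentences used in this post.
string[] samples =
{
    "This library is perfect for what I need",
    "Esta biblioteca es perfecta para lo que necesito",
    "Cette bibliothèque répond parfaitement à mes besoins"
};

foreach (string sample in samples)
{
    // Assumption: Detect may return null when no language is recognized.
    string code = detector.Detect(sample) ?? "(undetected)";
    Console.WriteLine($"{code}: {sample}");
}
```

Based on the individual runs above, this should print one line per sentence, starting with the codes eng, spa, and fra respectively.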
If you are wondering why the result has three characters, it is because the library uses the ISO 639-3 standard. To make things easier for you, I share with you the link to this website, which lists the language names along with their corresponding ISO codes.
Finally, I also share with you the list in the repository with the ISO codes of the languages supported by the NuGet package.
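If you would rather show a readable name than a three-letter code, one simple option is a small lookup table. The sketch below is only an illustration: the DescribeLanguage helper is hypothetical (it is not part of the package), and the table covers just the three codes seen in this post, so you would extend it from the lists linked above.

```csharp
using System;
using System.Collections.Generic;

// Only the three codes seen in this post; extend from the ISO 639-3 list as needed.
var languageNames = new Dictionary<string, string>
{
    ["eng"] = "English",
    ["spa"] = "Spanish",
    ["fra"] = "French"
};

// Hypothetical helper: returns the name for a known code,
// the raw code for an unknown one, and "unknown" for null.
string DescribeLanguage(string code)
{
    if (code == null) return "unknown";
    return languageNames.TryGetValue(code, out var name) ? name : code;
}

Console.WriteLine(DescribeLanguage("eng")); // English
```

In the demo project, you could pass the value returned by Detect straight into DescribeLanguage before printing it.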
I hope you find this information useful.