Language detection in deep learning
When we talk about deep learning, we refer to training machines to draw conclusions and perform complex tasks on different media such as images, videos, and text. In this Answer, we'll look at how to perform language detection, a primary example of a text-related task in deep learning.
Language detection
Language detection is a technique that takes a text or a passage as input and, based on the languages the model has been taught or trained on, detects which language is mainly used in that text.
MediaPipe and deep learning
MediaPipe is an open-source framework that offers a collection of pre-trained deep-learning models, including language prediction. The main advantage is that such a model can be easily integrated into our custom text-oriented applications.
Language detection model
In our application, we will use MediaPipe's detector.tflite model. This model is pre-trained on a specific set of languages and can effectively detect them in the text submitted.
Note: Download the detector.tflite model file before running the code. The code below loads it from the working directory.
List of supported languages
The pre-trained model supports about 110 languages and returns "unknown" otherwise. The supported language codes have been mapped to the complete language names and saved in language_names.
language_names = {"unknown": "Unknown","af": "Afrikaans","am": "Amharic","ar": "Arabic","ar-Latn": "Arabic (Latin script)","az": "Azerbaijani","be": "Belarusian","bg": "Bulgarian","bg-Latn": "Bulgarian (Latin script)","bn": "Bengali","bs": "Bosnian","ca": "Catalan","ceb": "Cebuano","co": "Corsican","cs": "Czech","cy": "Welsh","da": "Danish","de": "German","el": "Greek","el-Latn": "Greek (Latin script)","en": "English","eo": "Esperanto","es": "Spanish","et": "Estonian","eu": "Basque","fa": "Persian","fi": "Finnish","fil": "Filipino","fr": "French","fy": "Frisian","ga": "Irish","gd": "Scottish Gaelic","gl": "Galician","gu": "Gujarati","ha": "Hausa","haw": "Hawaiian","hi": "Hindi","hi-Latn": "Hindi (Latin script)","hmn": "Hmong","hr": "Croatian","ht": "Haitian Creole","hu": "Hungarian","hy": "Armenian","id": "Indonesian","ig": "Igbo","is": "Icelandic","it": "Italian","iw": "Hebrew","ja": "Japanese","ja-Latn": "Japanese (Latin script)","jv": "Javanese","ka": "Georgian","kk": "Kazakh","km": "Khmer","kn": "Kannada","ko": "Korean","ku": "Kurdish","ky": "Kyrgyz","la": "Latin","lb": "Luxembourgish","lo": "Lao","lt": "Lithuanian","lv": "Latvian","mg": "Malagasy","mi": "Maori","mk": "Macedonian","ml": "Malayalam","mn": "Mongolian","mr": "Marathi","ms": "Malay","mt": "Maltese","my": "Burmese","ne": "Nepali","nl": "Dutch","no": "Norwegian","ny": "Chichewa","pa": "Punjabi","pl": "Polish","ps": "Pashto","pt": "Portuguese","ro": "Romanian","ru": "Russian","ru-Latn": "Russian (Latin script)","sd": "Sindhi","si": "Sinhala","sk": "Slovak","sl": "Slovenian","sm": "Samoan","sn": "Shona","so": "Somali","sq": "Albanian","sr": "Serbian","st": "Southern Sotho","su": "Sundanese","sv": "Swedish","sw": "Swahili","ta": "Tamil","te": "Telugu","tg": "Tajik","th": "Thai","tr": "Turkish","uk": "Ukrainian","ur": "Urdu","uz": "Uzbek","vi": "Vietnamese","xh": "Xhosa","yi": "Yiddish","yo": "Yoruba","zh": "Chinese","zh-Latn": "Chinese (Latin script)","zu": "Zulu",}
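As a small illustration of how this mapping might be used, the sketch below looks up a code and falls back to the raw code when it is missing. The helper name full_language_name and the three-entry SAMPLE_NAMES subset are assumptions made for this example. The application itself uses the full language_names dictionary.

```python
# A three-entry subset of the mapping above, used only to keep this sketch short.
SAMPLE_NAMES = {"unknown": "Unknown", "en": "English", "fr": "French"}


def full_language_name(code, names=SAMPLE_NAMES):
    """Return the full name for a language code (hypothetical helper).

    Codes missing from the mapping are returned unchanged, so the caller
    always gets something printable.
    """
    return names.get(code, code)


print(full_language_name("fr"))  # French
print(full_language_name("xx"))  # xx (not in the mapping, so the code itself)
```

Returning the code itself for unmapped entries mirrors the language_names.get(top_language, top_language) call used later in the walkthrough.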
Code walkthrough
We will create a GUI application that takes text as input from the user and plots the main language and its probability as the result. Let's get started!
Imports
import sys
import matplotlib.pyplot as plt
from PyQt6.QtWidgets import QApplication, QLabel, QLineEdit, QPushButton, QVBoxLayout, QWidget
from PyQt6.QtGui import QFont
from mediapipe.tasks import python
from mediapipe.tasks.python import text

language_names = {...}
The first step is to import the necessary modules for our code.
sys is used to pass command-line arguments to the application and to exit when the window closes.
matplotlib is used for plotting the language results.
PyQt6 is used for the GUI interaction between the user and the code.
mediapipe is used for obtaining the pre-trained language detection model.
We also define the language_names dictionary, in which the language codes are mapped to the complete language names. For instance, "en" maps to "English".
detect_language function
def detect_language(input_text):
    base_options = python.BaseOptions(model_asset_path="detector.tflite")
    options = text.LanguageDetectorOptions(base_options=base_options)
    detector = text.LanguageDetector.create_from_options(options)
    detection_result = detector.detect(input_text)
    top_language = detection_result.detections[0].language_code
    top_probability = f'{detection_result.detections[0].probability:.2f}'
    return top_language, top_probability
We define a function named detect_language, which takes a single parameter called input_text. This is the text supplied by the user, for which we want to detect the language.
Next, we create an instance of the model. base_options holds the configuration needed for the model and is given the path of the model file, "detector.tflite". These options are passed to text.LanguageDetectorOptions, which creates the options object. The model is then created by passing options to text.LanguageDetector.create_from_options and is saved in detector.
We use detector to detect the main language of input_text by passing the text to detect as a parameter, and we store the result in detection_result.
The results are analyzed, and the top language's language_code and probability are extracted and saved in top_language and top_probability, respectively. The probability is formatted as a string with two decimal places. These two variables are returned.
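The function above indexes detections[0] directly, which would raise an IndexError if the list of detections were ever empty. A minimal sketch of a more defensive extraction step follows. The Detection dataclass is a stand-in that only mirrors the language_code and probability fields read above, so the example runs without MediaPipe. The helper name top_detection and the ("unknown", 0.0) fallback are assumptions for illustration, not part of the MediaPipe API.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Stand-in with the two fields that detect_language reads."""
    language_code: str
    probability: float


def top_detection(detections):
    """Return (language_code, probability) for the most probable detection.

    Falls back to ("unknown", 0.0) when the list is empty (an assumed
    convention for this sketch).
    """
    if not detections:
        return "unknown", 0.0
    best = max(detections, key=lambda d: d.probability)
    return best.language_code, round(best.probability, 2)


# Made-up sample values, used only to exercise the helper.
print(top_detection([Detection("de", 0.74), Detection("fr", 0.2)]))  # ('de', 0.74)
print(top_detection([]))  # ('unknown', 0.0)
```

Returning the probability as a float also avoids formatting it as a string and converting it back later, though the two-step approach in the walkthrough works as well.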
on_button_click function
def on_button_click():
    input_text = input_text_entry.text()
    top_language, top_probability = detect_language(input_text)
    top_probability = float(top_probability)
    top_language_full = language_names.get(top_language, top_language)

    plt.figure(figsize=(10, 4))
    plt.barh([0], [top_probability], color='maroon', alpha=0.7)
    plt.yticks([0], [f"{top_language_full} ({top_probability})",], fontsize=16, fontweight='bold', color='white')
    plt.xlabel('Probability', fontsize=18, fontweight='bold', color='black')
    plt.title('Detected Language', fontsize=20, fontweight='bold', color='black')
    plt.gca().invert_yaxis()
    plt.gca().set_facecolor('white')
    plt.gca().spines['right'].set_visible(False)
    plt.gca().spines['top'].set_visible(False)
    plt.tick_params(axis='both', colors='black')
    plt.text(top_probability + 0.01, 0, f"{top_probability:.2f}", va='center', fontsize=16, fontweight='bold', color='black')
    plt.subplots_adjust(left=0.3, right=0.95, top=0.8, bottom=0.2)
    plt.tight_layout()
    plt.show(block=False)
Our second function is on_button_click. Its core job is to display the prediction in a user-friendly manner. It stores the top_language and top_probability returned by our detect_language function and uses Matplotlib to draw a bar plot showing the probability of the main language along with its label. The complete language name is obtained from the language_names mapping. The rest of the code handles display customizations and can be changed freely.
Note: barh draws a horizontal bar chart.
main function
if __name__ == "__main__":
    app = QApplication(sys.argv)

    window = QWidget()
    window.setWindowTitle("Language Detection")
    window.setStyleSheet("background-color: white;")
    layout = QVBoxLayout()

    input_label = QLabel("Enter the text:")
    input_label.setFont(QFont('Arial', 18))
    input_label.setStyleSheet("color: black;")
    layout.addWidget(input_label)

    input_text_entry = QLineEdit()
    input_text_entry.setFont(QFont('Arial', 16))
    input_text_entry.setStyleSheet("color: black; background-color: white; border: 1px solid black; padding: 5px;")
    layout.addWidget(input_text_entry)

    detect_button = QPushButton("Detect Language")
    detect_button.setFont(QFont('Arial', 16))
    detect_button.setStyleSheet("color: white; background-color: maroon; padding: 8px;")
    detect_button.clicked.connect(on_button_click)
    layout.addWidget(detect_button)

    window.setLayout(layout)
    window.show()
    sys.exit(app.exec())
Finally, we put the code together in the main block. We use the PyQt6 library to create the GUI for our application and to get the text input from the user. The text is entered in input_text_entry. Once the detect_button is clicked, our on_button_click function is called, which in turn calls the detect_language function and plots the result in a Matplotlib figure.
Executable code
Congratulations! Our language detection code is now complete. You can run it as is or modify it to see it in action. The listing below is a separate, minimal PyQt6 window that uses the same widget and layout classes.
import sys
from PyQt6.QtWidgets import QApplication, QWidget, QLabel, QLineEdit, QPushButton, QVBoxLayout
from PyQt6.QtGui import QColor, QPalette

class MyWindow(QWidget):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("PyQt6 Example Code")
        self.setGeometry(100, 100, 400, 200)
        self.init_ui()

    def init_ui(self):
        layout = QVBoxLayout()
        label1 = QLabel("Field 1:")
        self.input1 = QLineEdit()
        label2 = QLabel("Field 2:")
        self.input2 = QLineEdit()
        submit_button = QPushButton("Submit")

        palette = QPalette()
        label_color = QColor(0, 102, 204)  # Blue color
        button_color = QColor(255, 153, 0)  # Orange color
        palette.setColor(QPalette.ColorRole.WindowText, label_color)
        palette.setColor(QPalette.ColorRole.ButtonText, button_color)
        palette.setColor(QPalette.ColorRole.Button, QColor(240, 240, 240))  # Light gray
        label1.setPalette(palette)
        label2.setPalette(palette)
        submit_button.setPalette(palette)

        layout.addWidget(label1)
        layout.addWidget(self.input1)
        layout.addWidget(label2)
        layout.addWidget(self.input2)
        layout.addWidget(submit_button)
        self.setLayout(layout)

if __name__ == "__main__":
    app = QApplication(sys.argv)
    window = MyWindow()
    window.show()
    sys.exit(app.exec())
Language detection demonstration
Japanese language
When given a Japanese text, the model correctly predicted Japanese after we clicked "Detect Language", and the bar plot showed a probability of 1.0, i.e., 100%.
French language
When given a French text, the model correctly predicted French after we clicked "Detect Language", and the bar plot showed a probability of 1.0, i.e., 100%.
Mixed languages
When given a mixed text containing both French and German, the model returned German with a probability of 0.74, i.e., 74%, since two of the three keywords were German.
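For mixed text, it can be more informative to look at every candidate rather than only the top one. The sketch below ranks a list of (language code, probability) pairs and formats them as percentages. The helper name rank_languages is hypothetical, and the input pairs are invented sample values (only the 0.74 for German comes from the run described above). In the application, they would be read from the language_code and probability fields of detection_result.detections.

```python
def rank_languages(pairs, names):
    """Sort (code, probability) pairs by probability, highest first,
    and return lines such as "German: 74%"."""
    ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return [f"{names.get(code, code)}: {prob:.0%}" for code, prob in ranked]


# Invented sample values for illustration; only 0.74 for German is from the text.
sample = [("fr", 0.21), ("de", 0.74)]
sample_names = {"de": "German", "fr": "French"}

for line in rank_languages(sample, sample_names):
    print(line)
# German: 74%
# French: 21%
```

Listing all candidates makes it easier to tell when a second language is present, which a single top result hides.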
Use cases of language detection
Language detection can be used as a stand-alone task or incorporated into more complex designs.
How well do you know language detection?
How do we understand what language code the model is referring to?
Using the top_language variable
Using the mapping from language_names
The model gives the full name when accessed by language_code