Polyglot for Natural Language Processing — An Overview

This article describes Polyglot, a Python NLP package created by Rami Al-Rfou. It supports a variety of multilingual applications and offers a broad range of analyses with wide language coverage. Its features include:

  1. Detecting languages (196 Languages)
  2. Tokenization (165 Languages)
  3. Recognizing Named Entities (40 Languages)
  4. Tagging of Parts of Speech (16 Languages)
  5. Sentiment Analysis (136 Languages), and more

Let's start by installing the following packages:

For a quick and painless installation, use Google Colab.

"# installing polyglot
pip install polyglot

# installing dependency packages
pip install pyicu
pip install Morfessor
pip install pycld2"
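Once the installs finish, it can help to confirm that the packages are importable before moving on. The following is a minimal sketch using only the standard library; the import names in `REQUIRED_MODULES` (in particular `icu` for pyicu and `morfessor` for Morfessor) are assumptions about how these packages expose themselves, and `missing_modules` is a hypothetical helper, not part of Polyglot.

```python
import importlib.util

# Assumed import names for the four packages installed above.
REQUIRED_MODULES = ["polyglot", "icu", "morfessor", "pycld2"]

def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

missing = missing_modules()
print("Missing modules:", missing if missing else "none")
```

If the list is not empty, rerun the matching `pip install` command before downloading the models.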

Download the necessary models

Google Colab also makes it easy to install the models.

"%%bash
polyglot download embeddings2.en   # word embeddings used by the NER and POS models
polyglot download ner2.en          # named entity recognition model
polyglot download pos2.en          # part-of-speech tagging model
polyglot download sentiment2.en    # sentiment model"

Code: Language Detection 

"from polyglot.detect import Detector

spanish_text = u"""¡Hola! Mi nombre es Ana. Tengo veinticinco años. Vivo en Miami, Florida"""

# detecting the language of the text
detector = Detector(spanish_text)
print(detector.language)"


It correctly identified the text as Spanish with a 98 percent confidence level.
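The detected language is an object rather than a plain string, so its fields can be read individually. The sketch below assumes the `name`, `code`, and `confidence` attributes of that object; `summarize_language` and `detect_and_summarize` are hypothetical helpers, and the `example` object is a hand-built stand-in that mirrors the result described above so the formatting can be tried without Polyglot installed.

```python
from types import SimpleNamespace

def summarize_language(language):
    """Format a detected language as 'Name (code), NN% confidence'."""
    return f"{language.name} ({language.code}), {language.confidence:.0f}% confidence"

def detect_and_summarize(text):
    """Run polyglot's Detector on `text` and summarize the top language."""
    # Imported lazily so summarize_language can be used without polyglot.
    from polyglot.detect import Detector
    return summarize_language(Detector(text).language)

# Hand-built stand-in, not real detector output.
example = SimpleNamespace(name="Spanish", code="es", confidence=98.0)
print(summarize_language(example))
```

With Polyglot and pycld2 installed, `detect_and_summarize(spanish_text)` would produce the same kind of summary for the Spanish sample.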

Code: Tokenization

Tokenization is the process of breaking text into smaller units: paragraphs into sentences, and sentences into words.

"# importing Text from the polyglot library
from polyglot.text import Text

sentences = u"""Suggest a platform for placement preparation? GFG is a very good platform for placement preparation."""

# passing the sentences through the imported Text class
text = Text(sentences)

# dividing the sentences into words
print(text.words)
print('\n')

# separating the sentences
print(text.sentences)"


The text has been divided into words, and the two sentences have been separated from each other.
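To make the idea of tokenization concrete, here is a deliberately naive sketch built on regular expressions. It is not how Polyglot tokenizes text and it will mishandle abbreviations, decimals, and many non-English scripts; `split_sentences` and `split_words` are hypothetical helpers written only for illustration.

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' when the mark is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

def split_words(text):
    """Return runs of word characters and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

sample = "Suggest a platform for placement preparation? GFG is a very good platform."
for sentence in split_sentences(sample):
    print(split_words(sentence))
```

A library tokenizer is preferable in practice because rules like these break down quickly once the input stops being simple English prose.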

Code: Named Entity Recognition

Polyglot recognizes three categories of entities: 

  • Location
  • Organization
  • Person

"from polyglot.text import Text

sentence = """Google is an American multinational technology company and Sundar Pichai is the CEO of Google"""

# the language hint tells polyglot which NER model to use
text = Text(sentence, hint_language_code='en')
print(text.entities)"


I-ORG refers to organisation 

I-LOC refers to location 

I-PER refers to person 
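The tags above can be turned into readable labels and used to group the entities that were found. This is a minimal sketch: `group_entities` is a hypothetical helper, it expects `(tag, words)` pairs, and the `sample` list is written by hand for illustration rather than copied from real Polyglot output.

```python
from collections import defaultdict

# Readable names for the three entity tags listed above.
TAG_NAMES = {"I-ORG": "organization", "I-LOC": "location", "I-PER": "person"}

def group_entities(entities):
    """Group (tag, words) pairs into {readable tag name: [entity strings]}."""
    grouped = defaultdict(list)
    for tag, words in entities:
        grouped[TAG_NAMES.get(tag, tag)].append(" ".join(words))
    return dict(grouped)

# Hand-written stand-in for the entities in the example sentence.
sample = [("I-ORG", ["Google"]), ("I-PER", ["Sundar", "Pichai"]), ("I-ORG", ["Google"])]
print(group_entities(sample))
```

With Polyglot, the pairs could be built as `[(e.tag, list(e)) for e in text.entities]`, assuming each entity exposes a `tag` attribute and iterates over its words.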

Code: Part of Speech Tagging 

"from polyglot.text import Text

sentence = """GeeksforGeeks is the best place for learning things in simple manner."""

# tagging each word with its part of speech
text = Text(sentence)
print(text.pos_tags)"


Each word is paired with its tag: ADP stands for adposition, ADJ for adjective, and DET for determiner.
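Since the tagger yields (word, tag) pairs, a small lookup table and a filter are enough to work with the result. The sketch below assumes the tags follow the universal part-of-speech tag set; `describe_tags` and `words_with_tag` are hypothetical helpers, and `sample_tags` is tagged by hand for illustration, not produced by Polyglot.

```python
# Meanings of the universal part-of-speech tags (assumed tag set).
UNIVERSAL_TAGS = {
    "ADJ": "adjective", "ADP": "adposition", "ADV": "adverb",
    "AUX": "auxiliary verb", "CONJ": "coordinating conjunction",
    "DET": "determiner", "INTJ": "interjection", "NOUN": "noun",
    "NUM": "numeral", "PART": "particle", "PRON": "pronoun",
    "PROPN": "proper noun", "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction", "SYM": "symbol",
    "VERB": "verb", "X": "other",
}

def describe_tags(pos_tags):
    """Replace each tag with its readable name; unknown tags are kept as-is."""
    return [(word, UNIVERSAL_TAGS.get(tag, tag)) for word, tag in pos_tags]

def words_with_tag(pos_tags, tag):
    """Return the words that carry the given tag."""
    return [word for word, word_tag in pos_tags if word_tag == tag]

# Hand-tagged stand-in for a fragment of the example sentence.
sample_tags = [("the", "DET"), ("best", "ADJ"), ("place", "NOUN"), ("for", "ADP")]
print(describe_tags(sample_tags))
print(words_with_tag(sample_tags, "ADJ"))
```

The same helpers can be applied directly to `text.pos_tags` once the POS model has been downloaded.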

Code: Sentiment Analysis

"from polyglot.text import Text

sentence1 = """ABC is one of the best universities in the world."""
sentence2 = """ABC is one of the worst universities in the world."""

text1 = Text(sentence1)
text2 = Text(sentence2)

# printing the polarity of each sentence
print(text1.polarity)
print(text2.polarity)"


A polarity of 1 means that the sentence is positive.

A polarity of -1 means that the sentence is negative.
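One way to see where scores like 1 and -1 can come from is a lexicon-based calculation: each word is looked up in a polarity dictionary and the non-neutral scores are averaged. The sketch below is an illustration under that assumption, not Polyglot's confirmed implementation; `TOY_LEXICON` and `sentence_polarity` are hypothetical, and the four-word lexicon is made up for the example.

```python
# Made-up lexicon for illustration; real lexicons are far larger.
TOY_LEXICON = {"best": 1, "good": 1, "worst": -1, "bad": -1}

def sentence_polarity(words, lexicon=TOY_LEXICON):
    """Average the polarity of words found in the lexicon; 0.0 if none match."""
    scores = [lexicon[word.lower()] for word in words if word.lower() in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

positive = "ABC is one of the best universities in the world".split()
negative = "ABC is one of the worst universities in the world".split()
print(sentence_polarity(positive))
print(sentence_polarity(negative))
```

A sentence that mixes positive and negative words lands somewhere between -1 and 1 under this rule, and a sentence with no lexicon words scores 0.0.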