Polyglot for Natural Language Processing — An Overview

This article describes Polyglot, a Python NLP package created by Rami Al-Rfou. It supports a wide range of multilingual applications and offers broad language coverage. Its features include:

Language detection (196 languages)
Tokenization (165 languages)
Named entity recognition (40 languages)
Part-of-speech tagging (16 languages)
Sentiment analysis (136 languages), and more

Start by installing Polyglot and its dependencies. Google Colab offers a quick, painless setup; in a Colab cell, prefix each pip command with an exclamation mark (for example, !pip install polyglot).

```bash
pip install polyglot

# installing dependency packages
pip install pyicu
pip install Morfessor
pip install pycld2
```
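Before moving on, it can help to confirm that everything actually imports. The sketch below is a hypothetical helper (not part of Polyglot) that uses the standard library's importlib to look for each module without importing it. It assumes the usual import names, namely that PyICU is imported as icu and Morfessor as morfessor.

```python
import importlib.util

# Hypothetical helper: maps each pip package name to the module name it is
# imported under (assumed: pyicu -> icu, Morfessor -> morfessor).
PACKAGES = {
    "polyglot": "polyglot",
    "pyicu": "icu",
    "Morfessor": "morfessor",
    "pycld2": "pycld2",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names of packages whose modules cannot be found."""
    missing = []
    for pip_name, module_name in packages.items():
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module_name) is None:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All Polyglot dependencies are importable.")
```

Any package the helper reports as missing can be reinstalled with the pip commands above.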

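Next, download the models needed for the examples that follow. In Google Colab, the %%bash cell magic runs the lines below as shell commands (it must be the first line of the cell):

```
%%bash
polyglot download embeddings2.en   # word embeddings used by the NER and POS taggers
polyglot download ner2.en          # named entity recognition model
polyglot download pos2.en          # part-of-speech tagging model
polyglot download sentiment2.en    # sentiment model
```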

Code: Language Detection

```python
from polyglot.detect import Detector

spanish_text = u"""¡Hola! Mi nombre es Ana. Tengo veinticinco años. Vivo en Miami, Florida"""

detector = Detector(spanish_text)

# print the most likely language together with its confidence score
print(detector.language)
```

It correctly identified the text as Spanish, with a confidence of 98 percent.
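Polyglot's Detector relies on the pycld2 package (Compact Language Detector 2), a statistical model trained on large corpora. To give a feel for the general idea of scoring a text against several languages and reporting a confidence, here is a deliberately tiny, stopword-based sketch. It is not how CLD2 works: the word lists are small hand-picked assumptions and cover only English, Spanish and French.

```python
import re
from collections import Counter

# Tiny, hand-picked stopword lists (illustrative assumptions, far from complete).
STOPWORDS = {
    "en": {"the", "is", "a", "an", "and", "of", "in", "to", "my", "i", "am", "live"},
    "es": {"el", "la", "los", "las", "de", "y", "en", "es", "mi", "un", "una", "tengo", "vivo"},
    "fr": {"le", "la", "les", "de", "et", "en", "est", "mon", "un", "une", "je", "suis"},
}


def guess_language(text):
    """Return (language_code, confidence) based on stopword hits."""
    # \w+ is Unicode-aware in Python 3, so words like "años" stay intact
    words = re.findall(r"\w+", text.lower())
    hits = Counter()
    for lang, stopwords in STOPWORDS.items():
        hits[lang] = sum(1 for w in words if w in stopwords)
    total = sum(hits.values())
    if total == 0:
        return None, 0.0
    lang, count = hits.most_common(1)[0]
    # confidence = this language's share of all stopword hits
    return lang, count / total


print(guess_language("¡Hola! Mi nombre es Ana. Tengo veinticinco años. Vivo en Miami, Florida"))
# ('es', 0.8333333333333334)
```

A real detector uses far richer statistics (character n-grams over many languages), which is why it can report high confidence even on short inputs.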

Code: Tokenization

Tokenization splits text into smaller units: sentences into words, and longer passages into sentences.

```python
# importing Text from polyglot library
from polyglot.text import Text

sentences = u"""Suggest a platform for placement preparation? GFG is a very good platform for placement
preparation."""

# wrapping the raw string in a polyglot Text object
text = Text(sentences)

# dividing sentences into words
print(text.words)

print('\n')

# separating sentences
print(text.sentences)
```

The output shows the text split into individual words, with the two sentences separated from each other.
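Polyglot performs this segmentation with ICU's word and sentence break rules, which is why pyicu is a dependency. For comparison, the hypothetical sketch below approximates the same job with regular expressions: it splits sentences after ".", "!" or "?" followed by whitespace, and separates words from punctuation. It will mishandle abbreviations such as "Dr." and languages written without spaces, cases that ICU's rules are designed to handle better.

```python
import re

# Split after ., ! or ? when followed by whitespace (a naive heuristic).
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# A token is either a run of word characters or a single punctuation mark.
TOKEN = re.compile(r"\w+|[^\w\s]")


def split_sentences(text):
    """Return a list of sentences, stripped of surrounding whitespace."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(text.strip()) if s.strip()]


def split_words(text):
    """Return words and punctuation marks as separate tokens."""
    return TOKEN.findall(text)


sample = """Suggest a platform for placement preparation? GFG is a very good platform for placement
preparation."""

print(split_words(sample))
print(split_sentences(sample))
```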

Code: Named Entity Recognition

Polyglot recognizes three categories of named entities:

Location
Organization
Person

```python
from polyglot.text import Text

sentence = """Google is an American multinational technology company and Sundar Pichai is the CEO of Google"""

# hint_language_code tells Polyglot that the text is English
text = Text(sentence, hint_language_code='en')

print(text.entities)
```

In the output:

I-ORG refers to an organization

I-LOC refers to a location

I-PER refers to a person
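An entity can span several consecutive words, so a name like "Sundar Pichai" counts as one person. The sketch below isolates that grouping step: it merges consecutive tokens sharing an I-XXX tag into a single (type, text) pair. The token/tag pairs are written by hand for illustration and are not actual Polyglot output, and the "O" tag for words outside any entity is an assumption borrowed from the common IOB convention.

```python
def group_entities(tagged_tokens):
    """Merge consecutive tokens with the same I-XXX tag into (type, text) pairs.

    tagged_tokens: list of (word, tag) tuples, where tag is "O" (outside any
    entity) or "I-" followed by an entity type such as ORG, LOC or PER.
    """
    entities = []
    current_type, current_words = None, []
    for word, tag in tagged_tokens:
        entity_type = tag[2:] if tag.startswith("I-") else None
        if entity_type != current_type and current_words:
            # the previous entity has ended, so store it
            entities.append((current_type, " ".join(current_words)))
            current_words = []
        current_type = entity_type
        if entity_type is not None:
            current_words.append(word)
    if current_words:
        entities.append((current_type, " ".join(current_words)))
    return entities


# Hand-tagged illustration (hypothetical tags, not Polyglot's output).
tagged = [
    ("Google", "I-ORG"), ("is", "O"), ("an", "O"), ("American", "O"),
    ("multinational", "O"), ("technology", "O"), ("company", "O"), ("and", "O"),
    ("Sundar", "I-PER"), ("Pichai", "I-PER"), ("is", "O"), ("the", "O"),
    ("CEO", "O"), ("of", "O"), ("Google", "I-ORG"),
]
print(group_entities(tagged))
```

Because only I- tags are used, two adjacent entities of the same type with no "O" between them would be merged; schemes with explicit B- (begin) tags avoid that ambiguity.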

Code: Part of Speech Tagging

```python
from polyglot.text import Text

sentence = """GeeksforGeeks is the best place for learning things in a simple manner."""

text = Text(sentence)

# each word is paired with its part-of-speech tag
print(text.pos_tags)
```

In the output, ADP stands for adposition, ADJ for adjective, and DET for determiner.
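Tags such as ADP, ADJ and DET come from the Universal part-of-speech tag set, so a small lookup table makes tagger output easier to read. The mapping below is written out by hand from the Universal Dependencies tag names, and the tagged words are an illustrative assumption rather than Polyglot's actual output for the example sentence.

```python
# Universal POS tags and their meanings (written out by hand; CONJ is the
# older name that Universal Dependencies v2 replaced with CCONJ).
UPOS_DESCRIPTIONS = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary",
    "CCONJ": "coordinating conjunction",
    "CONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}


def describe_tags(pos_tags):
    """Turn (word, tag) pairs into 'word/tag (meaning)' strings."""
    return [
        f"{word}/{tag} ({UPOS_DESCRIPTIONS.get(tag, 'unknown tag')})"
        for word, tag in pos_tags
    ]


# Hand-tagged illustration, not actual Polyglot output.
example = [("the", "DET"), ("best", "ADJ"), ("place", "NOUN"), ("for", "ADP")]
for line in describe_tags(example):
    print(line)
```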

Code: Sentiment Analysis

```python
from polyglot.text import Text

sentence1 = """ABC is one of the best universities in the world."""
sentence2 = """ABC is one of the worst universities in the world."""

text1 = Text(sentence1)
text2 = Text(sentence2)

# polarity score of each text
print(text1.polarity)
print(text2.polarity)
```

A polarity of 1 means the sentence is positive.

A polarity of -1 means the sentence is negative.
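Polyglot's sentiment support is lexicon-based: words are marked positive (+1), negative (-1) or neutral (0). Assuming the score for a whole text is the average polarity of its non-neutral words, a toy version looks like the sketch below. The four-word lexicon is a hypothetical stand-in for the real sentiment2 models.

```python
import re

# Hypothetical mini-lexicon: +1 positive, -1 negative; unlisted words are neutral (0).
LEXICON = {"best": 1, "good": 1, "worst": -1, "bad": -1}


def word_polarity(word):
    """Return +1, -1 or 0 for a single word."""
    return LEXICON.get(word.lower(), 0)


def text_polarity(text):
    """Average the polarity of the non-neutral words; 0.0 if there are none."""
    scores = [word_polarity(w) for w in re.findall(r"\w+", text)]
    scores = [s for s in scores if s != 0]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


print(text_polarity("ABC is one of the best universities in the world."))   # 1.0
print(text_polarity("ABC is one of the worst universities in the world."))  # -1.0
print(text_polarity("ABC has good teachers but bad food."))                # 0.0
```

Under this averaging assumption, scores between -1 and 1 indicate a mix of positive and negative words, as in the third example.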
