This article describes Polyglot, a Python NLP package created by Rami Al-Rfou that supports multilingual applications and covers a broad range of analyses and languages. Its features include language detection, tokenization, named entity recognition, part-of-speech tagging, and sentiment analysis.
Let's start by installing the following packages:
For a quick and painless installation, use Google Colab.
"pip install polyglot"
"# installing dependency packages
pip install pyicu
pip install Morfessor
pip install pycld2"
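After installing, it can help to confirm that every package is importable before moving on. The following is a minimal sketch, not part of Polyglot itself: `missing_modules` is a hypothetical helper, and the import names listed are assumptions based on how these packages are normally imported (PyICU, for instance, is imported as `icu`).

```python
import importlib.util

# Import names, which can differ from the pip names: PyICU is imported as "icu".
REQUIRED_MODULES = ["polyglot", "icu", "morfessor", "pycld2"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All Polyglot dependencies are importable.")
```

An empty result means the examples in the rest of the article should at least get past their import lines.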
Next, download the models used in the examples.
Google Colab makes downloading the models easy as well.
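"%%bash
# %%bash must be the first line of the cell, so run all downloads in one cell
polyglot download embeddings2.en # word embeddings used by the NER and POS models
polyglot download ner2.en # named entity recognition model
polyglot download pos2.en # part-of-speech tagging model
polyglot download sentiment2.en # sentiment lexicon"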
"from polyglot.detect import Detector
spanish_text = u"""¡Hola ! Mi nombre es Ana. Tengo veinticinco años. Vivo en Miami, Florida"""
detector = Detector(spanish_text)
print(detector.language)"
Polyglot correctly identifies the text as Spanish with 98 percent confidence.
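The printed summary bundles several fields that can also be read one at a time. The sketch below assumes the `name`, `code`, and `confidence` attributes of the detected language object and the `quiet` flag of `Detector`; `describe_language` is a hypothetical helper, and the import is guarded so the function returns `None` when Polyglot is not installed.

```python
def describe_language(text):
    """Return (name, code, confidence) for the dominant language.

    Returns None when Polyglot or one of its dependencies is missing.
    """
    try:
        from polyglot.detect import Detector
    except ImportError:
        return None
    # quiet=True asks the detector not to raise on short or unreliable input
    language = Detector(text, quiet=True).language
    return language.name, language.code, language.confidence


result = describe_language(
    "¡Hola! Mi nombre es Ana. Tengo veinticinco años. Vivo en Miami, Florida"
)
print(result)
```

Reading the fields separately is convenient when the language code is needed to choose which models to download.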
Tokenization splits text into smaller units: a longer passage into sentences, and each sentence into words.
"# importing Text from polyglot library
from polyglot.text import Text
sentences = u"""Suggest a platform for placement preparation? GFG is a very good platform for placement preparation."""
# passing sentences through imported Text
text = Text(sentences)
# dividing sentences into words
print(text.words)
print('\n')
# separating sentences
print(text.sentences)"
Polyglot splits the text into individual words and also separates the two sentences.
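To make the two levels of splitting concrete, here is a deliberately naive regex-based sketch. It is a stand-in for illustration only and is not how Polyglot tokenizes; `split_sentences` and `split_words` are hypothetical names, and the rules will fail on abbreviations such as "Dr." that a real tokenizer handles.

```python
import re


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (a naive rule)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(sentence):
    """Return runs of word characters and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


sample = "Suggest a platform for placement preparation? GFG is a very good platform."
sentences = split_sentences(sample)
print(sentences)
print(split_words(sentences[0]))
```

Note that punctuation marks come out as tokens of their own, which is also what a word list from a tokenizer typically contains.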
Polyglot recognizes three categories of entities: organizations, locations, and persons.
"from polyglot.text import Text
sentence = """Google is an American multinational technology company and Sundar Pichai is the CEO of Google"""
text = Text(sentence, hint_language_code='en')
print(text.entities)"
I-ORG refers to an organization
I-LOC refers to a location
I-PER refers to a person
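Tags like these are assigned per token, and consecutive tokens with the same tag form one entity. The sketch below shows that grouping step on a hand-tagged sample; it illustrates the general idea rather than Polyglot's own code, `group_entities` is a hypothetical helper, and `"O"` is assumed to mark tokens outside any entity.

```python
def group_entities(tagged_tokens):
    """Merge consecutive tokens sharing a non-"O" tag into (tag, words) chunks."""
    chunks = []
    current_tag, current_words = None, []
    for word, tag in tagged_tokens:
        if tag == current_tag and tag != "O":
            # Same entity continues
            current_words.append(word)
            continue
        if current_words:
            # The previous entity ended, so store it
            chunks.append((current_tag, current_words))
        current_tag, current_words = (tag, [word]) if tag != "O" else (None, [])
    if current_words:
        chunks.append((current_tag, current_words))
    return chunks


# Hand-tagged sample for illustration, not real model output
tagged = [
    ("Google", "I-ORG"), ("is", "O"), ("led", "O"), ("by", "O"),
    ("Sundar", "I-PER"), ("Pichai", "I-PER"),
]
print(group_entities(tagged))
```

This is why a two-word name such as "Sundar Pichai" shows up as a single I-PER entity rather than two.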
"from polyglot.text import Text
sentence = """GeeksforGeeks is the best place for learning things in simple manner."""
text = Text(sentence)
print(text.pos_tags)"
ADP stands for adposition, ADJ for adjective, and DET for determiner.
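Since `pos_tags` yields (word, tag) pairs, the result is easy to summarize with plain Python. The following sketch works on a hand-written list in that shape, not on real model output; `tag_counts` and `words_with_tag` are hypothetical helpers.

```python
from collections import Counter

# Hand-written (word, tag) pairs in the shape of text.pos_tags; not model output
sample_tags = [
    ("The", "DET"), ("best", "ADJ"), ("place", "NOUN"),
    ("for", "ADP"), ("simple", "ADJ"), ("things", "NOUN"),
]


def tag_counts(pos_tags):
    """Count how often each part-of-speech tag occurs."""
    return Counter(tag for _, tag in pos_tags)


def words_with_tag(pos_tags, wanted):
    """Return the words carrying the given tag, in order of appearance."""
    return [word for word, tag in pos_tags if tag == wanted]


print(tag_counts(sample_tags))
print(words_with_tag(sample_tags, "ADJ"))
```

Passing `text.pos_tags` in place of `sample_tags` should work the same way, assuming the pairs have the same shape.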
"from polyglot.text import Text
sentence1 = """ABC is one of the best universities in the world."""
sentence2 = """ABC is one of the worst universities in the world."""
text1 = Text(sentence1)
text2 = Text(sentence2)
print(text1.polarity)
print(text2.polarity)"
A polarity of 1 means the sentence is positive.
A polarity of -1 means the sentence is negative.
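A sentence-level score like this can be derived from word-level scores. The sketch below assumes that each word is scored -1, 0, or +1 and that the sentence score is the mean of the nonzero word scores, which is an assumption about the approach rather than a copy of Polyglot's code. `TOY_LEXICON` is a made-up four-word lexicon and `sentence_polarity` is a hypothetical helper.

```python
# Hypothetical toy lexicon; a real system ships a much larger one per language
TOY_LEXICON = {"best": 1, "good": 1, "worst": -1, "bad": -1}


def sentence_polarity(words, lexicon=TOY_LEXICON):
    """Average the nonzero word scores; 0.0 when no word carries sentiment."""
    scores = [lexicon.get(word.lower(), 0) for word in words]
    scores = [score for score in scores if score != 0]
    return sum(scores) / len(scores) if scores else 0.0


print(sentence_polarity("ABC is one of the best universities in the world".split()))
print(sentence_polarity("ABC is one of the worst universities in the world".split()))
```

Under this scheme a sentence with one positive and one negative word averages out to 0, so intermediate values between -1 and 1 are possible for mixed sentences.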