Before we dive deep into how to apply machine learning and artificial intelligence (AI) for NLP and text analytics, let’s clarify some basic ideas.
Machine learning for natural language processing and text analytics involves using machine learning algorithms and AI to understand the meaning of text documents. These documents can be just about anything that contains text: social media comments, survey responses, online reviews, even medical, legal, financial and regulatory documents.
The role of machine learning and AI in NLP and text analytics is to enhance, expedite and automate the underlying text analytics functions and NLP features that turn unstructured text into usable data and insights.
Artificial intelligence and linguistics use NLP to comprehend the deeper meaning of text. This field is devoted to making computers understand statements or words written in human languages. The main purpose of NLP is to make the user’s work easier and to meet the wish to communicate with the computer in natural language.
We know that no one can master every language, and most people do not have enough time to learn new languages or become fluent in them.
Research on NLP has also produced tools and systems that are integral to the deeper workings of NLP. These tools make NLP what it is today. Let's take a deeper look at them: Sentiment Analyzer, Parts of Speech (POS) Taggers, Chunking, Named Entity Recognition (NER), Emotion Detection, and Semantic Role Labeling.
Sentiment analyzer
A sentiment analyzer extracts the sentiment expressed about a given topic. It consists of three parts: topic-specific feature term extraction, sentiment extraction, and association by relationship analysis.
Sentiment analysis relies on two types of resources: the sentiment lexicon and the sentiment pattern database. It scans the text for positive and negative words and tries to rate them on a scale of -5 to +5.
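As a rough illustration of lexicon-based rating, the sketch below averages the valence of the lexicon words it finds in a text and clamps the result to the -5 to +5 range. The lexicon entries, the name `TOY_LEXICON`, and the averaging rule are assumptions made for this example. They are not taken from any published sentiment resource, and the sketch does not cover the sentiment pattern database.

```python
import re

# Hypothetical toy lexicon: word -> valence on a -5..+5 scale.
# The entries are illustrative assumptions, not a published resource.
TOY_LEXICON = {
    "excellent": 4,
    "good": 3,
    "okay": 1,
    "slow": -1,
    "bad": -3,
    "terrible": -4,
}


def score_text(text, lexicon=TOY_LEXICON):
    """Return a rating in [-5, 5]: the mean valence of the lexicon words found."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[token] for token in tokens if token in lexicon]
    if not hits:
        return 0.0  # no sentiment-bearing words were found
    mean = sum(hits) / len(hits)
    return max(-5.0, min(5.0, mean))  # clamp to the rating scale


print(score_text("The camera is excellent but the battery is bad."))  # 0.5
```

A lexicon-only scorer like this ignores negation and context ("not good" still scores as positive), which is one reason a real analyzer also uses sentiment patterns.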
Parts of Speech (POS) Taggers
NLP uses a parts of speech tagger to tag and classify words as verbs, nouns, adjectives and so on in any language, such as Arabic, Sanskrit or Hindi.
Parts of speech tagging procedures are comparatively straightforward for European languages, but Asian and Middle Eastern languages are harder. A Sanskrit part of speech tagger uses the treebank technique. For Arabic, a support vector machine is used to automatically tokenize, parts of speech tag and annotate base phrases in Arabic text.
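To make the idea of learning tags from annotated text concrete, here is a minimal sketch of a most-frequent-tag baseline tagger. It is not the Sanskrit treebank tagger or the Arabic support vector machine system mentioned above. The tiny hand-tagged corpus `TOY_TREEBANK`, the tag names, and the fallback tag for unknown words are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged sentences standing in for a treebank.
TOY_TREEBANK = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("quick", "ADJ"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("run", "NOUN"), ("helps", "VERB")],
]


def train(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(words, model, default="NOUN"):
    """Tag each word with its most frequent tag; unknown words get `default`."""
    return [(word, model.get(word.lower(), default)) for word in words]


model = train(TOY_TREEBANK)
print(tag(["The", "quick", "dog", "barks"], model))
```

A baseline like this cannot resolve words whose tag depends on context, which is where statistical models with richer features come in.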
Chunking
Chunking is also called shallow parsing. It labels segments of sentences with syntactic categories such as Noun Phrase (NP) and Verb Phrase (VP). Every word receives exactly one chunk tag, often a Begin Chunk tag (B-NP) or an Inside Chunk tag (I-NP).
The CoNLL 2000 shared task plays an important role in chunking. It provides training and test data for chunking. Chunking systems typically use features composed of words, POS tags, and chunk tags.
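The Begin/Inside tagging scheme is easier to see in code. The sketch below groups tokens into chunks from their tags. It assumes tags of the form `B-XX`, `I-XX`, or `O` (outside any chunk), and the function name `bio_to_chunks` and the example sentence are chosen for illustration only.

```python
def bio_to_chunks(tokens, tags):
    """Group tokens into (chunk_type, text) pairs from B-/I-/O chunk tags."""
    chunks = []
    current_type, current_words = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, chunk_type = tag.partition("-")
        # An I- tag continues the open chunk only if the chunk type matches.
        continues = prefix == "I" and chunk_type == current_type
        if not continues:
            if current_words:  # close the chunk that was open
                chunks.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
            if prefix in ("B", "I"):  # a stray I- tag also opens a chunk
                current_type = chunk_type
        if prefix in ("B", "I"):
            current_words.append(token)
    if current_words:
        chunks.append((current_type, " ".join(current_words)))
    return chunks


tokens = ["He", "reckons", "the", "current", "account", "deficit"]
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP"]
print(bio_to_chunks(tokens, tags))
# [('NP', 'He'), ('VP', 'reckons'), ('NP', 'the current account deficit')]
```

Because the B- tag marks where a chunk starts, two adjacent noun phrases stay separate even though both carry NP tags.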
Named Entity Recognition (NER)
Some people do not use traditional or standard English, for example in tweets. Such text substantially degrades the performance of standard natural language processing tools, and this is where Named Entity Recognition plays an important role.
Annotating the phrases or tweets and building tools trained on unlabelled in-domain and out-of-domain data improves performance compared to standard natural language processing tools.
Emotion detection
Emotion detection is similar to sentiment analysis, but it works on social media platforms where mixing of two languages (English plus any Indian language) is prominent. It categorizes statements into six groups based on the emotion they express. During this process, ambiguous words that are common to both a given regional or local language and English are tagged with a lexical category, or part of speech, in the mixed script. This is done by identifying the base language of the speaker.
Semantic Role Labeling
Semantic role labeling (SRL) is used to analyze the semantic structure of a sentence. For example, in the PropBank formalism, one assigns roles to words that are arguments of a verb in the sentence.
SRL consists of the following stages:
- Creating a parse tree
- Identifying which parse tree nodes represent the arguments of a given verb
- Finally classifying these nodes to compute the corresponding SRL tags
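The three stages above can be sketched on a toy example. Everything here is an assumption made for illustration: the parse tree is written by hand rather than produced by a parser, the argument finder only looks at the top level of the sentence, and the position-based rules that hand out the PropBank-style labels `ARG0`, `ARG1`, and `ARG2` stand in for the trained classifier a real SRL system would use.

```python
# Stage 1: a parse tree, hand-written here as (label, child, ...) tuples.
# A real system would obtain this tree from a syntactic parser.
TOY_TREE = (
    "S",
    ("NP", "Mary"),
    ("VP", ("V", "sold"), ("NP", "the", "book"), ("PP", "to", "John")),
)


def leaves(node):
    """Return the words covered by a tree node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def find_arguments(tree, verb):
    """Stage 2: pick the nodes that are candidate arguments of `verb`."""
    subject, candidates = None, []
    for child in tree[1:]:
        if isinstance(child, str):
            continue
        if child[0] == "NP":
            subject = child  # remember the NP that precedes the verb phrase
        elif child[0] == "VP" and ("V", verb) in child[1:]:
            if subject is not None:
                candidates.append(("before", subject))
            for sibling in child[1:]:
                if not isinstance(sibling, str) and sibling != ("V", verb):
                    candidates.append(("after", sibling))
    return candidates


def classify(candidates):
    """Stage 3: assign a role label to each candidate (toy rules)."""
    roles = []
    for position, node in candidates:
        if position == "before":
            role = "ARG0"
        elif node[0] == "NP":
            role = "ARG1"
        else:
            role = "ARG2"
        roles.append((role, " ".join(leaves(node))))
    return roles


print(classify(find_arguments(TOY_TREE, "sold")))
# [('ARG0', 'Mary'), ('ARG1', 'the book'), ('ARG2', 'to John')]
```

Keeping argument identification separate from classification mirrors the staged description: the second stage narrows down which nodes matter, and the third only has to label that short list.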