Specific Information Retrieval from Unstructured text

Two approaches are used in this project.

The model is used to answer specific questions from a context given to it in the form of a paragraph, i.e. it processes a query and a context. For example, if we want to extract a specific skill set from a CV, we ask the question "What are the skills?". The only alteration needed is in the last output layer of the model.

Tensor conventions and shapes

H (context vector): d X T
Q (query vector): d X J
H (higher-dimensional context vector after passing through the LSTM): 2d X T
U (higher-dimensional query vector after passing through the LSTM): 2d X J
H~ (query-fused context vector): 2d X T
U~ (context-fused query vector): 2d X T
S (similarity matrix): T X J
G (query-aware context vector): 8d X T
M (final output vector): 2d X T

PS: d = dimensionality of the word2vec or GloVe embeddings, T = total number of context words, J = total number of query words.

The BiDAF model consists of six layers, each with a different functionality:

The first three layers build an understanding of the query and the context at different granularities. The last three layers establish the relationship between query and context and give the required output.

The techniques used are bidirectional LSTMs, character embeddings using a CNN, and word embeddings using a pre-trained word2vec or GloVe model.

A bidirectional LSTM is used for effective sequence learning in NLP tasks; here the main objective is relating the query to the context and the context to the query. A bidirectional LSTM includes a backward hidden layer in addition to the forward one, which makes it an effective approach for learning from past-to-future sequences as well as from future-to-past sequences, much like human listeners.
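As a minimal sketch (PyTorch is assumed here; the post does not name a framework), a bidirectional LSTM with hidden size d concatenates its forward and backward hidden states, which is exactly where the 2d X T and 2d X J shapes listed above come from:

```python
import torch
import torch.nn as nn

d, T = 100, 30   # illustrative embedding size and context length

# Forward and backward hidden states are concatenated, so outputs are 2d-dimensional.
bilstm = nn.LSTM(input_size=d, hidden_size=d, bidirectional=True, batch_first=True)

x = torch.randn(1, T, d)   # one context sequence of T word vectors
out, _ = bilstm(x)
print(out.shape)           # torch.Size([1, 30, 200]) -> (batch, T, 2d)
```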

1. Character embedding using CNN –

This is an effective use of CNNs in natural language processing. Each character of a word is converted to a vector representation, so the word is stored as a 2-dimensional matrix. 1D convolutions with different filter widths are then applied over this matrix to form a large number of feature maps (at least 300 in total). Max pooling over each feature map gives a fixed-size 1D vector representation of the word at the character level. This approach follows the CNN-for-sentence-classification architecture.
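A minimal sketch of this character-level CNN (PyTorch assumed; the character vocabulary size, embedding size, filter widths, and filter counts are illustrative, not the author's exact settings):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharCNN(nn.Module):
    """Embed the characters of a word, apply 1D convolutions with several
    filter widths, then max-pool over positions to get one vector per word."""
    def __init__(self, n_chars=70, char_dim=16, widths=(2, 3, 4), n_filters=100):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, n_filters, kernel_size=w) for w in widths
        )

    def forward(self, char_ids):                    # char_ids: (batch, word_len)
        x = self.embed(char_ids).transpose(1, 2)    # (batch, char_dim, word_len)
        pooled = [F.relu(c(x)).max(dim=2).values for c in self.convs]
        return torch.cat(pooled, dim=1)             # (batch, n_filters * len(widths))
```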

2. Word embeddings using word2vec or GloVe –

Each word is also mapped to its pre-trained word2vec or GloVe vector, giving the d-dimensional word-level representation (the d in the shape conventions above).

3. Contextual embedding layer:- This is the learning phase of the model, in which the vectors Q and H are fed into LSTM layers to learn the sequence information from the context and the query separately. A bidirectional LSTM is used to learn, in depth, the temporal relationships between the words of the context and of the query individually. Since this is a bidirectional LSTM, the outputs of the forward and backward networks are concatenated row-wise, so the output finally turns out to be a 2d X T tensor for the context (H) and a 2d X J tensor for the query (U), where T and J denote the number of words in the context and the query respectively.

4. Attention Flow layer:- This layer is responsible for the fusion of information from query to context and from context to query. The output of this layer is G, the query-aware vector representation of the context. This requires a similarity matrix that describes the similarity between context and query, i.e. between the matrices H and U. The similarity matrix is computed as

Similarity matrix calculation: S_tj = alpha(H_:t, U_:j)

Here alpha is a trainable scalar function, S is of size T X J, and H_:t and U_:j are the t-th and j-th column vectors of H and U. Alpha can be realized as alpha(h, u) = w_(S)^T [h ; u ; h * u], where w_(S) is a vector of trainable weights, [;] denotes vector concatenation, and * denotes element-wise multiplication.
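A sketch of this similarity computation (PyTorch assumed), using the column-vector shapes listed earlier, H of shape (2d, T), U of shape (2d, J), and a trainable weight vector w_s of shape (6d,):

```python
import torch

def similarity_matrix(H, U, w_s):
    """H: (2d, T), U: (2d, J), w_s: (6d,). Returns S: (T, J) with
    S[t, j] = w_s . [H_:t ; U_:j ; H_:t * U_:j]."""
    two_d, T = H.shape
    J = U.shape[1]
    Ht = H.t().unsqueeze(1).expand(T, J, two_d)   # H_:t broadcast over j
    Uj = U.t().unsqueeze(0).expand(T, J, two_d)   # U_:j broadcast over t
    return torch.cat([Ht, Uj, Ht * Uj], dim=2) @ w_s
```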

1. Context to Query:- each context word attends over all the query words: a_t = softmax(S_t:) over the J query positions, and U~_:t = sum_j a_tj U_:j. Thus U~ (2d X T) is the new query vector with all the similarity to the context fused in.

2. Query to Context:- finding which context words are most similar to the query words, using the similarity matrix: b = softmax(max_col(S)) over the T context positions, h~ = sum_t b_t H_:t, and h~ is tiled T times across the columns to give H~ (2d X T).

Building the vector G (the query-aware context)

An intuition for the vector G is that it concatenates H and U~ together with their element-wise interactions with H~ (see the definition of beta below), because we do not want any loss of information due to summarization; the entire information should stay intact.

Definition of beta: G_:t = beta(H_:t, U~_:t, H~_:t) = [H_:t ; U~_:t ; H_:t * U~_:t ; H_:t * H~_:t], an 8d-dimensional column vector.
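A sketch of the two attention directions and the construction of G (PyTorch assumed), reusing the similarity matrix from the sketch above:

```python
import torch
import torch.nn.functional as F

def attention_flow(H, U, S):
    """H: (2d, T), U: (2d, J), S: (T, J). Returns G: (8d, T)."""
    # Context-to-query: each context word attends over all query words.
    a = F.softmax(S, dim=1)                        # (T, J)
    U_tilde = U @ a.t()                            # (2d, T): query fused with the context
    # Query-to-context: weights over the context words most relevant to the query.
    b = F.softmax(S.max(dim=1).values, dim=0)      # (T,)
    H_tilde = (H @ b).unsqueeze(1).expand_as(H)    # (2d, T): h~ tiled T times
    # beta: G_:t = [H_:t ; U~_:t ; H_:t * U~_:t ; H_:t * H~_:t]
    return torch.cat([H, U_tilde, H * U_tilde, H * H_tilde], dim=0)
```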

5. Modelling layer:- This layer takes G as input and passes it through two layers of bidirectional LSTM to learn the temporal relationships between the words of the context, which is now query-aware. This is completely different from the contextual embedding layer, which learns relationships within the query and within the context separately. Since this is bidirectional, it outputs M, a 2d X T matrix whose column vectors take into account the contextual interaction between query and context.

6. Output Layer:- This is the last layer, and its input is M. For example, in our task of skill extraction we ask the query "What are the skills?", or a series of queries can be executed, such as "What are the educational qualifications?", "What are the projects?", "What are the internships?", etc.

We obtain the probability distribution of the start index of the answer over the entire paragraph by p_start = softmax(w_(p_start)^T [G ; M]), where w_(p_start) is a trainable weight vector.

For the end index of the answer phrase, we pass M through another bidirectional LSTM layer and obtain M2 of shape 2d X T. Then we use M2 to obtain the probability distribution of the end index in a similar manner: p_end = softmax(w_(p_end)^T [G ; M2]).
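A sketch of the two output heads (PyTorch assumed). Here G and M are passed row-major, i.e. as T X 8d and T X 2d, because nn.Linear expects the feature dimension last:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OutputLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.w_start = nn.Linear(10 * d, 1, bias=False)   # [G; M] has 8d + 2d features
        self.end_lstm = nn.LSTM(2 * d, d, bidirectional=True, batch_first=True)
        self.w_end = nn.Linear(10 * d, 1, bias=False)     # [G; M2] has 8d + 2d features

    def forward(self, G, M):                              # G: (T, 8d), M: (T, 2d)
        p_start = F.softmax(self.w_start(torch.cat([G, M], dim=1)).squeeze(1), dim=0)
        M2, _ = self.end_lstm(M.unsqueeze(0))             # M2: (1, T, 2d)
        p_end = F.softmax(self.w_end(torch.cat([G, M2.squeeze(0)], dim=1)).squeeze(1), dim=0)
        return p_start, p_end                             # two distributions over the T positions
```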

The Loss

Logarithmic loss: L = -(1/N) * sum_i [ log p_start(y_i_start) + log p_end(y_i_end) ], averaged over the N training examples, where y_i_start and y_i_end are the true start and end indices of the i-th answer.

Running this once will yield only one classified skill. Since our project is about extracting the complete skill set, while looking at the start index we also have to consider the second- and third-highest probabilities, so that we classify the maximum number of skills in a CV with maximum accuracy; the same applies to the end indices. In other words, after finding one skill, remove that chunk of text from its start index to its end index, run the softmax probabilities over the remaining text, and repeat until the end of the document is reached.
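A sketch of this iterative extraction (a hypothetical helper, not code from the post): take the most probable span, mask it out, and search again until no confident span remains:

```python
import torch

def extract_spans(p_start, p_end, max_spans=10, min_prob=0.05, max_len=10):
    """p_start, p_end: (T,) probability distributions over token positions."""
    p_start, p_end = p_start.clone(), p_end.clone()
    spans = []
    for _ in range(max_spans):
        s = int(torch.argmax(p_start))
        e = s + int(torch.argmax(p_end[s:s + max_len]))   # best end at or after the start
        score = float(p_start[s] * p_end[e])
        if score < min_prob:                              # no confident span left
            break
        spans.append((s, e, score))
        p_start[s:e + 1] = 0.0                            # mask the extracted chunk
        p_end[s:e + 1] = 0.0
    return spans
```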

In the second approach we use the NLTK POS (part-of-speech) tagger to separate the words into nouns, verbs, adjectives, etc. We separate out the nouns and adjectives, since most skills fall under these parts of speech. NLTK builds trees, i.e. relationships between the nouns (NN), adjectives, verbs, etc. present in a sentence.

NLTK POS tagger architecture
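A minimal sketch of this candidate-extraction step with NLTK (resource names can vary slightly between NLTK versions):

```python
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def candidate_phrases(text):
    """Return the nouns and adjectives in the text: the candidate skills."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    return [word for word, tag in tagged if tag.startswith(("NN", "JJ"))]

print(candidate_phrases("Experienced Python developer with strong SQL skills"))
```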

Deep Learning architectures for candidate classification

The nouns and adjectives we feed in may or may not be skills, so we need a network architecture to separate skills from non-skills; hence we use the following architecture.

Skill classifier architecture

Since we cannot feed the LSTM with raw words, the words need to be converted to vectors using a word2vec or GloVe model. We convert each word to a 50-dimensional vector.
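For example, with gensim's pre-trained 50-dimensional GloVe vectors (the "glove-wiki-gigaword-50" download is assumed; word2vec vectors can be swapped in the same way):

```python
import numpy as np
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-50")   # downloads on first use

def vectorize(word):
    """Return the 50-d vector for a word, or zeros if it is out of vocabulary."""
    return glove[word] if word in glove else np.zeros(50, dtype=np.float32)

print(vectorize("python").shape)   # (50,)
```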

1st LSTM Input layer

This layer is used to feed in the candidate phrase, i.e. the nouns and adjectives extracted above.

2nd LSTM layer

This layer is used to feed in the context. For a given window size n, we take the n neighbouring words to the right and the n words to the left of the candidate phrase; the vector representations of these words are concatenated into a variable-length vector and passed to the LSTM layer. We found that the optimal value is n = 3.
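A small sketch of how such a window can be built (a hypothetical helper, not code from the post):

```python
def context_window(tokens, start, end, n=3):
    """Return the n words to the left and n words to the right of the
    candidate phrase tokens[start:end]."""
    left = tokens[max(0, start - n):start]
    right = tokens[end:end + n]
    return left + right
```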

Dense Layer

The dense layer is used to establish the relationship between the phrase and its context, i.e. (phrase + context), and it has a fixed input shape.

The outputs of these three layers are concatenated and passed through a series of dense layers. In order to handle sequences of variable length, we use appropriate padding and pad all vectors to the length of the longest one. The output can be used to identify the start and end indices of the skills in the CV text via a probability distribution, and the loss can be binary cross-entropy.
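A sketch of the three-branch classifier described above (PyTorch assumed; the hidden sizes and the fixed-size phrase+context feature input are illustrative). It outputs the probability that a candidate phrase is a skill and can be trained with nn.BCELoss:

```python
import torch
import torch.nn as nn

class SkillClassifier(nn.Module):
    def __init__(self, emb_dim=50, hidden=64, dense_in=100):
        super().__init__()
        self.phrase_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)    # branch 1: candidate phrase
        self.context_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)   # branch 2: +/- n context words
        self.phrase_context_dense = nn.Linear(dense_in, hidden)          # branch 3: fixed-size phrase+context input
        self.head = nn.Sequential(
            nn.Linear(hidden * 3, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, phrase_vecs, context_vecs, phrase_context_feats):
        _, (h1, _) = self.phrase_lstm(phrase_vecs)     # (batch, phrase_len, 50) -> last hidden state
        _, (h2, _) = self.context_lstm(context_vecs)   # (batch, window_len, 50) -> last hidden state
        h3 = torch.relu(self.phrase_context_dense(phrase_context_feats))
        joint = torch.cat([h1[-1], h2[-1], h3], dim=1) # concatenate the three branches
        return torch.sigmoid(self.head(joint)).squeeze(1)   # P(candidate is a skill)
```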

Thank you… Hope you enjoyed it.
