The Thirty-Second Innovative Applications of Artificial Intelligence Conference (IAAI-20)
Implicit Skills Extraction Using Document
Embedding and Its Use in Job Recommendation
Akshay Gugnani,¹ Hemant Misra²*
¹IBM Research - AI, ²Applied Research, Swiggy, India
[email protected], hemant.misra@swiggy.in
*This author contributed during his tenure in IBM.
Abstract
This paper presents a job recommender system to match
resumes to job descriptions (JD), both of which are non-
standard and unstructured/semi-structured in form. First, the
paper proposes a combination of natural language process-
ing (NLP) techniques for the task of skill extraction. The per-
formance of the combined techniques on an industrial scale
dataset yielded a precision and recall of 0.78 and 0.88 respec-
tively. The paper then introduces the concept of extracting implicit skills, that is, skills which are not explicitly mentioned in a JD but may be implicit in the context of geography, industry or role. To mine and infer implicit skills for a JD, we find the
other JDs similar to this JD. This similarity match is done in
the semantic space. A Doc2Vec model is trained on 1.1 Mil-
lion JDs covering several domains crawled from the web, and
all the JDs are projected onto this semantic space. The skills
absent in the JD but present in similar JDs are obtained, and
the obtained skills are weighted using several techniques to
obtain the set of final implicit skills. Finally, several similar-
ity measures are explored to match the skills extracted from a
candidate’s resume to explicit and implicit skills of JDs. Em-
pirical results for matching resumes and JDs demonstrate that
the proposed approach gives a mean reciprocal rank of 0.88,
an improvement of 29.4% when compared to the performance
of a baseline method that uses only explicit skills.
1 Introduction
Formal job search and application typically involves match-
ing one’s profile or curriculum vitae (CV) with the available
job descriptions (JD), and then applying for those job oppor-
tunities whose JDs are the closest match to one's CV, while also considering one's needs, constraints, and aspirations.
A few of the things that a person may consider while do-
ing this matching are: a) required skills mentioned in the
JDs and skills possessed by self, b) current salary versus
salary offered in the new job, c) future prospects after join-
ing the new job, etc. Some of the entities are easy to extract
from a JD, for example, the salary offered in a job. However,
some other entities, for example, skill extraction (are Python and Java an animal and an island in Indonesia, respectively, or two object-oriented programming languages?) and future
prospects of a company (it is subjective as well as depen-
dent upon market conditions), need serious consideration.
Though tremendous progress has been made in general
purpose search engines, job search engines have made only
modest progress. The reasons for this could be several, and
some of them are: a) CVs and JDs are typically not written the way well-formatted articles in newspapers are, b) CVs may contain tables and other formatting
features to make them look attractive, but this makes it dif-
ficult to obtain relevant information from them, c) matching
skill keywords between CVs and JDs may not yield good
results because of the complex link between the skills, d)
the JDs may be too descriptive or too simplistic, and may
not uncover the essence of the offered positions and roles.
Considering all of these, and also other factors, automated
job search engines need research investment to make them
robust and improve their performance. This paper proposes
methods to overcome some of the above-mentioned chal-
lenges.
The fundamental premise this paper builds upon is that
skills are one of the most important aspects while matching
CVs to JDs, and play a major role in recommending JDs
which are the best match for a CV. The following are the
important contributions of this paper:
• A new methodology which combines several natural language processing (NLP) techniques for robust skill extraction from naturally occurring text in CVs and JDs is proposed.
• An approach for inferring implicit skills in a JD (skills not explicitly mentioned in the JD) is introduced, and a method to extract these implicit skills from other similar JDs is presented.
• A bi-directional matching algorithm to match skills between CVs and JDs is suggested to obtain the most relevant job recommendation for each CV.
The rest of the paper is organized as follows: In Section 2,
related work from the areas of skill extraction and job rec-
ommendation is briefly described. Data sources used in this
paper and proposed approach are explained in Sections 3
and 4, respectively. The performance of the proposed job
recommendation system is presented in Section 5 followed
by conclusions.
2 Related Work
We have divided the related work into two broad sections:
1. Skills Extraction or Taxonomy Generation Systems
2. Job Recommendation Systems
2.1 Skills Extraction/Taxonomy Generation
A candidate acquires skills through formal education, vo-
cation, internships, and/or previous jobs’ experience. In due
course of time, the candidate may start identifying (new) relevant jobs on the basis of these acquired skills. The
key function of a job search engine is to help the candidate
by recommending those jobs which are the closest match
to the candidate’s existing skill set. This recommendation
can be provided by matching skills of the candidate with
the skills mentioned in the available JDs. A common ap-
proach while doing a skill match is to use standard keyword-
matching or information retrieval framework as explained in
Salton and Buckley (1988). A few challenges of such approaches are: a) The skill may be mentioned in different
forms or in terms of synonyms (e.g. cplusplus, c++; pro-
gramming, scripting, etc.) in CVs and JDs, b) There could
be skills that may not be specified in a candidate’s profile or
a JD, but can be easily determined by business knowledge
(for example, ‘java’ being an object-oriented programming
(OOP) language, its experience also indicates experience of
OOP), and c) A skill could be an out of dictionary skill, that
is, a not-so-common skill-term missing in the dictionary or
from a new unseen domain for which the system may not
have skills. A framework for skill extraction and normal-
ization was proposed in Zhao et al. (2015). In this paper, a taxonomy of skills was built and Wikipedia was utilized for skill normalization. In Kivimäki et al. (2013), the authors proposed a system for skill extraction from documents, primarily targeting hiring and capacity management in an
organization. The system first computes similarities between
an input document and the texts of Wikipedia pages and then
uses a biased, hub-avoiding version of the Spreading Activa-
tion algorithm on the Wikipedia graph to associate the input
document with skills.
Colucci et al. (2003) introduced the concept of implicit
skills. Inspired by their work we have explored a new
method in this paper to mine implicit skills using word and
document embeddings. In Lau and Sure (2002), the authors described a methodology for application-driven development
of ontologies, with a sample instantiation of the methodol-
ogy for skills ontology development. In Bastian et al. (2014),
the team at LinkedIn built a large-scale topic extraction
pipeline that included constructing a folksonomy of skills
and expertise and implementing an inference and recom-
mender system for skills.
2.2 Job Recommendation Systems
The main idea of a job recommendation system is to provide
a set of (job) recommendations in response to a user’s cur-
rent profile. In these systems, the users typically can upload
their skills or resume or their job search criterion; similarly,
the employers or their agents can upload JDs or skills set
needed etc along with information such as location, position
and other job specific details.
In the job matching space, many studies have been con-
ducted to discuss different recommender systems related to
the recruiting problem as explored in the survey by Al-Otaibi
and Ykhlef (2012). Among them, Malinowski et al. (2006) discussed a bilateral matching recommendation system to bring people together with jobs using an Expectation Maximization (EM) algorithm, while Göleç and Kahya (2007) delineated a fuzzy model for competency-based employee evaluation and selection with fuzzy rules. Paparrizos et al. (2011) used a Decision Table/Naive Bayes (DTNB) hybrid classifier. To accomplish the task of job recommenda-
tion, these systems used many manual attributes and vari-
ous information retrieval techniques. However, such recom-
mendation systems typically are only effective within a sin-
gle organization where there are standardized job roles. At
an industry sector level such as Information Technology or
across such different industry sectors (such as retail, insur-
ance, health care), mining and recommending the most rel-
evant career paths for a user is still an unsolved research
challenge.
In the literature, many authors have used machine learn-
ing algorithms also for developing a job recommendation
system. The idea of modeling the career paths and predict-
ing the job transition was proposed in Mimno and McCal-
lum (2008). In this paper, a topic model was used for dis-
covering latent skills from resumes. In Liu et al. (2016), a
time-based ranking model was applied to historical obser-
vations, and a recurrent neural network (RNN) was used to
model sequence properties to perform the task of job recom-
mendation. In our previous work, Gugnani et al. (2018), we
proposed the generation and representation of an individual
user profile in the context of their skills, and generated a temporal skill-graph that can be used to recommend an optimal career path by suggesting the next set of skills to acquire. Recently Lin
et al. (2016) used a convolutional neural network (CNN) to
match resumes with relevant JDs. Recent advances in job
recommendation are covered in the survey by Al-Otaibi and
Ykhlef (2012).
3 Data Sources
We mined the web to extract a heterogeneous mixture of
JDs from various open-source websites. The entire dataset
consists of 1.1 Million mined JDs. It has a substantial mix from multiple domains such as IT/Software, Health-care, Recruiting, Education and 48 others. This data is
used to train our Word2Vec and Doc2Vec models which are
explained further in Section 4. Since no standard large open
source dataset exists for the task of CV to JD matching, we
approached a research team (Maheshwary and Misra (2018)) who had worked on this problem using a deep Siamese network. The dataset borrowed from them consists of 1314 resumes, which came in as part of summer research intern applications at their company, and a set of 3809 JDs from var-
ious domains. We have used this dataset for our full job rec-
ommender system evaluation so that we can compare our
results with some existing published results.
In order to identify a term or phrase as a skill, we created
a base skill dictionary as proposed in Gugnani et al. (2018).
This skill dictionary is created by mining the web for data-
sources rich in skill phrases and field terminologies (Onet, https://www.onetonline.org/, and ComputerHope, https://www.computerhope.com/). The validation of these skill-terms
was done with the help of various techniques explained
in Section 4. This skill dictionary contains 53,293 skill-
terms, and has a diverse mixture of skill-terms across various
domains ranging from basic soft skills to domain-specific
skills.
4 Proposed Approach
Our proposed system consists of three core units, shown in the system architecture diagram in Figure 1.
Figure 1: System Architecture and Flow Diagram
The system includes:
• a skill extraction module,
• a module identifying similar JDs given a JD, and
• a module matching skills from the candidate profile to skills from JDs.
Each of these modules is elaborated in the following sub-
sections. As shown in Figure 1, there are two inputs to the
system. Input 1 is a candidate profile and input 2 is a set
of JDs. The system is designed such that different inputs are
processed asynchronously. In Figure 1, the steps for processing input 1 are labelled Steps A1 to A3, and the steps for processing input 2 are labelled Steps B1 to B5. When a candidate's CV is passed to the Skill Extractor module, it extracts skill-terms from the CV and assigns them to the candidate's profile. Input 2, which holds the input JDs, is processed first by the Doc2Vec model, and then for every JD the best matching 'n' JDs are retrieved. This is followed by the process of extracting implicit skill-terms for each JD; the detailed steps of this process are explained in the sub-sections below.
4.1 Skill Extraction
The Skill Extractor module is used to identify and extract
skill-terms from a given piece of natural unstructured text; these skill-terms can be single-word or multi-word phrases.
As shown in Figure 1, the Skill Extraction module processes
both the candidate profile as well as the JDs to identify po-
tential skill-terms from the unstructured input text. The mod-
ule parses each individual sentence in the input as a single unit of raw text. The Skill Extraction module leverages sev-
eral natural language processing (NLP) techniques to iden-
tify skill-terms (when we refer to a skill-term or a skill-
phrase, we imply a single word or multiple words which
may be a skill). As shown in Figure 2, we have broadly clas-
sified them into the following four modules: a) Named En-
tity Recognition (NER), b) Part of Speech (PoS) Tagger, c)
Word2Vec (W2V), and d) Skill Dictionary.
Figure 2: Skill Identification Flow Diagram
From the raw text, each of the first three modules extracts a set of terms along with a module-specific "score" for each extracted term. Based on this score, the term/phrase is identified as a "Probable Skill", as shown in Figure 2. Combining the four module-specific scores, we compute an overall "relevance score" which indicates how likely an identified term/phrase is to be a skill.
Named Entity Recognition (NER)
NER is a subtask of information extraction that seeks to lo-
cate and classify terms occurring in natural text into pre-
defined categories such as names of persons, organizations,
locations, expressions of times, quantities, monetary values,
percentages, etc. Apart from usual NER, the Watson NLU services (https://console.bluemix.net/docs/services/natural-language-understanding/categories.html) also identify keywords, entities and concepts from
natural text. We leverage them to identify and validate noun-
phrases as skills or technological terms. We parse the input
text through this NLU system and generate a list of terms
that are identified in these three categories. We classify this
list as a set of “Probable Skills” SA (in Figure 2). These
terms are further processed with a Word2Vec model to as-
sign them a relevance score. This step is explained with an
example in the sub-section on Combined Flow (Section 4.1).
Part of Speech (PoS) Tagger
PoS Tagging is the process of marking up a word in a given
text (corpus) as corresponding to a particular part of speech,
based on both its definition and its context, which is its re-
lationship with adjacent and related words in a phrase, sen-
tence, or paragraph. A simplified form of this is the identifi-
cation of words as nouns, verbs, adjectives, adverbs, etc.
To identify how skill-terms are represented in plain text or a JD, we did a manual annotation exercise to identify textual patterns of skill-term occurrence. We asked five domain experts to manually label and annotate, in a few hundred JDs, the
terms/keywords that they recognize as skill-phrases. During
this exercise we observed that the skills are very subjective
and vary not only with the industry or job requirement but also with the person evaluating them. For instance, "writing" may not be regarded as a skill, or as an important skill, for a "Software-Developer" role. Hence, to identify a term as a skill we used inter-annotator agreement: for a term to be a skill, a majority of the annotators must identify it as such.
We then processed the same set of JDs through the Stanford CoreNLP parser for PoS tagging to identify noun, verb and adverb phrases, and obtained their parse trees. We analyzed these JDs and manually annotated how the skills commonly occurred in terms of PoS tags. Based on this data, we developed a generalized set of rules to identify potential occurrences of skills in a sentence. Using the inter-annotator agreement, we defined rules from the PoS tag occurrences of terms which three or more annotators had identified as skill-terms. We leveraged these patterns/rules to identify, in plain text, potential new skill-terms that may not be present in our skill dictionary. An example of a rule is: if a sentence has a comma-separated list of nouns, and one or more of the nouns is a known skill, then the other nouns in the list are probably skills. These rules were programmed into our system and executed on encountering such an instance in any new text; a sketch of the example rule is given below. An example case is discussed in the sub-section on Combined Flow (Section 4.1).
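To make this concrete, the following is a minimal sketch of the comma-separated-noun rule; it is illustrative rather than the authors' production code. It assumes (token, PoS-tag) pairs from a tagger such as Stanford CoreNLP and the base skill dictionary of Section 3; the function name `comma_list_rule` is hypothetical.

```python
def comma_list_rule(pos_tags, skill_dictionary):
    """Return nouns that share a comma-separated list with at least one known skill."""
    runs, current = [], []
    for token, tag in pos_tags:
        if tag.startswith("NN"):            # noun: extend the current list
            current.append(token.lower())
        elif (token == "," or tag == "CC") and current:
            continue                         # separators keep the list open
        else:
            if len(current) > 1:
                runs.append(current)
            current = []
    if len(current) > 1:
        runs.append(current)

    probable = set()
    for run in runs:
        if any(noun in skill_dictionary for noun in run):
            probable.update(noun for noun in run if noun not in skill_dictionary)
    return probable

# Hand-tagged tokens for "... Python, Java, and Octave." (sample sentence, Section 4.1)
tags = [("Python", "NNP"), (",", ","), ("Java", "NNP"), (",", ","),
        ("and", "CC"), ("Octave", "NNP")]
print(comma_list_rule(tags, {"python", "java"}))  # -> {'octave'}
```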
Word2Vec (W2V)
W2V (Mikolov et al. 2013) is a group of related models that are used to produce word embeddings.
W2V takes as its input a large corpus of text and produces a
vector space, typically of several hundred dimensions, with
each unique word in the corpus being assigned a correspond-
ing vector in that vector space. Word vectors are positioned
in the vector space such that words that share common con-
texts in the corpus are located in close proximity to one an-
other in the vector space.
As depicted in Figure 2, our W2V model plays an inte-
gral role in identifying and learning new skill phrases. It is
trained on a corpus consisting of 1.1 Million JDs, varied over
multiple domains, including, but not limited to, IT/Software,
Health-care, Recruiting, Education and 48 other such do-
mains. It also includes the Wikipedia pages of the terms
present in the skill-dictionary.
A W2V model tokenizes using white spaces, hence identifying single-word skill-terms is straightforward. However, in order to improve our results, we operate under the assumption that a skill-term can also be a set of words or a phrase. Some instances of multi-word skills are "Web Development", "Computer Programming", "Proficient at Client Retention", "Hard Working", "Ability to Work Under Pressure", etc. Since we do not know in advance which phrases may be skill-terms, we assume that if the vectors for the individual words of a skill phrase are close to each other in the vector space, then their average will be close to this cluster and will be a good representation of the skill phrase. Hence, for a multi-word phrase detected as a potential skill-phrase, we break the phrase into single words and average their vectors to obtain a vector for the phrase.
In the embedded W2V space, every potential skill-term is compared with the skill-terms existing in the skill dictionary. The highest W2V cosine similarity of the potential skill-term with these known skill-terms is assigned as the potential skill-term's skill-score. We also leverage a user feedback mechanism to learn new skills and improve the scoring of newly identified skills over time. The assigned skill-score indicates the relevance score of each potential skill-term, which in turn indicates how likely the identified term or phrase is to be a skill; a sketch of this scoring is given below.
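The following sketch illustrates this scoring, assuming a gensim Word2Vec model trained on the JD corpus; the model file name and the helper names are placeholders, not the authors' artifacts.

```python
import numpy as np
from gensim.models import Word2Vec

model = Word2Vec.load("jd_word2vec.model")  # assumed: trained on the 1.1M-JD corpus

def phrase_vector(phrase):
    """Average the vectors of in-vocabulary words to represent a (multi-word) phrase."""
    vecs = [model.wv[w] for w in phrase.lower().split() if w in model.wv]
    return np.mean(vecs, axis=0) if vecs else None

def skill_score(phrase, skill_dictionary):
    """Highest cosine similarity between the phrase and any known skill-term."""
    v = phrase_vector(phrase)
    if v is None:
        return 0.0
    best = 0.0
    for skill in skill_dictionary:
        u = phrase_vector(skill)
        if u is not None:
            cos = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
            best = max(best, cos)
    return best  # used as the relevance component S4 in the Combined Flow
```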
Skill Dictionary
We curated a base knowledge source of skill-terms that
we use as the initial Skill Dictionary. The base skills were
curated by mining various online public dataset resources
which had well classified labels on terms identified as skills.
These terms were then validated by a team of three anno-
tators as skill-terms by using Wikipedia page and category
information as an additional validation step. The selected
keywords made the initial Skill Dictionary. We focused on
Information Technology and soft-skills as the starting skill
domains to create the initial Skill Dictionary. A few of the resources we mined were Onet and the Hope list. This initial dictionary had 53,293 skill-terms.
This initial Skill Dictionary is further enhanced by our
skill-learning and feedback mechanism, which results in a
dynamically expanding skill dictionary that autonomously
expands to skills of new domains - the process for this is
explained in the following sub-sections.
Combined Flow
In this section we illustrate how the Skill Extraction system
works. Consider the following sentence from a JD: "Need candidates with ability to code in Python, Java, and Octave."
The above sentence is sent as input raw text to the Skill
Extraction system (Figure 2). It passes through its different
modules as described below:
• The sentence is passed to the NER module. The NER module identifies and generates a list of Entities, Keywords and Concepts. The combined list of terms thus generated is referred to as SA. Each term in the list is also assigned a score, denoted by S1, which indicates the confidence level of the identified term being a probable skill. The value S1 can take depends upon how many services (Entity, Keyword and Concept) identified it as a probable skill-term. For the sample sentence, the following terms are identified as probable skills:
SA = {candidate, code, python, java}
• The same sentence is also passed to the PoS tagger module. Here the sentence's syntactic structure is compared with the previously learned pre-defined rules. For the sample sentence, the system identifies the sentence to have a comma-separated list of nouns. The rule then checks with the skill dictionary to find if any of the nouns in the comma-separated list is a known skill. Two nouns from the list map to the known skills "Python" and "Java". This being in complete agreement with the pre-defined rule, the rule is fired, suggesting that the third term, in this case "Octave", is also a likely skill. The potential skill set produced by the rules is referred to as SP:
SP = {octave, python, java}
Each term in the list is assigned a score, denoted by S2, which indicates the confidence level of the identified term being a probable skill. The value S2 can take depends upon how many pre-defined rules matched the syntactic structure and identified the term as a probable skill-term.
• The same sentence is also parsed by the three skill dictionaries (Onet, Hope and Wikipedia), which identify "Python" and "Java" as skill terms. For the sample sentence, SD holds the terms "Python" and "Java":
SD = {python, java}
A term may occur in only one dictionary, in two of the three dictionaries, or in all three. Each dictionary in which a term occurs assigns a weight to that term. For example, if a term occurs in all three dictionaries, all three assign a weight to it; if it occurs in only one, only that dictionary assigns a weight. The combined score assigned by the three dictionaries to each term is denoted by S3.
S1, S2 and S3 take values, after normalization, in the range [0,1], where 0 indicates a term is less likely to be a probable skill and 1 indicates it is more likely to be a probable skill.
Once a sentence has been processed by the three parallel
modules, the union of these lists forms the “Probable Skill
Set”.
Probable Skill Set = SA ∪ SP ∪ SD
Probable Skill Set = {candidate, code, python, java, octave}
Using the W2V model described earlier, a vector representation of each phrase in the Probable Skill Set is obtained. This is compared with the W2V vector representation of each skill-phrase found in the Skill Dictionary. For each phrase of the Probable Skill Set, the cosine similarity between the phrase and all the skill-phrases in the Skill Dictionary is computed in the vector space, and the maximum cosine similarity score is found. This maximum, stored in S4, represents the similarity of the given phrase to its closest skill-phrase in the Skill Dictionary.
Each identified phrase is assigned a “Relevance Score”,
which indicates how likely it is for the given phrase to be a
skill. This score is computed by the following formula:
x = \frac{\alpha S_1 + \beta S_2 + \sum_{n=1}^{3} \gamma_n S_{3,n} + \lambda S_4}{\alpha + \beta + \sum_{n=1}^{3} \gamma_n + \lambda} \qquad (1)

where x is the Relevance Score.
Table 1 shows the weights assigned to the outputs (terms identified as "Probable Skills") of the various modules. The weights were arrived at through empirical evaluation, and those giving the best results were chosen and are shown here.
Table 1: Parameters’ Weight to Compute Relevance Score
Module Weight Symbol Weight Value
PoS α 1
NER β 1
ONet Dictionary γ
1
20
Hope Dictionary γ
2
10
Wikipedia Dictionary γ
3
20
W2V λ 2
Using the formula defined for the "Relevance Score", a final score is computed for each phrase present in the Probable Skill Set. This score is a normalized value in the range [0,1]. Based on empirical evaluation, we set a relevance threshold of 0.35: any term with a relevance score below this value is removed from the "Skill List". For the sample sentence, the phrase "candidate" from the "Probable Skill Set" has a relevance score of 0.24 and hence gets removed from the final skill list generated:
Skills = {python, java, code, octave}
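The following is a minimal sketch of Equation 1 with the Table 1 weights and the 0.35 threshold; the per-module scores are assumed to be pre-normalized to [0,1], and the function names are illustrative.

```python
# Weights from Table 1; alpha and beta are both 1, so the NER/PoS pairing
# of S1 and S2 is immaterial to the result.
ALPHA, BETA, LAMBDA = 1.0, 1.0, 2.0
GAMMA = {"onet": 20.0, "hope": 10.0, "wikipedia": 20.0}  # per-dictionary weights
THRESHOLD = 0.35

def relevance_score(s1, s2, s3, s4):
    """Equation 1. s3 maps dictionary name -> score (0.0 if the term is absent)."""
    num = (ALPHA * s1 + BETA * s2
           + sum(GAMMA[d] * s3.get(d, 0.0) for d in GAMMA) + LAMBDA * s4)
    den = ALPHA + BETA + sum(GAMMA.values()) + LAMBDA
    return num / den

def final_skills(probable_skill_set):
    """probable_skill_set: dict mapping phrase -> (s1, s2, s3, s4)."""
    return {p for p, scores in probable_skill_set.items()
            if relevance_score(*scores) >= THRESHOLD}
```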
Performance of Skill Extraction Module
We used the Skill Extraction system described in the previ-
ous sections on a set of 100 JDs. These JDs were selected
from a corpus that is not used for training the Skill Extrac-
tion system and is a part of the test dataset. In addition, these
JDs were not seen by the developers who developed the Skill
Extraction system. The JDs and the skill-terms extracted by
the system were then given to four selected annotators to
score the extracted skill-terms in terms of relevance as a
skill. A score of 0 was given for a term that had been incorrectly extracted as a skill (false positive), while a score of 1 was given for a term that had been correctly extracted as a skill (true positive). We also asked the annotators to analyze and annotate skill-terms which were missed, that is, not extracted by the system but which should have been.
Table 2: Skill Extraction Results for some Sub-Systems and the Full System on 100 JDs

Skill | Dictionary Match | NLU | Dic+NLU+W2V | Full System: Dic+NLU+W2V+PoS
Yes | 811 | 695 | 1002 | 1158
No | 0 | 849 | 392 | 320
Total | 811 | 1544 | 1394 | 1474
Missed | 395 | 511 | 204 | 158
Precision | 1.00 | 0.45 | 0.72 | 0.78
Recall | 0.67 | 0.58 | 0.83 | 0.88
F1-Score | 0.80 | 0.51 | 0.77 | 0.83
Table 2 shows the analysis of the output of the skill extractor module (Full System: Dic+NLU+W2V+PoS) run on 100 JDs. For the 100-JD dataset, the module extracted a total of 1474 skills, out of which 1158 were identified as skills (Yes) and 320 were not identified as skills (No) by the annotators. According to the annotators, there were 158 skills which the module failed to identify. The performance in terms of precision/recall/F1-score (the F1-score is the harmonic mean of precision and recall) is also shown in the table. The performance of some sub-systems is shown in the same table for comparison. As expected, the dictionary-based system (Dictionary Match) has very high precision but low recall. On its own, the NLU-based system (NLU) performs very poorly. The combined systems (Dic+NLU+W2V and Full System: Dic+NLU+W2V+PoS) show better performance than the individual systems (Dictionary Match and NLU), and the Full System performs the best among all the systems.
We observe that the skill extraction system on the dataset of 100 JDs has a precision of 0.78 and a recall of 0.88, giving an F1-score of 0.83. This may be better than the performance of the related system of Javed et al. (2017); however, it has to be noted that the lack of access to their evaluation dataset makes a direct comparison difficult.
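The reported figures can be reproduced directly from the Table 2 counts for the Full System:

```python
# Yes (true positives), No (false positives) and Missed counts from Table 2.
tp, fp, missed = 1158, 320, 158
precision = tp / (tp + fp)                          # ~0.78
recall = tp / (tp + missed)                         # 1158 / 1316 ~ 0.88
f1 = 2 * precision * recall / (precision + recall)  # ~0.83
print(round(precision, 2), round(recall, 2), round(f1, 2))
```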
In order to check the generalization of the module’s per-
formance, the same experiment was repeated on a larger
dataset of 275 JDs. The results obtained were similar and consistent with our initial findings on the 100-JD dataset, showing the convergence of the statistics on a larger dataset. In fact, we observe that the performance metrics are slightly better on 275 JDs. A possible reason for this could be that
the randomly selected 100 JDs for the first experiment were
from a rather difficult sample.
Table 3: Skill Extraction Performance

System | Precision | Recall | Accuracy | F1-Score
100 JDs | 0.78 | 0.88 | 0.71 | 0.83
275 JDs | 0.80 | 0.93 | 0.75 | 0.86
4.2 Identifying Similar JDs
The premise of our proposed approach is to identify implicit
or latent skills and use them to improve job recommenda-
tion for candidates. We define an implicit/latent skill as one
which has not been directly or explicitly mentioned in a JD
but may be relevant for the job role. For instance, if a JD is for the role of an assistant and mentions that the suitable candidate needs to be well versed with Microsoft Word, this JD has the inherent implicit skill of being able to operate a computer. Our system populates every JD with relevant implicit skills and uses this knowledge to make better recommendations.
To identify an approach for extracting latent or implicit skills, we analyzed hundreds of JDs. We found that similar JDs often omitted certain skills, depending on how the JDs were written. We theorized that a union of the skills of similar JDs would result in a set of skills consistent with the original job role. This premise was evaluated by taking a few JDs and analyzing the skills from them and their similar JDs; the hypothesis was found to hold in general.
In order to effectively obtain relevant similar JDs, we
crawled, mined and curated a list of over 1.1 Million JDs
from various online portals and job listings as described in
Section 3. These jobs were selectively extracted from mul-
tiple sources, spreading over a wide and varied set of domains and job roles. To identify similar JDs, we first tried a generic approach to finding similar documents: clustering similar jobs using a simple information retrieval framework based on term frequency-inverse document frequency (TF-IDF). We generated results with modified query representations in Apache Lucene. For comparison, we also performed an analysis in the Doc2Vec vector space to find the similarity between JDs. We evaluated both methods by calculating the Mean Reciprocal Rank (MRR). MRR is a statistical measure for evaluating any process that produces a list of possible responses to a sample of queries, ordered by the probability of correctness. We observed that the MRR of Doc2Vec's ranked output (0.63) was much higher than that of Lucene (0.51). Therefore, we decided to use Doc2Vec to find similar JDs.
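For reference, a minimal MRR computation over the 1-based ranks of the first relevant result per query (names illustrative):

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """first_relevant_ranks: rank of the first correct JD per query (0 if none found)."""
    return sum(1.0 / r for r in first_relevant_ranks if r > 0) / len(first_relevant_ranks)

print(mean_reciprocal_rank([1, 2, 1, 4]))  # (1 + 0.5 + 1 + 0.25) / 4 = 0.6875
```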
The Doc2Vec model was trained on the 1.1 million JDs. We performed experiments for hyper-parameter tuning to identify suitable parameter values for Doc2Vec. The following were found suitable: vector size=300, window=8, min_count=5, alpha=0.0254, min_alpha=0.001, workers=25 and train_epoch=500. Based on this Doc2Vec model, each individual JD from the input set is passed to Doc2Vec to find up to 10 similar JDs from the training corpus. We use a threshold of 0.59 similarity or higher; this value was obtained by manual validation of the top matches and their relevance to the input JD. These top 10 similar JDs are tagged with the input JD. We pass these (the input JD and its similar JDs) to the Skill Extractor system. The skills obtained from each input JD are tagged as explicit skills, and the skills extracted from the top 10 similar JDs are tagged as "probable implicit skills". These explicit and implicit skills are then passed to the matching algorithm. A sketch of this retrieval step is given below.
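A minimal gensim sketch of this step under the stated hyper-parameters; `load_jds` is a hypothetical loader for the crawled corpus, and the training call is schematic rather than the authors' exact pipeline.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical loader yielding (jd_id, jd_text) pairs for the 1.1M crawled JDs.
corpus = [TaggedDocument(words=text.lower().split(), tags=[jd_id])
          for jd_id, text in load_jds()]

model = Doc2Vec(vector_size=300, window=8, min_count=5, alpha=0.0254,
                min_alpha=0.001, workers=25, epochs=500)
model.build_vocab(corpus)
model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

def similar_jds(input_jd_text, topn=10, threshold=0.59):
    """Return up to `topn` training JDs whose similarity exceeds the threshold."""
    vec = model.infer_vector(input_jd_text.lower().split())
    return [(jd_id, sim) for jd_id, sim in model.dv.most_similar([vec], topn=topn)
            if sim >= threshold]
```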
4.3 Matching Candidate CV and JD
At this stage, we have skills extracted from a candidate’s
profile and the explicit and probable implicit skills extracted
from each JD. We need a method to compare and match the
skills at both sides and generate a list of best matching JDs
for each CV. This is a bipartite graph matching problem.
During experiments, it was observed that between a set of a candidate's skills and a set of a JD's skills, it is highly likely that one set has more skills than the other. This implies that if we matched skills in only one direction, then in general a JD with relatively more skills than the other JDs would tend to match better with all the CVs. Similarly, if a candidate's skills outnumber those of the JDs, there will be a likelihood of more JD matches (and higher scores) for that candidate. Both these scenarios could result in poor matching, leading to poor job recommendations. To overcome this challenge, we decided to perform matching from both sides and then compute an affinity score to remove the variance. We performed a greedy maximal match from the smaller set of skills to the larger set and a maximum matching from the larger set to the smaller set of skills. Once both these scores are computed, an "Affinity Score" is generated for the JD by averaging the scores from both matchings. The "Affinity Score" measures how suitable the recommendation is for the given candidate and JD, and ranges over [0,1], with 0 being a poor match and 1 being a strong match.
Computing the Affinity Score
Given a bipartite graph, a greedy maximal match picks the best match from left to right such that the net score of the match is highest. A maximum matching is a matching that contains the largest possible number of edges. We note that every maximum matching is a maximal matching, but not every maximal matching is a maximum matching. We take into
account various heuristics for the matching to take place and
for an edge to be formed between a candidate’s set of skills
and a JD’s set of skills.
If a skill is present in both sets (CV skills and JD skills), it results in an "Edge Score" of 1, and the two occurrences are matched to one another. We remove the common skills from both sets and assign them an edge weight of 1, and the count of common skills is added in order to account for them in the affinity score.
It was noticed during experiments that the following factors correlate strongly when comparing skills: a) cosine similarity using W2V, b) the frequency factor of the skill in the document, and c) a boost based on the skill being explicit or implicit. Hence the edge weight computation was designed to incorporate these factors. The following formula is defined for computing the edge weight when a successful match has been found:
Y = \frac{\omega_1 E_1 + \omega_2 E_2 + \omega_3 E_3}{\omega_1 + \omega_2 + \omega_3} \qquad (2)
where Y is the Edge Weight, E1 is the cosine-similarity score between the skills obtained from the W2V model, and E2 is the frequency-factor score for a skill, calculated as the total frequency of the skill across all the documents divided by the number of documents. E3 for explicit skills is given a value of 1, since they are directly relevant across both documents; in contrast, for implicit skills E3 is given a value of 0.5. In the matching algorithm, an edge is only formed between two skills from either set based on their edge score. The parameters for edge scoring are listed in Table 4: the edge score gives 50% weight to the cosine similarity, 20% to the frequency and 30% to the boost for being an explicit skill. The "Affinity Score" of the JD is then calculated as the average of all the edge weights it has. A sketch of this computation is given after Table 4.
Table 4: Parameters for Edge Weight Computation

Factor | Symbol | Weight
Cosine Similarity Score (W2V) | ω1 | 0.5
Frequency Score | ω2 | 0.2
Explicit-Implicit Score | ω3 | 0.3
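The sketch below implements Equation 2 with the Table 4 weights, together with a simplified one-direction greedy match; the full system additionally computes a maximum matching in the opposite direction and averages the two. The `w2v_cosine`, `freq` and `is_explicit` callables are assumed stand-ins for the components described above.

```python
W1, W2, W3 = 0.5, 0.2, 0.3   # Table 4: cosine, frequency, explicit/implicit weights

def edge_weight(e1_cosine, e2_frequency, explicit):
    """Equation 2: weighted average of the three edge factors."""
    e3 = 1.0 if explicit else 0.5            # explicit skills are boosted over implicit
    return (W1 * e1_cosine + W2 * e2_frequency + W3 * e3) / (W1 + W2 + W3)

def affinity_score(cv_skills, jd_skills, w2v_cosine, freq, is_explicit):
    """Greedy match from the smaller skill set (both sets of strings) to the larger."""
    common = cv_skills & jd_skills           # exact matches get edge weight 1
    small, large = sorted((cv_skills - common, jd_skills - common), key=len)
    edges, used = [1.0] * len(common), set()
    for s in small:                          # greedily take the best free partner
        candidates = [(edge_weight(w2v_cosine(s, t), freq(t), is_explicit(t)), t)
                      for t in large if t not in used]
        if candidates:
            w, t = max(candidates)
            edges.append(w)
            used.add(t)
    return sum(edges) / len(edges) if edges else 0.0
```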
5 Results: Matching CV and JD
In the absence of any standard large open-source dataset for the job recommendation task, this paper uses the dataset from Maheshwary and Misra (2018) so that we can compare results. To validate the recommendation algorithm we created two sub-datasets. In the first sub-dataset we selected a set of 25 candidate profiles and 100 JDs, and in the second we selected a set of 200 candidate profiles and 10,000 JDs. We created these two separate sub-datasets to observe whether performance varies across small and large datasets.
We obtained the top 10 recommended JDs for a candidate resume from our system (a) first without using the implicit skills and (b) then using the implicit skills. The recommendations were evaluated by two hiring experts, and the results are shown in Table 5.
Table 5: Performance for 25 Candidates' Profiles

System | A@1 | A@3 | A@5
System without Implicit Skills | 0.68 | 0.76 | 0.88
System with Implicit Skills | 0.88 | 0.96 | 1.00
We observe that the accuracy of our system for the highest/first recommended job is 0.68 without the use of implicit skills. With the implicit skills being used, the accuracy goes up to 0.88. We additionally observe that the accuracy at the 5th recommendation is 1 when considering implicit skills; this means that when using the implicit skills, our system is able to recommend a perfect job match within the top five recommendations.
When compared with the system of Maheshwary and Misra (2018), which uses a Siamese network for recommendation on the same dataset of 25 candidate profiles, our system shows an overall improvement of 6.67% in recommending relevant jobs.
The performance of the proposed system for the larger set of 200 candidate profiles is shown in Table 6, and the performance trend is seen to be replicated. We observe that, with implicit skills, the accuracy at the 1st recommendation is 0.84 and the accuracy at the 5th recommendation is 0.98. The results of this experiment give us confidence that the proposed method using implicit skills generalizes across the two datasets and hence would likely be consistent on an even larger industrial dataset.
Table 6: Performance for 200 Candidates' Profiles

System | A@1 | A@3 | A@5
System with Implicit Skills | 0.84 | 0.95 | 0.98
6 Conclusions and Future Work
In this paper, we have proposed a novel framework for job recommendation: a skill extraction technique is introduced to identify and infer, for each JD, implicit skills that may not be explicitly mentioned in the original JD. When these implicit skills are used along with the typically extracted explicit skills to generate the recommendations, our initial results indicate that an improved and better set of job recommendations is obtained on a locally established and previously published dataset.
In addition, the paper proposed a generalizable ensemble
method for skill extraction from unstructured text of resumes
as well as JDs. Our ensemble consists of sub-modules such
as a Watson-driven NLU system (for extracting concepts,
keywords and entities), a PoS tagging system whose out-
put PoS patterns were mapped for skill identification and
an expandable dictionary whose base dictionary was seeded
from several online open-source knowledge bases. The performance of the proposed ensemble method was compared with that of the manual annotators; an accuracy of more than 0.90 and an F1-score of more than 0.83 were obtained on
two different job datasets. Moreover, several scoring algo-
rithms were explored for matching skills extracted from re-
sumes with (explicit and implicit) skills extracted from JDs.
Due to the non-availability of a standard large open-source dataset for the job recommendation task, we evaluated our system on the dataset used in Maheshwary and Misra (2018).
Though our results on this dataset are better than the ones reported in the original paper, our immediate next steps would involve using a more diverse dataset, a stronger evaluation of the system by including more resumes, and leveraging additional techniques for skill identification and extraction.
Our future work in this space will involve generating ranked
recommendations on different career path options that opti-
mally utilize the accumulated skill and experience of a can-
didate. We also intend to use the skill graph for inferring the professional growth of a user and leveraging that for better recommendations. This could help a professional in understanding where he/she stands when compared to his/her peers. Application of the skill graph for professional-growth inference could also help in comparing two organizations in terms of the professional growth of their employees. We propose using these skill graphs to infer the skill-gap in a candidate profile, and to use this as an additional recommendation to the user. Additionally, the system can be used to analyze the cost of acquiring a skill and recommend better skills on which to get trained.
7 Acknowledgments
This work was guided by our colleague Danish Contractor
who provided expertise that greatly assisted this research.
The deployment work was supported by our colleagues Pad-
manabha Venkatagiri Seshadri and Vivek Sharma.
We are also immensely grateful to Renuka Sindhgatta and Bikram Sengupta for their comments on earlier versions of the manuscript and their support for this work.
References
Al-Otaibi, S. T., and Ykhlef, M. 2012. A survey of job
recommender systems. International Journal of Physical
Sciences 7(29):5127–5142.
Bastian, M.; Hayes, M.; Vaughan, W.; Shah, S.; Skomoroch,
P.; Kim, H.; Uryasev, S.; and Lloyd, C. 2014. Linkedin
skills: large-scale topic extraction and inference. In Proceed-
ings of the 8th ACM Conference on Recommender systems,
1–8. ACM.
Colucci, S.; Di Noia, T.; Di Sciascio, E.; Donini, F. M.;
Mongiello, M.; and Mottola, M. 2003. A formal approach
to ontology-based semantic match of skills descriptions. J.
UCS 9(12):1437–1454.
Göleç, A., and Kahya, E. 2007. A fuzzy model for competency-based employee evaluation and selection. Computers & Industrial Engineering 52:143–161.
Gugnani, A.; Kasireddy, V. K. R.; and Ponnalagu, K. 2018.
Generating unified candidate skill graph for career path rec-
ommendation. In 2018 IEEE International Conference on
Data Mining Workshops (ICDMW), 328–333. IEEE.
Javed, F.; Hoang, P.; Mahoney, T.; and McNair, M. 2017.
Large-scale occupational skills normalization for online re-
cruitment. In AAAI, 4627–4634.
Kivimäki, I.; Panchenko, A.; Dessy, A.; Verdegem, D.; Francq, P.; Bersini, H.; and Saerens, M. 2013. A graph-based approach to skill extraction from text. In Proceedings of TextGraphs-8: Graph-based Methods for Natural Language Processing, 79–87.
Lau, T., and Sure, Y. 2002. Introducing ontology-based skills management at a large insurance company. In Proceedings of Modellierung 2002, 123–134.
Lin, Y.; Lei, H.; Clement Addo, P.; and Li, X. 2016. Machine
Learned Resume-Job Matching Solution. ArXiv e-prints.
Liu, K.; Shi, X.; Kumar, A.; Zhu, L.; and Natarajan, P. 2016.
Temporal learning and sequence modeling for a job recom-
mender system. In Proceedings of the Recommender Sys-
tems Challenge, 7. ACM.
Maheshwary, S., and Misra, H. 2018. Matching resumes to
jobs via deep siamese network. In Companion of the The
Web Conference 2018 on The Web Conference 2018, 87–
88. International World Wide Web Conferences Steering
Committee.
Malinowski, J.; Keim, T.; Wendt, O.; and Weitzel, T. 2006.
Matching people and jobs: A bilateral recommendation ap-
proach. In HICSS.
Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and
Dean, J. 2013. Distributed representations of words and
phrases and their compositionality. In Advances in neural
information processing systems, 3111–3119.
Mimno, D., and McCallum, A. 2008. Modeling career path
trajectories.
Paparrizos, I. K.; Cambazoglu, B. B.; and Gionis, A. 2011.
Machine learned job recommendation. In RecSys.
Salton, G., and Buckley, C. 1988. Term-weighting ap-
proaches in automatic text retrieval. Information processing
& management 24(5):513–523.
Zhao, M.; Javed, F.; Jacob, F.; and McNair, M. 2015. Skill:
A system for skill identification and normalization. In AAAI,
4012–4018.