Resumes are inconsistent. Even the best OCR parsing leaves you with messy, unstructured data. Then, as a candidate moves through the application process, humans get involved, adding free-form text reviews of the applicant along with both linguistic and personal biases. In addition, each data source is siloed, which limits the analytical opportunity.
After assessing multiple companies' hiring processes, we found three consistent opportunities to systematically improve hiring outcomes using NLP-based machine learning. The problem areas are correctly structuring candidate resume data, assessing job fit, and reducing human hiring bias. We address them with:
NLP algorithms to get diverse candidate resume data into a relational database.
A proprietary adaptive job skill table to assess whether a candidate is good for the job by analyzing extracted resume phrases.
Sentiment analysis to reduce bias in job postings.
Blue Orange Digital started with the most pressing issue: isolated data sources. We designed a custom ETL pipeline to blend, clean, and standardize the different data sets.
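The blend, clean, and standardize step can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: it assumes two hypothetical sources (an applicant-tracking export and a recruiter spreadsheet) that share a candidate email, and every field name and record here is invented for the example.

```python
def standardize(record):
    """Normalize one raw record: trim whitespace, lowercase keys and the email."""
    cleaned = {
        key.strip().lower(): (value.strip() if isinstance(value, str) else value)
        for key, value in record.items()
    }
    cleaned["email"] = str(cleaned.get("email") or "").lower()
    return cleaned


def blend(*sources):
    """Merge records from several sources into one row per candidate email."""
    merged = {}
    for source in sources:
        for raw in source:
            row = standardize(raw)
            if not row["email"]:
                continue  # skip records that cannot be matched to a candidate
            target = merged.setdefault(row["email"], {})
            for key, value in row.items():
                # keep the first non-empty value seen for each field
                if value not in ("", None) and key not in target:
                    target[key] = value
    return merged


# Hypothetical sample data from two siloed systems.
ats_export = [{"Email": " Ana@Example.com ", "Name": "Ana Ruiz", "Stage": "onsite"}]
recruiter_notes = [{"email": "ana@example.com", "name": "", "review": "Strong SQL skills"}]

candidates = blend(ats_export, recruiter_notes)
```

Keying on a normalized email is only one possible matching rule; real sources often need fuzzier matching on names or phone numbers.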
We then worked to parse and categorize free-form text so that unstructured documents could be searched and analyzed intelligently. Using keyword detection classifiers, optical character recognition, and cloud-based NLP engines, we were able to scrub the text and turn it into relational data. With structured data, we provided a fast, interactive, and searchable business analytics dashboard in Amazon QuickSight.
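To make the text-to-relational step concrete, here is a small sketch that covers only the keyword detection part; OCR and cloud NLP engines are left out. The keyword list, table names, and sample resume are assumptions made for illustration.

```python
import re
import sqlite3

# Hypothetical keyword list; a real system would use a far richer vocabulary.
SKILL_KEYWORDS = {"python", "sql", "excel", "aws"}
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def parse_resume(text):
    """Extract an email address and known skill keywords from raw resume text."""
    match = EMAIL_PATTERN.search(text)
    tokens = set(re.findall(r"[a-z+#]+", text.lower()))
    return {
        "email": match.group(0).lower() if match else None,
        "skills": sorted(SKILL_KEYWORDS & tokens),
    }


def load(conn, parsed):
    """Store one parsed resume as rows in two related tables."""
    cur = conn.execute("INSERT INTO candidate (email) VALUES (?)", (parsed["email"],))
    conn.executemany(
        "INSERT INTO candidate_skill (candidate_id, skill) VALUES (?, ?)",
        [(cur.lastrowid, skill) for skill in parsed["skills"]],
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidate (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE TABLE candidate_skill (candidate_id INTEGER, skill TEXT)")

resume_text = "Jo Park, jo.park@example.com. Built reports in SQL and Python on AWS."
load(conn, parse_resume(resume_text))

# Once the data is relational, it can be queried like any other table.
rows = conn.execute(
    "SELECT c.email, s.skill FROM candidate c "
    "JOIN candidate_skill s ON s.candidate_id = c.id ORDER BY s.skill"
).fetchall()
```

Splitting candidates and skills into separate tables is what makes the data searchable from a dashboard: each skill becomes a row that can be filtered and counted.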
With a cleaned and structured data set, we were able to perform both sentiment analysis and subjectivity detection on the text to reduce bias in human assessments of candidates.
We used ML to automate resume screening, shortlisting and grading candidates by learning from existing employees’ resumes. First, we used a natural language processing ML algorithm to turn the unstructured resume text into relational data. Then we built a second ML algorithm that trained on prior employees to learn which resume data points (inputs) correlate with successful employees, and used it to produce a shortlist of qualified candidates for the position (output). Instead of just scanning for keywords, we are able to make predictive hiring suggestions to HR. In addition, for firms that use digitized interviews, we can use machine learning to assess candidates’ personality and job fit by learning from successful candidates’ facial expressions and word choices.
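As a rough sketch of the second step, the example below fits a small logistic regression on prior employees and ranks new candidates by predicted score. Logistic regression is our stand-in for illustration; the text does not say which algorithm was used. The two features (scaled years of experience and fraction of required skills matched), the labels, and all data values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, epochs=2000, lr=0.1):
    """Fit logistic-regression weights with plain batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += error
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


def score(model, x):
    """Predicted probability that a resume resembles a successful employee."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical training data: [experience (scaled 0-1), fraction of skills matched].
prior_employees = [[0.9, 0.8], [0.7, 0.9], [0.8, 0.6], [0.2, 0.3], [0.1, 0.5], [0.3, 0.1]]
was_successful = [1, 1, 1, 0, 0, 0]
model = train(prior_employees, was_successful)

# Hypothetical applicants, ranked from highest to lowest predicted score.
applicants = {"A": [0.85, 0.7], "B": [0.15, 0.2], "C": [0.6, 0.6]}
scores = {name: score(model, features) for name, features in applicants.items()}
shortlist = sorted(scores, key=scores.get, reverse=True)
```

The output is a ranked list rather than a yes/no filter, which matches the idea of handing HR a graded shortlist instead of a keyword match.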
We used sentiment analysis to identify potentially biased language in job descriptions. The program is fed inputs containing words like “aggressive,” which are perceived as masculine-sounding, and words like “collaborative,” which are perceived as feminine-sounding. By analyzing the words used in a job posting, the program outputs suggested replacement words, because such wording may discourage female candidates from applying.
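A simplified version of this check can be written as a word-list lookup, which is a much cruder technique than a trained sentiment model. In the sketch below, only "aggressive" and "collaborative" come from the description above; the remaining entry and every suggested replacement are assumptions for illustration.

```python
import re

# Hypothetical lexicon: word -> (perceived coding, suggested replacement).
GENDER_CODED = {
    "aggressive": ("masculine", "proactive"),
    "dominant": ("masculine", "leading"),
    "collaborative": ("feminine", "team-oriented"),
}


def review_posting(text):
    """Return (word, perceived coding, suggestion) for each flagged word."""
    findings = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in GENDER_CODED:
            coding, suggestion = GENDER_CODED[word]
            findings.append((word, coding, suggestion))
    return findings


def apply_suggestions(text):
    """Rewrite the posting by swapping each flagged word for its suggestion."""
    def swap(match):
        entry = GENDER_CODED.get(match.group(0).lower())
        return entry[1] if entry else match.group(0)
    return re.sub(r"[A-Za-z]+", swap, text)


posting = "We need aggressive, collaborative sellers."
findings = review_posting(posting)
rewritten = apply_suggestions(posting)
```

In practice the suggestions would more likely be shown to a human editor than applied automatically, since a replacement that fits one sentence may not fit another.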