A Fortune 500 hedge fund wanted to quantify the characteristics of successful hires and develop predictive hiring indicators to filter candidate applications. They had 10 years of largely unstructured data (free text, scans, and emails) drawn from resumes, third-party sources, and interview notes. The goal was to standardize this data for analysis and to surface non-standard factors correlated with hiring success.
Vertical: Talent Analytics
Model: SVM/Random Forest
To systematically standardize the data and quantify the hiring pipeline, we applied a range of data science techniques to two foundational stages of that pipeline.
Unstructured to Structured Data Processing
- We used pLSA/LDA topic modeling on resumes to extract structured attributes from the associated unstructured text.
- We applied SVM, Random Forest, and other models to classify and clean the extracted content, based on weighted factors provided by the subject-matter experts (SMEs).
- We first implemented a weighted heuristic model to establish a benchmark.
- To enable improved, standardized candidate ranking, we then used a logistic regression model trained on an extensive engineered feature set.
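As a rough illustration of the topic-modeling step, the sketch below fits LDA with a collapsed Gibbs sampler in plain Python. It is a minimal sketch under stated assumptions, not the project's pipeline: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy tokenized resumes are all hypothetical, and pLSA is not shown. Each resume's topic proportions (or its dominant topic) could then be stored as structured attributes.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with a collapsed Gibbs sampler.

    Returns (doc_topic, topic_word): per-document topic proportions and,
    for each topic, a dict mapping every vocabulary word to its probability.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]               # doc-topic counts
    nkw = [defaultdict(int) for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                                # tokens per topic

    # Randomly initialize a topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Sample a new topic from the full conditional p(z = t | rest).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Posterior-mean estimates of theta (doc-topic) and phi (topic-word).
    doc_topic = [
        [(ndk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (nkw[t][w] + beta) / (nk[t] + V * beta) for w in vocab} for t in topics
    ]
    return doc_topic, topic_word


if __name__ == "__main__":
    # Hypothetical tokenized resume snippets.
    resumes = [
        "python statistics regression modeling python".split(),
        "trading derivatives options risk trading".split(),
        "statistics modeling python data".split(),
        "options hedging risk derivatives".split(),
    ]
    doc_topic, topic_word = lda_gibbs(resumes, num_topics=2)
    for t, dist in enumerate(topic_word):
        print(t, sorted(dist, key=dist.get, reverse=True)[:3])
    # Dominant topic per resume, usable as a structured attribute.
    print([max(range(2), key=row.__getitem__) for row in doc_topic])
```

On real resumes, a library implementation with proper tokenization and stop-word removal would be the more practical choice; the hand-rolled sampler is only meant to show what the model estimates.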
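The classification-and-cleaning step can be sketched with a linear SVM trained by Pegasos-style stochastic sub-gradient descent. How the SME-provided weighted factors entered the models is not specified, so this sketch simply assumes they act as per-feature scaling weights applied before training; `apply_sme_weights`, `train_linear_svm`, `predict`, and the toy "keep vs. discard" feature vectors are hypothetical. Random Forest and the other models mentioned are not shown.

```python
import random


def apply_sme_weights(X, sme_weights):
    """Scale each feature by a (hypothetical) SME-provided importance weight."""
    return [[xj * sj for xj, sj in zip(x, sme_weights)] for x in X]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Train a linear SVM (hinge loss, L2 penalty) with Pegasos-style SGD.

    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the last weight acts as a bias
    (regularized along with the other weights, a common simplification).
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the L2 term
            if margin < 1.0:  # hinge loss is active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, X):
    """Return +1 (keep) or -1 (discard) for each row of X."""
    return [
        1 if sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) >= 0 else -1
        for x in X
    ]


if __name__ == "__main__":
    # Hypothetical features for extracted snippets; +1 = keep, -1 = noise.
    X = [[2.0, 1.0], [1.5, 2.0], [3.0, 1.5], [-1.0, -1.5], [-2.0, -1.0], [-1.5, -2.5]]
    y = [1, 1, 1, -1, -1, -1]
    Xw = apply_sme_weights(X, [2.0, 1.0])
    w = train_linear_svm(Xw, y)
    print(predict(w, Xw))
```

Scaling features before training is only one way to inject expert priorities; per-sample weights or class weights would be equally plausible readings of the original description.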
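A weighted heuristic benchmark of the kind described can be as simple as a weighted sum of structured attributes, with ranking quality measured against historical outcomes. In this minimal sketch the attribute names, weights, `successful` outcome flag, and the choice of precision@k as the benchmark metric are all hypothetical.

```python
def heuristic_score(candidate, weights):
    """Weighted sum of a candidate's structured attributes (missing ones count as 0)."""
    return sum(w * float(candidate.get(attr, 0.0)) for attr, w in weights.items())


def rank_by_heuristic(candidates, weights):
    """Sort candidates from highest to lowest heuristic score."""
    return sorted(candidates, key=lambda c: heuristic_score(c, weights), reverse=True)


def precision_at_k(ranked, k, outcome_key="successful"):
    """Fraction of the top-k ranked candidates with a positive historical outcome."""
    top = ranked[:k]
    return sum(1 for c in top if c.get(outcome_key)) / k


if __name__ == "__main__":
    # Hypothetical weights and candidates with known outcomes.
    weights = {"quant_topic": 0.5, "years_experience": 0.1, "referral": 0.3}
    candidates = [
        {"id": "A", "quant_topic": 0.9, "years_experience": 5, "referral": 1, "successful": True},
        {"id": "B", "quant_topic": 0.2, "years_experience": 10, "referral": 0, "successful": False},
        {"id": "C", "quant_topic": 0.7, "years_experience": 2, "referral": 0, "successful": True},
        {"id": "D", "quant_topic": 0.1, "years_experience": 1, "referral": 0, "successful": False},
    ]
    ranked = rank_by_heuristic(candidates, weights)
    print([c["id"] for c in ranked])      # A, B, C, D
    print(precision_at_k(ranked, k=2))    # 0.5
```

Any learned model can then be evaluated with the same metric on the same historical candidates, which is what makes the heuristic useful as a benchmark.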
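The logistic regression ranking step can be illustrated with an L2-regularized model fit by batch gradient descent, then used to sort candidates by predicted probability of success. This is a minimal sketch, assuming binary historical outcomes as labels; the two toy features stand in for the much larger engineered feature set, and `train_logreg`, `rank_candidates`, and the hyperparameters are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.1, l2=0.01, epochs=500):
    """Fit L2-regularized logistic regression with batch gradient descent.

    X: list of feature vectors; y: labels in {0, 1}. Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average log-loss gradient plus the L2 penalty (bias not penalized).
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_candidates(X, w, b):
    """Return candidate indices sorted by predicted success probability, plus the scores."""
    scores = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]
    order = sorted(range(len(X)), key=lambda i: scores[i], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Hypothetical features: [quant_topic_share, scaled_experience]; 1 = successful hire.
    X = [[0.9, 0.5], [0.8, 0.2], [0.7, 0.9], [0.2, 0.8], [0.1, 0.1], [0.3, 0.3]]
    y = [1, 1, 1, 0, 0, 0]
    w, b = train_logreg(X, y)
    order, scores = rank_candidates([[0.15, 0.4], [0.85, 0.4]], w, b)
    print(order, [round(s, 3) for s in scores])
```

Because the output is a probability rather than a hard label, candidates can be ranked on one common scale, which fits the goal of standardized ranking.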