"Native AI Built to Power Data-Driven Hiring" - SmartAssistant White Paper
We believe the future of repetitive recruiting processes is automation. Rather than spending time and company resources on non-value-added tasks, teams will leverage recruiting tools driven by artificial intelligence (AI) to make better hiring decisions, freeing up more time to build meaningful relationships with candidates while improving the bottom line. AI recruitment solutions, which enable hiring teams to quickly execute a number of recruiting functions—especially when sourcing and screening candidates for open roles—are becoming increasingly vital for businesses to compete in a data-driven market.
In our white paper, we explain what AI is, common recruitment AI tools, and the key functionalities that distinguish truly innovative AI solutions. You can find the white paper here: https://www.smartrecruiters.com/resources/landing/whitepaper-smartassistant/
How does it work?
The MatchScore aggregates sub-scores that measure how closely different aspects of a candidate (for example, education, skills, previous job titles) match the expectations of a job. These sub-scores are queried in the context of larger pools (other candidates, historical data) and normalised/standardised. How the score is calculated depends on the requirements of the specific job and on what the system predicts are the most important attributes to have, based on the job advert's description. We use a number of different models trained on application data, job description data, and external data (e.g. public datasets, established taxonomies).
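The aggregation described above can be sketched as a weighted combination of normalised sub-scores. The sub-score names, weights, and 0-100 scaling below are illustrative assumptions for the sketch, not the production algorithm; in the real system the weights are derived from the job description and wider candidate pools.

```python
# Illustrative sketch: combine normalised sub-scores (each in [0, 1])
# into a single MatchScore. Names and weights are hypothetical.

def match_score(sub_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of sub-scores, scaled to a 0-100 MatchScore."""
    total_weight = sum(weights.get(k, 0.0) for k in sub_scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(s * weights.get(k, 0.0) for k, s in sub_scores.items())
    return round(100 * weighted / total_weight, 1)

# Example: skills weighted twice as heavily as education or titles.
score = match_score(
    {"education": 0.8, "skills": 0.9, "job_titles": 0.6},
    {"education": 1.0, "skills": 2.0, "job_titles": 1.0},
)
```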
What data elements are being looked at?
We extract data to process from two key sources: the candidate profile and the available resume. We also plan to include information from the SmartR profile, which is scheduled for H2 2022. From those data sources we extract the following fields:
Job title and job family
Job seniority level
Required skills and tools
Required or desired education level stated in job description
Required certificates (e.g. forklift license)
Jobs held (current and in the past)
Description of each position (if any)
Skills and capabilities demonstrated in past positions
Tenure: in total and per job held
Highest education level (e.g. High School, Bachelors, Post Doctoral)
Education subject/major (e.g. Mathematics, Business)
Institutions attended (e.g. University of California, Los Angeles)
Diplomas/Certificates (e.g. forklift license certification)
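For illustration, the candidate-side fields listed above could be collected into a simple record like the one below. The class and field names are assumptions made for this sketch, not SmartRecruiters' internal schema.

```python
from dataclasses import dataclass, field

@dataclass
class ExtractedProfile:
    """Hypothetical container for the candidate-side fields listed above."""
    jobs_held: list[str] = field(default_factory=list)          # current and past jobs
    position_descriptions: dict[str, str] = field(default_factory=dict)
    demonstrated_skills: list[str] = field(default_factory=list)
    tenure_months_total: int = 0                                # tenure in total
    tenure_months_per_job: dict[str, int] = field(default_factory=dict)
    highest_education_level: str = ""                           # e.g. "Bachelors"
    education_subject: str = ""                                 # e.g. "Mathematics"
    institutions: list[str] = field(default_factory=list)
    certificates: list[str] = field(default_factory=list)       # e.g. forklift license

profile = ExtractedProfile(jobs_held=["Data Analyst"], tenure_months_total=24)
```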
How has it been trained?
SmartAssistant is based on state-of-the-art natural language and machine learning models which have been trained/fine-tuned with years of job and resume data collected in multiple countries. We combined this job data with additional market data covering trends in industries, products, and positions to build the most holistic view possible of every candidate. We used this to build internal representations of various job descriptions, and we enrich that data with additional information that might be missing from job descriptions and resumes. The models, when originally built, used data that we painstakingly analysed, validated, and extensively cleaned to ensure our foundation was of the highest quality.
What infrastructure is required to process our data?
SmartRecruiters has a data science pipeline with multiple services running a collection of different machine learning classifiers that regularly read and canonicalize job and profile data. This service ensures that critical job and profile data, such as location, title, company, and skill, can be standardized across the database, indexed for search and matching, and made available for use on the application, people and discover interfaces.
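One step of the canonicalisation the pipeline performs can be sketched as mapping free-text job titles onto standardised forms so they can be indexed and matched consistently. The lookup table and fallback rule below are illustrative assumptions, not the actual classifiers.

```python
# Hypothetical canonicalisation step: map messy job titles onto
# standardised versions for indexing, search, and matching.
CANONICAL_TITLES = {
    "sr. software eng.": "Senior Software Engineer",
    "sw developer": "Software Engineer",
}

def canonicalize_title(raw: str) -> str:
    """Return the canonical form of a title, falling back to title case."""
    key = raw.strip().lower()
    return CANONICAL_TITLES.get(key, raw.strip().title())
```

In production this step is performed by machine learning classifiers rather than a static table, but the input/output contract is the same: raw text in, a standardised value out.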
What design decisions did we make when creating SmartAssistant?
We made an important decision not to use a black-box approach to machine learning for producing automated decisions. While black-box models lend themselves well to domains such as medical image processing or autopilot systems, where raw predictive performance is the priority, applying them to hiring decisions would result in negative outcomes. This is why SmartAssistant, as the name suggests, exists to assist recruiters by restacking, highlighting, and helping to prioritise the order in which candidates are reviewed; it is not designed to make decisions in an automated way.
We do not develop custom AIs for each of our customers individually, in order to avoid training algorithms on small, and inherently biased, data sets. Instead, we've trained our algorithms on a very large data set containing job descriptions and profiles gathered over many years. This scale is what makes SmartAssistant valuable: it can review more job descriptions and profiles in minutes than thousands of recruiters might see in their entire careers.
We designed SmartAssistant by modularising the support we give to the decision-making process, with classifiers and other models encapsulated in each part so they can be independently audited and improved. Examples of such modular units include "determine demonstrated hard skills and their complexity", "determine studied subjects in relation to positions held", and "determine a pattern of career progression". This approach allows us to tweak and improve each part in a standalone way, knowing that any improvement raises the overall calculation quality and/or speed. It also means we can expand the solution over time by adding, modifying, replacing or removing individual components as needed, and take advantage of future developments in the machine learning industry.
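The modular design described above can be sketched as components that share one interface, so each can be swapped, audited, or improved independently. The interface, component name, and overlap-based scoring below are assumptions for illustration only.

```python
from typing import Protocol

class ScoringComponent(Protocol):
    """Contract every modular unit satisfies: score one aspect of a match."""
    name: str
    def score(self, candidate: dict, job: dict) -> float: ...

class HardSkillsComponent:
    """Hypothetical unit: 'determine demonstrated hard skills'."""
    name = "demonstrated_hard_skills"

    def score(self, candidate: dict, job: dict) -> float:
        required = set(job.get("required_skills", []))
        if not required:
            return 1.0  # nothing required, nothing to penalise
        have = set(candidate.get("skills", []))
        return len(required & have) / len(required)

# Components can be audited or replaced one at a time.
components: list[ScoringComponent] = [HardSkillsComponent()]
```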
We utilise machine learning techniques to determine how each of our specific classifiers should be applied across different jobs, industries, and candidate types. For instance, the system might place lower significance on a candidate's work history if they are just out of college and applied to an entry-level job, compared to a mid-level manager, for whom work history has more relevance.
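The context-dependent weighting in the example above could look like the sketch below. In the real system these weights are learned, not hard-coded; the function, job levels, and thresholds here are purely illustrative.

```python
def work_history_weight(years_experience: float, job_level: str) -> float:
    """Illustrative rule: weight work history lightly for entry-level
    applicants fresh out of college, heavily for managerial roles."""
    if job_level == "entry" and years_experience < 2:
        return 0.2   # recent graduate applying to an entry-level job
    if job_level == "manager":
        return 1.5   # work history matters more for managers
    return 1.0       # neutral default
```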
What do you do to reduce or remove potential bias?
As described in our design decisions above, SmartAssistant uses white-box (not black-box) machine learning models because they allow our Data Science team to interpret the model's outcome in light of its inputs. This makes bias much more straightforward to identify and assess, and therefore improves our ability to reduce or eliminate it. Black-box models, given their complex nature, make it harder, and in some situations almost impossible, to identify bias with confidence.
Every piece of candidate data used to train or test our models is anonymised. This means no personally identifiable information (PII) is used in training the algorithm; it is removed before the data is used. Data in the model used to calculate the MatchScore is aggregated across applications from all clients, so nothing can be directly attributed back to any one client or candidate.
Another reason we anonymise our input data is to reduce the risk of gender and racial bias. A candidate's name is a strong indicator of their gender and/or ethnic background, so any machine learning model that retained this data connected to their experience or education would introduce bias into its results, which is exactly why we remove it before the data touches our solution. We also convert job titles into non-gendered, standardised versions before they are used in our models; for example, Waiter and Waitress are treated identically for analysis.
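The two safeguards just described can be sketched together: stripping direct identifiers and neutralising gendered job titles before any data reaches the models. The field names, identifier list, and title mapping below are assumptions for this sketch.

```python
# Illustrative anonymisation step: drop direct identifiers (PII) and
# map gendered titles to a neutral standardised form.
GENDER_NEUTRAL_TITLES = {
    "waiter": "server",
    "waitress": "server",
}

def anonymise_record(record: dict) -> dict:
    """Return a copy with PII fields removed and titles neutralised."""
    cleaned = {k: v for k, v in record.items()
               if k not in {"name", "email", "phone"}}
    cleaned["job_titles"] = [
        GENDER_NEUTRAL_TITLES.get(t.lower(), t.lower())
        for t in cleaned.get("job_titles", [])
    ]
    return cleaned
```

After this step, Waiter and Waitress carry identical signal, and nothing ties the remaining experience or education data back to a name.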
We also test our models against a large random sample of our candidate data to identify any issues with model bias in general.
The mechanisms behind the MatchScore calculation are not customisable by the users or companies who use the system. This avoids reintroducing bias into the process and lets us maintain control. The same applies to Feedback submitted by users, which is reviewed manually by our Data Science team, again to catch any form of bias before it impacts our models.
How often is SmartAssistant updated and how are changes validated?
The components contributing to the MatchScore calculation are reviewed whenever there is reason to believe that more or better data is available, or when changes in application or job description data are detected. Our Data Science team works constantly to improve the system; once an improvement has been developed, we conduct manual/human inspection of the results as well as statistical evaluation at scale for individual components. This can range from validating that we extracted the right skills from a job description to complex statistical analysis of a model.
Can you share the technical breakdown of each model and ML technique used?
The short answer is no. We do not share technical details of the algorithm, as that information is proprietary. If you have specific questions you would like answered, we ask that they be written down and submitted to the Product Manager, who will work to answer them with as much detail as possible without exposing the inner workings of the solution.