Bias-Proof Hiring: How Skillmint Builds Fair AI-Powered Interviews
Jamie · Aug 14 · 5 min read
Updated: Aug 20
This post talks about some of the things we do at Skillmint to avoid discrimination and bias when evaluating interviews. It should go without saying that this is fundamentally the right thing to do: it leads to better hiring and a fairer society. It is part of Skillmint's core mission and shapes how our systems are built.
In the UK and EU, and most of the wider world, it is also a legal requirement: in the UK through the Equality Act 2010 and in the EU through Council Directive 2000/78/EC. As AI becomes more prevalent in recruitment, led by companies like Skillmint, designing fair and equitable systems will only become more important. The EU AI Act classifies recruitment as a high-risk use case, which comes with a multitude of compliance requirements.
Interestingly, though not unexpectedly given the training data, AI has sometimes been found to mirror the biases of humans. If you have some time to kill, this article on cognitive bias in AI in medicine is fascinating. If you have even more time to learn about cognitive biases, and have already spent some of that talking to us about Skillmint 😀, I can also recommend the excellent book Thinking, Fast and Slow by Daniel Kahneman. Anyway, back to the point: if AI can mirror cognitive biases, we can be pretty confident it has the potential to be biased in hiring too.
Step 1 - Learning from Humans
We are pretty blessed in the recruitment sector to have a large volume of high-quality research on bias in recruiting, much of which is still completely applicable to AI evaluators. Understanding this research helps us design better systems.
The research shows that there are some common biases that occur in regular interviews, which I'll briefly describe below.
Horns or Halo Effect
This bias often occurs at the start of an interview, where a superficial trait, even something like the candidate's attractiveness, initial confidence, or a good or bad answer to the first question, can colour the interviewer's judgement and cause them to mark higher or lower for the rest of the interview.
Affinity Bias
This is where interviewers unconsciously favour candidates who look like them or share similar backgrounds and characteristics. Interestingly, there is evidence that LLMs favour content generated by the same model over content generated by other models and will score it higher; for example, GPT-4o will score a text written by GPT-4o higher than one written by Claude Sonnet.
Panel Drift
This occurs when interviewers have informal discussions about candidate performance before finalising their scores, which can lead to some interviewers changing their scores to conform to the rest of the group.
Information Leakage
Sometimes panels inadvertently receive personal data or information about candidates' protected characteristics, which can skew their evaluations.
Recency and Order Bias
This often happens during long-running interview processes, where the first and last candidates receive different scores (often higher) than they would have at a different point in the process, due to interviewers' unconscious tendency to overweight the answers most easily recalled from memory.
Step 2 - Solving the Problems
Some of the problems above, like remembering all the information or overrating attractive people, are instantly solved by AI; others require a bit more work.
We are doubly blessed in the recruitment sector to have a ton of research into solutions to those issues. Combined with recent advances (as of mid-2025) in AI evaluation, such as LLMs-as-judges, these solutions can be designed into our evaluator to guard against bias.
Solution - Structured Marking Criteria
For Skillmint to work, consistency is key. Human interview evaluation can be struck by Horns/Halo and Drift from candidate to candidate. In AI, inconsistency can also be a killer, because there is no consistent baseline for scoring.
This is where structured marking criteria come in. By creating the criteria from the job description, requirements and additional notes, and scoring every candidate's answers against them, we end up with much more consistent and fairer scoring across candidates. Skillmint then performs additional consistency checks, but the structured marking criteria provide the cornerstone for mitigating bias.
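To make the idea concrete, here is a minimal sketch of rubric-based scoring in Python. The criterion fields, the 1-5 scale and the llm_score() stub are illustrative assumptions, not Skillmint's actual implementation; the point is simply that every candidate is measured against the same fixed criteria.

```python
# A minimal sketch of rubric-based scoring, assuming a 1-5 scale and weighted
# criteria derived from the job description. Names and helpers are illustrative.
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str          # e.g. "Relevant experience"
    description: str   # what a strong answer looks like, taken from the job spec
    weight: float      # relative importance; weights sum to 1.0 across the rubric


def llm_score(answer: str, criterion: Criterion) -> int:
    """Placeholder for a call to an LLM evaluator whose prompt embeds the
    criterion description. Returns a 1-5 score."""
    raise NotImplementedError("swap in your LLM provider call here")


def score_candidate(answers: dict[str, str], rubric: dict[str, Criterion]) -> float:
    """Score every answer against the same fixed rubric and return a weighted total.

    Because each candidate is judged against identical criteria, evaluators
    cannot drift onto their own baseline from one candidate to the next.
    """
    total = 0.0
    for question_id, criterion in rubric.items():
        answer = answers.get(question_id, "")
        total += criterion.weight * llm_score(answer, criterion)
    return total
```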
Solution - Multi-Model Marking
If you haven't ever read about Francis Galton and his observations on the wisdom of crowds in making accurate predictions, it is fascinating, especially when you realise the work is over 100 years old. It has been fundamental in classical machine learning and still holds true for this use case.
The basic idea is that each model is biased in some random direction, but the more independent models you add, the more the biases cancel out and the closer you get to the true result. For us, this maps onto the issue of Panel Drift: by using multiple LLMs from different providers, with different training data sets and architectures, we get more accurate and less biased evaluations.
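A minimal sketch of what multi-model marking can look like is below. The scorer callables stand in for calls to different providers; the aggregation and the disagreement flag are assumptions for illustration rather than Skillmint's production pipeline.

```python
# A minimal sketch of multi-model marking: the same answer and rubric go to
# several independent evaluators and their scores are aggregated.
from statistics import mean, pstdev
from typing import Callable

Scorer = Callable[[str, str], float]  # (answer, rubric) -> score on a 1-5 scale


def panel_score(answer: str, rubric: str, scorers: list[Scorer]) -> dict:
    """Collect independent scores and average them.

    If each model's bias points in a different random direction, the errors
    partially cancel in the mean (Galton's wisdom-of-crowds effect). A large
    spread between evaluators is surfaced rather than hidden.
    """
    scores = [score(answer, rubric) for score in scorers]
    return {
        "scores": scores,
        "final": mean(scores),
        "disagreement": pstdev(scores),  # flag for review if this is large
    }
```

The design choice here is that disagreement is itself a signal: a high spread between models is worth surfacing to a human rather than silently averaging away.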
Solution - Explicit Instructions
With our model structure in place, we can move on to some prompt engineering. We have a large body of instructions for our models, but I'll focus specifically on those around protected characteristics.
Thanks to strong legal structures and documents, we can write very explicit system instructions to ensure protected characteristics never influence a score, giving detailed descriptions of what these characteristics mean. Research also shows that LLMs perform better when provided with examples, and we have found that to be the case with our evaluators, so we give detailed examples in the system prompt.
We also explicitly instruct our models on judgement. We want to evaluate in a fair and unbiased way, and mistaking someone's response as inauthentic when it is not is completely out of the question. Therefore, when the evaluators have any doubt about inauthenticity, the benefit of that doubt always goes to the candidate.
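Here is a sketch of the kind of system prompt this describes. The wording, the worked example and the characteristic list (taken from the UK Equality Act 2010) are illustrative assumptions, not Skillmint's actual instructions.

```python
# A minimal sketch of an evaluator system prompt with an explicit rule on
# protected characteristics, one worked example, and the benefit-of-the-doubt
# instruction. The exact wording is an assumption for illustration.
SYSTEM_PROMPT = """
You are an interview evaluator. Score answers strictly against the marking
criteria provided.

Protected characteristics (age, disability, gender reassignment, marriage and
civil partnership, pregnancy and maternity, race, religion or belief, sex, and
sexual orientation, per the UK Equality Act 2010) must never influence a score,
whether they are stated directly or merely implied.

Example:
  Answer: "Before my career break for maternity leave, I led a team of five..."
  Correct behaviour: score the leadership evidence; the career break and its
  reason are irrelevant and must not affect the score.

If you suspect an answer is inauthentic but are not certain, give the candidate
the benefit of the doubt and score the answer on its content.
"""
```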
Solution - The Evaluator Evaluator
With all those safeguards in place, we can be confident that people have been evaluated fairly. However, at Skillmint, we want to be absolutely certain, so we add an evaluator for our evaluators. If you haven't heard of LLMs-as-judges or juries, then this paper from Cohere is a good read.
Once all the models have made their evaluations, we pass them, along with our marking criteria, the transcript and any other information, to our evaluator evaluator. This agent combs through the answers and evaluations, looking specifically for bias against protected characteristics. If it finds anything, it goes back to the offending evaluator model and tells it to improve its answers. This process repeats until the evaluator evaluator is happy. Only then do we have our final scores.
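A minimal sketch of that review loop is below. The audit_evaluation() and revise_evaluation() stubs are hypothetical helpers, and the retry cap with human escalation is an assumption, not a description of Skillmint's actual agents.

```python
# A minimal sketch of the evaluator-evaluator loop: a judge model checks each
# evaluation for bias and sends it back until no issues remain.

def audit_evaluation(evaluation: dict, transcript: str, rubric: str) -> list[str]:
    """Placeholder for the LLM judge: returns a list of bias findings (empty if clean)."""
    raise NotImplementedError("swap in your judge-model call here")


def revise_evaluation(evaluation: dict, findings: list[str]) -> dict:
    """Placeholder: send the findings back to the offending evaluator for a rewrite."""
    raise NotImplementedError("swap in your evaluator-model call here")


def finalise(evaluation: dict, transcript: str, rubric: str, max_rounds: int = 3) -> dict:
    """Loop until the judge finds no bias, or escalate to a human after max_rounds."""
    for _ in range(max_rounds):
        findings = audit_evaluation(evaluation, transcript, rubric)
        if not findings:
            return evaluation  # judge is satisfied; these are the final scores
        evaluation = revise_evaluation(evaluation, findings)
    evaluation["needs_human_review"] = True  # assumption: unresolved cases go to a human
    return evaluation
```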
Step 3 - Humans
The final step is, of course, a human in the loop. At Skillmint, we want to emphasise that LLMs are a tool to help humans, not replace them.
We have put multiple measures in place to help mitigate bias, and all of this information and reasoning is given to humans to make the final decision. This is good practice, not just for recruitment, but for any decision of significance. Humans will always have a better understanding of the specific context or unique elements that might not be in an LLM's training data. Together they can make effective decisions at a pace that was nowhere near possible before.