AI and Data Analytics in Equine Breeding Programmes

Panel: Kevin Blake (moderator), Anna Mackenzie (Pythia Bloodstock), Dr Sonja Egan (Horse Sport Ireland) and Tom Wilson (Racing²).

THE winners in racing and breeding will be decided not only by what happens on the track or in the sales ring, but increasingly by how the industry collects, interprets, and acts upon data.

That was the central theme of a thought-provoking discussion featuring racing analyst and owner-breeder Kevin Blake, Anna Mackenzie of Pythia Bloodstock, Dr Sonja Egan of Horse Sport Ireland, and Tom Wilson of Racing².

The panel, part of The Irish Field’s AI in Equine Conference at Naas Racecourse, examined how analytics, genetics, and performance tracking are reshaping the bloodstock and racing landscapes.

Opening the session, Blake set the tone by acknowledging his own passion for using data to get an edge in his racing and bloodstock endeavours.

“Racing has been my passion and, in more recent years, that’s meant a much more heightened focus on a data-based approach to analysing breeding, betting, bloodstock, racing, and everything associated with it.

“I think it’s fair to say that we’re in a very exciting time for this space in the racing industry. British and Irish racing has lagged behind some other jurisdictions for a long time in terms of the harvesting of data and the utilisation of it.

“I think there’s been a rapid amount of progress in recent years. So all of a sudden we have this massive glut of data to try and make sense of. And it’s so exciting, I suppose, for those of us that are into such things, to get our elbows into it and try to work it all out.”

Data science

Anna Mackenzie, a data scientist with Pythia Sports, agreed. Her company uses predictive modelling to identify betting and breeding opportunities, applying techniques already commonplace in American sports.

Having started life 10 years ago as a supplier of racing data for a global betting syndicate, Pythia has now branched into bloodstock and is specialising in finding value in the breeze-up market.

“In 2023, we identified top horses like Vandeek, but unfortunately others with deeper pockets than us did so too,” she said. “In 2024, we took a more commercial approach and again managed to identify top horses but also, crucially, to identify horses that went on to be very successful that other people in the market hadn’t necessarily seen.

“And we’ve been doing research and development into yearlings, horses in training and, one day hopefully, looking at mares, foals and optimal matings as well. The possibilities are pretty endless.”

For Dr Sonja Egan, Head of Breeding, Innovation and Development at Horse Sport Ireland, the key lies in genetics and structured breeding programmes.

Stressing that Horse Sport Ireland’s research is ‘open source’ and for the benefit of the industry rather than private interests, Egan said: “We’re very lucky that HSI is a national federation. So I was able to tap into the different areas in the organisation and ask ‘How can we take the information that we have to give the breeder more information about the animals that they’re breeding and help them to make better breeding decisions?’

“In some cases, that’s performance. In some cases, it’s general health.”

DNA analysis

Egan spoke about the DNA analysis Horse Sport Ireland is currently conducting, using hair samples, on horses registered with the organisation. Known as single nucleotide polymorphism (SNP) analysis, it allows the governing body for equestrian sport to replicate what is already being done in the bovine sector.

She added that, while there would always be differing interpretations of race times, breeze-ups and conformation, “the DNA is the DNA” and can’t be argued with.

Egan also made some notable comments on the threat posed by gene-doping, which she said would not be difficult for rogue breeders to attempt. When Blake noted that the British Horseracing Authority has invested heavily in tests to catch gene-dopers in the thoroughbred world, the IHRB senior veterinary officer Dr Lynn Hillyer was seen nodding in agreement.

Tom Wilson, co-founder of Racing², spoke about how his research company uses AI to help bloodstock buyers draw up a shortlist of potential purchases at major sales.

“We provide a pedigree data set on every single individual horse. That is a machine learning model that goes through five generations of the pedigree, generates a series of factors and features and then provides a pedigree score and a pedigree rating per individual.”
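Racing² has not published the details of its model. Purely as an illustration of the kind of five-generation pedigree walk Wilson describes, the Python sketch below gathers a horse’s known ancestors generation by generation and derives two simple features: how complete the five-generation pedigree is (out of a possible 62 ancestors) and an average of the ancestors’ ratings, weighting each ancestor by its expected genetic share, which halves with every generation. The `Horse` class, its `rating` field and both feature names are hypothetical; in a real system, features like these would be inputs to a trained model rather than the pedigree score itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Horse:
    name: str
    rating: Optional[float] = None  # hypothetical performance rating; None if unraced
    sire: Optional["Horse"] = None
    dam: Optional["Horse"] = None


def ancestors_by_generation(horse: Horse, max_gen: int = 5) -> dict[int, list[Horse]]:
    """Group known ancestors by generation (1 = parents, 2 = grandparents, ...)."""
    generations: dict[int, list[Horse]] = {}
    current = [horse]
    for gen in range(1, max_gen + 1):
        parents = [p for h in current for p in (h.sire, h.dam) if p is not None]
        if not parents:
            break  # no further recorded ancestry
        generations[gen] = parents
        current = parents
    return generations


def pedigree_features(horse: Horse, max_gen: int = 5) -> dict[str, float]:
    """Hypothetical features: pedigree completeness and a relationship-weighted rating."""
    gens = ancestors_by_generation(horse, max_gen)
    known = sum(len(ancestors) for ancestors in gens.values())
    possible = sum(2 ** g for g in range(1, max_gen + 1))  # 62 slots in five generations
    weighted_sum = 0.0
    weight_total = 0.0
    for gen, ancestors in gens.items():
        weight = 0.5 ** gen  # expected genetic share of one ancestor in this generation
        for ancestor in ancestors:
            if ancestor.rating is not None:
                weighted_sum += weight * ancestor.rating
                weight_total += weight
    score = weighted_sum / weight_total if weight_total else 0.0
    return {"completeness": known / possible, "weighted_ancestor_rating": score}
```

For a foal whose sire is rated 100 (with a sire of his own rated 120) and whose dam is rated 80, this sketch returns a completeness of 3/62 and a weighted ancestor rating of 96.0.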

Computer vision

Racing² is perhaps best known for its computer vision models, which generate a range of measurements from a short video of each horse’s walk, typically published on the sales company’s website before a sale.

“I’m probably going to see and evaluate 8,000 yearlings in this season’s northern hemisphere yearling sales. The opportunity to generate measurements and data on those individuals at scale presents a big kind of learning opportunity when you look at these large language and AI models.

“A yearling sale is a very condensed period of time. You only have a couple of days to get around the sales grounds and do your inspections.

“That’s a lot of work for a human to get through and also to stay mentally sharp. You’re probably not going to apply the same evaluation or interpretation to the 1,000th horse you’ve seen as you did to the first one.

“The human eye and brain is just going to get a little bit tired, whereas the computer doesn’t.

“The computer generates data and scores horse 1,000 in exactly the same way that it generates data on horse one, so technology is taking human fatigue out of the equation and allowing us to collect data on a systematic level.”

Kevin Blake famously brought stride data analysis into the mainstream when he used it to correctly predict the first three home in this year’s Derby. Anna Mackenzie cited an example of how stride data can also be important in assessing breeze-up purchases.

“We had a couple of very interesting cases from 2023, where two horses breezed in identical times. But their stride data was very different. One horse was consistent in how he worked up to his top speed, while the other was all over the place.

“His data suggested to us that he would struggle to maintain that speed for much more than two furlongs and, sure enough, he was always tailed off in his races. The other horse went on to be rated in the high 90s.”
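Pythia’s actual stride metrics were not described at the conference. As a hedged illustration of how two horses could post identical breeze times yet differ in how they build up speed, the sketch below takes a hypothetical series of per-stride speeds (in metres per second) and measures the spread of stride-to-stride speed changes using the population standard deviation. A horse that accelerates evenly scores close to zero; one whose speed lurches up and down scores higher. The function name and sample figures are invented for illustration.

```python
from statistics import pstdev


def stride_consistency(speeds: list[float]) -> float:
    """Return the spread of stride-to-stride speed changes.

    Lower values mean a smoother, more consistent build-up to top speed.
    `speeds` is a hypothetical per-stride speed series in metres per second.
    """
    if len(speeds) < 3:
        raise ValueError("need at least three strides to measure consistency")
    # Change in speed from each stride to the next.
    deltas = [later - earlier for earlier, later in zip(speeds, speeds[1:])]
    return pstdev(deltas)


# Invented profiles: both start at 10 m/s and finish at 16 m/s.
smooth = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
erratic = [10.0, 13.0, 11.0, 15.0, 12.0, 17.0, 16.0]

print(stride_consistency(smooth))   # 0.0: every stride adds the same speed
print(stride_consistency(erratic))  # larger: the build-up is uneven
```

A single spread figure is only one possible summary; a real analysis would likely also consider stride length, stride frequency and where in the breeze top speed is reached.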

Blake, who has been using data analysis to inform some recent yearling sales purchases, would surely agree.

The AI in Equine Conference was supported by the Department of Agriculture, Food and the Marine under the ETS Scheme.