Measuring second-by-second fan engagement for F1


Understanding audience engagement is becoming increasingly complex. The sheer amount of data available (and the number of places to look for it) has created a new type of problem: extracting clean, representative and honest reactions to media is challenging from both a technical (speed, scope, analysis) and an empirical (size, accuracy, actionability) perspective.

Formula 1 (F1) is one of the world's most successful sports entertainment brands, boasting 503 million fans globally. But if you asked them what they find most exciting about the races, you’d be hard pressed to get a meaningful answer beyond crashes and overtakes. This makes it a particularly tricky audience to grow. As we know from our previous work in pinpointing the appeal of F1, it’s a task that involves modernising the sport and attracting younger audiences, while maintaining its heritage and staying true to its core essence. In order to do that successfully, you need to understand your fans completely. 

Flamingo has been working with Formula 1 for the best part of a year to deliver on this precise challenge. We started with an idea of what brought communities together around racing, and ended with a system for delivering near-real-time fan reactions to minute-by-minute coverage. This helped us shed light on each individual element that made up fan reaction: from the quality of a replay, to the building up of a competitive rivalry over multiple races. 

During 2017-18 our work had focused on community exploration. We spent time understanding and cataloguing how online F1 communities worked. During this analysis, we noticed something interesting. Predictably, fan conversation across multiple forums concentrated on days when races took place. What we didn't expect was that comments were timestamped to the second, and that most people were reacting in real time during the races. By combining the right forums of avid fans and accurately matching all the different timestamps, we could model real-time reactions to a race as it unfolded.
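
To illustrate the timestamp matching described above, here is a minimal Python sketch. It assumes each forum displays post times in a known, fixed UTC offset, which may not hold for every source; the function names, the sample posts and the race start time are all hypothetical and are not taken from the actual system.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def to_utc(local_time: datetime, utc_offset_hours: float) -> datetime:
    """Convert a naive forum timestamp to UTC, given that forum's (assumed) offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return local_time.replace(tzinfo=tz).astimezone(timezone.utc)


def comments_per_second(posts, race_start_utc):
    """Count comments by whole seconds elapsed since the race start."""
    counts = Counter()
    for local_time, offset in posts:
        elapsed = (to_utc(local_time, offset) - race_start_utc).total_seconds()
        if elapsed >= 0:  # ignore pre-race chatter
            counts[int(elapsed)] += 1
    return counts


# Illustrative values only: three forums showing the same moment in different zones.
race_start = datetime(2018, 7, 8, 13, 10, tzinfo=timezone.utc)
posts = [
    (datetime(2018, 7, 8, 14, 10, 5), 1),   # forum A displays UTC+1
    (datetime(2018, 7, 8, 9, 10, 5), -4),   # forum B displays UTC-4
    (datetime(2018, 7, 8, 13, 10, 7), 0),   # forum C displays UTC
]
timeline = comments_per_second(posts, race_start)
```

Once every source is on a single clock, the per-second counts from different forums can simply be added together to form one reaction timeline.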

So, our mission was two-fold. First, determine the validity of synchronised forum posting as a source of live fan reaction to racing. Second, develop an interactive piece of technology to take best advantage of this data source.

We used a number of different scraping methods (depending on source), and the overall amount of data generated was in the millions of comments. 

We pre-processed all the data using a number of natural language processing techniques and stored it in a way that allowed any one section to be accessed as fast as possible. The system essentially dissects millions of conversations into component pieces, so that you can view the salient themes discussed in any selected period of time, be it a week or a five-minute window. New incoming data is integrated with the existing data, creating a growing pool of insights.
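
One simple way to get fast access to an arbitrary time window is to keep comments sorted by timestamp and use binary search. The sketch below is only an illustration of that idea, not a description of the storage layer we actually built; the CommentIndex class and its method names are hypothetical.

```python
from bisect import bisect_left, insort


class CommentIndex:
    """Hypothetical in-memory index of comments, kept sorted by timestamp."""

    def __init__(self):
        self._items = []  # sorted list of (timestamp_seconds, text)

    def add(self, ts, text):
        # New data is inserted in order, so the index stays queryable as it grows.
        insort(self._items, (ts, text))

    def window(self, start, end):
        """Return the comments with start <= timestamp < end."""
        # A 1-tuple sorts before any 2-tuple sharing the same first element,
        # so (start,) locates the first comment at or after `start`.
        lo = bisect_left(self._items, (start,))
        hi = bisect_left(self._items, (end,))
        return [text for _, text in self._items[lo:hi]]


index = CommentIndex()
index.add(10, "what a move")
index.add(5, "lights out")
index.add(20, "safety car")
five_to_twenty = index.window(5, 20)
```

The same query works whether the window is five minutes or a week; only the start and end values change.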

We developed a new algorithm for processing the language, taking inspiration from the betting world, creating an odds table for any single word or sentence. Common terms like 'racing line' or 'Hamilton', which appear in almost any race discussion, are less likely to score highly than descriptive word combinations such as 'max moved first' or 'racing incident'.
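
The algorithm itself is not detailed here, but the general idea of an odds table can be sketched with a smoothed log-odds ratio: how much more likely a phrase is to appear in the selected time window than in the background conversation. This is a stand-in technique chosen for illustration, and every function name, the smoothing value and the sample comments are assumptions.

```python
import math
from collections import Counter


def ngrams(text, n_max=3):
    """Yield every word sequence of length 1..n_max in a comment."""
    tokens = text.lower().split()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def phrase_counts(comments):
    """Count, for each phrase, how many comments contain it."""
    counts = Counter()
    for comment in comments:
        counts.update(set(ngrams(comment)))
    return counts


def odds(count, total, smoothing=0.5):
    """Smoothed odds that a comment contains the phrase."""
    p = (count + smoothing) / (total + 2 * smoothing)
    return p / (1 - p)


def odds_table(window, background):
    """Log-odds ratio of each phrase in the window versus the background."""
    w, b = phrase_counts(window), phrase_counts(background)
    return {
        phrase: math.log(odds(w[phrase], len(window)) / odds(b[phrase], len(background)))
        for phrase in w
    }


# Made-up comments for illustration.
window = ["max moved first", "max moved first there", "hamilton leads"]
background = ["hamilton leads", "hamilton on the racing line",
              "hamilton pits", "great racing line"]
table = odds_table(window, background)
```

Under this scoring, a phrase that is common everywhere ('hamilton') gets a low or negative score, while a phrase that suddenly appears in the window ('max moved first') scores highly.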

Video clips are overlaid on the data to bring fan engagement to life: the system automatically ‘clips’ the video and makes a gif of the moments it finds most exciting. This not only makes the platform much more visceral, but also gives the user a more refined interpretation of the excitement factor, because the visual clip sits alongside the comments. This is particularly useful in complex moments, such as when something amusing or ironic happens, where the language alone would not be sufficient to understand the context.

We also developed a way of calculating what we call RES (Race Engagement Score). RES takes all of our data points and combines them with an assessment of how negatively or positively people reacted to the race. This gives each race a score relative to its season and creates a historical repository of all races, which F1 can use to compare and contrast them.
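
The exact RES formula is not given here. Purely as an illustration of combining volume with sentiment and then scoring each race relative to its season, a sketch might look like the following; the weighting, the z-score normalisation, the function names and the numbers are all assumptions rather than the real calculation.

```python
from statistics import mean, pstdev


def raw_engagement(comment_count, sentiments):
    """Illustrative only: comment volume weighted by mean sentiment in [-1, 1]."""
    return comment_count * (1 + mean(sentiments))


def season_relative_scores(raw_by_race):
    """Express each race's raw value as a z-score within its season."""
    values = list(raw_by_race.values())
    mu, sigma = mean(values), pstdev(values)
    return {
        race: (raw - mu) / sigma if sigma else 0.0
        for race, raw in raw_by_race.items()
    }


# Made-up season of three races with equal volume but different sentiment.
season = {
    "Race A": raw_engagement(100, [0.5, 0.5]),
    "Race B": raw_engagement(100, [0.0, 0.0]),
    "Race C": raw_engagement(100, [-0.5, -0.5]),
}
scores = season_relative_scores(season)
```

Normalising within a season means a score above zero marks a race that engaged fans more than was typical for that year, which keeps races from different seasons comparable.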

“We currently already look at active ways of measuring excitement among fans during live races, with combinations of GSR and dial-testing research. So, we thought it would be apt to compare results to the linguistic modelling, and we couldn’t believe the close collinearity of the results. In fact, in some cases, the tool was more accurate than these tests, as it would show other aspects of a Grand Prix that were entertaining as a fan. Instantly we could see the potential benefits and applications across our digital content team; motorsports team; and other research team analysis. And it really excites us as a department to be applying cutting edge technology to our avid online community.” 

Howell Craske, Data Analytics Manager, Formula One