Examining the Impact of Fighting Styles on MMA Matchups with Machine Learning & Statistical Analysis
I’m a huge fan of combat sports. Growing up in south London, I frequented boxing gyms as a young teen and continued to box for fitness as an adult. What makes combat sports really exciting to me is the high degree of unpredictability: you really get the sense that anything can happen in a fight. That said, millions of armchair fans and professional analysts will usually give you their two cents on who will win a fight, why, and even how. One of the most cliché sayings in combat sports is “styles make fights”. “Styles” refers to the type of technique a fighter might use, whether that is karate, kickboxing, wrestling, a mixture, etc.
This got me curious, and I wanted to explore whether data could reveal any truth in the old cliché. The remainder of this blog post is a deep dive into how I used data about fighter performances in the UFC (Ultimate Fighting Championship) to understand the interactions of fighting styles within different weight categories. I really enjoyed putting this analysis together, and I hope you enjoy reading it.
The fight data is taken from two Kaggle sources.
Data Source 1: Fight Outcomes
Data Source 2: Fight Statistics
For the purposes of this analysis, I isolated specific metrics from the data that indicate a fighter’s fighting style and the outcome of the fight. Capturing a fighter’s fighting style is a particularly challenging endeavor, and I’m not sure there is a consensus on how to do this. I chose to include both fighter-led and opponent-led interactions, keeping the dimensions at the fight level. Recognizing that fighting is such a complex interaction between two people, I felt that it was right to preserve as much of this complexity as possible to truly understand the essence of a fighter’s style. There are alternatives to my chosen approach, and I’ll outline some of these briefly.
- Capture what the fighter controls: My initial thought was to only use fight-style dimensions that a fighter is in control of. On the surface, this seems as simple as capturing fighter-led actions (e.g. strikes attempted, takedowns etc.). You often hear analysts break down a fighter’s performance in this way with statements like, “Fighter x is a volume puncher”. The issue with this approach is that a fight by nature is never a solo endeavor; it takes two to tango, as the old saying goes. Leaving out opponent-led dimensions would exclude critical information about a fighter’s defense, how well they were able to impose their style, and so on.
- The average of a fighter’s fights: Another approach would be to average the chosen fight style dimensions across all fights for any given fighter. This approach has the inherent assumption that a fighter’s style doesn’t change much from fight to fight. This may actually be the case for some fighters. However, many will adapt their style depending on their opponent. Keeping the fight style dimensions at the fight level rather than aggregating at the fighter level allows for a change of styles and may even give information about the consistency of a fighter. For example, a fighter who tends to be in the same cluster regardless of their opponent is displaying some consistency in fight style.
Fighting Style Dimensions — Data Source 2
Each fighting style dimension has a fighter constituent and an opponent constituent. The opponent part is just the fighting style dimension as applied against the fighter the data refers to. For example, “avg_kd_opp” is just how many times the opponent knocked down the fighter the data applies to (for the fighter, the metric would be “avg_kd”). For brevity, I have not included the opponent dimensions in the data dictionary but just referenced that they exist.
A mixed martial arts fight can last up to 5 rounds, so each style dimension is averaged across the rounds in an attempt to capture the nature of the entire fight. For the purposes of clustering, I have adjusted the style dimensions (excluding the stances) by the total number of minutes fought and then standardised them.
- “avg_KD” = Knockdowns
- “avg_SUB_ATT” = Submission attempts
- “avg_REV” = Reversals
- “avg_SIG_STR_att” = Significant strikes attempted
- “avg_SIG_STR_landed” = Significant strikes landed
- “avg_TOTAL_STR_att” = Total strikes attempted
- “avg_TOTAL_STR_landed” = Total strikes landed
- “avg_TD_att” = Takedowns attempted
- “avg_TD_landed” = Takedowns landed
- “avg_HEAD_att” = Head attacks attempted
- “avg_HEAD_landed” = Head attacks landed
- “avg_BODY_att” = Body attacks attempted
- “avg_BODY_landed” = Body attacks landed
- “avg_LEG_att” = Leg attacks attempted
- “avg_LEG_landed” = Leg attacks landed
- “avg_DISTANCE_att” = Distance attacks attempted
- “avg_DISTANCE_landed” = Distance attacks landed
- “avg_CLINCH_att” = Clinches attempted
- “avg_CLINCH_landed” = Clinches landed
- “avg_GROUND_att” = Ground attacks attempted
- “avg_GROUND_landed” = Ground attacks landed
- “avg_CTRL_time(seconds)” = Ground control time (in seconds)
- “Stance_Open Stance” = Open stance
- “Stance_Orthodox” = Orthodox stance
- “Stance_Sideways” = Sideways stance
- “Stance_Southpaw” = Southpaw stance
- “Stance_Switch” = Switch stance
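To make the per-minute adjustment and standardisation described above more concrete, here is a minimal sketch. It assumes the adjustment is a simple division by minutes fought followed by a z-score; the function names and the example numbers are hypothetical and not taken from the dataset.

```python
from statistics import mean, pstdev


def per_minute(value, minutes_fought):
    """Convert a raw fight-level count into a per-minute rate."""
    return value / minutes_fought


def standardise(values):
    """Z-score a list of values: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information, so map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical fights: (significant strikes landed, minutes fought)
fights = [(45, 15.0), (30, 5.0), (12, 3.0)]

rates = [per_minute(strikes, minutes) for strikes, minutes in fights]
z_scores = standardise(rates)
```

Dividing by minutes fought stops a long decision fight from looking more active than a short finish simply because it lasted longer. The stance columns are left out of this step because they are binary flags rather than counts.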
An example of a southpaw fighting stance. The right hand is closest to the opponent; an orthodox stance would have the left hand leading.
In MMA the outcome of a fight can be the following:
- Draw: No winner.
- Win/Loss by KO/TKO: The fighter wins or loses by knockout or technical knockout.
- Win/Loss by M-Dec: The fighter wins or loses by majority judge decision.
- Win/Loss by S-Dec: The fighter wins or loses by split judge decision.
- Win/Loss by DQ: The fighter wins or loses by disqualification.
The approach is a fairly simple two-step analysis. The first step uses the unsupervised machine learning approaches of dimension reduction and clustering to find latent clusters in the data; these are the fight styles. The second step runs a chi-square analysis on the fight results and head-to-heads of each fight style. In layman’s terms, the first step finds styles in the data, and the second analyses the outcomes of style matchups.
Assumptions
I made some important assumptions during the modeling process, which I’ll outline below.
- Assumed that a fighter’s style can change from fight to fight. This is often the case as fighters and their coaches may tailor their approach for specific opponents.
- Assumed that the average output (captured by the style dimension) across a fight is representative of the distribution of these dimensions across the fight. This is a way of saying that the distribution of fight dimensions across a fight is not long-tailed and is roughly normal. There is no real way for me to prove this, as I don’t have access to the unaggregated data.
- Assumed that fighting stances don’t change throughout the fight. This is not always true because a fighter’s stance can change due to injury, position etc. However, this data isn’t captured, so the assumption is fair in this instance.
- Assumed that the data accurately captures style dimensions and is exhaustive. This data has not been independently verified, and there is no way for me to do this.
- Assumed that a fighter’s performance in one fight is independent of their previous performance. In reality, this might not hold true, as a fighter’s last performance may well affect their next.
- Assumed that a fighter’s biomechanics do not impact their style. This isn’t strictly true, but in this case I have made this assumption for simplicity.
I used UMAP and HDBSCAN for dimensionality reduction and clustering. In total there are approximately 49 fight style dimensions (including the opponent dimensions omitted in the data dictionary). This is a large number of dimensions to cluster over, and impossible to visualise. Fortunately, there are dimension reduction techniques that can reduce the 49 fight style dimensions to a more manageable two-dimensional representation while preserving most of the information. I used UMAP for this purpose. I won’t go into the details of UMAP in this article, but I will link to a useful resource that does.
One of the key hyperparameters for the UMAP algorithm is the distance metric which impacts the structure of the reduced dimensional space. The style dimensions relating to stance are binary, while the other style dimensions are continuous. You cannot meaningfully apply the same distance metric to these disparate variable types. For example, the distance between the locations of two cities is continuous meaning the Euclidean distance is easy to conceptualise, but what is the meaning of this distance within a category such as dog-breed? It doesn’t make sense to measure the Euclidean distance between Rottweilers and Poodles for example.
I solved this problem by using the Gower distance, which handles both continuous and binary features.
Here’s how Gower distance handles both binary and continuous variables:
Continuous variables: For continuous variables, Gower distance calculates the dissimilarity using the Manhattan distance between two data points. The Manhattan distance is the absolute difference between the values of the variables. The difference is then scaled by dividing it by the range of that variable (i.e. the difference between the maximum and minimum values) to bring it into the range [0, 1]¹.
Binary variables: For binary variables, Gower distance considers the presence or absence of a feature. There are different types of binary variables:
Symmetric binary variables: Both the presence and absence of the attribute have meaning (e.g. sex: male or female). In this case, Gower distance calculates the dissimilarity as 0 if both data points have the same binary value and 1 if they have different binary values¹.
Asymmetric binary variables: Only the presence of the attribute has meaning (e.g. a disease: either present or not). For asymmetric binary variables, Gower distance calculates the dissimilarity as 0 if both data points have the attribute, and as 1 if one data point has the attribute and the other does not. When neither data point has the attribute, the shared absence is usually left out of the comparison rather than counted as a match¹.
An issue with the Gower distance is that it can favour the binary variables over the continuous ones². To address this, I introduced a weighting on the distance metrics calculated across the binary variables. This isn’t a perfect solution, and the choice of weight itself was realised through trial and improvement.
For those interested I’ve included a code snippet showing the implementation of the overall clustering approach with weighted Gower distances.
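As a rough illustration of the weighted Gower idea, here is a minimal standard-library sketch. It is not the exact code used for the analysis: the function name, the feature layout, and the way the weight enters the average are all assumptions made for the example.

```python
def weighted_gower(a, b, ranges, is_binary, binary_weight=0.15):
    """Weighted Gower distance between two records.

    a, b          : sequences of feature values
    ranges        : per-feature range (max - min) over the dataset;
                    only used for continuous features
    is_binary     : per-feature flag, True for 0/1 features such as stances
    binary_weight : weight for each binary feature (continuous features get 1)
    """
    total, weight_sum = 0.0, 0.0
    for x, y, rng, binary in zip(a, b, ranges, is_binary):
        if binary:
            # Symmetric binary treatment: 0 if equal, 1 if different.
            d = 0.0 if x == y else 1.0
            w = binary_weight
        else:
            # Range-scaled absolute difference, which lies in [0, 1].
            d = abs(x - y) / rng if rng > 0 else 0.0
            w = 1.0
        total += w * d
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0


# Hypothetical features: strikes per minute, takedowns per minute,
# Stance_Orthodox, Stance_Southpaw
fight_a = [4.0, 0.5, 1, 0]
fight_b = [6.0, 1.0, 0, 1]
ranges = [10.0, 2.0, 1.0, 1.0]
is_binary = [False, False, True, True]

down_weighted = weighted_gower(fight_a, fight_b, ranges, is_binary)
unweighted = weighted_gower(fight_a, fight_b, ranges, is_binary, binary_weight=1.0)
```

With the stance weight at 1, a single stance mismatch contributes far more than the scaled differences in the continuous features, so lowering the weight lets the continuous style dimensions drive the distance. Computing this for every pair of fights gives a distance matrix that can be handed to any reducer or clusterer that accepts precomputed distances.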
Clusters
After dimensionality reduction, latent fighting styles are learned from the two-dimensional style representation with HDBSCAN. The images below are the results of this. I have only included the diagrams for weight classes where there was a statistically significant association between style head-to-heads and fight outcomes.
The significance level is set at 0.05 for the purposes of this work.
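For readers unfamiliar with the chi-square test of association, the sketch below computes the statistic and degrees of freedom for a small contingency table using only the standard library. The counts are made up for illustration, and the layout (style matchups as rows, outcomes as columns) is an assumption about how such a table might be arranged.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed_count - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows are style matchups, columns are (win, loss)
observed = [
    [30, 10],
    [20, 20],
    [10, 30],
]

stat, dof = chi_square(observed)

# Critical value of the chi-square distribution at the 0.05 level for 2 degrees of freedom.
CRITICAL_VALUE_DOF_2 = 5.991
significant = stat > CRITICAL_VALUE_DOF_2
```

In practice a statistics library would return the p-value directly; SciPy's `chi2_contingency` is one such option. The larger the statistic relative to its degrees of freedom, the stronger the evidence that outcomes depend on the style matchup.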
Bantamweight
Chi2 Stat: 244.45042522969314
P Value: 1.0943795416747109e-41
Degrees of Freedom: 18
Heavyweight
Chi2 Stat: 749.7378583482009
P Value: 6.488876080819179e-146
Degrees of Freedom: 20
LightHeavyweight
Chi2 Stat: 570.0727926225678
P Value: 4.440800780200355e-92
Degrees of Freedom: 45
Middleweight
Chi2 Stat: 887.168883105777
P Value: 3.803177841194278e-160
Degrees of Freedom: 40
Welterweight
Chi2 Stat: 2262.132585205716
P Value: 0.0
Degrees of Freedom: 90
HDBSCAN is an unsupervised machine learning algorithm that clusters data points based on a measure of density.
A Note on Cluster Separation
The weighting applied to the binary stance variables has a significant impact on cluster formation. Weights closer to 1 generate distinctive islands of clusters. Excluding fighting stances completely (akin to a weight of 0) makes clusters less separable, appearing as one huge mass. I set the stance weight at around 0.15, which formed the interesting swirls shown above. I down-weighted stances to allow the model to capture the more nuanced aspects of fighting styles represented by the continuous style dimensions.
I think this reflects well on the continuous nature of fighting styles and how different styles transition into one another.
Having established the clusters, I plotted some radar charts to get a sense of the type of fighter within each cluster. The shaded area on each radar chart represents a cluster-level average across all the fight style dimensions. I also returned a list of the top 5 “typical” fighters within each cluster. Typical here is defined by the fighter’s distance from the centre of the cluster.
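One simple way to rank “typical” members is to sort them by distance to the cluster centroid. The sketch below assumes Euclidean distance and a mean centroid over two-dimensional coordinates; the names and coordinates are placeholders, and the original analysis may have measured distance differently.

```python
from math import dist


def most_typical(points, top_n=5):
    """Return the labels of the top_n members closest to the cluster centroid.

    points: dict mapping a label to its coordinates (tuples of equal length)
    """
    n_dims = len(next(iter(points.values())))
    # Centroid is the per-dimension mean of all members.
    centroid = tuple(
        sum(coords[d] for coords in points.values()) / len(points)
        for d in range(n_dims)
    )
    ranked = sorted(points, key=lambda label: dist(points[label], centroid))
    return ranked[:top_n]


# Hypothetical 2-D embedding coordinates for one cluster
cluster = {
    "Fighter A": (0.0, 0.0),
    "Fighter B": (1.0, 1.0),
    "Fighter C": (4.0, 4.0),
    "Fighter D": (1.5, 1.0),
}

typical = most_typical(cluster, top_n=2)
```

Because the clustering is done at the fight level, the same fighter can appear close to the centre of more than one cluster if their approach varies between fights.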
Heavyweight
The heavyweight UFC division, consisting of fighters weighing over 205 lbs, showcases powerful striking and devastating knockouts. Iconic champions like Stipe Miocic, Brock Lesnar, and Cain Velasquez have dominated this division. Despite its limited depth compared to lighter divisions, heavyweight fighters generate immense excitement with their power and fight-changing abilities.
Cluster 0: The central tendency of this cluster is towards aggressive striking. The radar charts display less in the way of leg attacks comparatively speaking. This might indicate a preference for boxing amongst fighters in this cluster.
0_Heavyweight
Shawn Jordan
Dave Herman
Lavar Johnson
Stipe Miocic
Cain Velasquez
Cluster 1: The central tendency of this cluster is towards distance attacks, leg attacks, and clinching. This might indicate a propensity toward Muay Thai, kickboxing, or other similar variations.
1_Heavyweight
Ciryl Gane
Parker Porter
Serghei Spivac
Marcin Tybura
Augusto Sakai
Cluster 2: This cluster has a central tendency towards ground fighting and submissions. This may indicate that fighters within the cluster are inclined towards Brazilian Jiu-Jitsu or another variation of submission grappling.
2_Heavyweight
Aleksei Oleinik
Walt Harris
Luis Henrique
Andrei Arlovski
Damian Grabowski
Light-Heavyweights
The light heavyweight UFC division features fighters weighing between 186–205 lbs, known for their blend of striking, grappling, and athleticism. Iconic champions like Jon Jones, Daniel Cormier, and Chuck Liddell have showcased dynamic skills and exciting fights, making the division a fan-favorite and breeding ground for legendary matchups.
Cluster 0: The radar chart displays a central tendency towards clinches, take-downs, and ground attacks. What separates this from the Brazilian Jiu-Jitsu type clusters is the lack of submissions. Due to this, I suspect the cluster may lean towards fighters that use wrestling styles or other variants of grappling.
0_LightHeavyweight
Anthony Perosh
Marcos Rogerio de Lima
Francimar Barroso
Tim Boetsch
Tom Lawlor
Cluster 1: There is an obvious central tendency towards stand-up fighting and distance attacks. However, there is a noticeable lack of leg kicks. American kickboxing rules are commonly known to forbid kicks to the calves and legs, which may explain the lack of leg attacks. I suspect fighters in this cluster may be geared towards long-range martial arts styles such as kickboxing or Taekwondo. Muay Thai tends to be heavier on the leg kicks.
1_LightHeavyweight
Saparbeg Safarov
Justin Ledet
Gian Villante
Ovince Saint Preux
Jared Cannonier
Cluster 2: The only prominent fight dimension, relatively speaking, is leg attacks. This might indicate a Muay Thai style; however, there isn’t much else to go by. I can’t confidently say what styles are most representative of this cluster.
2_LightHeavyweight
Evan Tanner
Elvis Sinosic
Rich Franklin
James Irvin
Stephan Bonnar
Cluster 3: Cluster three is quite strange in that it doesn’t appear to have strong tendencies attack-wise. However, there are strong central tendencies indicating defensive weaknesses. This is shown by the strong opponent style dimensions (i.e. propensity to receive attacks) relative to the other clusters. This cluster might well contain a sizeable proportion of journeyman fighters.
Journeyman definition: an experienced reliable worker, athlete, or performer especially as distinguished from one who is brilliant or colorful.
3_LightHeavyweight
Gadzhimurad Antigulov
Paul Craig
Johnny Walker
Jimi Manuwa
Darko Stosic
Middleweights
The middleweight UFC division consists of fighters weighing 171–185 lbs, offering a balance of power and speed. Dominant champions like Anderson Silva, Israel Adesanya, and Chris Weidman have exemplified striking prowess and grappling expertise, contributing to the division’s reputation for captivating fights and memorable rivalries.
Cluster 0: This cluster has a central tendency towards stand-up striking (i.e. distance, leg, and head strikes from standing). Relative to other clusters, there is sizeable coverage across most of the other fighting dimensions.
0_Middleweight
Punahele Soriano
Kyle Daukaus
Chris Curtis
Andre Muniz
Gerald Meerschaert
Cluster 1: There is a central tendency towards submission attempts, and somewhat towards ground attacks. It is difficult to say for certain, but this may indicate styles that are more geared towards Brazilian Jiu-Jitsu or even Judo. There isn’t much in the way of distance attacks, which further strengthens the argument that this may be a grappling cluster.
Two Brazilian Jiu-Jitsu practitioners engaged in a tournament.
1_Middleweight
Hector Lombard
Kevin Holland
John Phillips
Cezar Ferreira
Antonio Carlos Junior
Cluster 2: Central tendency shows fairly good coverage across a range of dimensions relatively speaking, although weaker on the distance style dimensions. I believe this suggests there is a propensity for grappling styles within this cluster.
2_Middleweight
Rafael Natal
Luiz Cane
Andrew Craig
Brad Tavares
Chris Weidman
Cluster 3: The central tendency is towards takedowns and ground control. Similarly to the other grappling styles, there is very little indication of distance attacks. Cluster 3 in fact has the lowest of all clusters along the distance fighting style dimensions. I would imagine the typical fighter within this cluster has a propensity for wrestling-type styles if I had to guess from the metrics alone.
3_Middleweight
Dongi Yang
Aaron Simpson
Constantinos Philippou
Ronny Markes
Patrick Cote
Welterweights
The welterweight UFC division features fighters weighing 156–170 lbs, known for their combination of strength, agility, and endurance. Iconic champions like Georges St-Pierre, Tyron Woodley, and Kamaru Usman have showcased diverse skillsets, including striking, wrestling, and jiu-jitsu, making the division highly competitive and a fan-favorite for exciting bouts.
Cluster 0: Unfortunately, the overlapping colours make it difficult to fully comprehend this cluster. However, it is clear that there is nothing particularly outstanding here, with generally low to moderate coverage across many of the striking dimensions. From the charts alone I would guess this style is a true mixed martial arts style. The typical fighter may be an all-rounder, somebody who can do a bit of everything.
0_Welterweight
Robert Whittaker
Vik Grujic
Warlley Alves
William Macario
John Howard
Cluster 1: Central tendency is towards distance attacks, striking, and leg kicks. However, there is low to moderate coverage of submissions, something not seen at all in the other kickboxing and striking-heavy clusters. That said, comparing clusters across weight classes should be done with caution here.
1_Welterweight
Mike Swick
Sheldon Westcott
Viscardi Andrade
Cathal Pendred
Demian Maia
Cluster 2: There is a central tendency towards striking, both close and distance. For me, what stands out most is the tendency towards knockdowns relative to the other clusters. This may indicate a cluster with many powerful or precise fighters. There isn’t much coverage over the leg kick dimensions, indicating more of a kickboxing style than a Muay Thai style.
Muay Thai style leg kicking action in an octagon.
2_Welterweight
Dominique Steele
Rafael Dos Anjos
Jordan Mein
Donald Cerrone
Belal Muhammad
Cluster 3: Central tendency is towards ground attacks, takedowns, and striking. There is also low to moderate coverage across the general striking dimensions. It’s difficult to say, but the metrics suggest a propensity for close-range and ground striking.
3_Welterweight
Nick Diaz
Georges St-Pierre
Alex Karalexis
Tony DeSouza
Matt Hughes
Cluster 4: Central tendency towards submissions, reversals, takedowns and ground control relative to other clusters. There is some evidence to suggest this may represent aggressive ground grappling styles like submission wrestling or Brazilian Jiu-Jitsu.
4_Welterweight
Marcus Davis
Brad Blackburn
TJ Grant
DaMarques Johnson
Amir Sadollah
Bantamweight
The bantamweight UFC division, featuring fighters weighing 126–135 lbs, is known for its fast-paced action, technical striking, and relentless grappling. TJ Dillashaw, Dominick Cruz, and Aljamain Sterling are some of the notable fighters who have shaped the division’s history. With a mix of power, speed, and skill, bantamweight bouts consistently deliver exhilarating fights for MMA enthusiasts.
Cluster 0: There is a tendency towards striking and stand-up fighting, evidenced by the significant strikes, leg and body attacks, and distance strikes.
0_Bantamweight
Mario Bautista
Sean O'Malley
Sergey Morozov
Ricky Turcios
Liudvik Sholinian
Cluster 1: There is a central tendency towards submissions, ground attacks, and takedowns. Relatively speaking, this cluster is one of the weakest in distance attacks for this weight class.
1_Bantamweight
Jonathan Martinez
Alex Perez
Louis Smolka
Brett Johns
Cluster 2: There is a central tendency towards ground control, takedowns and clinches.
2_Bantamweight
Jeff Hougland
Francisco Rivera
Walel Watson
Damacio Page
Dustin Pague