phylypp edited this page Jul 20, 2013 · 22 revisions

Task

  • find and use a prediction framework for online analysis
  • select a few events to predict and identify features good for prediction
  • make real time predictions
  • try to maximize prediction accuracy (find out what is possible)

Technology

Weka

"Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes." http://www.cs.waikato.ac.nz/ml/weka/

MOA

"MOA is an open source framework for data stream mining. It includes a collection of machine learning algorithms (classification, regression, and clustering) and tools for evaluation. Related to the WEKA project, MOA is also written in Java, while scaling to more demanding problems." http://moa.cms.waikato.ac.nz/

Predictions

Pass Success Prediction

Will the next pass of the player with the ball fail or succeed?

  • pass successful
  • pass missed

For this prediction only the current state of the game is used.

Attack Result Prediction

Will the attack of the team result in ball loss, ball out of bounds or shot on goal?

  • shot on goal
  • ball loss
  • ball out of bounds

This prediction is more complex: in addition to the current state of the game, it takes into account events since the last prediction event (e.g. the number of passes during an attack).

Code Structure

Model

The code structure is designed for easy integration of different classifiers and predictions.

Basic model to understand the structure: Basic Model

Prophet

The prophet updates all predictors. It is called by the statistics project once per second of playtime. New predictors can be added here.

Learner

The learner encapsulates a classifier, which can be trained and which makes the predictions.

Predictor

The predictor checks whether a prediction event has occurred and, if so, trains the learner. Otherwise it makes a prediction. The predictor can use any available learner.

PredictionInstance

The prediction instance class creates and handles a prediction instance for prediction or training purposes. Attributes (features) for the prediction are defined here.

Statistics

The statistics project asks the prophet to update itself and provides the game information needed to create prediction instances.
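The interplay of these components can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the method names (`train`, `predict`, `update`, `addPredictor`), the convention of passing `null` when no prediction event has occurred, and the `MajorityLearner` stand-in are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical learner interface: wraps a classifier that can be trained and queried. */
interface Learner {
    void train(double[] features, String label);
    String predict(double[] features);
}

/** Trivial stand-in learner: always predicts the most frequent label seen so far. */
class MajorityLearner implements Learner {
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void train(double[] features, String label) {
        counts.merge(label, 1, Integer::sum);
    }

    @Override
    public String predict(double[] features) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best; // null until the first training example arrives
    }
}

/** Hypothetical predictor: trains on a prediction event, otherwise predicts. */
class Predictor {
    private final Learner learner;
    private String lastPrediction;

    Predictor(Learner learner) {
        this.learner = learner;
    }

    /** observedLabel is null when no prediction event occurred since the last update. */
    void update(double[] features, String observedLabel) {
        if (observedLabel != null) {
            learner.train(features, observedLabel);
        } else {
            lastPrediction = learner.predict(features);
        }
    }

    String lastPrediction() {
        return lastPrediction;
    }
}

/** Hypothetical prophet: forwards every update to all registered predictors. */
class Prophet {
    private final List<Predictor> predictors = new ArrayList<>();

    void addPredictor(Predictor predictor) {
        predictors.add(predictor);
    }

    void update(double[] features, String observedLabel) {
        for (Predictor predictor : predictors) {
            predictor.update(features, observedLabel);
        }
    }
}
```

Because the predictor only depends on the `Learner` interface, swapping in a different classifier would not require changes to the prophet or the predictor in this sketch.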

Approach for our example

Approach

Process Flow Chart

Process

Features

  • TEAMMATE_IN_AREA: Number of teammates within a 20-meter circle.
  • OPPONENT_IN_AREA: Number of opponents within a 20-meter circle.
  • PLAYER_PASS_RATE: Ratio of a player's successful passes to all of that player's passes, in percent:

    (playerPassesSuccessful / (playerPassesSuccessful + playerPassesMissed)) * 100

  • PLAYER_BALLCONTACT: Total number of ball contacts of a player.
  • LAST_PLAYER_ID: ID of the player who made the pass (unique for every game).
  • CURRENT_PLAYER_ID: ID of the player who received the pass (unique for every game).
  • DISTANCE_TO_NEAREST_TEAMMATE: Distance to the nearest teammate.
  • DISTANCE_TO_NEAREST_OPPONENT: Distance to the nearest opponent.
  • CURRENT_PLAYER_X: X position of a player.
  • CURRENT_PLAYER_Y: Y position of a player.
  • CURRENT_PLAYER_DISTANCE: Accumulated running distance of a player.
  • ATTRIBUTE_AREA: The area a player is in: own area, middle area or opponent's area.
  • ATTRIBUTE_PASS_COUNT: Number of passes that occurred during the attack.
  • ATTRIBUTE_AVERAGE_VELOCITY: Average velocity of the ball in the opponent's direction over the last seconds:

    averageOfAll(abs(currentBallYPosition - lastBallYPosition) / (currentGameTime - lastGameTime))
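To make the formulas above concrete, here is a minimal sketch of three of the feature computations. The class and method names are hypothetical, and several details are assumptions: the 20 meters are treated as a radius with an inclusive boundary, a player without passes gets a pass rate of 0, and samples with a non-positive time difference are skipped. Note that if the pass counters are integers, the division has to be done in floating point, otherwise the rate truncates to 0.

```java
/** Sketch of three feature computations; all names here are hypothetical. */
final class FeatureSketch {
    private FeatureSketch() {}

    /** PLAYER_PASS_RATE in percent; 0 when the player has not passed yet (assumption). */
    static double passRate(int successful, int missed) {
        int total = successful + missed;
        if (total == 0) {
            return 0.0;
        }
        // Multiply by 100.0 first so the division happens in floating point.
        return 100.0 * successful / total;
    }

    /** Counts the {x, y} positions lying within the given radius of (x, y), boundary included. */
    static int countInArea(double x, double y, double[][] positions, double radius) {
        int count = 0;
        for (double[] pos : positions) {
            double dx = pos[0] - x;
            double dy = pos[1] - y;
            if (dx * dx + dy * dy <= radius * radius) {
                count++;
            }
        }
        return count;
    }

    /** ATTRIBUTE_AVERAGE_VELOCITY: mean of |deltaY| / deltaTime over consecutive samples. */
    static double averageVelocity(double[] ballY, double[] gameTime) {
        double sum = 0.0;
        int samples = 0;
        for (int i = 1; i < ballY.length; i++) {
            double dt = gameTime[i] - gameTime[i - 1];
            if (dt <= 0) {
                continue; // skip samples without elapsed time to avoid division by zero
            }
            sum += Math.abs(ballY[i] - ballY[i - 1]) / dt;
            samples++;
        }
        return samples == 0 ? 0.0 : sum / samples;
    }
}
```

For example, a player with 3 successful and 1 missed pass has a pass rate of 75.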

Pass Features

Pass Features

Attack Result Features

Attack Result Features

Algorithm

K-nearest neighbors algorithm - IBk

In pattern recognition, the k-nearest neighbor algorithm is a method for classifying objects based on closest training examples in the feature space.

more Information (Wikipedia)

IBk is a lazy classifier in Weka that is based on KNN. The advantage of IBk over the Knn classifier in Weka is its method updateClassifier(Instance instance), which updates the classifier without creating a new model.

more Information (class description)

Important Parameters

  • KNN: the number of nearest neighbors to use for prediction
  • Cross Validation: the test technique to select the best k value during training (adaptive KNN)
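The idea behind a lazy, incrementally updatable nearest-neighbor classifier can be sketched in plain Java as follows. This is an illustration of the technique only and not Weka's IBk: the class `KnnSketch` and its methods are hypothetical, k is fixed (no adaptive selection), and ties in the vote go to the label that reaches the top count first when walking from the nearest neighbor outwards.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal lazy k-nearest-neighbor classifier; an illustration, not Weka's IBk. */
final class KnnSketch {
    private final int k;
    private final List<double[]> points = new ArrayList<>();
    private final List<String> labels = new ArrayList<>();

    KnnSketch(int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be >= 1");
        }
        this.k = k;
    }

    /** Incremental update: a lazy learner only has to store the new example. */
    void update(double[] features, String label) {
        points.add(features.clone());
        labels.add(label);
    }

    /** Linear search over all stored examples, then majority vote among the k nearest. */
    String classify(double[] query) {
        if (points.isEmpty()) {
            return null;
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < points.size(); i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> distance(points.get(i), query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        int limit = Math.min(k, order.size());
        for (int j = 0; j < limit; j++) {
            String label = labels.get(order.get(j));
            int v = votes.merge(label, 1, Integer::sum);
            if (v > bestVotes) {
                bestVotes = v;
                best = label;
            }
        }
        return best;
    }

    /** Euclidean distance between two feature vectors of equal length. */
    private static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

Since all work happens at query time, adding a training example is cheap, which is what makes this family of classifiers suitable for updating during a running game. In practice the features would likely need to be scaled to comparable ranges before computing distances.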

Output

The predictions are periodically sent to the visualization project for visualization.

For testing with Weka and MOA an ARFF file can be created at the end of the game. To enable the creation of the ARFF file a boolean has to be set to true in predictions.Utils:

public static final boolean ARFF_WRITING_MODE = true;

The ARFF file will be created in a logs folder in the predictions project root folder.
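For orientation, an ARFF file consists of a `@relation` line, one `@attribute` line per feature and a `@data` section with comma-separated rows. The sketch below builds such content as a string. It is not the project's writer: the class `ArffSketch`, the relation name and the underscore-separated class values are assumptions made for the example (values containing spaces would need quoting).

```java
import java.util.List;
import java.util.Locale;

/** Builds ARFF text for numeric features plus one nominal class attribute (hypothetical helper). */
final class ArffSketch {
    private ArffSketch() {}

    static String toArff(String relation, String[] numericAttributes, String[] classValues,
                         List<double[]> rows, List<String> labels) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String name : numericAttributes) {
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        // The class attribute is nominal: its allowed values are listed in braces.
        sb.append("@attribute class {").append(String.join(",", classValues)).append("}\n\n");
        sb.append("@data\n");
        for (int i = 0; i < rows.size(); i++) {
            for (double value : rows.get(i)) {
                // Locale.ROOT guarantees a decimal point regardless of system settings.
                sb.append(String.format(Locale.ROOT, "%.2f", value)).append(',');
            }
            sb.append(labels.get(i)).append('\n');
        }
        return sb.toString();
    }
}
```

A row with a pass rate of 75 and two teammates in the area would then appear in the data section as `75.00,2.00,pass_successful`.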

Result

The best results were achieved with the IBk classifier. Tests showed that using many features, including less promising ones, improves the accuracy. We therefore used more than ten features for both predictions.

Pass Success Prediction

Accuracy (average of whole game): 85%*

Distribution: 73% pass successful, 27% pass missed

Number of instances: 827

Parameters: IBk classifier, KNN adaptive, Linear Nearest Neighbor Search with Euclidean Distance

* The project itself measures a lower accuracy because it uses a fixed KNN value, which allows better visualisation.

Accuracy Development

Development of accuracy depending on increasing training set size:

Pass Success Prediction Accuracy

The accuracy starts at a high level, since most passes succeed at the beginning and therefore classification is easy. The overall high level of accuracy can be explained by the same reason: 73% of all passes are successful. At the end a slightly decreasing accuracy is noticeable.
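Since 73% of all passes succeed, a classifier that always answers "pass successful" would already be right about 73% of the time, so the measured accuracy is best read against that baseline. The sketch below tracks both numbers side by side as predictions are resolved. It assumes accuracy is simply the share of correct predictions. The class `AccuracyTracker` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Tracks running accuracy next to the majority-class share; names are hypothetical. */
final class AccuracyTracker {
    private final Map<String, Integer> classCounts = new HashMap<>();
    private int total;
    private int correct;

    /** Records one resolved prediction: what was predicted and what actually happened. */
    void record(String predicted, String actual) {
        total++;
        if (actual.equals(predicted)) {
            correct++;
        }
        classCounts.merge(actual, 1, Integer::sum);
    }

    /** Share of correct predictions so far, in percent. */
    double accuracy() {
        return total == 0 ? 0.0 : 100.0 * correct / total;
    }

    /** Accuracy of always guessing the most frequent class seen so far, in percent. */
    double majorityBaseline() {
        int max = 0;
        for (int count : classCounts.values()) {
            max = Math.max(max, count);
        }
        return total == 0 ? 0.0 : 100.0 * max / total;
    }
}
```

Plotting `accuracy()` and `majorityBaseline()` over time would show how much of the curve is explained by the class distribution alone.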

Attack Result Prediction

Accuracy (average of whole game): 84%

Distribution: 69% ball loss, 24% ball out of bounds, 7% shot on goal

Number of instances: 332

Parameters: IBk classifier, KNN = 8, Linear Nearest Neighbor Search with Euclidean Distance

Accuracy Development

Development of accuracy depending on increasing training set size (the accuracy is lower here because a different prediction method was used):

Attack Result Prediction Accuracy

The accuracy increases during the whole game.

Conclusion

Using the sensor data and the statistics data, we achieved high prediction accuracy. Whether this accuracy holds for other football games has yet to be tested. A major difficulty will be the different playing styles of other players and teams. The pass success prediction has also shown that the accuracy can decrease over time. In that case, the instance set size has to be limited or other classifiers have to be used.
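One way to limit the instance set size is a sliding window that keeps only the most recent training instances and drops the oldest one when full. The following is a minimal sketch of that idea. The class `SlidingWindow` is hypothetical and not part of the project, and a suitable capacity would have to be found experimentally.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Bounded buffer that keeps only the most recent training instances (hypothetical sketch). */
final class SlidingWindow<T> {
    private final int capacity;
    private final Deque<T> items = new ArrayDeque<>();

    SlidingWindow(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
    }

    /** Adds an instance; returns the evicted oldest instance, or null if none was evicted. */
    T add(T item) {
        items.addLast(item);
        return items.size() > capacity ? items.removeFirst() : null;
    }

    int size() {
        return items.size();
    }
}
```

A lazy classifier would then be queried only against the instances currently in the window, so that examples from much earlier in the game no longer influence the prediction.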

The wish to predict goals could not be fulfilled because goal events are too rare to train a classifier properly. For frequently occurring events such as player passes, however, good accuracy was achieved after only a few minutes of playtime. The attack result prediction improved throughout the whole game.