Implementing Ad predictor #1642
@@ -0,0 +1,154 @@
from __future__ import annotations

from collections import defaultdict, namedtuple

import numpy as np

from river.base.classifier import Classifier


def default_weight():
    return {"mean": 0.0, "variance": 1.0}


class AdPredictor(Classifier):
Review comment: I guess this model can live in the …
"""AdPredictor, developed by Microsoft, is a machine learning algorithm designed to predict the probability of user | ||
clicks on online advertisements. This algorithm plays a crucial role in computational advertising, where predicting | ||
click-through rates (CTR) is essential for optimizing ad placements and maximizing revenue. | ||
Parameters | ||
---------- | ||
beta (float, default=0.1): | ||
A smoothing parameter that regulates the weight updates. Smaller values allow for finer updates, | ||
while larger values can accelerate convergence but may risk instability. | ||
prior_probability (float, default=0.5): | ||
The initial estimate rate. This value sets the bias weight, influencing the model's predictions | ||
before observing any data. | ||
|
||
epsilon (float, default=0.1): | ||
A variance dynamics parameter that controls how the model balances prior knowledge and learned information. | ||
Larger values prioritize prior knowledge, while smaller values favor data-driven updates. | ||
|
||
num_features (int, default=10): | ||
The maximum number of features the model can handle. This parameter affects scalability and efficiency, | ||
especially for high-dimensional data. | ||
Review comment: You need to follow the docstring syntax we use everywhere else in River. Take a look at the source code of another model for examples :)
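For reference, River docstrings follow a numpydoc-style layout in which each parameter name sits on its own line with the description indented beneath it; a rough sketch of that layout (the wording here is only illustrative, not the required text):

    Parameters
    ----------
    beta
        Smoothing parameter that regulates the weight updates.
    prior_probability
        Prior click probability used to initialize the bias weight.
    epsilon
        Variance dynamics parameter balancing prior knowledge against learned information.
    num_features
        Maximum number of features the model is expected to handle.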
    Attributes
    ----------
    weights (defaultdict):
        A dictionary where each feature key maps to a dictionary containing:
        mean (float): The current estimate of the feature's weight.
        variance (float): The uncertainty associated with the weight estimate.
    bias_weight (float):
        The weight corresponding to the model bias, initialized using the prior_probability.
        This attribute allows the model to make predictions even when no features are active.
    Examples
    --------
    adpredictor = AdPredictor(beta=0.1, prior_probability=0.5, epsilon=0.1, num_features=5)
    data = [
        ({"feature1": 1, "feature2": 1}, 1),
        ({"feature1": 1, "feature3": 1}, 0),
        ({"feature2": 1, "feature4": 1}, 1),
        ({"feature1": 1, "feature2": 1, "feature3": 1}, 0),
        ({"feature4": 1, "feature5": 1}, 1),
    ]

    def train_and_test(model, data):
        for x, y in data:
            pred_before = model.predict_one(x)
            model.learn_one(x, y)
            pred_after = model.predict_one(x)
            print(
                f"Features: {x} | True label: {y} | "
                f"Prediction before training: {pred_before:.4f} | "
                f"Prediction after training: {pred_after:.4f}"
            )

    train_and_test(adpredictor, data)

    Features: {'feature1': 1, 'feature2': 1} | True label: 1 | Prediction before training: 0.5000 | Prediction after training: 0.7230
    Features: {'feature1': 1, 'feature3': 1} | True label: 0 | Prediction before training: 0.6065 | Prediction after training: 0.3650
    Features: {'feature2': 1, 'feature4': 1} | True label: 1 | Prediction before training: 0.6065 | Prediction after training: 0.7761
    Features: {'feature1': 1, 'feature2': 1, 'feature3': 1} | True label: 0 | Prediction before training: 0.5455 | Prediction after training: 0.3197
    Features: {'feature4': 1, 'feature5': 1} | True label: 1 | Prediction before training: 0.5888 | Prediction after training: 0.7699
Review comment: Same: take a look at another model source code for an example. You should be able to run the docstring test with …
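A rough sketch of how the Examples section could instead be laid out as a doctest, so that it runs as part of the docstring tests (the value shown is the first "after training" prediction from the output above, rounded):

    Examples
    --------

    >>> model = AdPredictor(beta=0.1, prior_probability=0.5, epsilon=0.1, num_features=5)
    >>> model.learn_one({"feature1": 1, "feature2": 1}, True)
    >>> round(float(model.predict_one({"feature1": 1, "feature2": 1})), 4)
    0.723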
    """

    config = namedtuple("config", ["beta", "prior_probability", "epsilon", "num_features"])
Review comment: What is this for?
    def __init__(self, beta=0.1, prior_probability=0.5, epsilon=0.1, num_features=10):
        # Initialization of model parameters
        self.beta = beta
        self.prior_probability = prior_probability
        self.epsilon = epsilon
        self.num_features = num_features
        # Initialize weights as a defaultdict for each feature, with mean and variance attributes
        self.weights = defaultdict(default_weight)
        # Initialize bias weight based on prior probability
        self.bias_weight = self.prior_bias_weight()

    def prior_bias_weight(self):
        # Calculate initial bias weight using prior probability
        return np.log(self.prior_probability / (1 - self.prior_probability)) / self.beta
Review comment: We prefer using Python's standard library. So here you'll have to use …
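The truncated suggestion presumably refers to the math module; a minimal sketch of what this method could look like with the standard library instead of NumPy, assuming math.log is the intended replacement:

    import math

    def prior_bias_weight(self):
        # Same computation, with math.log in place of np.log
        return math.log(self.prior_probability / (1 - self.prior_probability)) / self.beta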
    def _active_mean_variance(self, features):
        """_active_mean_variance(features) (method):
        Computes the cumulative mean and variance for all active features in a sample,
        including the bias. This is crucial for making predictions."""
        # Calculate total mean and variance for all active features
        total_mean = sum(self.weights[f]["mean"] for f in features) + self.bias_weight
        total_variance = sum(self.weights[f]["variance"] for f in features) + self.beta**2
        return total_mean, total_variance

    def predict_one(self, x):
        # Generate a probability prediction for one sample
        features = x.keys()
        total_mean, total_variance = self._active_mean_variance(features)
        # Logistic (sigmoid) approximation of the Gaussian CDF, applied to the scaled total mean
        return 1 / (1 + np.exp(-total_mean / np.sqrt(total_variance)))
    def learn_one(self, x, y):
        # Online learning step to update the model with one sample
        features = x.keys()
        y = 1 if y else -1
        total_mean, total_variance = self._active_mean_variance(features)
        v, w = self.gaussian_corrections(y * total_mean / np.sqrt(total_variance))

        # Update mean and variance for each feature in the sample
        for feature in features:
            mean = self.weights[feature]["mean"]
            variance = self.weights[feature]["variance"]

            mean_delta = y * variance / np.sqrt(total_variance) * v  # Update mean
            variance_multiplier = 1.0 - variance / total_variance * w  # Update variance

            # Update weight
            self.weights[feature]["mean"] = mean + mean_delta
            self.weights[feature]["variance"] = variance * variance_multiplier
Review comment: I think it's cleaner to have two dicts: one to hold the means and one to hold the variances.
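A minimal sketch of the two-dict layout this suggests (the attribute names are only illustrative):

    # In __init__: one dict per statistic instead of a dict of dicts
    self.means = defaultdict(float)            # feature -> mean, defaults to 0.0
    self.variances = defaultdict(lambda: 1.0)  # feature -> variance, defaults to 1.0

    # Reads and writes then become, for example:
    total_mean = sum(self.means[f] for f in features) + self.bias_weight
    total_variance = sum(self.variances[f] for f in features) + self.beta**2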
    def gaussian_corrections(self, score):
        """gaussian_corrections(score) (method):
        Implements Bayesian update corrections using the Gaussian probability density function (PDF)
        and cumulative distribution function (CDF)."""
        # Logistic approximation of the Gaussian CDF used for the correction
        cdf = 1 / (1 + np.exp(-score))
        pdf = np.exp(-0.5 * score**2) / np.sqrt(2 * np.pi)  # Standard normal PDF
        v = pdf / cdf  # Correction factor for mean update
        w = v * (v + score)  # Correction factor for variance update
        return v, w
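Note that the docstring speaks of the Gaussian CDF while the code substitutes a logistic sigmoid for it. For comparison, a minimal sketch of the exact corrections using only the standard library, assuming the probit link of the original AdPredictor formulation is wanted:

    import math

    def gaussian_corrections(self, score):
        # Standard normal PDF and exact CDF via the error function
        pdf = math.exp(-0.5 * score**2) / math.sqrt(2 * math.pi)
        cdf = 0.5 * math.erfc(-score / math.sqrt(2))
        v = pdf / cdf  # Correction factor for the mean update
        w = v * (v + score)  # Correction factor for the variance update
        return v, w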
    def _apply_dynamics(self, weight):
        """_apply_dynamics(weight) (method):
        Regularizes the variance of a feature weight using a combination of prior variance and
        learned variance. This helps maintain a balance between prior beliefs and observed data."""
        # Apply variance dynamics for regularization
        prior_variance = 1.0
        # Adjust variance to manage prior knowledge and current learning balance
        adjusted_variance = (
            weight["variance"]
            * prior_variance
            / ((1.0 - self.epsilon) * prior_variance + self.epsilon * weight["variance"])
        )
        # Adjust mean based on the dynamics, balancing previous and current knowledge
        adjusted_mean = adjusted_variance * (
            (1.0 - self.epsilon) * weight["mean"] / weight["variance"]
            + self.epsilon * 0 / prior_variance
        )
        return {"mean": adjusted_mean, "variance": adjusted_variance}

    def __str__(self):
        # String representation of the model for easy identification
        return "AdPredictor"
Review comment: There is no need for this.
Review comment: We prefer to import the package, and then access its properties, instead of importing what we need.
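A minimal sketch of the import style this refers to, assuming the package-level import is applied to the Classifier base class:

    from river import base


    class AdPredictor(base.Classifier):
        ...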