What is Thompson sampling?
Thompson sampling, also known as the Bayesian bandit algorithm, is a widely used technique in decision-making and reinforcement learning. This algorithm tackles the classic "multi-armed bandit" problem, where a gambler must decide which slot machine (or "arm") to pull to maximize their cumulative reward over time.
Multi-armed bandit problem
Imagine facing a row of slot machines. Each machine has a different and unknown probability of winning, representing its reward distribution. The objective is to identify the machine with the highest probability of winning and to maximize the total winnings by repeatedly pulling arms over time.
Thompson sampling approach
Thompson sampling takes a probabilistic approach to tackle the multi-armed bandit problem. At its core, the algorithm maintains a probability distribution for each arm, representing its underlying probability of winning. By maintaining probability distributions for each arm and continually updating them with new information, Thompson sampling efficiently explores and exploits different options, making it a powerful technique for smarter decision-making under uncertainty.
Thompson sampling in python
import numpy as npimport matplotlib.pyplot as pltimport pandas as pditems = pd.read_csv('Data.csv')import randomtotal_rounds = 10000ads = 10selection = []track_1 = [0] * adstrack_0 = [0] * adsfinal = 0for round in range(0, total_rounds):chosen = 0probability = 0for index in range(0, ads):score = random.betavariate(track_1[index] + 1, track_0[index] + 1)if score > probability:probability = scorechosen = indexselection.append(chosen)received = items.values[round, chosen]if received == 1:track_1[chosen] = track_1[chosen] + 1else:track_0[chosen] = track_0[chosen] + 1final = final + receivedplt.hist(selection)plt.title('Frequency of Ad selections using Thompson sampling')plt.xlabel('Ads')plt.ylabel('Number of selections')plt.savefig('output/image.png')plt.show()
Code explanation
Line 1–3: Importing the necessary libraries. We use
numpyfor numerical computations,matplotlib.pyplotfor creating plots, andpandasfor handling data from the CSV file.Line 5: Load the data from the CSV file
Data.csvinto a pandas DataFrame calleditems.Line 8: Initializing
total_roundsto specify the number of rounds the algorithm will run.Line 9: Initializing
adsto represent the number of arms (ads) to choose from.Line 10: Initializing
selectionto keep track of selected arms.Line 11–12: Initializing
track_1andtrack_0to store the number of times each arm received a reward or no reward.Line 13: Initializing
finalto accumulate the total rewards.Line 14–22: Perform Thompson sampling to select arms in each round.
Line 23–28: Update reward tracking based on feedback.
Line 30–35: Plot the histogram of selected arms using Thompson sampling.
Output
The x-axis of the histogram represents the different arms (ads) available for selection, and the y-axis represents the number of times each arm (ad) was selected during the total number of rounds (specified as total_rounds in the code). The higher the count for a particular arm, the more frequently it was chosen by the Thompson sampling algorithm.
Applications
Thompson sampling has applications in various domains, including:
Online advertising: Optimizing the allocation of resources to different ad campaigns to maximize returns.
Clinical trials: Efficiently testing treatments or medications to identify the most effective.
Recommendation systems: Personalizing recommendations based on user interactions to improve engagement.
Resource allocation: Allocating resources like server capacity or inventory to maximize overall utility.
Conclusion
Thompson sampling solves the multi-armed bandit problem efficiently. By employing Bayesian statistics and probability distributions, Thompson sampling balances between exploring new options and exploiting well-known ones.
Free Resources