
Solution: Cumulative Penalty Heuristic

Explore how to apply the cumulative penalty heuristic in building complex Bayesian networks by simulating and discretizing data from interconnected nodes. Understand the transformation of a classical graph problem into a probabilistic model, enabling dynamic reasoning in uncertain real-world scenarios like city planning and supply chain management. Gain skills in converting continuous data into binary states to fit Bayesian network structures for more flexible and informative analysis.


Let's imagine this scenario: We are city planners for a small town with ten distinct locations (nodes) connected by roads (edges). The locations are represented by letters A to J, and the roads have different distances (weights) between them. The town map and distances between locations are as follows:

Town map locations and road distances

When converting a network into a Bayesian network, each node represents a random variable, and each edge represents a conditional dependency between the connected nodes.
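
To make the node-and-edge reading concrete, here is a minimal sketch that lists each node's parents and writes the joint distribution as a product of one conditional term per node. The edge list is the one used in the solution; the helper names `parents_of` and `factorization` are hypothetical and only serve this illustration.

```python
from collections import defaultdict

# Directed edges taken from the network structure used in the solution.
edges = [
    ("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E"),
    ("D", "E"), ("D", "F"), ("E", "G"), ("F", "G"), ("G", "H"),
    ("G", "I"), ("H", "I"), ("I", "J"),
]


def parents_of(edge_list):
    """Map every node to the sorted list of its parents (hypothetical helper)."""
    parents = defaultdict(list)
    for parent, child in edge_list:
        parents[parent]  # touch the key so root nodes appear as well
        parents[child].append(parent)
    return {node: sorted(p) for node, p in sorted(parents.items())}


def factorization(parents):
    """Write the joint distribution as a product of per-node terms."""
    terms = []
    for node, pars in parents.items():
        terms.append(f"P({node} | {', '.join(pars)})" if pars else f"P({node})")
    return " * ".join(terms)


parents = parents_of(edges)
joint = factorization(parents)
print(joint)
```

Each term in the printed product corresponds to one conditional probability table that has to be estimated from data: for example, D depends on both B and C, so its table is conditioned on the pair (B, C).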

In this scenario, we're simulating data that represents the connections and distances between locations in a town. We are assigning numerical values to these connections to create a dataset that reflects the structure of the town.
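
As a rough, standard-library-only sketch of that idea, the snippet below stores the road distances as edge weights, propagates simulated values from A through the graph, and then converts every node to a binary state by comparing it with its own mean. The weights are read off the equations in the solution; the names `weights`, `simulate`, and `binarize` are hypothetical, and `random.gauss` stands in for the NumPy sampling used in the solution, so the sampled numbers will differ.

```python
import random
import statistics

# Edge weights read off the solution's equations
# (each child is the weighted sum of its parents).
weights = {
    ("A", "B"): 5, ("A", "C"): 3, ("B", "D"): 2, ("C", "D"): 4,
    ("D", "E"): 1, ("C", "E"): 6, ("D", "F"): 3, ("F", "G"): 2,
    ("E", "G"): 5, ("G", "H"): 4, ("H", "I"): 2, ("G", "I"): 7,
    ("I", "J"): 1,
}
order = "ABCDEFGHIJ"  # alphabetical order is a topological order for this graph


def simulate(n_samples, seed=42):
    """Sample A from a normal distribution, then propagate along the edges."""
    rng = random.Random(seed)
    data = {"A": [rng.gauss(1, 0.5) for _ in range(n_samples)]}
    for node in order[1:]:
        incoming = [(p, w) for (p, c), w in weights.items() if c == node]
        data[node] = [
            sum(w * data[p][i] for p, w in incoming) for i in range(n_samples)
        ]
    return data


def binarize(data):
    """Map each value to 1 if it exceeds its node's mean, otherwise 0."""
    binary = {}
    for node, values in data.items():
        threshold = statistics.fmean(values)
        binary[node] = [int(v > threshold) for v in values]
    return binary


data = simulate(1000)
binary = binarize(data)
print({node: sum(states) for node, states in binary.items()})
```

Because every node here is a positive multiple of A with no added noise, the binary columns carry essentially the same information as A's column; this mirrors the deterministic setup of the exercise rather than a realistic dataset.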

Solution

Please find below the solution to the exercise in the previous lesson:

Python 3.8
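# Imports (assumed: the classes used below match the causalnex library)
import numpy as np
import pandas as pd
from causalnex.structure import StructureModel
from causalnex.network import BayesianNetwork
from causalnex.inference import InferenceEngine

# Simulate the continuous data
np.random.seed(42)
n_samples = 10000
# Simulate node A (mean: 1, std: 0.5)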
A = np.random.normal(1, 0.5, n_samples)
# Simulate the other nodes
B = 5 * A
C = 3 * A
D = 2 * B + 4 * C
E = 1 * D + 6 * C
F = 3 * D
G = 2 * F + 5 * E
H = 4 * G
I = 2 * H + 7 * G
J = 1 * I
# Create the dataset
data = {"A": A, "B": B, "C": C, "D": D, "E": E, "F": F, "G": G, "H": H, "I": I, "J": J}
# Calculate the mean of each node
mean_values = {node: values.mean() for node, values in data.items()}
# Define thresholds as the mean value for each node
thresholds = mean_values
# Binarize: 1 if the value is above the node's threshold, otherwise 0
discrete_data = {node: (values > thresholds[node]).astype(int) for node, values in data.items()}
# Convert the discrete data to a pandas DataFrame
df = pd.DataFrame(discrete_data)
# Define the structure of the Bayesian network
sm = StructureModel()
sm.add_edges_from([
    ('A', 'B'),
    ('A', 'C'),
    ('B', 'D'),
    ('C', 'D'),
    ('C', 'E'),
    ('D', 'E'),
    ('D', 'F'),
    ('E', 'G'),
    ('F', 'G'),
    ('G', 'H'),
    ('G', 'I'),
    ('H', 'I'),
    ('I', 'J'),
])
# Create the Bayesian Network
bn = BayesianNetwork(sm)
# Fit the Bayesian Network
bn = bn.fit_node_states(df)
bn = bn.fit_cpds(df, method="BayesianEstimator", bayes_prior="K2")
# BASELINE query:
ie = InferenceEngine(bn)
baseline = ie.query({})
# Round every marginal probability to one decimal place
rounded_baseline = {
    outer_key: {inner_key: round(value, 1) for inner_key, value in inner_dict.items()}
    for outer_key, inner_dict in baseline.items()
}
print(rounded_baseline)

def test(n={}):
    return rounded_baseline
  • Line 6: This line generates a simulated dataset for node A, using ...