Solution: Cumulative Penalty Heuristic
Learn how to apply cumulative penalty heuristics to train a Bayesian network.
Let's imagine this scenario: We are city planners for a small town with ten distinct locations (nodes) connected by roads (edges). The locations are represented by the letters A to J, and the roads have different distances (weights) between them. The town map and distances between locations are as follows:
When converting a network into a Bayesian network, each node represents a random variable, and each edge represents a conditional dependency between the connected nodes.
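To make the node-and-edge idea concrete, here is a minimal sketch that stores the directed edges used in this lesson's solution as a parent map and checks that they form a valid directed acyclic graph. It relies only on Python's standard library; the variable names (`edges`, `parents`, `order`) are illustrative and not part of the lesson's code.

```python
from graphlib import TopologicalSorter

# Directed edges (parent, child), as listed in this lesson's solution.
edges = [
    ("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E"),
    ("D", "E"), ("D", "F"), ("E", "G"), ("F", "G"), ("G", "H"),
    ("G", "I"), ("H", "I"), ("I", "J"),
]

# Map every node to the set of its parents (the variables it depends on).
parents = {}
for parent, child in edges:
    parents.setdefault(parent, set())
    parents.setdefault(child, set()).add(parent)

# A Bayesian network must be acyclic. static_order() raises
# graphlib.CycleError if the edges contain a cycle.
order = list(TopologicalSorter(parents).static_order())

# Print each node with its parents, parents always listed before children.
for node in order:
    print(node, "<-", sorted(parents[node]))
```

Each line of the output names one random variable and the variables it is conditioned on, so D, for example, depends on both B and C.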
In this scenario, we're simulating data that represents the connections and distances between locations in a town. We are assigning numerical values to these connections to create a dataset that reflects the structure of the town.
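As a rough illustration of that idea, the sketch below simulates the first four nodes, turns each continuous value into 0 or 1 by comparing it with the node's mean, and estimates one conditional probability by counting. It uses only the standard library. Unlike the lesson's solution, it adds a Gaussian noise term to each child node so that children are not exact multiples of their parents; the noise term, the smaller sample size, and the `cond_prob` name are assumptions made for illustration.

```python
import random
from statistics import mean

random.seed(42)
n_samples = 1000

# Root node A: draws from a normal distribution (mean 1, std 0.5).
A = [random.gauss(1, 0.5) for _ in range(n_samples)]

# Child nodes: weighted sums of their parents.
# The added noise (std 1) is an assumption, not part of the lesson's solution.
B = [5 * a + random.gauss(0, 1) for a in A]
C = [3 * a + random.gauss(0, 1) for a in A]
D = [2 * b + 4 * c + random.gauss(0, 1) for b, c in zip(B, C)]

data = {"A": A, "B": B, "C": C, "D": D}

# Discretize: 1 if the value is above the node's mean, otherwise 0.
discrete = {}
for node, values in data.items():
    threshold = mean(values)
    discrete[node] = [int(v > threshold) for v in values]

# Estimate P(B = 1 | A = a) by counting matching rows.
cond_prob = {}
for a in (0, 1):
    rows = [b_i for a_i, b_i in zip(discrete["A"], discrete["B"]) if a_i == a]
    cond_prob[a] = sum(rows) / len(rows)
    print(f"P(B=1 | A={a}) is approximately {cond_prob[a]:.2f}")
```

Because B is built from A, a high value of A makes a high value of B more likely, which is the kind of conditional dependency the fitted network is meant to capture.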
Solution
Please find below the solution to the exercise in the previous lesson:
# Simulate the continuous data
np.random.seed(42)
n_samples = 10000

# Simulate node A (mean: 1 std: 0.5)
A = np.random.normal(1, 0.5, n_samples)

# Simulate the other nodes
B = 5 * A
C = 3 * A
D = 2 * B + 4 * C
E = 1 * D + 6 * C
F = 3 * D
G = 2 * F + 5 * E
H = 4 * G
I = 2 * H + 7 * G
J = 1 * I

# Create the dataset
data = {"A": A, "B": B, "C": C, "D": D, "E": E, "F": F, "G": G, "H": H, "I": I, "J": J}

# Calculate the mean of each node
mean_values = {node: values.mean() for node, values in data.items()}

# Define thresholds as the mean value for each node
thresholds = mean_values
discrete_data = {
    node: (values > threshold).astype(int)
    for node, values, threshold in zip(data.keys(), data.values(), thresholds.values())
}

# Convert the discrete data to a pandas DataFrame
df = pd.DataFrame(discrete_data)

# Define the structure of the Bayesian network
sm = StructureModel()
sm.add_edges_from([
    ('A', 'B'),
    ('A', 'C'),
    ('B', 'D'),
    ('C', 'D'),
    ('C', 'E'),
    ('D', 'E'),
    ('D', 'F'),
    ('E', 'G'),
    ('F', 'G'),
    ('G', 'H'),
    ('G', 'I'),
    ('H', 'I'),
    ('I', 'J'),
])

# Create the Bayesian Network
bn = BayesianNetwork(sm)

# Fit the Bayesian Network
bn = bn.fit_node_states(df)
bn = bn.fit_cpds(df, method="BayesianEstimator", bayes_prior="K2")

# BASELINE query:
ie = InferenceEngine(bn)
baseline = ie.query({})
rounded_baseline = {
    outer_key: {inner_key: round(value, 1) for inner_key, value in inner_dict.items()}
    for outer_key, inner_dict in baseline.items()
}
print(rounded_baseline)

def test(n={}):
    return rounded_baseline
Line 6: This line generates a simulated dataset for node A, using ...