I am not particularly proud of the way the code in this project is organized. That said, I think it still shows some problem-solving ability as well as dedication to learning new skills and technologies (I had not really used Pandas or NumPy before this). That is why I'm making it public as something I can showcase.
If I were to go back and do it over again, I'd use an object-oriented design. I'd also just use scikit-learn's libraries for cross-validation, but doing it from scratch at least once is probably beneficial.
For anyone curious, we were trying to use Bayesian networks (BNs) to predict customer churn for a company whose business model is software subscriptions. The code in this project lets the user define a directed acyclic graph (DAG) by passing in a list of ordered pairs, each representing a directed edge between two nodes in the graph. In addition to the DAG, the user must supply a .csv file whose columns are a superset of the variables in the model (the graph nodes). If both conditions are met, the code "trains" the BN associated with the DAG by automatically constructing a conditional probability table for each node in the network and calling the appropriate methods from pgmPy's libraries to build the network. The network is then cross-validated with as many folds as the user specifies. Each unique environment instantiation is calculated only once, and its value is then stored in a symbol table to avoid redundant computation.
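To give a rough feel for the steps described above, here is a minimal sketch in plain Python. It is not the code from this project and it does not call pgmPy, Pandas, or scikit-learn. Every function name, column name, and data row in it is hypothetical. It assumes each edge is a (parent, child) pair, that the .csv has already been read into a list of dictionaries, that probabilities are estimated by simple counting with no smoothing, and that an "environment instantiation" means one assignment of values to the variables.

```python
from collections import Counter, defaultdict


def parents_from_edges(edges):
    """Map each node to a sorted tuple of its parents, given (parent, child) pairs."""
    parents = defaultdict(list)
    nodes = set()
    for parent, child in edges:
        nodes.update((parent, child))
        parents[child].append(parent)
    return {node: tuple(sorted(parents[node])) for node in sorted(nodes)}


def is_acyclic(parent_map):
    """Repeatedly remove nodes whose parents are all removed; a stall means a cycle."""
    remaining = dict(parent_map)
    removed = set()
    while remaining:
        ready = [n for n, ps in remaining.items() if all(p in removed for p in ps)]
        if not ready:
            return False
        for node in ready:
            removed.add(node)
            del remaining[node]
    return True


def fit_cpts(rows, parent_map):
    """Estimate P(node | parents) by counting rows (no smoothing)."""
    cpts = {}
    for node, parents in parent_map.items():
        joint, marginal = Counter(), Counter()
        for row in rows:
            env = tuple(row[p] for p in parents)
            joint[(env, row[node])] += 1
            marginal[env] += 1
        cpts[node] = {key: n / marginal[key[0]] for key, n in joint.items()}
    return cpts


def make_joint_probability(cpts, parent_map):
    """Return a joint-probability function backed by a symbol table (a plain dict)."""
    symbol_table = {}

    def joint(assignment):
        key = tuple(sorted(assignment.items()))
        if key not in symbol_table:  # compute each unique assignment only once
            p = 1.0
            for node, parents in parent_map.items():
                env = tuple(assignment[q] for q in parents)
                p *= cpts[node].get((env, assignment[node]), 0.0)
            symbol_table[key] = p
        return symbol_table[key]

    return joint, symbol_table


def k_fold_indices(n_rows, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical toy data standing in for the rows of the .csv file.
rows = [
    {"plan": "basic", "tickets": "high", "churn": "yes"},
    {"plan": "basic", "tickets": "high", "churn": "yes"},
    {"plan": "basic", "tickets": "low", "churn": "no"},
    {"plan": "pro", "tickets": "low", "churn": "no"},
    {"plan": "pro", "tickets": "high", "churn": "no"},
    {"plan": "pro", "tickets": "high", "churn": "yes"},
]
parent_map = parents_from_edges([("plan", "churn"), ("tickets", "churn")])
assert is_acyclic(parent_map)

for train_idx, test_idx in k_fold_indices(len(rows), k=3):
    cpts = fit_cpts([rows[i] for i in train_idx], parent_map)
    joint, symbol_table = make_joint_probability(cpts, parent_map)
    scores = [joint(rows[i]) for i in test_idx]
```

In this sketch, an assignment that never appears in the training fold gets probability 0.0. A real run would probably want some form of smoothing, and would hand the fitted tables to pgmPy rather than multiplying them by hand.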