diff --git a/notebooks/Alhazen.ipynb b/notebooks/Alhazen.ipynb
index 201ca1eb..5a8c7702 100644
--- a/notebooks/Alhazen.ipynb
+++ b/notebooks/Alhazen.ipynb
@@ -15,15 +15,9 @@
     "Given the many executions we can generate, it is only natural that these executions would also be subject to _machine learning_ in order to learn which features of the input (or the execution) would be associated with failures.\n",
     "\n",
     "In this chapter, we study the _Alhazen_ approach, one of the first of this kind.\n",
-    "Alhazen by Kampmann et al. [[KHSZ20](https://publications.cispa.saarland/3107/7/fse2020-alhazen.pdf)] automatically learn the associations between the failure of a program and _features of the input data_, say \"The error occurs whenever the `` element is negative\""
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
+    "Alhazen by Kampmann et al. \\cite{Kampmann2020} automatically learns the associations between the failure of a program and _features of the input data_, say \"The error occurs whenever the `` element is negative.\"\n",
     "\n",
-    "This chapter is based on an Alhazen implementation and exercise contributed by [Martin Eberlein](https://martineberlein.github.io), TU Berlin. Thanks a lot, Martin!"
+    "This chapter is based on an Alhazen implementation contributed by [Martin Eberlein](https://martineberlein.github.io) of TU Berlin. Thanks a lot, Martin!"
    ]
   },
   {
@@ -69,15 +63,6 @@
     "import bookutils.setup"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from typing import List, Tuple, Dict, Any, Optional"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -130,7 +115,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "In 2020, Kampmann et al. [[KHSZ20](https://publications.cispa.saarland/3107/7/fse2020-alhazen.pdf)] presented one of the first approaches to automatically learn circumstances of (failing) program behavior.\n",
+    "In 2020, Kampmann et al. \\cite{Kampmann2020} presented one of the first approaches to automatically learn circumstances of (failing) program behavior.\n",
     "Their approach associates the program’s failure with the _syntactical features_ of the input data, allowing them to learn and extract the properties that result in the specific behavior.\n",
     "\n",
     "Their reference implementation _Alhazen_ can generate a diagnosis and explain why, for instance, a particular bug occurs.\n",
@@ -239,9 +224,18 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    " from fuzzingbook.Grammars import Grammar, EXPR_GRAMMAR, reachable_nonterminals, is_valid_grammar\n",
-    " from fuzzingbook.GrammarFuzzer import GrammarFuzzer, expansion_to_children, DerivationTree, tree_to_string, display_tree, is_nonterminal\n",
-    " from fuzzingbook.Parser import EarleyParser"
+    "from typing import List, Tuple, Dict, Any, Optional"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from fuzzingbook.Grammars import Grammar, EXPR_GRAMMAR, reachable_nonterminals, is_valid_grammar\n",
+    "from fuzzingbook.GrammarFuzzer import GrammarFuzzer, expansion_to_children, DerivationTree, tree_to_string, display_tree, is_nonterminal\n",
+    "from fuzzingbook.Parser import EarleyParser"
    ]
   },
@@ -250,9 +244,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    " import pandas\n",
-    " import numpy\n",
-    " import matplotlib"
+    "import pandas\n",
+    "import numpy\n",
+    "import matplotlib"
    ]
   },
@@ -318,14 +312,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Now, let's execute our two input samples and observe the calculator's behavior. To do this, we load the function `execute_samples` from the notebook ExecuteSamples.ipynb. We can call the function with a list of input samples, and it returns the corresponding execution outcome (label/oracle). The output is a [pandas dataframe](https://pandas.pydata.org/docs/reference/frame.html), and the labels are from the class `OracleResults`."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Next we implement the function `sample_runner(sample)` that lets us execute the calculator for a single sample. `sample_runner(sample)` returns the, in the pervious step imported, `OracleResult` for the sample."
+    "Now, let's execute our two input samples and observe the calculator's behavior.\n",
+    "We implement the function `sample_runner(sample)` that lets us execute the calculator for a single sample. `sample_runner(sample)` returns an `OracleResult` for the sample."
    ]
   },
@@ -370,9 +358,9 @@
    "def task_sqrt(x):\n",
    "    \"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\n",
    "    if x <= -12 and x >= -42:\n",
-    "    \tx = 0\n",
+    "        x = 0  # Guess where the bug is :-)\n",
    "    else:\n",
-    "    \tx = 1\n",
+    "        x = 1\n",
    "    x = max(x, 0)\n",
    "    approx = None\n",
    "    guess = x / 2\n",
@@ -620,10 +608,10 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "In this section, we are concerned with the problem of extracting semantic features from inputs. In particular, Alhazen defines various features based on the input grammar, such as *existance* and *numeric interpretation*. These features are then extracted from the parse trees of the inputs (see [Section 3 of the paper](https://publications.cispa.saarland/3107/7/fse2020-alhazen.pdf) for more details).\n",
+    "In this section, we are concerned with the problem of extracting semantic features from inputs. In particular, Alhazen defines various features based on the input grammar, such as *existence* and *numeric interpretation*. These features are then extracted from the parse trees of the inputs (see Section 3 of \\cite{Kampmann2020} for more details).\n",
     "\n",
     "The implementation of the feature extraction module consists of the following three tasks:\n",
-    "1. Implementation of individual feature classes, whose instances allow to derive specific feature values from inputs\n",
+    "1. Implementation of individual feature classes, whose instances allow deriving specific feature values from inputs\n",
     "2. Extraction of features from the grammar through instantiation of the aforementioned feature classes\n",
     "3. Computation of feature vectors from a set of inputs, which will then be used as input for the decision tree"
    ]
   },
   {
@@ -1076,7 +1064,7 @@
    "source": [
     "**INPUT**:\n",
     "the function requires the following input parameters:\n",
-    "- sample: a input sample \n",
+    "- sample: an input sample\n",
     "- grammar: the grammar that should be transformed/extended"
    ]
   },
   {
@@ -1170,7 +1158,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We will also use `scikit-learn` as the machine learning library. We will use the `DecisionTreeClassifier` to learn the syntactical input features that are responsible for the bug-triggering behavior of our Calculator (SUT - SubjectUnderTest). We also use the `tree` module and `graphviz` to visualize the learned decision tree."
+    "We will also use `scikit-learn` as the machine learning library. We will use the `DecisionTreeClassifier` to learn the syntactical input features that are responsible for the bug-triggering behavior of our Calculator."
    ]
   },
@@ -1188,21 +1176,23 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "First, we transform the individual input features (represented as dicts) into a NumPy array. For this example, we use the following four features (`function-sqrt`, `function-cos`, `function-sin`, `number`) to describe an input feature. (Please note that this is an extremely reduced example; this is not the complete list of features that should be extracted from the `CALCULATOR` Grammar.)"
+    "First, we transform the individual input features (represented as Python dictionaries) into a NumPy array.\n",
+    "For this example, we use the following four features (`function-sqrt`, `function-cos`, `function-sin`, `number`) to describe an input feature. (Please note that this is an extremely reduced example; this is not the complete list of features that should be extracted from the `CALCULATOR` grammar.)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The features `function-sqrt`, `function-cos`, `function-sin`state wheater the function _sqrt_, _cos_, or _sin_ was used. A `1`is given, if the sample contains the respective function, otherwise the feature contains a `0`."
+    "The features `function-sqrt`, `function-cos`, `function-sin` state whether the function _sqrt_, _cos_, or _sin_ was used.\n",
+    "A `1` is given if the sample contains the respective function, otherwise the feature contains a `0`."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "For the each (x), the `number` feature describes which value was used for `x`. For instance, the first input `sqrt(-900)` corresponds to 'function-sqrt': 1 and 'number': -900."
+    "For each (x), the `number` feature describes which value was used for `x`. For instance, the first input `sqrt(-900)` corresponds to 'function-sqrt': 1 and 'number': -900.",
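+    "\n",
+    "Spelled out as Python dictionaries, this encoding would look as follows (a sketch for illustration; only the first entry, `sqrt(-900)`, is taken from the text above, the second is a made-up `cos` input):\n",
+    "```python\n",
+    "features = [\n",
+    "    {'function-sqrt': 1, 'function-cos': 0, 'function-sin': 0, 'number': -900},  # sqrt(-900)\n",
+    "    {'function-sqrt': 0, 'function-cos': 1, 'function-sin': 0, 'number': 900},   # cos(900), hypothetical\n",
+    "]\n",
+    "```"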
   ]
  },
  {
@@ -1227,7 +1217,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "We define a list of lables (or oracles) that state wheather the specific input file resulted in a bug or not. We use the `OracleResult`-Class to keep everything tidy and clean."
+    "We define a list of labels (or oracles) that state whether the specific input file resulted in a bug or not. We use the `OracleResult` class to keep everything tidy and clean."
   ]
  },
@@ -1380,7 +1370,8 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "For _Alhazen's_ second activity (Train Classification Model), your are required to write a function `train_tree(data)` that trains a decision tree on a given data frame. `train_tree(data)` should return the learned decision tree."
+    "For _Alhazen's_ second step (Train Classification Model), we write a function `train_tree(data)` that trains a decision tree on a given data frame.\n",
+    "`train_tree(data)` should return the learned decision tree."
   ]
  },
@@ -1417,14 +1408,16 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "**Note:** Each row of data['oracle'] is of type OracleResult. However, sci-kit learn requires an array of strings. Convert them to learn the decision tree."
+    "**Note:** Each row of data['oracle'] is of type `OracleResult`.\n",
+    "However, scikit-learn requires an array of strings.\n",
+    "We have to convert them to learn the decision tree."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "**OUTPUT**: the function should return a learned decision tree of type _sklearn.tree._classes.DecisionTreeClassifier_"
+    "**OUTPUT**: the function returns a learned decision tree of type `sklearn.tree._classes.DecisionTreeClassifier`."
   ]
  },
@@ -1523,7 +1516,8 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "**Note:** The sklearn DictVectorizer uses an internal sort function as default. This will result in different feature_name indices. If you want to use the Dictvectorizer please ensure that you only access the feature_names with the function vec.get_feature_names_out(). We recommend that you use the pandas Dataframe, since this is also the format used in the feedback loop."
+    "**Note:** The sklearn `DictVectorizer` sorts feature names by default, which results in different feature name indices. If you want to use the `DictVectorizer`, please ensure that you only access the feature names via `vec.get_feature_names_out()`.\n",
+    "We recommend that you use the pandas `DataFrame`, since this is also the format used in the feedback loop.",
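+    "\n",
+    "A minimal sketch of the pitfall (assuming scikit-learn is installed; the feature values are made up):\n",
+    "```python\n",
+    "from sklearn.feature_extraction import DictVectorizer\n",
+    "\n",
+    "vec = DictVectorizer(sparse=False)  # sort=True is the default\n",
+    "X = vec.fit_transform([{'number': -900, 'function-sqrt': 1}])\n",
+    "# Column order is sorted alphabetically, not insertion order:\n",
+    "print(vec.get_feature_names_out())  # ['function-sqrt' 'number']\n",
+    "```"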
   ]
  },
  {
@@ -1725,6 +1719,13 @@
    "    return idx\n",
    "\n",
    "\n",
+    "def find_existence_index(features: List[Feature], feature: Feature):\n",
+    "    for idx, f in enumerate(features):\n",
+    "        if isinstance(f, ExistenceFeature) and f.key() == feature.key():\n",
+    "            return idx\n",
+    "    raise AssertionError(\"There is no existence feature with this key!\")\n",
+    "\n",
+    "\n",
    "def remove_infeasible(clf, features: List[Feature]):\n",
    "    for node in range(0, clf.tree_.node_count):\n",
    "        if not is_leaf(clf, node):\n",
@@ -1797,33 +1798,6 @@
    "### Excursion: Converting Trees to Paths"
   ]
  },
- {
-  "cell_type": "code",
-  "execution_count": null,
-  "metadata": {},
-  "outputs": [],
-  "source": [
-   "import logging\n",
-   "\n",
-   "def tree_to_paths(tree, features: List[Feature]):\n",
-   "    logging.info(\"Extracting requirements from tree ...\")\n",
-   "    paths = []\n",
-   "    # go through tree leaf by leaf\n",
-   "    for path in all_path(tree):\n",
-   "        requirements = []\n",
-   "        is_bug = OracleResult.BUG == prediction_for_path(tree, path)\n",
-   "        # find the requirements\n",
-   "        box_ = box(tree, path, feature_names=features).transpose()\n",
-   "        for feature, row in box_.iterrows():\n",
-   "            mini = row['min']\n",
-   "            maxi = row['max']\n",
-   "            if (not numpy.isinf(mini)) or (not numpy.isinf(maxi)):\n",
-   "                requirements.append(TreeRequirement(feature, mini, maxi))\n",
-   "        paths.append(TreePath(None, is_bug, requirements))\n",
-   "\n",
-   "    return paths\n"
-  ]
- },
  {
   "cell_type": "code",
   "execution_count": null,
@@ -1963,6 +1937,30 @@
    "    return int(\"\".join([9] * int(maxi)))"
   ]
  },
+ {
+  "cell_type": "code",
+  "execution_count": null,
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "def tree_to_paths(tree, features: List[Feature]):\n",
+   "    paths = []\n",
+   "    # go through tree leaf by leaf\n",
+   "    for path in all_path(tree):\n",
+   "        requirements = []\n",
+   "        is_bug = OracleResult.BUG == prediction_for_path(tree, path)\n",
+   "        # find the requirements\n",
+   "        box_ = box(tree, path, feature_names=features).transpose()\n",
+   "        for feature, row in box_.iterrows():\n",
+   "            mini = row['min']\n",
+   "            maxi = row['max']\n",
+   "            if (not numpy.isinf(mini)) or (not numpy.isinf(maxi)):\n",
+   "                requirements.append(TreeRequirement(feature, mini, maxi))\n",
+   "        paths.append(TreePath(None, is_bug, requirements))\n",
+   "\n",
+   "    return paths"
+  ]
+ },
  {
   "cell_type": "markdown",
   "metadata": {},
@@ -2044,7 +2042,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Lets verify if we can negate a whole path."
+    "Let's verify if we can negate a whole path."
   ]
  },
@@ -2064,7 +2062,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Lets verify if we can negate a whole path."
+    "Let's verify if we can negate a whole path."
   ]
  },
@@ -2078,7 +2076,8 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "We will use the Decision tree and extract new input specifications to refine or refute our hypothesis (See paper Section 4.1 - Extracting Prediction Paths). These input specifications will be parsed to the input generator that tries to generate new inputs that fullfil the defined input specifications."
+    "We will use the Decision tree and extract new input specifications to refine or refute our hypothesis (See Section 4.1 - Extracting Prediction Paths in \\cite{Kampmann2020}).\n",
+    "These input specifications will be passed to the input generator that tries to generate new inputs that fulfill the defined input specifications.",
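+    "\n",
+    "For instance, a path predicting `BUG` might yield a specification such as (an illustrative sketch; the exact syntax is fixed by the requirement grammar introduced below):\n",
+    "```\n",
+    "exists(<function> == sqrt) > 0.5, num(<term>) <= -11.75\n",
+    "```\n",
+    "which asks the generator for an input that calls `sqrt` on a sufficiently negative value, such as `sqrt(-16)`."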
   ]
  },
  {
@@ -2126,7 +2125,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "We will use the Decision tree and extract new input specifications to refine or refute our hypothesis (See paper Section 4.1 - Extracting Prediction Paths). These input specifications will be parsed to the input generator that tries to generate new inputs that fullfil the defined input specifications."
+    "We will use the Decision tree and extract new input specifications to refine or refute our hypothesis (See paper Section 4.1 - Extracting Prediction Paths). These input specifications will be passed to the input generator that tries to generate new inputs that fulfill the defined input specifications."
   ]
  },
@@ -2213,7 +2212,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Lets validate our grammar, by using the grammar to produce 100 sample requirement specifications"
+    "Let's validate our grammar by using it to produce 100 sample requirement specifications."
   ]
  },
@@ -2247,7 +2246,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Lets also try with some real requirement specifications"
+    "Let's also try it with some real requirement specifications."
   ]
  },
@@ -2497,18 +2496,18 @@
   "metadata": {},
   "source": [
    "We implement a _Grammar-Based Input Generator_ that generates new input samples from a List of `Input Specifications`.\n",
-    "The Input Specifications are extracted from the decision tree boundaries in the previous Activity 3: _RequirementExtraction_.\n",
+    "The Input Specifications are extracted from the decision tree boundaries in the previous Step 3: `RequirementExtraction`.\n",
    "\n",
-    "An Input Specification consists of **1 to n** many predicates or requirements (e.g. feature '>=' value, or 'num(term) <= 13').\n",
-    "We generate a new input for each `InputSpecification`.\n",
-    "The new input fulfills all the given requirements of an InputSpecification."
+    "An `InputSpecification` consists of **1 to n** predicates or requirements (e.g. `feature >= value`, or `num(term) <= 13`).\n",
+    "We generate a new input for each `InputSpecification`.\n",
+    "The new input fulfills all the given requirements of an `InputSpecification`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "**Info:** For further details, please refer to Section 4.4 and 4.5 of the paper and the Chapter Efficient Grammar Fuzzing in the fuzzingbook."
+    "**Info:** For further details, please refer to Sections 4.4 and 4.5 of \\cite{Kampmann2020} and the chapter [Efficient Grammar Fuzzing](https://www.fuzzingbook.org/html/GrammarFuzzer.html) in the Fuzzing Book."
   ]
  },
@@ -2518,8 +2517,8 @@
   "source": [
    "**INPUT**:\n",
    "the function requires the following input parameters:\n",
    "- grammar: the grammar that is used to produce new inputs (e.g. the CALCULATOR grammar)\n",
-    "- new_input_specification: a List of new inputs specifications (List\\[InputSpecification\\])\n",
-    "- timeout: a max time budget. Return the generated inputs when the timebudget is exeeded."
+    "- new_input_specification: a list of new input specifications (`List[InputSpecification]`)\n",
+    "- timeout: a max time budget. Return the generated inputs when the time budget is exceeded.",
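+    "\n",
+    "A call following this interface (mirroring the test further below; `testspec` stands for an already parsed `InputSpecification`) would look like:\n",
+    "```python\n",
+    "samples = generate_samples(CALC_GRAMMAR, [testspec], 10)\n",
+    "```"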
   ]
  },
  {
@@ -2675,15 +2674,6 @@
    "    return final_samples"
   ]
  },
- {
-  "cell_type": "code",
-  "execution_count": null,
-  "metadata": {},
-  "outputs": [],
-  "source": [
-   "generate_samples = generate_samples_advanced"
-  ]
- },
  {
   "cell_type": "markdown",
   "metadata": {},
@@ -2712,6 +2702,13 @@
   "execution_count": null,
   "metadata": {},
   "outputs": [],
+  "source": [
+   "generate_samples = generate_samples_advanced"
+  ]
+ },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
   "source": [
    "### Excursion: Some Tests"
   ]
  },
@@ -2739,15 +2736,13 @@
    "\n",
    "print('--generating samples--')\n",
    "# samples = generate_samples(CALC_GRAMMAR, [testspec0, testspec1], 10)\n",
-    "samples = generate_samples_advanced(CALC_GRAMMAR, [testspec2], 10)\n",
+    "samples = generate_samples(CALC_GRAMMAR, [testspec2], 10)\n",
    "samples"
   ]
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
+  "cell_type": "markdown",
   "metadata": {},
-  "outputs": [],
   "source": [
    "### End of Excursion"
   ]
  },
@@ -2859,9 +2854,7 @@
    "        for iteration in range(1, self._max_iter + 1):\n",
    "            if self._verbose:\n",
    "                print(f\"\\nIteration #{iteration}\")\n",
-    "            self._iterate(self._previous_samples)\n",
-    "\n",
-    "        return self._finalize()"
+    "            self._iterate(self._previous_samples)"
   ]
  },
  {
@@ -2871,8 +2864,14 @@
   "outputs": [],
   "source": [
    "class Alhazen(Alhazen):\n",
-    "    def _finalize(self):\n",
-    "        return self._trees"
+    "    def all_trees(self, /, prune: bool = True):\n",
+    "        trees = self._trees\n",
+    "        if prune:\n",
+    "            trees = [remove_unequal_decisions(tree) for tree in self._trees]\n",
+    "        return trees\n",
+    "\n",
+    "    def last_tree(self, /, prune: bool = True):\n",
+    "        return self.all_trees(prune=prune)[-1]"
   ]
  },
@@ -2924,6 +2923,45 @@
    "        self._previous_samples = new_samples"
   ]
  },
+ {
+  "cell_type": "code",
+  "execution_count": null,
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "class Alhazen(Alhazen):\n",
+   "    def all_feature_names(self, friendly: bool = True) -> List[str]:\n",
+   "        if friendly:\n",
+   "            all_feature_names = [f.friendly_name() for f in self._all_features]\n",
+   "        else:\n",
+   "            all_feature_names = [f.name for f in self._all_features]\n",
+   "        return all_feature_names"
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": null,
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "class Alhazen(Alhazen):\n",
+   "    def show_decision_tree(self, tree=None, friendly: bool = True):\n",
+   "        return show_decision_tree(tree or self.last_tree(),\n",
+   "                                  self.all_feature_names(friendly))"
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": null,
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "class Alhazen(Alhazen):\n",
+   "    def friendly_decision_tree(self, tree=None):\n",
+   "        return friendly_decision_tree(tree or self.last_tree(),\n",
+   "                                      self.all_feature_names())"
+  ]
+ },
  {
   "cell_type": "markdown",
   "metadata": {},
@@ -2938,41 +2976,69 @@
    "We can finally run _Alhazen_!"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "Set the number of refinement iterations and the timeout for the input generator.\n",
+   "The execution time of Alhazen mainly depends on the number of iterations."
+  ]
+ },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
-    "# Set the number of refinement iterations and the timeout for the input generator\n",
-    "# The execution time of Alhazen mainly depends on the number of iterations\n",
-    "\n",
    "MAX_ITERATIONS = 20\n",
    "GENERATOR_TIMEOUT = 10  # timeout in seconds"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "We initialize Alhazen with the previously used `initial_sample_list`:"
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": null,
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "initial_sample_list"
+  ]
+ },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "And here we go! When initialized with `verbose=True`, Alhazen prints its progress during execution, issuing for each iteration\n",
+   "\n",
+   "* the last decision tree\n",
+   "* the new input specification resulting from the tree\n",
+   "* the new samples satisfying the input specification."
+  ]
+ },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
-    "# We initialize Alhazen with the previously used sample_list (['sqrt(-16)', 'sqrt(4)'])\n",
    "alhazen = Alhazen(sample_runner, CALC_GRAMMAR, initial_sample_list,\n",
    "                  verbose=True,\n",
    "                  max_iterations=MAX_ITERATIONS,\n",
    "                  generator_timeout=GENERATOR_TIMEOUT)\n",
-    "\n",
-    "# and run it\n",
-    "# Alhazen returns a list of all the iteratively learned decision trees\n",
-    "trees = alhazen.run()"
+    "alhazen.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "Let's display the final decision tree learned by Alhazen. You can use the function `show_tree(decision_tree, features)` to display the final tree."
+    "To access the final decision tree learned by Alhazen, use:"
   ]
  },
@@ -2981,8 +3047,14 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "final_tree = trees[MAX_ITERATIONS-1]\n",
-    "final_tree"
+    "alhazen.last_tree()"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "Let's display it:"
+  ]
+ },
@@ -2991,8 +3063,66 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "all_features = extract_all_features(CALC_GRAMMAR)\n",
-    "all_feature_names = [f.friendly_name() for f in all_features]"
+    "alhazen.show_decision_tree()"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "We can also view the tree as text:"
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": null,
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "print(alhazen.friendly_decision_tree())"
+  ]
+ },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "In both views, we see that the failure is related to the `sqrt()` function being called with a negative value.\n",
+   "But what's the deal with the `<term>` and `<value>` fields?\n",
+   "For this, let's have a look at our sqrt function code:"
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": null,
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "import inspect\n",
+   "print(inspect.getsource(task_sqrt))"
+  ]
+ },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "We see that Alhazen has correctly determined the boundaries of `x` for the bug - the `<lead-digit>` value must be `4` or less (otherwise, the value of `x` will not trigger the bug); and `<term>` and `<value>` correctly reflect the boundaries.\n",
+   "(Note that `<term>` comes with a sign, whereas `<value>` has no sign.)\n",
+   "Not too bad for a machine learning approach :-)"
+  ]
+ },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "## Synopsis"
+  ]
+ },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "This chapter provides an implementation of the _Alhazen_ approach \\cite{Kampmann2020}, which trains machine learning _classifiers_ from input features.\n",
+   "Given a test function, a grammar, and a set of inputs, the `Alhazen` class produces a decision tree that _characterizes failure circumstances_:"
+  ]
+ },
  {
   "cell_type": "code",
   "execution_count": null,
@@ -3001,14 +3131,16 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "show_decision_tree(final_tree, all_feature_names)"
+    "alhazen = Alhazen(sample_runner, CALC_GRAMMAR, initial_sample_list,\n",
+    "                  max_iterations=20)\n",
+    "alhazen.run()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "**Info:** The decision tree may contain unnecessary long paths, where the bug-class does not change. You can use the function `remove_unequal_decisions(decision_tree)` to remove those nodes."
+    "The final decision tree can be accessed using `last_tree()`:"
   ]
  },
@@ -3017,7 +3149,14 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "show_decision_tree(remove_unequal_decisions(final_tree), all_feature_names)"
+    "alhazen.last_tree()"
   ]
  },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "We can visualize the resulting decision tree using `Alhazen.show_decision_tree()`:"
+  ]
+ },
@@ -3026,21 +3165,70 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "print(friendly_decision_tree(final_tree, all_feature_names))"
+    "alhazen.show_decision_tree()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## Synopsis"
+    "A decision tree is read from top to bottom.\n",
+    "Decision nodes (with two children) come with a _predicate_ on top.\n",
+    "This predicate is either\n",
+    "\n",
+    "* _numeric_, such as `<term> > 20`, indicating the numeric value of the given symbol, or\n",
+    "* _existential_, such as `<digit> == '1'`, which has a _negative_ value when False, and a _positive_ value when True.\n",
+    "\n",
+    "If the predicate evaluates to `True`, follow the left path; if it evaluates to `False`, follow the right path.\n",
+    "A leaf node (no children) will give you the final decision `class = BUG` or `class = NO_BUG`.\n",
+    "\n",
+    "So if the predicate states `<function> == 'sqrt' <= 0.5`, this means that if the function is _not_ `sqrt`, follow the left (`True`) path. If it is `sqrt`, follow the right (`False`) path.\n",
+    "\n",
+    "The `samples` field shows the number of sample inputs that contributed to this decision.\n",
+    "The `gini` field (aka Gini impurity) indicates how mixed the samples at this node are across the two classes (`BUG` and `NO_BUG`).\n",
+    "A `gini` value of `0.0` means _purity_ - all samples fall into the displayed class.\n",
+    "The _saturation_ of nodes also indicates purity – the higher the saturation, the higher the purity."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "_For those only interested in using the code in this chapter (without wanting to know how it works), give an example. This will be copied to the beginning of the chapter (before the first section) as text with rendered input and output._"
+    "There is also a text version available, with far fewer (but hopefully still essential) details:"
   ]
  },
+ {
+  "cell_type": "code",
+  "execution_count": null,
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "print(alhazen.friendly_decision_tree())"
+  ]
+ },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "In both representations, we see that the present failure is associated with a negative value for the `sqrt` function and precise boundaries for its value.\n",
+   "In fact, the error conditions are given in the source code:"
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": null,
+  "metadata": {},
+  "outputs": [],
+  "source": [
+   "import inspect\n",
+   "print(inspect.getsource(task_sqrt))"
+  ]
+ },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
+   "Try out Alhazen on your own code and your own examples!"
+  ]
+ },
  {
@@ -3055,9 +3243,9 @@
   "source": [
    "## Lessons Learned\n",
    "\n",
-    "* _Lesson one_\n",
-    "* _Lesson two_\n",
-    "* _Lesson three_"
+    "* Training _machine learners from input features_ can give important insights into failure circumstances.\n",
+    "* Generating _additional inputs_ based on feedback from the machine learner can greatly enhance precision.\n",
+    "* Applying machine learners to input and execution features is still in its infancy."
   ]
  },
@@ -3072,10 +3260,7 @@
   "source": [
    "## Next Steps\n",
    "\n",
-    "_Link to subsequent chapters (notebooks) here, as in:_\n",
-    "\n",
-    "* [use _assertions_ to check conditions at runtime](Assertions.ipynb)\n",
-    "* [reduce _failing inputs_ for efficient debugging](DeltaDebugger.ipynb)\n"
+    "Our [next chapter](Repairer.ipynb) introduces _automated repair_ of programs, building on the fault localization and generalization mechanisms introduced so far."
   ]
  },
@@ -3084,9 +3269,10 @@
   "source": [
    "## Background\n",
    "\n",
-    "_Cite relevant works in the literature and put them into context, as in:_\n",
+    "This chapter is built on the Alhazen paper by Kampmann et al. \\cite{Kampmann2020}.\n",
    "\n",
-    "The idea of ensuring that each expansion in the grammar is used at least once goes back to Burkhardt \\cite{Burkhardt1967}, to be later rediscovered by Paul Purdom \\cite{Purdom1972}."
+    "In \\cite{Eberlein2023}, Eberlein et al. introduced _Avicenna_, a new interpretation of Alhazen that makes use of the ISLa framework \\cite{Steinhoefel2022} to learn and produce input features.\n",
+    "Avicenna improves over Alhazen in terms of performance, expressiveness, and precision."
   ]
  }
 ],
diff --git a/notebooks/DDSetDebugger.ipynb b/notebooks/DDSetDebugger.ipynb
index 5b1a1d52..1ea37435 100644
--- a/notebooks/DDSetDebugger.ipynb
+++ b/notebooks/DDSetDebugger.ipynb
@@ -2515,7 +2515,10 @@
   "source": [
    "## Next Steps\n",
    "\n",
-    "Our [next chapter](Repairer.ipynb) introduces _automated repair_ of programs, building on the fault localization and generalization mechanisms introduced so far."
+    "The potential for determining how input features relate to bugs is far from being fully explored.\n",
+    "The ALHAZEN work by Kampmann et al. \\cite{Kampmann2020} generalizes over DDSET differently, by investigating _semantic_ features of input elements such as their numeric interpretation or length and their correlation with failures. Like DDSET, ALHAZEN also uses a feedback loop to strengthen or refute its hypotheses.\n",
+    "\n",
+    "Our [next chapter](Alhazen.ipynb) introduces ALHAZEN."
   ]
  },
  {
@@ -2527,10 +2530,7 @@
   "source": [
    "## Background\n",
    "\n",
    "Our `DDSetDebugger` class implements the DDSET algorithm as introduced by Gopinath et al. in \\cite{Gopinath2020}. A [full-fledged implementation of DDSET](https://rahul.gopinath.org/post/2020/07/15/ddset/) with plenty of details and experiments is available as a Jupyter Notebook. Our implementation follows the [simplified implementation of DDSET, as described by Gopinath](https://rahul.gopinath.org/post/2020/08/03/simple-ddset/).\n",
    "\n",
-    "The potential for determining how input features relate to bugs is not nearly explored yet. \n",
-    "The ALHAZEN work by Kampmann et al. \\cite{Kampmann2020} generalizes over DDSET differently, by investigating _semantic_ features of input elements such as their numeric interpretation or length and their correlation with failures. Like DDSET, ALHAZEN also uses a feedback loop to strengthen or refute its hypotheses.\n",
-    "\n",
-    "In recent work \\cite{Gopinath2021}, Gopinath has extended the concept of DDSET further. His work on _evocative expressions_ introduces a _pattern language_ in which arbitrary DDSET-like patterns can be combined into Boolean formula that even more precisely capture and produce failure circumstances. In particular, evocative expressions can _specialize_ grammars towards Boolean pattern combinations, thus allowing for great flexibility in testing and debugging."
+    "In \\cite{Gopinath2021}, Gopinath has extended the concept of DDSET further. His work on _evocative expressions_ introduces a _pattern language_ in which arbitrary DDSET-like patterns can be combined into Boolean formulas that even more precisely capture and produce failure circumstances. In particular, evocative expressions can _specialize_ grammars towards Boolean pattern combinations, thus allowing for great flexibility in testing and debugging."
   ]
  },
  {
diff --git a/notebooks/PICS/Alhazen-synopsis-1.png b/notebooks/PICS/Alhazen-synopsis-1.png
new file mode 100644
index 00000000..1698d4b8
Binary files /dev/null and b/notebooks/PICS/Alhazen-synopsis-1.png differ
diff --git a/notebooks/PICS/Alhazen-synopsis-1.svg b/notebooks/PICS/Alhazen-synopsis-1.svg
new file mode 100644
index 00000000..bf90b9c1
--- /dev/null
+++ b/notebooks/PICS/Alhazen-synopsis-1.svg
@@ -0,0 +1,272 @@
+<!-- SVG markup (272 lines, not reproduced here): Graphviz rendering of the synopsis decision tree.
+     Root node: <lead-digit> <= 3.5 (samples = 1626); inner nodes include <function> == 'sqrt' <= 0.5,
+     <term> <= -11.75, <value> <= 69.5, <lead-digit> <= 4.5, <value> <= 41.505, <term> == '<value>' <= 0.5;
+     every node is annotated with gini, samples, value, and class = BUG / NO_BUG. -->
diff --git a/notebooks/shared/fuzzingbook.bib b/notebooks/shared/fuzzingbook.bib
index 303a6d64..1b3f4c51 100644
--- a/notebooks/shared/fuzzingbook.bib
+++ b/notebooks/shared/fuzzingbook.bib
@@ -1834,4 +1834,40 @@ @article{Boehme2018stads
 articleno = {7},
 numpages = {52},
 keywords = {extrapolation, security, measure of confidence, discovery probability, fuzzing, stopping rule, reliability, measure of progress, code coverage, Statistical guarantees, species coverage}
-}
\ No newline at end of file
+}
+
+@inproceedings{Eberlein2023,
+author = {Eberlein, Martin and Smytzek, Marius and Steinh\"{o}fel, Dominic and Grunske, Lars and Zeller, Andreas},
+title = {Semantic Debugging},
+year = {2023},
+isbn = {9798400703270},
+publisher = {Association for Computing Machinery},
+address = {New York, NY, USA},
+url = {https://doi.org/10.1145/3611643.3616296},
+doi = {10.1145/3611643.3616296},
+abstract = {Why does my program fail? We present a novel and general technique to automatically determine failure causes and conditions, using logical properties over input elements: “The program fails if and only if int() > len() holds—that is, the given is larger than the length.” Our AVICENNA prototype uses modern techniques for inferring properties of passing and failing inputs and validating and refining hypotheses by having a constraint solver generate supporting test cases to obtain such diagnoses. As a result, AVICENNA produces crisp and expressive diagnoses even for complex failure conditions, considerably improving over the state of the art with diagnoses close to those of human experts.},
+booktitle = {Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
+pages = {438–449},
+numpages = {12},
+keywords = {behavior explanation, debugging, program behavior, testing},
+location = {San Francisco, CA, USA},
+series = {ESEC/FSE 2023}
+}
+
+@inproceedings{Steinhoefel2022,
+author = {Steinh\"{o}fel, Dominic and Zeller, Andreas},
+title = {Input Invariants},
+year = {2022},
+isbn = {9781450394130},
+publisher = {Association for Computing Machinery},
+address = {New York, NY, USA},
+url = {https://doi.org/10.1145/3540250.3549139},
+doi = {10.1145/3540250.3549139},
+abstract = {How can we generate valid system inputs? Grammar-based fuzzers are highly efficient in producing syntactically valid system inputs. However, programs will often reject inputs that are semantically invalid. We introduce ISLa, a declarative specification language for context-sensitive properties of structured system inputs based on context-free grammars. With ISLa, it is possible to specify input constraints like "a variable has to be defined before it is used," "the 'file name' block must be 100 bytes long," or "the number of columns in all CSV rows must be identical." Such constraints go into the ISLa fuzzer, which leverages the power of solvers like Z3 to solve semantic constraints and, on top, handles quantifiers and predicates over grammar structure. We show that a few ISLa constraints suffice to produce 100\% semantically valid inputs while still maintaining input diversity. ISLa can also parse and precisely validate inputs against semantic constraints. ISLa constraints can be mined from existing input samples. For this, our ISLearn prototype uses a catalog of common patterns, instantiates these over input elements, and retains those candidates that hold for the inputs observed and whose instantiations are fully accepted by input-processing programs. The resulting constraints can then again be used for fuzzing and parsing.},
+booktitle = {Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
+pages = {583–594},
+numpages = {12},
+keywords = {constraint mining, fuzzing, grammars, specification language},
+location = {Singapore, Singapore},
+series = {ESEC/FSE 2022}
+}