diff --git a/notebooks/Alhazen.ipynb b/notebooks/Alhazen.ipynb index 0397895c..a7a75fd8 100644 --- a/notebooks/Alhazen.ipynb +++ b/notebooks/Alhazen.ipynb @@ -184,6 +184,364 @@ "START_SYMBOL = \"\"" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We see that the `CALCULATOR` Grammar consists of several production rules. The calculator subject will only accept inputs that conform to this grammar definition." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "[Info]: We use the functionality provided by The Fuzzingbook. For a more detailed description of grammars, have a look at the chapter Fuzzing with Grammars.\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, lets load two initial input samples:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load initial input files\n", + "sample_list = ['sqrt(-16)', 'sqrt(4)']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The two initial input samples for our calculator should be:\n", + "- _sqrt(-16)_\n", + "- _sqrt(4)_\n", + "\n", + "Let's check if this is true with python's `assert` function. The condition is True, if no Assertion is thrown." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, let's execute our two input samples and observe the calculator's behavior. To do this, we load the function `execute_samples` from the notebook ExecuteSamples.ipynb. We can call the function with a list of input samples, and it returns the corresponding execution outcome (label/oracle). The output is a [pandas dataframe](https://pandas.pydata.org/docs/reference/frame.html), and the labels are from the class `OracleResults`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next we implement the function `sample_runner(sample)` that lets us execute the calculator for a single sample. `sample_runner(sample)` returns the, in the pervious step imported, `OracleResult` for the sample." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from enum import Enum\n", + "\n", + "class OracleResult(Enum):\n", + " BUG = \"BUG\"\n", + " NO_BUG = \"NO_BUG\"\n", + " UNDEF = \"UNDEF\"\n", + "\n", + " def __str__(self):\n", + " return self.value" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\"\"\"\n", + "This file contains the code under test for the example bug.\n", + "The sqrt() method fails for -42 <= x <= -12.\n", + "\"\"\"\n", + "from math import tan as rtan\n", + "from math import cos as rcos\n", + "from math import sin as rsin\n", + "\n", + "\n", + "def task_sqrt(x):\n", + " \"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\n", + " if x <= -12 and x >= -42:\n", + " \tx = 0\n", + " else:\n", + " \tx = 1\n", + " x = max(x, 0)\n", + " approx = None\n", + " guess = x / 2\n", + " while approx != guess:\n", + " approx = guess\n", + " guess = (approx + x / approx) / 2\n", + " return approx\n", + "\n", + "\n", + "def task_tan(x):\n", + " return rtan(x)\n", + "\n", + "\n", + "def task_cos(x):\n", + " return rcos(x)\n", + "\n", + "\n", + "def task_sin(x):\n", + " return rsin(x)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas\n", + "import sys\n", + "\n", + "SUBJECT = \"calculator\"\n", + "\n", + "def sample_runner(sample):\n", + " testcode = sample\n", + "\n", + " try:\n", + " exec(testcode, {\"sqrt\": task_sqrt, \"tan\": task_tan, \"sin\": task_sin, \"cos\": task_cos}, {})\n", + " return OracleResult.NO_BUG\n", + " except ZeroDivisionError:\n", + " return OracleResult.BUG\n", + " except Exception as e:\n", + " print(e, file=sys.stderr)\n", + " return OracleResult.UNDEF" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's test the function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + 
"metadata": {}, + "outputs": [], + "source": [ + "sample = \"sqrt(-16)\"\n", + "sample_runner(sample)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As expected, the sample `sqrt(-16)` triggers the calculator bug. Let's try some more samples:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "assert sample_runner(\"sqrt(-23)\") == OracleResult.BUG\n", + "assert sample_runner(\"sqrt(44)\") == OracleResult.NO_BUG\n", + "assert sample_runner(\"cos(-9)\") == OracleResult.NO_BUG" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What happens if we parse inputs to calculator, that do not conform to its input format?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sample_runner(\"undef_function(QUERY)\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The function `sample_runner(sample)` returns an `OracleResult.UNDEF` whenever the runner is not able to execute the sample." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "To work reliably, you have to remove all samples from the learning set of Alhazen that do not conform to the grammar. \n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The finally we provide the function 'execute_samples(sample_list)' that obtians the oracle/label for a list of samples." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import uuid\n", + "\n", + "# executes a list of samples and return the execution outcome (label)\n", + "# the functions returns a pandas dataframe\n", + "def execute_samples(sample_list):\n", + " data = []\n", + " for sample in sample_list:\n", + " id = uuid.uuid1()\n", + " result = sample_runner(sample)\n", + " data.append({\n", + " # \"sample_id\": id.hex,\n", + " # \"sample\": sample,\n", + " # \"subject\": SUBJECT,\n", + " \"oracle\": result\n", + " })\n", + " return pandas.DataFrame.from_records(data)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# let us define a list of samples to execute\n", + "sample_list = [\"sqrt(-20)\", \"cos(2)\", \"sqrt(-100)\", \"undef_function(foo)\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# we obtain the execution outcome\n", + "labels = execute_samples(sample_list)\n", + "display(labels)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# combine with the sample_list\n", + "for i, row in enumerate(labels['oracle']): print(sample_list[i].ljust(30) + str(row))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To remove the undefined input samples, you could invoke something similar to this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# clean up data\n", + "clean_data = labels.drop(labels[labels.oracle.astype(str) == \"UNDEF\"].index)\n", + "display(clean_data)" + ] + }, + { + "cell_type": "code", + "execution_count": 
null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load function execute samples\n", + "# execute_samples(List[str])\n", + "oracle = execute_samples(sample_list)\n", + "oracle" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Combine samples and labels by iterating over the obtained oracle\n", + "for i, row in enumerate(oracle['oracle']):\n", + " print(sample_list[i].ljust(30) + str(row))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We observe that the sample `sqrt(-16)` triggers a bug in the calculator, whereas the sample `sqrt(4)` does not show unusual behavior. Of course, we want to know why the sample causes the program to fail. In a typical use case, the developers of the calculator program would now try other input samples and evaluate if similar inputs also trigger the program's failure. Let's try some more input samples; maybe we can refine our understanding of why the calculator crashes:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Our guesses (maybe the failure is also in the cos or tan function?)\n", + "guess_samples = ['cos(-16)', 'tan(-16)', 'sqrt(-100)', 'sqrt(-20.23412431234123)']\n", + "\n", + "# let's obtain the execution outcome for each of our guesses\n", + "guess_oracle = execute_samples(guess_samples)\n", + "\n", + "# let's show the results\n", + "for i, row in enumerate(guess_oracle['oracle']):\n", + " print(guess_samples[i].ljust(30) + str(row))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It looks like the failure only occurs in the `sqrt` function, and only for specific `x` values. We could now try other values for `x` and repeat the process. However, this would be highly time-consuming and not an efficient debugging technique for a larger and more complex test subject." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Wouldn't it be great if there were a tool that did this for us automatically? This is exactly what _Alhazen_ is for: it helps us explain why specific inputs cause a program to fail. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "[Info]: Alhazen is a tool that automatically learns the circumstances of program failure by associating syntactical features of sample inputs with the execution outcome. The produced explanations (in the form of a decision tree) help developers focus on the input space's relevant aspects.\n", + "
" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -719,49 +1077,25 @@ "metadata": {}, "outputs": [], "source": [ - "# TODO Add better test case for correct validation\n", - "\n", - "transformed_grammar = transform_grammar(\"1 + 2\", EXPR_GRAMMAR)\n", - "for rule in transformed_grammar:\n", - " print(rule.ljust(10), transformed_grammar[rule])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Learning Syntactical Features" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Decision Trees" + "# TODO Add better test case for correct validation\n", + "\n", + "transformed_grammar = transform_grammar(\"1 + 2\", EXPR_GRAMMAR)\n", + "for rule in transformed_grammar:\n", + " print(rule.ljust(10), transformed_grammar[rule])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Lets import the oracle definition that states weather a bug was present or not from the helper scripts:" + "## Learning Syntactical Features" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "from enum import Enum\n", - "\n", - "class OracleResult(Enum):\n", - " BUG = \"BUG\"\n", - " NO_BUG = \"NO_BUG\"\n", - " UNDEF = \"UNDEF\"\n", - "\n", - " def __str__(self):\n", - " return self.value" + "### Decision Trees" ] }, { @@ -1973,13 +2307,6 @@ "If no assertion is triggered, then everything seems to work." ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, { "cell_type": "markdown", "metadata": {}, @@ -2215,238 +2542,6 @@ "samples" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Executing Input Files" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let us build the functionality to execute the calculator subject with a list of input samples." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next we implement the function `sample_runner(sample)` that lets us execute the calculator for a single sample. `sample_runner(sample)` returns the, in the pervious step imported, `OracleResult` for the sample." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "\"\"\"\n", - "This file contains the code under test for the example bug.\n", - "The sqrt() method fails on x <= 0.\n", - "\"\"\"\n", - "from math import tan as rtan\n", - "from math import cos as rcos\n", - "from math import sin as rsin\n", - "\n", - "\n", - "def task_sqrt(x):\n", - " \"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\n", - " if x <= -12 and x >= -42:\n", - " \tx = 0\n", - " else:\n", - " \tx = 1\n", - " x = max(x, 0)\n", - " approx = None\n", - " guess = x / 2\n", - " while approx != guess:\n", - " approx = guess\n", - " guess = (approx + x / approx) / 2\n", - " return approx\n", - "\n", - "\n", - "def task_tan(x):\n", - " return rtan(x)\n", - "\n", - "\n", - "def task_cos(x):\n", - " return rcos(x)\n", - "\n", - "\n", - "def task_sin(x):\n", - " return rsin(x)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas\n", - "\n", - "SUBJECT = \"calculator\"\n", - "\n", - "def sample_runner(sample):\n", - " testcode = sample\n", - "\n", - " try:\n", - " exec(testcode, {\"sqrt\": task.sqrt, \"tan\": task.tan, \"sin\": task.sin, \"cos\": task.cos}, {})\n", - " return OracleResult.NO_BUG\n", - " except ZeroDivisionError:\n", - " return OracleResult.BUG\n", - " except:\n", - " return OracleResult.UNDEF" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's test the function:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sample = \"sqrt(-16)\"\n", - 
"sample_runner(sample)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "As expected, the sample `sqrt(-16)` triggers the calculator bug. Let's try some more samples:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "assert sample_runner(\"sqrt(-23)\") == OracleResult.BUG\n", - "assert sample_runner(\"sqrt(44)\") == OracleResult.NO_BUG\n", - "assert sample_runner(\"cos(-9)\") == OracleResult.NO_BUG" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "What happens if we parse inputs to calculator, that do not conform to its input format?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sample_runner(\"undef_function(QUERY)\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The function `sample_runner(sample)` returns an `OracleResult.UNDEF` whenever the runner is not able to execute the sample." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - "To work reliably, you have to remove all samples from the learning set of Alhazen that do not conform to the grammar. \n", - "
" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The finally we provide the function 'execute_samples(sample_list)' that obtians the oracle/label for a list of samples." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import uuid\n", - "\n", - "# executes a list of samples and return the execution outcome (label)\n", - "# the functions returns a pandas dataframe\n", - "def execute_samples(sample_list):\n", - " data = []\n", - " for sample in sample_list:\n", - " id = uuid.uuid1()\n", - " result = sample_runner(sample)\n", - " data.append({\n", - " # \"sample_id\": id.hex,\n", - " # \"sample\": sample,\n", - " # \"subject\": SUBJECT,\n", - " \"oracle\": result\n", - " })\n", - " return pandas.DataFrame.from_records(data)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# let us define a list of samples to execute\n", - "sample_list = [\"sqrt(-20)\", \"cos(2)\", \"sqrt(-100)\", \"undef_function(foo)\"]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# we obtain the execution outcome\n", - "labels = execute_samples(sample_list)\n", - "display(labels)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# combine with the sample_list\n", - "for i, row in enumerate(labels['oracle']): print(sample_list[i].ljust(30) + str(row))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To remove the undefined input samples, you could invoke something similar to this:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# clean up data\n", - "clean_data = labels.drop(labels[labels.oracle.astype(str) == \"UNDEF\"].index)\n", - "display(clean_data)" - ] - }, { "cell_type": "markdown", "metadata": {},