Commit

Redo fuzzer exercises to be based around tables
The previous bug using list items was not a good example.

This also requires exploring a larger search space so we allow the browser to
be reloaded using SIGHUP.
adetaylor committed Jul 9, 2024
1 parent 9adbcab commit d12edee
Showing 13 changed files with 197 additions and 405 deletions.
2 changes: 1 addition & 1 deletion docs/exercise-4b-hints/hint1.md
@@ -1 +1 @@
The bug is in the handling of a standard, common, HTML tag - specifically, an HTML tag which was *not* included in `browser.py`.
The bug is in the handling of some standard, fairly common, HTML tags.
3 changes: 2 additions & 1 deletion docs/exercise-4b-hints/hint2.md
@@ -1 +1,2 @@
Perhaps `browser.py` was feeling a bit listless.
Although you're not allowed to look in `html_table.py`, perhaps the name of that file
gives you some clues about what sorts of HTML tags might be involved?
2 changes: 1 addition & 1 deletion docs/exercise-4b-hints/hint3.md
@@ -1 +1 @@
There are several standard HTML list tags - `ul`, `ol`, `li` (and various others).
There are several standard HTML table tags - `tr`, `td`, `table` (and various others).
2 changes: 1 addition & 1 deletion docs/exercise-4b-hints/hint4.md
@@ -1 +1 @@
There is a combination of these list-related tags which will cause the browser to crash. You should write code to generate random combinations of these tags and intervening data.
There is a combination of these table-related tags which will cause the browser to crash. You should write code to generate random combinations of these tags and intervening data.
2 changes: 2 additions & 0 deletions docs/exercise4a.md
@@ -18,6 +18,8 @@ Any of the following counts as a success:
Rules:
* You can only do this by *altering the HTML content of the web page*. Remember,
you're a website operator. You *cannot* change the browser code.
* You *must not* look in the `html_table.py` file, because that's for
a subsequent exercise.

> [!TIP]
> If you find a bug which makes one website look like another one,
6 changes: 4 additions & 2 deletions docs/exercise4b.md
@@ -6,9 +6,11 @@ the hidden security bugs in the previous exercise.
Now you're going to write a program to find more security bugs. A program which
finds security bugs by testing another program is called a "fuzzer".

**Note**: this exercise probably only works on Linux, Mac or Chromebooks.

Do this:

* Do *NOT* look at the code for `src/fuzzer/browser-v2.py`. That is cheating!
* Do *NOT* look at the code for `src/browser/html_table.py`. That is cheating!
* Open `src/fuzzer/fuzzer.py` in VSCode and read it.
* Run `python3 src/fuzzer/fuzzer.py`. Watch what it does.
* Control-C to cancel it.
@@ -18,7 +20,7 @@ Now:
1. Modify *one single number* in the `generate_testcase` function so that it
finds one of the security bugs. Run the fuzzer again.
2. Now, modify `generate_testcase` to find another bug which is hidden in
`src/fuzzer/browser-v2.py`. Do *not* look at its code - that's cheating!
`src/browser/html_table.py`. Do *not* look at its code - that's cheating!
To be clear, this is an _extra_ security bug which wasn't in `browser.py`.

## General hints (no spoilers! Fine to read)
2 changes: 2 additions & 0 deletions docs/for-teachers.md
@@ -22,3 +22,5 @@ be run on an online Python REPL. The kids will need a real computer capable
of running Python locally, and they'll need to be able to install a few Python
libraries using `pip`. You should carefully run through the [setup requirements](setup.md)
before deciding if this project is right for you.

You may find [solutions at this page](solutions.md).
13 changes: 13 additions & 0 deletions docs/solutions.md
@@ -0,0 +1,13 @@
# Solutions

Example solutions (there may be others!)

# Exercise 4b

```
text = ""
num = random.randrange(0, 12)
for x in range(0, num):
    text += random.choice(["<tr>", "</tr>", "</td>", "<td>", "<table>", "</table>", "hello"])
return text
```
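
The fragment above is only a function body. As a rough sketch, it could sit inside a complete function like the one below. The name `generate_testcase` comes from the exercise text; the signature, the `TABLE_TOKENS` constant and the optional `rng` parameter are assumptions made for illustration and may not match the real `src/fuzzer/fuzzer.py`.

```python
import random

# Token list copied from the solution fragment above.
# The constant name TABLE_TOKENS is hypothetical.
TABLE_TOKENS = ["<tr>", "</tr>", "</td>", "<td>", "<table>", "</table>", "hello"]


def generate_testcase(rng=random):
    """Return a random string built from 0 to 11 table-related tokens.

    The rng parameter is an assumption added here so that a seeded
    random.Random instance can be passed in for repeatable runs.
    """
    text = ""
    num = rng.randrange(0, 12)  # upper bound is exclusive, so 0..11 tokens
    for _ in range(num):
        text += rng.choice(TABLE_TOKENS)
    return text


if __name__ == "__main__":
    # Print a few sample testcases to see what the fuzzer would feed the browser.
    for _ in range(3):
        print(repr(generate_testcase()))
```

Passing a seeded generator, for example `generate_testcase(random.Random(1))`, makes a run reproducible, which can help when trying to recreate a crash.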
93 changes: 66 additions & 27 deletions src/browser/browser.py
@@ -17,11 +17,13 @@
# Simple demo python web browser. Lacks all sorts of important features.

from PyQt6.QtWidgets import QApplication, QWidget, QMainWindow, QVBoxLayout, QHBoxLayout, QPushButton, QLabel, QLineEdit, QSizePolicy
from PyQt6.QtCore import QSettings, Qt, QPoint, QSize, QEvent
from PyQt6.QtCore import QSettings, Qt, QPoint, QSize, QSocketNotifier
from PyQt6.QtGui import QFont, QMouseEvent, QPainter, QFontMetrics
import requests
import os
import signal
import sys
import html_table # do not look inside this file, that would be cheating on a later exercise
from html.parser import HTMLParser
from urllib.parse import urlparse

@@ -37,7 +39,7 @@ class Renderer(HTMLParser, QWidget):
the right sort of text in the right places.
"""

def __init__(self, parent=None):
def __init__(self, browser, parent=None):
"""
Code which is run when we create a new Renderer.
"""
@@ -54,21 +56,14 @@ def __init__(self, parent=None):
# e.g.
# (10, 20, 50, 30, "http://foo.com")
self.html = ""
self.browser = None
self.browser = browser

def minimumSizeHint(self):
"""
Returns the smallest possible size on the screen for our renderer.
"""
return QSize(800, 400)

def set_browser(self, browser):
"""
Remembers a reference to the browser object, so we can tell
the browser later when a link is clicked.
"""
self.browser = browser

def mouseReleaseEvent(self, event: QMouseEvent | None) -> None:
"""
Handle a click somewhere in the renderer area. See if it
@@ -118,6 +113,7 @@ def paintEvent(self, event):
self.space_needed_before_next_data = False
self.current_link = None # if we're in a <a href=...> hyperlink
self.known_links = list() # Links anywhere on the page
self.table = None # whether we're in an HTML table
# The following call interprets all the HTML in page_html.
# You can't see most of the code which does this because it's
# in the library which provides the HTMLParser class. But it will
@@ -127,6 +123,9 @@ def paintEvent(self, event):
# handle_data and handle_endtag depending on what's inside self.html.
self.feed(self.html)
self.painter = None
# Ignore the following two lines, they're used for exercise 4b only
if os.environ.get("OUTPUT_STATUS") is not None:
print("Rendering completed\n", flush=True)

def handle_starttag(self, tag, attrs):
"""
@@ -139,6 +138,13 @@ def handle_starttag(self, tag, attrs):
# Stuff inside these tags isn't actually HTML
# to display on the screen.
self.ignore_current_text = True
if self.table is not None:
# If we're inside a table, handle table-related tags but no others
if tag == 'tr':
self.table.handle_tr_start()
if tag == 'td':
self.table.handle_td_start()
return
if tag == 'b' or tag == 'strong':
self.is_bold = True
if tag == 's':
@@ -174,12 +180,20 @@ def handle_starttag(self, tag, attrs):
heading_number = int(tag[1])
font_size_difference = FONT_SIZE_INCREASES_FOR_HEADERS_1_TO_6[heading_number - 1]
self.font_size += font_size_difference
if tag == 'table':
self.table = html_table.HTMLTable()
self.space_needed_before_next_data = True

def handle_endtag(self, tag):
"""
Handle an HTML end tag, for example </a> or </b>
"""
if self.table is not None:
# If we're inside a table, handle table end but no other tags
if tag == 'table':
self.y_pos = self.table.handle_table_end(self.y_pos, lambda x, y, content: self.draw_text(x, y, content))
self.table = None
return
if tag == 'br' or tag == 'p': # move to a new line
self.newline()
if tag == 'script' or tag == 'style' or tag == 'title':
@@ -221,6 +235,21 @@ def handle_data(self, data):
if self.space_needed_before_next_data:
self.space_needed_before_next_data = False
data = ' ' + data
if self.table is not None:
# If we're inside a table, ask our table layout code to
# figure out where to draw it later
self.table.handle_data(data)
else:
(text_width, text_height) = self.draw_text(self.x_pos, self.y_pos, data)
self.x_pos = self.x_pos + text_width
if text_height > self.tallest_text_in_previous_line:
self.tallest_text_in_previous_line = text_height

def draw_text(self, x_pos, y_pos, text):
"""
Draw some text on the screen.
Returns a tuple of (x, y) space occupied
"""
# Work out what font we'll draw this in.
weight = QFont.Weight.Normal
if self.is_bold:
@@ -233,26 +262,24 @@ def handle_data(self, data):
self.painter.setPen(fill)
# Work out the size of the text we're about to draw.
text_measurer = QFontMetrics(font)
text_width = int(text_measurer.horizontalAdvance(data))
text_width = int(text_measurer.horizontalAdvance(text))
text_height = int(text_measurer.height())
# Tell our GUI canvas to draw some text! The important bit!
self.painter.drawText(QPoint(self.x_pos, self.y_pos + text_height), data)
self.painter.drawText(QPoint(x_pos, y_pos + text_height), text)
# If we're in a hyperlink, underline it and record its coordinates
# in case it gets clicked later.
if self.current_link is not None:
self.painter.drawLine(self.x_pos, self.y_pos + text_height, self.x_pos + text_width, self.y_pos + text_height)
self.known_links.append((self.x_pos, self.y_pos, self.x_pos + text_width, self.y_pos + text_height, self.current_link))
self.painter.drawLine(x_pos, y_pos + text_height, x_pos + text_width, y_pos + text_height)
self.known_links.append((x_pos, y_pos, x_pos + text_width, y_pos + text_height, self.current_link))
# Strikethrough - draw a line over the text but only
# if we don't cover more than 50% of it, we don't want it illegible
if self.is_strikethrough:
fraction_of_text_covered = 6 / self.font_size
if fraction_of_text_covered <= 0.5:
strikethrough_line_y_pos = self.y_pos + (self.font_size / 2) - 80
self.canvas.create_line(self.x_pos, strikethrough_line_y_pos,
self.x_pos + text_width, strikethrough_line_y_pos)
self.x_pos = self.x_pos + text_width
if text_height > self.tallest_text_in_previous_line:
self.tallest_text_in_previous_line = text_height
strikethrough_line_y_pos = y_pos + (self.font_size / 2) - 80
self.canvas.create_line(x_pos, strikethrough_line_y_pos,
x_pos + text_width, strikethrough_line_y_pos)
return (text_width, text_height)


class Browser(QMainWindow):
@@ -283,20 +310,21 @@ def __init__(self, initial_url):
toolbar.setLayout(toolbar_layout)
overall_layout = QVBoxLayout()
overall_layout.addWidget(toolbar)
self.renderer = Renderer()
self.renderer.set_browser(self)
self.renderer = Renderer(self)
overall_layout.addWidget(self.renderer)
self.status_bar = QLabel("Status:")
overall_layout.addWidget(self.status_bar)
widget = QWidget()
widget.setLayout(overall_layout)
self.setCentralWidget(widget)
# Set up somewhere to remember the last URL the user used
self.settings = QSettings("browser-learning", "browser")
if initial_url is None:
initial_url = self.settings.value("url", "https://en.wikipedia.org", type=str)
self.set_window_url(initial_url)
else:
self.navigate(initial_url)
self.set_window_url(initial_url)
self.setup_fuzzer_handling() # ignore

def go_button_clicked(self):
"""
@@ -326,9 +354,6 @@ def set_status(self, message):
Update the status line at the bottom of the screen
"""
self.status_bar.setText(message)
# Ignore the following two lines, they're used for exercise 4b only
if os.environ.get("OUTPUT_STATUS") is not None:
print(message + "\n", flush=True)

def set_window_url(self, url):
"""
@@ -377,6 +402,20 @@ def setup_encryption(self, url):
elif "REQUESTS_CA_BUNDLE" in os.environ:
del os.environ["REQUESTS_CA_BUNDLE"]

    def setup_fuzzer_handling(self):
        """
        Ignore this function - it's used to set up
        fuzzing for some of the later exercises.
        """
        self.reader, self.writer = os.pipe()
        signal.signal(signal.SIGHUP, lambda _s, _h: os.write(self.writer, b'a'))
        notifier = QSocketNotifier(self.reader, QSocketNotifier.Type.Read, self)
        notifier.setEnabled(True)
        def signal_received():
            os.read(self.reader, 1)
            window.go_button_clicked()
        notifier.activated.connect(signal_received)


#########################################
# Main program here
@@ -402,4 +441,4 @@ def setup_encryption(self, url):
# we need to display something on the screen, along with
# methods above like "go_button_clicked" or "mouseReleaseEvent"
# when the user interacts with the app.
app.exec()
app.exec()
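
The `setup_fuzzer_handling` method added in `src/browser/browser.py` appears to use the "self-pipe" pattern: the SIGHUP handler only writes one byte to a pipe, and the event loop reacts when that pipe becomes readable. Below is a minimal sketch of the same idea without PyQt6. It swaps `QSocketNotifier` for `select.select` so that it runs standalone; the function names `install_sighup_pipe` and `wait_for_reload` are hypothetical, and `SIGHUP` is only available on Unix-like systems.

```python
import os
import select
import signal


def install_sighup_pipe():
    """Create a pipe and arrange for SIGHUP to write one byte into it."""
    reader, writer = os.pipe()
    # The handler does as little as possible: it just pokes the pipe.
    signal.signal(signal.SIGHUP, lambda _signum, _frame: os.write(writer, b"a"))
    return reader, writer


def wait_for_reload(reader, timeout_seconds=1.0):
    """Return True if a SIGHUP arrived within the timeout, else False.

    select.select is a stand-in for the QSocketNotifier used in browser.py.
    """
    ready, _, _ = select.select([reader], [], [], timeout_seconds)
    if not ready:
        return False
    os.read(reader, 1)  # consume the byte so the next wait blocks again
    return True


if __name__ == "__main__":
    reader, _writer = install_sighup_pipe()
    # Stand-in for the fuzzer asking the browser to reload its page.
    os.kill(os.getpid(), signal.SIGHUP)
    print("reload requested:", wait_for_reload(reader))
```

Keeping the signal handler down to a single `os.write` means the real work (reloading the page, in the browser's case) happens in the normal event loop rather than inside the handler.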
80 changes: 80 additions & 0 deletions src/browser/html_table.py
@@ -0,0 +1,80 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#####################################
#####################################
#####################################
### DO NOT LOOK INSIDE THIS FILE! ###
#####################################
#####################################
#####################################
#####################################
# This contains spoilers for exercise
# 4b. Reading this code is cheating!
#####################################
#####################################
#####################################
#####################################

class HTMLTable:
    def __init__(self):
        self.rows = list()

    def handle_tr_start(self):
        self.rows.append(list())

    def handle_td_start(self):
        if len(self.rows) == 0: # no tr was found
            return
        self.rows[-1].append("")

    def handle_data(self, data):
        if len(self.rows) == 0: # no tr was found
            return
        if len(self.rows[-1]) == 0: # no td was found
            return
        self.rows[-1][-1] += data

    def handle_table_end(self, initial_y_pos, draw_at):
        """
        Draws the table, using the passed function which takes
        x and y positions and content, draws the content,
        and returns a tuple of (x, y) space
        occupied.
        Returns the y position after the table is drawn.
        """
        if len(self.rows) == 0:
            return initial_y_pos
        y_pos = initial_y_pos
        column_widths = list()
        first_row = True
        # Column widths are based on the first row space
        # occupied. A real algorithm would consider other rows.
        for row in self.rows:
            max_height = 0
            if first_row:
                first_row = False
                for cell in row:
                    current_x_pos = sum(column_widths)
                    (width, height) = draw_at(current_x_pos, y_pos, cell)
                    column_widths.append(width + 10) # padding
                    max_height = max(max_height, height)
            else:
                current_x_pos = 0
                for n, cell in enumerate(row):
                    (_, height) = draw_at(current_x_pos, y_pos, cell)
                    max_height = max(max_height, height)
                    current_x_pos += column_widths[n]
            y_pos += max_height + 10 # padding
        return y_pos
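
To see how the layout loop in `handle_table_end` behaves (this contains spoilers for exercise 4b), here is a condensed sketch that can be run without the browser. It is an illustration, not the file's actual code: `layout_rows` is a hypothetical paraphrase of the loop, and `fake_draw_at` is an assumed stand-in for the renderer's `draw_text` that pretends every character is 8 pixels wide and 16 pixels tall.

```python
def fake_draw_at(x_pos, y_pos, text):
    """Hypothetical stand-in for draw_text: returns (width, height) only."""
    return (len(text) * 8, 16)


def layout_rows(rows, draw_at, initial_y_pos=0):
    """Condensed paraphrase of the loop in HTMLTable.handle_table_end."""
    if len(rows) == 0:
        return initial_y_pos
    y_pos = initial_y_pos
    column_widths = []
    for row_number, row in enumerate(rows):
        max_height = 0
        if row_number == 0:
            # The first row decides how wide every column is.
            for cell in row:
                width, height = draw_at(sum(column_widths), y_pos, cell)
                column_widths.append(width + 10)  # padding
                max_height = max(max_height, height)
        else:
            # Later rows reuse the widths recorded for the first row.
            current_x_pos = 0
            for n, cell in enumerate(row):
                _, height = draw_at(current_x_pos, y_pos, cell)
                max_height = max(max_height, height)
                current_x_pos += column_widths[n]
        y_pos += max_height + 10  # padding
    return y_pos


if __name__ == "__main__":
    print(layout_rows([["ab", "c"], ["d", "e"]], fake_draw_at))
    try:
        # A later row with more cells than the first row.
        layout_rows([["a"], ["b", "c"]], fake_draw_at)
    except IndexError as error:
        print("layout failed:", error)
```

Because column widths are only recorded for the first row, a later row with more cells than the first one indexes past the end of `column_widths`. In this sketch that surfaces as an `IndexError`.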