Add some regular expression note about html tag
crazyguitar committed Mar 2, 2016
docs/source/notes/python-rexp.rst

Python Regular Expression cheatsheet
====================================

Match HTML tags
---------------

+--------------+------------------+------------------+
| tag type     | pattern          | example          |
+==============+==================+==================+
| any tag      | ``<[^>]+>``      | <br />, <a>      |
+--------------+------------------+------------------+
| open tag     | ``<[^/>][^>]*>`` | <a>, <table>     |
+--------------+------------------+------------------+
| close tag    | ``</[^>]+>``     | </p>, </a>       |
+--------------+------------------+------------------+
| self-closing | ``<[^/>]+/>``    | <br />           |
+--------------+------------------+------------------+

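The four patterns can be combined into a small classifier. This is only an illustrative sketch: ``TAG_PATTERNS`` and ``classify_tag`` are hypothetical names, and it assumes Python 3.4+ for ``re.fullmatch``. The self-closing pattern is tried first because the open-tag pattern also accepts ``<br />``.

.. code-block:: python

    import re

    # Hypothetical helper built from the table's patterns. Order matters:
    # "<[^/>][^>]*>" also matches "<br />", so the self-closing pattern
    # must be checked before the open-tag pattern.
    TAG_PATTERNS = [
        ("self-closing", re.compile(r"<[^/>]+/>")),
        ("close", re.compile(r"</[^>]+>")),
        ("open", re.compile(r"<[^/>][^>]*>")),
    ]


    def classify_tag(tag):
        """Return the tag type of a single tag string, or None if no pattern matches."""
        for name, pattern in TAG_PATTERNS:
            # fullmatch() requires the whole string to be one tag,
            # unlike search(), which accepts a tag anywhere in the string
            if pattern.fullmatch(tag):
                return name
        return None


    for tag in ['<table>', '<a href="#label">', '</table>', '<br />', 'plain text']:
        print(tag, '->', classify_tag(tag))

Note the limitation of ``<[^/>]+/>``: a ``/`` inside an attribute value breaks it, so ``classify_tag('<img src="/img" />')`` returns ``'open'`` rather than ``'self-closing'``.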

.. code-block:: python

    >>> import re
    >>> # open tag
    >>> re.search('<[^/>][^>]*>', '<table>') is not None
    True
    >>> re.search('<[^/>][^>]*>', '<a href="#label">') is not None
    True
    >>> re.search('<[^/>][^>]*>', '<img src="/img">') is not None
    True
    >>> re.search('<[^/>][^>]*>', '</table>') is not None
    False
    >>> # close tag
    >>> re.search('</[^>]+>', '</table>') is not None
    True
    >>> # self-closing tag
    >>> re.search('<[^/>]+/>', '<br />') is not None
    True

Find all matches with ``re.findall()``
--------------------------------------
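
Because the python.org page changes over time, the patterns from the table can also be tried offline. This is a minimal sketch on a small hypothetical HTML string (the ``html`` value below is made up for illustration), using only the standard ``re`` module:

.. code-block:: python

    import re

    # Hypothetical sample page; any HTML string works the same way
    html = '''<html>
    <head><title>Demo</title></head>
    <body>
    <p>Line one<br />Line two</p>
    <img src="logo.png" />
    </body>
    </html>'''

    # findall() returns every non-overlapping match, left to right
    open_tags = re.findall(r'<[^/>][^>]*>', html)   # also picks up self-closing tags
    close_tags = re.findall(r'</[^>]+>', html)
    self_closing = re.findall(r'<[^/>]+/>', html)

    print(open_tags)
    print(close_tags)
    print(self_closing)

Here ``open_tags`` also contains ``<br />`` and ``<img src="logo.png" />``, because the open-tag pattern does not exclude self-closing tags; filter those out with the self-closing pattern if you need strictly open tags.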

.. code-block:: python

    >>> import re
    >>> # extract every run of word characters
    >>> source = 'Hello, World! Ker HAHA'
    >>> re.findall(r'\w+', source)
    ['Hello', 'World', 'Ker', 'HAHA']
    >>> # parse the python.org home page (Python 3; output captured at the time of writing)
    >>> from urllib.request import urlopen
    >>> with urlopen('https://www.python.org') as s:
    ...     html = s.read().decode('utf-8')
    ...
    >>> print("open tags")
    open tags
    >>> re.findall('<[^/>][^>]*>', html)[:2]
    ['<!doctype html>', '<!--[if lt IE 7]>']
    >>> print("close tags")
    close tags
    >>> re.findall('</[^>]+>', html)[:2]
    ['</script>', '</title>']
    >>> print("self-closing tags")
    self-closing tags

Group Comparison
----------------
