This repo contains very basic Python syntax, meant as revision notes for a new learner. The content is taken from the Python for beginners course on Coursera.
-- Use dir() on any object to list the attributes and methods available on it. Ex: l=list(); dir(l)
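As a small sketch of the dir() tip, the snippet below lists what a list object offers and filters out the names starting with an underscore. The variable names are only illustrative.

```python
# Ask Python which attributes and methods a list object has.
l = list()
methods = dir(l)

# Keep only the public names (those that do not start with an underscore).
public_methods = [name for name in methods if not name.startswith("_")]
print(public_methods)
```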
l=[] or l=list() # creates an empty list (avoid naming a variable 'list': it hides the built-in list())
l[3] # accesses an element of the list (Note: indexing starts at 0, so this is the 4th element)
l[1:4], l[2:], l[:8] # different slices of a list, each giving a sublist
(Note: [a:b] means from index a up to, but not including, index b)
line.split() # splits a string at whitespace, i.e. gives a list of words
line.split('@') # splits the string into parts at every occurrence of the symbol in single quotes
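The indexing, slicing and split() notes above can be tried out with a short sketch like this one. The sample list and the sample line are made up for illustration; note that split() is called on a string, and the result is a list.

```python
lst = [10, 20, 30, 40, 50]

third = lst[2]       # indexing starts at 0, so index 2 is the 3rd element
middle = lst[1:4]    # indices 1, 2 and 3 (index 4 is not included)
tail = lst[2:]       # from index 2 to the end
head = lst[:2]       # from the start up to, but not including, index 2

line = "user@example.com sent a message"
words = line.split()          # split at whitespace
parts = words[0].split("@")   # split the first word at the '@' symbol

print(third, middle, tail, head)
print(words, parts)
```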
d={} or d=dict() # creates a dictionary: a collection of key-value pairs
d['key'] # accesses the value stored under 'key' (Note: elements are looked up by key, not by position; since Python 3.7 a dict keeps insertion order)
d.keys() # returns a view of the dictionary keys; list(d) or list(d.keys()) gives them as a list
d.values() # returns a view of the dictionary values
d.items() # returns a view of tuples (k,v), one per key-value pair
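A minimal sketch of the dictionary operations listed above, assuming Python 3.7 or newer so that insertion order is kept. The fruit names and numbers are made-up sample data.

```python
d = {"apple": 3, "banana": 5}
d["cherry"] = 7             # add a new key-value pair

apples = d["apple"]         # look up a value by its key

keys = list(d.keys())       # keys as a list
values = list(d.values())   # values as a list
items = list(d.items())     # (key, value) tuples as a list

print(keys)
print(values)
print(items)
```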
get('key',default) method:
Used to look up a key that may or may not be in the dictionary.
If the key is present, it returns its value; otherwise it returns the default value (the dictionary itself is not changed).
Ex-
for name in names:
    counts[name]=counts.get(name,0) + 1
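Here is a runnable version of the counting idiom, with a made-up list of names. It also checks that get() on a missing key only returns the default and does not add that key to the dictionary.

```python
names = ["ram", "sita", "ram", "gita", "ram"]

counts = {}
for name in names:
    # First time we see a name, get() returns 0; afterwards the running count.
    counts[name] = counts.get(name, 0) + 1

print(counts)

# Looking up a missing key returns the default without storing it.
missing = counts.get("mohan", 0)
print(missing, "mohan" in counts)
```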
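Tuples are more efficient than lists because they are immutable, so they can be stored more compactly.
tp=( , , , ..) or tp=tuple() # creates a tuple
(x,y)=(2,'Ram') # assigns x=2 and y='Ram' in one statement
for (k,v) in d.items(): # d is a dictionary and d.items() gives (key, value) tuples
    print(k,v)
sorted(d.items(), reverse=True) # list of (key, value) tuples sorted by key, largest first
Two tuples can be compared: the comparison goes element by element from the left, and the first pair that differs decides. Ex: (0,1,2) < (5,1,2) is True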
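The tuple notes above (unpacking, sorting the items of a dictionary, comparison, immutability) can be sketched as follows. The dictionary contents are sample data chosen for illustration.

```python
tp = (2, "Ram")
(x, y) = tp                     # unpacking: x gets 2, y gets 'Ram'

d = {"b": 2, "a": 5, "c": 1}
by_key = sorted(d.items(), reverse=True)   # sorted by key, largest first

# Comparison works left to right; the first differing element decides.
first_decides = (0, 1, 2) < (5, 1, 2)
tie_broken_later = (2, "Ram") < (2, "Shyam")

# Tuples are immutable: assigning to an element raises TypeError.
try:
    tp[0] = 3
    immutable = False
except TypeError:
    immutable = True

print(x, y, by_key, first_decides, tie_broken_later, immutable)
```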
A simple program using the above 3 data structures. Q. Find the top 10 most common words in a given file. Code:
fname="file.txt"
fhand=open(fname)
counts=dict() # maps each word to the number of times it occurs
for line in fhand:
    words=line.split()
    for word in words:
        counts[word]=counts.get(word,0) + 1
lst=list()
for k,v in counts.items():
    newtup=(v,k) # count first, so that sorting orders by count
    lst.append(newtup)
lst=sorted(lst, reverse=True)
for v,k in lst[:10]:
    print(k,v)
With list comprehension, we can write the last 7 lines of the above code in 1 line.
In a list comprehension, we define a list not by its actual elements but by an expression.
For example: a=[(v,k) for (k,v) in d.items()]
So the last 7 lines of the above code become:
print(sorted( [ (v,k) for k,v in counts.items() ], reverse=True )[:10] )
(Note: this prints the top 10 as one list of (count, word) tuples instead of one word per line)
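To see that the comprehension builds the same list as the explicit loop, the sketch below runs both on a small made-up counts dictionary and compares the results.

```python
counts = {"the": 4, "cat": 2, "sat": 1}

# Explicit loop: build (count, word) tuples one by one.
lst = []
for k, v in counts.items():
    lst.append((v, k))
loop_result = sorted(lst, reverse=True)

# List comprehension: the same list described by an expression.
comp_result = sorted([(v, k) for k, v in counts.items()], reverse=True)

top2 = comp_result[:2]   # a slice keeps only the most common entries
print(loop_result == comp_result, top2)
```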
Regular expressions are very handy for searching strings and documents for a specific type of pattern, e.g. finding all words starting with 'H' and ending with 'o' rather than only checking whether 'Hello' is present. For this, we use wildcard characters. https://docs.python.org/3/howto/regex.html
Python Regular Expression Quick Guide:
^ Matches the beginning of a line
$ Matches the end of the line
. Matches any character
\s Matches whitespace
\S Matches any non-whitespace character
* Repeats a character zero or more times
*? Repeats a character zero or more times (non-greedy)
+ Repeats a character one or more times
+? Repeats a character one or more times (non-greedy)
[aeiou] Matches a single character in the listed set
[^XYZ] Matches a single character not in the listed set
[a-z0-9] The set of characters can include a range
( Indicates where string extraction is to start
) Indicates where string extraction is to end
\ Put this before a special character from the list above when you want to match that character itself. For example, + on its own is interpreted as 'Repeats a character one or more times', but \+ simply searches for the + symbol as part of the pattern
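A few entries of the quick guide can be tried out with the re module as in the sketch below. The sample strings are made up; the patterns are written as raw strings (r"...") so that the backslashes reach the regular expression engine unchanged.

```python
import re

text = "Hello Hero Hat"

# H, then any run of non-whitespace characters, then o.
h_to_o = re.findall(r"H\S*o", text)

# Character sets: a listed set and a negated set.
vowels = re.findall("[aeiou]", "Hello")
non_vowels = re.findall("[^aeiou]", "Hello")

# Escaping: \+ matches a literal plus sign.
plus_signs = re.findall(r"\+", "3+4+5")

# Anchors: ^ ties the match to the start, $ to the end of the line.
starts_with_hello = re.search("^Hello", text) is not None
ends_with_hello = re.search("Hello$", text) is not None

print(h_to_o, vowels, non_vowels, plus_signs)
print(starts_with_hello, ends_with_hello)
```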
#re.search() - checks whether a regular expression matches anywhere in a string
#re.findall() - extracts all the portions of a string that match a regular expression
import re
hand=open('text.txt')
for line in hand:
    line=line.rstrip()
    if re.search('^From:',line):
        print(line)
#EX--> ^X\S+: --> starts with X, followed by 1 or more (due to +) non-blank characters (due to \S) and then ':'
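The search loop can be tried without a text.txt file by looping over a list of strings instead. The sketch below uses made-up lines and also contrasts two patterns: '^X.*:' accepts any characters (including spaces) between X and a colon, while '^X\S+:' only accepts non-blank characters there.

```python
import re

# Stand-in for the lines of a file (assumed sample data).
lines = [
    "From: user@example.com\n",
    "Subject: hello\n",
    "X-DSPAM-Result: Innocent\n",
    "X-Plane is behind schedule: two weeks\n",
]

from_lines = []
loose = []
strict = []
for line in lines:
    line = line.rstrip()
    if re.search("^From:", line):
        from_lines.append(line)
    if re.search("^X.*:", line):      # X, any characters, then ':'
        loose.append(line)
    if re.search(r"^X\S+:", line):    # X, non-blank characters only, then ':'
        strict.append(line)

print(from_lines)
print(loose)
print(strict)
```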
import re
x='I like 21 and 51'
y=re.findall('[0-9]+',x) # finds all numbers: characters between 0 and 9, one or more times
# now y=['21','51'] -- these are strings, not integers
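Since findall() returns strings, a conversion step is needed before doing arithmetic. A minimal sketch, reusing the same sample sentence:

```python
import re

x = "I like 21 and 51"
y = re.findall("[0-9]+", x)       # list of matching substrings

numbers = [int(s) for s in y]     # convert each string to an integer
total = sum(numbers)

# When nothing matches, findall() returns an empty list.
nothing = re.findall("[0-9]+", "no digits here")

print(y, numbers, total, nothing)
```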
By default, the repeat characters * and + do greedy matching, i.e. if both a shorter and a longer string match our expression, the longer one is picked.
We can select the shorter one by adding a '?' sign after the repeat character (*? or +?).
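The difference between greedy and non-greedy matching shows up when a line contains the closing character more than once. In this sketch the sample line has two colons; the greedy pattern runs up to the last one, the non-greedy pattern stops at the first.

```python
import re

x = "From: Using the : character"

greedy = re.findall("^F.+:", x)    # .+ grabs as much as possible
lazy = re.findall("^F.+?:", x)     # .+? grabs as little as possible

print(greedy)
print(lazy)
```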
'^From .*@([^ ]*)' --> Starts with 'From', followed by a space, then any number of characters up to '@', and
then begins extracting '(' all non-space characters '[^ ]*' and then ends extracting ')'
Note: only the extracted part, i.e. the part between the parentheses '()', will be stored in the list
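As a sketch of extraction with parentheses, the snippet below applies the pattern to a made-up 'From' line. findall() returns only the part inside the parentheses, while group(0) of a search() result shows everything the whole pattern matched.

```python
import re

line = "From user@example.com Sat Jan  5 09:14:16 2008"
pattern = "^From .*@([^ ]*)"

# Only the text captured by ( ) ends up in the list.
hosts = re.findall(pattern, line)

# For comparison: the full text matched by the pattern.
whole_match = re.search(pattern, line).group(0)

print(hosts)
print(whole_match)
```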