Python Regular Expression

A regular expression is a special text string that describes a search pattern. The Python re module provides regular expression support.
We will discuss the regular expression searching functions one by one.
First we discuss the match() function.
Syntax
re.match(pattern, string, flags=0)
pattern = The regular expression to be matched.
string = The string in which the pattern is searched.
flags = Optional flags. Several flags can be combined using bitwise OR (|). We will discuss these flags later.
The match() function matches the pattern at the beginning of the given string. If a match is found it returns a match object; if no match is found it returns None.
Let us discuss the first example.
import re
line = "Every thing is planned"
pattern = 'Every'
m = re.match(pattern, line)
print(m)          # the match object itself
print(m.group())  # the text that matched: Every
Output
[Figure: Python regular expression match() function]
Printing m shows only the representation of the match object (in Python 2, its memory address), while m.group() returns the text that matched the pattern.
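To make the "beginning of the string" rule concrete, here is a small sketch written for Python 3. It reuses the sample line from above; the extra patterns and variable names are chosen only for illustration. It shows that match() succeeds only when the pattern starts at index 0, and how flags can be combined with bitwise OR.

```python
import re

line = "Every thing is planned"

# The pattern sits at index 0, so match() returns a match object.
at_start = re.match("Every", line)
print(at_start.group())    # Every

# "thing" occurs later in the string, so match() returns None.
not_at_start = re.match("thing", line)
print(not_at_start)        # None

# Flags are combined with bitwise OR; IGNORECASE lets "every" match "Every".
with_flags = re.match("every", line, re.IGNORECASE | re.MULTILINE)
print(with_flags.group())  # Every
```

Because match() may return None, it is usually safer to test the result before calling group() on it.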

search()

Syntax
re.search(pattern, string, flags=0)
The search() function finds the first occurrence of the pattern in the string. If the pattern is found it returns a match object, None otherwise.
match.start() returns the index where the matched text starts.
match.end() returns the index just past the end of the matched text.
import re
pattern = "this"
text = " in this world nothing is permanent"
match = re.search(pattern, text)
s = match.start()  # index where the match begins
e = match.end()    # index just past the last matched character
print("Found", match.re.pattern, "in", text, "from", s, "to", e)
Output with code
[Figure: Python regular expression using search()]
The search function checks for a match anywhere in the string.
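The difference between match() and search() is easiest to see on the same input. The following sketch (Python 3, using the sample text from above; the variable names are only illustrative) runs both functions with the same pattern.

```python
import re

text = " in this world nothing is permanent"

# match() only looks at index 0, where this string has a space.
from_match = re.match("this", text)
print(from_match)                        # None

# search() scans forward and stops at the first occurrence.
from_search = re.search("this", text)
print(from_search.start(), from_search.end())   # 4 8

# Slicing with start() and end() gives back the matched text.
print(text[from_search.start():from_search.end()])  # this
```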

sub()

Syntax
re.sub(pattern, repl, string, count=0, flags=0)
The sub() function replaces occurrences of the RE pattern in string with the replacement string repl.
import re
date = "2014-06-34"
# Replace every "-" with the empty string.
newdate = re.sub('-', "", date)
print("New date is", newdate)

# Replace only the first "-".
newdate = re.sub('-', "", date, count=1)
print("New date is", newdate)
Output with code
[Figure: search and replace using Python sub()]
The above example shows that in the first newdate all occurrences of - are replaced by the empty string "", and in the second only the first occurrence of - is replaced. If the fourth argument, count, is 0 (the default), all occurrences are replaced.
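Since the output above survives only as a figure, here is a short sketch (Python 3) that spells out the results for the same date string. The third call, which replaces - with /, is an extra illustration that is not in the original example.

```python
import re

date = "2014-06-34"

# count=0 (the default) replaces every occurrence.
all_removed = re.sub("-", "", date)
print(all_removed)      # 20140634

# count=1 replaces only the first occurrence.
first_removed = re.sub("-", "", date, count=1)
print(first_removed)    # 201406-34

# The replacement can be any string, for example a slash.
with_slashes = re.sub("-", "/", date)
print(with_slashes)     # 2014/06/34
```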

findall()

Syntax
re.findall(pattern, string, flags=0)
If you want to find all occurrences of a pattern in a string, use the findall() function.
Let us discuss an example.
import re
pattern = "this"
text = " in this world nothing is permanent this is last number this"
# findall() returns the matched strings.
for match in re.findall(pattern, text):
    print("found", match)

# finditer() yields match objects, which also carry positions.
for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print("Found", match.re.pattern, "from", s, "to", e)
Output
[Figure: findall() finds all the occurrences]
finditer() returns an iterator that produces match instances
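The two return types can be compared directly. This sketch (Python 3, same sample text as above; the variable names are illustrative) collects the results of both functions.

```python
import re

text = " in this world nothing is permanent this is last number this"

# findall() returns a plain list of the matched substrings.
words = re.findall("this", text)
print(words)    # ['this', 'this', 'this']

# finditer() yields match objects, so the positions are available too.
spans = [(m.start(), m.end()) for m in re.finditer("this", text)]
print(spans)    # [(4, 8), (36, 40), (56, 60)]
```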


Python special characters

Python regular expressions have special characters that help to form a pattern.

. (dot)

In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline. Let us look at an example.
[Figure: showing the dot operator]
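A minimal sketch of the dot in both modes (Python 3; the sample strings are made up for illustration):

```python
import re

text = "ab\ncd"

# By default "." does not match the newline, so "b.c" finds nothing.
no_dotall = re.search("b.c", text)
print(no_dotall)               # None

# With DOTALL the dot also matches "\n".
dotall = re.search("b.c", text, re.DOTALL)
print(repr(dotall.group()))    # 'b\nc'

# "." matches any other single character, including a space.
any_char = re.findall("a.c", "abc a-c a c")
print(any_char)                # ['abc', 'a-c', 'a c']
```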

^ (Caret)

Matches the start of the string, and in MULTILINE mode also matches immediately after each newline, as shown in the next figure.

$

Matches the end of the string or just before the newline at the end of the string, as shown in the figure below.
[Figure: showing the ^ and $ operators]
You can see that ^ matches at the start of the string and $ matches at the end of the string.
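The anchors can be tried out on a two-line string. The sketch below (Python 3; the sample text and variable names are assumptions made for illustration) shows the default behaviour and the effect of MULTILINE.

```python
import re

text = "one two\none three"

# Without MULTILINE, ^ matches only at the very start of the string.
starts_default = re.findall("^one", text)
print(starts_default)    # ['one']

# With MULTILINE, ^ also matches right after each newline.
starts_multi = re.findall("^one", text, re.MULTILINE)
print(starts_multi)      # ['one', 'one']

# $ matches at the end of the string, so "three$" is found ...
end_default = re.search("three$", text)
print(end_default.group())   # three

# ... but "two" is followed by a newline in the middle of the string.
middle_default = re.search("two$", text)
print(middle_default)        # None

# $ also matches just before a newline that ends the string.
before_newline = re.search("three$", "one three\n")
print(before_newline.group())  # three
```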

* operator

The * operator is used in RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match a, ab, or a followed by any number of bs.

+ operator

The + operator is used in RE to match 1 or more repetitions of the preceding RE. ab+ will match a followed by any non-zero number of bs; it will not match just a.
Let us look at an example.
[Figure: showing ab* and ab+]
In the figure, the ab+ search does not print the matches at positions 24 and 26, because at those positions a single a appears without a following b.
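The same comparison can be reproduced with a short sketch (Python 3; the sample string is made up and is not the one used in the figure):

```python
import re

text = "a ab abb abbb"

# ab* : "a" followed by zero or more "b", so the lone "a" also matches.
star = re.findall("ab*", text)
print(star)    # ['a', 'ab', 'abb', 'abbb']

# ab+ : at least one "b" is required, so the lone "a" is skipped.
plus = re.findall("ab+", text)
print(plus)    # ['ab', 'abb', 'abbb']
```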

?

The ? is used in RE to match 0 or 1 repetitions of the preceding RE. ab? will match either a or ab. The *, +, and ? quantifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE ab* is matched against abbbb, it will match the entire string, and not just a. Adding ? after the quantifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using ab*? in the previous example will match only a, because zero repetitions of b are enough. Let us discuss with an example.
[Figure: comparison of the * and ? operators]
From the above picture you can clearly identify the meaning of the ? operator.
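A small sketch of greedy and non-greedy matching (Python 3; the variable names are illustrative):

```python
import re

# ab? : "a" followed by zero or one "b".
optional_b = re.findall("ab?", "a ab abb")
print(optional_b)    # ['a', 'ab', 'ab']

text = "abbbb"

# Greedy: ab* takes as many "b" characters as it can.
greedy = re.match("ab*", text).group()
print(greedy)        # abbbb

# Non-greedy: ab*? takes as few as possible, which is zero.
lazy_star = re.match("ab*?", text).group()
print(lazy_star)     # a

# Non-greedy ab+? must still take one "b".
lazy_plus = re.match("ab+?", text).group()
print(lazy_plus)     # ab
```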


{m}

Specifies that exactly m copies of the previous RE should be matched; fewer matches cause the entire RE not to match. For example, a{3} will match exactly three a characters, but not two.

{m,n}

Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible. For example, a{4,6} will match from 4 to 6 a characters. The example below makes this clear.
[Figure: showing the {m} operators]
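The repetition counts can be checked with a few short searches. This is only a sketch in Python 3, with made-up input strings:

```python
import re

# a{3} needs three consecutive "a" characters; two are not enough.
too_short = re.search("a{3}", "aa")
print(too_short)     # None
exact = re.search("a{3}", "aaa").group()
print(exact)         # aaa

# a{4,6} is greedy: out of eight "a" characters it takes six.
upper = re.search("a{4,6}", "aaaaaaaa").group()
print(upper)         # aaaaaa

# Five "a" characters are inside the range, so all five are matched.
inside = re.search("a{4,6}", "aaaaa").group()
print(inside)        # aaaaa

# Three "a" characters are below the minimum of four.
below = re.search("a{4,6}", "aaa")
print(below)         # None
```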

{m,n}?

Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as few repetitions as possible. This is the non-greedy version of the previous quantifier. For example, on the 6-character string aaaaaa, a{3,5} will match 5 a characters, while a{3,5}? will only match 3 characters.
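The aaaaaa example can be run as written. A minimal sketch in Python 3, with illustrative variable names:

```python
import re

text = "aaaaaa"

# Greedy: takes the maximum of five "a" characters.
greedy = re.match("a{3,5}", text).group()
print(greedy)    # aaaaa

# Non-greedy: stops as soon as the minimum of three is reached.
lazy = re.match("a{3,5}?", text).group()
print(lazy)      # aaa
```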


[]

Used to indicate a set of characters. In a set:
  1. Characters can be listed individually, e.g. [op] will match o or p.
  2. Ranges of characters can be indicated by giving two characters and separating them by a -, for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match all the two-digits numbers from 00 to 59, and [0-9A-Fa-f] will match any hexadecimal digit. If - is escaped (e.g. [a\-z]) or if it's placed as the first or last character (e.g. [a-]), it will match a literal -.
  3. Characters that are not within a range can be matched by complementing the set. If the first character of the set is ^, all the characters that are not in the set will be matched. For example, [^5] will match any character except '5', and [^^] will match any character except ^. ^ has no special meaning if it's not the first character in the set.
  4. To match a literal ] inside a set, precede it with a backslash, or place it at the beginning of the set. For example, both [()[\]{}] and []()[{}] will match a parenthesis.
Let us see a small example.
[Figure: showing the [] operator]
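The four rules for sets can be tried one by one. The sketch below is written for Python 3; the input strings are invented for illustration, while the patterns are the ones named in the list above.

```python
import re

# Rule 1: [op] matches "o" or "p".
listed = re.findall("[op]", "open top")
print(listed)        # ['o', 'p', 'o', 'p']

# Rule 2: [0-5][0-9] matches two-digit numbers from 00 to 59.
ranged = re.findall("[0-5][0-9]", "07 42 59 73")
print(ranged)        # ['07', '42', '59']

# Rule 2: a "-" placed last in the set is a literal "-".
dash = re.findall("[a-]", "a-z")
print(dash)          # ['a', '-']

# Rule 3: [^5] matches any character except "5".
negated = re.findall("[^5]", "1525")
print(negated)       # ['1', '2']

# Rule 4: a literal "]" is escaped, or placed first in the set.
escaped = re.findall(r"[()[\]{}]", "f(x)[0]{}")
first = re.findall("[]()[{}]", "f(x)[0]{}")
print(escaped)       # ['(', ')', '[', ']', '{', '}']
print(escaped == first)  # True
```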
