HomePage of Laszlo SZATHMARY | Python / Perl to Python regular expressions tutorial

#############################

I once wrote a Perl to Java regular expressions tutorial. This one follows the same approach, adapted for Python.

Each example is given first in Perl, followed by its Python equivalent.

(1) get a group from a string

$text = "Asian.lst";
if ($text =~ m#(.*)\.lst#)
{
   $filename = $1;
}
# $filename is "Asian" now

import re

text = 'Asian.lst'
result = re.search(r'(.*)\.lst', text)
if result:
    filename = result.group(1)
    print('filename:', filename)   # filename: Asian

(2) match a string against a regexp

$text = "Asian.lst";
if ($text =~ m#ian#)
{
   print "contains 'ian'\n";
}

import re

text = 'Asian.lst'
result = re.search(r'ian', text)
if result:
    print("contains 'ian'")
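
One thing to watch when porting: Perl's =~ looks for the pattern anywhere in the string, which corresponds to re.search in Python, while re.match only tries to match at the beginning of the string. A minimal sketch of the difference, reusing the string from the example above (the variable names are chosen only for illustration):

```python
import re

text = 'Asian.lst'

# re.search scans the whole string, like Perl's =~
found_anywhere = re.search(r'ian', text)

# re.match only tries at position 0, so 'ian' is not found here
found_at_start = re.match(r'ian', text)

print(found_anywhere is not None)   # True
print(found_at_start is None)       # True
```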

(3) read a file line by line and match a regexp against each line

open (F1, "<input.txt") || die();
while (<F1>)
{
   chomp;
   if (m#dog#i) {
      print $_, "\n";
   }
}
close F1;

import re

p = re.compile(r'dog', re.IGNORECASE)

with open('input.txt', 'r') as f1:   # the file is closed automatically
    for line in f1:
        line = line.rstrip('\n')   # like Perl's chomp
        if p.search(line):
            print(line)

input.txt:

yo
dogdog
a dog is here
a cat is here
Snoop Doggy Dog
pussycat

output:

dogdog
a dog is here
Snoop Doggy Dog
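
To try the loop without creating input.txt, the sample lines can be fed through io.StringIO, which can be iterated line by line like an open text file. This is only a sketch: the helper name grep_lines is made up for illustration and is not part of the original example.

```python
import io
import re

def grep_lines(lines, pattern, flags=0):
    """Return the lines (without trailing newline) that match pattern."""
    p = re.compile(pattern, flags)
    return [line.rstrip('\n') for line in lines if p.search(line)]

# Same content as the sample input.txt above
sample = io.StringIO(
    "yo\n"
    "dogdog\n"
    "a dog is here\n"
    "a cat is here\n"
    "Snoop Doggy Dog\n"
    "pussycat\n"
)

matches = grep_lines(sample, r'dog', re.IGNORECASE)
for line in matches:
    print(line)
```

The same helper should also accept an open file object, since it only iterates over its argument.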

(4) replace the first, then all the occurrences of a substring in a string

my $text = "a dog and a dog";

$text =~ s#dog#cat#;
# "a cat and a dog"

$text =~ s#dog#cat#g;
# "a cat and a cat"

import re

text = 'a dog and a dog'

text = re.sub('dog', 'cat', text, count=1)   # count=1: replace only the first match
# 'a cat and a dog'

text = re.sub('dog', 'cat', text)
# 'a cat and a cat'

(5) find all the occurrences of a substring in a string

my $text = '<a href="ad1">sdqs</a>'
          .'<a href="ad2">sds</a><a href=ad3>qs</a>';

while ($text =~ m#href="?(.*?)"?>#g)
{
   print $1, "\n";
}
# output:
#
# ad1
# ad2
# ad3

import re

text = '<a href="ad1">sdqs</a>' \
     + '<a href="ad2">sds</a><a href=ad3>qs</a>'

# solution 1
for m in re.finditer(r'href="?(.*?)"?>', text):
    print(m.group(1))

# or, solution 2:
m = re.findall(r'href="?(.*?)"?>', text)
print(m)   # ['ad1', 'ad2', 'ad3']
# now the result is stored in a list
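
A detail of solution 2 that can be surprising: re.findall returns the captured text when the pattern has exactly one group, a list of tuples when it has several groups, and the whole match when it has none. A small sketch using the same text (the two extra patterns are variations added here for illustration):

```python
import re

text = '<a href="ad1">sdqs</a>' \
     + '<a href="ad2">sds</a><a href=ad3>qs</a>'

# one group: list of strings
one_group = re.findall(r'href="?(.*?)"?>', text)

# two groups: list of tuples (href value, link text)
two_groups = re.findall(r'href="?(.*?)"?>(.*?)</a>', text)

# no group: list of whole matches
no_group = re.findall(r'href="?.*?"?>', text)

print(one_group)    # ['ad1', 'ad2', 'ad3']
print(two_groups)   # [('ad1', 'sdqs'), ('ad2', 'sds'), ('ad3', 'qs')]
print(no_group)     # ['href="ad1">', 'href="ad2">', 'href=ad3>']
```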

Verbose Regular Expressions

Python supports so-called verbose regular expressions, enabled with the re.VERBOSE flag (Perl offers the same idea through its /x modifier). Here is the first example again:

import re

text = 'Asian.lst'
pattern = r"""
    ^               # beginning of string
    (.*)            # anything
    \.              # a dot
    lst             # the string 'lst'
    $               # end of string
"""
result = re.search(pattern, text, re.VERBOSE)   # notice the usage of the re.VERBOSE constant
if result:
    filename = result.group(1)
    print('filename:', filename)   # filename: Asian

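With re.VERBOSE, whitespace in the pattern is ignored, except inside a character class or when preceded by a backslash, and everything from an unescaped # to the end of the line is treated as a comment.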
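
Because whitespace is dropped in verbose mode, a literal space in the text has to be written explicitly. A minimal sketch of two common ways to do so, a backslash-escaped space or a one-character class; the sample string is borrowed from example (4), and the variable names are illustrative:

```python
import re

text = 'a dog and a dog'

# The space between 'a' and 'dog' is ignored, so this looks for 'adog'
naive = re.search(r'a dog', text, re.VERBOSE)

# An escaped space or a character class keeps the space significant
escaped = re.search(r'a\ dog', text, re.VERBOSE)
in_class = re.search(r'a[ ]dog', text, re.VERBOSE)

print(naive)               # None
print(escaped.group(0))    # a dog
print(in_class.group(0))   # a dog
```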