Perl to Python regular expressions tutorial

#############################

I once wrote a Perl to Java regular expressions tutorial. This one is similar, but adapted for Python.

Each example shows the Perl code first, followed by its Python equivalent.

(1) get a group from a string
$text = "Asian.lst";
if ($text =~ m#(.*)\.lst#)
{
   $filename = $1;
}
# $filename is "Asian" now
import re

text = 'Asian.lst'
result = re.search(r'(.*)\.lst', text)
if result:
    filename = result.group(1)
    print('filename:', filename)
(2) match a string against a regexp
$text = "Asian.lst";
if ($text =~ m#ian#)
{
   print "contains 'ian'\n";
}
import re

text = 'Asian.lst'
result = re.search(r'ian', text)
if result:
    print("contains 'ian'")
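One pitfall when coming from Perl: Perl's =~ looks for a match anywhere in the string, which corresponds to re.search. Python also has re.match, which only tries to match at the beginning of the string. A minimal sketch of the difference (the variable names found_anywhere and found_at_start are just illustrative):

```python
import re

text = 'Asian.lst'

# re.search scans the whole string, like Perl's =~
found_anywhere = re.search(r'ian', text) is not None

# re.match only tries to match at the start of the string
found_at_start = re.match(r'ian', text) is not None

print(found_anywhere)   # True
print(found_at_start)   # False
```

So when translating a Perl match, re.search is usually the function to reach for.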
(3) read a file line by line and match a regexp against each line
open (F1, "<input.txt") || die();
while (<F1>)
{
   chomp;
   if (m#dog#i) {
      print $_, "\n";
   }
}
close F1;
import re

p = re.compile(r'dog', re.IGNORECASE)

# the with statement closes the file automatically
with open('input.txt', 'r') as f1:
    for line in f1:
        line = line.rstrip('\n')   # like Perl's chomp: strip only the trailing newline
        if p.search(line):
            print(line)
input.txt:
yo
dogdog
a dog is here
a cat is here
Snoop Doggy Dog
pussycat
output:
dogdog
a dog is here
Snoop Doggy Dog
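Perl's /i modifier can also be written inside the pattern itself in Python, using the inline flag (?i) instead of re.IGNORECASE. The sketch below is only an illustration: it uses a hypothetical in-memory list in place of input.txt so that it can be run without creating a file, and checks that both spellings select the same lines.

```python
import re

# Hypothetical stand-in for the lines of input.txt
lines = ['yo', 'dogdog', 'a dog is here', 'a cat is here',
         'Snoop Doggy Dog', 'pussycat']

with_flag = re.compile(r'dog', re.IGNORECASE)   # flag passed as an argument
with_inline = re.compile(r'(?i)dog')            # same flag written inside the pattern

flag_hits = [line for line in lines if with_flag.search(line)]
inline_hits = [line for line in lines if with_inline.search(line)]

for line in flag_hits:
    print(line)
```

Both lists contain the three lines shown in the output above.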
(4) replace the first, then all the occurrences of a substring in a string
my $text = "a dog and a dog";

$text =~ s#dog#cat#;
# "a cat and a dog"

$text =~ s#dog#cat#g;
# "a cat and a cat"
import re

text = 'a dog and a dog'

text = re.sub('dog', 'cat', text, count=1)
# 'a cat and a dog'

text = re.sub('dog', 'cat', text)
# 'a cat and a cat'
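A related point when porting substitutions: where Perl refers to a captured group as $1 in the replacement, Python uses \1 (or \g<1>) in the replacement string. A small sketch, reusing the file name from the first example; the variable names renamed and renamed_again are only illustrative:

```python
import re

text = 'Asian.lst'

# Perl: $text =~ s#(.*)\.lst#$1.txt#;
renamed = re.sub(r'(.*)\.lst', r'\1.txt', text)
print(renamed)   # Asian.txt

# \g<1> is an unambiguous way to write the same group reference
renamed_again = re.sub(r'(.*)\.lst', r'\g<1>.txt', text)
print(renamed_again)   # Asian.txt
```

Writing the replacement as a raw string keeps the backslash from being interpreted by Python itself.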
(5) find all the occurrences of a substring in a string
my $text = '<a href="ad1">sdqs</a>'
          .'<a href="ad2">sds</a><a href=ad3>qs</a>';

while ($text =~ m#href="?(.*?)"?>#g)
{
   print $1, "\n";
}
# output:
#
# ad1
# ad2
# ad3
import re

text = '<a href="ad1">sdqs</a>' \
     + '<a href="ad2">sds</a><a href=ad3>qs</a>'

# solution 1
for m in re.finditer(r'href="?(.*?)"?>', text):
    print(m.group(1))

# or, solution 2:
m = re.findall(r'href="?(.*?)"?>', text)
print(m)   # ['ad1', 'ad2', 'ad3']
# now the result is stored in a list
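When the pattern contains more than one group, re.findall returns a list of tuples, one tuple per match, instead of a list of strings. A short sketch with a simplified version of the text above (the pattern with two groups and the name pairs are assumptions made for this illustration):

```python
import re

text = '<a href="ad1">sdqs</a><a href="ad2">sds</a>'

# two groups: the href value and the link text
pairs = re.findall(r'href="(.*?)">(.*?)</a>', text)
print(pairs)   # [('ad1', 'sdqs'), ('ad2', 'sds')]

for href, label in pairs:
    print(href, label)
```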

Verbose Regular Expressions

Python supports so-called verbose regular expressions, the counterpart of Perl's /x modifier. Let's see the first example again:

import re

text = 'Asian.lst'
pattern = r"""
    ^               # beginning of string
    (.*)            # anything
    \.              # a dot
    lst             # the string 'lst'
    $               # end of string
"""
result = re.search(pattern, text, re.VERBOSE)   # notice the usage of the re.VERBOSE constant
if result:
    filename = result.group(1)
    print('filename:', filename)   # filename: Asian

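In verbose mode, whitespace in the pattern is ignored (except inside a character class or when escaped with a backslash), and everything from an unescaped # to the end of the line is treated as a comment.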
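One consequence worth knowing: to match a literal space in a verbose pattern, the space has to be escaped with a backslash or placed in a character class such as [ ]. A minimal sketch (the pattern names escaped and unescaped are only illustrative):

```python
import re

escaped = r"""
    a\ dog      # 'a', a literal space, then 'dog'
"""
unescaped = r"""
    a dog       # the space is ignored, so this is the pattern 'adog'
"""

text = 'a dog is here'

matches_escaped = re.search(escaped, text, re.VERBOSE) is not None
matches_unescaped = re.search(unescaped, text, re.VERBOSE) is not None
matches_adog = re.search(unescaped, 'adog', re.VERBOSE) is not None

print(matches_escaped)     # True
print(matches_unescaped)   # False
print(matches_adog)        # True
```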

Page last modified on 2009 November 15, 03:28