Regular expressions

(1) get a group from a string

Python:

import re

text = "Asian.lst"
result = re.search(r"(.*)\.lst", text)
if result:
    filename = result.group(1)
    print("filename:", filename)

D:

import std.stdio;
import std.regex;

void main()
{
    string text = "Asian.lst";
    auto result = matchFirst(text, regex(r"(.*)\.lst"));

    if (result)
    {
        string filename = result.captures[1]; // capture group 1
        // string filename = result[1]; // shorter version
        writeln("filename: ", filename);
    }
}

Output:

filename: Asian
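
A stricter Python variant, as a sketch: re.search matches anywhere in the string, so the pattern above also accepts a name like "Asian.lst.bak". If the whole string is supposed to end in ".lst", re.fullmatch is one way to say so. The helper name stem_of_lst is hypothetical and only used for illustration.

```python
import re


def stem_of_lst(name):
    """Return the part before ".lst", or None if name does not end in ".lst".

    stem_of_lst is a hypothetical helper name, not part of any library.
    """
    # fullmatch requires the pattern to cover the entire string
    result = re.fullmatch(r"(.*)\.lst", name)
    return result.group(1) if result else None


print(stem_of_lst("Asian.lst"))      # Asian
print(stem_of_lst("Asian.lst.bak"))  # None
```

With re.search the second call would still capture "Asian", which may or may not be what you want.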

(2) match a string against a regexp

Python:

import re

text = "Asian.lst"
result = re.search(r"ian", text)
if result:
    print("contains 'ian'")

D:

import std.stdio;
import std.regex;

void main()
{
    string text = "Asian.lst";
    auto result = matchFirst(text, regex(r"ian"));

    if (result)
    {
        writeln("contains 'ian'");
    }
}

Output:

contains 'ian'

(3) find all the occurrences of a pattern in a string

Python:

import re

text = """
<a href="ad1">sdqs</a>
<a href="ad2">sds</a>
<a href=ad3>qs</a>
"""

m = re.findall(r'href="?(.*?)"?>', text)
print(m)  # ['ad1', 'ad2', 'ad3']

D:

import std.stdio;
import std.regex;
import std.algorithm;
import std.array;

const text = `
<a href="ad1">sdqs</a>
<a href="ad2">sds</a>
<a href=ad3>qs</a>
`;

string[] findAll(const string re, const string text)
{
    return text.matchAll(regex(re)).map!(m => m[1]).array;
}

void main()
{
    auto li = findAll(`href="?(.*?)"?>`, text);
    writeln(li); // ["ad1", "ad2", "ad3"]
}
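
For comparison, here is a minimal Python sketch that mirrors the D findAll step by step: iterate over every match and keep capture group 1. For a pattern with exactly one group this should give the same list as re.findall. The name find_all is hypothetical and chosen only to match the D helper.

```python
import re

text = """
<a href="ad1">sdqs</a>
<a href="ad2">sds</a>
<a href=ad3>qs</a>
"""


def find_all(pattern, text):
    """Collect capture group 1 of every match (hypothetical helper)."""
    # re.finditer yields one match object per non-overlapping match,
    # similar to iterating over matchAll in D
    return [m.group(1) for m in re.finditer(pattern, text)]


print(find_all(r'href="?(.*?)"?>', text))  # ['ad1', 'ad2', 'ad3']
```

The explicit loop is handy when you need more than the group text, for example m.start() for the position of each match.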

Problem with backreferences

The std.regex package in the stdlib is not perfect: backreferences don't always work correctly :( I ran into a problem that has existed since 2015…

On Discord, Paul Backus summarized it as follows: "It's specifically the combination of (1) a backreference, (2) with a .* in front of it, (3) with "extra" characters at the start of the string and between the two parts that are supposed to match."

Links:

Example with a workaround:

import std.regex;
import std.stdio;

void main()
{
    string text = "baacaa";
    // auto result = matchFirst(text, regex(r"(..).*\1"));  // buggy
    auto result = matchFirst(text, regex(r"(..).{0,999999}\1")); // workaround

    if (result) {
        writeln(text);
    }
}

The workaround was posted by pbackus (thanks!). His comment: "As a workaround, you can replace .* with {0,N} for some large value of N". In the pattern this means writing .{0,N} instead of .* (see the example above).
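
For reference, here is a small Python sketch of what the original pattern is expected to find in the same text. It only shows the intended result with Python's re module and says nothing about the internals of std.regex.

```python
import re

text = "baacaa"

# the pattern that is problematic in D: two characters, anything, then the same two characters
result = re.search(r"(..).*\1", text)
if result:
    print(result.group(0))  # aacaa
    print(result.group(1))  # aa

# the workaround pattern is also valid in Python and finds the same match
workaround = re.search(r"(..).{0,999999}\1", text)
```

The leading "b" is the "extra" character at the start of the string, and the "c" is the extra character between the two "aa" parts.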

Page last modified on 2025 November 23, 19:27