Extract links from a webpage

This time, let's extract the links with regular expressions, using Nim's std/re module.

Core part

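import std/re

proc extract_links(html: string): seq[string] =
  ## Return every http:// or https:// URL found in `html`.
  # Inside the character class, `\$-_` is a range (ASCII 0x24..0x5F) that
  # also covers `/`, `:`, `?` and `=`, so paths and query strings match.
  let pattern = re"https?://(?:[a-zA-Z0-9\$-_@.&+!*\(\),]|%[0-9a-fA-F]{2})+"
  findAll(html, pattern)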
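
A quick way to see what the pattern matches is to run it on a small hand-written snippet. The sketch below repeats extract_links so that it is self-contained; the example.com and example.org addresses are made up for illustration. It also shows a limitation worth knowing: the `\$-_` range includes `<` and `>`, so a URL followed directly by a tag (rather than by a double quote or a space) picks up the tag as well.

```nim
import std/re

proc extract_links(html: string): seq[string] =
  let pattern = re"https?://(?:[a-zA-Z0-9\$-_@.&+!*\(\),]|%[0-9a-fA-F]{2})+"
  findAll(html, pattern)

# Made-up HTML: the URLs sit inside double-quoted attributes,
# and `"` is not in the character class, so each match stops there.
let sample = """<a href="https://example.com/a?x=1&y=2">A</a> <img src="http://example.org/pic.jpg">"""
let links = extract_links(sample)
for link in links:
  echo link
# prints:
# https://example.com/a?x=1&y=2
# http://example.org/pic.jpg

# Limitation: `<`, `b`, `r` and `>` are all in the class,
# so the match only ends at the following space.
let loose = extract_links("see https://example.com/x<br> for details")
echo loose[0]   # prints: https://example.com/x<br>
```

For pages where links are not always wrapped in double quotes, a real HTML parser would likely be more reliable than a regular expression.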

Complete example

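import std/strutils    # strip, split, join
import std/httpclient  # HTTPS requests need the program compiled with -d:ssl
import std/re

proc get_page(url: string): string =
  ## Download `url` and return its body, or "" if the request fails.
  let client = newHttpClient()
  try:
    client.getContent(url)
  except HttpRequestError as e:
    stderr.writeLine("Error: ", e.msg)
    ""
  finally:
    client.close()    # release the connection in every case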

proc extract_links(html: string): seq[string] =
  let pattern = re"https?://(?:[a-zA-Z0-9\$-_@.&+!*\(\),]|%[0-9a-fA-F]{2})+"
  findAll(html, pattern)

proc main() =
  let
    url = "https://www.bing.com"
    html = get_page(url)
    urls = extract_links(html)

  for link in urls:
    echo link

  # Alternative: print only the first .jpg link, cut at the first `&`
  # for link in urls:
  #   if ".jpg" in link:
  #     echo link.split("&")[0]
  #     break

main()

Output:

https://www.bing.com/th?id=OHR.Kofa_ROW0914409827_tmb.jpg&rf=
https://www.bing.com/?form=HPFBBK&ssd=20260507_0700&mkt=en-WW
https://www.bing.com/
https://r.bing.com
...
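
The commented-out loop in main() can be turned into a small helper. This is only a sketch: the name first_jpg is hypothetical, and the input is the list of URLs shown in the output above. It returns the first link that contains ".jpg", cut at the first `&`, or an empty string if there is no such link.

```nim
import std/strutils   # contains (enables `in` on strings), split

proc first_jpg(urls: seq[string]): string =
  ## Hypothetical helper: first URL containing ".jpg", cut at the first `&`.
  result = ""          # returned unchanged when nothing matches
  for url in urls:
    if ".jpg" in url:
      return url.split("&")[0]

# The URLs from the sample output above.
let sample_urls = @[
  "https://www.bing.com/th?id=OHR.Kofa_ROW0914409827_tmb.jpg&rf=",
  "https://www.bing.com/?form=HPFBBK&ssd=20260507_0700&mkt=en-WW",
  "https://www.bing.com/",
  "https://r.bing.com",
]

echo first_jpg(sample_urls)
# prints: https://www.bing.com/th?id=OHR.Kofa_ROW0914409827_tmb.jpg
```

Note that split("&")[0] only drops what follows the first `&`; the `?id=...` part stays, since it comes before that character.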
Page last modified on 2026 May 07, 21:42