MD5 hash

The MD5 (Message-Digest Algorithm 5) algorithm produces a 128-bit (16-byte) hash value, typically written as a 32-digit hexadecimal number. It was designed in 1991 and is commonly used as a checksum: it generates a compact "fingerprint" of a file or a string, which makes it possible to detect unintentional corruption of the data. The fingerprint is not guaranteed to be unique, since different inputs can produce the same hash.

However, MD5 is now considered cryptographically broken and insecure. It is no longer suitable for security-sensitive applications such as password storage or digital signatures, and modern systems should use more secure alternatives like SHA-256 or SHA-3 for cryptographic purposes.
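To make the sizes concrete, here is a minimal sketch using Python's standard hashlib module (the same module used in the Python section of this page). It only illustrates the digest lengths of MD5 and SHA-256; the input string is an arbitrary example.

```python
import hashlib

data = b"nim"  # arbitrary example input

md5_hex = hashlib.md5(data).hexdigest()
sha256_hex = hashlib.sha256(data).hexdigest()

# Each hexadecimal digit encodes 4 bits.
print(len(md5_hex), "hex digits =", len(md5_hex) * 4, "bits")        # 32 hex digits = 128 bits
print(len(sha256_hex), "hex digits =", len(sha256_hex) * 4, "bits")  # 64 hex digits = 256 bits
```

In this API, switching to the stronger hash is a one-word change.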


Here we just want to generate a fingerprint for a file or a string. For this purpose, MD5 is fine.

See src/checksums/md5.

$ nimble install checksums

import checksums/md5


echo getMD5("")     # d41d8cd98f00b204e9800998ecf8427e

echo getMD5("nim")  # 51aaf9dbcf1c573b12b329a5668ec05a

let fname = "tree.jpg"
echo getMD5(readFile(fname))  # md5 hash of the file
  • readFile() returns the content of a file as a string. However, you can use it with a binary file too, since a string is just a sequence of bytes.
  • Since readFile() reads the whole content into memory, don't use this method with huge files.

MD5 hash of a huge file

If you have a huge file, it's a better idea to read it in chunks, so that only one chunk is held in memory at a time.

import checksums/md5


proc md5File(path: string, chunkSize = 1024 * 1024): string =
  ## Computes the MD5 hash of a file by reading it in chunks.
  ## Default chunk size is 1 MB.
  var ctx: MD5Context
  md5Init(ctx)

  var f = open(path, fmRead)
  defer: f.close()

  var buf = newSeq[uint8](chunkSize)
  while true:
    let bytesRead = f.readBytes(buf, 0, chunkSize)
    if bytesRead == 0:
      break
    md5Update(ctx, buf.toOpenArray(0, bytesRead - 1))

  var digest: MD5Digest
  md5Final(ctx, digest)
  result = $digest

# ----------

let fname = "ubuntu.iso"

echo md5File(fname)  # 725e0a5bf98d2b5c9c0f13d8c38cae79

Some speed comparisons:

# Linux command:
$ time md5sum ubuntu.iso
# 3.44 sec

# Nim, DEBUG mode:
$ nim c bigfile.nim
$ time ./bigfile ubuntu.iso
# 94.61 sec

# Nim, release mode:
$ nim c -d:release bigfile.nim
$ time ./bigfile ubuntu.iso
# 5.67 sec

# Nim, speed mode with GCC:
$ nim c -d:release --opt:speed bigfile.nim
$ time ./bigfile ubuntu.iso
# 5.69 sec

# Nim, speed mode with Clang:
$ nim c --cc:clang -d:release --opt:speed bigfile.nim
$ time ./bigfile ubuntu.iso
# 4.75 sec

The size of ubuntu.iso was 2.9 GB.

These are hot runs, i.e. the file ubuntu.iso had already been read before these tests, so it was most likely served from the operating system's file cache rather than from the disk.

Notice how slow the debug build is: roughly 27 times slower than md5sum and about 17 times slower than the release build. Lesson learned: when you use a Nim program in production, always compile it in release mode!

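Summary: the Linux command md5sum is the fastest, but Nim in speed mode with Clang is not bad either. The GCC release and speed builds are practically identical, which is expected if -d:release already implies --opt:speed, as it does in Nim's default configuration.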
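The timings are easier to compare as throughput and as a ratio to md5sum. The sketch below is plain arithmetic on the numbers reported above. It assumes that "2.9 GB" means 2900 MB in decimal units, which the page does not state, so treat the MB/s values as approximate.

```python
SIZE_MB = 2.9 * 1000  # assumption: 2.9 GB taken as 2900 MB (decimal units)

# Wall-clock times in seconds, copied from the measurements above.
timings = {
    "md5sum": 3.44,
    "Nim debug": 94.61,
    "Nim release": 5.67,
    "Nim speed (GCC)": 5.69,
    "Nim speed (Clang)": 4.75,
}

baseline = timings["md5sum"]
for name, secs in timings.items():
    throughput = SIZE_MB / secs  # MB processed per second
    ratio = secs / baseline      # how many times the md5sum time
    print(f"{name:18} {throughput:7.1f} MB/s   {ratio:5.2f}x")
```

With these numbers the debug build takes about 27.5 times as long as md5sum, while the Clang build takes about 1.4 times as long.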


I also tested what happens if I read the whole content with readFile(). In release mode, it took 7.00 sec, while in speed mode it took 6.14 sec. So it's slower than the chunked version, and it consumes a lot of memory as well.
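Both approaches must produce the same hash: feeding the data to the hash object piece by piece yields the same digest as hashing the whole content at once. Here is a small self-contained check in Python. The temporary file, its made-up content and the deliberately tiny chunk size are illustration-only assumptions.

```python
import hashlib
import os
import tempfile


def md5_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file incrementally, reading at most chunk_size bytes at a time."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


# Made-up test data: 100,000 bytes, not a multiple of the chunk size used below.
payload = b"0123456789" * 10_000

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

try:
    chunked = md5_file(tmp_path, chunk_size=4096)  # many small reads
    one_shot = hashlib.md5(payload).hexdigest()    # whole content at once
    print(chunked == one_shot)  # True
finally:
    os.remove(tmp_path)
```

The chunk size only affects speed and memory use, not the resulting hash.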

In Python

import hashlib

def md5_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

print(md5_file("ubuntu.iso"))  # 725e0a5bf98d2b5c9c0f13d8c38cae79

Execution time: 3.70 sec. Surprisingly fast! It's almost as fast as the Linux command md5sum (3.44 sec), and faster than any of the Nim builds above.

I asked Claude AI why Python is so fast in this case, and got the following response:

"CPython's hashlib calls OpenSSL under the hood. hashlib.md5() doesn't use a pure-Python implementation — it delegates to the system's OpenSSL (or a bundled one), which is:

  • Hand-optimized C, often with SIMD intrinsics (SSE2, AVX2, etc.)
  • Likely using hardware acceleration if available

Nim's checksums/md5 is a pure Nim implementation with no such platform-specific optimization." (Claude AI)

That makes sense. A pure-Python implementation would be very slow.

See also https://docs.python.org/3/library/hashlib.html. OpenSSL is mentioned several times. It's very likely that your Python is compiled with OpenSSL. Here is the Nim implementation: link. It's just pure Nim code.
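Whether a particular Python build really relies on OpenSSL can be checked at runtime. This is only a rough sketch: what hashlib.md5 prints is a CPython implementation detail and may differ between versions and builds, so no output is shown for it.

```python
import hashlib

# On OpenSSL-backed CPython builds the printed name usually mentions
# "openssl"; this is an implementation detail, not a documented guarantee.
print(hashlib.md5)

try:
    import ssl  # needs Python to be built against an OpenSSL-compatible library
    print(ssl.OPENSSL_VERSION)
except ImportError:
    print("ssl module not available in this build")

# md5 is listed among the algorithms that hashlib guarantees on all platforms.
print("md5" in hashlib.algorithms_guaranteed)
```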

Page last modified on 2026 May 21, 23:39