The checkmol/matchmol Homepage
Which input formats are supported by checkmol/matchmol?
How can checkmol/matchmol be used?
How to obtain checkmol/matchmol?
What are the requirements of checkmol/matchmol?
Compiling and installing
Usage (command-line options):
Windows DLL version
Linux server version
Checkmol is a utility program which reads molecular structure files in different
formats (see below) and analyzes the input molecule for the presence of
various functional groups and structural elements. At present, approx.
200 different functional groups are recognized. Output can be either
clear text (English or German), a bitstring or its ASCII
representation, or a set of special 8-character codes.
This output can be easily placed into a database table, permitting the
creation of chemical databases with a functional group search option.
Here is a complete list of all recognized functional groups.
Another output option of checkmol is a set of statistical values
derived from a given molecule, which can also be used for quick
retrieval from a database. These values include: the number of atoms,
bonds, and rings, the number of differently hybridized carbon, oxygen,
and nitrogen atoms, the number of C=O double bonds, the number of rings
of different sizes, the number of rings containing nitrogen, oxygen,
sulfur, the number of aromatic rings, the number of heterocyclic rings,
etc. The combination of all of these values for a given molecule
represents some kind of "fingerprint" which is useful for rapid
pre-selection in a database structure/substructure search prior to a
full atom-by-atom match (see below). For a fully functional set of PHP
scripts implementing such a web database (plus utility scripts for
data import), please visit the MolDB5 project page.
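The pre-selection idea can be illustrated with a minimal sketch. It assumes that every stored value is a simple count that can only stay equal or grow when atoms are added to a structure, so a candidate with a smaller count than the query cannot contain it. The field names and numbers are hypothetical and are not the record names printed by checkmol.

```python
# Hypothetical count-based pre-screen; field names are illustrative only,
# they are NOT the record names produced by checkmol.
def may_contain(candidate: dict, query: dict) -> bool:
    """Return False if the candidate cannot contain the query as a substructure."""
    return all(candidate.get(key, 0) >= count for key, count in query.items())

# Made-up statistics for a query and two database entries
query = {"n_atoms": 8, "n_rings": 1, "n_C=O": 2, "n_N": 2}
database = {
    "mol-1": {"n_atoms": 20, "n_rings": 2, "n_C=O": 2, "n_N": 3},
    "mol-2": {"n_atoms": 12, "n_rings": 1, "n_C=O": 1, "n_N": 2},
}

# Only the survivors need to go through the full atom-by-atom match
survivors = [name for name, stats in database.items() if may_contain(stats, query)]
print(survivors)
```

A pre-screen of this kind only rejects impossible candidates; every survivor still has to be confirmed by the full comparison.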
Matchmol extends the capabilities of checkmol. It compares two (or more) molecular
structures and determines whether one of them is a substructure of the
other one. This is done by a full atom-by-atom comparison of the input
structures. Thus, matchmol can be used as a back-end program for
structure/substructure search operations in chemical databases (see below).
More detailed information is available in this publication:
Haider, N., Functionality Pattern Matching as an Efficient Complementary
Structure/Reaction Search Tool: an Open-Source Approach. Molecules 2010, 15, 5079-5092.
Which input formats are supported by checkmol/matchmol?
As input files, MDL molfiles (*.mol; 2D and 3D), Alchemy molfiles
(*.mol), and Sybyl mol2-files (*.mol2) are currently understood by
checkmol/matchmol; the preferred format is the MDL molfile format. The
matchmol utility can also process MDL SD-files which can contain
multiple molecular structures. At present, it is not intended to extend
the number of supported input file formats, as there are powerful file
format converters available, such as OpenBabel.
A description of the MDL file formats (molfile, SD-file) is available here.
How can checkmol/matchmol be used?
The main purpose of checkmol/matchmol is to permit the creation of
fully searchable, web-based molecular structure databases entirely with
free software. For example, a typical LAMP system (Linux, Apache,
MySQL, PHP) can be easily extended with checkmol/matchmol into a
database with structure/substructure search options. A detailed
description of how this can be done is given here.
Another application is batch-mode processing of data files containing
multiple structures, in our case MDL SD files. For instance, one can run
a substructure search for uracil-containing molecules in a large
SD file like the Maybridge screening collection and write the matching
molecules into another SD file. This can be achieved with the following
command:
matchmol -m uracil.mol maybridge-complete.sdf
The -m option causes output of hits in MDL SD format
(including any additional fields of the input SD file), uracil.mol
contains the query structure (the "needle") and maybridge-complete.sdf
is the database file (the "haystack"). Since version 0.2g of
checkmol/matchmol, there is no size limit for the "haystack" file.
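For scripted batch runs, the same call can be wrapped in a few lines of Python. This is only a sketch: the helper names and the output path are hypothetical, matchmol is assumed to be on the PATH, and since the exit status of matchmol is not documented here, the sketch does not interpret it.

```python
import subprocess

def matchmol_command(needle, haystack, options="m", executable="matchmol"):
    """Build the argument list for a matchmol call (no shell is involved)."""
    cmd = [executable]
    if options:
        cmd.append("-" + options)
    cmd += [needle, haystack]
    return cmd

def write_hits(needle, haystack, hits_path):
    """Run matchmol -m and redirect its standard output into hits_path."""
    with open(hits_path, "wb") as out:
        # The exit status is deliberately not checked (behaviour not documented here)
        subprocess.run(matchmol_command(needle, haystack), stdout=out)

cmd = matchmol_command("uracil.mol", "maybridge-complete.sdf")
print(" ".join(cmd))
```

Calling write_hits("uracil.mol", "maybridge-complete.sdf", "hits.sdf") would then collect the matching records in a file, where "hits.sdf" is an arbitrary example name.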
How to obtain checkmol/matchmol?
The two programs are in fact only one program which is invoked by two
different names, i.e. there is only one source code. The utility is
freely available under the terms of the GNU General Public License
(GPL); for a detailed description of this license, please visit http://www.gnu.org/copyleft/gpl.html.
To download, please visit the download directory at http://merian.pch.univie.ac.at/pch/download/chemistry/checkmol/;
it contains the source code (checkmol.pas
is a symbolic link to the latest source file) as well as pre-compiled
binaries for various platforms (Windows, Mac OS X) in the "bin"
subdirectory; there is also a socket-based server version for Un*x-like systems (cmmmsrv) in the "server"
subdirectory. For a brief description of the version history, please check the source code.
What are the requirements of checkmol/matchmol?
The software is available both as source code and as a binary compiled
for Linux (x86 architecture). It is entirely written in Pascal and it
was compiled with Free Pascal 1.0.11 or Free
Pascal 2.4.0 (starting from v0.4c). The Free Pascal compiler is
freely available under the GPL, and there are versions for a variety of
operating systems and computer architectures. For more information
about Free Pascal, please visit the project homepage at http://www.freepascal.org.
The binary executable of checkmol/matchmol was built on a SuSE 10.1
or on an Ubuntu 10.04 system, but it should run on any other x86 Linux
system; there are no special libraries required. Supported platforms also
include MS Windows (NT, 2000, XP).
Compiling and installing
Compile with fpc (Free Pascal, see above), using the -Sd or -S2 option (Delphi
mode; this is IMPORTANT!)
Example for compilation and installation:
fpc checkmol.pas -S2 -O3 -Op3
(if you are running MacOS X, use the options as described on the Macs
website, i.e., do not use compiler optimisation flags)
This will give a file "checkmol.o" and a file "checkmol"; as
"root" user, do the following:
cp checkmol /usr/local/bin
(or any other directory in your path)
Then create a hard link (a symbolic link does not work):
ln checkmol matchmol
Note that checkmol and matchmol are the same executable, but the
program behaves differently depending on the name it was invoked with.
Of course, you can also copy
"checkmol" to "matchmol" (instead of making a link), but then it
takes twice as
much disk space (under Windows, this is the only possibility, as there
are no hard links available under this "OS").
Usage (command-line options):
Checkmol can be invoked with the following arguments:
checkmol [options] <filename>
where [options] can be:
-l print a list of fingerprint codes + explanation and exit
-v verbose output
-r force SSR (set of small rings) ring search
-M accept metal atoms as ring members
and one of the following:
-e English text (common name of functional group; default)
-d German text (common name of functional group)
-c code (acronym-like code for functional group)
-b bitstring (in decimal format) representing the presence of each group
-s (the ASCII representation of the bitstring, i.e. 0s and 1s)
-p lists the position of each functional group (atom number of key atom)
-x print molecular statistics (number of various atom types, bond types, ring sizes, etc.)
-X same as above, listing all records (even if zero) as comma-separated list
-a count charges in fingerprint
Options can be combined (like -vc); <filename> specifies any file in one of
the formats supported (MDL *.mol, Alchemy *.mol, Sybyl *.mol2); the
filename "-" (without quotes) specifies standard input.
-m write MDL molfile (with special encoding for aromatic atoms/bonds)
-h hashed fingerprint mode with boolean output
-H hashed fingerprint mode with decimal output
Matchmol can be invoked with the following arguments:
matchmol [options] <needle> <haystack>
where <needle> and <haystack> are the two input files
(supported formats: MDL *.mol, Alchemy *.mol, Sybyl *.mol2);
options can be:
-v verbose output
-x exact match
comparison of atom and bond
-r force SSR (set of small rings) ring search
-m write matching molecule as MDL molfile to standard output
-M accept metal atoms as ring members
-n additional output of atom numbers for matching atom pairs
-N like -n, but only for the first matching substructure found
-g check geometry of double bonds
-G check geometry of chiral centers
-a check charges strictly
-i check isotopes strictly
-d check radicals strictly
-f fingerprint mode (1 haystack, multiple needles) with boolean output
-F fingerprint mode (1 haystack, multiple needles) with decimal output
Default output: record number + ":T" for hit or ":F" for
miss, i.e., if the haystack contains only one molecule, then the
result will be "1:T" or "1:F". The "haystack" can also be an MDL SD-file
(containing multiple molecules); if invoked with "-" as file argument,
both "needle" and "haystack" are read as only one SD-file from standard
input, assuming the first entry in the SDF to be the "needle"; the
output is: entry number + ":F" (false) or ":T" (true).
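A calling script has to turn this default output back into hit lists. The following minimal sketch assumes one "number:T" or "number:F" result per line, as described above; the function name is hypothetical, and lines of any other shape (for example from verbose mode) are simply skipped.

```python
def parse_matchmol_output(text: str) -> dict:
    """Map record number -> True (hit) / False (miss) from lines like '1:T'."""
    results = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        number, _, flag = line.partition(":")
        if not number.isdigit() or flag not in ("T", "F"):
            continue  # not a plain result line, ignore it
        results[int(number)] = (flag == "T")
    return results

# Made-up output for a haystack with three records
sample = "1:F\n2:T\n3:F\n"
parsed = parse_matchmol_output(sample)
hits = [number for number, is_hit in parsed.items() if is_hit]
print(hits)
```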
At present, only smaller molecules are handled adequately, i.e. for
each molecule the maximum number of atoms is 1024, the maximum number
of bonds is 1024, the maximum ring size is 128 (i.e., rings larger than
128 members are treated as open-chain compounds), and the maximum
number of rings is 1024. Checkmol/matchmol collects the "set of all
rings" (SAR) instead of e.g. the "smallest set of smallest rings" (SSSR).
Aromaticity is determined by application of the Hückel rule (4n+2
pi electrons) without any geometry checks, but with adequate treatment
of tautomeric/mesomeric structures where possible. For example,
1-methyl-2(1H)-pyridone is correctly recognized as aromatic, as well as
cyclopentadienyl anion, tropylium cation, fulvene, tropone, etc.
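The electron-count part of the Hückel rule is easy to state in code. The sketch below covers only that count test; it is not checkmol's aromaticity routine, which also has to decide how many pi electrons each ring atom contributes (including the tautomeric/mesomeric cases mentioned above). The function name is hypothetical.

```python
def satisfies_huckel(pi_electrons: int) -> bool:
    """True if the pi electron count has the form 4n+2 with n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0

# Pi electron counts of some well-known ring systems
examples = {
    "benzene": 6,
    "cyclopentadienyl anion": 6,
    "tropylium cation": 6,
    "cyclobutadiene": 4,
    "cyclooctatetraene": 8,
}
for name, count in examples.items():
    print(name, satisfies_huckel(count))
```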
New in version 0.2: if a molecule contains more than 1024 rings, a
fallback mechanism changes the ring search mode from SAR to SSR (set of
small rings, which is defined as follows: ring size <= 12 atoms, no
ring is completely contained in another one). For additional
information, please check the version history description in the source
code.
Starting with versions 0.3d and 0.3f, matchmol supports stereospecific search operations,
either globally or on a per-atom or per-bond basis. Geometric isomers
of the E/Z type (aka cis/trans
isomers) are recognized as well as isomers with chiral centers (R/S
isomers). The latter type of isomer discrimination works with 3D
molfiles (using the XYZ coordinates) and with 2D molfiles (using "up"
and "down" bond notation) in any combination.
Starting with version 0.4, checkmol supports the generation of hash-based fingerprints
for efficient pre-selection in structure databases. The default values
are as follows: only linear fragments, minimum fragment length: 3
atoms, maximum fragment length: 8 atoms, 2 bits per fragment, total
bitstring length: 512 bits.
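How such a fingerprint can serve as a pre-filter is shown in the following sketch. It uses the default parameters listed above (linear fragments of 3 to 8 atoms, 2 bits per fragment, 512 bits), but everything else is an assumption made for illustration: the fragment string encoding, the use of SHA-1, and all function names are hypothetical and do not reproduce checkmol's actual hashing scheme.

```python
import hashlib

FP_BITS = 512            # total bitstring length
BITS_PER_FRAGMENT = 2    # bits set per fragment
MIN_LEN, MAX_LEN = 3, 8  # fragment length in atoms

def linear_fragments(atoms, bonds):
    """Yield a canonical string for every linear path of MIN_LEN..MAX_LEN atoms.

    atoms: list of element symbols; bonds: dict {(i, j): bond order}.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for (i, j), order in bonds.items():
        neighbours[i].append((j, order))
        neighbours[j].append((i, order))

    def walk(path, tokens):
        if len(path) >= MIN_LEN:
            # A path and its reverse describe the same fragment
            yield min("".join(tokens), "".join(reversed(tokens)))
        if len(path) == MAX_LEN:
            return
        for nxt, order in neighbours[path[-1]]:
            if nxt not in path:
                yield from walk(path + [nxt], tokens + [str(order), atoms[nxt]])

    for start in range(len(atoms)):
        yield from walk([start], [atoms[start]])

def fingerprint(atoms, bonds):
    """Hash every distinct fragment onto BITS_PER_FRAGMENT positions of an integer."""
    fp = 0
    for fragment in set(linear_fragments(atoms, bonds)):
        digest = hashlib.sha1(fragment.encode()).digest()  # SHA-1 is an arbitrary choice
        for k in range(BITS_PER_FRAGMENT):
            position = int.from_bytes(digest[2 * k:2 * k + 2], "big") % FP_BITS
            fp |= 1 << position
    return fp

def may_be_substructure(query_fp, candidate_fp):
    """All bits of the query must also be set in the candidate."""
    return (query_fp & candidate_fp) == query_fp

# Query: C-C=O fragment; candidate: acetic acid (heavy atoms only)
query_fp = fingerprint(["C", "C", "O"], {(0, 1): 1, (1, 2): 2})
acid_fp = fingerprint(["C", "C", "O", "O"], {(0, 1): 1, (1, 2): 2, (1, 3): 1})
print(may_be_substructure(query_fp, acid_fp))
```

Because different fragments can hash to the same bit, a positive answer only means "possibly a substructure"; a negative answer is definitive, which is what makes the test useful before the full atom-by-atom match.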
Starting with version 0.5, checkmol has an option (-p) to display all
occurrences of all detected functional groups in a molecule by listing
the corresponding "key atoms" (for a graphical representation of all
functional groups with their key atoms, see the document fgtable.pdf).
Windows DLL version
Although the program can be smoothly compiled with Free Pascal on the
Win32 platform as a console application, its encapsulation in a Windows
Dynamic Link Library (DLL) would have specific advantages, such as
seamless integration into database applications like MS Access (using
VBA as the link). Alessandro from PROCOS had the idea for this
DLL version and he also realized its implementation. Cited from
Alessandro's code header:
I needed substructure matching capability.
I needed a dll for using with visual basic or VBA
I needed to pass the mol file as string (from a memo field
and not as a molfile on the disk)
so... I've modified the original matchmol
A more detailed description of the features of this DLL and of how to use
it is given in the header of the source code (see download link
below). Alternatively, you can use the Barsoi DLL, a library based on a
C port of checkmol/matchmol which has been developed as a part of the
pgchem::tigress project by Ernst-Georg Schmid (see below).
Linux server version
A server program providing checkmol/matchmol functionality has been
developed as a replacement for the checkmol/matchmol command-line
program in web-based molecular structure databases and related
applications. Communication of any frontend program (e.g., a PHP
script) with cmmmsrv takes place via sockets instead of shell calls,
thus saving a significant amount of time.
source code: cmmmsrv.pas
compiled Linux (i586) binary: cmmmsrv.gz
examples for using cmmmsrv can be found in the MolDB5R
package (e.g., in the script incss.php)
Pgchem::tigress is a cheminformatics extension to the PostgreSQL DBMS.
It enables PostgreSQL to store, retrieve and search molecules by pure
SQL statements. It uses checkmol/matchmol and OpenBabel and optionally
Barsoi, which is based on a C port of checkmol/matchmol, can also be
used as a dynamically linked library to provide checkmol/matchmol
functionality to other programs, or to build checkmol/matchmol on
platforms where no Pascal, but an ANSI-C compiler is available.
Developer: Ernst-Georg Schmid (Bayer Business Services GmbH, Leverkusen, Germany).
Chemtool is a small program for
drawing chemical structures on Linux and Unix
systems using the GTK toolkit under X11. Starting with developer
version 1.7, it
adds the beginning of database support with (sub)structure searches in
or MySQL databases using the checkmol/matchmol program. Developer:
Martin Kroeker (University of Freiburg, Germany).
Small Molecule Interaction Database (SMID)
SMID is an expanding database of small molecule - domain
interactions determined from MMDB records. All information is stored in
SMID database records that are freely available through a web
interface. Among other classification criteria, a newly designed
chemical ontology organises compounds by their functional groups which
are automatically assigned by the checkmol program.
A web-based version of Wolfgang Robien's CSEARCH NMR spectral database
and prediction system uses checkmol/matchmol as the engine of its
structure/substructure/functional group search facility (approx.
open enventory
A web-based integrated lab journal and chemicals inventory, developed
at the Technical University of Kaiserslautern/Germany (contact: F.
Rudolphi). This open-source package makes use of matchmol technology
for substructure searching.
Checkmol/matchmol was written by Norbert Haider,
Department of Pharmaceutical Chemistry (now: Department of Drug
and Natural Product Synthesis), University of Vienna, Austria.
You can contact me by e-mail: firstname.lastname@example.org
(no spam, no viruses, no HTML mails, please).
Haider, 2003-12-01; last update: 2013-05-24