Introduction to SMARTS
This documentation stands on the shoulders of giants. SMARTS was developed by Daylight Chemical Information Systems, the same company that introduced SMILES, and the text here is heavily inspired by the original Daylight SMARTS theory. It follows the RDKit implementation, which also includes syntax extensions by ChemAxon. The Hybridization, Heteroatom Neighbor, Range Query, and Dative Bond extensions are RDKit-specific.
What is SMARTS?
SMARTS (SMiles ARbitrary Target Specification) is a language for describing molecular patterns and properties. It extends the SMILES notation to allow expressive queries over chemical structures, making it possible to search, filter, and classify molecules based on substructure patterns.
In the SMILES language we have atoms and bonds. The same is true in SMARTS, which is further extended with property filters and logical operators.
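As a rough illustration of searching and filtering, the sketch below uses the Python RDKit API to keep only molecules that contain a carboxylic acid group. The pattern, the helper name has_acid, and the small library of SMILES are assumptions chosen for the example, not part of the original text.

```python
from rdkit import Chem

# Illustrative pattern: a carbonyl carbon bonded to a hydroxyl oxygen.
ACID = Chem.MolFromSmarts("C(=O)[OH]")

def has_acid(smiles):
    """Return True if the molecule parses and contains the ACID pattern."""
    mol = Chem.MolFromSmiles(smiles)
    return mol is not None and mol.HasSubstructMatch(ACID)

# Hypothetical mini library used only for this example.
library = {
    "acetic acid": "CC(=O)O",
    "ethanol": "CCO",
    "methyl acetate": "CC(=O)OC",
}
acids = [name for name, smi in library.items() if has_acid(smi)]
print(acids)  # ['acetic acid']
```

The ester is rejected because [OH] requires exactly one hydrogen on the oxygen, which is the kind of property filter plain SMILES cannot express.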
Atom Primitives
The simplest SMARTS patterns match individual atoms. Atoms are specified inside square brackets and can carry multiple constraints joined by logical operators.
Basic Atom Symbols
- [C] - any aliphatic carbon atom
- [c] - any aromatic carbon atom
- [#6] - carbon by atomic number (aliphatic or aromatic)
- [*] - any atom (wildcard)
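The difference between these four symbols can be checked with a short sketch using the Python RDKit API. Toluene is used as an illustrative target, and count_matches is a hypothetical helper name.

```python
from rdkit import Chem

def count_matches(smiles, smarts):
    """Count the atoms (or atom sets) in `smiles` that match `smarts`."""
    mol = Chem.MolFromSmiles(smiles)
    patt = Chem.MolFromSmarts(smarts)
    return len(mol.GetSubstructMatches(patt))

toluene = "Cc1ccccc1"  # one aliphatic carbon, six aromatic carbons
counts = {s: count_matches(toluene, s) for s in ["[C]", "[c]", "[#6]", "[*]"]}
print(counts)  # {'[C]': 1, '[c]': 6, '[#6]': 7, '[*]': 7}
```

Hydrogens are implicit in the parsed molecule, so the wildcard only sees the seven heavy atoms.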
Atom Properties
Atom primitives can encode aromaticity, charge, hydrogen count, degree, valence, ring
membership, and more. All primitives can be combined inside [...] using logical operators.
| Primitive | Meaning | Default | Example |
|---|---|---|---|
| a | aromatic atom | - | [a] |
| A | aliphatic atom | - | [A] |
| H<n> | total hydrogen count (implicit + explicit) | exactly 1 | [CH3] |
| h<n> | implicit hydrogen count | at least 1 | [Ch2] |
| D<n> | explicit degree, not counting implicit H | exactly 1 | [D3] |
| d<n> | non-hydrogen degree | exactly 1 | [d2] |
| X<n> | total connectivity (including implicit H) | exactly 1 | [X4] |
| v<n> | total valence (sum of bond orders) | exactly 1 | [v4] |
| R<n> | number of SSSR (Smallest Set of Smallest Rings) rings atom is in | any ring atom | [R2] |
| r<n> | size of smallest SSSR ring | any ring atom | [r5] |
| x<n> | number of ring bonds | at least 1 | [x2] |
| +<n> | positive formal charge | +1 | [N+], [+2] |
| -<n> | negative formal charge | -1 | [O-], [O-2] |
| #<n> | atomic number | - | [#6], [#7] |
| <n> | atomic mass (isotope) | unspecified | [13C], [35Cl] |
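A few of the primitives from the table can be tried on a small molecule. The following is a minimal sketch using the Python RDKit API; the isobutyrate anion and the helper name atoms_matching are illustrative assumptions.

```python
from rdkit import Chem

# Isobutyrate anion. Atom indices: C0, C1, C2, C3, O4 (=O), O5 ([O-]).
mol = Chem.MolFromSmiles("CC(C)C(=O)[O-]")

def atoms_matching(smarts):
    """Return the sorted atom indices matched by a single-atom SMARTS."""
    patt = Chem.MolFromSmarts(smarts)
    return sorted(m[0] for m in mol.GetSubstructMatches(patt))

results = {s: atoms_matching(s) for s in ["[CH3]", "[D3]", "[X4]", "[O-]", "[R]"]}
for smarts, atoms in results.items():
    print(smarts, atoms)
```

Here [D3] picks the two carbons with three heavy-atom neighbours, while [X4] also counts implicit hydrogens and therefore selects the three saturated carbons instead.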
Logical Operators
Atom and bond primitives can be combined using logical operators to build complex queries:
| Operator | Meaning |
|---|---|
| & | high-precedence AND (implicit between primitives) |
| , | OR |
| ; | low-precedence AND |
| ! | NOT |
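The two AND operators differ only in how tightly they bind relative to OR. The sketch below, using the Python RDKit API, shows the effect on an illustrative molecule (4-pyridylmethanol); the helper name atoms_matching is an assumption.

```python
from rdkit import Chem

# Atom indices: O0, C1, c2, c3, c4, n5, c6, c7
mol = Chem.MolFromSmiles("OCc1ccncc1")

def atoms_matching(smarts):
    """Return the sorted atom indices matched by a single-atom SMARTS."""
    patt = Chem.MolFromSmarts(smarts)
    return sorted(m[0] for m in mol.GetSubstructMatches(patt))

# ';' binds loosest: (aromatic c OR aromatic n) AND no hydrogens.
low_and = atoms_matching("[c,n;H0]")
# '&' binds tightest: aromatic c OR (aromatic n AND no hydrogens).
high_and = atoms_matching("[c,n&H0]")
# NOT and OR on atomic numbers.
not_carbon = atoms_matching("[!#6]")
n_or_o = atoms_matching("[#7,#8]")

print(low_and, high_and, not_carbon, n_or_o)
```

With the low-precedence form only the substituted ring carbon and the ring nitrogen survive, whereas the high-precedence form keeps every aromatic carbon.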
Bond Primitives
Bonds between atoms can also be constrained. An unspecified bond in a SMARTS pattern matches either a single or aromatic bond.
| Symbol | Meaning |
|---|---|
| - | single bond |
| = | double bond |
| # | triple bond |
| : | aromatic bond |
| ~ | any bond (wildcard) |
| @ | any ring bond |
| / | directional bond "up" (for E/Z stereo) |
| \ | directional bond "down" (for E/Z stereo) |
| /? | directional "up" or unspecified |
| \? | directional "down" or unspecified |
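To see how the bond symbols differ, one can count carbon-carbon bonds of each kind in a small molecule. This is a sketch using the Python RDKit API with allylbenzene as an illustrative target; count_bonds is a hypothetical helper name.

```python
from rdkit import Chem

# Allylbenzene: one C=C double bond, two C-C single bonds, six aromatic bonds.
mol = Chem.MolFromSmiles("C=CCc1ccccc1")

def count_bonds(smarts):
    """Count unique atom pairs matched by a two-atom SMARTS."""
    patt = Chem.MolFromSmarts(smarts)
    return len(mol.GetSubstructMatches(patt))

bond_counts = {
    "double": count_bonds("[#6]=[#6]"),
    "single": count_bonds("[#6]-[#6]"),
    "aromatic": count_bonds("[#6]:[#6]"),
    "ring": count_bonds("[#6]@[#6]"),
    "any": count_bonds("[#6]~[#6]"),
    "unspecified": count_bonds("[#6][#6]"),  # single or aromatic
}
print(bond_counts)
```

The unspecified bond finds eight pairs (two single plus six aromatic), one fewer than the ~ wildcard, because it skips the double bond.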
Chirality
Tetrahedral chirality can be specified using @ (anticlockwise) and @@ (clockwise), as viewed
from the first neighbour, following the same convention as SMILES. When a SMARTS pattern
specifies chirality, it acts as a matching constraint. A query atom with unspecified chirality
matches both enantiomers.
| Primitive | Meaning | Example |
|---|---|---|
| @ | anticlockwise (looking from first neighbour) | [C@H] |
| @@ | clockwise (looking from first neighbour) | [C@@H] |
Note that the @? and @@? operators, although part of Daylight SMARTS, are not
supported in RDKit. Non-tetrahedral chiral classes are not supported either. Read more about chirality.
If the matches for a chiral pattern look wrong, that is expected: the JavaScript port of RDKit
currently does not allow chiral search. Fixing this is on my TODO list.
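For comparison, the Python RDKit API does support chiral matching, but only when useChirality=True is passed to the matcher. The sketch below assumes the Python API (not the JavaScript port) and uses an illustrative stereocentre with four different heavy-atom neighbours.

```python
from rdkit import Chem

query = Chem.MolFromSmarts("C[C@](F)(Cl)Br")
same = Chem.MolFromSmiles("C[C@](F)(Cl)Br")     # same handedness as the query
mirror = Chem.MolFromSmiles("C[C@@](F)(Cl)Br")  # opposite handedness

# Default matching ignores the chirality written in the query.
loose = [m.HasSubstructMatch(query) for m in (same, mirror)]
# With useChirality=True the @ in the query becomes a constraint.
strict = [m.HasSubstructMatch(query, useChirality=True) for m in (same, mirror)]

print(loose, strict)
```

Under these assumptions the mirror image is accepted by the default search and rejected by the chirality-aware one.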
Recursive SMARTS
Any SMARTS expression can be used to define an atomic environment by anchoring it on the atom of
interest using the $(...) syntax. These expressions behave like atomic primitives and
can be combined with other primitives using logical operators.
- [$(*C)] - atom connected to an aliphatic carbon (such as a methyl or methylene carbon)
- [$(*[CH3]);$(*C[CH3])] - atom connected to both methyl and ethyl sidegroups
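Both recursive patterns can be tried on a small amine. This is a minimal sketch using the Python RDKit API; N,N-dimethylethylamine is an illustrative target and atoms_matching is a hypothetical helper name.

```python
from rdkit import Chem

# N,N-dimethylethylamine. Atom indices: C0, C1, N2, C3, C4.
mol = Chem.MolFromSmiles("CCN(C)C")

def atoms_matching(smarts):
    """Return the sorted anchor atoms matched by a recursive SMARTS."""
    patt = Chem.MolFromSmarts(smarts)
    return sorted(m[0] for m in mol.GetSubstructMatches(patt))

# Atoms with at least one aliphatic carbon neighbour.
next_to_carbon = atoms_matching("[$(*C)]")
# Atoms carrying both a methyl and an ethyl substituent.
methyl_and_ethyl = atoms_matching("[$(*[CH3]);$(*C[CH3])]")

print(next_to_carbon, methyl_and_ethyl)  # [0, 1, 2] [2]
```

Only the anchor atom is reported for each match, which is what lets a recursive expression behave like any other atomic primitive.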
Component-level Grouping
A dot (.) in a SMARTS pattern separates disconnected fragments. Each fragment can
match anywhere in the target - there is no constraint on which component it belongs to.
- C.O - a carbon and an oxygen found anywhere in the SMILES. Matches atoms in CCO and CC.OO
- C.O does not match CCC, because there is no oxygen present
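The behaviour of the dot can be checked directly, and a fragment constraint can be added afterwards by post-filtering. The sketch below uses the Python RDKit API; split_matches is a hypothetical helper that uses Chem.GetMolFrags to work out which fragment each matched atom belongs to.

```python
from rdkit import Chem

def split_matches(smiles, smarts):
    """Return (all matches, matches whose atoms span more than one fragment)."""
    mol = Chem.MolFromSmiles(smiles)
    patt = Chem.MolFromSmarts(smarts)
    # Map each atom index to the index of the fragment that contains it.
    frag_of = {}
    for frag_idx, atom_ids in enumerate(Chem.GetMolFrags(mol)):
        for atom_id in atom_ids:
            frag_of[atom_id] = frag_idx
    matches = list(mol.GetSubstructMatches(patt))
    across = [m for m in matches if len({frag_of[a] for a in m}) > 1]
    return matches, across

all_cco, across_cco = split_matches("CCO", "C.O")      # one connected fragment
all_salt, across_salt = split_matches("CC.OO", "C.O")  # two fragments
all_ccc, across_ccc = split_matches("CCC", "C.O")      # no oxygen at all

print(len(all_cco), len(across_cco))
print(len(all_salt), len(across_salt))
print(len(all_ccc), len(across_ccc))
```

For a two-component pattern such as C.O, keeping only the matches that span more than one fragment is one way to approximate a "separate components" requirement.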
In Daylight SMARTS, parentheses can also express component-level grouping (e.g.
(C).(C) to require matches in separate components), but this is not supported
in RDKit. In RDKit, parentheses are only used for branching, and (C).(C) is a parse
error. Additionally, . does not enforce matching across different disconnected
fragments, so C.C can match within a single connected molecule. To correctly handle
fragment-level constraints, split the molecule first (e.g. with Chem.GetMolFrags) and match each fragment separately, or post-filter the results.
Hybridization Queries
Atoms can be matched by hybridization state using the ^ primitive followed by a number.
- ^0 - S
- ^1 - SP
- ^2 - SP2
- ^3 - SP3
- ^4 - SP3D
- ^5 - SP3D2
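As a quick check of the hybridization primitive, the following sketch uses the Python RDKit API on an illustrative hydrocarbon that contains sp3, sp2, and sp carbons. The helper name atoms_matching is an assumption.

```python
from rdkit import Chem

# Pent-3-en-1-yne. Atom indices: C0 (sp3), C1=C2 (sp2), C3#C4 (sp).
mol = Chem.MolFromSmiles("CC=CC#C")

def atoms_matching(smarts):
    """Return the sorted atom indices matched by a single-atom SMARTS."""
    patt = Chem.MolFromSmarts(smarts)
    return sorted(m[0] for m in mol.GetSubstructMatches(patt))

by_hybridization = {s: atoms_matching(s) for s in ["[^3]", "[^2]", "[^1]"]}
print(by_hybridization)  # {'[^3]': [0], '[^2]': [1, 2], '[^1]': [3, 4]}
```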
Heteroatom Neighbor Queries
Two primitives match atoms based on the number of heteroatom neighbors (non-C, non-H) they have:
- z<n> - exactly n heteroatom neighbors (aromatic or aliphatic)
- Z<n> - exactly n aliphatic heteroatom neighbors
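The difference between z and Z only shows up when a heteroatom neighbor is aromatic. The sketch below uses the Python RDKit API on an illustrative molecule (2-pyridylmethanol); atoms_matching is a hypothetical helper name.

```python
from rdkit import Chem

# Atom indices: O0, C1, c2, n3, c4, c5, c6, c7
mol = Chem.MolFromSmiles("OCc1ncccc1")

def atoms_matching(smarts):
    """Return the sorted atom indices matched by a single-atom SMARTS."""
    patt = Chem.MolFromSmarts(smarts)
    return sorted(m[0] for m in mol.GetSubstructMatches(patt))

# Exactly one heteroatom neighbour of any kind.
any_hetero = atoms_matching("[z1]")
# Exactly one aliphatic heteroatom neighbour.
aliphatic_hetero = atoms_matching("[Z1]")

print(any_hetero, aliphatic_hetero)  # [1, 2, 4] [1]
```

The two ring carbons next to the aromatic nitrogen count for z but not for Z, so only the CH2 carbon bonded to the hydroxyl oxygen satisfies both.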
Range Queries
Many numeric primitives accept a range in curly braces instead of a fixed value. Supported
primitives: D, h, r, R, v, x, X, z, Z, +, -.
- D{2-4} - between 2 and 4 explicit connections (inclusive)
- D{-3} - at most 3 explicit connections
- D{2-} - at least 2 explicit connections
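The three range forms can be compared on a branched alkane. This is a minimal sketch using the Python RDKit API; 2,2-dimethylbutane is an illustrative target and atoms_matching is a hypothetical helper name.

```python
from rdkit import Chem

# 2,2-dimethylbutane. Degrees: C0=1, C1=4, C2=1, C3=1, C4=2, C5=1.
mol = Chem.MolFromSmiles("CC(C)(C)CC")

def atoms_matching(smarts):
    """Return the sorted atom indices matched by a single-atom SMARTS."""
    patt = Chem.MolFromSmarts(smarts)
    return sorted(m[0] for m in mol.GetSubstructMatches(patt))

between_2_and_4 = atoms_matching("[D{2-4}]")
at_most_3 = atoms_matching("[D{-3}]")
at_least_2 = atoms_matching("[D{2-}]")

print(between_2_and_4, at_most_3, at_least_2)
# [1, 4] [0, 2, 3, 4, 5] [1, 4]
```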
Dative Bonds
Dative bonds, written <- and ->, are covalent bonds in which both
electrons in the shared pair come from the same atom, so the bond is directional.
| Symbol | Meaning |
|---|---|
| -> | dative bond pointing right (donor → acceptor) |
| <- | dative bond pointing left (acceptor ← donor) |
The two bond directions do not match the same atoms. In the example SMILES
[Fe]->CC1=O.CN(C1)(C)->[Pt], the nitrogen in trimethylamine donates a dative bond to platinum.
[#7]->* matches the nitrogen as donor, while *<-[#7] matches the platinum as acceptor.
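The direction sensitivity can be sketched with the Python RDKit API. The molecule below is a simplified trimethylamine-platinum fragment chosen for illustration, not the full SMILES from the text.

```python
from rdkit import Chem

# Atom indices: C0, N1, C2, C3, Pt4. The nitrogen is the donor.
mol = Chem.MolFromSmiles("CN(C)(C)->[Pt]")

def matches(smarts):
    """Return all matches as a list of atom-index tuples."""
    return list(mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))

donor_first = matches("[#7]->*")     # nitrogen listed first, as donor
acceptor_first = matches("*<-[#7]")  # acceptor listed first
wrong_way = matches("[#7]<-*")       # nitrogen as acceptor: no such bond here

print(donor_first, acceptor_first, wrong_way)
```

The first two patterns describe the same bond and differ only in which atom is reported first, while the third reverses the donor and acceptor roles and should find nothing in this molecule.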
Efficiency Tips
Tips for writing efficient SMARTS (patterns are evaluated left to right):
- Place uncommon atoms or bond arrangements early in the pattern.
- In an AND expression, put the less common specification first.
- In an OR expression, put the less common specification last.