SMARTS 101

Introduction to SMARTS

SMARTS was developed by Daylight Chemical Information Systems, the same company that introduced SMILES. This documentation stands on the shoulders of giants: it is heavily inspired by the original Daylight SMARTS theory documentation. It follows the RDKit implementation, which also includes syntax extensions from ChemAxon. The extensions covered here (hybridization, heteroatom neighbor, range queries and dative bonds) are RDKit-specific.

What is SMARTS?

SMARTS (SMiles ARbitrary Target Specification) is a language for describing molecular patterns and properties. It extends the SMILES notation to allow expressive queries over chemical structures, making it possible to search, filter, and classify molecules based on substructure patterns.

SMILES describes molecules with atoms and bonds. SMARTS uses the same building blocks and extends them with property filters and logical operators.

Atom Primitives

The simplest SMARTS patterns match individual atoms. Atoms are written inside square brackets, where they can carry multiple constraints joined by logical operators. As in SMILES, organic-subset symbols such as C or O may also be written without brackets.

Basic Atom Symbols

  • [C] - any aliphatic carbon atom
  • [c] - any aromatic carbon atom
  • [#6] - carbon by atomic number (aliphatic or aromatic)
  • [*] - any atom (wildcard)
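These symbols are easy to try out with RDKit's Python API. The minimal sketch below assumes the rdkit package is installed and uses toluene as an arbitrary test molecule. Each one-atom pattern returns one match per matching atom.

```python
from rdkit import Chem

# Toluene: atom 0 is the aliphatic methyl carbon, atoms 1-6 are aromatic carbons.
toluene = Chem.MolFromSmiles("Cc1ccccc1")

results = {}
for smarts in ("[C]", "[c]", "[#6]", "[*]"):
    query = Chem.MolFromSmarts(smarts)
    # Each match of a one-atom pattern is a 1-tuple holding the matched atom index.
    results[smarts] = sorted(match[0] for match in toluene.GetSubstructMatches(query))
    print(smarts, results[smarts])
```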

Atom Properties

Atom primitives can encode aromaticity, charge, hydrogen count, degree, valence, ring membership, and more. All primitives can be combined inside [...] using logical operators.

Primitive | Meaning | Default | Example
a | aromatic atom | - | [a]
A | aliphatic atom | - | [A]
H<n> | total hydrogen count (implicit + explicit) | exactly 1 | [CH3]
h<n> | implicit hydrogen count | at least 1 | [Ch2]
D<n> | explicit degree, not counting implicit H | exactly 1 | [D3]
d<n> | non-hydrogen degree | exactly 1 | [d2]
X<n> | total connectivity (including implicit H) | exactly 1 | [X4]
v<n> | total valence (sum of bond orders) | exactly 1 | [v4]
R<n> | number of SSSR (Smallest Set of Smallest Rings) rings the atom is in | any ring atom | [R2]
r<n> | size of the smallest SSSR ring containing the atom | any ring atom | [r5]
x<n> | number of ring bonds | at least 1 | [x2]
+<n> | positive formal charge | +1 | [N+], [+2]
-<n> | negative formal charge | -1 | [O-], [O-2]
#<n> | atomic number | - | [#6], [#7]
<n> | atomic mass (isotope) | unspecified | [13C], [35Cl]
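The minimal sketch below applies a few of these primitives to cyclohexanol. It assumes the rdkit Python package is installed, and atoms_matching is a hypothetical convenience helper, not part of RDKit.

```python
from rdkit import Chem

# Cyclohexanol: atom 0 is the hydroxyl O, atom 1 the ring CH, atoms 2-6 the ring CH2 groups.
mol = Chem.MolFromSmiles("OC1CCCCC1")

def atoms_matching(smarts):
    """Hypothetical helper: sorted indices of atoms matched by a one-atom SMARTS."""
    query = Chem.MolFromSmarts(smarts)
    return sorted(match[0] for match in mol.GetSubstructMatches(query))

for smarts in ("[OH]", "[CH2]", "[D3]", "[X4]", "[R]", "[R0]", "[r6]"):
    print(smarts, atoms_matching(smarts))
```

[D3] picks out only the ring CH, because implicit hydrogens do not count towards D. [X4] does count them, so it matches every ring carbon.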

Logical Operators

Atom and bond primitives can be combined using logical operators to build complex queries:

  • & - high-precedence AND (implicit between primitives)
  • , - OR
  • ; - low-precedence AND
  • ! - NOT
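The precedence rules are easiest to see side by side. The sketch below runs a few combined queries against dimethylamine (CNC). It assumes the rdkit Python package is installed, and atoms_matching is a hypothetical helper.

```python
from rdkit import Chem

# Dimethylamine: atoms 0 and 2 are CH3 carbons, atom 1 is the NH nitrogen.
mol = Chem.MolFromSmiles("CNC")

def atoms_matching(smarts):
    """Hypothetical helper: sorted indices of atoms matched by a one-atom SMARTS."""
    return sorted(m[0] for m in mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))

print(atoms_matching("[C&H3]"))    # same as [CH3]: & is implied between primitives
print(atoms_matching("[C,N;H1]"))  # (C or N) and H1
print(atoms_matching("[C,N&H1]"))  # C or (N and H1), because & binds tighter than ,
print(atoms_matching("[!C]"))      # anything that is not an aliphatic carbon
```

Swapping ; for & in the second and third queries changes the result from the nitrogen alone to all three atoms.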

Bond Primitives

Bonds between atoms can also be constrained. An unspecified bond in a SMARTS pattern matches either a single or aromatic bond.

  • - - single bond
  • = - double bond
  • # - triple bond
  • : - aromatic bond
  • ~ - any bond (wildcard)
  • @ - any ring bond
  • / - directional bond "up" (for E/Z stereo)
  • \ - directional bond "down" (for E/Z stereo)
  • /? - directional "up" or unspecified
  • \? - directional "down" or unspecified
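The sketch below counts how many bonds of styrene each bond primitive matches. Styrene has one C=C double bond, one single bond to the ring and six aromatic ring bonds. It assumes the rdkit Python package is installed, and count_matches is a hypothetical helper.

```python
from rdkit import Chem

# Styrene: C0=C1 is the vinyl double bond, C1-c2 a single bond, c2..c7 the aromatic ring.
styrene = Chem.MolFromSmiles("C=Cc1ccccc1")

def count_matches(smarts):
    """Hypothetical helper: number of unique atom sets matched by a pattern."""
    return len(styrene.GetSubstructMatches(Chem.MolFromSmarts(smarts)))

for smarts in ("[#6][#6]", "[#6]-[#6]", "[#6]=[#6]", "[#6]:[#6]", "[#6]~[#6]", "[#6]@[#6]"):
    print(smarts, count_matches(smarts))
```

The unspecified bond matches 7 bonds: the single bond plus the six aromatic ones. An explicit - matches only the non-aromatic single bond, and ~ matches all 8.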

Chirality

Tetrahedral chirality can be specified using @ (anticlockwise) and @@ (clockwise), looking from the first neighbour, following the same convention as SMILES. When included in a SMARTS pattern, chirality is used as a matching constraint. In RDKit's Python API this only happens when chiral matching is enabled, e.g. with useChirality=True. Unspecified chirality in the query matches both enantiomers.

Primitive | Meaning | Example
@ | anticlockwise (looking from the first neighbour) | [C@H]
@@ | clockwise (looking from the first neighbour) | [C@@H]

Note that the @? and @@? operators (anticlockwise or unspecified, clockwise or unspecified) are part of Daylight SMARTS but are not supported in RDKit. Non-tetrahedral chiral classes are not supported either. Read more about chirality.

If a chiral query on this page seems to give the wrong result, that is because the JavaScript port of RDKit used here does not currently allow chiral search. This is still on my TODO list.
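In RDKit's Python API, chirality in a query is ignored unless chiral matching is requested. The sketch below assumes the rdkit package is installed and passes useChirality=True to HasSubstructMatch. The molecules are arbitrary examples with a single stereocentre.

```python
from rdkit import Chem

# One stereocentre with four different substituents: CH3, F, Cl, Br.
query = Chem.MolFromSmarts("C[C@](F)(Cl)Br")
unspecified_query = Chem.MolFromSmarts("CC(F)(Cl)Br")
same = Chem.MolFromSmiles("C[C@](F)(Cl)Br")
mirror = Chem.MolFromSmiles("C[C@@](F)(Cl)Br")

ignored = mirror.HasSubstructMatch(query)                          # chirality not checked
same_hit = same.HasSubstructMatch(query, useChirality=True)        # same configuration
mirror_hit = mirror.HasSubstructMatch(query, useChirality=True)    # opposite configuration
either_hit = mirror.HasSubstructMatch(unspecified_query, useChirality=True)
print(ignored, same_hit, mirror_hit, either_hit)
```

Only the chiral query with useChirality=True rejects the mirror image. The query without a chiral tag matches either enantiomer.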

Recursive SMARTS

Any SMARTS expression can be used to define an atomic environment by anchoring it on the atom of interest using the $(...) syntax. These expressions behave like atomic primitives and can be combined with other primitives using logical operators.

  • [$(*C)] - any atom bonded to an aliphatic carbon (for example a methyl or methylene carbon)
  • [$(*[CH3]);$(*C[CH3])] - atom carrying both a methyl and an ethyl side group
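The sketch below checks both recursive patterns on small molecules. It assumes the rdkit Python package is installed, and atoms_matching is a hypothetical helper.

```python
from rdkit import Chem

def atoms_matching(smiles, smarts):
    """Hypothetical helper: sorted indices of atoms matched by a one-atom (recursive) SMARTS."""
    mol = Chem.MolFromSmiles(smiles)
    query = Chem.MolFromSmarts(smarts)
    return sorted(m[0] for m in mol.GetSubstructMatches(query))

# In methanol only the oxygen (atom 1) has an aliphatic carbon neighbour.
print(atoms_matching("CO", "[$(*C)]"))
# In butane the two inner carbons each carry a methyl and an ethyl group.
print(atoms_matching("CCCC", "[$(*[CH3]);$(*C[CH3])]"))
```

The terminal carbons of butane fail the second pattern. Their only neighbour is a CH2, so $(*[CH3]) is not satisfied.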

Component-level Grouping

A dot (.) in a SMARTS pattern separates disconnected fragments. Each fragment can match anywhere in the target - there is no constraint on which component it belongs to.

  • C.O - an aliphatic carbon and an aliphatic oxygen anywhere in the target; matches both CCO and CC.OO
  • C.O - does not match CCC, because no oxygen is present

Note: The Daylight SMARTS syntax defines component-level grouping using zero-level parentheses (e.g. (C).(C) to require the matches to lie in separate components), but this is not supported in RDKit. In RDKit, parentheses are only used for branching, and (C).(C) is a parse error. Additionally, . does not force the parts of a pattern into different disconnected fragments, so C.C can match two atoms of the same connected fragment. To enforce fragment-level constraints, split the molecule first (e.g. with Chem.GetMolFrags) and match each fragment separately, or post-filter the results.
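The sketch below first shows that . imposes no component constraint. It then enforces one by splitting the target with Chem.GetMolFrags(..., asMols=True). It assumes the rdkit Python package is installed; matches and in_separate_fragments are hypothetical helpers written for this example.

```python
from rdkit import Chem

def matches(smiles, smarts):
    """Hypothetical helper: does the SMARTS match anywhere in the molecule?"""
    return Chem.MolFromSmiles(smiles).HasSubstructMatch(Chem.MolFromSmarts(smarts))

print(matches("CCO", "C.O"))  # True
print(matches("CCC", "C.O"))  # False: no oxygen
print(matches("CC", "C.C"))   # True: both atoms come from the same fragment

def in_separate_fragments(smiles, smarts_a, smarts_b):
    """Hypothetical helper: True if smarts_a and smarts_b hit two different fragments."""
    frags = Chem.GetMolFrags(Chem.MolFromSmiles(smiles), asMols=True)
    qa, qb = Chem.MolFromSmarts(smarts_a), Chem.MolFromSmarts(smarts_b)
    hits_a = [i for i, frag in enumerate(frags) if frag.HasSubstructMatch(qa)]
    hits_b = [i for i, frag in enumerate(frags) if frag.HasSubstructMatch(qb)]
    return any(i != j for i in hits_a for j in hits_b)

print(in_separate_fragments("CCO", "C", "O"))    # False: only one fragment
print(in_separate_fragments("CC.OO", "C", "O"))  # True: C in CC, O in OO
```

Matching each fragment on its own is one way to emulate Daylight's (C).(C). Post-filtering the atom indices of whole-molecule matches against the fragment membership would work too.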

Hybridization Queries

Atoms can be matched by hybridization state using the ^ primitive followed by a number.

  • ^0 - S
  • ^1 - SP
  • ^2 - SP2
  • ^3 - SP3
  • ^4 - SP3D
  • ^5 - SP3D2
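The sketch below applies hybridization queries to acetonitrile (CC#N). It assumes the rdkit Python package is installed and relies on the hybridization RDKit assigns while parsing the SMILES. atoms_matching is a hypothetical helper.

```python
from rdkit import Chem

# Acetonitrile: atom 0 is the sp3 methyl carbon, atoms 1 and 2 form the sp C#N.
mol = Chem.MolFromSmiles("CC#N")

def atoms_matching(smarts):
    """Hypothetical helper: sorted indices of atoms matched by a one-atom SMARTS."""
    return sorted(m[0] for m in mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))

print(atoms_matching("[^1]"))  # SP atoms
print(atoms_matching("[^3]"))  # SP3 atoms
# Cross-check against the hybridization stored on each atom.
print([str(atom.GetHybridization()) for atom in mol.GetAtoms()])
```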

Heteroatom Neighbor Queries

Two primitives match atoms based on the number of heteroatom neighbors (non-C, non-H) they have:

  • z<n> - exactly n heteroatom neighbors (aromatic or aliphatic)
  • Z<n> - exactly n aliphatic heteroatom neighbors
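The sketch below contrasts z and Z. It assumes the rdkit Python package is installed, and atoms_matching is a hypothetical helper. In acetic acid both oxygens are aliphatic, so z2 and Z2 agree. In pyridine the nitrogen is aromatic, so it counts for z but not for Z.

```python
from rdkit import Chem

def atoms_matching(smiles, smarts):
    """Hypothetical helper: sorted indices of atoms matched by a one-atom SMARTS."""
    mol = Chem.MolFromSmiles(smiles)
    return sorted(m[0] for m in mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))

# Acetic acid: atom 1 is the carboxyl carbon, bonded to two oxygens.
print(atoms_matching("CC(=O)O", "[z2]"))
print(atoms_matching("CC(=O)O", "[Z2]"))
# Pyridine: atoms 2 and 4 are the carbons next to the aromatic nitrogen (atom 3).
print(atoms_matching("c1ccncc1", "[z1]"))  # aromatic heteroatom neighbours count for z
print(atoms_matching("c1ccncc1", "[Z1]"))  # ...but not for Z
```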

Range Queries

Many numeric primitives accept a range in curly braces instead of a fixed value. Supported primitives: D, h, r, R, v, x, X, z, Z, +, -.

  • D{2-4} - between 2 and 4 explicit connections (inclusive)
  • D{-3} - at most 3 explicit connections
  • D{2-} - at least 2 explicit connections
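The sketch below applies the three range forms to 2-methylbutane, where the atoms have 1, 2 or 3 explicit connections. It assumes the rdkit Python package is installed, and atoms_matching is a hypothetical helper.

```python
from rdkit import Chem

# 2-Methylbutane: atom 1 has 3 heavy neighbours, atom 3 has 2, atoms 0, 2 and 4 have 1.
mol = Chem.MolFromSmiles("CC(C)CC")

def atoms_matching(smarts):
    """Hypothetical helper: sorted indices of atoms matched by a one-atom SMARTS."""
    return sorted(m[0] for m in mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))

for smarts in ("[D{2-4}]", "[D{-3}]", "[D{2-}]", "[D{-1}]"):
    print(smarts, atoms_matching(smarts))
```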

Dative Bonds

Dative bonds (<- and ->) are covalent bonds in which both electrons of the shared pair come from the same atom, so the bond is directional.

  • -> - dative bond pointing right (donor → acceptor)
  • <- - dative bond pointing left (acceptor ← donor)

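The direction matters, so the two arrows are not interchangeable. For example, in the SMILES [Fe]->CC1=O.CN(C1)(C)->[Pt], the amine nitrogen donates a dative bond to platinum. [#7]->* matches this bond with the nitrogen (the donor) as the first atom. *<-[#7] matches the same bond with the platinum (the acceptor) as the first atom. Neither pattern matches the [Fe]->C bond, because there the donor is iron, not nitrogen.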
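To keep atom indices easy to follow, the sketch below uses a simpler molecule than the one above (CN(C)(C)->[Pt], chosen only for illustration). It assumes the rdkit Python package is installed. The tuples printed are the matched atom indices in pattern order.

```python
from rdkit import Chem

# Atoms: 0 C, 1 N, 2 C, 3 C, 4 Pt; the N->Pt bond is dative with N as the donor.
mol = Chem.MolFromSmiles("CN(C)(C)->[Pt]")

results = {}
for smarts in ("[#7]->*", "*<-[#7]", "*->[#7]"):
    results[smarts] = mol.GetSubstructMatches(Chem.MolFromSmarts(smarts))
    print(smarts, results[smarts])
```

The first two patterns hit the same bond but list the atoms in opposite order. The third pattern requires nitrogen to be the acceptor and finds nothing.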

Efficiency Tips

Tips for writing efficient SMARTS (patterns are evaluated left to right):

  • Place uncommon atoms or bond arrangements early in the pattern.
  • In an AND expression, put the less common specification first.
  • In an OR expression, put the less common specification last.

Examples

Arene Halogens in Ortho

Aromatic carbons joined by a single bond

Aliphatic nitrogen in a ring

Two carbons connected by a double or triple bond
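The SMARTS strings for these four examples are not given above. The sketch below proposes one possible pattern for each and checks it against a few molecules. It assumes the rdkit Python package is installed; the patterns in EXAMPLES and the hits helper are assumptions and may differ from the ones originally intended.

```python
from rdkit import Chem

# Assumed patterns for the four examples (one possible choice for each).
EXAMPLES = {
    "arene halogens in ortho": "[F,Cl,Br,I]c:c[F,Cl,Br,I]",
    "aromatic carbons joined by a single bond": "c-c",
    "aliphatic nitrogen in a ring": "[N;R]",
    "two carbons joined by a double or triple bond": "[#6]=,#[#6]",
}

def hits(name, smiles):
    """Hypothetical helper: does the example pattern `name` match `smiles`?"""
    query = Chem.MolFromSmarts(EXAMPLES[name])
    return Chem.MolFromSmiles(smiles).HasSubstructMatch(query)

print(hits("arene halogens in ortho", "Clc1ccccc1Cl"))    # 1,2-dichlorobenzene
print(hits("arene halogens in ortho", "Clc1cccc(Cl)c1"))  # 1,3-dichlorobenzene
print(hits("aromatic carbons joined by a single bond", "c1ccccc1-c1ccccc1"))  # biphenyl
print(hits("aromatic carbons joined by a single bond", "c1ccccc1"))           # benzene
print(hits("aliphatic nitrogen in a ring", "C1CCNCC1"))   # piperidine
print(hits("aliphatic nitrogen in a ring", "c1ccncc1"))   # pyridine
print(hits("two carbons joined by a double or triple bond", "C#C"))       # acetylene
print(hits("two carbons joined by a double or triple bond", "c1ccccc1"))  # benzene
```

Benzene matches neither c-c nor [#6]=,#[#6], because after aromaticity perception its ring bonds are aromatic rather than single or double.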