PyLSD is a layer over the LSD software. The reader of this page should already be familiar with LSD. PyLSD processes enhanced LSD input files, and thus solves structure elucidation problems that LSD alone cannot solve. The enhancements make it possible to deal with compounds whose molecular formula is not precisely known, or in which some atoms have an unknown hybridization state and/or an unknown multiplicity (number of attached hydrogen atoms). In addition, the solutions can be ranked by decreasing likelihood, according to how well the experimental 13C NMR chemical shifts match the predicted ones.
An ambiguous (or complex) LSD problem, one that includes a variable molecular formula (VMF) and variable status atoms (VSA), is converted by PyLSD into a set of unambiguous (or simple) LSD problem files that can be processed separately by the LSD software. The solution files are then grouped together for ranking by 13C chemical shift prediction with nmrshiftdb2, and for structure diagram generation and display.
PyLSD is written in Python.
Find pyLSD on
PyLSD is free software distributed under the GPL license.
This is version alpha-4. PyLSD-a4 is functional but still needs many improvements.
Here is the History file
Click here for installation and testing instructions. This page also indicates where to put the PyLSD command files and how to run them.
python lsd.py mypylsdfile.lsd
INSTALL.html indicates which intermediate files are created and provides troubleshooting advice (see the First-aid section). The solution file, in SD format, is named mypylsdfile_0.sdf and is created in the LSD/Data directory. It can be used to improve on the 2D structure depictions that outlsd generates.
An LSD data file is not (yet) a valid PyLSD data file.
The conversion can be achieved by adding two new commands:
The FORM command indicates the molecular formula of the unknown compound.
It has a single argument, a character string between double quotes,
such as in
FORM "C 21 H 22 N 2 O 2" for strychnine.
All formula parts, element symbols and coefficients alike, are separated by white space.
The PIEC command is a story in itself that is told later, in a separate section.
It takes a single integer parameter that sets an upper limit on the number of connected parts in
the problem solutions (well, roughly...).
Adding a PIEC 1 command to an LSD data file achieves what the user
generally wants to do.
The pinene.lsd LSD data file has been adapted to pyLSD and is available for testing. Note that the locations of the substructure files have been updated, because the Filters directory is no longer in the current directory but in ../LSD/. Running "python lsd.py pinene.lsd" from the command line with "Variant" as the current directory should display the structure of pinene.
The string argument of the FORM command
contains chemical element symbols, each followed by an indication
of the number of occurrences of that element in the molecule.
The number of occurrences may be either an integer or a range in the form n-m
in which n and m are two integers (n < m).
If at least one number of occurrences is a range, then a
MOMA command is required.
FORM "C 1 H 3 N 1 O 2-3", taken from the mixture.lsd PyLSD data file.
This molecular formula fits nitromethane and methyl nitrite (two oxygen atoms) as well as methyl nitrate (three oxygen atoms).
The argument of the
MOMA command indicates a molecular mass or
a molecular mass range. A molecular mass is the sum of the atomic masses of the atoms
that constitute a molecule, expressed in atomic mass units (amu).
The atomic masses are integer values (number of nucleons in
the most abundant isotope) according to the first line of the paragraphs in
MOMA 1-1000, taken from the mixture.lsd PyLSD data file.
The molecular mass must be between 1 and 1000 amu, which in practice means that no constraint on mass is imposed.
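The way a FORM range and a MOMA range together define a set of candidate formulas can be sketched as follows. This is only an illustration, not PyLSD's actual code: the function names `parse_range` and `expand_formula` and the small mass table are assumptions made for the example.

```python
from itertools import product

# Assumed table: nucleon count of the most abundant isotope of each element.
NUCLEON_MASS = {"H": 1, "C": 12, "N": 14, "O": 16}


def parse_range(text):
    """Turn "3" into (3, 3) and "2-3" into (2, 3)."""
    low, _, high = text.partition("-")
    return int(low), int(high or low)


def expand_formula(form, moma):
    """List every fixed formula allowed by a FORM string and a MOMA range."""
    tokens = form.split()
    symbols = tokens[0::2]                      # element symbols
    bounds = [parse_range(t) for t in tokens[1::2]]  # occurrence ranges
    mass_low, mass_high = parse_range(moma)
    formulas = []
    # Try every combination of occurrence numbers within the ranges.
    for counts in product(*(range(lo, hi + 1) for lo, hi in bounds)):
        mass = sum(NUCLEON_MASS[s] * n for s, n in zip(symbols, counts))
        if mass_low <= mass <= mass_high:
            formulas.append(dict(zip(symbols, counts)))
    return formulas


candidates = expand_formula("C 1 H 3 N 1 O 2-3", "1-1000")
```

With the FORM and MOMA values of mixture.lsd, the sketch keeps both CH3NO2 (integer mass 61) and CH3NO3 (integer mass 77); narrowing the mass argument to "61" would keep only the first.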
This is an extension of the LSD
For chemical element symbols, LSD supports usual symbols for usual valence
(S for divalent sulfur) and usual symbols followed by unusual valence
(S4 for tetravalent sulfur). PyLSD also supports alternative valences, but the
usual valence must then be given explicitly (S24 for di- and tetravalent sulfur).
For hybridization state, multiplicity and electric charge, alternative values
are given, as usual, in parentheses and are separated by blanks.
MULT 20 N35 (2 3) (0 1 2) (0 1), defines atom 20
as a nitrogen atom either tri- or pentavalent, sp2 or sp3, bound to 0, 1 or 2
hydrogen atoms, with either a 0 or a +1 electric charge.
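The number of fixed atom statuses hidden behind such a line can be made explicit with a short sketch. It is not taken from PyLSD: `parse_mult` and `expand_status` are hypothetical names, the sketch assumes that each digit after the element symbol is one valence and that an omitted charge field means 0, and it does not discard chemically impossible combinations.

```python
import re
from itertools import product


def parse_mult(line):
    """Split a MULT line into atom number, element and alternative values."""
    match = re.match(r"MULT\s+(\d+)\s+([A-Z][a-z]?)(\d*)\s+(.*)", line)
    atom, symbol, valence_digits, rest = match.groups()
    # Each field is either a parenthesized list "(2 3)" or a bare value "2".
    groups = re.findall(r"\(([^)]*)\)|(\S+)", rest)
    fields = [tuple(int(x) for x in (g[0] or g[1]).split()) for g in groups]
    if len(fields) == 2:
        fields.append((0,))  # assumption: no charge field means charge 0
    return {
        "atom": int(atom),
        "symbol": symbol,
        # None stands for the usual valence of the element (assumption).
        "valences": tuple(int(d) for d in valence_digits) or (None,),
        "hybridizations": fields[0],
        "multiplicities": fields[1],
        "charges": fields[2],
    }


def expand_status(status):
    """Enumerate every (valence, hybridization, multiplicity, charge) tuple."""
    return list(product(status["valences"], status["hybridizations"],
                        status["multiplicities"], status["charges"]))


status = parse_mult("MULT 20 N35 (2 3) (0 1 2) (0 1)")
combinations = expand_status(status)
```

For the example line, the raw Cartesian product has 2 x 2 x 3 x 2 = 24 entries; a real implementation would be expected to keep only those that are chemically consistent.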
Molecular ELECtric charge
The ELEC command either imposes a single molecular electric charge value
or proposes alternative values. If no
ELEC command is present,
the imposed value is 0. Electric charges are expressed as integers, in
proton electric charge units. Alternative values
are given, as usual, in parentheses and are separated by blanks.
ELEC (-1 0 1), constrains the molecular electric charge
to be -1, 0 or +1, in proton electric charge units.
MAXimum number of Positively/Negatively charged atoms
The MAXP/MAXN command has a single integer argument that is
the maximum number of positively/negatively charged atoms in the molecule.
If no MAXP/MAXN command is present, then no control takes place.
MAXP 1 constrains the molecule to have at most 1
positively charged atom.
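The combined effect of ELEC, MAXP and MAXN on a given assignment of atomic charges can be sketched as a simple predicate. The function name `charges_allowed` and its keyword arguments are assumptions made for this illustration; they do not come from the PyLSD source.

```python
def charges_allowed(atom_charges, elec=(0,), maxp=None, maxn=None):
    """Check one assignment of atomic charges against ELEC, MAXP and MAXN.

    elec: allowed molecular charges (0 only when no ELEC command is given).
    maxp, maxn: limits on charged atoms; None means no control takes place.
    """
    total = sum(atom_charges)
    positives = sum(1 for q in atom_charges if q > 0)
    negatives = sum(1 for q in atom_charges if q < 0)
    if total not in elec:
        return False
    if maxp is not None and positives > maxp:
        return False
    if maxn is not None and negatives > maxn:
        return False
    return True


# A zwitterion is accepted by ELEC (-1 0 1) together with MAXP 1.
ok = charges_allowed([1, -1, 0], elec=(-1, 0, 1), maxp=1)
# Two positively charged atoms violate MAXP 1, even if the total is 0.
rejected = charges_allowed([1, 1, -1, -1], elec=(0,), maxp=1)
```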
DEfault MUlt parameter
The DEMU command is only necessary if the molecular formula
has alternatives. In this case, each element has a minimum number of occurrences.
The number of MULT commands for an element cannot be higher than the minimum number
of occurrences. If the actual number of occurrences is strictly higher than
the minimum number, the supplementary atoms get, by default, the most general status
for the element, as inferred from Variant/statuslist.txt.
The DEMU command overrides the default status for an element,
given as first command parameter. The following parameters are those of
DEMU N N (1 2 3) (0 1 2 3) (0 1) indicates that
any supplementary nitrogen atom (relative to the minimum number
of nitrogen atoms, as indicated by the FORM command)
is a trivalent nitrogen, of any hybridization, any multiplicity,
either not electrically charged or with a single positive charge.
The CNTD command with 1 as argument forces LSD to deliver
connected (in one piece) solutions. When its argument is 0, this control is disabled.
The pyLSD-specific PIEC command also operates on solution connectivity,
but at a different level.
If one or more VSAs are present, the first task of pyLSD is to propose a coordinance for each VSA.
The coordinance concerns only the graph of heavy (non-hydrogen) atoms, considering each
chemical bond between them as single. The coordinance of an atom is simply the number
of its neighbors. The molecular coordinance is the sum of the coordinances of all the heavy atoms;
it is equal to twice the number of bonds between atoms (again, all bonds are counted as single).
It can be proved that:
number of rings = number of bonds - number of atoms + number of connected parts.
Each atom has a defined coordinance, so the molecule has a defined number of bonds, and the molecule also has a defined number of atoms (the VMF ambiguity has already been resolved at this point). A set of possible numbers of connected parts therefore corresponds to a set of possible numbers of rings. If all the possible numbers of rings are negative, then the currently proposed set of VSA coordinances is not valid. The parameter of the
PIEC command indicates that the number of connected parts of the
solution lies between 1 and the parameter value (inclusive).
When looking for all the isomers of benzene, made of neutral carbon atoms, sp, sp2 or sp3,
bound to 0, 1, 2 or 3 hydrogen atoms, one possibility is that all atoms
are monocoordinated (6 sp atoms, each bound to 1 H atom, as in a set of 3 acetylene molecules).
The molecular coordinance is 6, resulting in 3 bonds. With 6 atoms and 1 as the only possible
number of pieces, the number of rings would be -2. With
PIEC 1, the possibility
of having 6 monocoordinated carbon atoms is therefore not explored further.
The tri-acetylene solution can only be produced with a
PIEC 3 command.
PIEC 1 does not prevent the generation of a non-connected solution.
The solution that consists in cyclobutadiene and acetylene has 5 bonds. With 6 atoms and
1 connected part, the number of rings is 0, which is acceptable.
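The arithmetic of these examples can be reproduced with a few lines of code. This is a sketch of the test described above, not PyLSD's own implementation; the name `possible_ring_counts` is hypothetical, and the rejection of odd coordinance sums is an addition of the sketch (a sum of coordinances is always twice a bond count).

```python
def possible_ring_counts(coordinances, piec):
    """Return (pieces, rings) pairs compatible with a coordinance proposal.

    coordinances: one coordinance per heavy atom.
    piec: PIEC parameter, the maximum number of connected parts.
    """
    total = sum(coordinances)
    if total % 2:
        return []  # an odd sum cannot be twice a number of bonds
    bonds = total // 2
    atoms = len(coordinances)
    pairs = []
    for pieces in range(1, piec + 1):
        # rings = bonds - atoms + connected parts
        rings = bonds - atoms + pieces
        if rings >= 0:
            pairs.append((pieces, rings))
    return pairs


six_monocoordinated = [1] * 6
cyclobutadiene_plus_acetylene = [2, 2, 2, 2, 1, 1]

rejected = possible_ring_counts(six_monocoordinated, piec=1)
tri_acetylene = possible_ring_counts(six_monocoordinated, piec=3)
accepted = possible_ring_counts(cyclobutadiene_plus_acetylene, piec=1)
```

An empty list means that the coordinance proposal is discarded. The six monocoordinated atoms are discarded with PIEC 1 but retained with PIEC 3 (3 pieces, 0 rings), while the cyclobutadiene plus acetylene coordinances pass the PIEC 1 test with 0 rings, which is why this test alone does not guarantee connected solutions.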
The PIEC command is therefore only a way to eliminate unrealistic possibilities
when particular coordinance values are assigned to the VSAs, and not a real
control on the solution connectivity. That control can only be achieved by changing LSD itself
through a modification of the
If at least one carbon atom has its experimental chemical shift defined by a
SHIX command, then the chemical shifts are predicted for each solution.
The sum of the absolute values of the differences between experimental
and predicted values is used as the criterion for solution ranking.
The best-fitting solution is presented first.
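The ranking criterion can be sketched as follows. The function name `rank_solutions` and the data layout (dictionaries mapping atom numbers to shifts in ppm) are assumptions made for the illustration, and the shift values are invented toy numbers, not real nmrshiftdb2 predictions.

```python
def rank_solutions(experimental, predictions):
    """Sort solutions by the sum of absolute shift differences, best first.

    experimental: {atom number: experimental 13C shift} from SHIX commands.
    predictions: {solution name: {atom number: predicted 13C shift}}.
    """
    def score(predicted):
        return sum(abs(experimental[atom] - predicted[atom])
                   for atom in experimental)

    return sorted((score(shifts), name) for name, shifts in predictions.items())


# Invented toy values, for illustration only.
experimental = {1: 20.0, 2: 130.0}
predictions = {
    "solution_A": {1: 25.0, 2: 130.0},  # deviations: 5.0 + 0.0
    "solution_B": {1: 21.0, 2: 128.0},  # deviations: 1.0 + 2.0
}
ranking = rank_solutions(experimental, predictions)
```

Only the atoms that have an experimental value contribute to the score, so the smaller the sum, the earlier the solution appears in the ranking.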