CASA

Computer-Aided Spectral Assignment


CASA version 0.1.2
See the history file.
Last modified: September 28th, 2012.



Description

The aim of the CASA program is to assign 13C NMR resonances to the carbon atoms in a user-supplied molecular structure.
It exploits chemical shift correlations in 2D NMR spectra and, optionally, a comparison between experimental and estimated chemical shift values.
All resonances can be assigned if the proposed structure is the correct one, provided that the CASA input data is correct.
CASA may therefore be seen as a 1D/2D NMR-based automatic structure verification tool. The result of the CASA program is a set of lists of possible assignments.

The input data for CASA consists of at least two files: an experimental NMR data file and a candidate structure file, both described below.

The CASA Data folder contains example files.

First input file: experimental NMR data

A data file, with a .cas extension, contains commands made of a mnemonic followed by one or more parameters. Unless otherwise specified, the order of the commands in the data file does not matter.

Each 13C NMR resonance is produced by a carbon atom whose status must be defined: a numerical label (or index), a hybridization state (sp3, sp2 or sp), a multiplicity (CH3, CH2, CH or C) and an optional chemical shift value.
Signals may be labelled in any way the user wants, but it may be convenient to number them starting from 1, in decreasing order of chemical shift values.

The labelling of the H signals may be arbitrary as well, but it is highly recommended to give the same label to H and C signals that arise from directly bound atoms, as indicated by the HSQC spectrum. Two anisochronous H nuclei within a methylene group may get the same index.
Each correlation in a 2D spectrum is converted into an ordered pair of atom indexes that is reported in the data file.
A reason for assignment failure may lie in the misinterpretation of weak correlations. These arise from small coupling constants, which may reflect couplings through a large number of chemical bonds. The default coupling path lengths for COSY (3 bonds) and HMBC (2 and 3 bonds) correlations can be modified by correlation-specific command options.

Any trivial assignment can be given to CASA as a starting point of the resonance assignment process. For example, if there is only one keto functional group in the molecule and only one signal corresponding to an sp2 carbon around 210 ppm, this signal may then be assigned to the ketone carbon atom.

Supplementary information from elementary spectral analysis can be provided to CASA by the user.
A signal can be associated with a neighborhood property of the atom it will be assigned to.
For example, a carbon signal at 15 ppm from a methyl group whose proton signals appear as a singlet at 1 ppm corresponds to a group that is bonded to a quaternary carbon. In this case, one can require this CH3 group to have one neighbor among a list of quaternary carbons.
Neighborhood properties are very useful when no chemical shift prediction is available.
List and property definition commands are treated in the order they are written in the data file. Changing their order can change their meaning.

Second input file: candidate structure

The program only accepts molecular structures in MDL .mol files. Such a file can easily be produced with any chemical structure drawing program by saving the molecule in MDL .mol format.

Additional input file: predicted chemical shifts

CASA is able to restrict the number of carbon atoms a signal can be assigned to if the user can predict a chemical shift value and its associated uncertainty range for the C atoms in the proposed structure.


Installation

CASA is distributed under the GPL licence.
CASA versions are available:

For installation from source, use the Makefile in the CasaSrc directory and the script named "compil" in the PredictorSrc directory.


NMR Data File

A data file is made of commands and comments. A comment is everything between a ";" and the end of the line.
There are basic commands to provide spectral information, commands to define atom neighborhood information and commands to control the program execution.

Each command has a specific syntax and starts with a command mnemonic, followed by 1 to 4 blank-separated parameters.
All command mnemonics are made of 4 alphanumeric characters. There are different types of parameter which follow the rules described below.
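The line syntax described above can be illustrated with a minimal Python sketch. It is not part of CASA and does not reproduce its actual parser: the function name split_command is hypothetical, and the sketch assumes that a ";" never appears inside a quoted file path. A parenthesised list or a quoted path is kept as a single parameter, and the number of parameters is not checked.

```python
import re

# One token is a parenthesised list, a double-quoted path, or a bare word.
_TOKEN = re.compile(r'\([^()]*\)|"[^"]*"|\S+')


def split_command(line):
    """Return (mnemonic, parameters) for one data-file line.

    Returns None when the line holds only blanks or a comment.
    Hypothetical helper, not CASA's own parser.
    """
    # A comment is everything between ";" and the end of the line.
    # Assumption: ";" does not occur inside a quoted path.
    text = line.split(";", 1)[0].strip()
    if not text:
        return None
    tokens = _TOKEN.findall(text)
    mnemonic, params = tokens[0], tokens[1:]
    # All mnemonics are made of 4 alphanumeric characters.
    if not re.fullmatch(r"[A-Za-z0-9]{4}", mnemonic):
        raise ValueError(f"not a 4-character mnemonic: {mnemonic!r}")
    return mnemonic, params


if __name__ == "__main__":
    print(split_command("DEPT 1 (2 3) 2 ; a CH2 group"))
```

Interpreting each parameter according to its type (I, F, V and so on) would be a separate step, driven by the parameter types listed below.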

Parameter types

The parameter types are described using the following symbols:

Symbol  Description
I       A non-negative integer (0 included).
F       A real number.
R       An optional real number.
V       A list of integers separated by blanks (4 maximum) between parentheses, or a single positive integer.
O       An optional positive integer.
Ln      A list index, with n strictly positive.
B       A list index (Ln) or an integer (I).
H       An optional sign, + or -.
P       A file path name.
S       A set of positive integers separated by spaces.

The parameters of a command are successively referenced by P1, P2, P3 and P4.

Basic commands

Command mnemonic Application Parameter description Comment
DEPT I V I R
defines NMR signal status
  • P1: 13C NMR signal index.
  • P2: Hybridization state or a list of hybridization between parentheses (1, 2 or 3, for sp, sp2 and sp3 atoms).
  • P3: Multiplicity (0, 1, 2 or 3 for C, CH, CH2, CH3 atoms).
  • P4: 13C NMR chemical shift (optional).
Examples:
  • DEPT 1 (2 3) 2: NMR signal 1 corresponds to a CH2 (2) sp2 or sp3 ((2 3)) carbon.
  • DEPT 2 2 0 175.0: NMR signal 2 at 175.0 ppm (175.0) corresponds to a quaternary (0) sp2 (2) carbon.
HSQC I I
heteronuclear correlation through 1 bond
  • P1: Index of correlating carbon.
  • P2: Index of correlating hydrogen.
Two inequivalent hydrogen atoms bound to the same atom may have different numbers.
It is generally useful (but not mandatory) to give identical numbers to a non-hydrogen atom and to the hydrogen atom(s) directly bound to it.
Example: HSQC 4 4. Carbon 4 and hydrogen 4 are bound together.
COSY V I O O
COSY correlation
  • P1: Index of a correlating hydrogen or a list of ambiguously correlating hydrogens, between parentheses.
  • P2: Index of the correlating hydrogen resonance.
  • P3: Optional coupling path length, lower limit.
  • P4: Optional coupling path length, upper limit.
If there is no optional argument, the correlation is through 3 bonds by default.
When P3 is the only optional argument present, it defines the only possible coupling path length. Its minimum value is 3.
When both P3 and P4 are present, with P3 greater than or equal to 3 and P4 greater than or equal to P3, the coupling path length must be in the [P3, P4] range. If P4 is equal to 0, the length must be greater than P3.
Examples:
  • COSY 2 9: Hydrogen atom 2 correlates with hydrogen atom 9 through 3 bonds.
  • COSY (4 6) 9: Hydrogen atom 4 or 6 correlates with hydrogen atom 9 through 3 bonds.
  • COSY 5 9 3 4: Hydrogen atom 5 correlates with hydrogen atom 9 through 3 or 4 bonds.
HMBC V I O O
Heteronuclear correlation
  • P1: Index of a correlating carbon resonance or a list of ambiguously correlating carbons, between parentheses.
  • P2: Index of correlating hydrogen resonance.
  • P3: Optional coupling path length, lower limit.
  • P4: Optional coupling path length, upper limit.
If there is no optional argument, the correlating atoms are distant by 2 or 3 bonds, by default.
When P3 is the only optional argument present, it defines the only possible coupling path length. Its minimum value is 2.
When both P3 and P4 are present, with P3 greater than or equal to 2 and P4 greater than or equal to P3, the coupling path length must be in the [P3, P4] range. If P4 is equal to 0, the length must be greater than P3.
Examples:
  • HMBC 3 8: Carbon 3 correlates with hydrogen atom 8 through 2 or 3 bonds.
  • HMBC (4 5) 8: Carbon 4 or 5 correlates with hydrogen atom 8 through 2 or 3 bonds.
  • HMBC 6 8 2: Carbon 6 correlates with hydrogen atom 8 through 2 bonds.
  • HMBC 6 8 4: Carbon 6 correlates with hydrogen atom 8 through 4 bonds.
  • HMBC 6 8 2 3: Carbon 6 correlates with hydrogen atom 8 through 2 or 3 bonds.
INAD V I
INADEQUATE correlation
  • P1: Index of a correlating carbon resonance or a list of ambiguously correlating carbons, between parentheses.
  • P2: Index of a correlating carbon resonance.
There is no optional argument: the correlation is through 1 bond.
ASGN I I
Assignment of NMR signal to atom of the structure
  • P1: 13C NMR signal index.
  • P2: Index of the corresponding carbon atom in the structure.
Example: ASGN 1 5. 13C NMR signal 1 is assigned to carbon atom 5 in the molecule.
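The rules given above for the optional P3 and P4 parameters of the COSY and HMBC commands can be summarised by the following Python sketch. It is an illustration only, not CASA code: the function name path_length_range is hypothetical, and the P4 = 0 case is read as "strictly greater than P3", which is an assumption about the wording of the manual.

```python
# Default coupling path lengths stated in the command descriptions.
DEFAULT_RANGE = {"COSY": (3, 3), "HMBC": (2, 3)}


def path_length_range(command, p3=None, p4=None):
    """Return (lowest, highest) allowed number of bonds.

    highest is None when there is no upper limit.
    Hypothetical helper illustrating the documented rules.
    """
    default = DEFAULT_RANGE[command]
    if p3 is None:
        return default                      # no optional argument
    if p3 < default[0]:
        # Minimum value of P3: 3 for COSY, 2 for HMBC.
        raise ValueError(f"P3 must be at least {default[0]} for {command}")
    if p4 is None:
        return (p3, p3)                     # P3 alone: exactly P3 bonds
    if p4 == 0:
        # Assumption: "greater than P3" is read strictly.
        return (p3 + 1, None)
    if p4 < p3:
        raise ValueError("P4 must be greater than or equal to P3")
    return (p3, p4)                         # length within [P3, P4]


if __name__ == "__main__":
    print(path_length_range("HMBC"))          # HMBC 3 8
    print(path_length_range("HMBC", 4))       # HMBC 6 8 4
    print(path_length_range("COSY", 3, 4))    # COSY 5 9 3 4
```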

Definition of NMR signal properties

Neighborhood properties can be defined to impose constraints during signal/atom assignment.
A signal carries neighborhood properties of the atom it will be assigned to. The user can specify how many neighbors this atom has in a list of candidate atoms.
This information can be derived from a simple analysis of chemical shift values or coupling patterns.
Properties are very useful when no chemical shift prediction is available.

Command mnemonic Application Parameter description Comment
LNMR Ln S
defines a list of NMR signals
  • P1: List index, with n between 1 and 20.
  • P2: 13C NMR signal indexes.
Example: LNMR L1 12 8. The L1 list contains NMR signals 12 and 8.
LMOL Ln S
defines a list of molecular atoms
  • P1: List index, with n between 1 and 20.
  • P2: Molecular atom indexes.
Example: LMOL L2 19 8 5. The L2 list contains atoms 19, 8 and 5 of the molecule.
PROP B I Ln H
defines environment of atoms
  • P1: A single signal index or the index of an NMR signal list. The signal(s) referenced in P1 correspond(s) to atom(s) which have exactly P2 neighbors in P3.
  • P2: Exact number of neighbors. The value 0 stands for "all".
  • P3: Index of neighboring atom list.
  • P4: Optional sign (+ or -).
Examples:
  • PROP L1 0 L2: Each NMR signal in L1 must be assigned to an atom that has all its neighbors in L2.
  • PROP 12 1 L3: Signal 12 must be assigned to an atom that has exactly 1 neighbor in L3.
  • PROP 12 1 L3 +: Signal 12 must be assigned to an atom that has 1 or more neighbors in L3.
  • PROP 12 1 L3 -: Signal 12 must be assigned to an atom that has 1 or fewer neighbors in L3.
  • PROP 12 0 L3 -: Signal 12 must be assigned to an atom that has no neighbor in L3.

Example:

Analysis of chemical shifts shows that the carbon atoms that cause the resonances labelled 12 and 8 most likely each have 1 neighboring heteroatom. The PROP command requires signals 12 and 8 to be assigned to carbon atoms that are bound to exactly 1 heteroatom.
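The meaning of the P2 and P4 parameters of the PROP command, as illustrated by the examples above, can be expressed as a small Python predicate. This is a sketch for illustration, not CASA's implementation: the function name prop_satisfied and its arguments are hypothetical, and the way neighbors are counted in the structure is left outside the sketch.

```python
def prop_satisfied(neighbors_in_list, total_neighbors, p2, sign=None):
    """Tell whether a candidate atom fulfils a PROP constraint.

    neighbors_in_list: number of the atom's neighbors found in the P3 list.
    total_neighbors:   total number of neighbors of the atom.
    p2, sign:          P2 and P4 of the PROP command (sign is "+", "-" or None).
    Hypothetical helper written from the documented examples.
    """
    if sign == "+":
        return neighbors_in_list >= p2      # P2 or more neighbors in the list
    if sign == "-":
        return neighbors_in_list <= p2      # P2 or fewer (none when P2 = 0)
    if p2 == 0:
        # Without a sign, the value 0 stands for "all".
        return neighbors_in_list == total_neighbors
    return neighbors_in_list == p2          # exactly P2 neighbors in the list


if __name__ == "__main__":
    # PROP 12 1 L3 +, for an atom with 2 of its 4 neighbors in L3
    print(prop_satisfied(2, 4, 1, "+"))
```

With this reading, "PROP 12 0 L3 +" would accept any atom; the manual gives no example for that combination, so this is an assumption.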

Execution control

Command mnemonic Application Parameter description Comment
ENTR I
display of input files interpretation
  • P1 = 0: No display (default).
  • P1 = 1: Active state.
---
WORK I
allow production of solutions
  • P1 = 0: Only reads and interprets the input data files.
  • P1 = 1: Active state (default).
---
VERB I
verbosity
  • P1 = 0: Dummy (default).
  • P1 = 1: Active state.
  • P1 = 2: Very verbose.
---
STEP I
single step operation
  • P1 = 0: Nothing (default).
  • P1 = 1: Active state.
It requires the command VERB 2.
The user is prompted for the action to be taken.
CCLA I
controls the use of C atom equivalence classes
  • P1 = 0: No classes. All 13C assignments are generated.
  • P1 = 1: Symmetric C atoms are equivalent (default).
---
AWCS I
assignment with estimated chemical shifts
  • P1 = 0: Assignment without estimated chemical shifts (default).
  • P1 = 1: Uses estimated 13C chemical shifts in addition to 2D NMR correlations for assignment.
  • P1 = 2: Uses only estimated 13C chemical shifts for assignment.
Requires an additional file that contains predicted chemical shift values.
The path of this file must be specified with a CCSF command.
CCSF P
file that contains estimated chemical shift values
  • P1: Path of file containing estimated chemical shifts.
P1 must be enclosed in double quotes (").
The file format is described in the section "Additional data: predicted chemical shifts file".
SCLF F
scale factor for chemical shift range
  • P1: Scale factor value.
All prediction error interval radii given in the additional file are multiplied by the scale factor.
TOLE F
minimum tolerance value for chemical shift range
  • P1: Threshold value.
This value cannot be lower than 5 (default value).
ELIM I I
elimination of unsatisfying HMBC and/or COSY correlations
  • P1: Maximum number of eliminated correlations.
  • P2: Maximum number of bonds between the atoms in eliminated correlations.
The upper limit is P2 for an HMBC correlation and P2 + 1 for a COSY correlation.
P2 = 0 means no limitation.
Example: ELIM 3 5. 3 HMBC and/or COSY correlations, at most, can be eliminated. The upper limit is a 5J coupling for HMBC eliminated correlations and a 6J coupling for COSY correlations.
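The relation between the P2 parameter of ELIM and the bond limit applied to each kind of correlation can be written as a short Python sketch. The function name elim_bond_limit is hypothetical and the code only restates the rule given above; it does not show how CASA selects the correlations to eliminate.

```python
def elim_bond_limit(correlation, p2):
    """Return the maximum number of bonds allowed for an eliminated
    correlation, or None when there is no limitation (P2 = 0).

    Hypothetical helper restating the documented ELIM rule.
    """
    if correlation not in ("HMBC", "COSY"):
        raise ValueError("ELIM applies to HMBC and COSY correlations only")
    if p2 == 0:
        return None                          # P2 = 0 means no limitation
    # Upper limit: P2 for HMBC, P2 + 1 for COSY.
    return p2 if correlation == "HMBC" else p2 + 1


if __name__ == "__main__":
    # ELIM 3 5
    print(elim_bond_limit("HMBC", 5), elim_bond_limit("COSY", 5))
```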


Additional data: predicted chemical shifts file

A set of estimated chemical shifts can be provided to CASA in order to improve the result of the assignment process.

The additional file must be written according to the following format.
Each line describes an atom and contains 3 blank-separated fields.

The radius of the prediction error interval may be enlarged or reduced by a scale factor. This scale factor is given as parameter to the SCLF command.
The minimum radius value is set to 5 by default. It can be modified with the TOLE command.
For the estimated chemical shifts to be used, the name of the file MUST be given as parameter to the CCSF command and the AWCS command parameter must be set to 1 or 2.
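The combined effect of the scale factor and of the minimum radius can be pictured with the Python sketch below. It rests on several assumptions that the manual does not confirm: the scale factor is applied before the minimum radius, the scale factor is 1 when no SCLF command is given, and a signal is compatible with an atom when its experimental shift lies within the resulting interval, bounds included. The function names are hypothetical.

```python
def effective_radius(radius, scale=1.0, tole=5.0):
    """Radius of the prediction error interval after SCLF and TOLE.

    Assumptions: scaling is applied first, then the minimum radius;
    scale defaults to 1.0 (not stated in the manual); tole defaults to 5.
    """
    return max(radius * scale, tole)


def shift_compatible(experimental, predicted, radius, scale=1.0, tole=5.0):
    """Tell whether an experimental shift falls within the interval
    centred on the predicted shift (bounds included, by assumption)."""
    return abs(experimental - predicted) <= effective_radius(radius, scale, tole)


if __name__ == "__main__":
    # Signal at 175.0, atom predicted at 172.0 with an error radius of 1.5
    print(shift_compatible(175.0, 172.0, 1.5))
```

Under these assumptions, a larger SCLF value or a larger TOLE value makes the filter more permissive, so that fewer atoms are excluded for each signal.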

Prediction of 13C NMR chemical shifts

A predictor is provided with CASA. It relies on the NMRShiftDB database.

The predictor works with the same MDL .mol file used by CASA. This file must not contain explicit hydrogen atoms.

Example: prediction of quinidine chemical shifts:

>> predict quinidine.mol cstable.txt

cstable.txt is the output file containing predicted chemical shifts.


Running CASA

First, make sure that the 2 mandatory input files (for example: camphor.cas and camphor.mol) have been written correctly. When possible, create a supplementary file with the estimated chemical shifts and estimation errors of the atoms, according to their numbering in the .mol structure file.

Running the program is achieved by typing:

>> casa camphor.cas camphor.mol

in the command line prompt.

After resolution, the number of possible assignment sets is printed and a solution file named camphor.sol is created. The solution file is named after the .cas file name.
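When CASA is driven from a script, the command line and the expected solution file name can be built as in the Python sketch below. The function name casa_invocation is hypothetical; the sketch assumes that the executable is named casa and is reachable from the command search path, and that the solution file is written next to the .cas file, which the manual does not state.

```python
from pathlib import Path


def casa_invocation(cas_file, mol_file, executable="casa"):
    """Return (argv, solution_path) for one CASA run.

    Assumptions: the executable name, and the location of the
    solution file (same directory as the .cas file).
    """
    argv = [executable, str(cas_file), str(mol_file)]
    # The solution file is named after the .cas file name.
    solution_path = Path(cas_file).with_suffix(".sol")
    return argv, solution_path


if __name__ == "__main__":
    argv, solution_path = casa_invocation("camphor.cas", "camphor.mol")
    print(" ".join(argv))
    print(solution_path)
```

The argv list can then be passed to subprocess.run(argv, check=True) from the Python standard library to start the program.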


Reading the result

The solution file header is a copy of the input NMR data file.
Each solution, if any, is presented as a list of lines in which each NMR signal index is followed by the corresponding atom index.


Copyright(C)2012 CNRS-UMR 7312-Bertrand Plainchont and Jean-Marc Nuzillard