Structural Search
This notebook demonstrates two structural-search workflows. The first uses the live NIST Chemistry WebBook structural-search endpoint. The second uses RDKit to search a small local CSV index through its stored InChI and InChIKey fields.
RDKit is required for SMILES/InChI conversion and for local structural search. Install it from the repository root with pip install -e ".[structure]", or with conda install -c conda-forge rdkit.
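Because RDKit is optional, code that mixes both workflows may want to check for it before calling the SMILES-based helpers. A minimal sketch using only the standard library; the helper name rdkit_available is hypothetical and not part of NistChemPy:

```python
import importlib.util


def rdkit_available() -> bool:
    """Return True if the optional RDKit package can be imported."""
    # find_spec locates the package without actually importing it
    return importlib.util.find_spec("rdkit") is not None


if rdkit_available():
    print("RDKit found: SMILES/InChI input and local structural search are available")
else:
    print("RDKit missing: pass a MOL file or MOL block to the live search instead")
```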
[1]:
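import nistchempy as nist
# Convert SMILES to a MOL block (requires RDKit)
molblock = nist.molblock_from_smiles('c1ccccc1')
# Line 1 of a MOL block is the molecule-name header
print(molblock.splitlines()[0])
# MOL blocks are terminated by an 'M  END' line (two spaces)
print('M  END' in molblock)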
True
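The terminator check in the first cell works because a V2000 MOL block has a fixed layout: three header lines (name, program, comment), a counts line whose first two 3-character fields hold the atom and bond counts, the atom and bond tables, and a closing "M  END" line. For a molecule built from SMILES the name line is usually empty, so the first print outputs a blank line. A minimal pure-Python sketch that reads the counts line; mol_counts and is_terminated are hypothetical helpers, and the methane block is hand-written rather than NistChemPy output:

```python
def mol_counts(molblock: str) -> tuple[int, int]:
    """Return (atom_count, bond_count) from a V2000 MOL block."""
    lines = molblock.splitlines()
    counts = lines[3]  # lines 0-2 are the header: name, program, comment
    if "V2000" not in counts:
        raise ValueError("only V2000 MOL blocks are handled in this sketch")
    # Fixed-width fields: atoms in columns 1-3, bonds in columns 4-6
    return int(counts[0:3]), int(counts[3:6])


def is_terminated(molblock: str) -> bool:
    """True if the block contains the 'M  END' line (note the two spaces)."""
    return any(line.rstrip() == "M  END" for line in molblock.splitlines())


# Hand-written methane block: one carbon, no bonds
METHANE = "\n".join([
    "",                                          # name line (empty)
    "     sketch          2D",                   # program line
    "",                                          # comment line
    "  1  0  0  0  0  0  0  0  0  0999 V2000",   # counts line
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])

print(mol_counts(METHANE))     # (1, 0)
print(is_terminated(METHANE))  # True
```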
Live WebBook structural search
The live structural search sends a MOL block to the WebBook. Passing a MOL file or MOL block does not require RDKit; this example passes a SMILES string instead, which RDKit converts to a MOL block before the request is sent.
[2]:
search = nist.run_structural_search(
    smiles='c1ccccc1',
    search_type='struct',
)
search.success, search.num_compounds, search.compound_ids[:5]
[2]:
(True, 1, ['C71432'])
Local structural search over an index
The local index stores InChI and InChIKey values. With RDKit installed, NistChemPy can screen those indexed structures locally. This is a linear scan over the CSV table, not a persistent fingerprint database.
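A linear scan means each query is compared against every row in turn, with no precomputed structure to jump into. The sketch below mimics an exact search with the standard library, assuming exact matching compares InChIKeys; the column name inchikey and the two-row CSV are hypothetical stand-ins rather than the real index schema, and in NistChemPy the query key would be computed by RDKit instead of typed in:

```python
import csv
import io

# Hypothetical two-row index; real index files have their own columns
INDEX_CSV = """ID,name,formula,inchikey
C71432,Benzene,C6H6,UHOVQNZJYSORNB-UHFFFAOYSA-N
C64175,Ethanol,C2H6O,LFQSCWFLJHTTHZ-UHFFFAOYSA-N
"""


def exact_scan(csv_text: str, query_key: str) -> list[dict[str, str]]:
    """Return every row whose InChIKey equals query_key (one pass, O(n))."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["inchikey"] == query_key]


# Standard InChIKey of benzene
hits = exact_scan(INDEX_CSV, "UHOVQNZJYSORNB-UHFFFAOYSA-N")
print([(row["ID"], row["name"]) for row in hits])  # [('C71432', 'Benzene')]
```

Each query costs one comparison per row, which is cheap for a small table; that is the trade-off against a persistent fingerprint database.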
[3]:
from pathlib import Path
# Look for the example index next to the notebook first,
# then fall back to the path relative to the repository root
index_path = Path('example_index.csv')
if not index_path.exists():
    index_path = Path('docs/source/example_index.csv')
index = nist.get_local_index(index_path)
index.structural_search(
    smiles='c1ccccc1',
    mode='exact',
).loc[:, ['ID', 'name', 'formula']]
[3]:
| | ID | name | formula |
|---|---|---|---|
| 1 | C71432 | Benzene | C6H6 |
[4]:
index.structural_search(
    smiles='CCO',
    mode='similarity',
    threshold=0.1,
).loc[:, ['ID', 'name', 'formula', 'similarity']]
[4]:
| | ID | name | formula | similarity |
|---|---|---|---|---|
| 2 | C64175 | Ethanol | C2H6O | 1.0 |
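Similarity mode scores each indexed structure against the query and filters by threshold; an identical structure scores 1.0, as ethanol does here. A common score for this kind of search is the Tanimoto coefficient between two fingerprints, |A ∩ B| / |A ∪ B|. This notebook does not say which fingerprint or metric NistChemPy uses, so the sketch below assumes Tanimoto over sets of "on" bit positions with a score >= threshold cut-off, and the bit sets are made-up placeholders rather than real RDKit fingerprints:

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient of two sets of on-bit positions."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def similarity_scan(query: set[int], index: dict[str, set[int]],
                    threshold: float) -> list[tuple[str, float]]:
    """Score every entry (linear scan), keep those >= threshold, best first."""
    scored = [(name, tanimoto(query, bits)) for name, bits in index.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Placeholder fingerprints: invented bit positions, not real RDKit output
TOY_INDEX = {
    "Benzene": {3, 17, 42, 88},
    "Ethanol": {5, 12, 60},
}
query = {5, 12, 60}  # pretend this is the fingerprint of the query 'CCO'

print(similarity_scan(query, TOY_INDEX, threshold=0.1))  # [('Ethanol', 1.0)]
```

With these toy bits benzene shares nothing with the query, scores 0.0, and is dropped at threshold 0.1; lowering the threshold to 0.0 would return it after ethanol.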