Local index workflow
NistChemPy no longer ships a prebuilt NIST Chemistry WebBook index. The local index is a user-generated cache that helps with broad compound discovery, property-availability filtering, and local search.
The local index is not an official NIST product and is not covered by the NistChemPy software license. It is a local artifact created by the user from NIST Chemistry WebBook pages.
What the index contains
A completed local index is a CSV table, usually named index.csv. It contains
compound identifiers, basic metadata, structure-file links, and WebBook section
availability URLs. Typical columns include:
ID, name, synonyms, formula, mol_weight, inchi, inchi_key, cas_rn, mol2D and
mol3D, and section columns such as Mass spectrum (electron ionization) and
Gas Chromatography.
The tiny example_index.csv fixture used in the documentation has the same
kind of column layout, but it is not a replacement for a locally generated
index.
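To make the column layout concrete, the following minimal sketch reads a CSV header with Python's standard csv module and separates the metadata columns listed above from the section-availability columns. The sample row and its placeholder URL are invented for illustration, and treating every non-metadata column as a section column is an assumption, not documented NistChemPy behavior.

```python
import csv
import io

# Metadata column names taken from the list above.
METADATA_COLUMNS = {
    "ID", "name", "synonyms", "formula", "mol_weight",
    "inchi", "inchi_key", "cas_rn", "mol2D", "mol3D",
}

# Illustrative one-row table; the URL is a placeholder, not a real WebBook link.
SAMPLE = (
    "ID,name,formula,cas_rn,Mass spectrum (electron ionization),Gas Chromatography\n"
    "C71432,Benzene,C6H6,71-43-2,https://example.invalid/ms,\n"
)


def split_columns(header):
    """Split a header into (metadata columns, section columns), keeping order."""
    meta = [c for c in header if c in METADATA_COLUMNS]
    sections = [c for c in header if c not in METADATA_COLUMNS]
    return meta, sections


reader = csv.DictReader(io.StringIO(SAMPLE))
meta, sections = split_columns(reader.fieldnames)
print("metadata:", meta)
print("sections:", sections)
```

For a real index, replace the io.StringIO wrapper with an open file handle on index.csv.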
Where the index is stored
By default, NistChemPy stores the local index in the platform-specific user
cache directory under nistchempy/webbook-index. The exact path depends on
the operating system.
Print the resolved default path with:
nistchempy index path
The same path is available from Python:
import nistchempy as nist
nist.WebBookIndex.default_path()
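For orientation only, the sketch below shows where per-user cache directories conventionally live on common platforms. The helper names are hypothetical, and the rules NistChemPy actually applies may differ (for example by adding extra path components), so the commands above remain the authoritative way to find the path.

```python
import os
import platform
from pathlib import Path, PurePosixPath, PureWindowsPath


def conventional_cache_dir(system, env, home):
    """Return the conventional per-user cache root for a platform.

    `system` takes the values returned by platform.system():
    'Linux', 'Darwin', or 'Windows'.
    """
    if system == "Windows":
        base = env.get("LOCALAPPDATA") or str(PureWindowsPath(home) / "AppData" / "Local")
        return PureWindowsPath(base)
    if system == "Darwin":
        return PurePosixPath(home) / "Library" / "Caches"
    # Linux and other Unix-like systems: XDG Base Directory convention.
    base = env.get("XDG_CACHE_HOME") or str(PurePosixPath(home) / ".cache")
    return PurePosixPath(base)


def guess_index_dir(system, env, home):
    """Hypothetical helper: append the nistchempy/webbook-index suffix."""
    return conventional_cache_dir(system, env, home) / "nistchempy" / "webbook-index"


# A guess for the current machine; compare it with `nistchempy index path`.
print(guess_index_dir(platform.system(), os.environ, str(Path.home())))
```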
Use --path to build or read an index at a project-local location:
nistchempy index build --path ./webbook-index --accept-data-terms
nistchempy index search benzene --path ./webbook-index
Project-local index directories should normally be added to .gitignore.
Do not commit generated full indexes, raw page caches, or other large
WebBook-derived artifacts to public repositories.
How to check status
Use the status command if you built an index earlier and forgot where it is, or if you want to check whether a build completed:
nistchempy index status
nistchempy index status --path ./webbook-index
How the index is formed
A full local-index build has two conceptual stages:
discovery strategy -> seeds.csv -> compound-page enrichment -> index.csv
The discovery stage finds candidate WebBook compounds and writes
seeds.csv. The enrichment stage visits compound pages and extracts metadata,
structure links, and section availability URLs into index.csv.
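The two stages can be pictured with a small, network-free sketch. Everything here is hypothetical: the function names, the seed columns (ID and name), and the stand-in fetcher are assumptions for illustration, not NistChemPy internals. Skipping IDs that are already in the index is one plausible way a rerun could avoid repeating finished work.

```python
import csv
import io


def discover(candidates):
    """Discovery stand-in: turn (ID, name) pairs into seed rows."""
    return [{"ID": cid, "name": name} for cid, name in candidates]


def enrich(seeds, existing_rows, fetch_details):
    """Enrichment stand-in: add details for seeds not yet in the index."""
    done = {row["ID"] for row in existing_rows}
    rows = list(existing_rows)
    for seed in seeds:
        if seed["ID"] in done:
            continue  # already enriched in an earlier run
        rows.append({**seed, **fetch_details(seed["ID"])})
    return rows


def fake_fetch(compound_id):
    """Placeholder for a compound-page visit; no network access happens."""
    return {"formula": {"C71432": "C6H6", "C64175": "C2H6O"}[compound_id]}


seeds = discover([("C71432", "Benzene"), ("C64175", "Ethanol")])
first_run = enrich(seeds[:1], [], fake_fetch)      # simulate an interrupted run
second_run = enrich(seeds, first_run, fake_fetch)  # rerun picks up the rest

# Serialize the finished rows the way an index.csv might be written.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["ID", "name", "formula"])
writer.writeheader()
writer.writerows(second_run)
print(buffer.getvalue())
```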
Supported discovery strategies are:
formula-browser: Traverses the WebBook formula browser and is the default general-purpose strategy.
sitemap: Reads WebBook sitemap files when available. It is useful as an audit or supplementary source.
formula-search: Uses bounded formula-search subdivision. This strategy requires explicit formula bounds and records unresolved query regions for later inspection.
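The idea of bounded subdivision with recorded unresolved regions can be illustrated generically. The sketch below is not the formula-search implementation: it assumes a single integer bound (say, a carbon-count range), a hypothetical count_hits callback, and a hypothetical per-query result limit. A region that still exceeds the limit when it can no longer be split is recorded as unresolved.

```python
def subdivide(lo, hi, count_hits, limit):
    """Split the inclusive range [lo, hi] until each query fits under `limit`.

    Returns (resolved, unresolved) lists of (lo, hi) ranges.
    """
    resolved, unresolved = [], []
    stack = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        if count_hits(a, b) <= limit:
            resolved.append((a, b))
        elif a == b:
            unresolved.append((a, b))  # cannot split further; keep for inspection
        else:
            mid = (a + b) // 2
            stack.append((mid + 1, b))  # upper half, processed second
            stack.append((a, mid))      # lower half, processed first
    return resolved, unresolved


# Made-up hit counts per bound value, used only to exercise the function.
FAKE_COUNTS = {1: 3, 2: 4, 3: 12, 4: 2}


def fake_count(a, b):
    return sum(FAKE_COUNTS[n] for n in range(a, b + 1))


resolved, unresolved = subdivide(1, 4, fake_count, limit=5)
print("resolved:", resolved)
print("unresolved:", unresolved)
```

With these invented counts, the single-value region 3 stays above the limit and ends up in the unresolved list.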
Build the default index with:
nistchempy index build --accept-data-terms
Warning
A full section-availability index can require visiting one WebBook page per compound.
With a polite 3-second delay between requests and roughly 100,000-150,000 pages, the delays alone add up to about 3.5-5.2 days for an initial build, not counting retries and network overhead.
Use --path for an explicit cache location and rerun the command to
resume interrupted enrichment work.
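The estimate in the warning follows from simple arithmetic: pages multiplied by the delay, divided by the 86,400 seconds in a day. The helper name below is hypothetical, and the calculation counts only the waiting time between requests.

```python
SECONDS_PER_DAY = 86_400


def crawl_days(pages, delay_seconds):
    """Lower bound on crawl duration in days, counting only the delays."""
    return pages * delay_seconds / SECONDS_PER_DAY


low = crawl_days(100_000, 3)
high = crawl_days(150_000, 3)
print(f"{low:.1f} to {high:.1f} days")
```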
Importing an existing local CSV
If you already have a local index CSV, import it into the cache layout:
nistchempy index build \
--from-csv /path/to/index.csv \
--path ./webbook-index \
--accept-data-terms
This does not make the CSV redistributable. It only records it as a local user artifact in NistChemPy’s current cache layout.
Using the index from Python
import nistchempy as nist
index = nist.get_local_index('./webbook-index')
index.search('benzene')
index.available_properties('C71432')
Local text and availability search
The local index supports text search over metadata columns and filtering by available WebBook sections:
index.search('benzene')
index.filter(has_sections='Mass spectrum (electron ionization)')
index.available_properties('C71432')
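Because the index is a plain CSV table, the same kinds of queries can be approximated with the standard library alone. The sketch below is not how NistChemPy implements them: the helper names are hypothetical, the sample rows use placeholder URLs, and it assumes that an empty section cell means the section is unavailable.

```python
import csv
import io

# Illustrative rows; section URLs are placeholders, not real WebBook links.
SAMPLE = (
    "ID,name,synonyms,formula,Mass spectrum (electron ionization),Gas Chromatography\n"
    "C71432,Benzene,Benzol,C6H6,https://example.invalid/ms/C71432,https://example.invalid/gc/C71432\n"
    "C108883,Toluene,Methylbenzene,C7H8,,https://example.invalid/gc/C108883\n"
    "C64175,Ethanol,Ethyl alcohol,C2H6O,https://example.invalid/ms/C64175,\n"
)


def text_search(rows, query, columns=("name", "synonyms", "formula")):
    """Case-insensitive substring match over the chosen metadata columns."""
    q = query.lower()
    return [r for r in rows if any(q in (r.get(c) or "").lower() for c in columns)]


def with_section(rows, section):
    """Keep rows whose section column holds a non-empty value."""
    return [r for r in rows if (r.get(section) or "").strip()]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
benzene_like = [r["ID"] for r in text_search(rows, "benzene")]
has_ms = [r["ID"] for r in with_section(rows, "Mass spectrum (electron ionization)")]
print(benzene_like)
print(has_ms)
```

In this sample, the text search also returns toluene because its synonym contains the query string, which is the usual trade-off of substring matching over synonyms.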
Local structural search
If RDKit is installed, the local index can also perform lightweight structural
screening using the indexed inchi and inchi_key columns:
index.structural_search(smiles='c1ccccc1', mode='exact')
index.structural_search(smiles='CCO', mode='substructure')
index.structural_search(smiles='CCO', mode='similarity')
This is a linear scan over the local index, not a persistent fingerprint
database. It is useful for small and medium local indexes and exploratory work.
For authoritative online structural search, use nist.run_structural_search.