NistChemPy API
The public API is organized around four tasks: live WebBook search, compound
page parsing, user-local index access, and low-level request/parsing helpers.
Internal local-index implementation modules under nistchempy.indexing are
not documented here as public API.
Top-level package
Unofficial tools for querying NIST Chemistry WebBook pages.
NistChemPy extracts selected molecular-property records from Chemistry WebBook pages for research workflows. It is not affiliated with, maintained by, or endorsed by NIST.
Search
The module contains search-related functionality
- nistchempy.search.get_search_parameters() Dict[str, str]
Returns search parameters and the corresponding keys
- Returns:
{short_key => search_parameter}
- Return type:
_tp.Dict[str, str]
- nistchempy.search.print_search_parameters() None
Prints available search parameters
- class nistchempy.search.NistSearchParameters(use_SI: bool = True, match_isotopes: bool = False, allow_other: bool = False, allow_extra: bool = False, no_ion: bool = False, cTG: bool = False, cTC: bool = False, cTP: bool = False, cTR: bool = False, cIE: bool = False, cIC: bool = False, cIR: bool = False, cTZ: bool = False, cMS: bool = False, cUV: bool = False, cGC: bool = False, cES: bool = False, cDI: bool = False, cSO: bool = False)
Bases:
object
GET parameters for compound search of NIST Chemistry WebBook
- use_SI
if True, returns results in SI units; otherwise calories are used
- Type:
bool
- match_isotopes
if True, exactly matches the specified isotopes (formula search only)
- Type:
bool
- allow_other
if True, allows elements not specified in formula (formula search only)
- Type:
bool
- allow_extra
if True, allows more atoms of elements in formula than specified (formula search only)
- Type:
bool
- no_ion
if True, excludes ions from the search (formula search only)
- Type:
bool
- cTG
if True, returns entries containing gas-phase thermodynamic data
- Type:
bool
- cTC
if True, returns entries containing condensed-phase thermodynamic data
- Type:
bool
- cTP
if True, returns entries containing phase-change thermodynamic data
- Type:
bool
- cTR
if True, returns entries containing reaction thermodynamic data
- Type:
bool
- cIE
if True, returns entries containing ion energetics thermodynamic data
- Type:
bool
- cIC
if True, returns entries containing ion cluster thermodynamic data
- Type:
bool
- cIR
if True, returns entries containing IR data
- Type:
bool
- cTZ
if True, returns entries containing THz IR data
- Type:
bool
- cMS
if True, returns entries containing MS data
- Type:
bool
- cUV
if True, returns entries containing UV/Vis data
- Type:
bool
- cGC
if True, returns entries containing gas chromatography data
- Type:
bool
- cES
if True, returns entries containing vibrational and electronic energy levels
- Type:
bool
- cDI
if True, returns entries containing constants of diatomic molecules
- Type:
bool
- cSO
if True, returns entries containing info on Henry’s law
- Type:
bool
- use_SI: bool = True
- match_isotopes: bool = False
- allow_other: bool = False
- allow_extra: bool = False
- no_ion: bool = False
- cTG: bool = False
- cTC: bool = False
- cTP: bool = False
- cTR: bool = False
- cIE: bool = False
- cIC: bool = False
- cIR: bool = False
- cTZ: bool = False
- cMS: bool = False
- cUV: bool = False
- cGC: bool = False
- cES: bool = False
- cDI: bool = False
- cSO: bool = False
- get_request_parameters() dict
Returns dictionary containing GET parameters
- Returns:
dictionary of GET parameters relevant to the search
- Return type:
dict
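How a set of boolean flags might turn into GET parameters can be sketched with a small stand-in class. This is an illustration only: SketchSearchParameters is a hypothetical name, it covers just a few of the flags listed above, and the encoding (a Units key, enabled flags sent as "on", disabled flags omitted) is an assumption rather than a description of what NistSearchParameters.get_request_parameters() actually returns.

```python
from dataclasses import dataclass, fields


@dataclass
class SketchSearchParameters:
    """Hypothetical stand-in for NistSearchParameters (subset of flags)."""

    use_SI: bool = True
    cTG: bool = False
    cIR: bool = False
    cMS: bool = False

    def get_request_parameters(self) -> dict:
        # Assumed encoding: units as a string, enabled flags as "on",
        # disabled flags left out of the query entirely.
        params = {"Units": "SI" if self.use_SI else "CAL"}
        for f in fields(self):
            if f.name != "use_SI" and getattr(self, f.name):
                params[f.name] = "on"
        return params


print(SketchSearchParameters(cTG=True).get_request_parameters())
```

Leaving disabled flags out keeps the query string short; whether the real class does the same is not stated in this reference.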
- class nistchempy.search.NistSearch(_request_config: RequestConfig, _nist_response: NistResponse, search_parameters: NistSearchParameters, compound_ids: List[str], success: bool, lost: bool, message: str = '')
Bases:
object
Results of the compound search in NIST Chemistry WebBook
- _request_config
additional requests.get parameters
- Type:
_ncpr.RequestConfig
- _nist_response
NIST search response
- Type:
NistResponse
- search_parameters
used search parameters
- Type:
NistSearchParameters
- compound_ids
NIST IDs of found compounds
- Type:
_tp.List[str]
- compounds
NistCompound objects of found compounds
- Type:
_tp.List[_compound.NistCompound]
- success
True if search request was successful
- Type:
bool
- num_compounds
number of found compounds
- Type:
int
- lost
True if search returns fewer compounds than there are in the database
- Type:
bool
- message
Optional WebBook search status/error message.
- Type:
str
- search_parameters: NistSearchParameters
- compound_ids: List[str]
- compounds: List[NistCompound]
- success: bool
- num_compounds: int
- lost: bool
- message: str = ''
- load_found_compounds() List[NistCompound]
Load and return found compounds.
- Returns:
Loaded NistCompound objects.
- Return type:
list
- nistchempy.search.search_from_response(nr: NistResponse, search_parameters: NistSearchParameters, config: RequestConfig) NistSearch
Transforms a search response into a NistSearch object
- Parameters:
nr (_ncpr.NistResponse) – NIST response object
search_parameters (NistSearchParameters) – search request parameters
config (_ncpr.RequestConfig) – search request config
- Returns:
search results
- Return type:
NistSearch
- nistchempy.search.run_search(identifier: str, search_type: str, search_parameters: NistSearchParameters | None = None, request_config: RequestConfig | None = None, use_SI: bool = True, match_isotopes: bool = False, allow_other: bool = False, allow_extra: bool = False, no_ion: bool = False, cTG: bool = False, cTC: bool = False, cTP: bool = False, cTR: bool = False, cIE: bool = False, cIC: bool = False, cIR: bool = False, cTZ: bool = False, cMS: bool = False, cUV: bool = False, cGC: bool = False, cES: bool = False, cDI: bool = False, cSO: bool = False) NistSearch
Searches compounds in NIST Chemistry WebBook
- Parameters:
identifier (str) – NIST compound ID / formula / name / inchi / CAS RN
search_type (str) – identifier type; available options are ‘formula’, ‘name’, ‘inchi’, ‘cas’, and ‘id’
search_parameters (_tp.Optional[NistSearchParameters]) – search parameters; if provided, the following search parameter arguments are ignored
request_config (_tp.Optional[_ncpr.RequestConfig]) – additional requests.get parameters
use_SI (bool) – if True, returns results in SI units; otherwise calories are used
match_isotopes (bool) – if True, exactly matches the specified isotopes (formula search only)
allow_other (bool) – if True, allows elements not specified in formula (formula search only)
allow_extra (bool) – if True, allows more atoms of elements in formula than specified (formula search only)
no_ion (bool) – if True, excludes ions from the search (formula search only)
cTG (bool) – if True, returns entries containing gas-phase thermodynamic data
cTC (bool) – if True, returns entries containing condensed-phase thermodynamic data
cTP (bool) – if True, returns entries containing phase-change thermodynamic data
cTR (bool) – if True, returns entries containing reaction thermodynamic data
cIE (bool) – if True, returns entries containing ion energetics thermodynamic data
cIC (bool) – if True, returns entries containing ion cluster thermodynamic data
cIR (bool) – if True, returns entries containing IR data
cTZ (bool) – if True, returns entries containing THz IR data
cMS (bool) – if True, returns entries containing MS data
cUV (bool) – if True, returns entries containing UV/Vis data
cGC (bool) – if True, returns entries containing gas chromatography data
cES (bool) – if True, returns entries containing vibrational and electronic energy levels
cDI (bool) – if True, returns entries containing constants of diatomic molecules
cSO (bool) – if True, returns entries containing info on Henry’s law
- Returns:
search object containing info on found compounds
- Return type:
NistSearch
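A typical call can be sketched as follows. This is a minimal sketch that assumes NistChemPy is installed and the WebBook is reachable when the helper is called; it relies only on the signature and attributes documented above. The helper name summarize_formula_search is hypothetical, and the import is deferred so that the definition itself runs without the package.

```python
def summarize_formula_search(formula: str) -> dict:
    """Run a formula search and summarize the documented result fields."""
    # Deferred import: NistChemPy is only needed when the helper is called.
    from nistchempy.search import run_search

    # Keep only neutral species that have gas-phase thermodynamic data.
    search = run_search(formula, "formula", cTG=True, no_ion=True)
    return {
        "success": search.success,
        "lost": search.lost,  # True if fewer compounds were returned than exist
        "compound_ids": list(search.compound_ids),
    }
```

Checking lost before trusting compound_ids is a reasonable habit, since a truncated result list is otherwise easy to miss.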
- nistchempy.search.run_structural_search(molfile: str | None = None, molblock: str | None = None, search_type: str = 'sub', search_parameters: NistSearchParameters | None = None, request_config: RequestConfig | None = None, use_SI: bool = True, cTG: bool = False, cTC: bool = False, cTP: bool = False, cTR: bool = False, cIE: bool = False, cIC: bool = False, cIR: bool = False, cTZ: bool = False, cMS: bool = False, cUV: bool = False, cGC: bool = False, cES: bool = False, cDI: bool = False, cSO: bool = False, smiles: str | None = None, inchi: str | None = None) NistSearch
Runs (sub)structural search in NIST Chemistry WebBook.
RDKit is required only when smiles or inchi is supplied and NistChemPy must convert that structure into a MOL block. Passing molfile or molblock directly does not require RDKit.
- Parameters:
molfile – Optional path to a MOL file.
molblock – Optional MOL block text.
search_type – Structural search type: struct for exact match or sub for substructure search.
search_parameters – Optional search parameters. If provided, the individual boolean search-parameter arguments are ignored.
request_config – Optional request configuration.
use_SI – If True, returns thermodynamic values in SI units.
cTG – Return entries containing gas-phase thermodynamic data.
cTC – Return entries containing condensed-phase thermodynamic data.
cTP – Return entries containing phase-change thermodynamic data.
cTR – Return entries containing reaction thermodynamic data.
cIE – Return entries containing ion energetics thermodynamic data.
cIC – Return entries containing ion cluster thermodynamic data.
cIR – Return entries containing IR data.
cTZ – Return entries containing THz IR data.
cMS – Return entries containing MS data.
cUV – Return entries containing UV/Vis data.
cGC – Return entries containing gas chromatography data.
cES – Return entries containing vibrational/electronic energy levels.
cDI – Return entries containing constants of diatomic molecules.
cSO – Return entries containing Henry’s law data.
smiles – Optional SMILES string converted to a MOL block with RDKit.
inchi – Optional InChI string converted to a MOL block with RDKit.
- Returns:
Search object containing found compounds.
- Return type:
NistSearch
- Raises:
ValueError – If the search type or structural input selection is invalid.
NistChemPyOptionalDependencyError – If RDKit is required but missing.
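The input rules above can be illustrated with a small validation sketch. It is not NistChemPy's implementation: check_structural_inputs and needs_rdkit are hypothetical helpers, and the rule that exactly one structural input must be supplied is an assumption made for this example.

```python
from typing import Optional


def check_structural_inputs(
    search_type: str = "sub",
    molfile: Optional[str] = None,
    molblock: Optional[str] = None,
    smiles: Optional[str] = None,
    inchi: Optional[str] = None,
) -> str:
    """Return the name of the single structural input that was supplied."""
    if search_type not in ("struct", "sub"):
        raise ValueError(f"unknown search_type: {search_type!r}")
    candidates = (
        ("molfile", molfile),
        ("molblock", molblock),
        ("smiles", smiles),
        ("inchi", inchi),
    )
    supplied = [name for name, value in candidates if value is not None]
    # Assumption: exactly one structural input must be given.
    if len(supplied) != 1:
        raise ValueError("provide exactly one of molfile, molblock, smiles, inchi")
    return supplied[0]


def needs_rdkit(input_name: str) -> bool:
    # Per the description above, only SMILES and InChI inputs need conversion.
    return input_name in ("smiles", "inchi")
```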
Compounds
The module contains compound-related functionality
- nistchempy.compound.SPEC_TYPES
dictionary mapping spectrum-type abbreviations used on the compound page (keys) to those used in URLs for downloading JDX files (values)
- Type:
dict
- class nistchempy.compound.Spectrum(compound: NistCompound, spec_type: str, spec_idx: str, jdx_text: str, source_url: str = '')
Bases:
object
Wrapper for IR, THz IR, MS, and UV-Vis spectra extracted from NIST Chemistry WebBook
- compound
parent NistCompound object
- Type:
NistCompound
- spec_type
IR / TZ (THz IR) / MS / UV (UV-Vis)
- Type:
str
- spec_idx
index of the spectrum
- Type:
str
- jdx_text
text block of the corresponding JDX-file
- Type:
str
- compound: NistCompound
- spec_type: str
- spec_idx: str
- jdx_text: str
- source_url: str = ''
- to_record() SpectrumRecord
Return this spectrum as a structured record.
- Returns:
JSON-like spectrum record.
- Return type:
SpectrumRecord
- to_dict(include_raw: bool = True) dict
Return this spectrum as a JSON-friendly dictionary.
- Parameters:
include_raw – If True, include raw JCAMP-DX text.
- Returns:
Structured spectrum data.
- Return type:
dict
- save(name: str = None, path_dir: str = None) None
Saves spectrum in JDX format
- name
custom filename (default name is formed from compound ID, spectrum type and index)
- Type:
str
- path_dir
directory where output file will be saved
- Type:
str
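How name and path_dir could combine into an output path is sketched below. The helper spectrum_output_path is hypothetical, and the default pattern "<ID>_<type>_<index>.jdx" is an assumption; the reference only states that the default name is formed from the compound ID, spectrum type, and index.

```python
from pathlib import Path
from typing import Optional


def spectrum_output_path(
    compound_id: str,
    spec_type: str,
    spec_idx: str,
    name: Optional[str] = None,
    path_dir: Optional[str] = None,
) -> Path:
    """Build the file path a JDX spectrum would be written to."""
    # Assumed default pattern; the real default file name may differ.
    filename = name if name is not None else f"{compound_id}_{spec_type}_{spec_idx}.jdx"
    # Fall back to the current directory when no directory is given.
    return Path(path_dir or ".") / filename
```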
- class nistchempy.compound.Chromatogram(compound: NistCompound, ri_type: str, column_type: str, temp_regime: str, data: DataFrame, source_url: str = '')
Bases:
object
Wrapper for chromatography data extracted from NIST Chemistry WebBook
- compound
parent NistCompound object
- Type:
NistCompound
- ri_type
type of retention index: Kováts, van den Dool & Kratz, etc.
- Type:
str
- column_type
polar / non-polar
- Type:
str
- temp_regime
temperature regime: isothermal / ramp / custom
- Type:
str
- data
experimental data
- Type:
_pd.core.frame.DataFrame
- compound: NistCompound
- ri_type: str
- column_type: str
- temp_regime: str
- data: DataFrame
- source_url: str = ''
- to_record() ChromatogramRecord
Return this chromatogram as a structured record.
- Returns:
JSON-like gas chromatography record.
- Return type:
ChromatogramRecord
- to_dict(orient: str = 'records') dict
Return this chromatogram as a JSON-friendly dictionary.
- Parameters:
orient – DataFrame orientation passed to DataFrame.to_dict.
- Returns:
Structured gas chromatography data.
- Return type:
dict
- save(name: str = None, path_dir: str = None, **kwargs) None
Saves chromatograms in CSV format
- name
custom filename (default name is formed from compound ID, spectrum type and index)
- Type:
str
- path_dir
directory where output file will be saved
- Type:
str
- kwargs
parameters for pandas DataFrame to_csv method
- class nistchempy.compound.NistCompound(_request_config: RequestConfig, _nist_response: NistResponse, ID: str | None, name: str | None, synonyms: List[str], formula: str | None, mol_weight: float | None, inchi: str | None, inchi_key: str | None, cas_rn: str | None, mol_refs: Dict[str, str], data_refs: Dict[str, str], nist_public_refs: Dict[str, str], nist_subscription_refs: Dict[str, str])
Bases:
object
Stores info on NIST Chemistry WebBook compound
- _request_config
additional requests.get parameters
- Type:
_ncpr.RequestConfig
- _nist_response
response to the GET request
- Type:
_ncpr.NistResponse
- ID
NIST compound ID
- Type:
_tp.Optional[str]
- name
chemical name
- Type:
_tp.Optional[str]
- synonyms
synonyms of the chemical name
- Type:
_tp.List[str]
- formula
chemical formula
- Type:
_tp.Optional[str]
- mol_weight
molecular weight, g/mol
- Type:
_tp.Optional[float]
- inchi
InChI string
- Type:
_tp.Optional[str]
- inchi_key
InChI key string
- Type:
_tp.Optional[str]
- cas_rn
CAS registry number
- Type:
_tp.Optional[str]
- mol_refs
references to 2D and 3D MOL-files
- Type:
_tp.Dict[str, str]
- data_refs
references to the webpages containing physical chemical data for the given compound
- Type:
_tp.Dict[str, str]
- nist_public_refs
references to webpages of other public NIST databases containing data for the given compound
- Type:
_tp.Dict[str, str]
- nist_subscription_refs
references to webpages of subscription NIST databases containing data for the given compound
- Type:
_tp.Dict[str, str]
- mol2D
text block of a MOL-file containing 2D atomic coordinates
- Type:
_tp.Optional[str]
- mol3D
text block of a MOL-file containing 3D atomic coordinates
- Type:
_tp.Optional[str]
- ir_specs
list of IR Spectrum objects
- Type:
_tp.List[Spectrum]
- thz_specs
list of THz Spectrum objects
- Type:
_tp.List[Spectrum]
- ms_specs
list of MS Spectrum objects
- Type:
_tp.List[Spectrum]
- uv_specs
list of UV-Vis Spectrum objects
- Type:
_tp.List[Spectrum]
- gas_chromat
list of Chromatogram objects
- Type:
_tp.List[Chromatogram]
- ID: str | None
- name: str | None
- synonyms: List[str]
- formula: str | None
- mol_weight: float | None
- inchi: str | None
- inchi_key: str | None
- cas_rn: str | None
- mol_refs: Dict[str, str]
- data_refs: Dict[str, str]
- nist_public_refs: Dict[str, str]
- nist_subscription_refs: Dict[str, str]
- mol2D: str | None
- mol3D: str | None
- ir_specs: List[Spectrum]
- thz_specs: List[Spectrum]
- ms_specs: List[Spectrum]
- uv_specs: List[Spectrum]
- gas_chromat: List[Chromatogram]
- to_record() CompoundRecord
Return compound metadata as a structured record.
- Returns:
JSON-like compound metadata record.
- Return type:
CompoundRecord
- to_dict() dict
Return compound metadata as a JSON-friendly dictionary.
- Returns:
Structured compound metadata.
- Return type:
dict
- iter_records()
Yield structured records for metadata and already loaded data.
This method does not download additional WebBook pages. It only returns records for compound metadata and properties that have already been loaded into the object.
- Yields:
RecordBase – Compound, MOL file, spectrum, or chromatography record.
- to_records() List
Return records for metadata and already loaded data.
- Returns:
Structured records yielded by iter_records.
- Return type:
list
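The relationship between iter_records and to_records can be illustrated with a stand-in class. SketchCompound is hypothetical and uses plain dictionaries instead of record objects; the point is only that iteration covers metadata plus whatever has already been loaded, and that the list form materializes the same sequence.

```python
class SketchCompound:
    """Hypothetical stand-in showing lazy versus materialized records."""

    def __init__(self, metadata: dict, loaded_spectra=()):
        self.metadata = metadata
        # Only data that is already in memory; nothing is downloaded here.
        self.loaded_spectra = list(loaded_spectra)

    def iter_records(self):
        # Metadata record first, then one record per already-loaded spectrum.
        yield {"record_type": "compound", **self.metadata}
        for spectrum in self.loaded_spectra:
            yield {"record_type": "spectrum", **spectrum}

    def to_records(self) -> list:
        # The list form simply collects everything iter_records yields.
        return list(self.iter_records())
```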
- get_molfile(dim: int) str | None
Loads text block of 2D / 3D molfile
- Parameters:
dim (int) – dimensionality of molfile (2 or 3)
- Returns:
Downloaded MOL file text, or None if unavailable.
- Return type:
str | None
- get_mol2D() str | None
Loads text block of 2D molfile
- get_mol3D() str | None
Loads text block of 3D molfile
- get_molfiles() Dict[str, str | None]
Loads text block of all available molfiles
- get_spectrum(spec_type: str, spec_idx: str) Spectrum
Loads spectrum of given type (IR / TZ / MS / UV) and index
- Parameters:
spec_type (str) – spectrum type [ IR / TZ / MS / UV ]
spec_idx (str) – spectrum index
- Returns:
wrapper for the text block of JDX-formatted spectrum
- Return type:
Spectrum
- get_spectra(spec_type: str) List[Spectrum] | None
Loads all available spectra of given type (IR / TZ / MS / UV)
- Parameters:
spec_type (str) – spectrum type [ IR / TZ / MS / UV ]
- get_ir_spectra() List[Spectrum] | None
Loads all available IR spectra
- get_thz_spectra() List[Spectrum] | None
Loads all available THz spectra
- get_ms_spectra() List[Spectrum] | None
Loads all available MS spectra
- get_uv_spectra() List[Spectrum] | None
Loads all available UV-Vis spectra
- get_all_spectra() dict
Loads all available spectra
- save_spectra(spec_type: str, path_dir: str = './') None
Saves all spectra of given type to the specified folder
- Parameters:
spec_type (str) – spectrum type [ IR / TZ / MS / UV ]
path_dir (str) – directory to save spectra
- save_ir_spectra(path_dir: str = './') None
Saves IR spectra to the specified folder
- Parameters:
path_dir (str) – directory to save spectra
- save_thz_spectra(path_dir: str = './') None
Saves THz spectra to the specified folder
- Parameters:
path_dir (str) – directory to save spectra
- save_ms_spectra(path_dir: str = './') None
Saves mass spectra to the specified folder
- Parameters:
path_dir (str) – directory to save spectra
- save_uv_spectra(path_dir: str = './') None
Saves all UV-Vis spectra to the specified folder
- Parameters:
path_dir (str) – directory to save spectra
- save_all_spectra(path_dir: str = './') None
Saves all available spectra to the specified folder
- Parameters:
path_dir (str) – directory to save spectra
- get_gas_chromatography() List[Chromatogram] | None
Loads info on gas chromatography
- save_gas_chromatography(path_dir: str = './', **kwargs) None
Saves all tables with data on gas chromatography experiments
- Parameters:
path_dir (str) – directory to save tables
- nistchempy.compound.compound_from_response(nr: NistResponse, request_config: RequestConfig | None = None) NistCompound | None
Initializes NistCompound object from the corresponding response
- Parameters:
nr (_ncpr.NistResponse) – response to the GET request for a compound
request_config (_tp.Optional[_ncpr.RequestConfig]) – additional requests.get parameters
- Returns:
NistCompound object, or None if several compounds correspond to the given ID
- Return type:
_tp.Optional[NistCompound]
- nistchempy.compound.get_compound(ID: str, request_config: RequestConfig | None = None) NistCompound | None
Loads the main info on the given NIST compound
- Parameters:
ID (str) – NIST compound ID, CAS RN or InChI
request_config (_tp.Optional[_ncpr.RequestConfig]) – additional requests.get parameters
- Returns:
NistCompound object, or None if several compounds correspond to the given ID
- Return type:
_tp.Optional[NistCompound]
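Because both get_compound and the spectrum loaders may return None, callers usually need two guards. The following is a minimal sketch assuming NistChemPy is installed and the WebBook is reachable at call time; fetch_ir_spectra is a hypothetical helper name, and only the documented get_compound and get_ir_spectra calls are used.

```python
def fetch_ir_spectra(identifier: str) -> list:
    """Return the IR spectra for one compound, or an empty list."""
    # Deferred import: NistChemPy is only needed when the helper is called.
    from nistchempy.compound import get_compound

    compound = get_compound(identifier)
    if compound is None:
        # The identifier did not resolve to a single compound.
        return []
    spectra = compound.get_ir_spectra()
    # The loader is documented as returning a list or None.
    return spectra or []
```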
Structured records
Structured JSON-like records returned by NistChemPy.
- class nistchempy.records.CompoundRecord(record_type: str = 'compound', compound_id: str = '', source_url: str = '', retrieved_at: str = '', name: str = '', synonyms: List[str] = <factory>, formula: str | None = None, mol_weight: float | None = None, inchi: str | None = None, inchi_key: str | None = None, cas_rn: str | None = None, mol_refs: Dict[str, str]=<factory>, data_refs: Dict[str, str]=<factory>, nist_public_refs: Dict[str, str]=<factory>, nist_subscription_refs: Dict[str, str]=<factory>)
Bases:
RecordBase
Structured metadata record for a WebBook compound.
- Parameters:
compound_id – NIST Chemistry WebBook compound ID.
name – Preferred compound name.
synonyms – Alternative compound names.
formula – Molecular formula.
mol_weight – Molecular weight.
inchi – InChI string.
inchi_key – InChIKey string.
cas_rn – CAS Registry Number, when available.
mol_refs – Links to MOL file resources.
data_refs – Links to WebBook property sections.
nist_public_refs – Links to public NIST-related resources.
nist_subscription_refs – Links to subscription NIST-related resources.
source_url – URL of the source WebBook page.
retrieved_at – Optional retrieval timestamp.
- record_type: str = 'compound'
- name: str = ''
- synonyms: List[str]
- formula: str | None = None
- mol_weight: float | None = None
- inchi: str | None = None
- inchi_key: str | None = None
- cas_rn: str | None = None
- mol_refs: Dict[str, str]
- data_refs: Dict[str, str]
- nist_public_refs: Dict[str, str]
- nist_subscription_refs: Dict[str, str]
- to_dict() Dict[str, Any]
Return the compound record as a JSON-friendly dictionary.
- Returns:
Compound metadata record.
- Return type:
dict
- class nistchempy.records.MolfileRecord(record_type: str = 'molfile', compound_id: str = '', source_url: str = '', retrieved_at: str = '', dimension: int = 0, molfile: str = '')
Bases:
RecordBase
Structured record for a downloaded MOL file.
- Parameters:
compound_id – NIST Chemistry WebBook compound ID.
dimension – MOL file dimensionality, usually 2 or 3.
molfile – Raw MOL file text.
source_url – URL used to download the MOL file.
retrieved_at – Optional retrieval timestamp.
- record_type: str = 'molfile'
- dimension: int = 0
- molfile: str = ''
- to_dict() Dict[str, Any]
Return the MOL file record as a JSON-friendly dictionary.
- Returns:
MOL file record.
- Return type:
dict
- class nistchempy.records.SpectrumRecord(record_type: str = 'spectrum', compound_id: str = '', source_url: str = '', retrieved_at: str = '', spectrum_type: str = '', spectrum_index: str = '', jdx_text: str = '', parsed: Dict | None = None)
Bases:
RecordBase
Structured record for a WebBook spectrum.
NistChemPy currently preserves raw JCAMP-DX text and does not digitize it. A future release may add optional JCAMP parsing/digitization.
- Parameters:
compound_id – NIST Chemistry WebBook compound ID.
spectrum_type – Spectrum type abbreviation, such as IR or MS.
spectrum_index – WebBook spectrum index.
jdx_text – Raw JCAMP-DX text.
parsed – Optional parsed/digitized representation. Currently unused by core NistChemPy and normally empty.
source_url – URL used to download the spectrum.
retrieved_at – Optional retrieval timestamp.
- record_type: str = 'spectrum'
- spectrum_type: str = ''
- spectrum_index: str = ''
- jdx_text: str = ''
- parsed: Dict | None = None
- to_dict(include_raw: bool = True) Dict[str, Any]
Return the spectrum record as a JSON-friendly dictionary.
- Parameters:
include_raw – If True, include raw JCAMP-DX text in the output.
- Returns:
Spectrum record.
- Return type:
dict
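The effect of include_raw can be sketched with a small dataclass. SketchSpectrumRecord is a hypothetical stand-in with a reduced field set, and dropping the jdx_text key (rather than blanking it) when include_raw is False is an assumption made for this example.

```python
from dataclasses import asdict, dataclass


@dataclass
class SketchSpectrumRecord:
    """Hypothetical stand-in for SpectrumRecord with fewer fields."""

    compound_id: str = ""
    spectrum_type: str = ""
    spectrum_index: str = ""
    jdx_text: str = ""
    record_type: str = "spectrum"

    def to_dict(self, include_raw: bool = True) -> dict:
        data = asdict(self)
        if not include_raw:
            # Assumption: the raw JCAMP-DX payload is removed entirely.
            data.pop("jdx_text")
        return data
```

Omitting the raw text is mainly useful when writing compact indexes, since JCAMP-DX payloads tend to dominate the size of a spectrum record.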
- class nistchempy.records.ChromatogramRecord(record_type: str = 'gas_chromatography', compound_id: str = '', source_url: str = '', retrieved_at: str = '', ri_type: str = '', column_type: str = '', temp_regime: str = '', data: DataFrame = <factory>)
Bases:
RecordBase
Structured record for a gas chromatography table.
- Parameters:
compound_id – NIST Chemistry WebBook compound ID.
ri_type – Retention-index type.
column_type – Column type.
temp_regime – Temperature regime.
data – Chromatography table as a DataFrame.
source_url – URL used to download the table.
retrieved_at – Optional retrieval timestamp.
- record_type: str = 'gas_chromatography'
- ri_type: str = ''
- column_type: str = ''
- temp_regime: str = ''
- data: DataFrame
- to_dict(orient: str = 'records') Dict[str, Any]
Return the chromatogram record as a JSON-friendly dictionary.
- Parameters:
orient – DataFrame orientation passed to DataFrame.to_dict.
- Returns:
Gas chromatography record.
- Return type:
dict
- nistchempy.records.record_to_dict(record: Any, *, include_raw: bool = True, orient: str = 'records') Dict[str, Any]
Convert one structured record-like object to a dictionary.
- Parameters:
record – Record object, object exposing to_dict(), or mapping.
include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.
orient – DataFrame orientation used for chromatography records.
- Returns:
JSON-friendly record dictionary.
- Return type:
dict
- Raises:
TypeError – If the object cannot be converted to a dictionary.
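The accepted input kinds suggest a simple dispatch, sketched below. sketch_record_to_dict is a hypothetical function, not the library's code; the order of the checks is an assumption, and the include_raw and orient options are left out for brevity.

```python
from collections.abc import Mapping
from typing import Any, Dict


def sketch_record_to_dict(record: Any) -> Dict[str, Any]:
    """Convert a mapping or an object exposing to_dict() to a plain dict."""
    if isinstance(record, Mapping):
        # Copy so that callers can modify the result safely.
        return dict(record)
    to_dict = getattr(record, "to_dict", None)
    if callable(to_dict):
        return to_dict()
    raise TypeError(f"cannot convert {type(record).__name__} to a dictionary")
```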
- nistchempy.records.records_to_dicts(records: Iterable[Any], *, include_raw: bool = True, orient: str = 'records') List[Dict[str, Any]]
Convert structured records to dictionaries.
- Parameters:
records – Iterable of record objects, mappings, or objects exposing to_dict().
include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.
orient – DataFrame orientation used for chromatography records.
- Returns:
JSON-friendly record dictionaries.
- Return type:
list[dict]
- nistchempy.records.write_records_json(records: Iterable[Any], path: str | PathLike, *, include_raw: bool = True, orient: str = 'records', indent: int = 2, ensure_ascii: bool = False) None
Write structured records to a JSON array file.
- Parameters:
records – Iterable of record objects, mappings, or objects exposing to_dict().
path – Output JSON file path.
include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.
orient – DataFrame orientation used for chromatography records.
indent – JSON indentation level.
ensure_ascii – Passed to json.dump.
- nistchempy.records.write_records_jsonl(records: Iterable[Any], path: str | PathLike, *, include_raw: bool = True, orient: str = 'records', ensure_ascii: bool = False) None
Write structured records to a JSON Lines file.
- Parameters:
records – Iterable of record objects, mappings, or objects exposing to_dict().
path – Output JSON Lines file path.
include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.
orient – DataFrame orientation used for chromatography records.
ensure_ascii – Passed to json.dumps.
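The difference between the two output formats can be shown with the standard library alone. The functions below are hypothetical stand-ins that operate on already-converted dictionaries; they are a sketch of the JSON array and JSON Lines layouts, not the NistChemPy writers themselves.

```python
import json


def write_json_array(dicts, path, indent=2, ensure_ascii=False):
    """Write all dictionaries as one JSON array."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(list(dicts), fh, indent=indent, ensure_ascii=ensure_ascii)


def write_json_lines(dicts, path, ensure_ascii=False):
    """Write one compact JSON object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for item in dicts:
            # No indent here: each record must stay on a single line.
            fh.write(json.dumps(item, ensure_ascii=ensure_ascii) + "\n")
```

A JSON array must be parsed as a whole, whereas a JSON Lines file can be read and appended to one record at a time, which tends to suit large exports better.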
Structured compound metadata records.
- class nistchempy.records.compound.CompoundRecord(record_type: str = 'compound', compound_id: str = '', source_url: str = '', retrieved_at: str = '', name: str = '', synonyms: List[str] = <factory>, formula: str | None = None, mol_weight: float | None = None, inchi: str | None = None, inchi_key: str | None = None, cas_rn: str | None = None, mol_refs: Dict[str, str]=<factory>, data_refs: Dict[str, str]=<factory>, nist_public_refs: Dict[str, str]=<factory>, nist_subscription_refs: Dict[str, str]=<factory>)
Bases:
RecordBase
Structured metadata record for a WebBook compound.
- Parameters:
compound_id – NIST Chemistry WebBook compound ID.
name – Preferred compound name.
synonyms – Alternative compound names.
formula – Molecular formula.
mol_weight – Molecular weight.
inchi – InChI string.
inchi_key – InChIKey string.
cas_rn – CAS Registry Number, when available.
mol_refs – Links to MOL file resources.
data_refs – Links to WebBook property sections.
nist_public_refs – Links to public NIST-related resources.
nist_subscription_refs – Links to subscription NIST-related resources.
source_url – URL of the source WebBook page.
retrieved_at – Optional retrieval timestamp.
- record_type: str = 'compound'
- name: str = ''
- synonyms: List[str]
- formula: str | None = None
- mol_weight: float | None = None
- inchi: str | None = None
- inchi_key: str | None = None
- cas_rn: str | None = None
- mol_refs: Dict[str, str]
- data_refs: Dict[str, str]
- nist_public_refs: Dict[str, str]
- nist_subscription_refs: Dict[str, str]
- to_dict() Dict[str, Any]
Return the compound record as a JSON-friendly dictionary.
- Returns:
Compound metadata record.
- Return type:
dict
Structured MOL file records.
- class nistchempy.records.molfile.MolfileRecord(record_type: str = 'molfile', compound_id: str = '', source_url: str = '', retrieved_at: str = '', dimension: int = 0, molfile: str = '')
Bases:
RecordBase
Structured record for a downloaded MOL file.
- Parameters:
compound_id – NIST Chemistry WebBook compound ID.
dimension – MOL file dimensionality, usually 2 or 3.
molfile – Raw MOL file text.
source_url – URL used to download the MOL file.
retrieved_at – Optional retrieval timestamp.
- record_type: str = 'molfile'
- dimension: int = 0
- molfile: str = ''
- to_dict() Dict[str, Any]
Return the MOL file record as a JSON-friendly dictionary.
- Returns:
MOL file record.
- Return type:
dict
Structured spectrum records.
- class nistchempy.records.spectra.SpectrumRecord(record_type: str = 'spectrum', compound_id: str = '', source_url: str = '', retrieved_at: str = '', spectrum_type: str = '', spectrum_index: str = '', jdx_text: str = '', parsed: Dict | None = None)
Bases:
RecordBase
Structured record for a WebBook spectrum.
NistChemPy currently preserves raw JCAMP-DX text and does not digitize it. A future release may add optional JCAMP parsing/digitization.
- Parameters:
compound_id – NIST Chemistry WebBook compound ID.
spectrum_type – Spectrum type abbreviation, such as IR or MS.
spectrum_index – WebBook spectrum index.
jdx_text – Raw JCAMP-DX text.
parsed – Optional parsed/digitized representation. Currently unused by core NistChemPy and normally empty.
source_url – URL used to download the spectrum.
retrieved_at – Optional retrieval timestamp.
- record_type: str = 'spectrum'
- spectrum_type: str = ''
- spectrum_index: str = ''
- jdx_text: str = ''
- parsed: Dict | None = None
- to_dict(include_raw: bool = True) Dict[str, Any]
Return the spectrum record as a JSON-friendly dictionary.
- Parameters:
include_raw – If True, include raw JCAMP-DX text in the output.
- Returns:
Spectrum record.
- Return type:
dict
Structured gas chromatography records.
- class nistchempy.records.chromatography.ChromatogramRecord(record_type: str = 'gas_chromatography', compound_id: str = '', source_url: str = '', retrieved_at: str = '', ri_type: str = '', column_type: str = '', temp_regime: str = '', data: DataFrame = <factory>)
Bases:
RecordBase
Structured record for a gas chromatography table.
- Parameters:
compound_id – NIST Chemistry WebBook compound ID.
ri_type – Retention-index type.
column_type – Column type.
temp_regime – Temperature regime.
data – Chromatography table as a DataFrame.
source_url – URL used to download the table.
retrieved_at – Optional retrieval timestamp.
- record_type: str = 'gas_chromatography'
- ri_type: str = ''
- column_type: str = ''
- temp_regime: str = ''
- data: DataFrame
- to_dict(orient: str = 'records') Dict[str, Any]
Return the chromatogram record as a JSON-friendly dictionary.
- Parameters:
orient – DataFrame orientation passed to DataFrame.to_dict.
- Returns:
Gas chromatography record.
- Return type:
dict
Input/output helpers for structured NistChemPy records.
- nistchempy.records.io.record_to_dict(record: Any, *, include_raw: bool = True, orient: str = 'records') Dict[str, Any]
Convert one structured record-like object to a dictionary.
- Parameters:
record – Record object, object exposing to_dict(), or mapping.
include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.
orient – DataFrame orientation used for chromatography records.
- Returns:
JSON-friendly record dictionary.
- Return type:
dict
- Raises:
TypeError – If the object cannot be converted to a dictionary.
- nistchempy.records.io.records_to_dicts(records: Iterable[Any], *, include_raw: bool = True, orient: str = 'records') List[Dict[str, Any]]
Convert structured records to dictionaries.
- Parameters:
records – Iterable of record objects, mappings, or objects exposing to_dict().
include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.
orient – DataFrame orientation used for chromatography records.
- Returns:
JSON-friendly record dictionaries.
- Return type:
list[dict]
- nistchempy.records.io.write_records_json(records: Iterable[Any], path: str | PathLike, *, include_raw: bool = True, orient: str = 'records', indent: int = 2, ensure_ascii: bool = False) None
Write structured records to a JSON array file.
- Parameters:
records – Iterable of record objects, mappings, or objects exposing to_dict().
path – Output JSON file path.
include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.
orient – DataFrame orientation used for chromatography records.
indent – JSON indentation level.
ensure_ascii – Passed to json.dump.
- nistchempy.records.io.write_records_jsonl(records: Iterable[Any], path: str | PathLike, *, include_raw: bool = True, orient: str = 'records', ensure_ascii: bool = False) None
Write structured records to a JSON Lines file.
- Parameters:
records – Iterable of record objects, mappings, or objects exposing to_dict().
path – Output JSON Lines file path.
include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.
orient – DataFrame orientation used for chromatography records.
ensure_ascii – Passed to json.dumps.
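The two writers differ in file layout: a JSON array file holds every record in one document, while a JSON Lines file holds one JSON object per line. The sketch below illustrates the difference with the standard json module only. The function names are hypothetical, the inputs are assumed to be already-converted dictionaries, and it is not the library's own code.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List


def write_json_array(rows: Iterable[Dict[str, Any]], path: Path,
                     indent: int = 2, ensure_ascii: bool = False) -> None:
    """Write all rows as a single JSON array."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(list(rows), fh, indent=indent, ensure_ascii=ensure_ascii)


def write_json_lines(rows: Iterable[Dict[str, Any]], path: Path,
                     ensure_ascii: bool = False) -> None:
    """Write one compact JSON object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row, ensure_ascii=ensure_ascii) + "\n")


def read_json_lines(path: Path) -> List[Dict[str, Any]]:
    """Read a JSON Lines file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

JSON Lines can be appended to and read record by record, which tends to suit large exports. A JSON array is easier to load in one call.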
Local indexes
Public local WebBook index namespace.
Requests
Request wrappers for NIST Chemistry WebBook APIs
- nistchempy.requests.BASE_URL
base URL of the NIST Chemistry WebBook database
- Type:
str
- nistchempy.requests.SEARCH_URL
relative URL for the search API
- Type:
str
- nistchempy.requests.INCHI_URL
relative URL for obtaining NIST compounds via InChI
- Type:
str
- class nistchempy.requests.RequestConfig(delay: float = 0.0, max_attempts: int | None = 1, kwargs: dict = <factory>)
Bases:
object
Configuration for NIST Chemistry WebBook HTTP requests.
- delay
Time delay in seconds after receiving a response from NIST.
- Type:
float
- max_attempts
Number of request attempts. Values greater than one enable retrying after request errors or non-OK responses.
- Type:
int | None
- kwargs
Extra keyword arguments passed to the underlying requests function. The mapping is copied during initialization.
- Type:
dict
- delay: float = 0.0
- max_attempts: int | None = 1
- kwargs: dict
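To make the interplay of delay, max_attempts, and kwargs concrete, here is a minimal retry-loop sketch. RetryConfigSketch and fetch_with_retries are hypothetical names, the send callable stands in for the real HTTP call, and the meaning of max_attempts=None is not modeled because it is not described above. Treat it as an illustration of the documented fields, not as the library's request logic.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class RetryConfigSketch:
    """Hypothetical stand-in mirroring the documented fields."""
    delay: float = 0.0
    max_attempts: int = 1
    kwargs: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Copy the mapping so later changes by the caller do not leak in.
        self.kwargs = dict(self.kwargs)


def fetch_with_retries(send: Callable[..., Tuple[bool, str]],
                       config: RetryConfigSketch) -> Optional[str]:
    """Call send() up to max_attempts times; return the first OK payload."""
    for _ in range(max(1, config.max_attempts)):
        try:
            ok, payload = send(**config.kwargs)
        except OSError:
            # Treat a transport error like a failed attempt.
            ok, payload = False, ""
        # Pause after every attempt, successful or not.
        time.sleep(config.delay)
        if ok:
            return payload
    return None
```

With max_attempts=1 the loop runs once, so retrying only happens for values greater than one, as described for the max_attempts field.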
- nistchempy.requests.fix_html(html: str) str
Fixes known typos in the HTML code of NIST Chemistry WebBook pages
- Parameters:
html (str) – HTML text of the page
- Returns:
corrected HTML text
- Return type:
str
- class nistchempy.requests.NistResponse(response: Response)
Bases:
object
Describes response to the GET request to the NIST Chemistry WebBook
- response
request’s response
- Type:
_requests.models.Response
- ok
True if request’s status code is less than 400
- Type:
bool
- content_type
content type of the response
- Type:
_tp.Optional[str]
- text
text of the response
- Type:
_tp.Optional[str]
- soup
BeautifulSoup object of the html response
- Type:
_tp.Optional[_bs4.BeautifulSoup]
- response: Response
- ok: bool
- content_type: str | None
- text: str | None
- soup: BeautifulSoup | None = None
- nistchempy.requests.make_nist_request(url: str, params: dict | None = None, config: RequestConfig | None = None) NistResponse
GET request to the NIST Chemistry WebBook
- Parameters:
url (str) – URL of the NIST webpage
params (dict) – GET request parameters
config (_tp.Optional[RequestConfig]) – additional requests.get parameters
- Returns:
wrapper for the request’s response
- Return type:
NistResponse
- nistchempy.requests.make_nist_post_request(url: str, data: dict | None = None, json: dict | None = None, files: dict | None = None, config: RequestConfig | None = None) NistResponse
POST request to the NIST Chemistry WebBook
- Parameters:
url (str) – URL of the NIST webpage
data (dict) – POST data object to send in the body of the request
json (dict) – JSON serializable object to send in the body of the request
files (dict) – POST kwarg to send files in the body of the request
config (_tp.Optional[RequestConfig]) – additional requests.post parameters
- Returns:
wrapper for the request’s response
- Return type:
NistResponse
Parsing helpers
The parsing modules are useful for contributors and advanced users who need to inspect or adapt WebBook HTML parsing behavior.
Functionality to parse NIST Chemistry WebBook compound pages.
- nistchempy.parsing.compound.get_compound_header(soup: BeautifulSoup | None)
Return the main compound-page header, if present.
- Parameters:
soup – Parsed HTML page.
- Returns:
BeautifulSoup tag for the compound header, or None.
- nistchempy.parsing.compound.get_compound_info_list(soup: BeautifulSoup | None)
Return the main compound metadata list, if present.
- Parameters:
soup – Parsed HTML page.
- Returns:
BeautifulSoup tag for the main metadata list, or None.
- nistchempy.parsing.compound.get_found_compounds(soup: BeautifulSoup | None) dict
Extract IDs of found compounds for NIST Chemistry WebBook search.
- Parameters:
soup – Parsed search-result page.
- Returns:
Dictionary with IDs and lost entries.
- nistchempy.parsing.compound.is_compound_page(soup: BeautifulSoup | None) bool
Check whether an HTML page is a single compound page.
- Parameters:
soup – Parsed HTML page.
- Returns:
True when the page looks like a single compound page.
- nistchempy.parsing.compound.get_compound_id_from_comment(soup: BeautifulSoup | None) str | None
Extract compound ID from commented fields in the Notes section.
- Parameters:
soup – Parsed compound page.
- Returns:
NIST compound ID, or None.
- nistchempy.parsing.compound.get_compound_id_from_units_switch(soup: BeautifulSoup | None) str | None
Extract compound ID from the energy-units switch URL.
- Parameters:
soup – Parsed compound page.
- Returns:
NIST compound ID, or None.
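WebBook compound URLs carry the compound ID in the ID query parameter, for example cbook.cgi?ID=C71432&Units=SI. The sketch below shows how such an ID could be pulled from a URL string with the standard library. compound_id_from_url is a hypothetical helper; how NistChemPy locates the units-switch link in the page is not shown here.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def compound_id_from_url(url: str) -> Optional[str]:
    """Return the value of the ID query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("ID")
    return values[0] if values else None
```

Returning None for URLs without an ID parameter keeps the helper consistent with the "NIST compound ID, or None" contract of the extractors documented here.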
- nistchempy.parsing.compound.get_compound_id_from_data_refs(soup: BeautifulSoup | None) str | None
Extract compound ID from URLs to compound data sections.
- Parameters:
soup – Parsed compound page.
- Returns:
NIST compound ID, or None.
- nistchempy.parsing.compound.get_compound_id(soup: BeautifulSoup | None) str | None
Return the NIST compound ID for a single compound page.
- Parameters:
soup – Parsed compound page.
- Returns:
NIST compound ID, or None.
- nistchempy.parsing.compound.get_compound_name(soup: BeautifulSoup | None) str | None
Extract chemical name from a compound page.
- Parameters:
soup – Parsed compound page.
- Returns:
Chemical name, or None.
- nistchempy.parsing.compound.get_compound_synonyms(soup: BeautifulSoup | None) List[str]
Extract compound synonyms from a compound page.
- Parameters:
soup – Parsed compound page.
- Returns:
Alternative chemical names.
- nistchempy.parsing.compound.get_compound_formula(soup: BeautifulSoup | None) str | None
Extract chemical formula from a compound page.
- Parameters:
soup – Parsed compound page.
- Returns:
Chemical formula, or None.
- nistchempy.parsing.compound.get_compound_mol_weight(soup: BeautifulSoup | None) float | None
Extract molecular weight from a compound page.
- Parameters:
soup – Parsed compound page.
- Returns:
Molecular weight, or None.
- nistchempy.parsing.compound.get_compound_inchi(soup: BeautifulSoup | None) str | None
Extract InChI from a compound page.
- Parameters:
soup – Parsed compound page.
- Returns:
InChI string, or None.
- nistchempy.parsing.compound.get_compound_inchi_key(soup: BeautifulSoup | None) str | None
Extract InChIKey from a compound page.
- Parameters:
soup – Parsed compound page.
- Returns:
InChIKey string, or None.
- nistchempy.parsing.compound.get_compound_casrn(soup: BeautifulSoup | None) str | None
Extract CAS Registry Number from a compound page.
- Parameters:
soup – Parsed compound page.
- Returns:
CAS RN, or None.
- nistchempy.parsing.compound.get_compound_mol_refs(soup: BeautifulSoup | None) Dict[str, str]
Extract URLs for available MOL files from a compound page.
- Parameters:
soup – Parsed compound page.
- Returns:
Mapping from mol2D/mol3D keys to URLs.
- nistchempy.parsing.compound.get_compound_data_refs(soup: BeautifulSoup | None) Dict[str, str]
Extract URLs for available compound data sections.
- Parameters:
soup – Parsed compound page.
- Returns:
Mapping from WebBook section keys/names to URLs.
- nistchempy.parsing.compound.get_compound_nist_public_refs(soup: BeautifulSoup | None) Dict[str, str]
Extract URLs for compound data at other public NIST sites.
- Parameters:
soup – Parsed compound page.
- Returns:
Mapping from source names to URLs.
- nistchempy.parsing.compound.get_compound_nist_subscription_refs(soup: BeautifulSoup | None) Dict[str, str]
Extract URLs for compound data at subscription NIST sites.
- Parameters:
soup – Parsed compound page.
- Returns:
Mapping from source names to URLs.
- nistchempy.parsing.compound.parse_compound_page(soup: BeautifulSoup | None) dict | None
Parse a single compound page.
- Parameters:
soup – Parsed compound page.
- Returns:
Extracted compound information, or None if the page is not a single compound page.
The module contains functionality to parse gas chromatography info
- nistchempy.parsing.gas_chromatography.get_chromatography_table_refs(soup: BeautifulSoup | None) List[str]
Extracts references to large format tables containing info on chromatographic experiments
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
list of URLs
- Return type:
_tp.List[str]
- nistchempy.parsing.gas_chromatography.get_literature_references(soup: BeautifulSoup | None) Dict[str, str]
Extracts literature references from the corresponding section
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
ref’s span id => full reference text
- Return type:
_tp.Dict
- nistchempy.parsing.gas_chromatography.parse_chromatography_table(soup: BeautifulSoup) dict
Parses a large-format table containing info on chromatographic experiments
- Parameters:
soup (_bs4.BeautifulSoup) – bs4-parsed web-page
- Returns:
contains info to initialize nistchempy.compound.Chromatogram
- Return type:
dict
Utilities and exceptions
Utility functions
- nistchempy.utils.get_crawl_delay(useragent: str = '*', config: RequestConfig | None = None) float
Returns NIST Chemistry WebBook’s crawl delay for the given user agent
- Parameters:
useragent (str) – user agent
- Returns:
crawl delay in seconds
- Return type:
float
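A crawl delay is conventionally published through a Crawl-delay directive in a site's robots.txt. As a rough sketch of how such a value can be read, the example below parses robots.txt text with the standard urllib.robotparser module. crawl_delay_from_text, the sample content, and the 0.0 fallback are assumptions for illustration; the way get_crawl_delay fetches and interprets the file is not documented here.

```python
from urllib.robotparser import RobotFileParser


def crawl_delay_from_text(robots_txt: str, useragent: str = "*") -> float:
    """Return the Crawl-delay for useragent, or 0.0 when none is declared."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(useragent)
    return float(delay) if delay is not None else 0.0


# Made-up robots.txt content for illustration only.
SAMPLE = "User-agent: *\nCrawl-delay: 5\nDisallow: /private/\n"
```

A value obtained this way could be used as the delay of a RequestConfig so that consecutive requests respect the published rate.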
- nistchempy.utils.safe_filename(text: str, replacement: str = '_') str
Return a filesystem-friendly filename fragment.
- Parameters:
text – Input text to sanitize.
replacement – Replacement character for unsafe characters.
- Returns:
Sanitized filename. Empty results are returned as 'file'.
- Return type:
str
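The documented behavior (replace unsafe characters, fall back to 'file' when nothing is left) can be approximated as follows. This is a sketch under assumptions: safe_filename_sketch is a hypothetical name, and the set of characters treated as unsafe is a guess, since the actual rule is not stated above.

```python
import re


def safe_filename_sketch(text: str, replacement: str = "_") -> str:
    """Replace runs of unsafe characters; fall back to 'file' when empty."""
    # Assumption: anything outside letters, digits, '.', '_' and '-' is unsafe.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", lambda _m: replacement, text)
    # Trim replacement characters from both ends of the result.
    if replacement:
        cleaned = cleaned.strip(replacement)
    return cleaned or "file"
```

The replacement is passed through a function so that backslashes in it are not interpreted as regex escapes.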
NistChemPy exceptions.
- exception nistchempy.exceptions.NistChemPyError
Bases:
Exception
Base exception for NistChemPy errors.
- exception nistchempy.exceptions.NistChemPyIndexNotFoundError
Bases:
NistChemPyError
Raised when a user-local WebBook index is not available.
- exception nistchempy.exceptions.NistChemPyIndexError
Bases:
NistChemPyError
Raised when a user-local WebBook index is invalid.
- exception nistchempy.exceptions.NistChemPyIndexBuildError
Bases:
NistChemPyIndexError
Raised when a user-local WebBook index build fails.
- exception nistchempy.exceptions.NistChemPyDataTermsError
Bases:
NistChemPyIndexBuildError
Raised when local index creation lacks explicit acknowledgement.
- exception nistchempy.exceptions.NistChemPyOptionalDependencyError
Bases:
NistChemPyError
Raised when an optional dependency is required but unavailable.
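Because the exceptions form a hierarchy, the order of isinstance checks (or except clauses) matters: the most specific class has to be tested first. The sketch below re-declares the documented hierarchy as stand-in classes so that it runs without NistChemPy installed. The classify helper and its category labels are hypothetical.

```python
class NistChemPyError(Exception):
    """Stand-in for the documented base exception."""


class NistChemPyIndexNotFoundError(NistChemPyError):
    """Stand-in: a user-local index is not available."""


class NistChemPyIndexError(NistChemPyError):
    """Stand-in: a user-local index is invalid."""


class NistChemPyIndexBuildError(NistChemPyIndexError):
    """Stand-in: a user-local index build failed."""


class NistChemPyDataTermsError(NistChemPyIndexBuildError):
    """Stand-in: index creation lacked explicit acknowledgement."""


class NistChemPyOptionalDependencyError(NistChemPyError):
    """Stand-in: an optional dependency is missing."""


def classify(exc: Exception) -> str:
    """Map an exception to a coarse category, most specific class first."""
    if isinstance(exc, NistChemPyDataTermsError):
        return "data-terms"
    if isinstance(exc, NistChemPyIndexBuildError):
        return "index-build"
    if isinstance(exc, NistChemPyIndexError):
        return "index"
    if isinstance(exc, NistChemPyIndexNotFoundError):
        return "index-missing"
    if isinstance(exc, NistChemPyOptionalDependencyError):
        return "optional-dependency"
    return "nistchempy" if isinstance(exc, NistChemPyError) else "other"
```

Note that NistChemPyIndexNotFoundError derives from NistChemPyError directly, so a handler for NistChemPyIndexError does not catch it.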