NistChemPy API

The public API is organized around four tasks: live WebBook search, compound page parsing, user-local index access, and low-level request/parsing helpers. Internal local-index implementation modules under nistchempy.indexing are not documented here as public API.

Top-level package

Unofficial tools for querying NIST Chemistry WebBook pages.

NistChemPy extracts selected molecular-property records from Chemistry WebBook pages for research workflows. It is not affiliated with, maintained by, or endorsed by NIST.

Compounds

The module contains compound-related functionality

nistchempy.compound.SPEC_TYPES

dictionary mapping the spectrum-type abbreviations used on compound pages (keys) to those used in URLs for downloading JDX files (values)

Type:

dict

class nistchempy.compound.Spectrum(compound: NistCompound, spec_type: str, spec_idx: str, jdx_text: str, source_url: str = '')

Bases: object

Wrapper for IR, THz, MS, and UV-Vis spectra extracted from the NIST Chemistry WebBook

compound

parent NistCompound object

Type:

NistCompound

spec_type

IR / TZ (THz IR) / MS / UV (UV-Vis)

Type:

str

spec_idx

index of the spectrum

Type:

str

jdx_text

text block of the corresponding JDX-file

Type:

str

compound: NistCompound
spec_type: str
spec_idx: str
jdx_text: str
source_url: str = ''
to_record() SpectrumRecord

Return this spectrum as a structured record.

Returns:

JSON-like spectrum record.

Return type:

SpectrumRecord

to_dict(include_raw: bool = True) dict

Return this spectrum as a JSON-friendly dictionary.

Parameters:

include_raw – If True, include raw JCAMP-DX text.

Returns:

Structured spectrum data.

Return type:

dict
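The effect of the include_raw flag can be sketched with a hypothetical stand-in class. This is not the real Spectrum implementation: the class name SpectrumStandIn and the dictionary key names are assumptions made for illustration, and only the jdx_text field and the include_raw parameter come from the documentation above.

```python
from dataclasses import dataclass


@dataclass
class SpectrumStandIn:
    """Hypothetical stand-in; not the real nistchempy Spectrum."""
    spec_type: str
    spec_idx: str
    jdx_text: str

    def to_dict(self, include_raw: bool = True) -> dict:
        # Key names are assumed for this sketch only.
        out = {"spectrum_type": self.spec_type, "spectrum_index": self.spec_idx}
        if include_raw:
            # Raw JCAMP-DX text can be large, so it is optional.
            out["jdx_text"] = self.jdx_text
        return out


spec = SpectrumStandIn("IR", "0", "##TITLE=example\n##END=")
full = spec.to_dict()
slim = spec.to_dict(include_raw=False)
```

Dropping the raw text is mainly useful when only the spectrum metadata is needed, for example when building a compact listing.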

save(name: str = None, path_dir: str = None) None

Saves spectrum in JDX format

name

custom filename (default name is formed from compound ID, spectrum type and index)

Type:

str

path_dir

directory where output file will be saved

Type:

str
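The default-name rule described above can be illustrated as follows. The helper names, the underscore separator, the .jdx extension, and the fallback to the current directory are all assumptions of this sketch; the documentation only states that the default name is formed from the compound ID, spectrum type and index.

```python
import os


def default_spectrum_filename(compound_id: str, spec_type: str, spec_idx: str) -> str:
    """Hypothetical helper: join ID, type and index into a JDX filename."""
    # Separator and extension are assumptions, not the library's documented format.
    return f"{compound_id}_{spec_type}_{spec_idx}.jdx"


def output_path(name, path_dir, compound_id, spec_type, spec_idx) -> str:
    # Fall back to the generated name and the current directory when unset.
    filename = name or default_spectrum_filename(compound_id, spec_type, spec_idx)
    return os.path.join(path_dir or ".", filename)


path = output_path(None, "spectra", "C71432", "IR", "2")
custom = output_path("custom.jdx", None, "C71432", "IR", "2")
```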

class nistchempy.compound.Chromatogram(compound: NistCompound, ri_type: str, column_type: str, temp_regime: str, data: DataFrame, source_url: str = '')

Bases: object

Wrapper for chromatography data extracted from the NIST Chemistry WebBook

compound

parent NistCompound object

Type:

NistCompound

ri_type

type of retention index: Kovats, van den Dool & Kratz, etc.

Type:

str

column_type

polar / non-polar

Type:

str

temp_regime

temperature regime: isothermal / ramp / custom

Type:

str

data

experimental data

Type:

_pd.core.frame.DataFrame

compound: NistCompound
ri_type: str
column_type: str
temp_regime: str
data: DataFrame
source_url: str = ''
to_record() ChromatogramRecord

Return this chromatogram as a structured record.

Returns:

JSON-like gas chromatography record.

Return type:

ChromatogramRecord

to_dict(orient: str = 'records') dict

Return this chromatogram as a JSON-friendly dictionary.

Parameters:

orient – DataFrame orientation passed to DataFrame.to_dict.

Returns:

Structured gas chromatography data.

Return type:

dict

save(name: str = None, path_dir: str = None, **kwargs) None

Saves chromatograms in CSV format

name

custom filename (default name is formed from compound ID, spectrum type and index)

Type:

str

path_dir

directory where output file will be saved

Type:

str

kwargs

parameters for pandas DataFrame to_csv method

class nistchempy.compound.NistCompound(_request_config: RequestConfig, _nist_response: NistResponse, ID: str | None, name: str | None, synonyms: List[str], formula: str | None, mol_weight: float | None, inchi: str | None, inchi_key: str | None, cas_rn: str | None, mol_refs: Dict[str, str], data_refs: Dict[str, str], nist_public_refs: Dict[str, str], nist_subscription_refs: Dict[str, str])

Bases: object

Stores info on NIST Chemistry WebBook compound

_request_config

additional requests.get parameters

Type:

_ncpr.RequestConfig

_nist_response

response to the GET request

Type:

_ncpr.NistResponse

ID

NIST compound ID

Type:

_tp.Optional[str]

name

chemical name

Type:

_tp.Optional[str]

synonyms

synonyms of the chemical name

Type:

_tp.List[str]

formula

chemical formula

Type:

_tp.Optional[str]

mol_weight

molecular weight, g/mol

Type:

_tp.Optional[float]

inchi

InChI string

Type:

_tp.Optional[str]

inchi_key

InChI key string

Type:

_tp.Optional[str]

cas_rn

CAS registry number

Type:

_tp.Optional[str]

mol_refs

references to 2D and 3D MOL-files

Type:

_tp.Dict[str, str]

data_refs

references to the webpages containing physical chemical data for the given compound

Type:

_tp.Dict[str, str]

nist_public_refs

references to webpages of other public NIST databases containing data for the given compound

Type:

_tp.Dict[str, str]

nist_subscription_refs

references to webpages of subscription NIST databases containing data for the given compound

Type:

_tp.Dict[str, str]

mol2D

text block of a MOL-file containing 2D atomic coordinates

Type:

_tp.Optional[str]

mol3D

text block of a MOL-file containing 3D atomic coordinates

Type:

_tp.Optional[str]

ir_specs

list of IR Spectrum objects

Type:

_tp.List[Spectrum]

thz_specs

list of THz Spectrum objects

Type:

_tp.List[Spectrum]

ms_specs

list of MS Spectrum objects

Type:

_tp.List[Spectrum]

uv_specs

list of UV-Vis Spectrum objects

Type:

_tp.List[Spectrum]

gas_chromat

list of Chromatogram objects

Type:

_tp.List[Chromatogram]

ID: str | None
name: str | None
synonyms: List[str]
formula: str | None
mol_weight: float | None
inchi: str | None
inchi_key: str | None
cas_rn: str | None
mol_refs: Dict[str, str]
data_refs: Dict[str, str]
nist_public_refs: Dict[str, str]
nist_subscription_refs: Dict[str, str]
mol2D: str | None
mol3D: str | None
ir_specs: List[Spectrum]
thz_specs: List[Spectrum]
ms_specs: List[Spectrum]
uv_specs: List[Spectrum]
gas_chromat: List[Chromatogram]
to_record() CompoundRecord

Return compound metadata as a structured record.

Returns:

JSON-like compound metadata record.

Return type:

CompoundRecord

to_dict() dict

Return compound metadata as a JSON-friendly dictionary.

Returns:

Structured compound metadata.

Return type:

dict

iter_records()

Yield structured records for metadata and already loaded data.

This method does not download additional WebBook pages. It only returns records for compound metadata and properties that have already been loaded into the object.

Yields:

RecordBase – Compound, MOL file, spectrum, or chromatography record.

to_records() List

Return records for metadata and already loaded data.

Returns:

Structured records yielded by iter_records.

Return type:

list
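The "already loaded data only" behavior of iter_records and to_records can be sketched with a hypothetical stand-in. The class CompoundStandIn and the record dictionaries below are illustrative assumptions; the real methods yield record objects rather than plain dictionaries.

```python
from typing import Iterator, List, Optional


class CompoundStandIn:
    """Hypothetical stand-in; not the real NistCompound."""

    def __init__(self, compound_id: str):
        self.compound_id = compound_id
        self.mol2D: Optional[str] = None  # filled only after an explicit load
        self.ir_specs: List[str] = []     # filled only after an explicit load

    def iter_records(self) -> Iterator[dict]:
        # Metadata is available as soon as the compound page is parsed.
        yield {"record_type": "compound", "compound_id": self.compound_id}
        # Loaded data is reported; nothing is fetched here.
        if self.mol2D is not None:
            yield {"record_type": "molfile", "dimension": 2}
        for idx, _ in enumerate(self.ir_specs):
            yield {"record_type": "spectrum", "spectrum_index": str(idx)}

    def to_records(self) -> List[dict]:
        return list(self.iter_records())


compound = CompoundStandIn("C71432")
before = compound.to_records()
compound.ir_specs.append("##TITLE=example")  # simulate a loaded spectrum
after = compound.to_records()
```

In this pattern, the caller decides which properties to download first, and the record methods stay free of network side effects.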

get_molfile(dim: int) str | None

Loads text block of 2D / 3D molfile

Parameters:

dim (int) – dimensionality of molfile (2 or 3)

Returns:

Downloaded MOL file text, or None if unavailable.

Return type:

str | None

get_mol2D() str | None

Loads text block of 2D molfile

get_mol3D() str | None

Loads text block of 3D molfile

get_molfiles() Dict[str, str | None]

Loads text block of all available molfiles

get_spectrum(spec_type: str, spec_idx: str) Spectrum

Loads spectrum of given type (IR / TZ / MS / UV) and index

Parameters:
  • spec_type (str) – spectrum type [ IR / TZ / MS / UV ]

  • spec_idx (str) – spectrum index

Returns:

wrapper for the text block of JDX-formatted spectrum

Return type:

Spectrum

get_spectra(spec_type: str) List[Spectrum] | None

Loads all available spectra of given type (IR / TZ / MS / UV)

Parameters:

spec_type (str) – spectrum type [ IR / TZ / MS / UV ]

get_ir_spectra() List[Spectrum] | None

Loads all available IR spectra

get_thz_spectra() List[Spectrum] | None

Loads all available THz spectra

get_ms_spectra() List[Spectrum] | None

Loads all available MS spectra

get_uv_spectra() List[Spectrum] | None

Loads all available UV-Vis spectra

get_all_spectra() dict

Loads all available spectra

save_spectra(spec_type: str, path_dir: str = './') None

Saves all spectra of given type to the specified folder

Parameters:
  • spec_type (str) – spectrum type [ IR / TZ / MS / UV ]

  • path_dir (str) – directory to save spectra

save_ir_spectra(path_dir: str = './') None

Saves IR spectra to the specified folder

Parameters:

path_dir (str) – directory to save spectra

save_thz_spectra(path_dir: str = './') None

Saves THz spectra to the specified folder

Parameters:

path_dir (str) – directory to save spectra

save_ms_spectra(path_dir: str = './') None

Saves mass spectra to the specified folder

Parameters:

path_dir (str) – directory to save spectra

save_uv_spectra(path_dir: str = './') None

Saves all UV-Vis spectra to the specified folder

Parameters:

path_dir (str) – directory to save spectra

save_all_spectra(path_dir: str = './') None

Saves all available spectra to the specified folder

Parameters:

path_dir (str) – directory to save spectra

get_gas_chromatography() List[Chromatogram] | None

Loads info on gas chromatography

save_gas_chromatography(path_dir: str = './', **kwargs) None

Saves all tables with data on gas chromatography experiments

Parameters:

path_dir (str) – directory to save tables

nistchempy.compound.compound_from_response(nr: NistResponse, request_config: RequestConfig | None = None) NistCompound | None

Initializes NistCompound object from the corresponding response

Parameters:
  • nr (_ncpr.NistResponse) – response to the GET request for a compound

  • request_config (_tp.Optional[_ncpr.RequestConfig]) – additional requests.get parameters

Returns:

NistCompound object, or None if several compounds correspond to the given ID

Return type:

_tp.Optional[NistCompound]

nistchempy.compound.get_compound(ID: str, request_config: RequestConfig | None = None) NistCompound | None

Loads the main info on the given NIST compound

Parameters:
  • ID (str) – NIST compound ID, CAS RN or InChI

  • request_config (_tp.Optional[_ncpr.RequestConfig]) – additional requests.get parameters

Returns:

NistCompound object, or None if several compounds correspond to the given ID

Return type:

_tp.Optional[NistCompound]
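Because get_compound may return None, callers usually need a guard before reading attributes. The sketch below shows one way to do that. The helper names describe and fetch_and_describe are hypothetical; the import path and the attribute names follow the signatures documented above. The network call is wrapped in a function and imported lazily, so it is not executed here; the offline part uses a stand-in object with illustrative values.

```python
from types import SimpleNamespace


def describe(compound) -> str:
    """Format a short summary from name, formula and mol_weight attributes."""
    if compound is None:
        # Documented result when several compounds correspond to the ID.
        return "no single compound found"
    return f"{compound.name} ({compound.formula}), M = {compound.mol_weight}"


def fetch_and_describe(compound_id: str) -> str:
    # Imported lazily so the sketch loads without the package or network access.
    from nistchempy.compound import get_compound
    return describe(get_compound(compound_id))


# Offline stand-in carrying the documented attribute names.
fake = SimpleNamespace(name="Benzene", formula="C6H6", mol_weight=78.11)
summary = describe(fake)
```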

Structured records

Structured JSON-like records returned by NistChemPy.

class nistchempy.records.CompoundRecord(record_type: str = 'compound', compound_id: str = '', source_url: str = '', retrieved_at: str = '', name: str = '', synonyms: List[str] = <factory>, formula: str | None = None, mol_weight: float | None = None, inchi: str | None = None, inchi_key: str | None = None, cas_rn: str | None = None, mol_refs: Dict[str, str]=<factory>, data_refs: Dict[str, str]=<factory>, nist_public_refs: Dict[str, str]=<factory>, nist_subscription_refs: Dict[str, str]=<factory>)

Bases: RecordBase

Structured metadata record for a WebBook compound.

Parameters:
  • compound_id – NIST Chemistry WebBook compound ID.

  • name – Preferred compound name.

  • synonyms – Alternative compound names.

  • formula – Molecular formula.

  • mol_weight – Molecular weight.

  • inchi – InChI string.

  • inchi_key – InChIKey string.

  • cas_rn – CAS Registry Number, when available.

  • mol_refs – Links to MOL file resources.

  • data_refs – Links to WebBook property sections.

  • nist_public_refs – Links to public NIST-related resources.

  • nist_subscription_refs – Links to subscription NIST-related resources.

  • source_url – URL of the source WebBook page.

  • retrieved_at – Optional retrieval timestamp.

record_type: str = 'compound'
name: str = ''
synonyms: List[str]
formula: str | None = None
mol_weight: float | None = None
inchi: str | None = None
inchi_key: str | None = None
cas_rn: str | None = None
mol_refs: Dict[str, str]
data_refs: Dict[str, str]
nist_public_refs: Dict[str, str]
nist_subscription_refs: Dict[str, str]
to_dict() Dict[str, Any]

Return the compound record as a JSON-friendly dictionary.

Returns:

Compound metadata record.

Return type:

dict

class nistchempy.records.MolfileRecord(record_type: str = 'molfile', compound_id: str = '', source_url: str = '', retrieved_at: str = '', dimension: int = 0, molfile: str = '')

Bases: RecordBase

Structured record for a downloaded MOL file.

Parameters:
  • compound_id – NIST Chemistry WebBook compound ID.

  • dimension – MOL file dimensionality, usually 2 or 3.

  • molfile – Raw MOL file text.

  • source_url – URL used to download the MOL file.

  • retrieved_at – Optional retrieval timestamp.

record_type: str = 'molfile'
dimension: int = 0
molfile: str = ''
to_dict() Dict[str, Any]

Return the MOL file record as a JSON-friendly dictionary.

Returns:

MOL file record.

Return type:

dict

class nistchempy.records.SpectrumRecord(record_type: str = 'spectrum', compound_id: str = '', source_url: str = '', retrieved_at: str = '', spectrum_type: str = '', spectrum_index: str = '', jdx_text: str = '', parsed: Dict | None = None)

Bases: RecordBase

Structured record for a WebBook spectrum.

NistChemPy currently preserves raw JCAMP-DX text and does not digitize it. A future release may add optional JCAMP parsing/digitization.

Parameters:
  • compound_id – NIST Chemistry WebBook compound ID.

  • spectrum_type – Spectrum type abbreviation, such as IR or MS.

  • spectrum_index – WebBook spectrum index.

  • jdx_text – Raw JCAMP-DX text.

  • parsed – Optional parsed/digitized representation. Currently unused by core NistChemPy and normally empty.

  • source_url – URL used to download the spectrum.

  • retrieved_at – Optional retrieval timestamp.

record_type: str = 'spectrum'
spectrum_type: str = ''
spectrum_index: str = ''
jdx_text: str = ''
parsed: Dict | None = None
to_dict(include_raw: bool = True) Dict[str, Any]

Return the spectrum record as a JSON-friendly dictionary.

Parameters:

include_raw – If True, include raw JCAMP-DX text in the output.

Returns:

Spectrum record.

Return type:

dict

class nistchempy.records.ChromatogramRecord(record_type: str = 'gas_chromatography', compound_id: str = '', source_url: str = '', retrieved_at: str = '', ri_type: str = '', column_type: str = '', temp_regime: str = '', data: DataFrame = <factory>)

Bases: RecordBase

Structured record for a gas chromatography table.

Parameters:
  • compound_id – NIST Chemistry WebBook compound ID.

  • ri_type – Retention-index type.

  • column_type – Column type.

  • temp_regime – Temperature regime.

  • data – Chromatography table as a DataFrame.

  • source_url – URL used to download the table.

  • retrieved_at – Optional retrieval timestamp.

record_type: str = 'gas_chromatography'
ri_type: str = ''
column_type: str = ''
temp_regime: str = ''
data: DataFrame
to_dict(orient: str = 'records') Dict[str, Any]

Return the chromatogram record as a JSON-friendly dictionary.

Parameters:

orient – DataFrame orientation passed to DataFrame.to_dict.

Returns:

Gas chromatography record.

Return type:

dict

nistchempy.records.record_to_dict(record: Any, *, include_raw: bool = True, orient: str = 'records') Dict[str, Any]

Convert one structured record-like object to a dictionary.

Parameters:
  • record – Record object, object exposing to_dict(), or mapping.

  • include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.

  • orient – DataFrame orientation used for chromatography records.

Returns:

JSON-friendly record dictionary.

Return type:

dict

Raises:

TypeError – If the object cannot be converted to a dictionary.
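The conversion rules listed above (mapping, object exposing to_dict(), otherwise TypeError) can be sketched as a small dispatch function. This is a hypothetical re-implementation for illustration, not the library's code; the order of the checks and the handling of include_raw and orient are not shown and may differ.

```python
from collections.abc import Mapping
from typing import Any, Dict


def record_to_dict_sketch(record: Any) -> Dict[str, Any]:
    """Hypothetical sketch of the documented conversion rules."""
    if isinstance(record, Mapping):
        return dict(record)  # mappings are copied into a plain dict
    to_dict = getattr(record, "to_dict", None)
    if callable(to_dict):
        return to_dict()     # objects that know how to serialize themselves
    raise TypeError(f"cannot convert {type(record).__name__} to a dictionary")


class WithToDict:
    def to_dict(self) -> Dict[str, Any]:
        return {"record_type": "compound"}


from_mapping = record_to_dict_sketch({"record_type": "molfile"})
from_object = record_to_dict_sketch(WithToDict())
```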

nistchempy.records.records_to_dicts(records: Iterable[Any], *, include_raw: bool = True, orient: str = 'records') List[Dict[str, Any]]

Convert structured records to dictionaries.

Parameters:
  • records – Iterable of record objects, mappings, or objects exposing to_dict().

  • include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.

  • orient – DataFrame orientation used for chromatography records.

Returns:

JSON-friendly record dictionaries.

Return type:

list[dict]

nistchempy.records.write_records_json(records: Iterable[Any], path: str | PathLike, *, include_raw: bool = True, orient: str = 'records', indent: int = 2, ensure_ascii: bool = False) None

Write structured records to a JSON array file.

Parameters:
  • records – Iterable of record objects, mappings, or objects exposing to_dict().

  • path – Output JSON file path.

  • include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.

  • orient – DataFrame orientation used for chromatography records.

  • indent – JSON indentation level.

  • ensure_ascii – Passed to json.dump.

nistchempy.records.write_records_jsonl(records: Iterable[Any], path: str | PathLike, *, include_raw: bool = True, orient: str = 'records', ensure_ascii: bool = False) None

Write structured records to a JSON Lines file.

Parameters:
  • records – Iterable of record objects, mappings, or objects exposing to_dict().

  • path – Output JSON Lines file path.

  • include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.

  • orient – DataFrame orientation used for chromatography records.

  • ensure_ascii – Passed to json.dumps.
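The difference between the JSON array and JSON Lines outputs can be shown with the standard json module alone. The record dictionaries and file names below are invented for this sketch, and the code does not call the library's writers; it only illustrates the two file layouts.

```python
import json
import os
import tempfile

records = [
    {"record_type": "compound", "name": "Benzene"},
    {"record_type": "spectrum", "spectrum_type": "IR"},
]

tmp_dir = tempfile.mkdtemp()
json_path = os.path.join(tmp_dir, "records.json")
jsonl_path = os.path.join(tmp_dir, "records.jsonl")

# JSON array: one document holding every record.
with open(json_path, "w", encoding="utf-8") as fh:
    json.dump(records, fh, indent=2, ensure_ascii=False)

# JSON Lines: one compact JSON object per line, convenient for streaming.
with open(jsonl_path, "w", encoding="utf-8") as fh:
    for rec in records:
        fh.write(json.dumps(rec, ensure_ascii=False) + "\n")

with open(json_path, encoding="utf-8") as fh:
    loaded_array = json.load(fh)
with open(jsonl_path, encoding="utf-8") as fh:
    loaded_lines = [json.loads(line) for line in fh if line.strip()]
```

JSON Lines tends to suit large exports, since each line can be parsed independently without loading the whole file.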

Structured compound metadata records.

class nistchempy.records.compound.CompoundRecord(record_type: str = 'compound', compound_id: str = '', source_url: str = '', retrieved_at: str = '', name: str = '', synonyms: List[str] = <factory>, formula: str | None = None, mol_weight: float | None = None, inchi: str | None = None, inchi_key: str | None = None, cas_rn: str | None = None, mol_refs: Dict[str, str]=<factory>, data_refs: Dict[str, str]=<factory>, nist_public_refs: Dict[str, str]=<factory>, nist_subscription_refs: Dict[str, str]=<factory>)

Bases: RecordBase

Structured metadata record for a WebBook compound.

Parameters:
  • compound_id – NIST Chemistry WebBook compound ID.

  • name – Preferred compound name.

  • synonyms – Alternative compound names.

  • formula – Molecular formula.

  • mol_weight – Molecular weight.

  • inchi – InChI string.

  • inchi_key – InChIKey string.

  • cas_rn – CAS Registry Number, when available.

  • mol_refs – Links to MOL file resources.

  • data_refs – Links to WebBook property sections.

  • nist_public_refs – Links to public NIST-related resources.

  • nist_subscription_refs – Links to subscription NIST-related resources.

  • source_url – URL of the source WebBook page.

  • retrieved_at – Optional retrieval timestamp.

record_type: str = 'compound'
name: str = ''
synonyms: List[str]
formula: str | None = None
mol_weight: float | None = None
inchi: str | None = None
inchi_key: str | None = None
cas_rn: str | None = None
mol_refs: Dict[str, str]
data_refs: Dict[str, str]
nist_public_refs: Dict[str, str]
nist_subscription_refs: Dict[str, str]
to_dict() Dict[str, Any]

Return the compound record as a JSON-friendly dictionary.

Returns:

Compound metadata record.

Return type:

dict

Structured MOL file records.

class nistchempy.records.molfile.MolfileRecord(record_type: str = 'molfile', compound_id: str = '', source_url: str = '', retrieved_at: str = '', dimension: int = 0, molfile: str = '')

Bases: RecordBase

Structured record for a downloaded MOL file.

Parameters:
  • compound_id – NIST Chemistry WebBook compound ID.

  • dimension – MOL file dimensionality, usually 2 or 3.

  • molfile – Raw MOL file text.

  • source_url – URL used to download the MOL file.

  • retrieved_at – Optional retrieval timestamp.

record_type: str = 'molfile'
dimension: int = 0
molfile: str = ''
to_dict() Dict[str, Any]

Return the MOL file record as a JSON-friendly dictionary.

Returns:

MOL file record.

Return type:

dict

Structured spectrum records.

class nistchempy.records.spectra.SpectrumRecord(record_type: str = 'spectrum', compound_id: str = '', source_url: str = '', retrieved_at: str = '', spectrum_type: str = '', spectrum_index: str = '', jdx_text: str = '', parsed: Dict | None = None)

Bases: RecordBase

Structured record for a WebBook spectrum.

NistChemPy currently preserves raw JCAMP-DX text and does not digitize it. A future release may add optional JCAMP parsing/digitization.

Parameters:
  • compound_id – NIST Chemistry WebBook compound ID.

  • spectrum_type – Spectrum type abbreviation, such as IR or MS.

  • spectrum_index – WebBook spectrum index.

  • jdx_text – Raw JCAMP-DX text.

  • parsed – Optional parsed/digitized representation. Currently unused by core NistChemPy and normally empty.

  • source_url – URL used to download the spectrum.

  • retrieved_at – Optional retrieval timestamp.

record_type: str = 'spectrum'
spectrum_type: str = ''
spectrum_index: str = ''
jdx_text: str = ''
parsed: Dict | None = None
to_dict(include_raw: bool = True) Dict[str, Any]

Return the spectrum record as a JSON-friendly dictionary.

Parameters:

include_raw – If True, include raw JCAMP-DX text in the output.

Returns:

Spectrum record.

Return type:

dict

Structured gas chromatography records.

class nistchempy.records.chromatography.ChromatogramRecord(record_type: str = 'gas_chromatography', compound_id: str = '', source_url: str = '', retrieved_at: str = '', ri_type: str = '', column_type: str = '', temp_regime: str = '', data: DataFrame = <factory>)

Bases: RecordBase

Structured record for a gas chromatography table.

Parameters:
  • compound_id – NIST Chemistry WebBook compound ID.

  • ri_type – Retention-index type.

  • column_type – Column type.

  • temp_regime – Temperature regime.

  • data – Chromatography table as a DataFrame.

  • source_url – URL used to download the table.

  • retrieved_at – Optional retrieval timestamp.

record_type: str = 'gas_chromatography'
ri_type: str = ''
column_type: str = ''
temp_regime: str = ''
data: DataFrame
to_dict(orient: str = 'records') Dict[str, Any]

Return the chromatogram record as a JSON-friendly dictionary.

Parameters:

orient – DataFrame orientation passed to DataFrame.to_dict.

Returns:

Gas chromatography record.

Return type:

dict

Input/output helpers for structured NistChemPy records.

nistchempy.records.io.record_to_dict(record: Any, *, include_raw: bool = True, orient: str = 'records') Dict[str, Any]

Convert one structured record-like object to a dictionary.

Parameters:
  • record – Record object, object exposing to_dict(), or mapping.

  • include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.

  • orient – DataFrame orientation used for chromatography records.

Returns:

JSON-friendly record dictionary.

Return type:

dict

Raises:

TypeError – If the object cannot be converted to a dictionary.

nistchempy.records.io.records_to_dicts(records: Iterable[Any], *, include_raw: bool = True, orient: str = 'records') List[Dict[str, Any]]

Convert structured records to dictionaries.

Parameters:
  • records – Iterable of record objects, mappings, or objects exposing to_dict().

  • include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.

  • orient – DataFrame orientation used for chromatography records.

Returns:

JSON-friendly record dictionaries.

Return type:

list[dict]

nistchempy.records.io.write_records_json(records: Iterable[Any], path: str | PathLike, *, include_raw: bool = True, orient: str = 'records', indent: int = 2, ensure_ascii: bool = False) None

Write structured records to a JSON array file.

Parameters:
  • records – Iterable of record objects, mappings, or objects exposing to_dict().

  • path – Output JSON file path.

  • include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.

  • orient – DataFrame orientation used for chromatography records.

  • indent – JSON indentation level.

  • ensure_ascii – Passed to json.dump.

nistchempy.records.io.write_records_jsonl(records: Iterable[Any], path: str | PathLike, *, include_raw: bool = True, orient: str = 'records', ensure_ascii: bool = False) None

Write structured records to a JSON Lines file.

Parameters:
  • records – Iterable of record objects, mappings, or objects exposing to_dict().

  • path – Output JSON Lines file path.

  • include_raw – If True, include raw payloads such as JCAMP-DX text for spectrum records.

  • orient – DataFrame orientation used for chromatography records.

  • ensure_ascii – Passed to json.dumps.

Local indexes

Public local WebBook index namespace.

Requests

Request wrappers for NIST Chemistry WebBook APIs

nistchempy.requests.BASE_URL

base URL of the NIST Chemistry WebBook database

Type:

str

nistchempy.requests.SEARCH_URL

relative URL for the search API

Type:

str

nistchempy.requests.INCHI_URL

relative URL for obtaining NIST compounds via InChI

Type:

str

class nistchempy.requests.RequestConfig(delay: float = 0.0, max_attempts: int | None = 1, kwargs: dict = <factory>)

Bases: object

Configuration for NIST Chemistry WebBook HTTP requests.

delay

Time delay in seconds after receiving a response from NIST.

Type:

float

max_attempts

Number of request attempts. Values greater than one enable retrying after request errors or non-OK responses.

Type:

int | None

kwargs

Extra keyword arguments passed to the underlying requests function. The mapping is copied during initialization.

Type:

dict

delay: float = 0.0
max_attempts: int | None = 1
kwargs: dict
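How delay and max_attempts might interact can be sketched with a generic retry loop. ConfigStandIn and fetch_with_retry are hypothetical names, the fetch callable stands in for an HTTP request, and treating max_attempts=None as a single attempt is an assumption of this sketch rather than documented behavior.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class ConfigStandIn:
    """Hypothetical stand-in mirroring the documented fields."""
    delay: float = 0.0
    max_attempts: Optional[int] = 1
    kwargs: dict = field(default_factory=dict)


def fetch_with_retry(fetch: Callable[..., bool], config: ConfigStandIn) -> Tuple[bool, int]:
    """Call fetch until it succeeds; return (success, attempts made)."""
    limit = config.max_attempts or 1  # assumption: None means one attempt
    ok, attempt = False, 0
    for attempt in range(1, limit + 1):
        try:
            ok = bool(fetch(**config.kwargs))
        except OSError:
            ok = False  # stand-in for a request error
        time.sleep(config.delay)  # pause after each response
        if ok:
            break
    return ok, attempt


outcomes = iter([False, False, True])
result = fetch_with_retry(lambda **kw: next(outcomes), ConfigStandIn(max_attempts=5))
```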
nistchempy.requests.fix_html(html: str) str

Fixes known typos in the HTML code of NIST Chemistry WebBook pages

Parameters:

html (str) – text of html-file

Returns:

fixed html-file

Return type:

str

class nistchempy.requests.NistResponse(response: Response)

Bases: object

Describes response to the GET request to the NIST Chemistry WebBook

response

request’s response

Type:

_requests.models.Response

ok

True if request’s status code is less than 400

Type:

bool

content_type

content type of the response

Type:

_tp.Optional[str]

text

text of the response

Type:

_tp.Optional[str]

soup

BeautifulSoup object of the html response

Type:

_tp.Optional[_bs4.BeautifulSoup]

response: Response
ok: bool
content_type: str | None
text: str | None
soup: BeautifulSoup | None = None
nistchempy.requests.make_nist_request(url: str, params: dict | None = None, config: RequestConfig | None = None) NistResponse

GET request to the NIST Chemistry WebBook

Parameters:
  • url (str) – URL of the NIST webpage

  • params (dict) – GET request parameters

  • config (_tp.Optional[RequestConfig]) – additional requests.get parameters

Returns:

wrapper for the request’s response

Return type:

NistResponse

nistchempy.requests.make_nist_post_request(url: str, data: dict | None = None, json: dict | None = None, files: dict | None = None, config: RequestConfig | None = None) NistResponse

POST request to the NIST Chemistry WebBook

Parameters:
  • url (str) – URL of the NIST webpage

  • data (dict) – POST data object to send in the body of the request

  • json (dict) – JSON serializable object to send in the body of the request

  • files (dict) – keyword argument for sending files in the body of the request

  • config (_tp.Optional[RequestConfig]) – additional requests.post parameters

Returns:

wrapper for the request’s response

Return type:

NistResponse

Parsing helpers

The parsing modules are useful for contributors and advanced users who need to inspect or adapt WebBook HTML parsing behavior.

Functionality to parse NIST Chemistry WebBook compound pages.

nistchempy.parsing.compound.get_compound_header(soup: BeautifulSoup | None)

Return the main compound-page header, if present.

Parameters:

soup – Parsed HTML page.

Returns:

BeautifulSoup tag for the compound header, or None.

nistchempy.parsing.compound.get_compound_info_list(soup: BeautifulSoup | None)

Return the main compound metadata list, if present.

Parameters:

soup – Parsed HTML page.

Returns:

BeautifulSoup tag for the main metadata list, or None.

nistchempy.parsing.compound.get_found_compounds(soup: BeautifulSoup | None) dict

Extract IDs of found compounds for NIST Chemistry WebBook search.

Parameters:

soup – Parsed search-result page.

Returns:

Dictionary with IDs and lost entries.

nistchempy.parsing.compound.is_compound_page(soup: BeautifulSoup | None) bool

Check whether an HTML page is a single compound page.

Parameters:

soup – Parsed HTML page.

Returns:

True when the page looks like a single compound page.

nistchempy.parsing.compound.get_compound_id_from_comment(soup: BeautifulSoup | None) str | None

Extract compound ID from commented fields in the Notes section.

Parameters:

soup – Parsed compound page.

Returns:

NIST compound ID, or None.

nistchempy.parsing.compound.get_compound_id_from_units_switch(soup: BeautifulSoup | None) str | None

Extract compound ID from the energy-units switch URL.

Parameters:

soup – Parsed compound page.

Returns:

NIST compound ID, or None.

nistchempy.parsing.compound.get_compound_id_from_data_refs(soup: BeautifulSoup | None) str | None

Extract compound ID from URLs to compound data sections.

Parameters:

soup – Parsed compound page.

Returns:

NIST compound ID, or None.

nistchempy.parsing.compound.get_compound_id(soup: BeautifulSoup | None) str | None

Return the NIST compound ID for a single compound page.

Parameters:

soup – Parsed compound page.

Returns:

NIST compound ID, or None.

nistchempy.parsing.compound.get_compound_name(soup: BeautifulSoup | None) str | None

Extract chemical name from a compound page.

Parameters:

soup – Parsed compound page.

Returns:

Chemical name, or None.

nistchempy.parsing.compound.get_compound_synonyms(soup: BeautifulSoup | None) List[str]

Extract compound synonyms from a compound page.

Parameters:

soup – Parsed compound page.

Returns:

Alternative chemical names.

nistchempy.parsing.compound.get_compound_formula(soup: BeautifulSoup | None) str | None

Extract chemical formula from a compound page.

Parameters:

soup – Parsed compound page.

Returns:

Chemical formula, or None.

nistchempy.parsing.compound.get_compound_mol_weight(soup: BeautifulSoup | None) float | None

Extract molecular weight from a compound page.

Parameters:

soup – Parsed compound page.

Returns:

Molecular weight, or None.

nistchempy.parsing.compound.get_compound_inchi(soup: BeautifulSoup | None) str | None

Extract InChI from a compound page.

Parameters:

soup – Parsed compound page.

Returns:

InChI string, or None.

nistchempy.parsing.compound.get_compound_inchi_key(soup: BeautifulSoup | None) str | None

Extract InChIKey from a compound page.

Parameters:

soup – Parsed compound page.

Returns:

InChIKey string, or None.

nistchempy.parsing.compound.get_compound_casrn(soup: BeautifulSoup | None) str | None

Extract CAS Registry Number from a compound page.

Parameters:

soup – Parsed compound page.

Returns:

CAS RN, or None.

nistchempy.parsing.compound.get_compound_mol_refs(soup: BeautifulSoup | None) Dict[str, str]

Extract URLs for available MOL files from a compound page.

Parameters:

soup – Parsed compound page.

Returns:

Mapping from mol2D / mol3D keys to URLs.

nistchempy.parsing.compound.get_compound_data_refs(soup: BeautifulSoup | None) Dict[str, str]

Extract URLs for available compound data sections.

Parameters:

soup – Parsed compound page.

Returns:

Mapping from WebBook section keys/names to URLs.

nistchempy.parsing.compound.get_compound_nist_public_refs(soup: BeautifulSoup | None) Dict[str, str]

Extract URLs for compound data at other public NIST sites.

Parameters:

soup – Parsed compound page.

Returns:

Mapping from source names to URLs.

nistchempy.parsing.compound.get_compound_nist_subscription_refs(soup: BeautifulSoup | None) Dict[str, str]

Extract URLs for compound data at subscription NIST sites.

Parameters:

soup – Parsed compound page.

Returns:

Mapping from source names to URLs.

nistchempy.parsing.compound.parse_compound_page(soup: BeautifulSoup | None) dict | None

Parse a single compound page.

Parameters:

soup – Parsed compound page.

Returns:

Extracted compound information, or None if the page is not a single compound page.

The module contains functionality to parse gas chromatography info

nistchempy.parsing.gas_chromatography.get_chromatography_table_refs(soup: BeautifulSoup | None) List[str]

Extracts references to large format tables containing info on chromatographic experiments

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

list of URLs

Return type:

_tp.List[str]

nistchempy.parsing.gas_chromatography.get_literature_references(soup: BeautifulSoup | None) Dict[str, str]

Extracts literature references from the corresponding section

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

ref’s span id => full reference text

Return type:

_tp.Dict

nistchempy.parsing.gas_chromatography.parse_chromatography_table(soup: BeautifulSoup) dict

Parses a large format table containing info on chromatographic experiments

Parameters:

soup (_bs4.BeautifulSoup) – bs4-parsed web-page

Returns:

contains info to initialize nistchempy.compound.Chromatogram

Return type:

dict

Utilities and exceptions

Utility functions

nistchempy.utils.get_crawl_delay(useragent: str = '*', config: RequestConfig | None = None) float

Returns NIST Chemistry WebBook’s crawl delay for the given user agent

Parameters:

useragent (str) – user agent

Returns:

crawl delay in seconds

Return type:

float
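A crawl delay is conventionally published in a site's robots.txt file. The sketch below reads one with the standard library's urllib.robotparser. Whether get_crawl_delay works this way internally is an assumption, and the robots.txt content is invented for the example rather than copied from the WebBook.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content invented for this sketch; not the WebBook's actual file.
sample_robots = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(sample_robots.splitlines())

# crawl_delay returns None when no delay applies to the user agent.
delay = parser.crawl_delay("*")
delay_seconds = float(delay) if delay is not None else 0.0
```

A value obtained this way could then be passed as the delay of a RequestConfig when many pages are requested in sequence.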

nistchempy.utils.safe_filename(text: str, replacement: str = '_') str

Return a filesystem-friendly filename fragment.

Parameters:
  • text – Input text to sanitize.

  • replacement – Replacement character for unsafe characters.

Returns:

Sanitized filename. Empty results are returned as 'file'.

Return type:

str
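One possible way to sanitize a filename fragment is sketched below. The function safe_filename_sketch is hypothetical, and the set of characters treated as safe is an assumption; only the replacement parameter and the 'file' fallback for empty results come from the documentation above.

```python
import re


def safe_filename_sketch(text: str, replacement: str = "_") -> str:
    """Hypothetical sanitizer; the library's exact character rules may differ."""
    # Keep letters, digits, dot, dash and underscore; replace every other run.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", replacement, text)
    # Trim leftover replacement characters from both ends.
    if replacement:
        cleaned = cleaned.strip(replacement)
    return cleaned or "file"


example = safe_filename_sketch("1,2-dichloro ethane / IR")
fallback = safe_filename_sketch("///")
```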

NistChemPy exceptions.

exception nistchempy.exceptions.NistChemPyError

Bases: Exception

Base exception for NistChemPy errors.

exception nistchempy.exceptions.NistChemPyIndexNotFoundError

Bases: NistChemPyError

Raised when a user-local WebBook index is not available.

exception nistchempy.exceptions.NistChemPyIndexError

Bases: NistChemPyError

Raised when a user-local WebBook index is invalid.

exception nistchempy.exceptions.NistChemPyIndexBuildError

Bases: NistChemPyIndexError

Raised when a user-local WebBook index build fails.

exception nistchempy.exceptions.NistChemPyDataTermsError

Bases: NistChemPyIndexBuildError

Raised when local index creation lacks explicit acknowledgement.

exception nistchempy.exceptions.NistChemPyOptionalDependencyError

Bases: NistChemPyError

Raised when an optional dependency is required but unavailable.
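The inheritance chain above means that a handler for a base class also catches its subclasses, so the most specific class should be checked first. The sketch below uses local stand-in classes that mirror part of the documented hierarchy instead of importing the real ones; the classify helper and its labels are hypothetical.

```python
class NistChemPyError(Exception):
    """Local stand-in for the documented base exception."""


class NistChemPyIndexError(NistChemPyError):
    """Local stand-in: invalid user-local index."""


class NistChemPyIndexBuildError(NistChemPyIndexError):
    """Local stand-in: index build failure."""


class NistChemPyDataTermsError(NistChemPyIndexBuildError):
    """Local stand-in: missing explicit acknowledgement."""


def classify(exc: Exception) -> str:
    # Check the most specific class first, then fall back to broader ones.
    if isinstance(exc, NistChemPyDataTermsError):
        return "acknowledgement missing"
    if isinstance(exc, NistChemPyIndexError):
        return "index problem"
    if isinstance(exc, NistChemPyError):
        return "other NistChemPy error"
    return "unrelated error"


labels = [
    classify(e)
    for e in (NistChemPyDataTermsError(), NistChemPyIndexBuildError(), ValueError())
]
```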