HeavyEdge-Classify documentation

HeavyEdge-Classify is a Python package for probabilistic classification of coating edge profiles.

Note

This package provides only the model architecture and command line interfaces. It does not include any pre-trained model or training data.

Usage

HeavyEdge-Classify can be used either as a command line program or as a Python module.

Command line

The command line interface provides predefined subcommands for training and prediction. Invoke them with:

heavyedge classify-train <args>
heavyedge classify-predict <args>

Refer to the help message of each command for its arguments.

Python module

The Python module heavyedge_classify provides functions and classes for use at Python runtime. Refer to the Runtime API section for the high-level interface.

Module reference

This section provides the reference for the heavyedge_classify Python module.

Runtime API

High-level Python runtime interface.

heavyedge_classify.api.classify_train(profiles, labels, cv=5, calibration='sigmoid', normalize=True, n_jobs=None, random_state=0, logger=<function <lambda>>, n_splits=None)

Train classification model.

Parameters:

profiles (heavyedge.ProfileData): Open h5 file of profiles.
labels (np.ndarray): Label array. The order of labels should match the order of profiles.
cv (int, iterable, or cross-validation generator, default=5): Cross-validation strategy. If an integer is passed, it is the number of folds for stratified k-fold CV.
calibration ({"sigmoid", "isotonic", "temperature", "sigmoid_ovo", "isotonic_ovo"}, default="sigmoid"): Calibration method for the classifier.
normalize (bool, default=True): Whether to normalize profiles by area under curve. Set this to False if profiles are already normalized.
n_jobs (int, default=None): Number of jobs to run in parallel.
random_state (int, default=0): Random seed for reproducibility.
logger (callable, optional): Logger function which accepts a progress message string.
n_splits (int, optional): Number of splits for cross-validation. If passed, overrides cv.

Deprecated since version 1.4.0: The n_splits parameter is deprecated and will be removed in a future version. Use cv instead.

Returns:

model: Trained model object.

Examples

>>> from heavyedge import ProfileData
>>> from heavyedge_classify.samples import get_sample_path
>>> from heavyedge_classify.api import classify_train
>>> import numpy as np
>>> profiles = ProfileData(get_sample_path("Profiles.h5"))
>>> labels = np.load(get_sample_path("labels.npy"))
>>> classify_train(profiles, labels)
CalibratedClassifierCV(...)
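
The normalize option above refers to scaling each profile by its area under the curve. The following is a minimal sketch of that idea, not the package's actual implementation. It assumes uniformly spaced samples, profiles stored as rows of equal length, and a trapezoidal estimate of the area; the helper name normalize_by_area and the dx parameter are hypothetical.

```python
import numpy as np


def normalize_by_area(profiles, dx=1.0):
    """Scale each row so that its trapezoidal area under the curve is 1.

    Hypothetical helper for illustration; not part of heavyedge_classify.
    """
    profiles = np.asarray(profiles, dtype=float)
    # Trapezoidal rule along the last axis, assuming uniform spacing dx.
    areas = np.sum((profiles[..., 1:] + profiles[..., :-1]) / 2.0, axis=-1) * dx
    return profiles / areas[..., np.newaxis]


# Two profiles with the same shape but different heights.
Y = np.array([
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 2.0, 4.0, 2.0, 0.0],
])
Y_norm = normalize_by_area(Y)

# After normalization both rows coincide, so only the shape is left.
same_shape = np.allclose(Y_norm[0], Y_norm[1])
```

Under these assumptions, normalization removes the overall scale of a profile, which is why already normalized data should be passed with normalize=False to avoid scaling twice.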

heavyedge_classify.api.classify_predict(model, profiles, normalize=True, batch_size=None, logger=<function <lambda>>)

Predict probabilistic labels of profiles using a trained model.

Parameters:

model: Trained model object.
profiles (heavyedge.ProfileData): Open h5 file of profiles.
normalize (bool, default=True): Whether to normalize profiles by area under curve. Set this to False if profiles are already normalized.
batch_size (int, optional): Number of profiles to load per batch. If not passed, all data are loaded at once.
logger (callable, optional): Logger function which accepts a progress message string.

Yields:

predicted_labels (np.ndarray): Predicted probabilistic label array.

Examples

>>> import pickle
>>> from heavyedge import ProfileData
>>> from heavyedge_classify.samples import get_sample_path
>>> from heavyedge_classify.api import classify_predict
>>> with open(get_sample_path("model.pkl"), "rb") as f:
...     model = pickle.load(f)
>>> profiles = ProfileData(get_sample_path("Profiles.h5"))
>>> [lab.shape for lab in classify_predict(model, profiles, batch_size=50)]
[(50, 3), (25, 3)]
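
Because classify_predict yields one probability array per batch, callers usually need to join the batches before further processing. The sketch below shows one way to do that with NumPy. To stay runnable without the package or its sample data, it uses a stand-in generator, fake_predict_batches, which is hypothetical and only mimics the batch shapes shown above.

```python
import numpy as np


def fake_predict_batches():
    """Stand-in for classify_predict(...); yields (n_batch, n_classes) arrays."""
    rng = np.random.default_rng(0)
    for n_batch in (50, 25):
        raw = rng.random((n_batch, 3))
        # Make each row sum to 1 so it looks like class probabilities.
        yield raw / raw.sum(axis=1, keepdims=True)


# Join all batches into a single (n_profiles, n_classes) array.
probs = np.concatenate(list(fake_predict_batches()), axis=0)

# Column index of the most probable class for each profile.
hard_labels = probs.argmax(axis=1)
```

The values in hard_labels are column indices, not necessarily the original label values. For a scikit-learn estimator, the mapping from column to label is typically given by the classes_ attribute of the model.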

Low-level API

MiniRocket-based probabilistic classifier of 1D signals.

heavyedge_classify.model.minirocket_classifier(cv=5, calibration='sigmoid', n_jobs=None, verbose=False, random_state=0, n_splits=None)

MiniRocket-based probabilistic classifier of 1D signals.

Parameters:

cv (int, iterable, or cross-validation generator, default=5): Cross-validation strategy. If an integer is passed, it is the number of folds for stratified k-fold CV.
calibration ({"sigmoid", "isotonic", "temperature", "sigmoid_ovo", "isotonic_ovo"}, default="sigmoid"): Calibration method for the classifier.
n_jobs (int, default=None): Number of jobs to run in parallel.
verbose (bool, default=False): Whether to print pipeline steps.
random_state (int, default=0): Random seed for reproducibility.
n_splits (int, optional): Number of splits for cross-validation. If passed, overrides cv.

Deprecated since version 1.4.0: The n_splits parameter is deprecated and will be removed in a future version. Use cv instead.

Returns:

model: MiniRocket-based probabilistic classifier.

Examples

>>> from heavyedge import ProfileData
>>> from heavyedge_classify.samples import get_sample_path
>>> from heavyedge_classify.model import minirocket_classifier
>>> import numpy as np
>>> model = minirocket_classifier(cv=5, random_state=42)
>>> X, _, _ = ProfileData(get_sample_path("Profiles.h5"))[:]
>>> y = np.load(get_sample_path("labels.npy"))
>>> model.fit(X[:5], y[:5])
CalibratedClassifierCV(...)
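
The documented relationship between cv and the deprecated n_splits parameter (n_splits overrides cv when passed) follows a common deprecation pattern. The sketch below illustrates that pattern only; resolve_cv is a hypothetical helper and is not claimed to match how heavyedge_classify handles the parameter internally.

```python
import warnings


def resolve_cv(cv=5, n_splits=None):
    """Return the effective cross-validation setting.

    Hypothetical helper: a passed n_splits overrides cv and triggers
    a DeprecationWarning, mirroring the behaviour described in the docs.
    """
    if n_splits is not None:
        warnings.warn(
            "n_splits is deprecated; use cv instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return n_splits
    return cv


# Preferred call style: pass the number of folds through cv.
effective = resolve_cv(cv=10)
```

Code that still passes n_splits keeps working under this pattern, but moving to cv avoids the warning and the eventual removal.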