HeavyEdge-Classify documentation#

HeavyEdge-Classify is a Python package for probabilistic classification of coating edge profiles.

Note

This package provides only the model architecture and command line interfaces. It does not include any pre-trained model or training data.

Usage#

HeavyEdge-Classify is designed to be used either as a command line program or as a Python module.

Command line#

The command line interface provides predefined subcommands for training and prediction. Invoke them with:

heavyedge classify-train <args>
heavyedge classify-predict <args>

Refer to the help message of each command for its arguments.
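The help messages can also be fetched from Python. The following is a minimal sketch, not part of the package: it assumes the subcommands accept the conventional --help flag, and the helper names help_command and show_help are hypothetical.

```python
import shutil
import subprocess


def help_command(subcommand):
    """Build the argument list that asks a heavyedge subcommand for its help text."""
    # Assumption: the CLI follows the common convention of a --help flag.
    return ["heavyedge", subcommand, "--help"]


def show_help(subcommand):
    """Return the help text, or None if the heavyedge executable is not on PATH."""
    if shutil.which("heavyedge") is None:
        return None
    result = subprocess.run(
        help_command(subcommand), capture_output=True, text=True, check=False
    )
    return result.stdout


for name in ("classify-train", "classify-predict"):
    text = show_help(name)
    if text is None:
        print("heavyedge not found; would run:", " ".join(help_command(name)))
    else:
        print(text)
```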

Python module#

The Python module heavyedge_classify provides functions and classes for use at Python runtime. Refer to the Runtime API section for the high-level interface.

Module reference#

This section provides the reference for the heavyedge_classify Python module.

Runtime API#

High-level Python runtime interface.

heavyedge_classify.api.classify_train(profiles, labels, cv=5, calibration='sigmoid', normalize=True, n_jobs=None, random_state=0, logger=<function <lambda>>, n_splits=None)

Train classification model.

Parameters:
profiles : heavyedge.ProfileData

Open h5 file of profiles.

labels : np.ndarray

Label array. The order of labels should match the order of profiles.

cv : int, iterable, or cross-validation generator, default=5

Cross-validation strategy. If an integer is passed, it is the number of folds for stratified k-fold CV.

calibration : {"sigmoid", "isotonic", "temperature", "sigmoid_ovo", "isotonic_ovo"}, default="sigmoid"

Calibration method for the classifier.

normalize : bool, default=True

Whether to normalize profiles by area under curve. Set this to False if profiles are already normalized.

n_jobs : int, default=None

Number of jobs to run in parallel.

random_state : int, default=0

Random seed for reproducibility.

logger : callable, optional

Logger function which accepts a progress message string.

n_splits : int, optional

Number of splits for cross-validation. If passed, overrides cv.

Deprecated since version 1.4.0: The n_splits parameter is deprecated and will be removed in a future version. Use cv instead.

Returns:
model

Trained model object.

Examples

>>> from heavyedge import ProfileData
>>> from heavyedge_classify.samples import get_sample_path
>>> from heavyedge_classify.api import classify_train
>>> import numpy as np
>>> profiles = ProfileData(get_sample_path("Profiles.h5"))
>>> labels = np.load(get_sample_path("labels.npy"))
>>> classify_train(profiles, labels)
CalibratedClassifierCV(...)
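
If profiles are normalized ahead of time, normalize=False should be passed. The following is a minimal sketch of area-under-curve normalization, not the package's own implementation: it assumes equal-length profiles stored as rows, uniform sample spacing, and the trapezoidal rule. The helper name normalize_by_area is hypothetical.

```python
import numpy as np


def normalize_by_area(Y, dx=1.0):
    """Scale each row of Y so that its trapezoidal area equals 1."""
    Y = np.asarray(Y, dtype=float)
    # Trapezoidal rule with uniform spacing dx, giving one area per profile (row).
    areas = dx * (Y[:, 1:] + Y[:, :-1]).sum(axis=1) / 2.0
    # Rows with zero area would divide by zero; this sketch does not handle them.
    return Y / areas[:, np.newaxis]


# Two toy profiles with the same shape but different heights.
profiles = np.array(
    [
        [0.0, 1.0, 2.0, 1.0, 0.0],
        [0.0, 2.0, 4.0, 2.0, 0.0],
    ]
)
normalized = normalize_by_area(profiles)
print(normalized)
```

After normalization the two toy profiles coincide, since they differ only by a scale factor.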

heavyedge_classify.api.classify_predict(model, profiles, normalize=True, batch_size=None, logger=<function <lambda>>)

Predict probabilistic labels of profiles using a trained model.

Parameters:
model

Trained model object.

profiles : heavyedge.ProfileData

Open h5 file of profiles.

normalize : bool, default=True

Whether to normalize profiles by area under curve. Set this to False if profiles are already normalized.

batch_size : int, optional

Number of profiles to load per batch. If not passed, all data are loaded at once.

logger : callable, optional

Logger function which accepts a progress message string.

Yields:
predicted_labels : np.ndarray

Predicted class probabilities for one batch of profiles, one row per profile and one column per class.

Examples

>>> import pickle
>>> from heavyedge import ProfileData
>>> from heavyedge_classify.samples import get_sample_path
>>> from heavyedge_classify.api import classify_predict
>>> with open(get_sample_path("model.pkl"), "rb") as f:
...     model = pickle.load(f)
>>> profiles = ProfileData(get_sample_path("Profiles.h5"))
>>> [lab.shape for lab in classify_predict(model, profiles, batch_size=50)]
[(50, 3), (25, 3)]
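
Because the function yields one array per batch, the results usually need to be collected. The following is a minimal sketch of concatenating batches and taking the most probable class. The generator fake_predict_batches is a hypothetical stand-in that only mimics the output shapes shown above; in practice the batches would come from classify_predict.

```python
import numpy as np


def fake_predict_batches(n_samples, n_classes, batch_size, seed=0):
    """Hypothetical stand-in: yield (batch, n_classes) arrays of row-normalized probabilities."""
    rng = np.random.default_rng(seed)
    for start in range(0, n_samples, batch_size):
        size = min(batch_size, n_samples - start)
        raw = rng.random((size, n_classes))
        # Normalize each row so that it sums to 1, like class probabilities.
        yield raw / raw.sum(axis=1, keepdims=True)


# Collect all batches into one (n_samples, n_classes) array.
batches = list(fake_predict_batches(n_samples=75, n_classes=3, batch_size=50))
probabilities = np.concatenate(batches, axis=0)

# Column index of the most probable class for each profile.
hard_labels = probabilities.argmax(axis=1)
print(probabilities.shape, hard_labels.shape)
```

Note that argmax returns column indices; mapping them back to the original label values depends on the trained model and is not shown here.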

Low-level API#

MiniRocket-based probabilistic classifier of 1D signals.

heavyedge_classify.model.minirocket_classifier(cv=5, calibration='sigmoid', n_jobs=None, verbose=False, random_state=0, n_splits=None)

MiniRocket-based probabilistic classifier of 1D signals.

Parameters:
cv : int, iterable, or cross-validation generator, default=5

Cross-validation strategy. If an integer is passed, it is the number of folds for stratified k-fold CV.

calibration : {"sigmoid", "isotonic", "temperature", "sigmoid_ovo", "isotonic_ovo"}, default="sigmoid"

Calibration method for the classifier.

n_jobs : int, default=None

Number of jobs to run in parallel.

verbose : bool, default=False

Whether to print pipeline steps.

random_state : int, default=0

Random seed for reproducibility.

n_splits : int, optional

Number of splits for cross-validation. If passed, overrides cv.

Deprecated since version 1.4.0: The n_splits parameter is deprecated and will be removed in a future version. Use cv instead.

Returns:
model

MiniRocket-based probabilistic classifier.

Examples

>>> from heavyedge import ProfileData
>>> from heavyedge_classify.samples import get_sample_path
>>> from heavyedge_classify.model import minirocket_classifier
>>> import numpy as np
>>> model = minirocket_classifier(cv=5, random_state=42)
>>> X, _, _ = ProfileData(get_sample_path("Profiles.h5"))[:]
>>> y = np.load(get_sample_path("labels.npy"))
>>> model.fit(X[:5], y[:5])
CalibratedClassifierCV(...)
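
The documented rule that n_splits, if passed, overrides cv can be illustrated as follows. This is a hypothetical sketch, not the package's code: the helper resolve_cv is invented for illustration, and whether the package emits a DeprecationWarning is an assumption.

```python
import warnings


def resolve_cv(cv=5, n_splits=None):
    """Return the effective cross-validation setting, letting n_splits override cv."""
    if n_splits is not None:
        # Assumption: the deprecated parameter triggers a DeprecationWarning.
        warnings.warn(
            "n_splits is deprecated; use cv instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return n_splits
    return cv


# Preferred call style: pass cv only.
current = resolve_cv(cv=5)

# Legacy call style: n_splits wins over cv and a warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = resolve_cv(cv=5, n_splits=3)

print(current, legacy, [str(w.message) for w in caught])
```

New code should pass cv directly, for example cv=3 instead of n_splits=3.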