mlreflect.training package

Submodules

mlreflect.training.footprint module

class mlreflect.training.footprint.FootprintRescaler(reflectivity: numpy.ndarray, true_ratio: float, errors: list, q: Optional[numpy.ndarray] = None, wavelength: Optional[float] = None, theta: Optional[numpy.ndarray] = None)[source]

Bases: object

Returns reflectivity curves with a rescaled (“incorrect”) footprint for standard (specular) XRR geometries.

Parameters

reflectivity – Reflected intensity values of one curve (1D) or several curves (2D).
true_ratio – Assumed “true” ratio of beam width to sample length used to produce the fictional footprint on the curves. E.g. for a beam width of 200 microns and a sample length of 10 mm, true_ratio = 0.02.
errors – List of error factors that the true_ratio can be multiplied by to produce the new (“incorrect”) footprint corrections.
q – ndarray of q-values corresponding to the provided intensity matrix. If this value is provided, the wavelength has to be specified as well.
wavelength – Fictional wavelength in Angstroms that is used to calculate the scattering angle.
theta – ndarray of angle values corresponding to the provided intensity matrix. If this is provided, q and wavelength do not need to be provided.
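
The relation between q, wavelength and angle described in the parameters above can be sketched as follows. This is a minimal, dependency-free illustration and not the library's implementation: it assumes the standard specular relation q = 4π·sin(θ)/λ with θ as the incidence angle in degrees, and the helper names q_to_theta_deg and theta_deg_to_q are hypothetical. The angle convention and units actually used by angle_to_q and q_to_angle are not stated in this reference.

```python
import math


def q_to_theta_deg(q, wavelength):
    """Incidence angle in degrees for momentum transfer q (1/Angstrom).

    Assumes q = 4*pi*sin(theta)/wavelength with wavelength in Angstrom.
    """
    return math.degrees(math.asin(q * wavelength / (4 * math.pi)))


def theta_deg_to_q(theta_deg, wavelength):
    """Inverse of q_to_theta_deg: incidence angle in degrees -> q in 1/Angstrom."""
    return 4 * math.pi * math.sin(math.radians(theta_deg)) / wavelength


# Example from the true_ratio description: 200 micron beam, 10 mm sample.
beam_width_mm = 0.2
sample_length_mm = 10.0
true_ratio = beam_width_mm / sample_length_mm

wavelength = 1.0  # Angstrom; arbitrary illustrative value
q_values = [0.01, 0.05, 0.1, 0.2]
theta_values = [q_to_theta_deg(q, wavelength) for q in q_values]
q_round_trip = [theta_deg_to_q(t, wavelength) for t in theta_values]
```

Because the footprint depends on the angle rather than on q, a wavelength is needed whenever only q is given, which matches the requirement stated for the q parameter.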

static angle_to_q(scattering_angle, wavelength)[source]

static apply_footprint(intensity: numpy.ndarray, scattering_angle: numpy.ndarray, ratio: float) → numpy.ndarray[source]

static correct_footprint(intensity: numpy.ndarray, scattering_angle: numpy.ndarray, ratio: float) → numpy.ndarray[source]

static normalize_to_first(rescaled_intensity: numpy.ndarray, original_intensity: numpy.ndarray)[source]

static normalize_to_max(rescaled_intensity: numpy.ndarray, original_intensity: numpy.ndarray)[source]

static q_to_angle(q, wavelength)[source]

property rescaled_reflectivity
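
The idea behind apply_footprint and correct_footprint can be illustrated with a simple geometric model. The sketch below is an assumption, not the library's code: it takes the illuminated fraction of the beam to be min(1, sin(θ) / ratio), where ratio is beam width divided by sample length, and all function names are hypothetical. Applying the footprint with the true ratio and then correcting with a wrong ratio (true_ratio times an error factor) yields a curve with an “incorrect” footprint correction.

```python
import math


def illuminated_fraction(theta_deg, ratio):
    """Fraction of the beam that hits the sample at incidence angle theta.

    ratio is beam width / sample length. theta_deg must be > 0, otherwise
    the fraction is zero and the correction below would divide by zero.
    """
    return min(1.0, math.sin(math.radians(theta_deg)) / ratio)


def apply_footprint_sketch(intensity, theta_deg, ratio):
    """Reduce the intensity by the illuminated fraction at each angle."""
    return [i * illuminated_fraction(t, ratio) for i, t in zip(intensity, theta_deg)]


def correct_footprint_sketch(intensity, theta_deg, ratio):
    """Undo the footprint by dividing by the illuminated fraction."""
    return [i / illuminated_fraction(t, ratio) for i, t in zip(intensity, theta_deg)]


def rescale_footprint(intensity, theta_deg, true_ratio, error):
    """Apply the true footprint, then correct with a wrong ratio."""
    measured = apply_footprint_sketch(intensity, theta_deg, true_ratio)
    return correct_footprint_sketch(measured, theta_deg, true_ratio * error)


angles = [0.5, 2.0]  # degrees; one below and one above full illumination
flat_curve = [1.0, 1.0]
rescaled = rescale_footprint(flat_curve, angles, true_ratio=0.02, error=1.2)
unchanged = rescale_footprint(flat_curve, angles, true_ratio=0.02, error=1.0)
```

In this model, points at angles where the sample is under-illuminated for both ratios are scaled by the error factor, while points at larger angles stay unchanged. A normalization step such as normalize_to_first or normalize_to_max would then bring the rescaled curve back to the scale of the original.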

mlreflect.training.noise_generator module

class mlreflect.training.noise_generator.BaseGenerator(reflectivity, labels, batch_functions, batch_size=32, shuffle=True)[source]

Bases: tensorflow.python.keras.utils.data_utils.Sequence

on_epoch_end()[source]: Method called at the end of every epoch.

class mlreflect.training.noise_generator.NoiseGenerator(reflectivity: numpy.ndarray, labels: numpy.ndarray, input_preprocessor: mlreflect.training.preprocessing.InputPreprocessor, batch_size=32, shuffle=True, mode='single', noise_range=None, background_range=None, relative_background_spread: float = 0.1)[source]

Bases: mlreflect.training.noise_generator.BaseGenerator

Generator object that returns a standardized batch of reflectivity and labels with random noise and background.

Parameters

reflectivity – Training reflectivity curves
labels – Training labels in the same order as reflectivity
input_preprocessor – InputPreprocessor object with or without stored standardization values
batch_size – Number of samples per mini batch
shuffle – If True, shuffles reflectivity and labels after every epoch
noise_range – Tuple (min, max) between which the shot noise levels are randomly generated
background_range – Tuple (min, max) between which the background levels are randomly generated
mode – ‘single’: random noise and background levels are generated for every curve of a mini batch; ‘batch’: random noise and background levels are generated once for each mini batch
relative_background_spread – Relative standard deviation of the normal distribution (e.g. a value of 0.1 means the standard deviation is 10% of the mean)
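
The difference between the two mode values and the meaning of relative_background_spread can be sketched as follows. This is only an illustration under stated assumptions: the levels are drawn uniformly from the given range (the library may sample differently, e.g. on a logarithmic scale), and the helpers draw_levels and draw_background are hypothetical names that do not exist in mlreflect.

```python
import random


def draw_levels(n_curves, value_range, mode, rng):
    """Draw one level per curve ('single') or one shared level ('batch')."""
    low, high = value_range
    if mode == "single":
        return [rng.uniform(low, high) for _ in range(n_curves)]
    if mode == "batch":
        level = rng.uniform(low, high)
        return [level] * n_curves
    raise ValueError(f"unknown mode: {mode!r}")


def draw_background(mean_level, n_points, relative_spread, rng):
    """Per-point background, normally distributed around mean_level.

    The standard deviation is relative_spread * mean_level, so a value
    of 0.1 corresponds to 10% of the mean.
    """
    sigma = relative_spread * mean_level
    return [rng.gauss(mean_level, sigma) for _ in range(n_points)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
single_levels = draw_levels(4, (1e-8, 1e-6), "single", rng)
batch_levels = draw_levels(4, (1e-8, 1e-6), "batch", rng)
background = draw_background(batch_levels[0], 5, 0.1, rng)
```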

class mlreflect.training.noise_generator.NoiseGeneratorLog(reflectivity, labels, batch_size=32, mode='single', shuffle=True, noise_range=None, background_range=None, relative_background_spread: float = 0.1)[source]: Bases: mlreflect.training.noise_generator.NoiseGenerator

class mlreflect.training.noise_generator.UniformNoiseGenerator(reflectivity, labels, ip, batch_size=32, mode='single', shuffle=True, uniform_noise_range=(1, 1), scaling_range=(1, 1))[source]: Bases: mlreflect.training.noise_generator.NoiseGenerator

mlreflect.training.prediction module

class mlreflect.training.prediction.Prediction(model_path: str, label_names: List[str])[source]

Bases: object

mean_absolute_error(predicted_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray], test_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray])[source]

mean_absolute_percentage_error(predicted_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray], test_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray])[source]

predict_labels(test_input: numpy.ndarray)[source]
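
For reference, the two error metrics named above are commonly defined as shown in the sketch below. It is a plain-Python illustration that averages per label (column); whether the Prediction methods aggregate per label or over all values is not stated here, so that choice is an assumption, and the function names carry a _sketch suffix to mark them as hypothetical.

```python
def mean_absolute_error_sketch(predicted, true):
    """Per-label mean of |predicted - true| over all samples (rows)."""
    n = len(true)
    return [
        sum(abs(p - t) for p, t in zip(pred_col, true_col)) / n
        for pred_col, true_col in zip(zip(*predicted), zip(*true))
    ]


def mean_absolute_percentage_error_sketch(predicted, true):
    """Per-label mean of |predicted - true| / |true|, in percent.

    Undefined when a true value is zero.
    """
    n = len(true)
    return [
        100 * sum(abs(p - t) / abs(t) for p, t in zip(pred_col, true_col)) / n
        for pred_col, true_col in zip(zip(*predicted), zip(*true))
    ]


# Two samples, two labels (e.g. thickness and roughness).
true_labels = [[100.0, 10.0], [200.0, 10.0]]
predicted_labels = [[110.0, 9.0], [190.0, 11.0]]

mae = mean_absolute_error_sketch(predicted_labels, true_labels)
mape = mean_absolute_percentage_error_sketch(predicted_labels, true_labels)
```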

mlreflect.training.preprocessing module

class mlreflect.training.preprocessing.InputPreprocessor[source]

Bases: object

Allows standardization while storing mean and standard deviation for later use.

Returns: InputPreprocessor

property has_saved_standardization

reset_mean_and_std()[source]: Resets previously stored mean and standard deviation for standardization.

revert_standardization(standardized_data: numpy.ndarray)[source]

property standard_mean

property standard_std

standardize(data: numpy.ndarray) → numpy.ndarray[source]: Applies standardization along axis=0 and returns standardized data. Mean and std will be reused.
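
The behaviour described above, where the mean and standard deviation from the first call are stored and reused, can be sketched with a small stand-in class. StandardizerSketch is a hypothetical name; it works on lists of rows instead of ndarrays and assumes the population standard deviation, so it only mirrors the documented interface, not the actual implementation.

```python
from statistics import fmean, pstdev


class StandardizerSketch:
    """Column-wise standardization that remembers mean and std."""

    def __init__(self):
        self.mean = None
        self.std = None

    @property
    def has_saved_standardization(self):
        return self.mean is not None and self.std is not None

    def standardize(self, data):
        # Compute statistics along axis 0 only on the first call.
        if not self.has_saved_standardization:
            columns = list(zip(*data))
            self.mean = [fmean(c) for c in columns]
            self.std = [pstdev(c) for c in columns]  # zero std would fail below
        return [
            [(x - m) / s for x, m, s in zip(row, self.mean, self.std)]
            for row in data
        ]

    def revert_standardization(self, standardized):
        return [
            [x * s + m for x, m, s in zip(row, self.mean, self.std)]
            for row in standardized
        ]

    def reset_mean_and_std(self):
        self.mean = None
        self.std = None


prep = StandardizerSketch()
train = [[1.0, 10.0], [3.0, 30.0]]
train_std = prep.standardize(train)            # stores mean and std
test_std = prep.standardize([[2.0, 40.0]])     # reuses the stored values
restored = prep.revert_standardization(train_std)
```

Reusing the stored values matters when test or experimental data must be transformed exactly like the training data; reset_mean_and_std is the way to start over with a new data set.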

class mlreflect.training.preprocessing.OutputPreprocessor(sample: mlreflect.data_generation.multilayer.MultilayerStructure, normalization: str = 'min_to_zero')[source]

Bases: object

Class for preprocessing reflectivity labels for training and validation.

Parameters

sample – MultilayerStructure object where the sample layers and their names and parameter ranges are defined.
normalization – Defines how the output labels are normalized. “min_to_zero” (default): shifts the minimum value to 0 and scales the maximum value to 1. “absolute_max”: scales the absolute maximum value to 1.

Returns

OutputPreprocessor
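
The two normalization options can be illustrated for a single label with a known parameter range. The sketch below is an interpretation of the descriptions above, not the library's code; normalize_value and renormalize_value are hypothetical helpers, and the range (20, 300) is an arbitrary example.

```python
def normalize_value(value, min_value, max_value, normalization="min_to_zero"):
    """Normalize one label value given the label's parameter range."""
    if normalization == "min_to_zero":
        # Minimum maps to 0, maximum maps to 1.
        return (value - min_value) / (max_value - min_value)
    if normalization == "absolute_max":
        # Largest absolute bound maps to magnitude 1; no shift is applied.
        return value / max(abs(min_value), abs(max_value))
    raise ValueError(f"unknown normalization: {normalization!r}")


def renormalize_value(normalized, min_value, max_value, normalization="min_to_zero"):
    """Inverse of normalize_value."""
    if normalization == "min_to_zero":
        return normalized * (max_value - min_value) + min_value
    if normalization == "absolute_max":
        return normalized * max(abs(min_value), abs(max_value))
    raise ValueError(f"unknown normalization: {normalization!r}")


thickness_range = (20.0, 300.0)  # illustrative range
shifted = normalize_value(160.0, *thickness_range, normalization="min_to_zero")
scaled = normalize_value(160.0, *thickness_range, normalization="absolute_max")
back = renormalize_value(shifted, *thickness_range, normalization="min_to_zero")
```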

add_constant_labels(predicted_labels_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]: Adds all labels in constant_labels to predicted_labels_df.

property all_label_names

property all_label_parameters

apply_preprocessing(labels: Union[pandas.core.frame.DataFrame, numpy.ndarray]) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]

Removes all constant labels and applies normalization to the non-constant labels.

Parameters: labels – Pandas DataFrame or ndarray of randomly generated labels.
Returns: normalized_labels (DataFrame), constant_labels (DataFrame)

property constant_labels

normalize_labels(label_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]: Normalizes all non-constant labels and returns normalized DataFrame.

property number_of_labels

property number_of_layers

remove_labels(label_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]: Removes labels in constant_labels from label_df and returns DataFrame.

renormalize_labels(label_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]: Removes normalization from all labels in label_df.

restore_labels(predicted_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray]) → pandas.core.frame.DataFrame[source]: Takes the predicted labels, reverts normalization and adds constant labels and returns those as DataFrame.
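
The round trip of removing constant labels before training and adding them back after prediction can be sketched with plain dictionaries. This is a simplified illustration: it assumes a label counts as constant when the minimum and maximum of its range are equal, uses dicts instead of DataFrames, and all names and values are hypothetical.

```python
def split_constant_labels(label_ranges):
    """Split labels into constant ones (min == max) and used ones."""
    constant = {name: low for name, (low, high) in label_ranges.items() if low == high}
    used = [name for name, (low, high) in label_ranges.items() if low != high]
    return constant, used


def remove_constant(rows, constant):
    """Drop constant labels from every row."""
    return [{k: v for k, v in row.items() if k not in constant} for row in rows]


def add_constant(rows, constant):
    """Add the constant labels back to every row."""
    return [{**row, **constant} for row in rows]


label_ranges = {
    "thickness": (20.0, 300.0),
    "roughness": (0.0, 60.0),
    "sld": (10.0, 10.0),  # min == max, so treated as constant here
}
constant, used = split_constant_labels(label_ranges)

labels = [{"thickness": 100.0, "roughness": 5.0, "sld": 10.0}]
trainable = remove_constant(labels, constant)
restored_labels = add_constant(trainable, constant)
```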

property used_labels

mlreflect.training.training module

class mlreflect.training.training.Trainer(sample_structure: mlreflect.data_generation.multilayer.MultilayerStructure, q_values: numpy.ndarray, random_seed=None)[source]

Bases: object

Train a neural network model for a given sample structure and q values.

Parameters

sample_structure – MultilayerStructure object that describes the sample. Should only have one non-constant layer.
q_values – ndarray of the q values used for training. Should be similar to the experimental q values.
random_seed – random seed for the training data generation. None means the seed is chosen randomly.

generate_training_data(training_samples: int = 131072)[source]: Generate a training data set for training the neural network.

property has_training_data

train(n_epochs=175, batch_size=512, verbose=1, val_split=0.2) → Tuple[mlreflect.models.trained_model.TrainedModel, History][source]

Train a fully-connected neural network with the generated training data.

Parameters

n_epochs – Number of epochs to train for.
batch_size – Number of curves per training batch. Must be smaller than val_split times the training set size.
verbose – Determines the amount of text output during training (0, 1, 2).
val_split – The fraction of the training set that is withheld for validation.

Returns

trained_model – TrainedModel object that contains the trained keras model as well as other parameters necessary to predict test data.
history – Training history output from keras model.fit().
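
The constraint on batch_size can be checked with simple arithmetic before starting a long training run. The helper below is a hypothetical convenience function, not part of mlreflect; it only restates the documented condition that batch_size must be smaller than val_split times the training set size, using the default values from the signatures above.

```python
def batch_size_is_valid(training_samples, batch_size, val_split):
    """True if batch_size is smaller than the validation share of the data."""
    return batch_size < training_samples * val_split


# Defaults from generate_training_data() and train().
training_samples = 131072
batch_size = 512
val_split = 0.2

validation_share = training_samples * val_split  # about 26214 curves
defaults_ok = batch_size_is_valid(training_samples, batch_size, val_split)

# A batch size larger than the validation share violates the condition.
too_large = batch_size_is_valid(training_samples, 32768, val_split)
```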

Module contents

class mlreflect.training.InputPreprocessor[source]

Bases: object

Allows standardization while storing mean and standard deviation for later use.

Returns: InputPreprocessor

property has_saved_standardization

reset_mean_and_std()[source]: Resets previously stored mean and standard deviation for standardization.

revert_standardization(standardized_data: numpy.ndarray)[source]

property standard_mean

property standard_std

standardize(data: numpy.ndarray) → numpy.ndarray[source]: Applies standardization along axis=0 and returns standardized data. Mean and std will be reused.

class mlreflect.training.NoiseGenerator(reflectivity: numpy.ndarray, labels: numpy.ndarray, input_preprocessor: mlreflect.training.preprocessing.InputPreprocessor, batch_size=32, shuffle=True, mode='single', noise_range=None, background_range=None, relative_background_spread: float = 0.1)[source]

Bases: mlreflect.training.noise_generator.BaseGenerator

Generator object that returns a standardized batch of reflectivity and labels with random noise and background.

Parameters

reflectivity – Training reflectivity curves
labels – Training labels in the same order as reflectivity
input_preprocessor – InputPreprocessor object with or without stored standardization values
batch_size – Number of samples per mini batch
shuffle – If True, shuffles reflectivity and labels after every epoch
noise_range – Tuple (min, max) between which the shot noise levels are randomly generated
background_range – Tuple (min, max) between which the background levels are randomly generated
mode – ‘single’: random noise and background levels are generated for every curve of a mini batch; ‘batch’: random noise and background levels are generated once for each mini batch
relative_background_spread – Relative standard deviation of the normal distribution (e.g. a value of 0.1 means the standard deviation is 10% of the mean)

class mlreflect.training.NoiseGeneratorLog(reflectivity, labels, batch_size=32, mode='single', shuffle=True, noise_range=None, background_range=None, relative_background_spread: float = 0.1)[source]: Bases: mlreflect.training.noise_generator.NoiseGenerator

class mlreflect.training.OutputPreprocessor(sample: mlreflect.data_generation.multilayer.MultilayerStructure, normalization: str = 'min_to_zero')[source]

Bases: object

Class for preprocessing reflectivity labels for training and validation.

Parameters

sample – MultilayerStructure object where the sample layers and their names and parameter ranges are defined.
normalization – Defines how the output labels are normalized. “min_to_zero” (default): shifts the minimum value to 0 and scales the maximum value to 1. “absolute_max”: scales the absolute maximum value to 1.

Returns

OutputPreprocessor

add_constant_labels(predicted_labels_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]: Adds all labels in constant_labels to predicted_labels_df.

property all_label_names

property all_label_parameters

apply_preprocessing(labels: Union[pandas.core.frame.DataFrame, numpy.ndarray]) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]

Removes all constant labels and applies normalization to the non-constant labels.

Parameters: labels – Pandas DataFrame or ndarray of randomly generated labels.
Returns: normalized_labels (DataFrame), constant_labels (DataFrame)

property constant_labels

normalize_labels(label_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]: Normalizes all non-constant labels and returns normalized DataFrame.

property number_of_labels

property number_of_layers

remove_labels(label_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]: Removes labels in constant_labels from label_df and returns DataFrame.

renormalize_labels(label_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame[source]: Removes normalization from all labels in label_df.

restore_labels(predicted_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray]) → pandas.core.frame.DataFrame[source]: Takes the predicted labels, reverts normalization and adds constant labels and returns those as DataFrame.

property used_labels

class mlreflect.training.Prediction(model_path: str, label_names: List[str])[source]

Bases: object

mean_absolute_error(predicted_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray], test_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray])[source]

mean_absolute_percentage_error(predicted_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray], test_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray])[source]

predict_labels(test_input: numpy.ndarray)[source]

class mlreflect.training.Trainer(sample_structure: mlreflect.data_generation.multilayer.MultilayerStructure, q_values: numpy.ndarray, random_seed=None)[source]

Bases: object

Train a neural network model for a given sample structure and q values.

Parameters

sample_structure – MultilayerStructure object that describes the sample. Should only have one non-constant layer.
q_values – ndarray of the q values used for training. Should be similar to the experimental q values.
random_seed – random seed for the training data generation. None means the seed is chosen randomly.

generate_training_data(training_samples: int = 131072)[source]: Generate a training data set for training the neural network.

property has_training_data

train(n_epochs=175, batch_size=512, verbose=1, val_split=0.2) → Tuple[mlreflect.models.trained_model.TrainedModel, History][source]

Train a fully-connected neural network with the generated training data.

Parameters

n_epochs – Number of epochs to train for.
batch_size – Number of curves per training batch. Must be smaller than val_split times the training set size.
verbose – Determines the amount of text output during training (0, 1, 2).
val_split – The fraction of the training set that is withheld for validation.

Returns

trained_model – TrainedModel object that contains the trained keras model as well as other parameters necessary to predict test data.
history – Training history output from keras model.fit().

class mlreflect.training.UniformNoiseGenerator(reflectivity, labels, ip, batch_size=32, mode='single', shuffle=True, uniform_noise_range=(1, 1), scaling_range=(1, 1))[source]: Bases: mlreflect.training.noise_generator.NoiseGenerator