mlreflect.training package

Submodules

mlreflect.training.footprint module

class mlreflect.training.footprint.FootprintRescaler(reflectivity: numpy.ndarray, true_ratio: float, errors: list, q: Optional[numpy.ndarray] = None, wavelength: Optional[float] = None, theta: Optional[numpy.ndarray] = None)[source]

Bases: object

Returns reflectivity curves with rescaled (“incorrect”) footprint for standard (specular) XRR geometries.

Parameters
  • reflectivity – Reflected intensity values of one curve (1D) or several curves (2D).

  • true_ratio – Assumed “true” ratio of beam width to sample length used to produce the fictional footprint on the curves. E.g. for a beam width of 200 microns and a sample length of 10 mm, true_ratio = 0.02.

  • errors – List of error factors that the true_ratio can be multiplied by to produce the new (“incorrect”) footprint corrections.

  • q – ndarray of q-values corresponding to the provided intensity matrix. If q is provided, the wavelength has to be specified as well.

  • wavelength – Fictional wavelength in Angstroms that is used to calculate the scattering angle.

  • theta – ndarray of angle values corresponding to the provided intensity matrix. If theta is provided, q and wavelength do not need to be provided.

static angle_to_q(scattering_angle, wavelength)[source]
static apply_footprint(intensity: numpy.ndarray, scattering_angle: numpy.ndarray, ratio: float) numpy.ndarray[source]
static correct_footprint(intensity: numpy.ndarray, scattering_angle: numpy.ndarray, ratio: float) numpy.ndarray[source]
static normalize_to_first(rescaled_intensity: numpy.ndarray, original_intensity: numpy.ndarray)[source]
static normalize_to_max(rescaled_intensity: numpy.ndarray, original_intensity: numpy.ndarray)[source]
static q_to_angle(q, wavelength)[source]
property rescaled_reflectivity
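
The idea behind the footprint rescaling can be sketched with plain NumPy. This is a minimal illustration, not the mlreflect implementation: it assumes that the angle is the incidence angle theta in degrees, that q = 4π sin(theta) / wavelength, and that the illuminated fraction of the beam is min(1, sin(theta) / ratio). The helper footprint_factor and all numeric values are hypothetical, and the normalization steps (normalize_to_first, normalize_to_max) are left out.

```python
import numpy as np


def q_to_angle(q, wavelength):
    """Convert q (1/Angstrom) to the incidence angle theta in degrees."""
    return np.degrees(np.arcsin(np.asarray(q) * wavelength / (4 * np.pi)))


def angle_to_q(theta, wavelength):
    """Convert the incidence angle theta (degrees) to q (1/Angstrom)."""
    return 4 * np.pi / wavelength * np.sin(np.radians(theta))


def footprint_factor(theta, ratio):
    """Assumed fraction of the beam that hits the sample.

    ratio is beam width divided by sample length. While sin(theta) < ratio
    the beam is longer than the sample and part of it is lost.
    """
    return np.minimum(1.0, np.sin(np.radians(theta)) / ratio)


def apply_footprint(intensity, theta, ratio):
    """Imprint the footprint effect on an ideal curve."""
    return intensity * footprint_factor(theta, ratio)


def correct_footprint(intensity, theta, ratio):
    """Undo the footprint effect, assuming the given ratio."""
    return intensity / footprint_factor(theta, ratio)


wavelength = 1.0                   # Angstrom, illustrative value
q = np.linspace(0.01, 0.5, 8)
theta = q_to_angle(q, wavelength)
reflectivity = np.ones_like(q)     # flat dummy curve

true_ratio = 0.02                  # 200 micron beam on a 10 mm sample
error = 1.2                        # ratio overestimated by 20 %

# Imprint the "true" footprint, then correct with the wrong ratio
measured = apply_footprint(reflectivity, theta, true_ratio)
rescaled = correct_footprint(measured, theta, true_ratio * error)
```

Under these assumptions the rescaled curve is too high by the error factor at small angles, where the beam overspills the sample, and is unchanged once the sample intercepts the full beam.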

mlreflect.training.noise_generator module

class mlreflect.training.noise_generator.BaseGenerator(reflectivity, labels, batch_functions, batch_size=32, shuffle=True)[source]

Bases: tensorflow.python.keras.utils.data_utils.Sequence

on_epoch_end()[source]

Method called at the end of every epoch.

class mlreflect.training.noise_generator.NoiseGenerator(reflectivity: numpy.ndarray, labels: numpy.ndarray, input_preprocessor: mlreflect.training.preprocessing.InputPreprocessor, batch_size=32, shuffle=True, mode='single', noise_range=None, background_range=None, relative_background_spread: float = 0.1)[source]

Bases: mlreflect.training.noise_generator.BaseGenerator

Generator object that returns a standardized batch of reflectivity and labels with random noise and background.

Parameters
  • reflectivity – Training reflectivity curves

  • labels – Training labels in the same order as reflectivity

  • input_preprocessor – InputPreprocessor object with or without stored standardization values

  • batch_size – Number of samples per mini batch

  • shuffle – If True, shuffles reflectivity and labels after every epoch

  • noise_range – Tuple (min, max) between which the shot noise levels are randomly generated

  • background_range – Tuple (min, max) between which the background levels are randomly generated

  • mode – ‘single’: random noise and background levels are generated for every curve of a mini batch; ‘batch’: random noise and background levels are generated for each mini batch

  • relative_background_spread – Relative standard deviation of the normal distribution (e.g. a value of 0.1 means the standard deviation is 10% of the mean)
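
How the parameters above interact can be sketched as follows. This is a hedged illustration only: the function add_noise_and_background is hypothetical, the levels are assumed to be drawn uniformly from the given ranges, and the shot noise is assumed to be normally distributed with a spread proportional to the intensity. The actual noise model in mlreflect may differ, and the standardization step is omitted.

```python
import numpy as np


def add_noise_and_background(curves, noise_range, background_range,
                             relative_background_spread=0.1,
                             mode="single", rng=None):
    """Return a noisy copy of a batch of curves with shape (n_curves, n_points)."""
    rng = np.random.default_rng() if rng is None else rng
    n_curves = curves.shape[0]

    # 'single': one level per curve; 'batch': one level shared by the batch
    n_levels = n_curves if mode == "single" else 1
    noise_level = rng.uniform(*noise_range, size=(n_levels, 1))
    background_level = rng.uniform(*background_range, size=(n_levels, 1))

    # Assumed noise model: normal noise with a spread relative to the intensity
    noisy = rng.normal(curves, noise_level * curves)

    # Background drawn point by point from a normal distribution whose
    # standard deviation is a fixed fraction of its mean
    background = rng.normal(background_level,
                            relative_background_spread * background_level,
                            size=curves.shape)
    return noisy + background


rng = np.random.default_rng(0)
batch = np.full((4, 5), 1e-3)      # four dummy curves with five points each
noisy_batch = add_noise_and_background(
    batch, noise_range=(0.0, 0.1), background_range=(1e-8, 1e-6), rng=rng)
```

With mode='batch' the same call would draw a single noise level and a single background level for all four curves.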

class mlreflect.training.noise_generator.NoiseGeneratorLog(reflectivity, labels, batch_size=32, mode='single', shuffle=True, noise_range=None, background_range=None, relative_background_spread: float = 0.1)[source]

Bases: mlreflect.training.noise_generator.NoiseGenerator

class mlreflect.training.noise_generator.UniformNoiseGenerator(reflectivity, labels, ip, batch_size=32, mode='single', shuffle=True, uniform_noise_range=(1, 1), scaling_range=(1, 1))[source]

Bases: mlreflect.training.noise_generator.NoiseGenerator

mlreflect.training.prediction module

class mlreflect.training.prediction.Prediction(model_path: str, label_names: List[str])[source]

Bases: object

mean_absolute_error(predicted_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray], test_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray])[source]
mean_absolute_percentage_error(predicted_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray], test_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray])[source]
predict_labels(test_input: numpy.ndarray)[source]
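
The two error metrics can be illustrated with a short NumPy sketch. It assumes that the errors are averaged over the samples separately for each label and that the percentage error is scaled by 100; neither detail is confirmed by this page, and the label values are made up.

```python
import numpy as np


def mean_absolute_error(predicted_labels, test_labels):
    """Mean absolute deviation per label (column)."""
    predicted, test = np.asarray(predicted_labels), np.asarray(test_labels)
    return np.mean(np.abs(predicted - test), axis=0)


def mean_absolute_percentage_error(predicted_labels, test_labels):
    """Mean absolute deviation relative to the true value, in percent."""
    predicted, test = np.asarray(predicted_labels), np.asarray(test_labels)
    return 100 * np.mean(np.abs((predicted - test) / test), axis=0)


# Two samples with two labels each (hypothetical values)
test_labels = np.array([[100.0, 10.0], [200.0, 20.0]])
predicted_labels = np.array([[110.0, 9.0], [190.0, 22.0]])

mae = mean_absolute_error(predicted_labels, test_labels)              # [10.0, 1.5]
mape = mean_absolute_percentage_error(predicted_labels, test_labels)  # [7.5, 10.0]
```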

mlreflect.training.preprocessing module

class mlreflect.training.preprocessing.InputPreprocessor[source]

Bases: object

Allows standardization while storing mean and standard deviation for later use.

Returns

InputPreprocessor

property has_saved_standardization
reset_mean_and_std()[source]

Resets previously stored mean and standard deviation for standardization.

revert_standardization(standardized_data: numpy.ndarray)[source]
property standard_mean
property standard_std
standardize(data: numpy.ndarray) numpy.ndarray[source]

Applies standardization along axis=0 and returns standardized data. Mean and std will be reused.
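
The behaviour described above, where statistics are computed once and reused for later data, can be sketched with a small stand-in class. StandardizerSketch is hypothetical and only mimics the documented method names; it is not the mlreflect class.

```python
import numpy as np


class StandardizerSketch:
    """Hypothetical stand-in that stores mean and std for later reuse."""

    def __init__(self):
        self.mean = None
        self.std = None

    @property
    def has_saved_standardization(self):
        return self.mean is not None and self.std is not None

    def standardize(self, data):
        # Compute the statistics only on the first call, then reuse them
        if not self.has_saved_standardization:
            self.mean = data.mean(axis=0)
            self.std = data.std(axis=0)
        return (data - self.mean) / self.std

    def revert_standardization(self, standardized_data):
        return standardized_data * self.std + self.mean

    def reset_mean_and_std(self):
        self.mean = None
        self.std = None


ip = StandardizerSketch()
train = np.array([[1.0, 10.0], [3.0, 30.0]])
test = np.array([[2.0, 20.0]])

train_std = ip.standardize(train)  # stores mean [2, 20] and std [1, 10]
test_std = ip.standardize(test)    # reuses the stored values -> [[0, 0]]
```

Reusing the stored values matters because test data must be scaled with the statistics of the training set, not with its own.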

class mlreflect.training.preprocessing.OutputPreprocessor(sample: mlreflect.data_generation.multilayer.MultilayerStructure, normalization: str = 'min_to_zero')[source]

Bases: object

Class for preprocessing reflectivity labels for training and validation.

Parameters
  • sample – MultilayerStructure object where the sample layers and their names and parameter ranges are defined.

  • normalization – Defines how the output labels are normalized. “min_to_zero” (default): shifts the minimum value to 0 and scales the maximum value to 1. “absolute_max”: scales the absolute maximum value to 1.

Returns

OutputPreprocessor

add_constant_labels(predicted_labels_df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]

Adds all labels in constant_labels to predicted_labels_df.

property all_label_names
property all_label_parameters
apply_preprocessing(labels: Union[pandas.core.frame.DataFrame, numpy.ndarray]) Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]

Removes all constant labels and applies normalization to the non-constant labels.

Parameters

labels – Pandas DataFrame or ndarray of randomly generated labels.

Returns

normalized_labels: DataFrame, constant_labels: DataFrame

Return type

Tuple[DataFrame, DataFrame]

property constant_labels
normalize_labels(label_df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]

Normalizes all constant labels and returns normalized DataFrame.

property number_of_labels
property number_of_layers
remove_labels(label_df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]

Removes labels in constant_labels from label_df and returns DataFrame.

renormalize_labels(label_df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]

Removes normalization from all labels in label_df.

restore_labels(predicted_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray]) pandas.core.frame.DataFrame[source]

Takes the predicted labels, reverts normalization and adds constant labels and returns those as DataFrame.

property used_labels
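
The two normalization modes and the round trip through restore_labels can be sketched with NumPy arrays in place of DataFrames. The label names, ranges and helper functions below are hypothetical, and treating a label as constant when its minimum equals its maximum is an assumption made for this illustration.

```python
import numpy as np

# Hypothetical label ranges; the last label is constant (min == max)
label_names = ["thickness", "roughness", "sld", "substrate_sld"]
label_min = np.array([20.0, 0.0, -5.0, 20.0])
label_max = np.array([300.0, 60.0, 15.0, 20.0])

labels = np.array([[160.0, 30.0, 5.0, 20.0],
                   [300.0, 0.0, -5.0, 20.0]])

is_constant = label_min == label_max
used = ~is_constant


def normalize(values, lo, hi, normalization="min_to_zero"):
    if normalization == "min_to_zero":
        # Shift the minimum to 0 and scale the maximum to 1
        return (values - lo) / (hi - lo)
    if normalization == "absolute_max":
        # Scale the largest absolute value of the range to 1
        return values / np.maximum(np.abs(lo), np.abs(hi))
    raise ValueError(f"unknown normalization: {normalization}")


def renormalize(values, lo, hi, normalization="min_to_zero"):
    if normalization == "min_to_zero":
        return values * (hi - lo) + lo
    if normalization == "absolute_max":
        return values * np.maximum(np.abs(lo), np.abs(hi))
    raise ValueError(f"unknown normalization: {normalization}")


# Drop the constant label, then normalize the remaining ones
normalized = normalize(labels[:, used], label_min[used], label_max[used])

# Revert the normalization and put the constant label back
restored = np.empty_like(labels)
restored[:, used] = renormalize(normalized, label_min[used], label_max[used])
restored[:, is_constant] = label_min[is_constant]
```

Constant labels carry no information for the network, which is why they are removed before training and added back to the predictions afterwards.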

mlreflect.training.training module

class mlreflect.training.training.Trainer(sample_structure: mlreflect.data_generation.multilayer.MultilayerStructure, q_values: numpy.ndarray, random_seed=None)[source]

Bases: object

Train a neural network model for a given sample structure and q values

Parameters
  • sample_structure – MultilayerStructure object that describes the sample. Should only have one non-constant layer.

  • q_values – ndarray of the q values used for training. Should be similar to the experimental q values.

  • random_seed – random seed for the training data generation. None means the seed is chosen randomly.

generate_training_data(training_samples: int = 131072)[source]

Generate a training data set for training the neural network.

property has_training_data
train(n_epochs=175, batch_size=512, verbose=1, val_split=0.2) Tuple[mlreflect.models.trained_model.TrainedModel, History][source]

Train a fully-connected neural network with the generated training data.

Parameters
  • n_epochs – Number of epochs to train for.

  • batch_size – Number of curves per training batch. Must be smaller than val_split times the training set size.

  • verbose – Determines the amount of text output during training (0, 1, 2).

  • val_split – The fraction of the training set that is withheld for validation.

Returns

trained_model: TrainedModel object that contains the trained keras model as well as other parameters necessary to predict test data.

history: Training history output from keras model.fit().

Return type

Tuple[TrainedModel, History]
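
The constraint on batch_size can be checked with a few lines of arithmetic using the default values from the signatures above. The rounding of the validation set size is an assumption; keras may round the split slightly differently.

```python
import math

training_samples = 131072  # default of generate_training_data()
val_split = 0.2            # default of train()
batch_size = 512           # default of train()

# Assumed rounding: the validation set size is truncated to an integer
n_validation = int(training_samples * val_split)
n_training = training_samples - n_validation

# The documented constraint: a batch must fit into the validation set
batch_size_ok = batch_size < n_validation

batches_per_epoch = math.ceil(n_training / batch_size)
print(n_validation, n_training, batch_size_ok, batches_per_epoch)
```

With the defaults, 26214 curves are withheld for validation and the remaining 104858 curves are split into 205 batches per epoch.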

Module contents

class mlreflect.training.InputPreprocessor[source]

Bases: object

Allows standardization while storing mean and standard deviation for later use.

Returns

InputPreprocessor

property has_saved_standardization
reset_mean_and_std()[source]

Resets previously stored mean and standard deviation for standardization.

revert_standardization(standardized_data: numpy.ndarray)[source]
property standard_mean
property standard_std
standardize(data: numpy.ndarray) numpy.ndarray[source]

Applies standardization along axis=0 and returns standardized data. Mean and std will be reused.

class mlreflect.training.NoiseGenerator(reflectivity: numpy.ndarray, labels: numpy.ndarray, input_preprocessor: mlreflect.training.preprocessing.InputPreprocessor, batch_size=32, shuffle=True, mode='single', noise_range=None, background_range=None, relative_background_spread: float = 0.1)[source]

Bases: mlreflect.training.noise_generator.BaseGenerator

Generator object that returns a standardized batch of reflectivity and labels with random noise and background.

Parameters
  • reflectivity – Training reflectivity curves

  • labels – Training labels in the same order as reflectivity

  • input_preprocessor – InputPreprocessor object with or without stored standardization values

  • batch_size – Number of samples per mini batch

  • shuffle – If True, shuffles reflectivity and labels after every epoch

  • noise_range – Tuple (min, max) between which the shot noise levels are randomly generated

  • background_range – Tuple (min, max) between which the background levels are randomly generated

  • mode – ‘single’: random noise and background levels are generated for every curve of a mini batch; ‘batch’: random noise and background levels are generated for each mini batch

  • relative_background_spread – Relative standard deviation of the normal distribution (e.g. a value of 0.1 means the standard deviation is 10% of the mean)

class mlreflect.training.NoiseGeneratorLog(reflectivity, labels, batch_size=32, mode='single', shuffle=True, noise_range=None, background_range=None, relative_background_spread: float = 0.1)[source]

Bases: mlreflect.training.noise_generator.NoiseGenerator

class mlreflect.training.OutputPreprocessor(sample: mlreflect.data_generation.multilayer.MultilayerStructure, normalization: str = 'min_to_zero')[source]

Bases: object

Class for preprocessing reflectivity labels for training and validation.

Parameters
  • sample – MultilayerStructure object where the sample layers and their names and parameter ranges are defined.

  • normalization – Defines how the output labels are normalized. “min_to_zero” (default): shifts the minimum value to 0 and scales the maximum value to 1. “absolute_max”: scales the absolute maximum value to 1.

Returns

OutputPreprocessor

add_constant_labels(predicted_labels_df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]

Adds all labels in constant_labels to predicted_labels_df.

property all_label_names
property all_label_parameters
apply_preprocessing(labels: Union[pandas.core.frame.DataFrame, numpy.ndarray]) Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]

Removes all constant labels and applies normalization to the non-constant labels.

Parameters

labels – Pandas DataFrame or ndarray of randomly generated labels.

Returns

normalized_labels: DataFrame, constant_labels: DataFrame

Return type

Tuple[DataFrame, DataFrame]

property constant_labels
normalize_labels(label_df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]

Normalizes all constant labels and returns normalized DataFrame.

property number_of_labels
property number_of_layers
remove_labels(label_df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]

Removes labels in constant_labels from label_df and returns DataFrame.

renormalize_labels(label_df: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame[source]

Removes normalization from all labels in label_df.

restore_labels(predicted_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray]) pandas.core.frame.DataFrame[source]

Takes the predicted labels, reverts normalization and adds constant labels and returns those as DataFrame.

property used_labels

class mlreflect.training.Prediction(model_path: str, label_names: List[str])[source]

Bases: object

mean_absolute_error(predicted_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray], test_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray])[source]
mean_absolute_percentage_error(predicted_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray], test_labels: Union[pandas.core.frame.DataFrame, numpy.ndarray])[source]
predict_labels(test_input: numpy.ndarray)[source]

class mlreflect.training.Trainer(sample_structure: mlreflect.data_generation.multilayer.MultilayerStructure, q_values: numpy.ndarray, random_seed=None)[source]

Bases: object

Train a neural network model for a given sample structure and q values

Parameters
  • sample_structure – MultilayerStructure object that describes the sample. Should only have one non-constant layer.

  • q_values – ndarray of the q values used for training. Should be similar to the experimental q values.

  • random_seed – random seed for the training data generation. None means the seed is chosen randomly.

generate_training_data(training_samples: int = 131072)[source]

Generate a training data set for training the neural network.

property has_training_data
train(n_epochs=175, batch_size=512, verbose=1, val_split=0.2) Tuple[mlreflect.models.trained_model.TrainedModel, History][source]

Train a fully-connected neural network with the generated training data.

Parameters
  • n_epochs – Number of epochs to train for.

  • batch_size – Number of curves per training batch. Must be smaller than val_split times the training set size.

  • verbose – Determines the amount of text output during training (0, 1, 2).

  • val_split – The fraction of the training set that is withheld for validation.

Returns

trained_model: TrainedModel object that contains the trained keras model as well as other parameters necessary to predict test data.

history: Training history output from keras model.fit().

Return type

Tuple[TrainedModel, History]

class mlreflect.training.UniformNoiseGenerator(reflectivity, labels, ip, batch_size=32, mode='single', shuffle=True, uniform_noise_range=(1, 1), scaling_range=(1, 1))[source]

Bases: mlreflect.training.noise_generator.NoiseGenerator