lalegpl.lib.lale.nsga2 module

exception lalegpl.lib.lale.nsga2.MaxBudgetExceededException

Bases: Exception
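
Judging by its name and by the max_opt_time parameter of NSGA2 below, this exception signals that the optimization budget was exhausted. A minimal hedged sketch of guarding a fit call, assuming fit raises this exception when the time budget runs out before a result is available (clf, X_train, and y_train as in the example below):

>>> from lalegpl.lib.lale.nsga2 import NSGA2, MaxBudgetExceededException
>>> opt = NSGA2(estimator=clf, scoring=['accuracy'],
...             max_evals=20, max_opt_time=600.0)
>>> try:
...     trained = opt.fit(X_train, y_train)
... except MaxBudgetExceededException:
...     # assumption: raised when the 600-second budget is exceeded
...     print('optimization stopped: max_opt_time exceeded')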

class lalegpl.lib.lale.nsga2.NSGA2(*, estimator=None, scoring, best_score=0.0, cv=5, max_evals=50, max_opt_time=None, population_size=10, random_seed=42)

Bases: lale.operators.PlannedIndividualOp

Multi-objective optimizer based on the NSGA-II algorithm.

This documentation is auto-generated from JSON schemas.

Example
>>> import lale.datasets.openml
>>> (X_train, y_train), (X_test, y_test) = \
...     lale.datasets.openml.fetch('credit-g', 'classification', preprocess=True, astype='pandas')
>>>
>>> # Create sklearn scorer for computing FPR
>>> def compute_fpr(y_true, y_pred):
...     from sklearn.metrics import confusion_matrix
...     tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
...     fpr = round(fp / (fp + tn), 4)
...     return fpr
>>>
>>> from sklearn.metrics import make_scorer
>>> fpr_scorer = make_scorer(compute_fpr, greater_is_better=False)
>>>
>>> from lale.lib.xgboost import XGBClassifier
>>> clf = XGBClassifier()
>>> nsga2_args = {'estimator': clf, 'scoring': ['accuracy', fpr_scorer],
...               'best_score': [1, 0], 'cv': 3,
...               'max_evals': 20, 'population_size': 10}
>>> opt = NSGA2(**nsga2_args)
>>> trained = opt.fit(X_train, y_train)
>>> # Predict using first pareto-optimal solution (pipeline)
>>> predictions = trained.predict(X_test, pipeline_name='p0')
>>> from sklearn.metrics import accuracy_score
>>> acc = accuracy_score(y_test, predictions)
>>> fpr = compute_fpr(y_test, predictions)
>>> print('Accuracy, FPR - %.3f, %.3f' % (acc, fpr))

Parameters
  • estimator (union type, not for optimizer, default None) –

    Planned Lale individual operator or pipeline (see the combined configuration sketch after this parameter list).

    • operator

    • or None

      If None, lale.lib.sklearn.LogisticRegression is used.

  • scoring (array, not for optimizer) –

    A list of Scorer objects, or known scorers named by string. The optimizer may take the order into account.

    • items : union type

      Scorer object, or known scorer named by string.

      • callable

        Callable with signature scoring(estimator, X, y) as documented in sklearn scoring.

        The callable has to return a scalar value, such that a higher score is better. This may be created from one of the sklearn metrics using make_scorer. Or it can be one of the scoring callables returned by the factory functions in lale.lib.aif360 metrics, for example, symmetric_disparate_impact(**fairness_info). Or it can be a completely custom user-written Python callable.

      • or ‘accuracy’, ‘explained_variance’, ‘max_error’, ‘roc_auc’, ‘roc_auc_ovr’, ‘roc_auc_ovo’, ‘roc_auc_ovr_weighted’, ‘roc_auc_ovo_weighted’, ‘balanced_accuracy’, ‘average_precision’, ‘neg_log_loss’, or ‘neg_brier_score’

        Known scorer for classification task.

      • or ‘r2’, ‘neg_mean_squared_error’, ‘neg_mean_absolute_error’, ‘neg_root_mean_squared_error’, ‘neg_mean_squared_log_error’, or ‘neg_median_absolute_error’

        Known scorer for regression task.

  • best_score (union type, optional, not for optimizer, default 0.0) –

    The best score for the specified scorer.

    Given that higher scores are better, passing (best_score - score) as a loss to the minimizing optimizer will maximize the score. By specifying best_score, the loss can be >=0, where 0 is the best loss.

    • float, default 0.0

      The best score for the specified scorer.

      Given that higher scores are better, passing (best_score - score) as a loss to the minimizing optimizer will maximize the score. By specifying best_score, the loss can be >=0, where 0 is the best loss.

    • or array

      The best score for each specified scorer.

      If not enough are specified, the remainder are assumed to be the default.

      Given that higher scores are better, passing (best_score - score) as a loss to the minimizing optimizer will maximize the score. By specifying best_score, the loss can be >=0, where 0 is the best loss.

      • items : float, default 0.0

        The best score for the specified scorer.

        Given that higher scores are better, passing (best_score - score) as a loss to the minimizing optimizer will maximize the score. By specifying best_score, the loss can be >=0, where 0 is the best loss.

  • cv (union type, optional, not for optimizer, default 5) –

    Cross-validation as an integer number of folds, or as an object that has a split function.

    The fit method performs cross validation on the input dataset for each trial, and uses the mean cross validation performance for optimization.

  • max_evals (integer, >=1, optional, not for optimizer, default 50) – Number of trials (pipeline evaluations) of the optimizer search.

  • max_opt_time (union type, optional, not for optimizer, default None) –

    Maximum amount of time in seconds for the optimization.

    • float, >=0.0

    • or None

      No runtime bound.

  • population_size (any type, optional, not for optimizer, default 10) – Size of the population maintained in each generation of the genetic algorithm.

  • random_seed (any type, optional, not for optimizer, default 42) – Seed for the pseudo-random number generator, for reproducible runs.
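
The parameters above combine as in the following hedged sketch; the planned PCA-plus-LogisticRegression pipeline, the F1 scorer, and the StratifiedKFold splitter are illustrative choices, not defaults of this module:

>>> from lale.lib.sklearn import PCA, LogisticRegression
>>> from sklearn.metrics import make_scorer, f1_score
>>> from sklearn.model_selection import StratifiedKFold
>>> # estimator as a planned pipeline: NSGA2 searches the hyperparameters of both steps
>>> planned = PCA >> LogisticRegression
>>> # scoring mixes a known scorer string with a make_scorer callable;
>>> # best_score lines up entry by entry (1.0 is the best accuracy and the best F1)
>>> opt = NSGA2(estimator=planned,
...             scoring=['accuracy', make_scorer(f1_score)],
...             best_score=[1.0, 1.0],
...             cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
...             max_evals=20, population_size=10, random_seed=42)
>>> trained = opt.fit(X_train, y_train)  # X_train, y_train as in the example above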

fit(X, y=None, **fit_params)

Train the operator.

Note: The fit method is not available until this operator is trainable.

Once this method is available, it will have the following signature:

Parameters
  • X (Any) –

  • y (Any) –

predict(X, **predict_params)

Make predictions.

Note: The predict method is not available until this operator is trained.

Once this method is available, it will have the following signature:

Parameters
  • X (Any) –

  • pipeline_name (union type, optional) –

    Name of the pipeline to use for prediction.

    • string

      Which pipeline to pick. Must be in the list returned by summary.

    • or None

      Run predict on the first pipeline.

Returns

result

Return type

Any
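
A short usage sketch for pipeline selection, assuming the trained optimizer from the example above. summary is the method referenced in the pipeline_name description; 'p1' exists only if the pareto front contains more than one pipeline:

>>> trained.summary()  # lists the pareto-optimal pipelines (p0, p1, ...)
>>> preds_default = trained.predict(X_test)  # first pipeline, pipeline_name=None
>>> preds_p1 = trained.predict(X_test, pipeline_name='p1')  # a named pipeline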