make_longley_check_data: Longley Economic Dataset

The make_longley_check_data function generates a Longley-like economic dataset. The Longley data are known for strong multicollinearity among the predictors, which makes them well suited to testing the numerical accuracy and stability of regression models.


Overview

This utility creates a synthetic version of the Longley economic dataset:

  • Structure: 16 observations with 7 economic variables.
  • Characteristics: High collinearity between predictors like GNP, Population, and Year.
  • Purpose: Validating the numerical stability of regression algorithms (e.g., least squares) on ill-conditioned problems.
  • Reproducibility: Uses a fixed seed (default 42).
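One way to quantify the collinearity described above is the condition number of the standardized predictor matrix. The sketch below is illustrative only: condition_number is a hypothetical helper, not part of the library, and the two arrays are small stand-ins built with NumPy rather than the output of make_longley_check_data.

```python
import numpy as np

def condition_number(X):
    """2-norm condition number of X after centering and scaling each column.

    Hypothetical helper for illustration; not part of machinegnostics.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    s = np.linalg.svd(Z, compute_uv=False)  # singular values, largest first
    return s[0] / s[-1]

rng = np.random.default_rng(0)
t = np.arange(16, dtype=float)

# Stand-in predictors that all follow the same trend (nearly collinear)
collinear = np.column_stack([
    t,
    2.0 * t + rng.normal(scale=0.01, size=16),
    t + rng.normal(scale=0.01, size=16),
])

# Stand-in predictors with no shared trend
independent = rng.normal(size=(16, 3))

print("collinear:  ", condition_number(collinear))
print("independent:", condition_number(independent))
```

A large condition number means small perturbations in the data can produce large changes in the fitted coefficients, which is the situation this dataset is meant to exercise. The same helper could be applied to the predictor columns of the array returned by make_longley_check_data.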

Parameters

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| seed | int | Random seed for reproducibility. | 42 |

Returns

| Return | Type | Description |
|--------|------|-------------|
| data | numpy.ndarray | Array of shape (16, 7) containing economic indicators. |
| column_names | list[str] | List of column names in order: ['GNP.deflator', 'GNP', 'Unemployed', 'Armed.Forces', 'Population', 'Year', 'Employed']. |
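Because the function returns a plain array plus a list of names, a regression test usually starts by separating a response column from the predictors. The sketch below assumes 'Employed' is the response, as in the classical Longley regression problem; split_xy is a hypothetical helper, and the random array is only a stand-in with the documented shape and column order.

```python
import numpy as np

COLUMNS = ['GNP.deflator', 'GNP', 'Unemployed', 'Armed.Forces',
           'Population', 'Year', 'Employed']

def split_xy(data, column_names, target="Employed"):
    """Split a (n_samples, n_columns) array into predictors X and response y.

    Hypothetical helper for illustration; not part of machinegnostics.
    """
    idx = column_names.index(target)
    X = np.delete(data, idx, axis=1)   # every column except the target
    y = data[:, idx]
    features = [c for c in column_names if c != target]
    return X, y, features

# Stand-in with the documented shape (16, 7); in practice, pass the
# array and names returned by make_longley_check_data() instead.
rng = np.random.default_rng(0)
data = rng.normal(size=(16, 7))

X, y, features = split_xy(data, COLUMNS)
print(X.shape, y.shape)
# Output: (16, 6) (16,)

# Least-squares fit with an intercept, using NumPy's SVD-based solver
A = np.column_stack([np.ones(len(y)), X])
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(dict(zip(["Intercept"] + features, np.round(coef, 3))))
```

np.linalg.lstsq is used here rather than solving the normal equations directly, since forming the normal equations squares the condition number and tends to lose accuracy on collinear data.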

Example Usage

from machinegnostics.data import make_longley_check_data

# Generate Longley data
data, cols = make_longley_check_data()

print(f"Shape: {data.shape}")
# Output: Shape: (16, 7)

print(f"Columns: {cols}")
# Output: Columns: ['GNP.deflator', 'GNP', 'Unemployed', 'Armed.Forces', 'Population', 'Year', 'Employed']