make_starwars_check_data: Star Wars Characters Dataset
The make_starwars_check_data function generates a synthetic Star Wars-like dataset with physical traits and species labels for 87 characters by default. Inspired by the starwars dataset from R's dplyr package, it is suited to practicing categorical analysis, grouping operations, and basic data exploration.
Overview
This utility creates a character dataset similar to the original R dataset:
- Structure: 87 observations (characters) by default; the count is set by the n parameter.
- Variables: Height, Mass, Species, and Character Names (synthetic).
- Characteristics:
  - Species distribution is skewed towards 'Human' (approx. 55%).
  - Physical traits like height and mass are statistically distinct between species (e.g., Wookiees are taller/heavier, Hutts are very heavy).
- Purpose: Suited to data manipulation tasks (filtering, grouping, joining tables) and categorical visualization.
- Reproducibility: Uses a fixed seed (default 42).
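The species skew and the per-species differences described above can be checked with a small grouping helper. The sketch below is plain Python so that it runs without pandas; summarize_by_species is a hypothetical helper name, and the five sample rows are hand-written stand-ins, not output of make_starwars_check_data.

```python
from collections import defaultdict
from statistics import fmean


def summarize_by_species(height_cm, mass_kg, species):
    """Group rows by species and return count, share, and mean traits per group."""
    groups = defaultdict(list)
    for h, m, s in zip(height_cm, mass_kg, species):
        groups[s].append((float(h), float(m)))

    total = sum(len(rows) for rows in groups.values())
    summary = {}
    for label, rows in groups.items():
        summary[label] = {
            "count": len(rows),
            "share": len(rows) / total,
            "mean_height_cm": fmean([h for h, _ in rows]),
            "mean_mass_kg": fmean([m for _, m in rows]),
        }
    return summary


# Hand-written stand-in values (assumption), NOT generated by the library.
height_cm = [172.0, 180.0, 228.0, 96.0, 175.0]
mass_kg = [77.0, 80.0, 112.0, 32.0, 1358.0]
species = ["Human", "Human", "Wookiee", "Droid", "Hutt"]

summary = summarize_by_species(height_cm, mass_kg, species)
# Print groups from most to least frequent.
for label, stats in sorted(summary.items(), key=lambda kv: -kv[1]["count"]):
    print(
        f"{label:8s} n={stats['count']} share={stats['share']:.0%} "
        f"height={stats['mean_height_cm']:.1f} cm "
        f"mass={stats['mean_mass_kg']:.1f} kg"
    )
```

The helper accepts any equal-length sequences, so the three arrays returned by make_starwars_check_data should be usable in place of the stand-in lists.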
Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| n | int | Number of characters to generate. | 87 |
| seed | int | Random seed for reproducibility. | 42 |
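One way to confirm that a seed pins the output is to call a generator twice with identical arguments and compare the results. The sketch below does this for any callable that returns a tuple of sequences. Both same_output and fake_generator are hypothetical names: fake_generator is a stand-in with made-up distributions so the example runs on its own, and it is not the library's implementation.

```python
import random


def same_output(generate, **kwargs):
    """Call `generate` twice with the same arguments and compare every field."""
    first = generate(**kwargs)
    second = generate(**kwargs)
    return all(list(a) == list(b) for a, b in zip(first, second))


def fake_generator(n=87, seed=42):
    # Hypothetical stand-in (assumption): mirrors only the (n, seed) signature
    # and the four-part return value, not the real sampling logic.
    rng = random.Random(seed)
    heights = [rng.gauss(175.0, 10.0) for _ in range(n)]
    masses = [rng.gauss(80.0, 12.0) for _ in range(n)]
    species = [rng.choice(["Human", "Droid", "Wookiee"]) for _ in range(n)]
    names = [f"Character {i + 1}" for i in range(n)]
    return heights, masses, species, names


# Same seed twice: the two draws should match element for element.
print(same_output(fake_generator, n=10, seed=42))

# Different seeds: the numeric fields are expected to differ.
print(fake_generator(n=10, seed=1)[0] == fake_generator(n=10, seed=2)[0])
```

Assuming the seed fully determines the output, as the parameter description states, passing make_starwars_check_data in place of fake_generator should also return True.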
Returns
| Return | Type | Description |
|---|---|---|
| height_cm | numpy.ndarray | Character heights in cm. Shape (n,). |
| mass_kg | numpy.ndarray | Character masses in kg. Shape (n,). |
| species | list[str] | Species label for each entry (e.g., 'Human', 'Wookiee', 'Droid'). |
| names | list[str] | List of placeholder character names (e.g., 'Character 1'). |
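Because the four return values are parallel sequences, it can help to check that they line up and then zip them into one record per character. The sketch below assumes only the return layout in the table above; to_records is a hypothetical helper, and the three sample rows are hand-written stand-ins rather than real output.

```python
def to_records(height_cm, mass_kg, species, names):
    """Zip four parallel sequences into a list of per-character dicts."""
    lengths = {len(height_cm), len(mass_kg), len(species), len(names)}
    if len(lengths) != 1:
        raise ValueError(f"expected equal lengths, got {sorted(lengths)}")
    return [
        {"name": n, "species": s, "height_cm": float(h), "mass_kg": float(m)}
        for n, s, h, m in zip(names, species, height_cm, mass_kg)
    ]


# Hand-written stand-in values (assumption), NOT generated by the library.
records = to_records(
    height_cm=[176.0, 168.0, 181.5],
    mass_kg=[81.0, 84.5, 79.0],
    species=["Human", "Droid", "Human"],
    names=["Character 1", "Character 2", "Character 3"],
)

# Filter by species, then pick the tallest character overall.
humans = [r for r in records if r["species"] == "Human"]
tallest = max(records, key=lambda r: r["height_cm"])
print(len(humans), tallest["name"])
```

The length check fails early if the sequences ever get out of step, for example after filtering one array but not the others.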
Example Usage
```python
from machinegnostics.data import make_starwars_check_data
import pandas as pd

# Generate character data (defaults: n=87, seed=42)
h, m, s, names = make_starwars_check_data()

# Create a DataFrame for easy viewing
df = pd.DataFrame({
    'Name': names,
    'Species': s,
    'Height': h,
    'Mass': m
})
print(df.head())
# Output (approx):
#           Name Species      Height       Mass
# 0  Character 1   Human  176.452312  81.231231
# 1  Character 2   Droid  168.123123  84.512341
# ...

# Find the average mass of Humans
human_mass = df.loc[df['Species'] == 'Human', 'Mass'].mean()
print(f"Avg Human Mass: {human_mass:.2f} kg")
```