Interactive SkLearn Series - Pipelines
Stop preprocessing by hand. We'll use Pipeline to chain scaling, encoding, and modeling steps into a single object, so your training and test data undergo identical transformations.
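As a rough illustration of the idea, the sketch below chains a scaler and a classifier into one Pipeline. The iris dataset, the step names "scale" and "model", and the choice of LogisticRegression are assumptions made for the example, not part of the lesson itself.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumed example data: the built-in iris dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Each step is a (name, estimator) pair; the names are illustrative.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# fit() learns the scaling statistics from the training data only;
# score() reuses those same statistics on the test data.
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Because the scaler is fitted inside the pipeline, the test set never influences the scaling parameters.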
Interactive SkLearn Series - Custom Transformers
Extend sklearn's capabilities by building your own transformers. We’ll subclass BaseEstimator to create custom cleaning steps that integrate seamlessly into standard sklearn pipelines.
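A minimal sketch of what such a transformer might look like follows. The class name ClipOutliers and its percentile parameters are hypothetical, chosen only for illustration; the sketch also mixes in TransformerMixin, which supplies fit_transform.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ClipOutliers(TransformerMixin, BaseEstimator):
    """Hypothetical cleaning step: clip each column to learned percentiles."""

    def __init__(self, lower=1.0, upper=99.0):
        # Store constructor arguments unchanged so get_params() works.
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learned attributes end with an underscore by sklearn convention.
        self.low_ = np.percentile(X, self.lower, axis=0)
        self.high_ = np.percentile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.clip(X, self.low_, self.high_)


X = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])
clipper = ClipOutliers(lower=0.0, upper=75.0)
X_clean = clipper.fit_transform(X)  # the extreme value is pulled down to 4.0
```

Since the class follows the fit/transform contract, it can be dropped into a Pipeline like any built-in step.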
Interactive SkLearn Series - Feature Discretization
Sometimes continuous data works better as categories. Learn to use KBinsDiscretizer to transform continuous features into buckets, helping linear models handle non-linear relationships.
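For instance, a small sketch with made-up age values might bin a single continuous column into three equal-width buckets. The data and the choice of n_bins=3 with the "uniform" strategy are assumptions for the example.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Made-up ages, one feature column.
ages = np.array([[3.0], [17.0], [25.0], [41.0], [68.0], [90.0]])

# Three equal-width bins, returned as integer-like bin indices.
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
age_bins = binner.fit_transform(ages)

print(binner.bin_edges_[0])  # edges learned from the data
print(age_bins.ravel())      # bin index for each age
```

Switching encode to "onehot" would give one indicator column per bin, which is the form a linear model can use to fit a separate weight for each bucket.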
Interactive SkLearn Series - Categorical Encoding
ML models require numbers, not strings. We’ll convert categorical data into machine-readable formats using One-Hot Encoding for nominal data and Ordinal Encoding for ranked categories.
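The difference between the two encoders can be sketched on toy data. The color and size values below are invented for the example, and the explicit category order passed to OrdinalEncoder is an assumption about how the sizes rank.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Invented toy columns: colors are nominal, sizes are ranked.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])
sizes = np.array([["small"], ["large"], ["medium"], ["small"]])

# One-hot: one indicator column per category, no implied order.
onehot = OneHotEncoder(handle_unknown="ignore")
color_matrix = onehot.fit_transform(colors).toarray()

# Ordinal: integers that follow the order given in `categories`.
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
size_codes = ordinal.fit_transform(sizes)

print(onehot.categories_[0])  # columns are in sorted category order
print(size_codes.ravel())
```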
Interactive SkLearn Series - Handling Missing Values
Real-world data is rarely clean. Move beyond dropping rows by learning to impute missing values using simple strategies like the mean, or advanced multivariate techniques like KNN imputation.
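As a small sketch of both approaches, the snippet below fills the gaps in an invented 4x2 array, first with column means and then with a KNN-based imputer. The data and n_neighbors=2 are assumptions for illustration.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Invented data with one missing value in each column.
X = np.array([
    [1.0, 2.0],
    [np.nan, 4.0],
    [3.0, 6.0],
    [5.0, np.nan],
])

# Univariate: replace each gap with its column mean.
mean_filled = SimpleImputer(strategy="mean").fit_transform(X)

# Multivariate: average the values of the nearest rows, measured
# on the features that are present.
knn_filled = KNNImputer(n_neighbors=2).fit_transform(X)

print(mean_filled)
print(knn_filled)
```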
Interactive SkLearn Series - Feature Scaling
Algorithms like SVM and KNN are sensitive to scale. We’ll apply Standardization and Min-Max scaling to normalize features, ensuring no single variable dominates the model due to magnitude.
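To make the two scalers concrete, here is a short sketch on an invented array whose columns differ in magnitude by a factor of a thousand.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Invented features on very different scales.
X = np.array([[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]])

# Standardization: zero mean and unit variance per column.
standardized = StandardScaler().fit_transform(X)

# Min-max scaling: each column mapped onto the range [0, 1].
rescaled = MinMaxScaler().fit_transform(X)

print(standardized)
print(rescaled)
```

After either transform the two columns are on a comparable scale, so a distance-based model no longer sees the second feature as a thousand times more important.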
Interactive SkLearn Series - Data Splitting Strategies
Never test on training data. Learn to use train_test_split to hold out a test set, then explore stratified splitting to maintain class balance and time-series splitting for temporal data.
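The three strategies could be sketched as follows, using an invented, imbalanced 20-sample dataset. The test_size, random_state, and n_splits values are arbitrary choices for the example.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, train_test_split

# Invented data: 20 ordered samples, 15 of class 0 and 5 of class 1.
X = np.arange(20).reshape(-1, 1)
y = np.array([0] * 15 + [1] * 5)

# Stratified hold-out: the test set keeps the 3:1 class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Time-series splitting: every fold trains on the past and tests on the future.
tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X))
for train_idx, test_idx in folds:
    print("train up to", train_idx.max(), "-> test", test_idx.min(), "to", test_idx.max())
```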
Interactive SkLearn Series - Built-in Datasets
Don't spend hours finding data. We'll use sklearn.datasets to load toy datasets for quick learning and fetch larger, real-world datasets to test your models on actual problems.
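A quick sketch of loading a bundled toy dataset is shown below. Iris is used here only as an example; the fetch_* loaders for larger datasets download their data on first use, so they are mentioned in a comment rather than called.

```python
from sklearn.datasets import load_iris

# Toy datasets ship with sklearn and load without a network connection.
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # samples x features
print(iris.feature_names)
print(iris.target_names)

# Larger real-world datasets use fetch_* functions instead, for example
# sklearn.datasets.fetch_california_housing(), which downloads and caches
# the data the first time it is called.
```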
Interactive SkLearn Series - The Estimator API
The heart of sklearn lies in fit, predict, and transform. Master these three methods to unlock the entire library, treating every algorithm as a standardized "estimator" object.
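The shared interface can be sketched with one transformer and one predictor. The iris data, StandardScaler, and KNeighborsClassifier are arbitrary choices for the example; other estimators follow the same pattern.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Transformers: fit() learns parameters, transform() applies them.
scaler = StandardScaler()
scaler.fit(X)
X_scaled = scaler.transform(X)

# Predictors: fit() learns from (X, y), predict() labels new samples.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_scaled, y)
predictions = clf.predict(X_scaled[:5])

print(predictions)
```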
Interactive SkLearn Series - Data Representation
ML models expect specific structures. We'll cover feature matrices (X) and target vectors (y), and how to bridge the gap between Pandas DataFrames and NumPy arrays for seamless integration.
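As a brief sketch, the snippet below separates a small invented DataFrame into a feature matrix X and a target vector y, then converts both to NumPy arrays. The column names and values are made up for illustration.

```python
import pandas as pd

# Invented housing-style data.
df = pd.DataFrame({
    "sqft": [850, 1200, 1500, 2000],
    "bedrooms": [2, 3, 3, 4],
    "price": [150000, 210000, 250000, 330000],
})

# Feature matrix: 2-D, one row per sample and one column per feature.
X = df[["sqft", "bedrooms"]]
# Target vector: 1-D, one value per sample.
y = df["price"]

# sklearn estimators accept DataFrames directly, but the same data
# is available as plain NumPy arrays when needed.
X_array = X.to_numpy()
y_array = y.to_numpy()

print(X_array.shape, y_array.shape)
```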