Safekipedia
Scikit-learn

Adapted from Wikipedia

Scikit-learn (formerly scikits.learn and also known as sklearn) is a free and open-source machine learning library for the Python programming language. It helps computers learn from data by providing methods for tasks such as classification, regression, and clustering. These methods include support-vector machines, random forests, gradient boosting, k-means, and DBSCAN.
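All of these methods are used in a similar way: you create an object for the method, then call it on your data. The short Python sketch below groups six made-up points with k-means and with DBSCAN. The points and the settings (n_clusters=2, eps=1.0, min_samples=2) are assumptions chosen only to make the two groups easy to see.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Six made-up 2-D points that form two clear groups
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],  # group near (1, 1)
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1],  # group near (8, 8)
])

# k-means must be told how many groups (clusters) to look for
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(points)

# DBSCAN finds groups from how densely packed the points are;
# eps is how far apart two points can be and still count as neighbours
dbscan = DBSCAN(eps=1.0, min_samples=2)
dbscan_labels = dbscan.fit_predict(points)

print("k-means labels:", kmeans_labels)
print("DBSCAN labels: ", dbscan_labels)
```

k-means needs the number of groups in advance, while DBSCAN works out the groups from the distances between points.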

One of the great things about scikit-learn is that it works well with NumPy and SciPy, two Python libraries for working with arrays of numbers and for scientific computing. This makes it easier for people to build and test machine learning models without starting from scratch.
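As a small illustration, here is a minimal sketch in which the same made-up data is given to a LogisticRegression model twice: once as a NumPy array and once as a SciPy sparse matrix, which stores only the non-zero numbers. The data and the choice of model are assumptions for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression

# Four made-up samples with three features each, as a NumPy array
X_dense = np.array([
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [3.0, 0.0, 0.0],
    [4.0, 0.0, 1.0],
])
y = np.array([0, 0, 1, 1])

# The same numbers as a SciPy sparse matrix (zeros are not stored)
X_sparse = sparse.csr_matrix(X_dense)

# The model accepts either form without any extra conversion step
model_dense = LogisticRegression().fit(X_dense, y)
model_sparse = LogisticRegression().fit(X_sparse, y)

print(model_dense.predict(X_dense))
print(model_sparse.predict(X_sparse))
```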

Scikit-learn is supported by NumFOCUS, a group that helps keep important open-source projects running. Because it is free and open to everyone, scikit-learn has become very popular among scientists, teachers, and students who want to explore how machines can learn and make decisions.

Overview

The scikit-learn project began in 2007 as a Google Summer of Code project by French data scientist David Cournapeau. A group of researchers from INRIA in France later took over leadership and released the first public version in 2010. Today, scikit-learn is one of the most popular tools for machine learning and is widely used around the world.

Features

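Scikit-learn provides many useful tools for working with data. It includes a wide range of machine learning methods, as well as preprocessing tools that prepare data before a model uses it, for example by rescaling numbers or by turning categories into numbers.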
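One common way to prepare data is scaling, so that features measured in very different units (such as years and dollars) end up on a similar scale. Here is a minimal sketch using StandardScaler; the numbers are made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two made-up features on very different scales: age in years, income in dollars
X = np.array([
    [25.0, 30000.0],
    [35.0, 50000.0],
    [45.0, 70000.0],
])

scaler = StandardScaler()
# fit learns the mean and standard deviation of each column,
# transform then rescales every column using those values
X_scaled = scaler.fit_transform(X)

print(scaler.mean_)  # the column means learned during fitting
print(X_scaled)      # each column now has mean 0 and standard deviation 1
```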

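The library also offers helpful functions for common tasks, like splitting data into a training part and a testing part and measuring how well a model performs (for example, with cross-validation). Its pipelines chain several steps together, which makes it easy to follow a clear process from start to finish when working with data.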
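Dividing data for testing, checking results, and following one process from start to finish can all be seen in a short sketch. It uses the small iris flower data set that comes with scikit-learn. The 25% test split, the five cross-validation folds, and the choice of StandardScaler followed by LogisticRegression are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Iris: flower measurements (features) and species (classes), bundled with scikit-learn
X, y = load_iris(return_X_y=True)

# Keep 25% of the samples aside for testing; stratify keeps the classes balanced
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A pipeline chains the steps: first scale the data, then fit a classifier
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)

# Accuracy on samples the model never saw during training
test_accuracy = pipe.score(X_test, y_test)

# 5-fold cross-validation: train and test five times on different parts of the training data
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)

print(f"test accuracy: {test_accuracy:.2f}")
print(f"cross-validation accuracy: {cv_scores.mean():.2f}")
```

Because the scaler sits inside the pipeline, it is refitted on the training part of every split, so information from the test part does not leak into training.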

Examples

Here is a simple example of how to use a random forest classifier with scikit-learn. You start by importing RandomForestClassifier and creating an instance of it. Then you prepare some data: X holds the samples, where each row is one sample and each column is a feature, and y holds the label (class) of each sample. Finally, you call the .fit() method to train the classifier on this data.

>>> from sklearn.ensemble import RandomForestClassifier
>>> classifier = RandomForestClassifier(random_state=0)
>>> X = [[ 1,  2,  3],  # 2 samples, 3 features
...      [11, 12, 13]]
>>> y = [0, 1]  # classes of each sample
>>> classifier.fit(X, y)
RandomForestClassifier(random_state=0)
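After training, the classifier can guess the class of new samples with its .predict() method. In this minimal sketch the training step is repeated so the code runs on its own. The two new samples are made up, and with only two training samples the result simply shows how the method is called rather than giving a reliable prediction.

```python
from sklearn.ensemble import RandomForestClassifier

# The same tiny training set as above: 2 samples, 3 features each
X = [[1, 2, 3],
     [11, 12, 13]]
y = [0, 1]  # class of each sample

classifier = RandomForestClassifier(random_state=0)
classifier.fit(X, y)

# Made-up new samples that the model has not seen before
new_samples = [[4, 5, 6],
               [14, 15, 16]]
predictions = classifier.predict(new_samples)  # one predicted class per sample
print(predictions)
```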

Implementation

Scikit-learn is mostly written in Python and relies heavily on NumPy for fast array operations and linear algebra. Some core algorithms are written in Cython to make them run faster.

It works well with many other Python libraries, such as Matplotlib and Plotly for making graphs, pandas for handling data tables, and SciPy for more advanced scientific calculations.

History

Scikit-learn was initially developed by David Cournapeau as a Google Summer of Code project in 2007. Later that year, Matthieu Brucher joined the project and began using it for his thesis work. In 2010, INRIA, the French Institute for Research in Computer Science and Automation, got involved, and the first public release (v0.1 beta) was published in late January 2010.

The project reached version 1.0.0 on September 24, 2021. The latest version at the time of writing, 1.8, was released on December 10, 2025. This update expanded the library's support for the Array API standard, which lets more parts of scikit-learn work directly with PyTorch and CuPy arrays and run computations on a GPU. It also brought various bug fixes and new features.

Applications

Scikit-learn is used in many different fields to help computers learn from data. In finance and insurance, companies like AXA and J.P. Morgan use it to make better decisions about insurance and banking. In travel, Booking.com uses it to recommend hotels and to detect fraudulent bookings.

Other companies, such as Spotify and Evernote, use scikit-learn to make recommendations and to organize large amounts of information. Schools like Télécom ParisTech use it to teach students about machine learning.

Awards

Scikit-learn has received notable awards for its contributions to machine learning and open science. In 2019, it was awarded the Inria-French Academy of Sciences-Dassault Systèmes Innovation Prize for its major impact on free software in machine learning. In 2022, it received the Open Science Award for Open Source Research Software from the French Ministry of Higher Education and Research, recognized for its technical quality, large international contributor network, and excellent documentation.

This article is a child-friendly adaptation of the Wikipedia article on Scikit-learn, available under CC BY-SA 4.0.
