Machine Learning Tools
(Set up the environment)

Why Python?

The Alternatives

MATLAB - good for matrix manipulations, plotting of functions and data, implementation of algorithms, numerical and symbolic computing.
cons: Proprietary Software, Domain Specific
GNU Octave - a scientific programming language. Could be regarded as the free equivalent of MATLAB
cons: Domain Specific
R Programming Language - a free software environment for statistical computing and graphics
cons: Complex nature and steep learning curve
Java - Mature, multipurpose language. Lots of machine learning algorithms are written in Java

Python's pros

Clean syntax - easy to write and read
Mature, multipurpose programming language
Vibrant community - scientists, programmers
Myriad of ML packages and tools

Setting up Python

Anaconda users notes

The next slides guide you through the process of installing the tools needed for Data Science/ML in Python.

But if you are already comfortable with Anaconda (i.e. you know what your $PYTHONPATH is), you can skip the next slides.

Install Python on Windows - HowTo

Note, we'll be using Python 3!

Download the Installer:
Python Releases for Windows
Select the 32-bit or 64-bit version according to your system
Customize installation:
installation location: C:/Python3 or whatever you like
Select "Add Python to PATH"

Install Python on Windows - more info

How to Install Python on Windows @howtogeek.com
Using Python on Windows @docs.python.org

Install Python on Mac - more info

Using Python on a Macintosh @docs.python.org

Install Python on Linux

Using Python on Linux @docs.python.org

Verify installation:

Open a Command Prompt/Terminal and type:

					# check for python installed version:
					python --version

					# check for pip installed version:
					pip --version
				
On Unix-based systems, use python3 and pip3 instead.
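The same check can also be done from inside Python, which is handy once several interpreters are installed. This is a minimal sketch; `describe_python`, `is_python3` and `find_pip` are hypothetical helper names, and `find_pip` simply searches your PATH the way the terminal does.

```python
import shutil
import sys
from typing import Optional


def describe_python() -> str:
    """Return the version and location of the interpreter running this code."""
    major, minor, micro = sys.version_info[:3]
    return f"Python {major}.{minor}.{micro} at {sys.executable}"


def is_python3() -> bool:
    """True if this interpreter is Python 3 (the version used in this course)."""
    return sys.version_info.major == 3


def find_pip() -> Optional[str]:
    """Return the first pip executable found on PATH, or None."""
    # prefer pip3, which on Unix-based systems usually belongs to Python 3
    return shutil.which("pip3") or shutil.which("pip")


if __name__ == "__main__":
    print(describe_python())
    print("pip found at:", find_pip())
```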

PIP - the Python Package Manager

The Python Package Index (PyPI) is a repository of software for the Python programming language.
pip is the recommended tool for installing Python packages from PyPI.
Note that pip installs a package into your system (or user) Python distribution.
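Because pip installs into whichever Python it belongs to, one way to be sure where a package lands is to run pip through the interpreter itself (`python -m pip ...`). A minimal sketch, assuming pip is available for the current interpreter; `pip_command` and `run_pip` are hypothetical helper names.

```python
import subprocess
import sys


def pip_command(*args: str) -> list:
    """Build `python -m pip ...` for the interpreter running this script."""
    # sys.executable is the full path of the current Python, so pip installs
    # into exactly this interpreter's environment (system, user or virtualenv)
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args: str) -> str:
    """Run pip for this interpreter and return its standard output."""
    result = subprocess.run(pip_command(*args), capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # prints pip's version and the Python it is installed for
    print(run_pip("--version"))
```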

More readings

pip - The PyPA recommended tool for installing Python packages.
PyPI - the Python Package Index

Python Virtual Environments

Virtual environment - Why?

Python apps usually depend on many modules (packages), which in turn depend on other packages, very often on a specific version of a package.
Many system tools also depend on specific versions of installed Python packages.
You don't want to pollute your system's Python environment!

Nor to go through dependency hell!

Virtual environment - What?

Virtual environments create an isolated Python environment for your project.
Keep workflows between projects independent.
There are many tools for working with virtual environments in Python (virtualenv, pyenv, venv, pipenv, poetry, conda, etc.)
During this course we'll be using pipenv!
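To see what "isolated" means concretely, the sketch below uses the standard-library `venv` module (one of the tools listed above, not pipenv) to create a throwaway environment and ask its interpreter whether it runs inside a virtual environment. The helper names are hypothetical; the `sys.prefix != sys.base_prefix` test covers venv/virtualenv environments, while conda environments are detected differently.

```python
import os
import subprocess
import sys
import tempfile
import venv


def in_virtualenv() -> bool:
    """True when running inside a venv/virtualenv-created environment."""
    # old virtualenv versions set sys.real_prefix; venv sets sys.prefix to the
    # environment folder while sys.base_prefix points at the original Python
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def python_in(env_dir: str) -> str:
    """Path of the interpreter inside a venv folder (the layout differs on Windows)."""
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def demo_isolation() -> bool:
    """Create a temporary venv and ask its own interpreter if it is isolated."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = os.path.join(tmp, "demo-env")
        venv.create(env_dir, with_pip=False)  # pip is not needed for this check
        out = subprocess.run(
            [python_in(env_dir), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip() == "True"


if __name__ == "__main__":
    print("this script runs in a virtualenv:", in_virtualenv())
    print("a freshly created venv is isolated:", demo_isolation())
```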

Higher-level tools (pipenv)

pipenv - package, dependency and environment manager, that aims to bring the best of all packaging worlds (bundler, composer, npm, cargo, yarn, etc.) to the Python world. Windows is a first-class citizen.
Simplicity, Security, Portability (Linux, MacOS, Windows)
Easy to manage the project's dependencies. Instead of pip's requirements.txt, pipenv creates two files: the Pipfile, which lists the packages you installed directly, and the Pipfile.lock, which pins the exact versions of those packages and of all their dependencies.

Higher-level tools (conda)

conda - package, dependency and environment management for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN
Portability (Linux, MacOS, Windows)
Included in all versions of Anaconda and Miniconda.

pipenv

What?

A dependency and virtual environments manager combined into one.
Automatically creates and manages a virtualenv for your projects.
Functionally similar to npm and other modern package managers.
Useful/advanced readings:
docs.pipenv.org
Simple Python Environments For Data Science

Install/Update/Remove


			$ pip install --user pipenv
		

			$ pip install --user --upgrade pipenv
		

			$ pip uninstall pipenv
		

Set up PATH

If pipenv isn't available in your shell after installation, you'll need to add the user base's binary directory to your PATH.
Find the user base binary directory and add it to PATH:
Linux and macOS:

					python -m site --user-base
					# /home/username/.local

					# add it to your ~/.profile or ~/.bashrc file, or:
					export PATH=$PATH:/home/username/.local/bin
			
Windows:

				py -m site --user-site
				# C:\Users\Username\AppData\Roaming\Python\Python36\site-packages

				# add to PATH the same folder, with Scripts instead of site-packages:
				C:\Users\Username\AppData\Roaming\Python\Python36\Scripts
			
You may need to log out for the PATH changes to take effect.
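If you are unsure which folder to add, Python can compute it for you. This sketch assumes the standard `sysconfig` user install schemes (`nt_user`, `posix_user`, `osx_framework_user`), which are what `pip install --user` uses; `user_scripts_dir` and `on_path` are hypothetical helper names.

```python
import os
import sys
import sysconfig


def user_scripts_dir() -> str:
    """Folder where `pip install --user` puts executables such as pipenv."""
    if os.name == "nt":
        scheme = "nt_user"
    elif sys.platform == "darwin" and sysconfig.get_config_var("PYTHONFRAMEWORK"):
        scheme = "osx_framework_user"  # python.org framework builds on macOS
    else:
        scheme = "posix_user"
    return sysconfig.get_path("scripts", scheme)


def on_path(directory: str) -> bool:
    """True if `directory` is one of the entries of the PATH variable."""
    target = os.path.normcase(os.path.normpath(directory))
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any(os.path.normcase(os.path.normpath(e)) == target for e in entries if e)


if __name__ == "__main__":
    scripts = user_scripts_dir()
    print(scripts, "is on PATH" if on_path(scripts) else "is NOT on PATH yet")
```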

Manage virtualenv with pipenv

Notes

A virtual environment is associated with your project folder (and its subfolders, up to a depth of 3 by default).
If you need more, set PIPENV_MAX_DEPTH.
You don't have to care where the virtualenv really lives.
But make sure you run the following commands in your project folder.
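The idea behind the depth limit can be sketched as a search for a Pipfile in the current folder and its parents. This is a simplified illustration, not pipenv's actual code, and pipenv may count the levels slightly differently; `find_pipfile` is a hypothetical helper that only reads the `PIPENV_MAX_DEPTH` variable named above.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def find_pipfile(start: Path, max_depth: Optional[int] = None) -> Optional[Path]:
    """Hypothetical helper: look for a Pipfile in `start` and its parent folders."""
    if max_depth is None:
        # mirror pipenv's setting: PIPENV_MAX_DEPTH, default 3
        max_depth = int(os.environ.get("PIPENV_MAX_DEPTH", "3"))
    current = start.resolve()
    for _ in range(max_depth):  # check at most `max_depth` folders
        candidate = current / "Pipfile"
        if candidate.is_file():
            return candidate
        if current.parent == current:  # reached the filesystem root
            break
        current = current.parent
    return None


if __name__ == "__main__":
    # build project/a/b/c with a Pipfile at the project root
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp).resolve()
        deep = root / "a" / "b" / "c"
        deep.mkdir(parents=True)
        (root / "Pipfile").write_text("")
        print(find_pipfile(root / "a" / "b", 3))  # found: b, a and root are checked
        print(find_pipfile(deep, 3))              # None: only c, b and a are checked
```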

Create Virtual Environment

Pipenv automatically creates a virtual environment (if there isn't one yet) and installs a package in it when you run:

				pipenv install packagename
			
The default (system) Python version will be used.

Manage virtualenv


				### Create virtualenv with the system Python 3:
				$ pipenv --python 3
				Creating a virtualenv for this project…

				### Spawn (activate) a shell within the virtualenv
				$ pipenv shell
				Spawning environment shell...

				### Exit the virtualenv
				(course_intro-SmgljMaj) $ exit
				exit
				$

				### Output virtualenv information
				$ pipenv --venv

				### Remove the virtualenv
				$ pipenv --rm
				Removing virtualenv...
		

All options available with: pipenv --help

PIPENV_VENV_IN_PROJECT

Pipenv creates the virtualenv automatically, stores it in a predefined location and gives it a name made of your project folder's name plus a random suffix.
There is no (easy) way to control the name of the virtual environment being created.
But if you set the environment variable PIPENV_VENV_IN_PROJECT=1, pipenv will store the virtualenv in a .venv/ folder inside each project and will name it after your project.
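If you drive pipenv from a script, the variable can be set for a single call instead of globally. A minimal sketch, assuming pipenv is installed and on PATH; `pipenv_env`, `expected_venv_dir` and `pipenv_install` are hypothetical helper names, and only the `pipenv install` command comes from these slides.

```python
import os
import shutil
import subprocess


def pipenv_env(in_project: bool = True) -> dict:
    """Copy of the current environment, optionally with PIPENV_VENV_IN_PROJECT=1."""
    env = dict(os.environ)
    if in_project:
        env["PIPENV_VENV_IN_PROJECT"] = "1"
    return env


def expected_venv_dir(project_dir: str) -> str:
    """Where the virtualenv should appear when PIPENV_VENV_IN_PROJECT=1 is set."""
    return os.path.join(project_dir, ".venv")


def pipenv_install(project_dir: str, *packages: str) -> None:
    """Run `pipenv install ...` in project_dir with an in-project virtualenv."""
    if shutil.which("pipenv") is None:
        raise RuntimeError("pipenv is not on PATH - see the 'Set up PATH' slide")
    subprocess.run(["pipenv", "install", *packages],
                   cwd=project_dir, env=pipenv_env(), check=True)
```

Usage: `pipenv_install("my/project/folder", "requests")` should leave the environment in `my/project/folder/.venv`.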

Manage packages for your project



			### install the requests package (and create a virtualenv if there is none)
			$ pipenv install requests
			Installing requests…

			### remove package
			$ pipenv uninstall requests
			Uninstalling requests-2.18.4
		

Show the installed packages of the current virtualenv as a dependency graph


			$ pipenv graph
		

Use the installed module in your program

Create a file my_ip.py
Write in it:

				import requests

				response = requests.get('https://httpbin.org/ip')

				print('Your IP is {0}'.format(response.json()['origin']))
			
Execute your program within the pipenv shell:

				# go to your project folder:
				$ cd your/project/folder

				# activate the virtualenv:
				$ pipenv shell

				# run the program within the virtualenv:
				(course_intro-SmgljMaj) $ python my_ip.py
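A slightly more defensive version of my_ip.py adds a timeout and turns HTTP errors into exceptions. This is a sketch, not the course's reference solution; `extract_ip` and `main` are hypothetical names, and `requests` is imported inside `main` only so the parsing helper can be tried without the package installed.

```python
def extract_ip(payload: dict) -> str:
    """Pull the 'origin' field out of an httpbin.org/ip JSON response."""
    return payload["origin"]


def main() -> None:
    import requests  # installed in the virtualenv with `pipenv install requests`

    response = requests.get("https://httpbin.org/ip", timeout=10)  # don't hang forever
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    print(f"Your IP is {extract_ip(response.json())}")


if __name__ == "__main__":
    main()
```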
			

Exit the virtual environment


			exit
		

Or press Ctrl-D

Or close your terminal window :)

Let pipenv install Python distribution

You can create a virtual environment with any Python version, without the need to have it system installed
Pipenv installs missing Python versions with the help of pyenv, so you need to install pyenv first.
pyenv installation

			# install pyenv
			curl -L https://raw.githubusercontent.com/yyuu/pyenv-installer/master/bin/pyenv-installer | bash

			# add it to path
			echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bash_profile
			echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bash_profile
		

Let pipenv install Python distribution

Install the requirements (pyenv) first!
Now you can make pipenv create a virtualenv with any Python version:

				#Create a new project using Python 3.6, specifically:
				$ pipenv --python 3.6

				# pipenv will ask whether it should install Python 3.6, if it is not on your system
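To confirm that a script really runs on the Python version you asked pipenv for, a guard at the top of the script can stop early with a readable message. A minimal sketch; `require_python` is a hypothetical helper, run for example with `pipenv run python check.py`.

```python
import sys


def require_python(major: int, minor: int) -> None:
    """Exit with a clear message if the interpreter is older than major.minor."""
    if sys.version_info[:2] < (major, minor):
        found = "{}.{}".format(*sys.version_info[:2])
        sys.exit(f"This project needs Python {major}.{minor}+, but found {found}")


if __name__ == "__main__":
    require_python(3, 6)
    print("running on", sys.executable)
```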
			

Useful video

Intro to Pipenv - A Package Manager for Python by Pretty Printed

Useful video

Kenneth Reitz - Pipenv: The Future of Python Dependency Management - PyCon 2019

pipenv - example: simple_plot with matplotlib

Create the file

Create the file "simple_plot.py" in your project root folder.
Enter the code below:

			import matplotlib.pyplot as plt
			import numpy as np

			t = np.arange(0.0, 2.0, 0.01)
			s = 1 + np.sin(2*np.pi*t)
			plt.plot(t, s)

			plt.xlabel('time (s)')
			plt.ylabel('voltage (mV)')
			plt.title('About as simple as it gets, folks')
			plt.grid(True)
			plt.savefig("test.png")
			plt.show()
		

example from: matplotlib.org
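On a machine without a display (a remote server, a container), plt.show() cannot open a window. A sketch of the same plot using the non-interactive "Agg" backend and matplotlib's object-oriented API, which only writes the image file; `simple_plot` is a hypothetical function name.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, needs no display
import matplotlib.pyplot as plt
import numpy as np


def simple_plot(path: str = "test.png") -> str:
    """Draw the 1 + sin(2*pi*t) curve and save it; return the file's absolute path."""
    t = np.arange(0.0, 2.0, 0.01)
    s = 1 + np.sin(2 * np.pi * t)

    fig, ax = plt.subplots()
    ax.plot(t, s)
    ax.set(xlabel="time (s)", ylabel="voltage (mV)",
           title="About as simple as it gets, folks")
    ax.grid(True)
    fig.savefig(path)
    plt.close(fig)  # release the figure's memory
    return os.path.abspath(path)


if __name__ == "__main__":
    print("saved", simple_plot())
```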

Prepare Virtualenv

Open a terminal window and navigate to your project root folder.
Make sure you are in it:

			# navigate...
			$ cd your/project/root/folder

			# make sure you are in it - get your current working directory:
			$ pwd
			your/project/root/folder
		

Run the program

Try to run the simple_plot.py program:


			$ python simple_plot.py
			Traceback (most recent call last):
				File "simple_plot.py", line 1, in <module>
					import matplotlib.pyplot as plt
			ModuleNotFoundError: No module named 'matplotlib'
		

Yes, an error occurs, because the required module (matplotlib) is not installed.
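Instead of waiting for the traceback, a script can check up front which modules are missing. A sketch using the standard `importlib.util.find_spec`; `missing_modules` is a hypothetical helper, and it should be given top-level names only (e.g. "matplotlib", not "matplotlib.pyplot"), because looking up a submodule of a missing package raises an error.

```python
import importlib.util


def missing_modules(*names: str) -> list:
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules("numpy", "matplotlib")
    if missing:
        print("Missing:", ", ".join(missing))
        print("Install them with: pipenv install", " ".join(missing))
```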

Install dependencies in a virtual environment

We will install the required modules in a safe virtual environment by pipenv!


			# install packages safely with pipenv:
			$ pipenv install matplotlib
			Creating a virtualenv for this project…
			...
			To activate this project's virtualenv, run the following:
				$ pipenv shell
		

Activate the virtual environment


				$ pipenv shell
					Spawning environment shell...
			

Run the program within the virtual environment

Run the simple_plot.py program:

				python simple_plot.py
			
It should run now: a window with the plot will open, and test.png will be saved in your project folder.


Set up numpy, pandas, matplotlib, Jupyter Notebooks, scikit-learn

Meet your tools

numpy

NumPy is the fundamental package for scientific computing with Python.
It contains among other things:
a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities
Reference: numpy.org

			$ pipenv install numpy
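A short taste of two items from the list above, the N-dimensional array with broadcasting and the linear algebra tools. The numbers are arbitrary illustration values.

```python
import numpy as np

# broadcasting: a (3, 1) column plus a (4,) row gives a (3, 4) grid, no loops needed
col = np.arange(3).reshape(3, 1)  # [[0], [1], [2]]
row = np.arange(4) * 10           # [0, 10, 20, 30]
grid = col + row
print(grid.shape)  # (3, 4)
print(grid)
# [[ 0 10 20 30]
#  [ 1 11 21 31]
#  [ 2 12 22 32]]

# linear algebra: solve the system 2x + y = 5, x - y = 1
A = np.array([[2.0, 1.0], [1.0, -1.0]])
b = np.array([5.0, 1.0])
x = np.linalg.solve(A, b)
print(x)  # [2. 1.]
```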
		

pandas

The Python Data Analysis Library
Provides high-performance, easy-to-use data structures and data analysis tools
An open source, BSD-licensed library
Built on top of numpy
Many Python ML tools work closely with pandas (e.g. statsmodels and scikit-learn)
Reference: pandas.pydata.org

			$ pipenv install pandas
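A minimal look at the DataFrame, the central pandas data structure, and its link to numpy. The readings are toy values made up for illustration.

```python
import numpy as np
import pandas as pd

# toy readings, made up for illustration
df = pd.DataFrame({
    "sensor": ["a", "b", "a", "c"],
    "temp_c": [21.5, 24.0, 19.5, 22.0],
})

# mean temperature per sensor (groupby sorts the keys by default)
mean_by_sensor = df.groupby("sensor")["temp_c"].mean()
print(mean_by_sensor)

# to_numpy() hands a column back as a plain NumPy array
values = df["temp_c"].to_numpy()
print(type(values))  # <class 'numpy.ndarray'>
```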
		

Matplotlib

A Python 2D plotting library
Produces publication quality figures in a variety of formats
Matplotlib tries to make easy things easy and hard things possible
Can generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a few lines of code
Reference: matplotlib.org

			$ pipenv install matplotlib
		

seaborn

Seaborn is a sophisticated Python data visualization library
Based on matplotlib
Closely integrated with pandas data structures
Features: An introduction to seaborn
Reference: seaborn.pydata.org

			$ pipenv install seaborn
		

The Jupyter Notebook

A Web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text
Supports over 40 programming languages
Your code can produce rich, interactive output
Big data integration
Easily share your Notebooks using email, Dropbox, GitHub, the Jupyter Notebook Viewer and more
Reference: jupyter.org

Note, in this course we will use the JupyterLab

JupyterLab

JupyterLab is the next-generation user interface for Project Jupyter
Has all the familiar building blocks of the classic Jupyter Notebook
Provides more flexible and powerful user interface.
Can be extended using extensions that are npm packages
Reference: jupyterlab @github

			$ pipenv install jupyterlab
		

scikit-learn

scikit-learn - Machine Learning in Python
Simple and efficient tools for data mining and data analysis
Accessible to everybody, and reusable in various contexts
Built on NumPy, SciPy, and matplotlib
Open source, commercially usable - BSD license
Reference: scikit-learn.org

			$ pipenv install scikit-learn
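A typical scikit-learn workflow in a few lines: load a dataset, split it, fit a model and score it on held-out data. This sketch uses the Iris dataset bundled with scikit-learn and a logistic regression classifier, chosen only as a simple example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# the Iris dataset ships with scikit-learn, so no download is needed
X, y = load_iris(return_X_y=True)

# keep 25% of the samples aside to measure accuracy on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = LogisticRegression(max_iter=1000)  # extra iterations to ensure convergence
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # fraction of correct predictions
print(f"test accuracy: {accuracy:.2f}")
```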
		

One-line Install


			# navigate to your project folder:
			$ cd my/project/folder

			# install packages:
			$ pipenv install numpy pandas seaborn matplotlib jupyterlab scikit-learn
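To check that everything landed in the virtualenv, run a small script inside it (e.g. with `pipenv run python check_tools.py`, where check_tools.py is a hypothetical file name). This sketch uses the standard `importlib.metadata` module (Python 3.8+) and the distribution names from the install command above.

```python
from importlib import metadata

# distribution names as used in the `pipenv install ...` command
PACKAGES = ["numpy", "pandas", "seaborn", "matplotlib", "jupyterlab", "scikit-learn"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:14} {version or 'NOT INSTALLED'}")
```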
		

Git && GitHub

What is a VCS and how does it make life easier?

VCS (Version Control System):
a system that records changes to a file (or directory) over time so that you can recall specific versions later
You can version control all kinds of work you do:
Poetry, Images, Music, ...
Programming
DataScience/ML Projects and the DataSets
Easily share your work with other team members
There are different types and implementations of VCS.
In this course we'll use git && GitHub

Installing and configuring git

Installing Git
Note for Windows users - install from gitforwindows.org, which also brings nice features such as Git Bash, shell integration and more

GitHub

github.com is a web hosting service for projects using git
Alternatives:
GitLab
BitBucket
Create a free account on GitHub: Join GitHub

VC Your Project and share it on GitHub

Adding an existing project to GitHub using the command line

Test it with the Demo Project

Live Demo

Repo: ProgressBG-MLwithPython-Code
Branch: machine_learning_tools

References

NumPy on Youtube

Intro to Numerical Computing with NumPy (Beginner) | SciPy 2019 Tutorial | Alex Chabot-Leclerc

Pandas on Youtube

Brandon Rhodes - Pandas From The Ground Up - PyCon 2015

JupyterLab on YouTube

JupyterLab: The Next-Generation Jupyter Frontend

These slides are based on a customised version of Hakimel's reveal.js framework