Machine Learning and Artificial Intelligence are considered an integral part of future technologies.
Artificial Intelligence is an area focused on developing intelligent machines that work and react like humans. To achieve this, Artificial Intelligence draws on all the traits that can help, including perception, learning, and planning. Machine learning, on the other hand, focuses on developing programs that can access data and use it to learn for themselves. In short, Artificial Intelligence focuses on making machines smart, i.e. able to react as the situation demands, whereas machine learning is based on giving machines access to data so that they learn by themselves, which makes their decisions learned rather than smart.
For the purview of our topic, let’s focus on Machine Learning now.
Machine Learning can be implemented in many programming languages, and Python leads the lot as the most widely used of them. Major tech companies are investing heavily in these fields and are looking for fresh minds to work in them.
Now for the most important question: how do I get started?
Well, we’ll come to that in a moment.
For now, let’s focus on what’s needed to get started.
First, we need to know some common terms used in machine learning.
An algorithm is a set of rules followed to solve a problem. Machine learning has a host of algorithms, and they vary depending on the type of machine learning.
A machine learning model is built from a learning algorithm and the training data it learns from, i.e. a model is developed by applying an algorithm to suitable training data.
Features are the independent, individual variables that act as inputs to the model.
The label is the final output. While the model is being trained (in supervised machine learning), the labels are the known answers the model learns to associate with each set of features; during prediction, the label is the predicted value (output).
Pre-processing describes any process needed to prepare raw data for further processing, such as training a model.
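To tie these terms together, here is a minimal sketch using scikit-learn (assumed to be installed). The data, the column meanings (hours studied, hours slept), and the choice of a decision tree are all made up for illustration. It is not a recommended model for any real problem.

```python
from sklearn.tree import DecisionTreeClassifier

# Features: each row is one example, each column one input feature
# (hypothetical columns: hours studied, hours slept).
X = [[1, 4], [2, 5], [8, 7], [9, 8]]

# Labels: the known output for each row (hypothetical: 0 = fail, 1 = pass).
y = [0, 0, 1, 1]

# Algorithm: a decision-tree learner. Fitting it on the training data produces the model.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Prediction: the trained model outputs a label for features it has not seen before.
prediction = model.predict([[7, 8]])
print(prediction)
```

Swapping `DecisionTreeClassifier` for another algorithm changes how the model is built, but the roles of features, labels, and prediction stay the same.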
Now let’s get started!
First and foremost, we need to start with the building block of ML, and that is DATA.
Whoa! You knew that? Great!
So, we said data, and believe me, we require a lot of it. Data often comes from years of studies and observations, and it may sometimes be erroneous or incomplete.
Do we need a miracle to handle that? Nah! We have a concept known as pre-processing (which includes data encoding). Using it, we convert textual data into numerical data, because machine learning models are mathematical functions and mathematical functions do not accept textual data as input. For features that are already numerical, we fill in the missing values with the mean, median, or mode. And yes, how well this ‘makeup’ works depends on the variance of the data: the more widely a feature’s values are spread, the less representative a single filled-in value is.
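As a concrete illustration, here is a minimal sketch with pandas (assumed installed). The data frame, its columns, and its values are hypothetical. It fills numerical gaps with the mean or median, fills a text column with its mode, and then turns the text column into 0/1 numerical columns.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with gaps (np.nan / None mark missing values).
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29],
    "salary": [50_000, 64_000, 58_000, np.nan, 1_000_000],
    "city": ["Pune", "Delhi", None, "Pune", "Pune"],
})

# Numerical feature: fill the gap with the column mean.
df["age"] = df["age"].fillna(df["age"].mean())

# The median is far less affected by the outlier salary (1,000,000) than the mean would be.
df["salary"] = df["salary"].fillna(df["salary"].median())

# Text feature: fill the gap with the most frequent value (the mode).
df["city"] = df["city"].fillna(df["city"].mode()[0])

# Encode text as numbers: one 0/1 column per city (one-hot encoding).
encoded = pd.get_dummies(df, columns=["city"], dtype=int)
print(encoded)
```

The salary column shows the variance point in practice. With one extreme value present, the mean would be pulled far away from the typical salary, so the median is the safer choice for filling in the gap.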
ML can be divided into three categories: supervised, unsupervised, and reinforcement learning. For simplicity, we’ll stick with supervised learning for now. In supervised learning, we divide the data into an input data set (features) and an output data set (labels). Depending on the type of ML algorithm we choose to move forward with, we also need to decide how the output labels are categorized. For example, they might be discrete classes (classification) or continuous values (regression).
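For a classification problem, text labels usually have to be mapped to integers before training. This is a minimal sketch with scikit-learn’s `LabelEncoder`; the spam/ham labels are made up. For a regression problem, the labels would simply stay as continuous numbers.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical output labels for a classification problem (discrete classes).
labels = ["spam", "ham", "ham", "spam", "ham"]

# LabelEncoder maps each class name to an integer 0..n_classes-1 (classes sorted alphabetically).
encoder = LabelEncoder()
y = encoder.fit_transform(labels)
print(encoder.classes_)  # class names, in the order of their integer codes
print(y)

# inverse_transform turns predicted integers back into class names.
print(encoder.inverse_transform([1, 0]))
```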
Next, we move on to recursive feature elimination, where we select the ML algorithm and specify how many features will be kept when the model is fit.
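The sketch below shows the idea behind recursive feature elimination from scratch: fit, drop the weakest feature, refit, and stop when the requested number remains. It is a hypothetical, simplified helper, not scikit-learn’s implementation. It assumes a linear model and features on a similar scale, so that the absolute coefficient size can serve as the importance measure.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def recursive_feature_elimination(X, y, n_features_to_select):
    """Hypothetical helper: repeatedly fit, drop the weakest feature, and refit."""
    remaining = list(range(X.shape[1]))  # indices of features still in play
    while len(remaining) > n_features_to_select:
        model = LinearRegression().fit(X[:, remaining], y)
        # Importance = absolute coefficient (assumes features share a similar scale).
        weakest = int(np.argmin(np.abs(model.coef_)))
        remaining.pop(weakest)  # position in `remaining` matches the coefficient order
    return remaining


# Synthetic data: 5 features, but only features 0 and 2 actually influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=200)

selected = recursive_feature_elimination(X, y, n_features_to_select=2)
print(selected)
```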
Following me so far? Cool! Now the data set needs to be further divided into training and test data. We should always follow best practices, so we’ll go for K-fold cross-validation. It not only divides the data but also helps in finding the best possible fit for the model.
In K-fold cross-validation, the data is split into K folds, and the model is trained and tested K times. In each round, one fold is used as the test data and the remaining folds are used to train the model, so each fold forms the test data exactly once. Each round of the iteration is given a score.
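Here is a minimal sketch of those rounds using scikit-learn’s `KFold`. It uses the bundled iris data set and logistic regression purely as stand-ins. Each round holds out one fold as test data, trains on the rest, and records an accuracy score.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)

# 5 folds; shuffling matters here because the iris rows are ordered by class.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
seen_as_test = []
for train_idx, test_idx in kf.split(X):
    # One fold is held out as test data; the remaining K-1 folds train the model.
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    # Each round gets a score: accuracy on the held-out fold.
    scores.append(model.score(X[test_idx], y[test_idx]))
    seen_as_test.extend(test_idx.tolist())

print([round(s, 3) for s in scores])
# Every sample is used as test data exactly once across the K rounds.
print(sorted(seen_as_test) == list(range(len(X))))
```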
Based on the results, we determine the combination of folds that gives the best-fit model. Once we have the best-fit model for the algorithm and the data provided, this model can be persisted (saved to a file) and used in the future to predict output labels.
Let’s get a better understanding of this:
Note that the steps listed below are similar for any language, but the tools and software mentioned are for Python developers.
1. Dataframe Creation
Data is captured using NumPy’s N-dimensional arrays and pandas data frames.
Data frames are like Excel sheets in which we can assign index labels or names to rows and columns. Each column in a data frame represents a feature.
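A minimal sketch of this step, assuming NumPy and pandas are installed. The values, column names, and row labels are made up. It wraps a 2-D NumPy array in a data frame with named columns (one per feature) and a custom row index.

```python
import numpy as np
import pandas as pd

# A NumPy N-dimensional array (here 2-D: 3 rows x 3 columns) of made-up measurements.
values = np.array([[5.1, 3.5, 1.4],
                   [4.9, 3.0, 1.4],
                   [6.2, 3.4, 5.4]])

# Named columns (one per feature) and a custom row index, like labelled rows in a spreadsheet.
df = pd.DataFrame(values,
                  columns=["sepal_length", "sepal_width", "petal_length"],
                  index=["row_a", "row_b", "row_c"])

print(df)
print(df["petal_length"])  # select one feature (column) by name
print(df.loc["row_c"])     # select one row by its index label
```

In practice the data frame is more often loaded from a file, for example with `pd.read_csv`, than built by hand.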
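2. Data Pre-processing
To convert textual data into numerical data, it’s common to use OneHotEncoder or LabelEncoder, but the choice is up to the developer. Some prefer CountVectorizer, TfidfVectorizer, or HashingVectorizer, which are designed for free-form text. (In scikit-learn, LabelEncoder is intended for output labels; for input features, OneHotEncoder or OrdinalEncoder is the usual choice.)
To deal with blank values, we can use SimpleImputer, which replaced the older Imputer class in scikit-learn.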
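A minimal sketch of both tools, assuming scikit-learn is installed. The ages and city names are made up. `SimpleImputer` fills the missing number with the column mean, and `OneHotEncoder` turns each city into its own 0/1 column.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Numerical column with a missing value (np.nan); scikit-learn expects 2-D input.
ages = np.array([[25.0], [np.nan], [35.0]])
imputer = SimpleImputer(strategy="mean")  # other strategies: "median", "most_frequent", "constant"
ages_filled = imputer.fit_transform(ages)
print(ages_filled.ravel())

# Textual column -> one 0/1 column per category.
cities = np.array([["Pune"], ["Delhi"], ["Pune"]])
encoder = OneHotEncoder(handle_unknown="ignore")  # unseen categories become all-zero rows
cities_encoded = encoder.fit_transform(cities).toarray()  # output is sparse by default
print(encoder.categories_)
print(cities_encoded)
```

In a real project, the imputer and encoder are usually fit on the training data only, and then reused to transform the test data.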
3. Splitting of the data frame
As mentioned above, a data frame can easily be split into input features and output labels.
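A minimal sketch, assuming pandas is installed. The column names are hypothetical, with `passed` playing the role of the output label. Everything except the label column becomes the input features.

```python
import pandas as pd

# Hypothetical data frame: three feature columns and one label column ("passed").
df = pd.DataFrame({
    "hours_studied": [1, 2, 8, 9],
    "hours_slept": [4, 5, 7, 8],
    "attendance": [0.5, 0.6, 0.9, 0.95],
    "passed": [0, 0, 1, 1],
})

X = df.drop(columns=["passed"])  # input features: every column except the label
y = df["passed"]                 # output label
print(X.shape, y.shape)
```

If the label is always the last column, `df.iloc[:, :-1]` and `df.iloc[:, -1]` achieve the same split by position.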
4. Recursive Feature Elimination
Recursive feature elimination is a process that repeatedly builds a model, removes the least important feature(s), and repeats until only the specified number of features remains. In this step, after splitting the data frame, we apply recursive feature elimination with the selected machine learning algorithm and specify the number of features to keep.
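In scikit-learn, this step maps onto the `RFE` class. The sketch below assumes logistic regression as the selected algorithm and uses synthetic data from `make_classification`. Keeping 3 features is an arbitrary illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, of which only 3 are informative.
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# Wrap the chosen algorithm; drop one feature per iteration (step=1) until 3 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3, step=1)
rfe.fit(X, y)

print(rfe.support_)  # boolean mask: True for the features that were kept
print(rfe.ranking_)  # 1 = selected; larger numbers were eliminated earlier
X_selected = rfe.transform(X)  # keep only the selected columns
print(X_selected.shape)
```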
5. K-Fold Cross-Validation and model fit
Once all of that is done, we move on to K-fold cross-validation. K-fold cross-validation is a resampling procedure used to evaluate a model on a limited set of data. Any type of K-fold validation can be used, but we prefer the Stratified K-fold algorithm, which keeps the class proportions similar in every fold. We use it to divide the data into different pairs of training and test sets, and then fit the Recursive Feature Elimination instance created above on each pair. The pair with the best score is used to obtain the model.
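The sketch below shows one way to implement this step with scikit-learn. It makes some assumptions: the data is synthetic, logistic regression is the algorithm, 3 features are kept, and there are 5 folds. Each training/test pair gets a fresh copy of the RFE settings, and the best-scoring fit is kept.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic classification data standing in for your features X and labels y.
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# RFE instance wrapping the chosen algorithm (3 features kept, an arbitrary choice).
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)

# Stratified K-fold: each fold keeps roughly the same class proportions as y.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

best_score, best_model = -1.0, None
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
    # clone() gives a fresh, unfitted copy of the RFE settings for this pair.
    candidate = clone(rfe).fit(X[train_idx], y[train_idx])
    score = candidate.score(X[test_idx], y[test_idx])  # accuracy on the held-out fold
    print(f"fold {fold}: accuracy = {score:.3f}")
    if score > best_score:
        best_score, best_model = score, candidate

print(f"best fold accuracy: {best_score:.3f}")
```

Keeping the single best-scoring pair follows the approach described here. A common alternative is to treat the K scores as an estimate of performance and then refit one final model on all of the data.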
6. Model Persistence
Finally, the model is persisted using the pickle library for future predictions.
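A minimal sketch of persisting and reloading a fitted model with Python’s built-in `pickle` module. A logistic regression trained on the bundled iris data stands in for your model, and any fitted scikit-learn estimator, including an RFE instance, can be pickled the same way. The temporary file path is only for illustration.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Persist the fitted model to a file (a temporary directory here; any path works).
model_path = Path(tempfile.mkdtemp()) / "model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Later, e.g. in another process: load the model back and use it for predictions.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

print((loaded.predict(X[:5]) == model.predict(X[:5])).all())
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code. Also load the model with the same scikit-learn version that saved it.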
This should help beginners strengthen their initial grasp of machine learning concepts and act as a jump start for your endeavors in machine learning with Python. Of course, there are alternatives to Python for development, the popular ones being Scala, Java, and Go. But Python dominates both the market and the library ecosystem. If you like trying new things, the options are open. But if you want to save yourself the pain and focus only on the product, Python should always be in your arsenal.
We are a Salesforce Consulting Partner focused on providing the best ML and AI solutions for any problem you bring on board. We are developing a dedicated forecasting application based on machine learning; feel free to reach out to us for your AI needs. Drop us your queries in the chat section on our website, and we’ll be happy to respond.