Types of Variables in Machine Learning.
A variable is any characteristic, number, or quantity that can be measured or counted. A variable may also be called a data item. Understanding variables is one of the key steps of working with data in machine learning.
Variables are generally classified into three categories:
- Categorical variables
- Numerical variables
- Mixed variables
1. Categorical Variables.
Categorical variables, also known as qualitative variables, capture outcomes by placing observations into groups. These groups are mutually exclusive: each observation belongs to exactly one group.
Some commonly used categorical variables are:
- Gender, e.g. male, female or non-binary
- Marital status, e.g. married or not married
- Education level, e.g. student, graduate or postgraduate
Categorical variables are further divided into two groups:
- Nominal variables
- Ordinal variables
Nominal variables represent outcomes for which the order of the groups is irrelevant.
Common examples include gender categories such as male, female or non-binary.
Ordinal variables, on the other hand, represent outcomes for which the order of the groups is relevant.
Common examples include grades such as A, B or C, and sizes such as Large, Larger or Largest.
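To make the difference between nominal and ordinal variables concrete, here is a minimal Python sketch. The sample data and the ranking of the grades (A treated as the highest) are assumptions made purely for illustration.

```python
from collections import Counter

# Nominal variable: the groups have no inherent order, so counting
# and equality checks make sense, but ranking the groups does not.
marital_status = ["married", "not married", "married", "married"]
status_counts = Counter(marital_status)

# Ordinal variable: the groups have an order, which we state explicitly.
# Assumption for this sketch: C is the lowest grade and A the highest.
GRADE_ORDER = ["C", "B", "A"]
grade_rank = {grade: rank for rank, grade in enumerate(GRADE_ORDER)}

grades = ["B", "A", "C", "B"]
# Sorting and taking the maximum only make sense because an order exists.
sorted_grades = sorted(grades, key=grade_rank.__getitem__)
best_grade = max(grades, key=grade_rank.__getitem__)

print(status_counts["married"])  # 3
print(sorted_grades)             # ['C', 'B', 'B', 'A']
print(best_grade)                # A
```

Notice that sorting the marital status values would only give alphabetical order, which carries no meaning for the data, while sorting the grades by rank does.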
2. Numerical Variables.
Also known as quantitative variables, they represent quantifiable outcomes whose values are actual numbers. Numerical variables are further classified into two types:
- Discrete variables
- Continuous variables
Discrete variables are obtained by counting. They take specific, separate values, typically whole numbers, that cannot be meaningfully split into decimals or fractions.
Discrete variables represent counts of entities such as the number of persons, cars or animals.
Continuous variables, on the other hand, result from measurement. They can take any value within a range, so the number of possible values is infinite.
Common measurements that are classified as continuous variables include height, weight and price.
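As a rough illustration of the discrete versus continuous distinction, the sketch below labels a numeric column based on how its values are stored. The function name `numeric_kind` and the sample data are hypothetical, and the rule is only a heuristic: real data can break it (for example, a price stored in whole cents), so the meaning of the variable should always have the final say.

```python
def numeric_kind(values):
    """Heuristic label: all-integer data is treated as discrete,
    anything else as continuous."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "discrete"
    return "continuous"

# Hypothetical sample data.
cars_per_household = [0, 2, 1, 3]     # obtained by counting
heights_cm = [172.4, 180.0, 165.25]   # obtained by measuring

print(numeric_kind(cars_per_household))  # discrete
print(numeric_kind(heights_cm))          # continuous
```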
3. Mixed Variables.
These are random variables that are neither purely discrete nor purely continuous. They are a mixture of both, where one part is discrete while the other part is continuous.
An example of a random variable of mixed type is an experiment where a coin is flipped and a spinner, which can land on any value in a continuous range, is spun only if the coin shows heads. If the result is tails, X = −1; otherwise X is the value of the spinner. There is a probability of 1⁄2 that this random variable takes the value −1. Any range of spinner values has half the probability it would have if the spinner were always spun.
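The coin and spinner experiment can be simulated in a few lines of Python. This is only a sketch under stated assumptions: the text does not say what values the spinner produces, so here it is assumed to return an angle drawn uniformly between 0 and 360 degrees, and `sample_mixed` is a name made up for this example.

```python
import random

def sample_mixed(rng):
    """Draw one value of X: -1 on tails, a spinner reading on heads."""
    if rng.random() < 0.5:
        return -1.0                  # tails: the discrete part of X
    # Heads: the continuous part of X.
    # Assumption: the spinner gives a uniform angle in degrees.
    return rng.uniform(0.0, 360.0)

rng = random.Random(0)               # fixed seed so the run is repeatable
samples = [sample_mixed(rng) for _ in range(100_000)]

# Share of draws that are exactly -1: should be close to 1/2.
share_at_minus_one = sum(x == -1.0 for x in samples) / len(samples)

# Share of draws in the first quarter turn: close to 1/2 * 90/360 = 1/8,
# half of the 1/4 it would be if the spinner were always spun.
share_in_first_quarter = sum(0.0 <= x < 90.0 for x in samples) / len(samples)

print(round(share_at_minus_one, 2))
print(round(share_in_first_quarter, 2))
```

The single value −1 carries a whole 1⁄2 of the probability (the discrete part), while the remaining 1⁄2 is spread smoothly over the spinner's range (the continuous part).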
Variables are important in machine learning as they largely dictate the type of model that can be used on the data.
Thank you! :-)