This is how to import data from a CSV file in Python.
import pandas as pd
dataset = pd.read_csv('Data.csv')
This creates a DataFrame that contains exactly the same rows, columns, and values as the file Data.csv. We use the pandas library, which we imported under the short alias pd.
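Because the contents of Data.csv are not shown here, the sketch below uses a small in-memory CSV instead. The column names (Country, Age, Salary, Purchased) and the values are hypothetical. pd.read_csv accepts a file-like object such as io.StringIO just as it accepts a file path, so the resulting DataFrame behaves the same way.

```python
import io

import pandas as pd

# Hypothetical stand-in for Data.csv: three feature columns and one target column.
csv_text = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
"""

# read_csv accepts a path or any file-like object; StringIO avoids needing a real file.
dataset = pd.read_csv(io.StringIO(csv_text))

print(dataset.shape)  # (rows, columns) -> (3, 4)
print(dataset)
```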
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
We call x the matrix of features and y the dependent variable vector. To take data from dataset, we add a dot after dataset to access the DataFrame's iloc indexer, which selects data by integer position. Inside the brackets we first specify the rows we want to put into x. The trick to take all the rows, whatever number of rows your dataset has, is to write a colon ":". In Python a colon denotes a range (a slice), and a range with neither a lower bound nor an upper bound means that we take everything in that range.
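A quick way to see the "empty range means everything" rule is to try it on a plain Python list and then on a small DataFrame. The list and the two-column frame below are hypothetical examples, not taken from Data.csv.

```python
import pandas as pd

numbers = [10, 20, 30, 40]

# A slice with no lower and no upper bound covers the whole sequence.
everything = numbers[:]
print(everything)  # [10, 20, 30, 40]

# The same idea applies to iloc: ':' in the row position selects every row,
# however many rows the DataFrame happens to have.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})  # hypothetical small frame
all_rows = df.iloc[:, :]
print(len(all_rows))  # 3
```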
Next, we specify which columns we want to select by their indexes. A comma separates the row selection we just made from the column selection. For the columns we write another range, this time :-1. Because there is nothing on the left of the colon, the range starts at the first index, index 0 (indexing in Python starts from 0), and it goes up to -1. In Python, index -1 refers to the last column, and the upper bound of a range is excluded, so :-1 takes all the columns except the last one.
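The sketch below shows the :-1 slice first on a list of column names and then with iloc. The column names and the two data rows are hypothetical; they only illustrate that the last column is excluded.

```python
import pandas as pd

columns = ["Country", "Age", "Salary", "Purchased"]

# -1 is the index of the last item; the upper bound of a slice is excluded,
# so [:-1] keeps everything from index 0 up to, but not including, the last item.
print(columns[-1])   # Purchased
print(columns[:-1])  # ['Country', 'Age', 'Salary']

# iloc follows the same rules: all rows, all columns except the last one.
df = pd.DataFrame(
    [["France", 44, 72000, "No"], ["Spain", 27, 48000, "Yes"]],
    columns=columns,
)
features = df.iloc[:, :-1]
print(list(features.columns))  # ['Country', 'Age', 'Salary']
```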
To finish this line of code, we append .values, which returns the selected data as a NumPy array instead of a DataFrame: all the values in all the rows of the dataset and in all columns except the last one.
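To make the effect of .values visible, the sketch below compares the selection with and without it on a hypothetical frame. It also uses DataFrame.to_numpy(), which the pandas documentation recommends over .values; both give the same array here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two numeric feature columns and a target column.
df = pd.DataFrame({"Age": [44, 27], "Salary": [72000, 48000], "Purchased": [0, 1]})

as_frame = df.iloc[:, :-1]           # still a DataFrame, with column labels
as_array = df.iloc[:, :-1].values    # plain NumPy array, labels dropped

print(type(as_frame).__name__)  # DataFrame
print(type(as_array).__name__)  # ndarray
print(as_array)

# to_numpy() produces the same array for this selection.
same_array = df.iloc[:, :-1].to_numpy()
print(np.array_equal(as_array, same_array))  # True
```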
For y, we only want the last column, so we do not use a range for the columns: -1 on its own is exactly the index of the last column.
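Using a single index rather than a range also changes the shape of the result. In the hypothetical frame below, iloc[:, -1] yields a one-dimensional vector, while the one-element range -1: keeps a two-dimensional, single-column matrix.

```python
import pandas as pd

# Hypothetical data with a numeric target in the last column.
df = pd.DataFrame({"Age": [44, 27, 30], "Purchased": [0, 1, 0]})

# A single integer (-1) in the column position selects just the last column,
# so .values gives a one-dimensional array: the dependent variable vector.
y = df.iloc[:, -1].values
print(y)        # [0 1 0]
print(y.ndim)   # 1

# For comparison, a one-element range keeps two dimensions (a single-column matrix).
y_2d = df.iloc[:, -1:].values
print(y_2d.shape)  # (3, 1)
```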
Finally, you can check the data imported from Data.csv into x and y with:
print(x)
print(y)