- Describe how to handle missing values
- Describe data formatting techniques
- Describe data normalization
- Demonstrate the use of binning
- Demonstrate the use of categorical variables
Question 1: How would you access the column ”body-style"
from the dataframe df
?
- A. [X]
df["body-style"]
- B. [ ]
df=="bodystyle"
Question 2: What is the correct symbol for missing data?
- A. [X] nan
- B. [ ] no-data
Question 1: How would you rename the column "city_mpg"
to "city-L/100km"
?
- A. [ ]
df.rename(columns={”city_mpg”: “city-L/100km”})
- B. [X]
df.rename(columns={”city_mpg”: “city-L/100km”}, inplace=True)
Question 1: Which of the following is the correct formula for z -score or data standardization?
c
Question 1: Why do we convert values of Categorical Variables into numerical values?
- A. [X] Most statistical models cannot take in objects or strings as inputs
- B. [ ] To save memory
In this lesson, you have learned how to:
- Identify and Handle Missing Values: Drop rows with incomplete information and impute missing data using the mean values.
- Understand Data Formatting: Wrangle features in a dataset and make them meaningful for data analysis.
- Apply normalization to a data set: By understanding the relevance of using feature scaling on your data and how normalization and standardization have varying effects on your data analysis.
Question 1: What task do the following lines of code perform?
avg=df['horsepower'].mean(axis=0)
df['horsepower'].replace(np.nan, avg)
- A. [ ] calculate the mean value for the
'horsepower'
column and replace all the NaN values of that column by the mean value - B. [X] nothing; because the parameter
inplace
is not set to true - C. [ ] replace all the
NaN
values with the mean.
Question 2: Consider the dataframe df
; convert the column df["city-mpg"]
to df["city-L/100km']
by dividing 235 by each element in the column 'city-mpg'
.
- A. [ ]
df['city-L/100km'] = df["city-mpg"].diV(235)
- B. [X]
df['city-L/100km'] = 235/df["city-mpg"]
Question 3: What data type is the following set of numbers? 666, 1.1,232,23.12
- A. [X] float
- B. [ ] int
- C. [ ] object
Question 4: The following code is an example of:
(df["length"]-df["length"].mean())/df["length"].std()
- A. [ ] simple feature scaling
- B. [X] z-score
- C. [ ] min-max scaling
Question 5: Consider the two columns 'horsepower'
, and 'horsepower-binned'
; from the dataframe df
; how many categories are there in the 'horsepower-binned'
column?
3