2023 Problem 4 Statistical Description of Multivariate Data for a Real World Dataset 40 points To complete this task | Assignment Collections
Problem 4: Statistical Description of Multivariate Data for a Real-World Dataset [40 points]
To complete this task, use the crx.data file, which contains data collected from credit card applications. All attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data. The dataset was downloaded from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets.php).
This dataset is interesting because it has a good mix of attributes: continuous, nominal with a small number of values, and nominal with a larger number of values. There are also a few missing values. Read the data into R using the following command.
data <- read.table("path/crx.data", sep = ",")
Here, replace path with the location of crx.data on your computer. After loading the data in R, you can access each column using data[ , 1], data[ , 2], … , data[ , 15]. You can view the data using the View(data) command. Numeric columns may be loaded in character format (for example, when they contain missing-value markers), so convert them from character to numeric using the as.numeric() function as follows.
attribute1 <- as.numeric(data[ , 2])
For missing values, as.numeric() returns NA and R warns that NAs were introduced by coercion.
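The coercion step can be sketched as follows, using three made-up rows in place of crx.data, which is not reproduced here. The sketch assumes missing entries are marked with "?". It also sets stringsAsFactors = FALSE so the column is read as character rather than factor on older R versions. Neither assumption is stated in the task.

```r
# Three made-up rows standing in for crx.data; "?" is assumed to mark
# a missing value (an assumption, not stated in the task).
sample_text <- "b,30.83,0,u\na,?,4.46,u\nb,24.50,0.5,y"

# stringsAsFactors = FALSE keeps text columns as character on older R too.
data <- read.table(text = sample_text, sep = ",", stringsAsFactors = FALSE)

# Column 2 contains "?", so it was read as character.
class(data[, 2])

# Coercion turns "?" into NA and emits the "NAs introduced by coercion" warning.
attribute1 <- as.numeric(data[, 2])

# Count the missing values and find the rows where they occur.
n_missing <- sum(is.na(attribute1))
missing_rows <- which(is.na(attribute1))
```

On the real file, the same is.na() check applied to each converted column shows which attributes contain missing values.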
There are 16 columns in the data: the first 15 columns are the attributes and the 16th column is the label. Analyze only the attributes.
- Identify which attributes are nominal and which are continuous.
- Identify the attribute(s) with missing values (NA) and drop them from the data.
- Calculate the central tendency of the remaining attributes. Remember that for a nominal attribute you can only calculate the mode.
- Calculate the five-number summary of the numeric attributes.
- Show box plots for the numeric attributes and identify the attributes having outliers.
- Show pairwise scatter plots of the numeric attributes. Inspect the scatter plots and state whether each pair of attributes is negatively correlated, positively correlated, or uncorrelated.
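The non-graphical steps in the list above can be sketched roughly as follows. This is one possible approach, not a reference solution. The data frame attrs holds made-up values, assumes the numeric columns are already converted, and reuses column names such as V1 only for illustration. stat_mode is a hypothetical helper, since base R has no built-in function for the statistical mode.

```r
# Made-up stand-in for the attribute columns (not real crx.data values).
attrs <- data.frame(
  V1 = c("b", "a", "b", "b", "a"),
  V2 = c(30.83, NA, 24.50, 27.83, 20.17),
  V3 = c(0, 4.46, 0.5, 1.54, 5.625),
  V8 = c(1.25, 3.04, 1.5, 3.75, 40),
  stringsAsFactors = FALSE
)

# Drop every attribute that contains at least one NA.
has_na <- sapply(attrs, anyNA)
clean <- attrs[, !has_na, drop = FALSE]

# Split the remaining attributes: numeric = continuous, otherwise nominal.
is_num <- sapply(clean, is.numeric)
num <- clean[, is_num, drop = FALSE]
nom <- clean[, !is_num, drop = FALSE]

# Hypothetical helper: most frequent value of a nominal attribute.
stat_mode <- function(x) names(which.max(table(x)))

# Central tendency: mode for nominal, mean and median for numeric attributes.
modes <- sapply(nom, stat_mode)
means <- sapply(num, mean)
medians <- sapply(num, median)

# Five-number summary (Tukey): min, lower hinge, median, upper hinge, max.
five <- sapply(num, fivenum)
rownames(five) <- c("min", "lower hinge", "median", "upper hinge", "max")

# Values beyond 1.5 times the hinge spread, the points a box plot would flag.
outliers <- lapply(num, function(x) boxplot.stats(x)$out)
```

Note that fivenum() reports hinges, which can differ slightly from the quartiles returned by quantile() or summary(); either is a reasonable choice as long as it is stated.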
*Do not forget to label the axes of the plots.
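A possible way to produce labeled plots is sketched below. The data frame num holds made-up values, and plot_numeric is a hypothetical helper introduced only for this illustration. The sketch draws one box plot per attribute and one scatter plot per pair, each with explicit axis labels, and computes Pearson correlations as a numeric cross-check on the visual impression.

```r
# Made-up numeric attributes (not real crx.data values); in practice use the
# converted numeric columns that remain after dropping attributes with NA.
num <- data.frame(
  V3 = c(0, 4.46, 0.5, 1.54, 5.625),
  V8 = c(1.25, 3.04, 1.5, 3.75, 40),
  V11 = c(1, 6, 0, 5, 0)
)

# Hypothetical helper: labeled box plots, then labeled pairwise scatter plots.
# Returns (invisibly) the number of attribute pairs plotted.
plot_numeric <- function(df) {
  for (name in names(df)) {
    boxplot(df[[name]], main = paste("Box plot of", name),
            xlab = name, ylab = "Value")
  }
  pair_names <- combn(names(df), 2)  # one column per attribute pair
  for (k in seq_len(ncol(pair_names))) {
    x <- pair_names[1, k]
    y <- pair_names[2, k]
    plot(df[[x]], df[[y]], xlab = x, ylab = y, main = paste(y, "vs", x))
  }
  invisible(ncol(pair_names))
}

# Pearson correlation matrix: the sign of each entry suggests the direction,
# and values near zero suggest little or no linear correlation.
corr <- cor(num)
```

Calling plot_numeric(num) draws the plots on the active graphics device. The built-in pairs(num) gives a compact scatter-plot matrix as an alternative, with attribute names on the diagonal instead of per-axis labels.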