mtconnect_challenge.Rmd


# Introduction

For a real-life working example, we use a dataset graciously provided by the National Institute of Standards and Technology (http://www.nist.gov/) for one of their test bed parts. We will try to solve the first example from the introduction, which was a real problem faced by the NIST researchers. As we walk through each step of the exploratory process, you will see how similar techniques can be applied to similar issues with your own machine tools.

# Problem Statement

Accurately estimate the actual cycle time of a part, using data recorded while the part was manufactured with interruptions.

# Reading Data into R
First, let us define the data files that we are going to work with. Two sets of 
example data (MTConnect Agent samples and the result of the MTConnect probe) are provided 
along with the package. We will use the one provided by NIST for this analysis. Note that the 
package can also read a compressed samples file as if it were a normal file.

```{r, results='hide'}
suppressPackageStartupMessages(require(mtconnectR))
suppressPackageStartupMessages(require(dplyr))
file_path_adapter_log = "../data/adapter_logs/nist_test_bed/GF_Agie.tar.gz"
file_path_xml = "../data/adapter_logs/nist_test_bed/Devices.xml"
```
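
As a small illustration of reading a compressed file "as if it were a normal file", the sketch below uses only base R and a temporary file. It does not show how `mtconnectR` reads the log internally, and the pipe-delimited lines are made up for the example.

```r
# Hypothetical pipe-delimited log lines; the real adapter log may look different
toy_lines <- c("2016-01-01T00:00:00Z|Xposition|1.5",
               "2016-01-01T00:00:01Z|Xposition|1.7")

# Write the lines to a gzip-compressed temporary file
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, "w")
writeLines(toy_lines, con)
close(con)

# readLines() detects the gzip compression and returns plain text lines
lines_read <- readLines(gz_path)
unlink(gz_path)
```

Here `lines_read` holds the same two lines that were written, without any explicit decompression step.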

## Taking a quick look at the data files

Before we read the data into the MTCDevice Class, it helps to understand a bit
about the data items that we have. 

## Devices XML Data

### get_device_info_from_xml

The `MTConnectDevices` XML document has information about the logical components of one or
more devices. This file can be obtained using the `probe` request from an MTConnect Agent.

We can list the devices described in the devices XML using the
`get_device_info_from_xml` function. From the device info, we can select the name of the 
device that we want to analyse further.

```{r}
(device_info = get_device_info_from_xml(file_path_xml))
device_name = device_info$name[2]
```

### get_xpaths_from_xml

The `get_xpaths_from_xml` function reads the xpath info for a single device into
an easily read data.frame format.

The data.frame contains the id and name of each data item and the xpath, along with the type,
category and subType of the data item. This makes it easy to find the data items of a
particular type. For example, we can find the condition data
items, which we will be using in the next step.

```{r}
xpath_info = get_xpaths_from_xml(file_path_xml, device_name)
head(xpath_info)

```
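
To illustrate how such a table can be filtered, here is a minimal sketch on a made-up data.frame. The column names follow the description above; the rows and the upper-case values such as `"CONDITION"` are assumptions, so check them against the actual `xpath_info` before reusing the filters.

```r
library(dplyr)

# Hypothetical stand-in for the data.frame returned by get_xpaths_from_xml();
# all rows and values are invented for illustration
toy_xpath_info <- data.frame(
  id       = c("x1", "y1", "sys1", "exec1"),
  name     = c("Xposition", "Yposition", "system_cond", "execution"),
  type     = c("POSITION", "POSITION", "SYSTEM", "EXECUTION"),
  category = c("SAMPLE", "SAMPLE", "CONDITION", "EVENT"),
  subType  = c("ACTUAL", "ACTUAL", NA, NA),
  stringsAsFactors = FALSE
)

# Keep only the condition data items
condition_items <- toy_xpath_info %>% filter(category == "CONDITION")

# Keep only the data items of type POSITION
position_items <- toy_xpath_info %>% filter(type == "POSITION")
```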

## Getting Sample data by parsing `MTConnectStreams` data

`MTConnectStreams` data from an MTConnect Agent can be collected using a `ruby` script to
generate a delimited log of device data (referred to in this document as *log data*) which is
then used by the `mtconnectR` Package.


## Creating MTC Device Class

The `create_mtc_device_from_adapter_data` function reads in both the adapter log
and the XML data for a device and combines them into a single MTCDevice Class, with the
data organized separately for each data item.


```{r, results='hide'}
mtc_device = create_mtc_device_from_adapter_data(file_path_adapter_log, file_path_xml, device_name)
names(mtc_device@data_item_list)

```

# Exploring different data items

It looks like the log data contains the position data items that we need for this
analysis. Let's look at how the position varies by plotting each axis over time using
ggplot2.

## Plotting the data

```{r}
require("ggplot2")
require("reshape2")
xpos_data = getDataItem(mtc_device, "Xposition") %>% getData()
ypos_data = getDataItem(mtc_device, "Yposition") %>% getData()
zpos_data = getDataItem(mtc_device, "Zposition") %>% getData()

ggplot() + geom_line(data = xpos_data, aes(x = timestamp, y = value))
ggplot() + geom_line(data = ypos_data, aes(x = timestamp, y = value))
ggplot() + geom_line(data = zpos_data, aes(x = timestamp, y = value))

```

## Merging different data items for simultaneous analysis

It looks like the machine travels back and forth over quite some distance quite often, across
all the axes. We don't yet know how this traversal varies across the different axes.
We can get a much better idea of the motion by plotting one 
axis against another, and for that we have to merge the different data items. Since the 
different data items are keyed on different timestamp values, this is not as straightforward
as joining one data item against another. For this purpose, the `mtconnectR` package defines a merge
method for the MTCDevice Class.

```{r}
merged_pos_data = merge(mtc_device, "position") # merge all dataitems with the word position
head(merged_pos_data)
```
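
As an aside, the idea behind merging data items with different timestamps can be sketched on toy data: take the union of all timestamps and carry each item's last known value forward. This is only an illustration using `dplyr` and `tidyr`, not necessarily how the package's merge method is implemented, and the toy values are made up.

```r
library(dplyr)
library(tidyr)

# Two toy data items sampled at different (numeric) timestamps
x_item <- data.frame(timestamp = c(0, 2, 5), Xposition = c(10, 11, 12))
y_item <- data.frame(timestamp = c(1, 2, 4), Yposition = c(-1, -2, -3))

toy_merged <- full_join(x_item, y_item, by = "timestamp") %>%
  arrange(timestamp) %>%                             # union of timestamps, in time order
  fill(Xposition, Yposition, .direction = "down")    # carry the last value forward
```

The result has one row per distinct timestamp. `Yposition` stays `NA` at timestamp 0 because no Y value had been observed yet.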

Oops. Looking at the merged output, we have also pulled in the angular position. Let's try a more 
directed merge. Also, the names of the data items have the full xpaths attached to them. While this might be
useful in other circumstances to get the hierarchical position of the data, we can dispense with it
now using the `extract_param_from_xpath` function. Let's view the data after that.


```{r}
merged_pos_data = merge(mtc_device, "position<POSITION-ACTUAL") # directed merge: only data items matching position<POSITION-ACTUAL
names(merged_pos_data) = extract_param_from_xpath(names(merged_pos_data), param = "DIName", show_warnings = FALSE)
head(merged_pos_data)
```

Much better. Now let's plot the data items in one shot.

```{r}
ggplot(data = merged_pos_data, aes(x = timestamp)) +
  geom_line(aes(y = Xposition, col = 'Xpos')) +
  geom_line(aes(y = Yposition, col = 'Ypos')) +
  geom_line(aes(y = Zposition, col = 'Zpos')) +
  theme(legend.title = element_blank())

```

It does look like the sudden traversals are simultaneous across the axes. Plotting one axis
against another leads to the same conclusion. It also gives us an idea of the different
representations of the part.

```{r}
ggplot(data = merged_pos_data, aes(x = Xposition, y = Yposition)) + geom_path()
ggplot(data = merged_pos_data, aes(x = Xposition, y = Zposition)) + geom_path()
ggplot(data = merged_pos_data, aes(x = Zposition, y = Yposition)) + geom_path()

```
So the machine tool is going to the origin every so often. 

# Deriving new process parameters

It might help our analysis to also calculate a few process parameters that the machine
tool is not providing directly. Here we are going to calculate 

- Path Feedrate
- Origin in machine axes
- Distance from origin

## Derived Path Feedrate

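Path Feedrate (PFR) can be calculated as the rate of change of the position values. Here,
we must use the 3-dimensional distance and not just the displacement along a single axis.

$$\mathrm{PFR} = \frac{\text{total distance}}{\text{total time}} = \frac{\sqrt{\Delta x^2 + \Delta y^2 + \Delta z^2}}{\Delta t}$$

where $\Delta x$, $\Delta y$ and $\Delta z$ are the distances moved along the individual axes and $\Delta t$ is the time taken for the motion.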

```{r}
position_change_3d = 
  ((lead(merged_pos_data$Xposition, 1) - merged_pos_data$Xposition) ^ 2 +
  (lead(merged_pos_data$Yposition, 1) - merged_pos_data$Yposition) ^ 2 +
  (lead(merged_pos_data$Zposition, 1) - merged_pos_data$Zposition) ^ 2 ) ^ 0.5

merged_pos_data$time_taken = 
  lead(as.numeric(merged_pos_data$timestamp), 1) - as.numeric(merged_pos_data$timestamp)

merged_pos_data$pfr = round(position_change_3d / merged_pos_data$time_taken, 4)

dt.df <- melt(merged_pos_data, measure.vars = c("pfr", "Xposition", "Yposition"))
ggplot(dt.df, aes(x = timestamp, y = value)) +
  geom_line(aes(color = variable)) +
  facet_grid(variable ~ ., scales = "free_y") 

ggplot(data = merged_pos_data, aes(x = timestamp)) + 
  geom_step(aes(y = pfr)) +
  geom_step(aes(y = Xposition)) 

```
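
A quick sanity check of the same calculation on toy data can help confirm the formula. The positions and timestamps below are invented so that the distances are easy to verify by hand (a 3-4-5 triangle, then a pure Z move).

```r
library(dplyr)

# Toy path: (0,0,0) -> (3,4,0) -> (3,4,12), with 2 seconds between samples
toy_path <- data.frame(
  timestamp = c(0, 2, 4),
  Xposition = c(0, 3, 3),
  Yposition = c(0, 4, 4),
  Zposition = c(0, 0, 12)
)

toy_path <- toy_path %>%
  mutate(
    # 3-dimensional distance to the next sample
    distance   = sqrt((lead(Xposition) - Xposition)^2 +
                      (lead(Yposition) - Yposition)^2 +
                      (lead(Zposition) - Zposition)^2),
    time_taken = lead(timestamp) - timestamp,
    pfr        = distance / time_taken
  )
```

The first move covers 5 units in 2 seconds (PFR 2.5) and the second covers 12 units in 2 seconds (PFR 6). The last row has no following sample, so its PFR is `NA`.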

Let's add this derived data back into the MTCDevice Class.

```{r}
pfr_data = merged_pos_data %>% select(timestamp, value = pfr) # Structuring data correctly
mtc_device = add_data_item_to_mtc_device(mtc_device, pfr_data, data_item_name = "pfr<PATH_FEEDRATE>",
                                         data_item_type = "Sample", source_type = "calculated")
names(mtc_device@data_item_list)
```

# Identifying Inefficiencies

## Idle times
Our first task is to identify the periods when the machine was idle. For this we 
can use a few approaches:

- Find the times when the execution status was not active, OR
- Find the times when the machine was not feeding (PFR ~ 0), OR
- Find the periods when the feed override was zero

We will try all three approaches and take their union as the periods
when the machine is idle.

```{r}
# Getting all the relevant data
merged_data = merge(mtc_device, "EXECUTION|PATH_FEEDRATE|POSITION")
names(merged_data) = extract_param_from_xpath(names(merged_data), param = "DIName", show_warnings = F)

merged_data = merged_data %>% 
  mutate(exec_idle = F, feed_idle = F, override_idle = F) %>% # Setting everything false by default
  mutate(exec_idle = replace(exec_idle, !(execution %in% "ACTIVE"), TRUE)) %>% 
  mutate(feed_idle = replace(feed_idle, pfr < 0.01, TRUE)) %>% 
  mutate(override_idle = replace(override_idle, Fovr < 1, TRUE)) %>% 
  mutate(machine_idle = as.logical(exec_idle + feed_idle + override_idle))
head(merged_data)  

```
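
The union of the three conditions can also be written directly with logical operators. The sketch below shows this on made-up rows; the column names mirror the ones used above, and the explicit `is.na()` checks are an added assumption about how missing values should be treated (as not idle).

```r
library(dplyr)

# Toy rows: running, not feeding, stopped, and feed override at zero
toy_state <- data.frame(
  execution = c("ACTIVE", "ACTIVE", "STOPPED", "ACTIVE"),
  pfr       = c(5.0, 0.0, 0.0, 3.0),
  Fovr      = c(100, 100, 100, 0),
  stringsAsFactors = FALSE
)

toy_state <- toy_state %>%
  mutate(
    exec_idle     = !(execution %in% "ACTIVE"),
    feed_idle     = !is.na(pfr) & pfr < 0.01,
    override_idle = !is.na(Fovr) & Fovr < 1,
    # Union of the three conditions: idle if any one of them holds
    machine_idle  = exec_idle | feed_idle | override_idle
  )
```

Only the first row, where the machine is active, feeding and not overridden, ends up with `machine_idle` equal to `FALSE`.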

## Machine tool at origin

We need to identify the time spent by the machine at the origin. Let's look at the 
X-Y graph again.

```{r}
ggplot(data = merged_pos_data, aes(x = Xposition, y = Yposition)) + geom_path()

```

It is clear that the periods when the machine was at the origin are roughly X > 30, Y < -30.
Adding this into the mix:

```{r}
merged_data_final = merged_data %>% 
  mutate(at_origin = F) %>% # Setting everything false by default
  mutate(at_origin = replace(at_origin, Xposition > 30 & Yposition < -30, TRUE)) %>% 
  select(timestamp, machine_idle, at_origin)
head(merged_data_final)

```

# Calculating Summary Statistics

Now we have all the data at our disposal to calculate the time statistics. First we 
need to convert the time series into interval format to get the durations. We can use the
`convert_ts_to_interval` function to do this.

```{r}
merged_data_intervals = convert_ts_to_interval(merged_data_final)
head(merged_data_intervals)

```
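
Conceptually, converting a time series to intervals means that each row's state holds until the next timestamp. The following is a minimal sketch of that idea on toy data, not the package's implementation. Only the `duration` column is known from this document; the `start` and `end` column names are assumptions.

```r
library(dplyr)

# Toy time series: the state recorded at each (numeric) timestamp
toy_ts <- data.frame(
  timestamp    = c(0, 10, 25, 40),
  machine_idle = c(FALSE, TRUE, TRUE, FALSE)
)

toy_intervals <- toy_ts %>%
  mutate(start    = timestamp,
         end      = lead(timestamp),   # a state lasts until the next observation
         duration = end - start) %>%
  filter(!is.na(duration)) %>%         # the last observation has no known end
  select(start, end, duration, machine_idle)
```

This yields three intervals of 10, 15 and 15 seconds, each tagged with the state that was active during it.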

Now we can aggregate across the different states to find the total amount of time
in each state.

```{r}
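time_summary = merged_data_intervals %>% group_by(machine_idle, at_origin) %>% 
  summarise(total_time = sum(duration, na.rm = T))

# The row indices below assume all four state combinations are present, sorted as
# (machine_idle, at_origin) = (FALSE, FALSE), (FALSE, TRUE), (TRUE, FALSE), (TRUE, TRUE)
total_time = sum(time_summary$total_time)
efficient_time = sum(time_summary$total_time[1])       # not idle and not at origin
inefficient_time = sum(time_summary$total_time[2:4])   # idle or at origin
interrupted_time = sum(time_summary$total_time[3:4])   # idle
time_at_origin = sum(time_summary$total_time[c(2,4)])  # at origin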

```

```{r, echo = FALSE}
print("Results")
print(paste0("Total Time of Operation (including interruptions) = ", total_time, "s"))
print(paste0("Total Time without identified inefficiencies = ", efficient_time, "s"))
print(paste0("Total Time wasted due to interruptions = ", interrupted_time, "s"))
print(paste0("Total Time wasted due to being at origin = ", time_at_origin, "s"))

```
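
The positional indexing used for the summary depends on all four state combinations being present and on their sort order. An alternative that selects durations by the flags themselves is sketched below on made-up intervals; the column names match the ones used above, but the numbers are purely illustrative.

```r
library(dplyr)

# Toy intervals with their state flags (durations in seconds, invented)
toy_states <- data.frame(
  duration     = c(100, 20, 30, 5, 50),
  machine_idle = c(FALSE, FALSE, TRUE, TRUE, FALSE),
  at_origin    = c(FALSE, TRUE, FALSE, TRUE, FALSE)
)

toy_summary <- toy_states %>%
  summarise(
    total_time       = sum(duration),
    efficient_time   = sum(duration[!machine_idle & !at_origin]),
    interrupted_time = sum(duration[machine_idle]),
    time_at_origin   = sum(duration[at_origin])
  )
```

Because each total is selected by the flags, the result does not change if a state combination is missing from the data.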