forked from alexsanjoseph/nist-project
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmtconnect_challenge.Rmd
283 lines (201 loc) · 10.7 KB
/
mtconnect_challenge.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
# Introduction
For a real life working example, we have this dataset graciously provided to us by the National Institute of Standards and Technology (http://www.nist.gov/) for one of their test bed parts. We will be trying to solve the first example from the introduction, which was a real problem faced by the NIST researchers. As we walk through each step of the exploratory process, you will understand how you can use similar techniques to solve similar issues that you might have with your Machine Tools.
# Problem Statement
Accurately estimate the actual cycle time of a part, from the data of a part that was manufactured with interruptions.
# Reading Data into R
First let us define the data files that we are going to be working with. Two sets of
example data from the MTConnect Agent samples and the result of the MTConnect probe are provided
along with the package. We will be using the one provided by nist for this analysis. Note that the
package can read in a compressed file as if it were a normal file as well for the samples.
```{r, results='hide'}
suppressPackageStartupMessages(require(mtconnectR))
suppressPackageStartupMessages(require(dplyr))
file_path_adapter_log = "../data/adapter_logs/nist_test_bed/GF_Agie.tar.gz"
file_path_xml = "../data/adapter_logs/nist_test_bed/Devices.xml"
```
## Taking a quick look at the data files
Before we read in the data into the MTC Device Class, it might help us a bit in understanding
a bit about the data items that we have.
## Devices XML Data
### get_device_info_from_xml
The `MTConnectDevices` XML document has information about the logical components of one or
more devices. This file can obtained using the `probe` request from an MTConnect Agent.
We can check out the devices for which the info is present in the devices XML using the
`get_device_info_from_xml` function. From the device info, we can select the name of the
device that we want to analyse further
```{r}
(device_info = get_device_info_from_xml(file_path_xml))
device_name = device_info$name[2]
```
### get_xpaths_from_xml
The `get_xpath_from_xml` function can read in the xpath info for a single device into
a easily read data.frame format.
The data.frame contains the id and name of each data item and the xpath along with the type,
category and subType of the data_item. It is easy to find out what are the data items of a
particular type using this function. For example, we are going to find out the conditions data
items which we will be using in the next step.
```{r}
xpath_info = get_xpaths_from_xml(file_path_xml, device_name)
head(xpath_info)
```
## Getting Sample data by parsing `MTConnectStreams` data
`MTConnectStreams` data from an MTConnect Agent can be collected using a `ruby` script to
generate a delimited log of device data (referred to in this document as *log data*) which is
then used by the `mtconnectR` Package.
## Creating MTC Device Class
`create_mtc_device_from_adapter_data` function can read in both the adapter log
and the xml data for a device and combine it into a single MTCDevice Class with the
data organized separately for each data item.
```{r, results='hide'}
mtc_device = create_mtc_device_from_adapter_data(file_path_adapter_log, file_path_xml, device_name)
names(mtc_device@data_item_list)
```
# Exploring different data items
It looks like we have the position data items that we might need for this analysis in the log
data. Let's see the variation in position. We can plot all the data items in one plot using
ggplot2.
## Plotting the data
```{r}
require("ggplot2")
require("reshape2")
xpos_data = getDataItem(mtc_device, "Xposition") %>% getData()
ypos_data = getDataItem(mtc_device, "Yposition") %>% getData()
zpos_data = getDataItem(mtc_device, "Zposition") %>% getData()
ggplot() + geom_line(data = xpos_data, aes(x = timestamp, y = value))
ggplot() + geom_line(data = ypos_data, aes(x = timestamp, y = value))
ggplot() + geom_line(data = zpos_data, aes(x = timestamp, y = value))
```
## Merging different data items for simultaneous analysis
It looks like the machine is going back and forth quite some distance quite often, across
all the axes. We also don't know how this traversal varies across different axis.
However, we can get a much better idea of the motion if we could plot one
axis against the other. For that we have to merge the different data items. Since the
different data items have different timestamp values as the key, it is not as straightforward
as doing a join of one data item against the other. For this purpose, the `mtconnect` packge has a merge
method defined for the MTCDevice Class
```{r}
merged_pos_data = merge(mtc_device, "position") # merge all dataitems with the word position
head(merged_pos_data)
```
Oops. Looks like we have also merged in the angular position. Let's try a more
directed merge. Also, the names of the data items have the full xpaths attached to them. While this might be
useful in other circumstances to get the hierarchical position of the data, we can dispense with it
now using the `extract_param_from_xpath` function. Let's view the data after that
```{r}
merged_pos_data = merge(mtc_device, "position<POSITION-ACTUAL") # merge all dataitems with the word position
names(merged_pos_data) = extract_param_from_xpath(names(merged_pos_data), param = "DIName", show_warnings = F)
head(merged_pos_data)
```
Much better. Now let's plot the data items in one shot.
```{r}
ggplot(data = merged_pos_data, aes(x = timestamp)) +
geom_line(aes(y = Xposition, col = 'Xpos')) +
geom_line(aes(y = Yposition, col = 'Ypos')) +
geom_line(aes(y = Zposition, col = 'Zpos')) +
theme(legend.title = element_blank())
```
It does look the sudden traverals are simultaenous across the axes. Plotting one axes
against the other leads to the same conclusion. It also gives us an idea of the different
representations of the part
```{r}
ggplot(data = merged_pos_data, aes(x = Xposition, y = Yposition)) + geom_path()
ggplot(data = merged_pos_data, aes(x = Xposition, y = Zposition)) + geom_path()
ggplot(data = merged_pos_data, aes(x = Zposition, y = Yposition)) + geom_path()
```
So the machine tool is going to the origin every so often.
# Deriving new process parameters
It might help our analysis to also calculate a few process parameters that the machine
tool is not providing directly. Here we are going to calculate
- Path Feedrate
- Origin in machine axes
- Distance from origin
## Derived Path Feedrate
Path Feedrate can be calculated as the rate of change of the position values. Here,
we must use the 3-dimensional distance value and not just one of the position vectors.
PFR = Total Distance / Total Time
= Sqrt(Sum of Squares of distane along individual axis) / time taken for motion
```{r}
position_change_3d =
((lead(merged_pos_data$Xposition, 1) - merged_pos_data$Xposition) ^ 2 +
(lead(merged_pos_data$Yposition, 1) - merged_pos_data$Yposition) ^ 2 +
(lead(merged_pos_data$Zposition, 1) - merged_pos_data$Zposition) ^ 2 ) ^ 0.5
merged_pos_data$time_taken =
lead(as.numeric(merged_pos_data$timestamp), 1) - as.numeric(merged_pos_data$timestamp)
merged_pos_data$pfr = round(position_change_3d / merged_pos_data$time_taken, 4)
dt.df <- melt(merged_pos_data, measure.vars = c("pfr", "Xposition", "Yposition"))
ggplot(dt.df, aes(x = timestamp, y = value)) +
geom_line(aes(color = variable)) +
facet_grid(variable ~ ., scales = "free_y")
ggplot(data = merged_pos_data, aes(x = timestamp)) +
geom_step(aes(y = pfr)) +
geom_step(aes(y = Xposition))
```
Let's add this derived data back into the MTCDevice Class.
```{r}
pfr_data = merged_pos_data %>% select(timestamp, value = pfr) # Structuring data correctly
mtc_device = add_data_item_to_mtc_device(mtc_device, pfr_data, data_item_name = "pfr<PATH_FEEDRATE>",
data_item_type = "Sample", source_type = "calculated")
names(mtc_device@data_item_list)
```
# Identifying Inefficiencies
## Idle times
Our first task is to identify the periods when the machine was idle. For this we
can use a few approaches.
- Find out the times when the execution status was not active OR
- Find out the times when the machine was not feeding (PFR~0) OR
- Find the periods when the feed override was zero
We will be trying out all the approaches and choosing union of the three as the period
when machine is idle.
```{r}
# Getting all the relevant data
merged_data = merge(mtc_device, "EXECUTION|PATH_FEEDRATE|POSITION")
names(merged_data) = extract_param_from_xpath(names(merged_data), param = "DIName", show_warnings = F)
merged_data = merged_data %>%
mutate(exec_idle = F, feed_idle = F, override_idle = F) %>% # Setting everything false by default
mutate(exec_idle = replace(exec_idle, !(execution %in% "ACTIVE"), TRUE)) %>%
mutate(feed_idle = replace(feed_idle, pfr < 0.01, TRUE)) %>%
mutate(override_idle = replace(override_idle, Fovr < 1, TRUE)) %>%
mutate(machine_idle = as.logical(exec_idle + feed_idle + override_idle))
head(merged_data)
```
## Machine tool at origin
We need to identify the time spent by the machine at origin. Let's look at the
X - Y graph again
```{r}
ggplot(data = merged_pos_data, aes(x = Xposition, y = Yposition)) + geom_path()
```
It is clear that the periods when the machine was origin are roughly X > 30, Y < -30.
Adding this into the mix
```{r}
merged_data_final = merged_data %>%
mutate(at_origin = F) %>% # Setting everything false by default
mutate(at_origin = replace(at_origin, Xposition > 30 & Yposition < -30, TRUE)) %>%
select(timestamp, machine_idle, at_origin)
head(merged_data_final)
```
# Calculating Summary Statistics
Now we have all the data at our disposal to calculate the time statistics. First we
need to convert the time series into interval format to get the duratins. We can use
`convert_ts_to_interval` function to do the same.
```{r}
merged_data_intervals = convert_ts_to_interval(merged_data_final)
head(merged_data_intervals)
```
Now we can aggregate across the different states to find the total amount of time
in each state.
```{r}
time_summary = merged_data_intervals %>% group_by(machine_idle, at_origin) %>%
summarise(total_time = sum(duration, na.rm = T))
total_time = sum(time_summary$total_time)
efficient_time = sum(time_summary$total_time[1])
inefficient_time = sum(time_summary$total_time[2:4])
interrupted_time = sum(time_summary$total_time[3:4])
time_at_origin = sum(time_summary$total_time[c(2,4)])
```
```{r, echo = FALSE}
print("Results")
print(paste0("Total Time of Operation (including interruptions) = ", total_time, "s"))
print(paste0("Total Time without identified inefficiencies = ", efficient_time, "s"))
print(paste0("Total Time wasted due to interruptions = ", interrupted_time, "s"))
print(paste0("Total Time wasted due to being at origin = ", time_at_origin, "s"))
```