Use Electron to load websites and extract data. Intended for automation, testing, web scraping, etc.
Loads URLs inside an Electron webview tag, lets you execute code on them, and streams data from the pages back to your main process.
Run this headlessly on Linux using xvfb-run.
Please note that this is intended to be a fairly low-level library that adds little on top of what Electron does under the hood. Things that seem simple can turn out to be relatively complex because of the way web browser events work.
Use this in an electron app:
var electron = require('electron')
var createMicroscope = require('electron-microscope')

electron.app.on('ready', function () {
  createMicroscope(function (err, scope) {
    if (err) throw err
    // use your new microscope
  })
})
Run it with electron:
$ npm install electron-prebuilt -g
$ electron my-code.js
See the test/ and examples/ folders.
Requiring the module returns a constructor function that you use to create a new instance. Pass it an options object and a ready callback that will be called with (error, scope). scope is your new instance, all ready to go.
The Electron BrowserWindow instance, AKA the renderer, which contains the <webview> that pages are loaded in.
Because there are currently three Node processes at play (main, renderer, webview), you have to go through the window to access webview APIs, e.g.:
scope.window.webContents.executeJavaScript("document.querySelector('webview').goBack()")
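Since every webview call goes through the same string-building step, it can be wrapped in a small helper. The following is only a sketch: callWebview is a hypothetical name, not part of electron-microscope, and it assumes scope.window.webContents.executeJavaScript accepts a code string as in the line above.

```javascript
// Hypothetical helper (not part of electron-microscope): builds the string
// that calls a method on the <webview> element and runs it in the renderer.
function callWebview (scope, method, args) {
  // JSON.stringify each argument so strings are quoted and escaped safely
  var argList = (args || []).map(function (arg) {
    return JSON.stringify(arg)
  }).join(', ')
  var code = "document.querySelector('webview')." + method + '(' + argList + ')'
  // Returns whatever executeJavaScript returns in your Electron version
  return scope.window.webContents.executeJavaScript(code)
}
```

With this in place, callWebview(scope, 'goBack') sends the same string as the example above.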
Load a url, and call cb with (err) when loading is done. If there was a problem loading the page, err will be the error; otherwise the page loaded successfully.
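If you prefer promises to callbacks, the error-first callback described above can be wrapped by hand. This is a minimal sketch: loadURLAsync is a hypothetical name, not part of electron-microscope, and it assumes only the loadURL(url, cb) behaviour described above.

```javascript
// Hypothetical wrapper (not part of electron-microscope): turns the
// callback-style scope.loadURL(url, cb) into a Promise.
function loadURLAsync (scope, url) {
  return new Promise(function (resolve, reject) {
    scope.loadURL(url, function (err) {
      if (err) return reject(err) // the page failed to load
      resolve() // no error means the page loaded
    })
  })
}
```

The returned promise resolves with no value once the page has loaded and rejects with the same err that cb would have received.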
Run code on the currently loaded page. Run this after calling loadURL. Code must be a string; if it is a function, .toString() will be called on it. scope.run returns a readable stream that emits data generated by your code.
Uses the webview.executeJavaScript Electron API, which doesn't provide an error handling mechanism. Electron microscope wraps your code in a try/catch, and if an error occurs it will be emitted on the stream. However, a syntax error will likely not be caught, so it may appear that nothing is happening.
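Because syntax errors can fail silently here, one option is to parse the code in the main process before passing it to scope.run. The sketch below is an assumption about how you might do that, not something electron-microscope provides: checkSyntax is a hypothetical name, and it relies on the standard Function constructor, which throws a SyntaxError when its body does not parse.

```javascript
// Hypothetical pre-flight check (not part of electron-microscope).
// Returns null if the code parses as a function expression,
// otherwise returns the SyntaxError thrown by the parser.
function checkSyntax (code) {
  try {
    // new Function only parses and compiles here; your function is never called
    new Function('return (' + String(code) + ')')
    return null
  } catch (err) {
    return err
  }
}
```

A call such as checkSyntax(code) before scope.run(code) lets you throw or log the returned error yourself instead of waiting on a stream that never emits.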
Your code must be a function that follows this template:
function (send, done) {
  // put your custom code here
  // call 'send(data)' to write data to the stream
  // call 'done()' to end the stream
  // calling send is optional, but you must eventually call done to end the stream
}
For example:
var code = `function (send, done) {
  for (var i = 0; i < 5; i++) send(i)
  done()
}`

var output = scope.run(code)

output.on('data', function (data) {
  // will get called every time send is called above
  // data will be the value passed to send
  // in this case 5 times: 0, 1, 2, 3, 4
})

output.on('error', function (error) {
  // will get called if your code throws an exception
  // error will be an object with .message and .stack from the thrown error object
})
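When the output is small enough to hold in memory, the stream can be gathered into an array. This is a sketch under stated assumptions: collect is a hypothetical name, not part of electron-microscope, and it assumes the stream emits the usual 'end' event of a readable stream once done() has been called.

```javascript
// Hypothetical helper (not part of electron-microscope): gathers everything
// a scope.run stream emits and hands it to an error-first callback.
function collect (stream, cb) {
  var items = []
  var finished = false
  function finish (err) {
    if (finished) return // only call back once
    finished = true
    cb(err, items) // on error, items holds whatever arrived before it
  }
  stream.on('data', function (data) { items.push(data) })
  stream.on('error', function (err) { finish(err) })
  stream.on('end', function () { finish(null) })
}
```

For example, collect(scope.run(code), function (err, items) { ... }) calls back once with every value passed to send, in order.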
Emitted when the page wants to start navigation. It can happen when the window.location object is changed or a link is clicked in the page.
Calls cb with (url), forwarded from this event.
Fired when the navigation is done, i.e. the spinner of the tab has stopped spinning and the onload event has been dispatched.
Calls cb with no arguments, forwarded from this event.
This event is like did-finish-load, but fired when the load failed or was cancelled.
Calls cb with (error), forwarded from this event.
Corresponds to the points in time when the spinner of the tab starts spinning.
Calls cb with no arguments, forwarded from this event.
Corresponds to the points in time when the spinner of the tab stops spinning.
Calls cb with no arguments, forwarded from this event.
Call when you don't want to use the scope anymore. Causes the browser-window that electron-microscope uses internally to close, which may cause your Electron app to exit if you do not have any other active windows.