If you haven’t already read my guide to threads, I suggest starting there.
I’ve spent the last month implementing web worker support in Lime. (Edit: and then I spent another month after posting this.) It turned out to be incredibly complicated, and though I did my best to include documentation in the code, I think it’s worth a blog post too. Let’s go over what web workers are, why you might want to use them, and why you might not want to use them.
To save space, I’m going to assume you’ve heard of threads, race conditions, and threads in Haxe.
About BackgroundWorker and ThreadPool
BackgroundWorker and ThreadPool are Lime’s two classes for safely managing threads. They were added back in 2015, and have stayed largely unchanged since. (Until this past month, but I’ll get to that.)
The two classes fill different roles. BackgroundWorker is ideal for one-off jobs, while ThreadPool is a bit more complex but offers performance benefits when doing multiple jobs in a row.
BackgroundWorker isn’t too different from calling Thread.create() – both make a thread and run a single job. The main difference is that BackgroundWorker builds in safety features.
Recently, Haxe added its own thread pool implementations: FixedThreadPool has a constant number of threads, while ElasticThreadPool tries to add and remove threads based on demand. Lime’s ThreadPool does a combination of the two: you can set the minimum and maximum number of threads, and it will vary within that range based on demand. Plus it offers structure and safety features, just like BackgroundWorker. On the other hand, ThreadPool lacks ElasticThreadPool’s threadTimeout feature, so threads will exit instantly if they don’t have a job to do.
I always hate reinventing the wheel. Why does Lime need a ThreadPool class when Haxe already offers two? (Ignoring the fact that Lime’s came first.) Just because of thread safety? There are other ways to achieve that.
If only Haxe’s thread pools worked in JavaScript…
Web workers
Mozilla describes web workers as “a simple means for web content to run scripts in background threads.” “Simple” is a matter of perspective, but they do allow you to create background threads in JavaScript.
Problem is, they have two fundamental differences from Haxe’s threads, which is why Haxe doesn’t include them in ElasticThreadPool and FixedThreadPool.
- Web workers use source code.
- Web workers are isolated.
Workers use source code
Web workers execute a JavaScript file, not a JavaScript function. Fortunately, it is usually possible to turn a function back into source code, simply by calling toString(). Usually. Let’s start with how this works in pure JavaScript:
function add(a, b) {
    return a + b;
}
console.log(add(1, 2)); //Output: 3
console.log(add.toString()); //Output:
//function add(a, b) {
//    return a + b;
//}
That first log() call is just to show the function working. The second shows that we get the function source code as a string. It even preserved our formatting!
If we look at the examples in the documentation, we find that toString() goes to great lengths to preserve the original formatting.
| toString() input | toString() output |
|---|---|
| function f(){} | "function f(){}" |
| class A { a(){} } | "class A { a(){} }" |
| function* g(){} | "function* g(){}" |
| a => a | "a => a" |
| ({ a(){} }.a) | "a(){}" |
| ({ [0](){} }[0]) | "[0](){}" |
| Object.getOwnPropertyDescriptor({ get a(){} }, "a").get | "get a(){}" |
| Object.getOwnPropertyDescriptor({ set a(x){} }, "a").set | "set a(x){}" |
| Function.prototype.toString | "function toString() { [native code] }" |
| (function f(){}.bind(0)) | "function () { [native code] }" |
| Function("a", "b") | "function anonymous(a\n) {\nb\n}" |
That’s weird. In two of those cases, the function body – the meat of the code – has been replaced with “[native code]”. (That isn’t even valid JavaScript!) As the documentation explains:
If the toString() method is called on built-in function objects or a function created by Function.prototype.bind, toString() returns a native function string
In other words, if we ever call bind() on a function, we can’t get its source code, meaning we can’t use it in a web worker. And wouldn’t you know it, Haxe automatically calls bind() on certain functions.
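To see the effect in isolation, here is a minimal JavaScript sketch (the function and variable names are just for illustration). The bound copy still runs, but its source is gone:

```javascript
function add(a, b) {
    return a + b;
}

// bind() returns a new function object that wraps the original.
const boundAdd = add.bind(null);

// The bound function still behaves the same when called...
console.log(boundAdd(1, 2));

// ...but only the original can be turned back into source code.
const originalSource = add.toString();
const boundSource = boundAdd.toString();
console.log(originalSource.includes("return a + b")); // true
console.log(boundSource.includes("[native code]"));   // true
```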
Let’s try writing some Haxe code to call toString(). Ideally, we want to write a function in Haxe, have Haxe translate it to JavaScript, and then get its JavaScript source code.
class Test {
    static function staticAdd(a, b) {
        return a + b;
    }
    function add(a, b) {
        return a + b;
    }
    static function main() {
        var instance = new Test();
        trace(staticAdd(1, 2));
        trace(instance.add(2, 3));
        #if js
        trace((cast staticAdd).toString());
        trace((cast instance.add).toString());
        #end
    }
    inline function new() {}
}
If you try this code, you’ll get the following output:
Test.hx:15: 3
Test.hx:16: 5
Test.hx:18: staticAdd(a,b) {
return a + b;
}
Test.hx:19: function() {
[native code]
}
The first two lines prove that both functions work just fine. staticAdd is printed exactly like it appears in the JavaScript file. But instance.add is all wrong. Let’s look at the JS source to see why:
static main() {
    let instance = new Test();
    console.log("Test.hx:15:",Test.staticAdd(1,2));
    console.log("Test.hx:16:",instance.add(2,3));
    console.log("Test.hx:18:",Test.staticAdd.toString());
    console.log("Test.hx:19:",$bind(instance,instance.add).toString());
}
Yep, there it is. Haxe inserted a call to $bind(), a function that – perhaps unsurprisingly – calls bind().
Turns out, Haxe always inserts $bind() when you try to refer to an instance function. This is in fact required: otherwise, the function couldn’t access the instance it came from. But it also means we can’t use instance functions in web workers. Or can we?
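Why the bind is required can be shown in plain JavaScript. This is a sketch with a made-up Counter class, not Haxe's generated code: once a method is pulled off its instance, it no longer knows which object it belongs to.

```javascript
class Counter {
    constructor() {
        this.count = 5;
    }
    getCount() {
        return this.count;
    }
}

const counter = new Counter();

// Taking a reference to the method detaches it from the instance.
const detached = counter.getCount;
let detachedFailed = false;
try {
    detached(); // `this` is undefined here, so reading this.count throws
} catch (error) {
    detachedFailed = true;
}

// bind() pins `this` to the instance, which is the job of a $bind()-style helper.
const bound = counter.getCount.bind(counter);

console.log(detachedFailed); // true
console.log(bound());        // 5
```

The price of that fix is the one described above: the bound function can be called anywhere, but it can no longer be converted back to source.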
After a lot of frustration and effort, I came up with ThreadFunction. Read the source if you want details; otherwise, the one thing to understand is that it can only remove the $bind() call if you convert to ThreadFunction ASAP. If you have a variable (or function argument) representing a function, that variable (or argument) must be of type ThreadFunction.
//Instead of this...
class DoesNotWork {
    public var threadFunction:Dynamic -> Void;
    public function new(threadFunction:Dynamic -> Void) {
        this.threadFunction = threadFunction;
    }
    public function runThread():Void {
        new BackgroundWorker().run(threadFunction);
    }
}
//...you want to do this.
class DoesWork {
    public var threadFunction:ThreadFunction<Dynamic -> Void>;
    public function new(threadFunction:ThreadFunction<Dynamic -> Void>) {
        this.threadFunction = threadFunction;
    }
    public function runThread():Void {
        new BackgroundWorker().run(threadFunction);
    }
}
class Main {
    private static function main():Void {
        new DoesWork(test).runThread(); //Success
        new DoesNotWork(test).runThread(); //Error
    }
    private static function test(_):Void {
        trace("Hello from a background thread!");
    }
}
Workers are isolated
Once we have our source code, creating a worker is simple. We take the string and add some boilerplate code, then construct a Blob out of this code, then create a URL for the blob, then create a worker for that URL, then send a message to the worker to make it start running. Or maybe it isn’t so simple, but it does work.
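In plain JavaScript, that chain of steps might look roughly like the following. This is a simplified sketch, not Lime's actual implementation: job and buildWorkerSource are hypothetical names, and the boilerplate is reduced to a single onmessage handler. Worker is a browser API, so the sketch only starts a worker where one is available.

```javascript
// The job to run in the background. It must be self-contained:
// it cannot refer to anything outside its own body.
function job(message) {
    return message * 2;
}

// Hypothetical helper: wrap a function's source in boilerplate that runs it
// whenever the worker receives a message, then posts the result back.
function buildWorkerSource(fn) {
    return "const job = " + fn.toString() + ";\n" +
        "onmessage = (event) => postMessage(job(event.data));\n";
}

const workerSource = buildWorkerSource(job);

if (typeof Worker !== "undefined") {
    // Source string -> Blob -> URL -> Worker.
    const blob = new Blob([workerSource], { type: "text/javascript" });
    const url = URL.createObjectURL(blob);
    const worker = new Worker(url);

    worker.onmessage = (event) => {
        console.log("Result from worker:", event.data);
        worker.terminate();
        URL.revokeObjectURL(url);
    };

    // Nothing runs until the worker receives its first message.
    worker.postMessage(21);
}
```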
Web workers execute a JavaScript source file. The code in the file can only access other code in that file, plus a small number of specific functions and classes. But most of your app resides in the main JS file, and is off-limits to workers.
This is in stark contrast to Haxe’s threads, which can access anything. Classes, functions, variables, you name it. Sharing memory like this does of course allow for race conditions, but as mentioned above, BackgroundWorker and ThreadPool help prevent those.
For a simple example:
class Main {
    private static var luckyNumber:Float;
    private static function main():Void {
        luckyNumber = Math.random() * 777;
        new BackgroundWorker().run(test);
    }
    private static function test(_):Void {
        trace("Hello from a background thread!");
        trace("Your lucky number is: " + luckyNumber);
    }
}
On most targets, any thread can access the Main.luckyNumber variable, so test() will work. But in JavaScript, neither Main nor luckyNumber will have been defined in the worker’s file. And even if they were defined in that file, they’d just be copies. The value will be wrong, and the main thread won’t receive any changes made.
So… how do you transfer data?
Passing messages
I’ve glossed over this so far, but BackgroundWorker.run() takes up to two arguments. The first, of course, is the ThreadFunction to run. The second is a message to pass to that function, which can be any type. (And if you need multiple values, you can pass an array.)
Originally, BackgroundWorker was designed to be run multiple times, each time reusing the same function but working on a new set of data. It wasn’t well-optimized (ThreadPool is much more appropriate for that) nor well-tested, but it was very convenient for implementing web workers.
See, web workers also have a message-passing protocol, allowing us to send an object to the background thread. You know, an object like BackgroundWorker.run()’s second argument:
class Main {
    private static var luckyNumber:Float;
    private static function main():Void {
        luckyNumber = Math.random() * 777;
        new BackgroundWorker().run(test, luckyNumber);
    }
    private static function test(luckyNumber:Float):Void {
        trace("Hello from a background thread!");
        trace("Your lucky number is: " + luckyNumber);
    }
}
The trick is, instead of trying to access Main.luckyNumber (which is on the main thread), test() takes an argument, which is the same value except copied to the worker thread. You can actually transfer a lot of data this way:
new BackgroundWorker().run(test, {
    luckyNumber: Math.random() * 777,
    imageURL: "https://www.example.com/image.png",
    cakeRecipe: File.getContent("cake.txt"),
    calendar: Calendar.getUpcomingEvents(10)
});
Bear in mind that your message will be copied using the structured clone algorithm, a deep copy algorithm that cannot copy functions. This sets limits on what kinds of messages you can pass. You can’t pass a function without first converting it to ThreadFunction, nor can you pass an object that contains functions, such as a class instance.
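The global structuredClone() function applies the same algorithm, so it is a convenient way to check what a message will look like on the other side. A small JavaScript sketch, with made-up message contents and a made-up Recipe class:

```javascript
const message = { luckyNumber: 7, ingredients: ["flour", "sugar"] };

// A deep copy: the nested array is duplicated too.
const copy = structuredClone(message);
copy.ingredients.push("eggs");
console.log(message.ingredients.length); // 2, the original is untouched
console.log(copy.ingredients.length);    // 3

// A function stored in the message cannot be cloned, so the call throws.
let cloneError = null;
try {
    structuredClone({ luckyNumber: 7, callback: function () {} });
} catch (error) {
    cloneError = error;
}
console.log(cloneError !== null); // true

// A class instance is copied as plain data: its methods do not come along.
class Recipe {
    constructor() {
        this.name = "cake";
    }
    describe() {
        return "Recipe for " + this.name;
    }
}
const clonedRecipe = structuredClone(new Recipe());
console.log(clonedRecipe.name);            // "cake"
console.log(typeof clonedRecipe.describe); // "undefined"
```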
Copying your message is key to how JavaScript prevents race conditions: memory is never shared between threads, so two threads can’t accidentally access the same memory location at the wrong time. But if there’s no sharing, how does the main thread get any information back from the worker?
Returning results
Web workers don’t just receive messages, they can send them back. The rules are the same: everything is copied, no functions, etc.
The BackgroundWorker class provides three functions for this, each representing something different. sendProgress() for status updates, sendError() if something goes horribly wrong, and sendComplete() for the final product. (You may recall that workers don’t normally have access to Haxe functions, but these three are inlined. Inline functions work fine.)
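At the level of raw web workers, the three kinds of results can be told apart by tagging each message. The following is only a sketch of that idea, not Lime's implementation: the type field, the runJob and handleMessage names, and the post callback standing in for postMessage() are all assumptions.

```javascript
// Worker side: report progress, then either the result or an error.
// `post` stands in for the worker's postMessage().
function runJob(items, post) {
    try {
        let total = 0;
        for (let i = 0; i < items.length; i++) {
            if (typeof items[i] !== "number") {
                throw new Error("not a number: " + items[i]);
            }
            total += items[i];
            post({ type: "progress", done: i + 1, of: items.length });
        }
        post({ type: "complete", result: total });
    } catch (error) {
        post({ type: "error", reason: error.message });
    }
}

// Main-thread side: route each incoming message by its tag.
const log = [];
function handleMessage(message) {
    switch (message.type) {
        case "progress":
            log.push("progress " + message.done + "/" + message.of);
            break;
        case "complete":
            log.push("complete " + message.result);
            break;
        case "error":
            log.push("error " + message.reason);
            break;
    }
}

// structuredClone() mimics the copy that happens between threads.
runJob([1, 2, 3], (message) => handleMessage(structuredClone(message)));
console.log(log);
```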
It’s at about this point we need to talk about another problem with copying data. One common reason to use background threads is to process large amounts of data. Suppose you produce 10 MB of data, and you want to pass it back once finished. Your computer is going to have to make an exact copy of all that data, and it’ll end up taking 20 MB in all. Don’t get me wrong, it’s doable, but it’s hardly ideal.
It’s possible to save both time and memory using transferable objects. If you’ve stored your data in an ArrayBuffer, you can simply pass a reference back to the main thread, no copying required. The worker thread loses access to it, and then the main thread gains access (because unlike Haxe, JavaScript is very strict about sharing memory).
ArrayBuffer can be annoying to use on its own, so it’s fortunate that all the wrappers are natively available. By “wrappers,” I’m talking about Float32Array, Int16Array, UInt8Array, and so on. As long as you can represent your data as a sequence of numbers, you should be able to find a matching wrapper.
Transferring a buffer looks like this: backgroundWorker.sendComplete(buffer, [buffer]). I know that looks redundant, and at first I thought maybe backgroundWorker.sendComplete(null, [buffer]) could work instead. But the trick is, the main thread will only receive the first argument (a.k.a. the message). If the message doesn’t contain some kind of reference to buffer, then the main thread won’t have any way to access buffer.
That said, the two arguments don’t have to be identical. You can pass a wrapper (e.g., an Int16Array) as the message, and transfer the buffer inside: backgroundWorker.sendComplete(int16Array, [int16Array.buffer]). The Int16Array numeric properties (byteLength, byteOffset, and length) will be copied, but the underlying buffer will be moved instead.
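The same move semantics can be tried without a worker, because structuredClone() accepts a transfer list just like postMessage() does. This JavaScript sketch uses it as a stand-in for sendComplete(int16Array, [int16Array.buffer]); the variable names are made up.

```javascript
const samples = new Int16Array([10, 20, 30, 40]);

// The typed array is the message; its underlying buffer is listed as transferable.
const received = structuredClone(samples, { transfer: [samples.buffer] });

// The receiving side gets a working Int16Array backed by the moved buffer.
console.log(received.length); // 4
console.log(received[2]);     // 30

// The sending side is left with a detached, zero-length buffer.
console.log(samples.buffer.byteLength); // 0
```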