WEBVTT

00:00.000 --> 00:12.000
So our first talk today is going to be Ben Sully. He's going to be taking us through the

00:12.000 --> 00:18.000
augurs time series toolkit for Rust. Take it away.

00:18.000 --> 00:21.000
Thanks very much.

00:21.000 --> 00:28.000
Thank you. Welcome to the Rust Room, exciting.

00:28.000 --> 00:33.000
I was a bit nervous before, and now here I am, first on stage.

00:33.000 --> 00:40.000
So thanks for coming. I'm going to be talking about augurs, which is a time series toolkit for Rust.

00:40.000 --> 00:46.000
It's more aimed at time series analysis, and it also has Python and JavaScript bindings.

00:46.000 --> 00:51.000
And the last of which is already being used in Grafana, which is pretty cool.

00:51.000 --> 00:54.000
I'll talk a little bit about that in a bit.

00:54.000 --> 00:59.000
So who am I?

00:59.000 --> 01:02.000
Oh, sorry. Is it too loud?

01:02.000 --> 01:11.000
Okay. Any better?

01:11.000 --> 01:15.000
Even higher.

01:15.000 --> 01:17.000
This is as much as we can do.

01:17.000 --> 01:19.000
I'll try and speak louder.

01:19.000 --> 01:21.000
So who am I, first of all?

01:21.000 --> 01:24.000
I'm a software engineer at Grafana Labs, based in the UK.

01:24.000 --> 01:31.000
In my spare time, I like to do bouldering and run ultramarathons, which is a normal thing to do.

01:31.000 --> 01:34.000
I've been at Grafana Labs for about four years.

01:34.000 --> 01:39.000
I tend to use Rust for personal projects and hackathons as much as I can.

01:39.000 --> 01:44.000
But recently it's been kind of squeezed into various different bits of Grafana Labs, which is exciting.

01:44.000 --> 01:49.000
My background is kind of in statistics and machine learning.

01:49.000 --> 01:54.000
Oh.

01:54.000 --> 01:57.000
What do you think we need to do?

01:57.000 --> 02:00.000
I can lower it and you're just going to have to project.

02:00.000 --> 02:03.000
Okay.

02:03.000 --> 02:05.000
Yeah, we're going to do a lower.

02:05.000 --> 02:08.000
We still need the microphone for the audio of the video.

02:08.000 --> 02:10.000
But he's going to try to project.

02:10.000 --> 02:11.000
I can do my best.

02:11.000 --> 02:13.000
Is that any better? Can you hear me?

02:13.000 --> 02:16.000
Some more slightly?

02:16.000 --> 02:18.000
I have a backup, this one.

02:18.000 --> 02:19.000
That might work.

02:19.000 --> 02:22.000
That one's working.

02:22.000 --> 02:25.000
Okay.

02:25.000 --> 02:29.000
There's a lot of echo.

02:29.000 --> 02:33.000
Okay. Well, I'll try and speak as loud as possible.

02:33.000 --> 02:34.000
So yeah, here's what we're going to do.

02:34.000 --> 02:36.000
This is a quick summary of the talk.

02:36.000 --> 02:38.000
Hopefully technical issues will be fine.

02:38.000 --> 02:40.000
First, I'll talk about what augurs is.

02:40.000 --> 02:41.000
What can it do?

02:41.000 --> 02:46.000
And then the second part is going to be lessons that I've learned while translating various

02:46.000 --> 02:49.000
different ML algorithms to Rust from different languages.

02:49.000 --> 02:52.000
Some of them in languages like C++ or Fortran,

02:52.000 --> 02:53.000
or Python.

02:53.000 --> 02:55.000
They could have been written 30 or 40 years ago.

02:55.000 --> 03:02.000
So there's lots of things involved in even finding decent source implementations.

03:02.000 --> 03:04.000
Hopefully that'll leave me time for questions.

03:04.000 --> 03:06.000
And then as a bonus, there's a section that was meant to be

03:06.000 --> 03:08.000
in the main talk, but I ran out of time.

03:08.000 --> 03:12.000
And if you want to check it out, feel free to download the slides and have a look.

03:12.000 --> 03:17.000
There's some content there on things I ran into and trade-offs that had to be made,

03:17.000 --> 03:21.000
exposing kind of a JavaScript interface using WebAssembly.

03:22.000 --> 03:25.000
So first of all, what's the deal with the name?

03:25.000 --> 03:30.000
So "to augur" is a verb, or "augur" a noun, meaning to predict.

03:30.000 --> 03:33.000
So I think this might be one of the few times in my life.

03:33.000 --> 03:36.000
I've actually named something well, because it means the right thing.

03:36.000 --> 03:39.000
But I don't know, hopefully I'll get lucky when I have a kid.

03:39.000 --> 03:40.000
I'm not sure.

03:40.000 --> 03:44.000
Maybe I'll name them using the same method I used here,

03:44.000 --> 03:46.000
which was domain-name-driven development.

03:46.000 --> 03:49.000
Basically, look through a thesaurus for a word that ends in R.

03:49.000 --> 03:52.000
Hope that the .rs domain is available.

03:52.000 --> 03:55.000
And then write the project later.

03:55.000 --> 03:57.000
So, to quickly summarize what a time series is.

03:57.000 --> 04:00.000
Probably most people know this, but it's pretty straightforward.

04:00.000 --> 04:02.000
It's a measurement taken repeatedly.

04:02.000 --> 04:04.000
Generally at the same interval.

04:04.000 --> 04:06.000
Usually it's something numeric.

04:06.000 --> 04:08.000
Either a counter or a floating point number.

04:08.000 --> 04:11.000
So computers are really good at working with them.

04:11.000 --> 04:12.000
Because they're just numbers.

04:12.000 --> 04:14.000
There are some nice,

04:14.000 --> 04:18.000
fun optimizations you can do, both with compression and storage.

04:18.000 --> 04:21.000
Whether that's in memory or on disk.

04:21.000 --> 04:24.000
And also processing them is fun.

04:24.000 --> 04:27.000
You've got lots of optimizations you can do.

04:27.000 --> 04:30.000
In the real world, there's some examples of where you can see them.

04:30.000 --> 04:32.000
They're kind of all over the place.

04:32.000 --> 04:33.000
Ubiquitous.

04:33.000 --> 04:35.000
Here's my heart rate plotted over a day.

04:35.000 --> 04:37.000
Like environmental sensors.

04:37.000 --> 04:39.000
You'll see them everywhere.

04:39.000 --> 04:42.000
So, the sorts of things we do with them.

04:42.000 --> 04:44.000
We visualize them, right?

04:44.000 --> 04:47.000
I mean, Grafana is definitely all about visualizing things.

04:47.000 --> 04:51.000
Whether on dashboards or in the kind of Explore view,

04:51.000 --> 04:54.000
you can see lots of time series there.

04:54.000 --> 04:58.000
We like to set thresholds for them and get alerted

04:58.000 --> 05:02.000
when things exceed certain boundaries.

05:02.000 --> 05:05.000
For example, disk usage going above 90%.

05:05.000 --> 05:08.000
Or maybe you want your service to auto scale

05:08.000 --> 05:12.000
if requests per pod go too high, something like that.

05:12.000 --> 05:14.000
And you can do that dynamically,

05:14.000 --> 05:19.000
with a standard threshold, or using something a little bit more advanced,

05:19.000 --> 05:22.000
like anomaly detection or outlier detection.

05:22.000 --> 05:25.000
And this is kind of where augurs comes in.

05:25.000 --> 05:30.000
So augurs is, as I mentioned, a time series toolkit for Rust.

05:30.000 --> 05:31.000
Right?

05:31.000 --> 05:34.000
It's designed to help you with all of these previous tasks.

05:34.000 --> 05:37.000
More specifically, someone on Reddit pointed out quite early on:

05:37.000 --> 05:39.000
It's a time series analysis toolkit.

05:39.000 --> 05:42.000
We don't do all of the things you might imagine

05:42.000 --> 05:44.000
a time series library would do.

05:44.000 --> 05:48.000
We don't do things like resampling yet or storage.

05:48.000 --> 05:52.000
Instead, we implement various different machine learning algorithms.

05:52.000 --> 05:54.000
So forecasting is the most obvious one.

05:54.000 --> 05:57.000
That's kind of, you want to predict the future.

05:57.000 --> 05:58.000
Or you even want to predict now.

05:58.000 --> 06:01.000
And you often want to do that with confidence intervals or prediction intervals,

06:01.000 --> 06:04.000
so that you know how accurate your predictions are.

06:04.000 --> 06:08.000
Clustering, you want to group lots of series together.

06:08.000 --> 06:11.000
You have hundreds of series and you want to find the groups within those.

06:11.000 --> 06:14.000
Those series that are behaving similarly.

06:14.000 --> 06:17.000
Outlier detection is a bit of a confusing one.

06:17.000 --> 06:20.000
But this, we refer to this as when you have lots of different series.

06:20.000 --> 06:23.000
You expect them to all behave the same.

06:23.000 --> 06:27.000
And you want to identify the ones that aren't behaving similarly to the group.

06:27.000 --> 06:31.000
And change point detection is where you're kind of looking across time.

06:31.000 --> 06:35.000
And you want to see where the behavior of your time series changes.

06:35.000 --> 06:39.000
Whether that's changes in magnitude or changes in variance, there are various different properties

06:39.000 --> 06:43.000
that you can detect there.

06:43.000 --> 06:46.000
Just a few examples of each of those.

06:46.000 --> 06:48.000
Forecasting is maybe the most obvious one.

06:48.000 --> 06:49.000
You just want a prediction, right?

06:49.000 --> 06:52.000
You maybe want to account for things like seasonality.

06:52.000 --> 06:55.000
You have daily seasonality or weekly seasonality.

06:55.000 --> 06:57.000
You depend on things happening on weekends.

06:57.000 --> 07:01.000
And you might want to forecast, say, usage or capacity.

07:01.000 --> 07:03.000
So that's, yeah, that's very straightforward.

07:03.000 --> 07:05.000
In augurs, we have three algorithms for this.

07:05.000 --> 07:08.000
Roughly, kind of left to right,

07:08.000 --> 07:13.000
They go from simpler to more complex, and faster to slower.

07:13.000 --> 07:15.000
Extremely reductive.

07:15.000 --> 07:19.000
And they support, as you get more advanced, they can support things like holidays.

07:19.000 --> 07:25.000
Like, maybe you want to model COVID separately from the rest of the kind of normal world.

07:25.000 --> 07:31.000
And things like Christmas, if you wanted to model that in your time series.

07:31.000 --> 07:34.000
And account for that in your predictions.

07:35.000 --> 07:39.000
Outlier detection: this is used to identify when one or more series is behaving differently.

07:39.000 --> 07:45.000
So, as I said, this example here is mine; you can't see it very clearly, sadly.

07:45.000 --> 07:48.000
But it's my Pi-hole at home blocking ads.

07:48.000 --> 07:51.000
And you'd kind of expect most of the domains to behave the same.

07:51.000 --> 07:58.000
But actually, whatever it is, beacons.gvt2.com is absolutely miles higher than anything else.

07:58.000 --> 08:00.000
It's a nightmare.

08:01.000 --> 08:04.000
You can also imagine pods in a Kubernetes deployment.

08:04.000 --> 08:08.000
You expect them to have the same CPU usage, if they're being load balanced correctly.

08:08.000 --> 08:11.000
And you want to flag in case that isn't the case.

08:11.000 --> 08:15.000
Two algorithms that we use: median absolute deviation.

08:15.000 --> 08:20.000
So, this is for when series are expected to be roughly constant, and the same as each other.

08:20.000 --> 08:25.000
And it's really simple: you just flag whenever the median of a series is too different from the group's.

08:26.000 --> 08:31.000
And DBSCAN you can use when series have more complex patterns, like seasonality.

08:31.000 --> 08:35.000
But you still expect them to move similarly and move in the same way.

08:35.000 --> 08:38.000
It's a more complex algorithm, a little bit slower.

08:38.000 --> 08:43.000
But it can handle these more complicated cases.

08:43.000 --> 08:51.000
The reason you might use median absolute deviation, I guess, is because it's more intuitive and easy to explain.

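To make the median-absolute-deviation idea concrete, here's a minimal sketch in plain Rust. This is not augurs' actual API; the function names and the threshold-times-MAD rule are just one common formulation of the idea described above:

```rust
// Minimal sketch of median-absolute-deviation outlier flagging (hypothetical,
// not the augurs API): score each series by how far its median sits from the
// group median, in units of the group's MAD.

fn median(xs: &[f64]) -> f64 {
    let mut v = xs.to_vec();
    v.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = v.len();
    if n % 2 == 1 { v[n / 2] } else { (v[n / 2 - 1] + v[n / 2]) / 2.0 }
}

/// Return indices of series whose median deviates from the group median
/// by more than `threshold` times the group's MAD.
fn mad_outliers(series: &[Vec<f64>], threshold: f64) -> Vec<usize> {
    let medians: Vec<f64> = series.iter().map(|s| median(s)).collect();
    let group_median = median(&medians);
    let deviations: Vec<f64> =
        medians.iter().map(|m| (m - group_median).abs()).collect();
    let mad = median(&deviations);
    medians
        .iter()
        .enumerate()
        .filter(|(_, m)| (**m - group_median).abs() > threshold * mad.max(f64::EPSILON))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Three well-behaved series and one (index 3) that is miles higher.
    let series = vec![
        vec![1.0, 1.1, 0.9],
        vec![1.0, 0.8, 1.2],
        vec![0.9, 1.0, 1.1],
        vec![10.0, 11.0, 9.5],
    ];
    assert_eq!(mad_outliers(&series, 3.0), vec![3]);
}
```

The appeal, as the speaker says, is exactly that this is easy to explain: one robust location statistic per series, one robust spread statistic for the group.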
08:52.000 --> 08:53.000
Clustering.

08:53.000 --> 08:56.000
You might use clustering if you want to identify groups of similar series.

08:56.000 --> 09:01.000
So here in the screenshot, we've got one big band across the top and then clearly two separate bands

09:01.000 --> 09:03.000
A little bit lower down.

09:03.000 --> 09:06.000
This is useful if you want to kind of group these together.

09:06.000 --> 09:12.000
So in this case, we would flag those little groups, which is useful in quite a lot of instances.

09:12.000 --> 09:14.000
A little bit about the way that works.

09:14.000 --> 09:19.000
It's quite a fun algorithm, and also it's probably the coolest name of any of the algorithms I've seen.

09:19.000 --> 09:22.000
Dynamic time warping.

09:22.000 --> 09:23.000
Why is it called that?

09:23.000 --> 09:25.000
I don't know, it's awesome.

09:25.000 --> 09:31.000
The GIF kind of gives it away here, but you're calculating distances between each of your series.

09:31.000 --> 09:34.000
Each pair of your series in your data set.

09:34.000 --> 09:40.000
The naive way you would do it is, like, Euclidean distance, where you compare values at the same timestamp.

09:40.000 --> 09:45.000
And dynamic time warping instead, you're optimizing to find pairs of values that minimize the overall distance.

09:45.000 --> 09:49.000
And kind of account for shifts in time between those series.

09:49.000 --> 09:53.000
Naively, that would be really, really slow, because you've got to do, like, N squared comparisons.

09:53.000 --> 09:55.000
As the top right shows.

09:55.000 --> 10:02.000
But there's lots of little optimizations you can do to speed that up, by limiting the window that you allow the series to differ by.

10:02.000 --> 10:04.000
So that's quite fun to implement.

10:04.000 --> 10:09.000
And there's lots of options you can tweak in augurs to do that.

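The windowed dynamic time warping described here can be sketched in a few lines of plain Rust. This is the classic dynamic-programming formulation with a Sakoe-Chiba-style band, not augurs' actual implementation; the function name and window parameter are illustrative:

```rust
// Sketch of dynamic time warping with a band constraint: the classic
// O(n*m) DP, restricted to cells within `window` of the diagonal. That
// restriction is the speed-up from limiting how far the two series are
// allowed to shift in time.

fn dtw_distance(a: &[f64], b: &[f64], window: usize) -> f64 {
    let (n, m) = (a.len(), b.len());
    // The band must at least cover the length difference to stay reachable.
    let w = window.max(n.abs_diff(m));
    let mut cost = vec![vec![f64::INFINITY; m + 1]; n + 1];
    cost[0][0] = 0.0;
    for i in 1..=n {
        let lo = i.saturating_sub(w).max(1);
        let hi = (i + w).min(m);
        for j in lo..=hi {
            let d = (a[i - 1] - b[j - 1]).abs();
            let best_prev = cost[i - 1][j - 1].min(cost[i - 1][j]).min(cost[i][j - 1]);
            cost[i][j] = d + best_prev;
        }
    }
    cost[n][m]
}

fn main() {
    let a = [0.0, 1.0, 2.0, 1.0, 0.0];
    let b = [0.0, 0.0, 1.0, 2.0, 1.0]; // same shape, shifted by one step
    // Comparing at the same timestamps would see a large distance;
    // DTW warps time and matches the shifted peaks cheaply.
    assert!(dtw_distance(&a, &b, 2) < 1.5);
    assert_eq!(dtw_distance(&a, &a, 2), 0.0);
}
```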
10:09.000 --> 10:18.000
So the way that works is: you use dynamic time warping to find the distances between each pair of time series.

10:18.000 --> 10:23.000
And after that, you feed those into an algorithm called DBSCAN.

10:23.000 --> 10:25.000
Here's a distance matrix.

10:25.000 --> 10:30.000
In this case, we've got one series at the top, and then you can see some of the distances are a little bit higher.

10:30.000 --> 10:36.000
And then by feeding that into DBSCAN, you just get a simple marker of whether a series is in one cluster or another.

10:36.000 --> 10:41.000
So the API is pretty simple and easy to use.

10:41.000 --> 10:44.000
And it clusters them into those colors.

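As a sketch of that second stage, here's a toy DBSCAN over a precomputed distance matrix (such as the pairwise DTW distances just described). Again, this is not the augurs API; here a label of -1 means "noise", anything else is a cluster id:

```rust
// Toy DBSCAN over a precomputed distance matrix (hypothetical helper, not
// the augurs API). Points with at least `min_pts` neighbours within `eps`
// are core points; clusters grow outwards from them.

fn dbscan(dist: &[Vec<f64>], eps: f64, min_pts: usize) -> Vec<i32> {
    let n = dist.len();
    let neighbours = |i: usize| -> Vec<usize> {
        (0..n).filter(|&j| dist[i][j] <= eps).collect()
    };
    let mut labels = vec![-1i32; n]; // -1 = noise / not yet claimed
    let mut cluster = 0;
    for i in 0..n {
        if labels[i] != -1 {
            continue;
        }
        let seeds = neighbours(i);
        if seeds.len() < min_pts {
            continue; // not a core point; stays noise unless claimed later
        }
        labels[i] = cluster;
        let mut queue = seeds;
        while let Some(j) = queue.pop() {
            if labels[j] != -1 {
                continue;
            }
            labels[j] = cluster;
            let nb = neighbours(j);
            if nb.len() >= min_pts {
                queue.extend(nb); // j is also a core point: keep growing
            }
        }
        cluster += 1;
    }
    labels
}

fn main() {
    // Series 0 and 1 are close; 2 and 3 are close; the groups are far apart.
    let dist = vec![
        vec![0.0, 0.1, 9.0, 9.0],
        vec![0.1, 0.0, 9.0, 9.0],
        vec![9.0, 9.0, 0.0, 0.2],
        vec![9.0, 9.0, 0.2, 0.0],
    ];
    assert_eq!(dbscan(&dist, 0.5, 2), vec![0, 0, 1, 1]);
}
```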
10:44.000 --> 10:46.000
Some stuff that augurs doesn't do.

10:46.000 --> 10:50.000
It's always useful to know when you shouldn't use something.

10:50.000 --> 10:52.000
Some of them might be obvious, but maybe not.

10:52.000 --> 10:57.000
So we don't do plotting; plotting isn't really time series specific, necessarily.

10:57.000 --> 11:01.000
Like, there are libraries in Rust, so plotters is really good.

11:01.000 --> 11:05.000
And you can use uPlot from JavaScript, or there's loads of options, like

11:05.000 --> 11:07.000
matplotlib, in Python.

11:07.000 --> 11:09.000
We don't do time series data structures.

11:09.000 --> 11:11.000
I kind of touched on that earlier.

11:11.000 --> 11:14.000
But we tend to work with just simple Vecs of floats.

11:14.000 --> 11:16.000
Things are quite straightforward.

11:16.000 --> 11:18.000
And I don't know if there is one of those right now.

11:18.000 --> 11:22.000
So there's scope to add it to augurs or to other libraries later.

11:22.000 --> 11:26.000
And storage and compression are not really our concern either.

11:26.000 --> 11:29.000
It's not a database; it's not really designed for database usage.

11:29.000 --> 11:33.000
So you wouldn't use it for storing these things; instead you'd use

11:33.000 --> 11:37.000
a time series database, like InfluxDB or something like that.

11:37.000 --> 11:41.000
Why should you use it, in my opinion?

11:41.000 --> 11:43.000
First of all, you need to have some time series.

11:43.000 --> 11:45.000
That's very straightforward.

11:45.000 --> 11:50.000
But like the code is clean and it's relatively fresh.

11:50.000 --> 11:55.000
I mean, it's been idiomatically converted from the original algorithms in other languages,

11:55.000 --> 11:58.000
which is nice, easy to contribute to.

11:58.000 --> 12:00.000
It's Rust only, so it's nice and portable.

12:00.000 --> 12:03.000
You can use it wherever you want to compile to.

12:03.000 --> 12:06.000
And we have the JavaScript and Python bindings.

12:06.000 --> 12:09.000
And as you might expect, it's relatively quick.

12:09.000 --> 12:12.000
Like it's, I haven't done extensive benchmarking against other languages,

12:12.000 --> 12:14.000
but I've made sure to profile things.

12:14.000 --> 12:16.000
I make sure that it's not like crazily slow.

12:16.000 --> 12:20.000
So it is generally faster than other implementations.

12:20.000 --> 12:22.000
Even compared to sort of NumPy,

12:22.000 --> 12:26.000
which you would expect to be fast and to have things very highly optimized.

12:27.000 --> 12:30.000
I think largely because of the extra control you get in rust

12:30.000 --> 12:33.000
and lazy operations like iterators and compiling down

12:33.000 --> 12:35.000
into very optimized code.

12:35.000 --> 12:39.000
You get fast implementations by default.

12:39.000 --> 12:43.000
Okay, so that was a very high level overview of augurs.

12:43.000 --> 12:45.000
There's a lot of things I didn't have time to cover.

12:45.000 --> 12:48.000
Please do check out the docs and the demo and the code.

12:48.000 --> 12:52.000
And I'll touch on those later.

12:53.000 --> 12:56.000
So next up section two is,

12:56.000 --> 13:01.000
I want to talk about the process of actually converting an ML algorithm

13:01.000 --> 13:03.000
from another language to Rust.

13:03.000 --> 13:06.000
And there are a lot of lessons that came out of that

13:06.000 --> 13:08.000
I'd like to pass on.

13:12.000 --> 13:16.000
So this is the kind of process that you might expect to go through.

13:16.000 --> 13:19.000
If you're converting an algorithm right that you've found

13:19.000 --> 13:21.000
in a different language.

13:22.000 --> 13:24.000
It looks fairly sensible. It's quite nice, right?

13:24.000 --> 13:27.000
It's probably what the working with legacy code book says.

13:27.000 --> 13:28.000
I'm not sure.

13:28.000 --> 13:32.000
However, things don't usually go to plan.

13:32.000 --> 13:34.000
And especially here.

13:34.000 --> 13:36.000
So when you're looking for source implementations,

13:36.000 --> 13:38.000
there'll be many.

13:38.000 --> 13:40.000
Especially if it's a popular algorithm,

13:40.000 --> 13:42.000
you'll find several in different languages.

13:42.000 --> 13:44.000
Each of them will have a different trade-offs

13:44.000 --> 13:47.000
and you have to kind of choose which one to go for.

13:48.000 --> 13:51.000
You'll be very lucky if you find tests for these things.

13:51.000 --> 13:52.000
That will be lovely, wouldn't it?

13:52.000 --> 13:54.000
But they don't really exist.

13:54.000 --> 13:57.000
People will tend to maybe write examples and blog posts,

13:57.000 --> 13:59.000
but you don't tend to get tests.

13:59.000 --> 14:02.000
These algorithms are often written by researchers or scientists

14:02.000 --> 14:04.000
who have better things to do.

14:06.000 --> 14:09.000
Your implementations that you found will all disagree.

14:09.000 --> 14:11.000
Which you won't realize until a little bit later.

14:11.000 --> 14:13.000
That'll be fun to figure out as well.

14:14.000 --> 14:16.000
As you're translating, you might kind of think,

14:16.000 --> 14:17.000
this is awful.

14:17.000 --> 14:20.000
What are they doing all of this manual indexing for?

14:20.000 --> 14:23.000
We should definitely rewrite that a little bit better.

14:24.000 --> 14:26.000
And as you're refactoring, you will think,

14:26.000 --> 14:27.000
yeah, we should do this much sooner.

14:27.000 --> 14:29.000
Let's shift this earlier in the stage.

14:29.000 --> 14:32.000
And then you'll end up with this kind of half translated half,

14:32.000 --> 14:36.000
like Frankenstein's monster of a code base.

14:37.000 --> 14:39.000
And who doesn't love optimizing that?

14:39.000 --> 14:40.000
That will come way sooner.

14:40.000 --> 14:42.000
You'll definitely start that when you're 20% through.

14:42.000 --> 14:44.000
There's no way you're waiting till the end.

14:44.000 --> 14:49.000
So the end reality looks a little bit more like this.

14:49.000 --> 14:51.000
You'll find a lot of implementations.

14:51.000 --> 14:53.000
You'll swap back and forth between them.

14:53.000 --> 14:56.000
You'll translate some functions line by line.

14:56.000 --> 14:57.000
Others will be done.

14:57.000 --> 14:59.000
It will be a hot mess.

14:59.000 --> 15:00.000
You're going to be aghast.

15:00.000 --> 15:02.000
Get familiar with the debugger.

15:02.000 --> 15:05.000
But fortunately, there are some things that you can do to improve this process.

15:05.000 --> 15:08.000
So I'm going to go through and give you a little bit of advice.

15:08.000 --> 15:10.000
In case this is something you plan on doing.

15:11.000 --> 15:14.000
So first of all, finding the source implementations.

15:14.000 --> 15:16.000
Nothing's going to be perfect.

15:16.000 --> 15:18.000
The whole point in you doing this is that it isn't perfect.

15:18.000 --> 15:20.000
It's not in Rust, right?

15:20.000 --> 15:21.000
So there's no way it's perfect.

15:21.000 --> 15:24.000
But also, it's going to have problems.

15:24.000 --> 15:27.000
There are just going to be slow things in there.

15:27.000 --> 15:30.000
You're going to have to accept that and make it better as you go.

15:30.000 --> 15:33.000
Prioritize things with published papers.

15:33.000 --> 15:35.000
It feels obvious if you see it.

15:35.000 --> 15:39.000
It's really useful to have a paper or a book or something concrete

15:39.000 --> 15:42.000
that you can then refer to for motivations.

15:42.000 --> 15:45.000
Extra comments, that kind of thing.

15:45.000 --> 15:49.000
Ideally, you want something that uses as few languages as possible.

15:49.000 --> 15:52.000
I mean, even NumPy drops down to C.

15:52.000 --> 15:55.000
So you're probably going to be struggling.

15:55.000 --> 15:59.000
For example, the ETS implementation that I translated,

15:59.000 --> 16:03.000
which we use in augurs for forecasting, is written in R.

16:03.000 --> 16:06.000
But R has some built-in Fortran functions.

16:06.000 --> 16:09.000
So you have to go and figure out what they're doing.

16:09.000 --> 16:12.000
And then it also has a manual C++ implementation of an optimizer.

16:12.000 --> 16:15.000
So we had to go and figure out what that was doing too.

16:15.000 --> 16:17.000
It's not recommended.

16:17.000 --> 16:20.000
Ideally, you would find something on GitHub.

16:20.000 --> 16:21.000
So that would be the dream.

16:21.000 --> 16:23.000
And often that is the case.

16:23.000 --> 16:25.000
You need it to be open source.

16:25.000 --> 16:26.000
Ideally, it would have tests.

16:26.000 --> 16:28.000
But as I said, that's unlikely.

16:28.000 --> 16:31.000
And if you have a responsive author or a recent commit,

16:31.000 --> 16:34.000
that's great because you can talk to them and ask some questions.

16:34.000 --> 16:37.000
So, on to getting tests.

16:37.000 --> 16:40.000
As I mentioned, you're not going to find much.

16:40.000 --> 16:41.000
But take what you can get.

16:41.000 --> 16:43.000
So if you see any reproducible examples,

16:43.000 --> 16:46.000
whether it's in papers, blog posts, books,

16:46.000 --> 16:48.000
you can turn those into examples,

16:48.000 --> 16:53.000
integration tests, and benchmarks in your Rust crate.

16:53.000 --> 16:56.000
This is going to sound controversial,

16:56.000 --> 16:59.000
but you're probably going to want to test some of the implementation details.

16:59.000 --> 17:02.000
Like, as you go, there's a lot to do.

17:02.000 --> 17:06.000
There are low-level things that you're not going to be confident in.

17:06.000 --> 17:07.000
So write test for them.

17:07.000 --> 17:09.000
If you have to throw them away, that's fine.

17:09.000 --> 17:11.000
That's not a big deal.

17:11.000 --> 17:13.000
You're going to want to be, this is a bit facetious,

17:13.000 --> 17:15.000
but you need a big screen,

17:15.000 --> 17:17.000
because you're going to want those two things side by side,

17:17.000 --> 17:20.000
and be able to step through each line by line.

17:20.000 --> 17:23.000
So yeah, in Rust, as I mentioned,

17:23.000 --> 17:25.000
the testing framework makes this all very nice.

17:25.000 --> 17:27.000
I tend to make every example that I find

17:27.000 --> 17:30.000
into an integration test and a benchmark.

17:30.000 --> 17:33.000
And exercise your public API to make sure it makes sense,

17:33.000 --> 17:35.000
when it's been converted.

17:35.000 --> 17:36.000
Assertions are great.

17:36.000 --> 17:38.000
Make sure you're using debug asserts.

17:38.000 --> 17:43.000
They're helpful.

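The shape of such a test might look like this. The function here is an illustrative stand-in, not augurs code, and the reference value is hand-computed below (standing in for a number you'd lift from a paper or blog post):

```rust
// A reference example turned into an assertion against your translated
// function. Simple exponential smoothing is used as a stand-in here.

fn simple_exp_smoothing(y: &[f64], alpha: f64) -> f64 {
    // level_t = alpha * y_t + (1 - alpha) * level_{t-1}, seeded with y[0]
    y.iter()
        .skip(1)
        .fold(y[0], |level, &obs| alpha * obs + (1.0 - alpha) * level)
}

fn main() {
    // Hand-computed reference for alpha = 0.5 over [2, 4, 6]:
    // level = 2 -> 0.5*4 + 0.5*2 = 3 -> 0.5*6 + 0.5*3 = 4.5
    let got = simple_exp_smoothing(&[2.0, 4.0, 6.0], 0.5);
    assert!((got - 4.5).abs() < 1e-12);
}
```

Comparing with a small tolerance rather than exact equality is usually the right call when the reference numbers come from another language's floating point pipeline.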
17:43.000 --> 17:46.000
So when you're translating, when you get to the translating bit,

17:46.000 --> 17:49.000
my advice would be to start with iterator adapters,

17:49.000 --> 17:53.000
and just the standard library, and work with these as much as you can.

17:53.000 --> 17:57.000
Low dependency count is like the new hot thing.

17:57.000 --> 18:01.000
So if you can get by without adding any of these kind of array dependencies,

18:01.000 --> 18:02.000
then that'll be great.

18:02.000 --> 18:07.000
And also, if you learn the adapters and kind of work with the functional style,

18:07.000 --> 18:11.000
there's often enough in there to reproduce even, like, really obscure NumPy functions.

18:11.000 --> 18:13.000
And I'll show you an example in a minute,

18:13.000 --> 18:18.000
but the other advantage is that this gets really heavily optimized by rustc.

18:18.000 --> 18:23.000
So you end up with really fast code without even realizing that you've done it.

18:23.000 --> 18:26.000
And you might even get things like SIMD for free.

18:26.000 --> 18:28.000
It's fantastic.

18:28.000 --> 18:31.000
As an example up here, as much as you can see it there.

18:31.000 --> 18:34.000
But the left example is the NumPy code.

18:34.000 --> 18:38.000
And the right is iterator adapters in Rust.

18:38.000 --> 18:43.000
This is exactly the same algorithm, and it's literally taken out of the code bases.

18:43.000 --> 18:46.000
There's exactly two allocations in the Rust version.

18:46.000 --> 18:49.000
God knows what's going on in the left side.

18:49.000 --> 18:52.000
Performance wise, there's a lot of manual indexing.

18:52.000 --> 18:55.000
There's like loads of copying going on.

18:55.000 --> 19:02.000
So you get like a lot of power just by doing these standard iterator adapters and using those as much as possible.

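As a toy illustration of that style (not the actual code from the slide), here's a small seasonal-differencing computation written entirely with iterator adapters, in a single pass and with no intermediate Vec allocations:

```rust
// Illustrative only: seasonally difference a series and take the mean
// squared difference, using zip/map/sum instead of manual indexing.
// The equivalent NumPy would typically allocate intermediate arrays.

fn seasonal_diff_msq(y: &[f64], period: usize) -> f64 {
    let n = y.len().saturating_sub(period);
    if n == 0 {
        return 0.0;
    }
    let sum: f64 = y
        .iter()
        .zip(&y[period..]) // pair each value with the one a season later
        .map(|(a, b)| (b - a).powi(2))
        .sum();
    sum / n as f64
}

fn main() {
    // Perfectly seasonal data: differencing at the right period leaves nothing.
    let y = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0];
    assert_eq!(seasonal_diff_msq(&y, 3), 0.0);
    // At the wrong period there is plenty of signal left.
    assert!(seasonal_diff_msq(&y, 1) > 0.0);
}
```

Chains like this compile down to a single loop, which is where the "fast by default" behaviour mentioned above comes from.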
19:02.000 --> 19:07.000
There's another example of that in the optimization section.

19:07.000 --> 19:17.000
However, if you end up working with 2D arrays and matrices and anything that has more dimensions,

19:17.000 --> 19:19.000
things get problematic.

19:19.000 --> 19:21.000
Vecs of Vecs are inefficient.

19:21.000 --> 19:23.000
They're not great for various reasons.

19:23.000 --> 19:26.000
You have to go through two levels of indirection to get to the actual value.

19:26.000 --> 19:31.000
And they really don't play well with the kind of auto-deref of Rust functions.

19:31.000 --> 19:38.000
So especially if you're using them as arguments to your public functions, it's not ideal.

19:38.000 --> 19:42.000
So in this example we have a function that takes a 1D array.

19:42.000 --> 19:44.000
It takes a slice of floats.

19:44.000 --> 19:45.000
That's great.

19:45.000 --> 19:47.000
You can pass a reference to a vector to it.

19:47.000 --> 19:50.000
It will get auto-derefed, and everything works.

19:50.000 --> 19:55.000
As soon as we have the 2D function, we want to take a slice because we don't need to own that value.

19:55.000 --> 19:56.000
But now we can't pass a Vec of Vecs.

19:56.000 --> 19:57.000
We can't.

19:57.000 --> 20:00.000
If you try and pass that, then everything blows up.

20:00.000 --> 20:04.000
Because the inner Vecs can't be derefed.

20:04.000 --> 20:06.000
And ndarray has got you covered here.

20:06.000 --> 20:11.000
This has really efficient representations of 2D and multi-dimensional arrays.

20:11.000 --> 20:14.000
So it's really, really useful.

20:14.000 --> 20:17.000
It's basically, it's got a lot of the functionality that NumPy does.

20:17.000 --> 20:20.000
And there's even a guide, like,

10:20.000 --> 10:25.000
"ndarray for NumPy users", in the docs that you can use there.

20:25.000 --> 20:27.000
So yeah, the function works.

20:27.000 --> 20:30.000
It's great.

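The auto-deref point can be reproduced in a small self-contained example. The functions here are hypothetical stand-ins for the slide's code, not augurs APIs:

```rust
// &Vec<f64> deref-coerces to &[f64], but &Vec<Vec<f64>> does NOT coerce
// to &[&[f64]]: each inner Vec would need converting separately.

fn mean_1d(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

fn mean_2d(rows: &[&[f64]]) -> f64 {
    let total: f64 = rows.iter().map(|r| r.iter().sum::<f64>()).sum();
    let count: usize = rows.iter().map(|r| r.len()).sum();
    total / count as f64
}

fn main() {
    let v = vec![1.0, 2.0, 3.0];
    // &Vec<f64> coerces to &[f64]: this just works.
    assert_eq!(mean_1d(&v), 2.0);

    let vv = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    // `mean_2d(&vv)` would NOT compile: expected &[&[f64]], found &Vec<Vec<f64>>.
    // You have to convert each inner Vec to a slice yourself:
    let views: Vec<&[f64]> = vv.iter().map(|r| r.as_slice()).collect();
    assert_eq!(mean_2d(&views), 2.5);
}
```

That per-row conversion is exactly the friction that a proper 2D array type with views, like ndarray's, removes.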
20:30.000 --> 20:35.000
So what if you run into a huge dependency?

20:35.000 --> 20:37.000
And actually, as I was thinking on the train about this,

20:37.000 --> 20:39.000
it doesn't really just apply to ML.

20:39.000 --> 20:43.000
If you're rewriting a code base in a different language,

20:43.000 --> 20:47.000
and it has a huge dependency, what can you do about that?

20:47.000 --> 20:50.000
You don't really have time to rewrite it; that's too much to expect.

20:50.000 --> 20:55.000
So this happened in augurs when we were translating the Prophet algorithm.

20:55.000 --> 20:58.000
Prophet basically does a bunch of data manipulation,

20:58.000 --> 21:02.000
and then hands everything off to a library called Stan,

21:02.000 --> 21:06.000
which is a framework for doing Bayesian analysis, basically.

21:07.000 --> 21:09.000
Rewriting Stan would be impossible.

21:09.000 --> 21:12.000
It's like, a best in class framework.

21:12.000 --> 21:15.000
It's written by experts in the field.

21:15.000 --> 21:18.000
It's, like, a huge amount of C++ with, like,

21:18.000 --> 21:20.000
17,000 commits.

21:20.000 --> 21:22.000
And the way that it's called is like,

21:22.000 --> 21:25.000
you compile a binary that represents your model

21:25.000 --> 21:27.000
and pass data to it in files.

21:27.000 --> 21:30.000
So, that's not going to be rewritten.

21:30.000 --> 21:33.000
That's going to be too hard.

21:33.000 --> 21:38.000
Instead, we kind of thought, well, how can we still use this?

21:38.000 --> 21:40.000
And we want to use it in the browser as well.

21:40.000 --> 21:43.000
Well, how are we going to be able to do it?

21:43.000 --> 21:47.000
We thought, maybe we could compile stand to WebAssembly, right?

21:47.000 --> 21:51.000
That's a bit of a wild idea, like, some really old,

21:51.000 --> 21:53.000
ten-year-old C++ code bases.

21:53.000 --> 21:54.000
Will it compile to WebAssembly?

21:54.000 --> 21:57.000
Surely there's dependencies there, like, system calls,

21:57.000 --> 22:00.000
and various different APIs that WebAssembly doesn't support.

22:00.000 --> 22:02.000
And also, how are you going to pass the data,

22:02.000 --> 22:04.000
like, WebAssembly only has numbers,

22:04.000 --> 22:08.000
and we need to pass much more complicated things to it.

22:08.000 --> 22:13.000
Fortunately, the WebAssembly System Interface (WASI) now exists,

22:13.000 --> 22:15.000
and the component model.

22:15.000 --> 22:19.000
It's a bit nascent, but these things handle exactly this.

22:19.000 --> 22:21.000
Exactly these use cases.

22:21.000 --> 22:23.000
I'm not going to go into too much detail because I'm not

22:23.000 --> 22:27.000
time, but basically, the model is you write an IDL

22:27.000 --> 22:30.000
that represents the kind of work that you want to do,

22:30.000 --> 22:32.000
and you do that in a language called WIT,

22:32.000 --> 22:34.000
which is broadly like Protobuf;

22:34.000 --> 22:36.000
it's a kind of standardized IDL.

22:36.000 --> 22:38.000
Well, there's an example in a second.

22:38.000 --> 22:40.000
And then you write a tiny bit, promise,

22:40.000 --> 22:44.000
it's not much, a tiny bit of C++ to implement that,

22:44.000 --> 22:48.000
using the Stan libraries and calling into Stan itself.

22:48.000 --> 22:51.000
And then you can compile that to a WebAssembly component,

22:51.000 --> 22:53.000
and that's basically a WebAssembly module,

22:53.000 --> 22:56.000
self-contained, portable, and can be run using any,

22:56.000 --> 23:00.000
in theory, any WebAssembly runtime.

23:00.000 --> 23:04.000
So the IDL looks a little bit like this.

23:04.000 --> 23:06.000
Sorry for the dark mode.

23:06.000 --> 23:09.000
You have records and variants and all of these things.

23:09.000 --> 23:12.000
The component model's tooling has a bunch of things,

23:12.000 --> 23:17.000
including a binding generator to convert that into idiomatic code for your language.

23:17.000 --> 23:18.000
It doesn't have to be rust.

23:18.000 --> 23:22.000
So this will get translated into idiomatic Rust, idiomatic Go,

23:22.000 --> 23:25.000
or TypeScript, anything you need.

23:26.000 --> 23:30.000
It also automatically handles conversion of those types.

23:33.000 --> 23:37.000
So if we've got that, we then need to use it from rust.

23:37.000 --> 23:43.000
So we're using a variant of bindgen that comes along with the component model,

23:43.000 --> 23:46.000
which basically takes the path to your IDL file,

23:46.000 --> 23:52.000
and generates you a bunch of structs and traits and everything for your code,

23:52.000 --> 23:58.000
and you can then call that from inside Rust as if it were a struct,

23:58.000 --> 24:01.000
well, it is a struct, and it has functions and methods,

24:01.000 --> 24:05.000
and you can pass everything in there as if it were native Rust.

24:05.000 --> 24:09.000
In this example, we're embedding the WebAssembly that we just compiled.

24:09.000 --> 24:13.000
So that's in the binary, there's no runtime dependencies, whatever.

24:13.000 --> 24:20.000
Setting up the WebAssembly runtime is just a bunch of machinery, really.

24:20.000 --> 24:25.000
And then we can call the optimize function just like it was a regular function with regular structs.

24:25.000 --> 24:29.000
And it runs inside WebAssembly at near native speed.

24:29.000 --> 24:33.000
We don't have any, like, build time dependencies on any C compiler,

24:33.000 --> 24:37.000
or any C++ compiler, and everything is purely embedded in the binary.

24:37.000 --> 24:42.000
So that means we can use it from, we can use it in WebAssembly if we need to.

24:42.000 --> 24:48.000
A few caveats: like I said, it is new, and there's a lot of playing around.

24:48.000 --> 24:51.000
These are some of the repos in the Bytecode Alliance

24:51.000 --> 24:54.000
org. Huge respect to the Bytecode Alliance.

24:54.000 --> 24:58.000
There's just so much going on there, I don't know how they produce so many things.

24:58.000 --> 25:03.000
It's early days, and you have to do a lot of hacking around, but it's fun.

25:03.000 --> 25:06.000
There's a lot of tools to learn about that kind of thing.

25:06.000 --> 25:11.000
Another downside is that WebAssembly doesn't support certain features, like exceptions,

25:11.000 --> 25:15.000
and we just have to hard abort in those cases, which is not the end of the world,

25:15.000 --> 25:17.000
but it's not necessarily that pretty.

25:18.000 --> 25:22.000
At the minute only Wasmtime supports this, but that's fine.

25:22.000 --> 25:25.000
Wasmtime is written in Rust, so we can easily just embed that.

25:28.000 --> 25:34.000
Okay, so we're moving on to refactoring now, and I've only really got a little bit of advice here,

25:34.000 --> 25:37.000
because I had to cut a lot out.

25:37.000 --> 25:42.000
But when it comes to writing idiomatic code, you have loads of options.

25:42.000 --> 25:45.000
This is about using the type system responsibly.

25:46.000 --> 25:51.000
And what I mean by that is, you can do some really clever things with rust type system.

25:51.000 --> 25:57.000
It's probably better than the source language that you're coming from.

25:57.000 --> 26:00.000
So you can use things like typestate to avoid things being misused,

26:00.000 --> 26:05.000
where you want to embed, like a state machine in your type system,

26:05.000 --> 26:09.000
to make sure that people can't, for example, fit a model twice,

26:09.000 --> 26:12.000
or predict when using an unfitted model.

26:12.000 --> 26:14.000
And that's really powerful when you combine it with ownership,

26:14.000 --> 26:18.000
so methods consume their input and return the next state, and that state machine flows really nicely.

26:18.000 --> 26:24.000
The downside is that your users will probably want some way of doing this at runtime.

26:24.000 --> 26:29.000
And if they want to do it at runtime, then they have to store these two different structs anyway,

26:29.000 --> 26:32.000
because they're different types.

26:32.000 --> 26:36.000
So they're going to have to use an enum anyway,

26:36.000 --> 26:41.000
unless they store these things in two separate Options, which is a poor way of doing things.

26:41.000 --> 26:45.000
You will notice this when you're writing your bindings,

26:45.000 --> 26:50.000
if you write Python or JavaScript bindings, because you are basically your own user at that point.

26:50.000 --> 26:55.000
And everything can change at runtime when you're writing Python or JavaScript.

26:55.000 --> 27:01.000
So here's an example: in machine learning you often have the concept of an unfitted model,

27:01.000 --> 27:05.000
where it's just been created with some hyperparameters, and then you pass it in data,

27:05.000 --> 27:08.000
and it turns into an actual model that you can then make predictions with.

27:09.000 --> 27:14.000
So the typestate way of doing this might be to have an unfitted model, which has a fit method,

27:14.000 --> 27:18.000
and then a fitted model, which has a predict method, and you can't misuse that.
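[A minimal Rust sketch of the typestate pattern the speaker describes; the model and field names here are hypothetical, not augurs' actual API.]

```rust
// Hypothetical typestate sketch: `fit` consumes the unfitted model and
// returns a fitted one, so you can't fit twice or predict before fitting.
struct UnfittedModel {
    smoothing: f64, // hyperparameter (illustrative)
}

struct FittedModel {
    smoothing: f64,
    mean: f64, // "learned" state (illustrative)
}

impl UnfittedModel {
    fn new(smoothing: f64) -> Self {
        UnfittedModel { smoothing }
    }

    // Takes `self` by value: the unfitted model is gone afterwards.
    fn fit(self, data: &[f64]) -> FittedModel {
        let mean = data.iter().sum::<f64>() / data.len() as f64;
        FittedModel { smoothing: self.smoothing, mean }
    }
}

impl FittedModel {
    // Only exists on the fitted type, so it can't be called too early.
    fn predict(&self) -> f64 {
        self.mean * self.smoothing
    }
}

fn main() {
    let model = UnfittedModel::new(1.0).fit(&[1.0, 2.0, 3.0]);
    assert_eq!(model.predict(), 2.0);
    // `UnfittedModel::new(1.0).predict()` would not compile.
}
```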

27:18.000 --> 27:23.000
However, when your user wants to have a button that turns one into the other,

27:23.000 --> 27:26.000
they're going to have to have some way of representing that, and they're going to need to have some kind of tag,

27:26.000 --> 27:32.000
which ends up probably being this enum that we have, with an unfitted and a fitted variant.

27:33.000 --> 27:38.000
So you can consider, when you're writing your APIs, how they're going to be used;

27:38.000 --> 27:42.000
you may just want to offer this enum instead of the typestate model.

27:42.000 --> 27:47.000
Obviously, this turns compile-time errors into runtime errors, so everything now returns a Result.

27:47.000 --> 27:50.000
It's just the way it is, unfortunately.
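[A sketch of the runtime-tagged alternative the speaker is describing: one enum holds either state, so users can keep it in a single field, at the cost of runtime errors. Names are hypothetical, not augurs' actual API.]

```rust
// Hypothetical enum-based version of the model: misuse is now a
// runtime Result::Err instead of a compile error.
enum Model {
    Unfitted { smoothing: f64 },
    Fitted { smoothing: f64, mean: f64 },
}

impl Model {
    fn fit(&mut self, data: &[f64]) -> Result<(), String> {
        match *self {
            Model::Unfitted { smoothing } => {
                let mean = data.iter().sum::<f64>() / data.len() as f64;
                *self = Model::Fitted { smoothing, mean };
                Ok(())
            }
            Model::Fitted { .. } => Err("model already fitted".into()),
        }
    }

    fn predict(&self) -> Result<f64, String> {
        match *self {
            Model::Fitted { smoothing, mean } => Ok(mean * smoothing),
            Model::Unfitted { .. } => Err("model not fitted yet".into()),
        }
    }
}

fn main() {
    let mut model = Model::Unfitted { smoothing: 1.0 };
    assert!(model.predict().is_err()); // runtime error, not compile error
    model.fit(&[1.0, 2.0, 3.0]).unwrap();
    assert_eq!(model.predict().unwrap(), 2.0);
    assert!(model.fit(&[1.0]).is_err()); // can't fit twice either
}
```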

27:51.000 --> 27:55.000
Finally, a little section on optimization.

27:55.000 --> 27:59.000
So as I mentioned, we want to use real-life examples as benchmarks.

27:59.000 --> 28:03.000
Make them as realistic as possible, if you have your own data, then that's better.

28:03.000 --> 28:07.000
And split them up into the sort of things that your users are likely to do.

28:07.000 --> 28:14.000
In ML, that's going to be things like pre-processing, fitting, predicting, or clustering, that kind of side of things.

28:15.000 --> 28:20.000
Use Criterion, and read about the timing loops as well.

28:20.000 --> 28:24.000
So there are a few ways of passing your data into Criterion:

28:24.000 --> 28:27.000
iter, iter_batched, or iter_batched_ref.

28:27.000 --> 28:33.000
These differ based on how you have to use the data that you pass into the benchmark,

28:33.000 --> 28:38.000
whether you have to take it by reference, by mutable reference, or own it as an object.

28:42.000 --> 28:47.000
For profiling, I kind of made this slide for myself as much as anyone else,

28:47.000 --> 28:53.000
because the internet recommends various profilers, and I always forget which one that it is that works best.

28:53.000 --> 28:57.000
The answer is samply; it's always samply, I always forget that.

28:57.000 --> 28:59.000
It's great, it's perfect.

28:59.000 --> 29:07.000
The invocation is tricky because you're calling samply, which then needs to call cargo bench,

29:07.000 --> 29:12.000
which needs its own arguments, which then needs to call criterion, which has got its own arguments.

29:12.000 --> 29:15.000
So there's a lot of kind of manipulation.

29:15.000 --> 29:19.000
Basically, I would recommend copying this and using it everywhere.

29:20.000 --> 29:23.000
So by default, this uses the Firefox Profiler,

29:23.000 --> 29:28.000
pops it open in a browser, and then you can use it to kind of debug things immediately.

29:28.000 --> 29:32.000
It's really fantastic, it has everything you would expect from a profiler,

29:32.000 --> 29:37.000
with call trees, flame graphs, source code, and line counts, that kind of thing.

29:37.000 --> 29:39.000
Including the assembly itself.

29:39.000 --> 29:45.000
You can even share it with other people, and upload it to, say, a GitHub issue really easily.

29:46.000 --> 29:52.000
In the notes for this, if you want to download the slides, there are more tips on performance,

29:52.000 --> 29:57.000
specifically written by Nicholas Nethercote, whose Rust performance work is great.

30:00.000 --> 30:04.000
A few more hints on optimization, so specifically,

30:04.000 --> 30:08.000
your slowness is likely to be in allocations.

30:08.000 --> 30:12.000
Generally, you want to pre-allocate with Vec::with_capacity.

30:13.000 --> 30:18.000
If that doesn't help, and you're still finding a lot more allocations than you expect,

30:18.000 --> 30:21.000
you're probably not reusing the same backing buffer.

30:21.000 --> 30:25.000
This sounds obvious when you say it, but a lot of these original algorithms, as you're translating them,

30:25.000 --> 30:28.000
won't have the same level of control that Rust offers.

30:28.000 --> 30:34.000
So you're, as you're translating it, you might not notice that you're actually reallocating every single time in a loop.

30:34.000 --> 30:39.000
If you can, reuse those same backing buffers, but try and keep that detail out of your public APIs.

30:39.000 --> 30:44.000
Using an allocator like jemalloc can speed things up nicely.
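[A small sketch of those two allocation tips, pre-allocating with with_capacity and reusing one scratch buffer across loop iterations; the windowed-sum function is a made-up example, not from the talk.]

```rust
// Hypothetical hot loop: compute sums over sliding windows, reusing one
// scratch buffer instead of allocating a fresh Vec every iteration.
fn windowed_sums(data: &[f64], window: usize) -> Vec<f64> {
    // Pre-allocate the output: we know exactly how many windows there are.
    let mut out = Vec::with_capacity(data.len().saturating_sub(window) + 1);
    // One scratch buffer, allocated once and reused; it never appears
    // in the public API.
    let mut scratch = Vec::with_capacity(window);
    for chunk in data.windows(window) {
        scratch.clear(); // drops the contents, keeps the allocation
        scratch.extend_from_slice(chunk);
        out.push(scratch.iter().sum());
    }
    out
}

fn main() {
    let sums = windowed_sums(&[1.0, 2.0, 3.0, 4.0], 2);
    assert_eq!(sums, vec![3.0, 5.0, 7.0]);
}
```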

30:44.000 --> 30:48.000
And be aware of how Vec::from_iter allocates.

30:48.000 --> 30:55.000
The docs for this are actually under FromIterator, the implementation of FromIterator for Vec,

30:55.000 --> 31:00.000
which is, I would say, about two levels away from where you would expect to be, perhaps.

31:00.000 --> 31:05.000
But it underlies Iterator's collect method, which you use a lot more than you might think, and

31:05.000 --> 31:11.000
it has some nuances around how and when it allocates and preallocates things.
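[To illustrate one such nuance, a small standalone sketch, not from the talk: collect can pre-allocate exactly when the iterator reports its length, but not when the length is unknowable up front.]

```rust
fn main() {
    // A plain range knows its exact length, so collect allocates once,
    // with at least enough capacity for every element.
    let v: Vec<i32> = (0..1000).collect();
    assert!(v.capacity() >= 1000);

    // A filtered iterator's lower size bound is 0, so collect can't know
    // the final length up front and may grow (reallocate) as it goes.
    let evens: Vec<i32> = (0..1000).filter(|x| x % 2 == 0).collect();
    assert_eq!(evens.len(), 500);
}
```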

31:11.000 --> 31:14.000
This should get you quite far, like with these tips you'll probably be okay.

31:14.000 --> 31:19.000
But if you want to squeeze everything out, you may want to do a few more unusual things.

31:19.000 --> 31:27.000
So one thing I found was really helpful is to replace any explicit indexing that you're doing with Iterator::zip or itertools'

31:27.000 --> 31:32.000
izip. So if you're iterating over a lot of things and you need to assign to an output array or anything like that,

31:32.000 --> 31:38.000
rather than indexing. So on the left here, this example used get_unchecked:

31:38.000 --> 31:43.000
you can use unsafe get_unchecked and pass it an index.

31:43.000 --> 31:47.000
But then you're adding unsafe into your code, which ideally you would avoid doing.

31:47.000 --> 31:50.000
And you can avoid doing by using zip.

31:50.000 --> 31:53.000
And the compiler knows exactly what's going on there.

31:53.000 --> 31:56.000
It can optimize things, it can eliminate the bounds checks, anyway.

31:56.000 --> 32:02.000
And you get better cache locality because it knows how to load everything in cache lines.
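[A sketch of the two styles side by side; the element-wise multiply is an illustrative stand-in for the slide's actual code.]

```rust
// Indexed version: every access is bounds-checked (or needs unsafe
// get_unchecked to avoid the check).
fn multiply_indexed(xs: &[f64], ys: &[f64], out: &mut [f64]) {
    for i in 0..xs.len() {
        out[i] = xs[i] * ys[i];
    }
}

// Zipped version: no indices, no unsafe. The compiler can see the
// iterators stay in step, so it can eliminate bounds checks.
fn multiply_zipped(xs: &[f64], ys: &[f64], out: &mut [f64]) {
    for ((o, x), y) in out.iter_mut().zip(xs).zip(ys) {
        *o = x * y;
    }
}

fn main() {
    let xs = [1.0, 2.0, 3.0];
    let ys = [4.0, 5.0, 6.0];
    let mut a = [0.0; 3];
    let mut b = [0.0; 3];
    multiply_indexed(&xs, &ys, &mut a);
    multiply_zipped(&xs, &ys, &mut b);
    assert_eq!(a, b);
    assert_eq!(b, [4.0, 10.0, 18.0]);
}
```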

32:02.000 --> 32:11.000
Another one that we found quite often: division and square root are surprisingly slow.

32:11.000 --> 32:14.000
So you need to avoid doing it in loops.

32:14.000 --> 32:22.000
Often you can store a scaling factor instead. So in this case we were doing some multiplication and division on every single call to the iterator,

32:22.000 --> 32:25.000
in an Iterator::next implementation, right?

32:25.000 --> 32:30.000
We can actually store the scale factor instead, and then we only have to do the division once,

32:30.000 --> 32:35.000
and we just do multiplications on every iteration.
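[A sketch of that trick with a hypothetical normalizing iterator (not the actual augurs code): the reciprocal is computed once, and each next call only multiplies.]

```rust
// Hypothetical iterator that yields values normalized by their sum.
struct Normalized<'a> {
    data: std::slice::Iter<'a, f64>,
    scale: f64, // 1.0 / sum, computed once up front
}

impl<'a> Normalized<'a> {
    fn new(data: &'a [f64]) -> Self {
        let sum: f64 = data.iter().sum();
        // Do the division once here...
        Normalized { data: data.iter(), scale: 1.0 / sum }
    }
}

impl<'a> Iterator for Normalized<'a> {
    type Item = f64;
    fn next(&mut self) -> Option<f64> {
        // ...and only multiply on each call, instead of `x / sum`.
        self.data.next().map(|x| x * self.scale)
    }
}

fn main() {
    let norm: Vec<f64> = Normalized::new(&[1.0, 1.0, 2.0]).collect();
    assert_eq!(norm, vec![0.25, 0.25, 0.5]);
}
```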

32:35.000 --> 32:40.000
Okay, another example of this, by the way, was square root.

32:40.000 --> 32:45.000
So in the implementation of the dynamic time warping that we talked about earlier,

32:45.000 --> 32:48.000
we calculated the Euclidean distance.

32:48.000 --> 32:53.000
That involves squaring something and then taking the square root for every calculation.

32:53.000 --> 32:56.000
Instead of that, you can just square everything, sum all the squares,

32:56.000 --> 32:59.000
and then do one square root at the end.

32:59.000 --> 33:00.000
It's pretty obvious

33:00.000 --> 33:07.000
when you look at it outside of the code, but it's easy to miss, and it's a huge optimization we did.
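[For the Euclidean distance itself, that looks like this; a simplified standalone sketch, not the actual DTW implementation: keep everything squared inside the loop and take a single square root at the end.]

```rust
// Euclidean distance with one sqrt: accumulate squared differences in
// the loop, and call sqrt exactly once at the end.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    let sum_sq: f64 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
    sum_sq.sqrt()
}

fn main() {
    // 3-4-5 triangle: sqrt(3^2 + 4^2) = 5.
    assert_eq!(euclidean(&[0.0, 0.0], &[3.0, 4.0]), 5.0);
}
```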

33:07.000 --> 33:09.000
Okay, that was a lot.

33:09.000 --> 33:11.000
I hope there's some useful tips there.

33:11.000 --> 33:13.000
I think we're going to take questions.

33:13.000 --> 33:16.000
Let me do some other questions.

33:17.000 --> 33:19.000
But including questions.

33:19.000 --> 33:20.000
Okay.

33:20.000 --> 33:24.000
I'll do questions first, because I don't know if I'll get through everything otherwise.

33:24.000 --> 33:27.000
So, yeah, there's a little summary.

33:27.000 --> 33:29.000
Please do try augurs.

33:29.000 --> 33:33.000
If you have any time series data and you want to try analyzing it or working with it at all.

33:33.000 --> 33:34.000
Give it a go.

33:34.000 --> 33:39.000
We have, as I said, Python bindings, and JavaScript bindings on npm.

33:39.000 --> 33:42.000
It's being used in the Grafana front end, which is exciting.

33:42.000 --> 33:47.000
We use it for outlier detection already, and there's also APIs for doing forecasting

33:47.000 --> 33:50.000
and changepoint detection there.

33:50.000 --> 33:51.000
Give it a go.

33:51.000 --> 33:53.000
Or try porting some algorithms.

33:53.000 --> 33:55.000
It's fun.

33:55.000 --> 33:58.000
You can have a lot of fun optimizing things and speeding things up,

33:58.000 --> 34:02.000
and yeah, writing some really nice APIs.

34:02.000 --> 34:05.000
And I would say, explore WebAssembly, both for, like,

34:05.000 --> 34:10.000
the complex-dependencies side, maybe using WebAssembly components, maybe not.

34:10.000 --> 34:14.000
And for the JavaScript bindings, there is a whole section of this talk,

34:14.000 --> 34:20.000
which I've probably not got time to cover, on the different trade-offs that you can make when using things like

34:20.000 --> 34:24.000
wasm-pack and wasm-bindgen to generate your Wasm bindings.

34:24.000 --> 34:28.000
And that's usable from anywhere, whether it's in the browser or from different language runtimes.

34:28.000 --> 34:30.000
So, give that a go.

34:30.000 --> 34:33.000
I'm going to take some questions.

34:33.000 --> 34:35.000
If anybody has any.

34:35.000 --> 34:39.000
And then we'll... I thought you weren't going to have time to do anything else.

34:39.000 --> 34:41.000
Yeah, sure.

34:41.000 --> 34:43.000
Question at the mic.

34:43.000 --> 34:51.000
I kind of already checked online, but ndarray doesn't support, like, mixed types of columns.

34:51.000 --> 34:57.000
So it's like a 2D array or 3D array, but not two arrays of different types.

34:57.000 --> 35:03.000
Do you happen to have any recommendations for such mixed-type

35:03.000 --> 35:05.000
libraries?

35:05.000 --> 35:09.000
But I think the question is about mixed type arrays.

35:09.000 --> 35:15.000
I would suggest using something like, it's more of a data frame approach.

35:15.000 --> 35:20.000
So, in Python, you would traditionally use pandas or something like that.

35:20.000 --> 35:23.000
Polars is the kind of Rust version of pandas.

35:23.000 --> 35:25.000
And it does have rust APIs.

35:25.000 --> 35:27.000
I have to say I haven't played with them much.

35:27.000 --> 35:31.000
And I don't know how well documented they are; last I checked,

35:31.000 --> 35:34.000
It was mainly the Python bit that was documented.

35:34.000 --> 35:38.000
I imagine the way to go would be to use something like Polars.

35:38.000 --> 35:40.000
Yeah.

35:40.000 --> 35:43.000
Hello, thank you for the talk.

35:43.000 --> 35:48.000
I'm Chris. How do you exchange large amounts of data with your Wasm

35:48.000 --> 35:49.000
bindings?

35:49.000 --> 35:54.000
Basically, how do you transfer the large quantities of data in and out of it?

35:54.000 --> 36:00.000
Do you use any kind of serialization format like Apache Arrow or something like that?

36:00.000 --> 36:03.000
I can see what you were saying about the microphone.

36:03.000 --> 36:04.000
It's not been great now.

36:04.000 --> 36:08.000
I think the question about passing things in and out of web assembly.

36:08.000 --> 36:09.000
Yeah.

36:09.000 --> 36:17.000
So, in both the case of when you're using wasm bind gen and using the component model,

36:17.000 --> 36:21.000
they basically use the Float64Array type on the JavaScript side.

36:21.000 --> 36:26.000
And it gets converted into a linear memory block, which is then memcopied into a Vec.

36:26.000 --> 36:31.000
So, it's not done using any kind of chunking or any kind of library like arrow.

36:31.000 --> 36:35.000
I haven't noticed any performance bottlenecks in that side of things whatsoever.

36:35.000 --> 36:37.000
It's pretty much a memcpy, I think.

36:37.000 --> 36:38.000
So, yeah.

36:38.000 --> 36:43.000
As long as everything's size is known, you can do that efficiently already.

36:43.000 --> 36:56.000
Thank you for the talk.

36:56.000 --> 37:03.000
On the topic of optimizations, did you try speeding up bottlenecks using SIMD?

37:03.000 --> 37:08.000
And if so, what's your story for doing that in stable Rust?

37:08.000 --> 37:13.000
I have an open issue to try and do some of these things using SIMD.

37:13.000 --> 37:14.000
But I haven't got around to it.

37:14.000 --> 37:17.000
I think the outlier detection one would be a really nice candidate.

37:17.000 --> 37:22.000
Because, actually, if you look at the NumPy code, it already is kind of vectorized.

37:22.000 --> 37:24.000
It's written in a vectorized way.

37:24.000 --> 37:26.000
So, we could try doing that.

37:26.000 --> 37:33.000
I haven't checked; maybe the compiler's already added those SIMD optimizations in there.

37:33.000 --> 37:37.000
And I would love someone to come along and show me how to do it because I've never actually worked with it.

37:37.000 --> 37:46.000
But when I looked into using portable SIMD in the Rust standard library, it's still nightly-only, I think.

37:46.000 --> 37:50.000
So, I was never quite sure how to do it using stable rust.

37:50.000 --> 37:56.000
I'm not sure that's such a big concern for this because, well, it would be for rust users.

37:56.000 --> 38:04.000
But for the Python and the JavaScript bindings, we can happily just use a nightly compiler to create those bindings in the first place.

38:04.000 --> 38:06.000
So, that would be useful.

38:06.000 --> 38:13.000
So, if there's no more questions, can we thank Ben?

38:13.000 --> 38:16.000
Thank you.

