May 16, 2023

Tens of Millions of Rows, Faster Loading Times

We're very excited to share the most significant release in our product's history!

After a few months of hard work and ~2720 cups of ☕️, we're happy to let you know that you can now work with much larger datasets. Not only that: Graphext also got a lot faster, with projects loading up to 8x faster.

Why embark on such an ambitious refactor?

As our user base grows and your needs become more diverse and complex, we want our product to stay agile, scalable, and responsive to your evolving requirements. This refactor lays a solid foundation for the updates and enhancements we have planned to take your data science capabilities further.

What does this mean for you?

Up to 8x Faster Loading Times

We have significantly improved project loading times, and the improvement grows with dataset size. Projects now load up to 8x faster, so the larger your data, the bigger the difference you will notice.

Datasets With Tens of Millions of Rows

You can now analyze datasets containing tens of millions of rows, with better performance than before. We've pushed the boundaries by testing with datasets of over 35 million rows, and we still haven't reached the limit. See the improvements for yourself in this project, which includes four months of NYC Taxi trips: a dataset with more than 10 million rows.


What's the magic behind the new Graphext?

Memory optimizations

All datasets are now compressed in memory at runtime. We've developed a custom compression algorithm based on bitpacking that allows random access with single-value granularity: any individual value can be read without decompressing the entire dataset, or even a block of it, while keeping overhead to a minimum.
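Our own algorithm is custom and isn't described here, but the core idea of bitpacking with single-value random access can be illustrated with a plain fixed-width scheme. The sketch below (a hypothetical `BitPackedArray`, not Graphext's code) stores every value of a column in the smallest number of bits that fits its largest value, packed back to back into 64-bit words. Reading value `i` only needs its bit offset, `i * bit_width`, so it touches one word, or two when the value straddles a word boundary; nothing else is decompressed.

```python
from array import array


class BitPackedArray:
    """Hypothetical fixed-width bitpacking of non-negative integers.

    Each value occupies exactly `bit_width` bits inside a contiguous run of
    64-bit words, so reading value i only touches one or two words.
    """

    WORD_BITS = 64
    WORD_MASK = (1 << 64) - 1

    def __init__(self, values):
        values = list(values)
        if any(v < 0 for v in values):
            raise ValueError("only non-negative integers are supported")
        # Smallest width that fits the largest value (at least 1 bit).
        self.bit_width = max(1, max(values, default=0).bit_length())
        if self.bit_width > self.WORD_BITS:
            raise ValueError("values must fit in 64 bits")
        self.length = len(values)
        self.mask = (1 << self.bit_width) - 1
        total_bits = self.length * self.bit_width
        n_words = (total_bits + self.WORD_BITS - 1) // self.WORD_BITS
        self.words = array("Q", [0]) * n_words  # unsigned 64-bit words
        for i, v in enumerate(values):
            self._pack(i, v)

    def _pack(self, i, value):
        word, offset = divmod(i * self.bit_width, self.WORD_BITS)
        # Low bits of the value go into the current word.
        self.words[word] |= (value << offset) & self.WORD_MASK
        if offset + self.bit_width > self.WORD_BITS:
            # The value straddles a word boundary: its high bits spill over.
            self.words[word + 1] |= value >> (self.WORD_BITS - offset)

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError("index out of range")
        word, offset = divmod(i * self.bit_width, self.WORD_BITS)
        value = self.words[word] >> offset
        if offset + self.bit_width > self.WORD_BITS:
            # Fetch the high bits stored at the start of the next word.
            value |= self.words[word + 1] << (self.WORD_BITS - offset)
        return value & self.mask
```

For example, a column whose values range from 0 to 999 needs only 10 bits per value here, instead of the 32 or 64 bits of a regular integer array, and `packed[i]` still reads any single value in constant time.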

We're now using more memory-efficient helper data structures for things like caching filters.
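We don't go into which helper structures we use, but one common memory-efficient way to cache filter results is a bitmap with one bit per row, rather than a list of matching row indices. The sketch below assumes that approach; `FilterCache`, `is_set`, `bitmap_and` and `count_rows` are hypothetical names for illustration only. A bitmap's size depends only on the number of rows: for 10 million rows it takes 1.25 MB, while a list of 32-bit row indices can grow to 40 MB when every row matches. Combining filters then becomes a cheap bitwise AND.

```python
class FilterCache:
    """Hypothetical cache storing each filter's result as a bitmap.

    A bitmap over n rows needs ceil(n / 8) bytes, no matter how many rows
    match the filter.
    """

    def __init__(self, n_rows):
        self.n_rows = n_rows
        self._bitmaps = {}

    def get(self, key, column, predicate):
        """Return the cached bitmap for `key`, computing it on first use."""
        bitmap = self._bitmaps.get(key)
        if bitmap is None:
            bitmap = bytearray((self.n_rows + 7) // 8)
            for row, value in enumerate(column):
                if predicate(value):
                    # Set bit (row % 8) of byte (row // 8).
                    bitmap[row >> 3] |= 1 << (row & 7)
            self._bitmaps[key] = bitmap
        return bitmap


def is_set(bitmap, row):
    """True if `row` passed the filter."""
    return ((bitmap[row >> 3] >> (row & 7)) & 1) == 1


def bitmap_and(a, b):
    """Combine two filters (logical AND) byte by byte."""
    return bytearray(x & y for x, y in zip(a, b))


def count_rows(bitmap):
    """Number of rows that passed the filter."""
    return sum(bin(byte).count("1") for byte in bitmap)
```

Because results are keyed by filter, reapplying the same filter returns the cached bitmap without rescanning the column.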

Performance optimizations

We now take advantage of the multiple cores in modern CPUs to speed up processing. Heavy computation no longer blocks the browser's main thread, which stays free to do its job: keeping the UI updated and responsive.

Optimizations coming soon

We will start using SIMD (Single Instruction, Multiple Data) instructions, which process data in parallel by applying the same operation to multiple values at once. SIMD is widely used to speed up tasks such as multimedia processing, graphics rendering, and scientific simulations; for Graphext, it means faster computations and better overall performance.


You'll hear more from us soon: new features are coming in the next few days.

Stay tuned & happy analyzing!

With love, the Graphext team 💙