Math library
category: code [glöplog]
I usually use very few libraries and prefer to build almost everything from scratch: sort, FFT, GUI, database and so on. Once I have built my own stuff, I can reuse it across my projects. If there is an issue, I just go into my code and fix it.
But sometimes I run into problems and wish a simple solution were available without having to code it myself...
Last week, I ran into a problem that seemed easy... Consider a vector of N elements (where N is in the range of hundreds of millions!). I wanted to know whether some parts of that vector are at least 92% similar to other parts. Hmm, I started to code something... launched my software and waited... 2... 3 hours... hmm, still only around 2% done! :( I thought, this could be processed much faster with another algorithm, but that would take longer to code.
It happens from time to time. A few months ago I wanted to process big integers, using 4096-bit integers! I even thought that perhaps today there is a datatype in some C compiler where you can specify how many bits it has... I thought... huh... no! Wow, I am surprised no one has put that in.
I would like to know if you have some math library that you use for that kind of "exotic" software!
Searching for similarities in large vectors sounds like what rzip does for data in files. See http://rzip.samba.org/. It's basically your normal LZ compressor except it works with a block size of 900 MiB rather than 64k. I bet you could borrow stuff from it for your algo.
I second GMP; that's mainly what is used to do the crypto math in OpenSSL, and it has probably had more eyes on it than most other people's big-number libraries.
Maybe Python?
I'm using it on "big" problems like finding patterns in datasets of around 5-6 GB.
Speed is not the same as C, but with some tweaking it can get nearly as fast as C.
It handles things like N-scaling quite nicely.
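On the 4096-bit integer question from the first post: Python's built-in int is arbitrary precision, so no extra library is needed for that size. Here is a minimal sketch. The names BITS, MASK and mul_wrapped are made up for illustration, and the mask is only one possible way to imitate a fixed-width unsigned type.

```python
# Python ints grow as needed, so 4096-bit values work out of the box.
BITS = 4096                # hypothetical width, taken from the first post
MASK = (1 << BITS) - 1     # all-ones mask used to wrap results to 4096 bits


def mul_wrapped(a, b):
    """Multiply and keep only the low 4096 bits, like an unsigned C type would."""
    return (a * b) & MASK


a = (1 << 4095) + 12345    # a value that needs the full 4096 bits
b = (1 << 2048) - 1        # a 2048-bit value

product = a * b            # exact result, wider than 4096 bits
wrapped = mul_wrapped(a, b)

print(a.bit_length())      # 4096
print(hex(wrapped)[:20])   # first hex digits of the wrapped product

# Modular exponentiation, the typical crypto operation, is built in as pow().
print(pow(3, 65537, a) < a)
```

For heavy number crunching a dedicated library such as GMP will probably still be faster, but for trying an idea out this is hard to beat.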
It's called correlation.
ifft(fft(vector_a) * conj(fft(vector_b)))
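To make the one-liner concrete, here is a small sketch in pure Python. It conjugates one of the two spectra, because that is what turns the product into a correlation instead of a convolution. The fft and cross_correlate functions and the sample data are written only for this post. A real run on hundreds of millions of elements would presumably use an optimized FFT (numpy.fft or FFTW, for example) instead of this recursive toy.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def cross_correlate(a, b):
    """Return c where c[s] = sum over n of a[n + s] * b[n], for real inputs."""
    size = 1
    while size < len(a) + len(b):   # zero-pad so shifts do not wrap into each other
        size *= 2
    fa = fft([complex(v) for v in a] + [0j] * (size - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (size - len(b)))
    spectrum = [x * y.conjugate() for x, y in zip(fa, fb)]
    c = fft(spectrum, inverse=True)
    return [v.real / size for v in c]   # divide by size to finish the inverse FFT


vector_a = [0, 0, 0, 1, 2, 3, 0, 0]     # made-up data: pattern sits at offset 3
vector_b = [1, 2, 3]

c = cross_correlate(vector_a, vector_b)
best_shift = max(range(len(vector_a)), key=lambda s: c[s])
print(best_shift)
```

The raw peak is not a percentage. To get something like "92% similar" you would still have to normalize the score, for example by the energy of the two windows being compared.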