[nylug-talk] Send a set really fast
Peter C. Norton
spacey-nylug at lenin.net
Fri May 4 12:34:43 EDT 2007
On Fri, May 04, 2007 at 10:53:19AM -0500, Michael Bacarella wrote:
> I have hosts a, b, and z.
>
> On host z is a Python process that submits a query to
> each of hosts a and b. Host a and b have a unique response
> to the query they receive: sets of integers (aset and bset).
>
> Host z must receive these responses and compute
> zset = aset.intersection(bset). Each set contains
> millions of integers.
>
> I'm having a performance problem pulling aset and
> bset into the Python process on z. Bandwidth
> is not the bottleneck (if it were it would be an easy solution).
> It simply takes a significant amount of CPU time to take the
> string '1,2,3,...N' recv()'d from the socket and call
> split(',') on it. On our hosts it takes whole seconds with
> N > 1,000,000
>
> My inclination is to look for the Python equivalent of
> what I would do in C, write() and read() an array
> over the socket since all hosts run the same architecture.
>
> I thought (c)pickle/unpickle might be the equivalent, but
> they aren't; they're approximately as slow as split.
>
> Is anything equivalent?
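The closest standard-library match to a C-style write()/read() of an
array is probably the array module. Here is a minimal sketch, assuming
Python 3 and the same architecture on every host (as you said). The
names send_ints, recv_exact, and recv_ints are hypothetical, and the
8-byte length prefix is just one possible framing choice:

```python
import socket
from array import array

def send_ints(sock, values):
    """Send integers as a length-prefixed block of native 64-bit ints."""
    payload = array("q", values).tobytes()      # raw machine representation
    sock.sendall(len(payload).to_bytes(8, "big") + payload)

def recv_exact(sock, nbytes):
    """Read exactly nbytes from the socket into one preallocated buffer."""
    buf = bytearray(nbytes)
    view = memoryview(buf)
    got = 0
    while got < nbytes:
        n = sock.recv_into(view[got:])
        if n == 0:
            raise ConnectionError("peer closed before the payload arrived")
        got += n
    return buf

def recv_ints(sock):
    """Receive one block written by send_ints; no text parsing involved."""
    size = int.from_bytes(recv_exact(sock, 8), "big")
    out = array("q")
    out.frombytes(recv_exact(sock, size))
    return out
```

On z, something like set(recv_ints(sock_a)).intersection(recv_ints(sock_b))
would then build zset without ever calling split(',').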
You may want to check whether the numpy module gives you a fast way
to serialize and deserialize its arrays, in addition to its faster
array operations. Splitting a string allocates a separate Python
object for every element, and that per-element memory work is better
handled by a proper array type; the general-purpose string code
doesn't seem to be optimized for what you're doing.
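A rough sketch of what that could look like with numpy, assuming the
values fit in 64-bit integers and all hosts share the same byte order.
The helper names pack, unpack, and intersect are made up for
illustration:

```python
import numpy as np

def pack(values):
    """Serialize an iterable of ints to raw native-endian int64 bytes."""
    return np.fromiter(values, dtype=np.int64).tobytes()

def unpack(payload):
    """Reinterpret received bytes as an int64 array; nothing is parsed."""
    return np.frombuffer(payload, dtype=np.int64)

def intersect(a_bytes, b_bytes):
    """Return the sorted intersection of two packed integer sets."""
    # assume_unique=True is only valid because each payload came from a set.
    return np.intersect1d(unpack(a_bytes), unpack(b_bytes), assume_unique=True)
```

Hosts a and b would send pack(aset) and pack(bset), and z would call
intersect() on the two byte strings it received. The result is a numpy
array rather than a Python set, so wrap it in set() if you need one.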
-Peter
--
The 5 year plan:
In five years we'll make up another plan.
Or just re-use this one.