[nylug-talk] Send a set really fast

Peter C. Norton spacey-nylug at lenin.net
Fri May 4 12:34:43 EDT 2007


On Fri, May 04, 2007 at 10:53:19AM -0500, Michael Bacarella wrote:
> I have hosts a,b and z
> 
> On host z is a Python process that submits a query to
> each of hosts a and b.  Host a and b have a unique response
> to the query they receive: sets of integers (aset and bset).
> 
> Host z must receive these responses and perform
> zset = aset.intersection(bset).  The sets of integers
> run to millions of elements each.
> 
> I'm having a performance problem pulling aset and
> bset into the Python process on z.  Bandwidth
> is not the bottleneck (if it were it would be an easy solution).
> It simply takes a significant amount of CPU time to take the
> string '1,2,3,...N' recv()'d from the socket and call
> split(',') on it.  On our hosts it takes whole seconds with
> N > 1,000,000
> 
> My inclination is to look for the Python equivalent of
> what I would do in C, write() and read() an array
> over the socket since all hosts run the same architecture.
> 
> I thought (c)pickle/unpickle might be the equivalent but
> they aren't, they're approximately as slow as split.
> 
> Is anything equivalent?

You may want to see whether the numpy module offers a fast way to
serialize and deserialize its array type, along with its faster array
operations. Splitting a string creates a separate Python string
object for every element, and each one still has to be converted to
an integer. A typed array keeps the values in one contiguous buffer,
so it avoids that per-element overhead. The general string case
doesn't seem to be optimized for what you're doing.
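Here is a minimal sketch of that idea, not a tested solution for your
setup. It assumes all hosts share the same byte order and integer
width (as you said they do), that the values fit in a signed 64-bit
integer, and that each set is sent as one message with an 8-byte
length header. The helper names (pack_ints, recv_exact, recv_ints,
intersect) and the framing are my own invention for illustration.
tobytes() and frombuffer() are the names in current numpy releases;
older releases spell them differently.

```python
import socket
import struct

import numpy as np


def pack_ints(values):
    """Serialize a set of ints as raw native-endian int64 bytes.

    The result is an 8-byte payload length followed by the payload.
    Sorting is optional; it just makes the received array ordered.
    """
    arr = np.asarray(sorted(values), dtype=np.int64)
    payload = arr.tobytes()
    return struct.pack("=Q", len(payload)) + payload


def recv_exact(sock, nbytes):
    """Read exactly nbytes from sock into one preallocated buffer."""
    buf = bytearray(nbytes)
    view = memoryview(buf)
    got = 0
    while got < nbytes:
        n = sock.recv_into(view[got:])
        if n == 0:
            raise ConnectionError("peer closed before full message arrived")
        got += n
    return buf


def recv_ints(sock):
    """Receive one message written by pack_ints as an int64 array.

    frombuffer reinterprets the received bytes in place, so there is
    no per-element parsing step at all.
    """
    (size,) = struct.unpack("=Q", recv_exact(sock, 8))
    return np.frombuffer(recv_exact(sock, size), dtype=np.int64)


def intersect(a, b):
    """Intersect two arrays whose elements are each already unique."""
    return np.intersect1d(a, b, assume_unique=True)


if __name__ == "__main__":
    # Stand-in for hosts a/b and z: a local socket pair, small sets.
    sender, receiver = socket.socketpair()
    sender.sendall(pack_ints({5, 1, 9, 3}))
    sender.sendall(pack_ints({3, 4, 5}))
    aset = recv_ints(receiver)
    bset = recv_ints(receiver)
    print(intersect(aset, bset))
```

If you need a real Python set on z afterwards, set(arr.tolist()) will
build one, though that brings back some per-element cost. Keeping the
data as arrays and intersecting them directly avoids it.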

-Peter

-- 
The 5 year plan:
In five years we'll make up another plan.
Or just re-use this one.


