[nylug-talk] Send a set really fast

Michael Bacarella mbac at netgraft.com
Fri May 4 11:53:19 EDT 2007


I have hosts a, b, and z.

On host z is a Python process that submits a query to
each of hosts a and b.  Hosts a and b each return a unique response
to the query: a set of integers (aset and bset respectively).

Host z must receive these responses and perform
zset = aset.intersection(bset).  Each set holds
millions of integers.

I'm having a performance problem pulling aset and
bset into the Python process on z.  Bandwidth
is not the bottleneck (if it were, the solution would be easy).
It simply takes a significant amount of CPU time to take the
string '1,2,3,...N' recv()'d from the socket and call
split(',') on it.  On our hosts it takes whole seconds when
N > 1,000,000.
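For reference, the slow path described above can be sketched as follows. It assumes the wire format is exactly a comma-separated decimal string; `encode_csv_set` and `parse_csv_set` are hypothetical helper names, not part of any library. Note that the cost is not only `split(',')`: every field also becomes its own `int` object and then a set entry, so the receiver's work grows per element rather than per byte.

```python
# Baseline (hypothetical helpers): text on the wire, split() + int() on z.


def encode_csv_set(values):
    """Sender side: render integers as b'1,2,3,...'."""
    return ",".join(str(v) for v in values).encode("ascii")


def parse_csv_set(payload):
    """Receiver side: one split(), then one int() call per element."""
    text = payload.decode("ascii")
    if not text:
        return set()
    return set(map(int, text.split(",")))


if __name__ == "__main__":
    aset = parse_csv_set(encode_csv_set(range(0, 1_000_000, 2)))
    bset = parse_csv_set(encode_csv_set(range(0, 1_000_000, 3)))
    zset = aset.intersection(bset)
    print(len(zset))  # 166667 (the multiples of 6 below 1,000,000)
```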

My inclination is to look for the Python equivalent of
what I would do in C: write() and read() an array
over the socket, since all hosts run the same architecture.
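A minimal sketch of that C-style approach in Python, using the standard `array` module: the sender ships the array's raw machine bytes behind a length prefix, and the receiver loads them back with `frombytes()` instead of parsing text. It assumes, as stated above, that all hosts share one architecture (so native byte order and 64-bit `'q'` integers are safe); the function names and the 8-byte count header are hypothetical choices for illustration.

```python
import array
import socket
import struct
import threading

# Assumption: all hosts share one architecture, so native-endian machine
# integers can be sent as-is, much as write()/read() would do in C.
TYPECODE = "q"                # signed 64-bit ints ('q' needs Python 3.3+)
HEADER = struct.Struct("!Q")  # hypothetical 8-byte element-count prefix


def send_int_array(sock, values):
    """Send the element count, then the array's raw bytes."""
    arr = array.array(TYPECODE, values)
    sock.sendall(HEADER.pack(len(arr)))
    sock.sendall(arr.tobytes())


def _recv_exact(sock, nbytes):
    """Loop on recv_into() because a single recv() may return fewer bytes."""
    buf = bytearray(nbytes)
    view = memoryview(buf)
    got = 0
    while got < nbytes:
        n = sock.recv_into(view[got:], nbytes - got)
        if n == 0:
            raise ConnectionError("peer closed before all data arrived")
        got += n
    return buf


def recv_int_array(sock):
    """Read one length-prefixed array; no per-element text parsing."""
    (count,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    arr = array.array(TYPECODE)
    arr.frombytes(_recv_exact(sock, count * arr.itemsize))
    return arr


if __name__ == "__main__":
    # Demo over a local socket pair; the sender runs in a thread so a large
    # payload cannot fill the socket buffer and block this process.
    z_end, a_end = socket.socketpair()
    sender = threading.Thread(
        target=send_int_array, args=(a_end, range(0, 1_000_000, 2))
    )
    sender.start()
    aarr = recv_int_array(z_end)
    sender.join()
    print(len(aarr))  # 500000
    z_end.close()
    a_end.close()
```

On z, `set(aarr).intersection(barr)` would then produce zset, so Python objects are only created when the set is built, not while decoding. If NumPy is available, `numpy.frombuffer` on the received bytes followed by `numpy.intersect1d` is a different technique that avoids Python sets altogether.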

I thought (c)pickle/unpickle might be the equivalent, but
they aren't; they're approximately as slow as split.
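One plausible explanation for pickle matching split: a pickled set or list of ints is still encoded element by element, so unpickling rebuilds millions of int objects one at a time. As a hedged sketch for current CPython, pickling an `array.array` instead (with protocol 3 or higher) stores the integers as one contiguous bytes payload; the helper names below are hypothetical, and `'q'` assumes 64-bit signed values.

```python
import array
import pickle


def dumps_int_array(values):
    """Pickle integers as one array.array instead of a set or list of ints."""
    # With protocol >= 3, CPython pickles an array.array as its typecode plus
    # a single bytes object, not one opcode per element. 'q' = signed 64-bit.
    return pickle.dumps(array.array("q", values), protocol=pickle.HIGHEST_PROTOCOL)


def loads_int_array(payload):
    """Rebuild the array; only unpickle data from hosts you trust."""
    return pickle.loads(payload)
```

Because `pickle.loads` can run arbitrary code, this only makes sense between hosts you control, as a, b, and z appear to be.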

Is there anything equivalent?

