[nylug-talk] Send a set really fast

Jay Sulzberger jays at panix.com
Fri May 4 12:11:54 EDT 2007



On Fri, 4 May 2007, Michael Bacarella wrote:

> I have hosts a, b and z.
>
> On host z is a Python process that submits a query to
> each of hosts a and b.  Hosts a and b each have a unique response
> to the query they receive: sets of integers (aset and bset).
>
> Host z must receive these responses and perform
> zset = aset.intersection(bset).  The sets of integers
> number in the millions of elements.
>
> I'm having a performance problem pulling aset and
> bset into the Python process on z.  Bandwidth
> is not the bottleneck (if it were, it would be easy to solve).
> It simply takes a significant amount of CPU time to take the
> string '1,2,3,...N' recv()'d from the socket and call
> split(',') on it.  On our hosts it takes whole seconds with
> N > 1,000,000.
>
> My inclination is to look for the Python equivalent of
> what I would do in C, write() and read() an array
> over the socket since all hosts run the same architecture.
>
> I thought (c)pickle/unpickle might be the equivalent, but
> they aren't; they're approximately as slow as split.
>
> Is anything equivalent?
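
For the raw write()/read() approach described above, the standard library's array module may be the closest equivalent: it stores integers in machine format and converts to and from bytes without parsing text. The following is only a sketch, assuming Python 3, values that fit in a signed 64-bit integer, and the same architecture on every host (as stated in the question). The names pack_ints and unpack_ints are hypothetical helpers, not anything from the original setup.

```python
from array import array

def pack_ints(values):
    """Serialize an iterable of ints to raw machine-format bytes."""
    # 'q' is a signed 64-bit integer; assumes every value fits and that
    # sender and receiver share the same byte order.
    return array('q', values).tobytes()

def unpack_ints(payload):
    """Rebuild a set of ints from bytes produced by pack_ints."""
    arr = array('q')
    arr.frombytes(payload)  # length must be a multiple of 8 bytes
    return set(arr)

# Stand-ins for the responses of hosts a and b.
aset = unpack_ints(pack_ints(range(0, 10, 2)))
bset = unpack_ints(pack_ints(range(0, 10, 3)))
zset = aset.intersection(bset)
print(sorted(zset))  # [0, 6]
```

Over a real socket the receiver still has to know how many bytes to expect, for example from a length prefix sent ahead of the payload.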

http://en.wikipedia.org/wiki/Bloom_filter
http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/BloomFilterSurvey.pdf
http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/cbf2.pdf
http://www.perl.com/pub/a/2004/04/08/bloom_filters.html
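
One possible reading of how these links apply: one host builds a Bloom filter over its set, and the other host uses it to discard elements that cannot be in the intersection before anything is sent. The sketch below is an illustration under assumptions, not a tuned implementation: the BloomFilter class, its sizing, and the SHA-256 double-hashing scheme are all choices made here for the example.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from two halves of one digest.
        digest = hashlib.sha256(str(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], 'big')
        h2 = int.from_bytes(digest[8:16], 'big') | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Stand-ins for the sets held on hosts a and b.
aset = set(range(0, 1000, 2))
bset = set(range(0, 1000, 3))

# Filter built from aset (sizes chosen arbitrarily for the example).
bloom = BloomFilter(num_bits=8192, num_hashes=4)
for x in aset:
    bloom.add(x)

# Host b keeps only elements that might be in aset.
candidates = {x for x in bset if x in bloom}

# False positives are possible, so an exact check against aset is still needed.
zset = aset & candidates
```

Because a Bloom filter never reports a false negative, candidates always contains the true intersection; the exact check at the end removes any false positives.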

oo--JS.

