IntSet: Compressed mergeable sets of integers

class intset.IntSet(wrapped)[source]

An IntSet is a compressed immutable representation of a sorted list of unsigned 64-bit integers with fast membership, union and range restriction.

It mostly behaves as if it were a sorted list of deduplicated integer values. In particular, you can index it if it were, and it will sort and compare equal (to other IntSets) as if it were.

Note that unlike lists, intsets may feasibly have more than sys.maxint elements, and calling len() on such an intset may raise an OverflowError. If you wish to avoid this, use .size() instead.

Because IntSet is immutable, unlike list it may also be used as a hash key.

It also supports set operations. In particular, all the boolean operations are supported:

x & y: An IntSet containing the values that are present in both x and y

x | y: An IntSet containing the values present in either x or y

x - y: An IntSet containing the values present in x but not y

x ^ y: An IntSet containing the values present in x or y but not both

~x: An IntSet containing all values in the range 0 <= i < 2 ** 64 that
are not present in x (IntSet can represent this efficiently. It won’t allocate 2 ** 64 integers worth of memory).

IntSets may be constructed either from the dedicated class methods or by calling the class as you usually would for a set. So IntSet([1, 2, 3]) is an IntSet containing the values 1, 2 and 3.

When calling an IntSet this way, non-integer values which are iterable sequences of length 2 will be interpreted as intervals start <= x < end. So e.g. IntSet([1, [10, 100]]) will contain the numbers 1 and 10, ..., 99.

class Builder[source]

An IntSet.Builder is for building up an IntSet incrementally through a series of insertions.

This will typically be much faster than repeatedly calling insert on an IntSet object. The intended usage is to repeatedly call insert() or insert_interval() on a builder, then call build() at the end. Note that you can continue to insert further data into a Builder afterwards if you wish, and this will not affect previously built IntSet instances.

insert(value)[source]

Add a single value to the IntSet to be built.

insert_interval(start, end)[source]

Add all values x such that start <= x < end to the IntSet to be built.

build()[source]

Produce a new IntSet with all the values previously inserted to this builder.

You may call build() more than once, and any values inserted in between those calls will also be present, but previously built values will be unaffected by subsequent inserts

classmethod IntSet.empty()[source]

Return an empty IntSet.

classmethod IntSet.single(value)[source]

Return an IntSet containing only the single value provided.

classmethod IntSet.interval(start, end)[source]

Return an IntSet containing only the values x such that start <= x < end

classmethod IntSet.from_iterable(values)[source]

Return an IntSet containing everything in values, which should be an iterable over intsets in the valid range.

classmethod IntSet.from_intervals(intervals)[source]

Return a new IntSet which contains precisely the intervals passed in.

IntSet.size()[source]

This returns the same as len() when the latter is defined, but IntSet may have more values than will fit in the size of index that len will allow.

IntSet.insert(value)[source]

Returns an IntSet which contains all the values of the current one plus the provided value.

IntSet.discard(value)[source]

Returns an IntSet which contains all the values of the current one except for the passed in value.

Returns self if the value is not present rather than raising an error

IntSet.restrict(start, end)[source]

Return a new IntSet with all values x in self such that start <= x < end.

IntSet.isdisjoint(other)[source]

Returns True if self and other have no common elements.

IntSet.intersects(other)[source]

Returns True if there is an element i such that i in self and i in other.

IntSet.issuperset(other)[source]

Returns True if every element of other is also in self.