Handy Python trick with list data

I had to manipulate a large amount of text data at work, and came across two neat tricks I’d like to remember.

If I have a Product A that’s always bundled with Product B, but I also sell Product B as a separate item, how many users do I have that have only purchased Product B? It’s not as easy as querying all Product B purchasers, because I’ll pick up purchasers of Product A as well.

So, I made a list of all Product A purchasers, and a list of all Product B purchasers, and made a final list of Product B purchasers that didn’t show up on the Product A purchasers list.


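# AccountsWithProductA is a list of all purchasers of Product A
# AccountsWithProductB is a list of all purchasers of Product B

# set is a built-in type, so no import is needed
# (the old sets module was removed in Python 3)
SetOfProductAUsers = set(AccountsWithProductA)
SetOfProductBUsers = set(AccountsWithProductB)

# Remove every Product A purchaser from the Product B set, in place
SetOfProductBUsers -= SetOfProductAUsers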

After those operations, SetOfProductBUsers only contains exclusive Product B users. It’s a handy manipulation that’s tough to do with lists alone.
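Here’s a minimal sketch of the same idea with made-up account IDs (the lowercase names and sample values are hypothetical, not my real data). It uses the non-mutating `-` operator, which returns a new set and leaves both originals intact in case they’re needed later:

```python
# Hypothetical sample data; the real lists came from parsing purchase records.
accounts_with_product_a = ["acct1", "acct2", "acct3"]
accounts_with_product_b = ["acct1", "acct2", "acct3", "acct4", "acct5"]

product_a_users = set(accounts_with_product_a)
product_b_users = set(accounts_with_product_b)

# Set difference: accounts that appear in B but not in A.
only_product_b = product_b_users - product_a_users

print(sorted(only_product_b))  # ['acct4', 'acct5']
```

Sets are unordered, so I sort the result before printing to get a stable display.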

Another problem I faced was that the initial parse of the data filled the list with duplicate accounts, since many accounts purchase the products again from time to time. I did some Googling to track down a way to prune duplicates from a list and found this handy Stack Overflow post.
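I won’t reproduce the post here, but two common ways to drop duplicates look roughly like this. This is a sketch with hypothetical sample data, assuming the account IDs are hashable values such as strings:

```python
# Hypothetical parsed accounts, with repeats from repeat purchases.
parsed_accounts = ["acct2", "acct1", "acct2", "acct3", "acct1"]

# If order doesn't matter, a set drops the duplicates.
unique_accounts = set(parsed_accounts)

# If first-seen order matters, dict keys are unique and keep
# insertion order (guaranteed in Python 3.7 and later).
unique_in_order = list(dict.fromkeys(parsed_accounts))

print(unique_in_order)  # ['acct2', 'acct1', 'acct3']
```

For my purposes the order didn’t matter, so a plain set is enough, and it feeds straight into the set subtraction above.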

I took the naive approach to both tasks (removing duplicates and pruning members of one list that existed in another) and didn’t have much luck. What I was doing worked with my test data, but when I threw the gigantic real data at it, I ran out of memory. There’s a lot about Python internals I don’t understand!
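I can’t say exactly what went wrong in my naive version, but one way to keep memory use down is to never build the duplicate-filled lists at all and instead add each account to a set as it is parsed. The sketch below assumes one purchase per line in an `account_id,product` format; that format and the `collect_accounts` helper are hypothetical stand-ins for my real data and parsing code:

```python
import io


def collect_accounts(lines):
    """Return (product_a_accounts, product_b_accounts) as sets.

    Assumes each line looks like 'account_id,product' (a hypothetical format).
    """
    product_a, product_b = set(), set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        account, product = line.split(",", 1)
        if product == "A":
            product_a.add(account)  # repeat purchases are ignored by the set
        elif product == "B":
            product_b.add(account)
    return product_a, product_b


# io.StringIO stands in for a real file object here.
sample = io.StringIO("acct1,A\nacct1,B\nacct2,B\nacct2,B\nacct3,A\nacct3,B\n")
a_users, b_users = collect_accounts(sample)

print(sorted(b_users - a_users))  # ['acct2']
```

Because the file is read one line at a time and each account is stored only once, the memory needed grows with the number of distinct accounts rather than the number of purchases.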
