Customer Data & Analytics Blog

3 steps to engineer customer segments from social media data

Andrew Malinow, PhD | 4 minute read


For retailers, it is critical to understand their customers (their preferences, purchasing behaviors, and even the things they don’t like about a product or service) and to use this information to personalize the customer’s experience.

Typically, retailers create market segments: meaningful ways to group their known customers based on different purchasing patterns. For the Consumer Packaged Goods (CPG) industry, creating segments based on known, identifiable customers is more difficult. If I buy a Zebco fishing rod at Target, Zebco has no idea that I exist as a customer. All that Zebco will know is that they shipped 500 fishing rods to five Target locations. So, for the Zebcos of the world (any company not selling directly to consumers), how can these organizations learn more about the preferences and predilections of the people who are buying and using their products? How can they further segment their customer base to better serve them?

Social media platforms have matured to the point where people of all ages and demographics are posting content on sites like Reddit, Facebook, Instagram, and Amazon (product reviews), to name a few of the more commonly used platforms. These platforms provide an articulation of the human experience, including the consumer experience, at an unprecedented scale. This cultural development provides a tremendous opportunity for CPG companies to connect with their customers and learn more about them. While CPG companies may not be able to learn that John Doe bought their product, they can learn that Jdoe455, estimated to be male, aged 35-45, with two sons, recently returned a fishing rod to Target because it broke the first time he took his younger son fishing. That is just one illustration. Below is a real example of customer segments that were created from Amazon product review data on Keurig coffee brewers:

Step 1: Source some data


review_text | helpfulvotes
My favorite coffee pot to date. Keurig has gotten so cheap and poorly made over the last few years and we've gone through 4 of them! This one feels SOLID and quality and performs the same way. If it lasts beyond 2 years I will be thrilled. | 29
This is a must have when you get the optional direct water line hook up. Makes perfect cups of coffee with adjustable water amounts. | 69

Note the number of reviews: over seven thousand. Could someone manually read all of them? Absolutely, and someone at Keurig may be doing this. However, what do we know about the people who are writing the reviews? Is there a way to get a more aggregated sense of who is saying what about the brewers? Coming up with actionable insights from manually reading 7,722 reviews is not practical, not only because it is time consuming, but because the data is too granular. Instead, what we need is a meaningful way to group the reviews and then compare thematic differences between them. Amazon does not make their review data available via API, so the reviews must be scraped.

Step 2: Engineer features to create segments

What we need is the ability to create segments, and then to apply text mining across the segments to surface aggregated insights. Now that we have scraped the reviews we are interested in, we need to engineer some features that we can use to filter them. Thankfully, there are several open-source libraries that can estimate age and gender from a writing sample. The more text from a single individual that is fed into the algorithm, the more accurate the estimate. So, if someone has written more than one review, all of their reviews can be concatenated into a single document and fed into the algorithm (rather than feeding it just a single review, which may be only a few sentences long). Adding these features to our Amazon product reviews looks like this:

[Screenshot: Amazon product reviews with estimated age and gender columns added]
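The concatenation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names reviewer_id and review_text, the sample rows, and the function name are hypothetical, and the age/gender estimator itself is left out because the post does not name the library used.

```python
from collections import defaultdict


def concatenate_reviews_by_author(reviews):
    """Join all review texts written by the same reviewer into one document.

    `reviews` is a list of dicts with the (assumed) keys
    "reviewer_id" and "review_text".
    """
    texts_by_author = defaultdict(list)
    for review in reviews:
        texts_by_author[review["reviewer_id"]].append(review["review_text"].strip())
    # One longer document per reviewer, in the order the reviews were seen.
    return {author: " ".join(texts) for author, texts in texts_by_author.items()}


# Hypothetical sample rows, for illustration only.
reviews = [
    {"reviewer_id": "A1", "review_text": "Makes perfect cups of coffee."},
    {"reviewer_id": "B2", "review_text": "Feels solid."},
    {"reviewer_id": "A1", "review_text": "Small footprint on the counter."},
]
documents = concatenate_reviews_by_author(reviews)
```

Each value in documents is the longer writing sample that would be handed to whichever age and gender estimator is chosen, instead of a single short review.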

After engineering just these two additional features, estimated age (6 categories/bins) and gender, we are able to create 12 segments (6 age bins × 2 genders). The next step is to perform some text analytics to see how the different segments might be experiencing the product differently, and what themes might be consistent across segments.
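To make the arithmetic concrete, here is a minimal sketch of how six age bins and two genders combine into 12 segments, and how reviews could be counted per segment. The bin boundaries, the gender labels, and the field names est_age_bin and est_gender are assumptions for illustration. The post does not say which bins were actually used.

```python
from collections import Counter
from itertools import product

# Assumed bins and labels; the original analysis may have used different ones.
AGE_BINS = ["18-24", "25-34", "35-44", "45-54", "55-64", "65+"]
GENDERS = ["female", "male"]

# Every (age bin, gender) combination is one segment: 6 x 2 = 12.
ALL_SEGMENTS = list(product(AGE_BINS, GENDERS))


def count_reviews_per_segment(reviews):
    """Count reviews in each segment, including segments with no reviews."""
    counts = Counter({segment: 0 for segment in ALL_SEGMENTS})
    for review in reviews:
        counts[(review["est_age_bin"], review["est_gender"])] += 1
    return counts


# Hypothetical rows carrying the two engineered features.
sample = [
    {"est_age_bin": "35-44", "est_gender": "male"},
    {"est_age_bin": "35-44", "est_gender": "male"},
    {"est_age_bin": "25-34", "est_gender": "female"},
]
segment_counts = count_reviews_per_segment(sample)
```

Keeping empty segments in the count makes it easy to spot groups with too few reviews to support any conclusions.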

Step 3: Use natural language processing to create additional features

We can also engineer features from the review text that characterize the nature of the review. For example, if we use part-of-speech (POS) tagging and create bi-grams (word pairs), we can capture adjective-noun bi-grams to extract phrases like ‘small footprint’ or ‘cheap plastic’ (about a brewer), or verb-adjective bi-grams to capture phrases like ‘tastes bitter’ or ‘smells great’. These bi-grams can then be used to further segment the data. Another useful feature to engineer is sentiment: conducting a sentiment analysis and using review polarity to separate positive from negative reviews.
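As a rough sketch of the bi-gram idea, the function below scans tokens that have already been POS-tagged and keeps adjacent word pairs whose tags match a requested pattern. The tags are Penn Treebank style (JJ for adjectives, NN for nouns, VB for verbs) and were assigned by hand for this example. In practice they would come from a tagger such as NLTK's pos_tag, which is an assumption here since the post does not say which tool was used. The function name is also hypothetical.

```python
def extract_tag_bigrams(tagged_tokens, first_tag_prefix, second_tag_prefix):
    """Return adjacent word pairs whose POS tags start with the given prefixes.

    `tagged_tokens` is a list of (word, tag) tuples, e.g. [("Small", "JJ"), ...].
    """
    phrases = []
    for (word1, tag1), (word2, tag2) in zip(tagged_tokens, tagged_tokens[1:]):
        if tag1.startswith(first_tag_prefix) and tag2.startswith(second_tag_prefix):
            phrases.append(f"{word1.lower()} {word2.lower()}")
    return phrases


# Hand-tagged example sentence (tags are illustrative, not tagger output).
tagged_review = [
    ("Small", "JJ"), ("footprint", "NN"), ("but", "CC"),
    ("the", "DT"), ("coffee", "NN"), ("tastes", "VBZ"), ("bitter", "JJ"),
    (".", "."),
]

adjective_noun = extract_tag_bigrams(tagged_review, "JJ", "NN")  # product descriptions
verb_adjective = extract_tag_bigrams(tagged_review, "VB", "JJ")  # experience phrases
```

Matching on tag prefixes means that plural nouns (NNS) and inflected verbs (VBZ, VBD) are caught without listing every tag variant.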

Overall, understanding your customer is at the heart of every successful business, and the best way to learn about your customers is directly from them, including through social media data.

Andrew Malinow leads the Data Science team at Zylotech, where he leverages his background as a Cognitive Psychologist, his statistical expertise, and his passion for surfacing actionable insights from large, messy data sets. At home he loves to spend time with his wife and 4 kids, doing anything outdoors, and tending to his ever-growing flock of chickens on his farm in Pomfret, CT.

If you liked this post, check out our other blog post on strengthening social media with customer data.

Topics: Customer Analytics