Sellers of Customer Data Platforms (CDPs) promise their software will gather data from various applications and assemble it into a single-source-of-truth “golden record” for each customer.
It’s a lovely vision, but rarely achieved. And that’s perfectly okay. The trick is to press on anyway.
Let’s use this common CDP use case to illustrate the complexity: identifying customers among the hordes of anonymous visitors to your website.
It’s a challenge. Anonymity was central to the internet’s design. And while there are lots of ways to identify anonymous website visitors, they all have their limitations.
Let’s imagine a fellow named Robert Williams, a swing dance aficionado, who interacts with Ella, the publisher of (the fictitious, I believe) Ella’s Swing Dance Magazine.
Robert meets Ella on his commute to work, and she tells him he ought to read her magazine. On his lunch break, Robert searches for the magazine website on the desktop he uses at the office. When Robert’s web browser makes a request to Ella’s Swing Dance Magazine website, Ella’s CDP puts a cookie on that device and creates a user profile. The profile includes the following information:
Profile 1
IP address: 25.23.108.5
User-Agent: Mozilla/5.0 (Windows NT 10.0)
Referrer: https://www.google.com
The record might also include which pages were visited and what type of content the visitor seems to prefer. The visitor is still anonymous to Ella’s CDP. The profile is just one of the millions that belong to unknown visitors.
When Robert gets home that evening, he types the URL of Ella’s website into his iPad. Her CDP dutifully puts a cookie on that device and creates a second profile. On this visit, Robert decides to sign up for Ella’s free e-newsletter with one of his junk email addresses. The CDP captures the email address from the form submission and adds it to the new profile, which now has more information than the first.
Profile 2
IP address: 32.12.100.21
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6)
Referrer: [blank]
Email: bob2387@hotmail.com
Name: Bob Williams
Nothing in this second record enables Ella’s CDP to conclude the records are tied to the same individual. The records were created on different devices at different times, and share no information identifying Robert.
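To make the gap concrete, here is a minimal Python sketch of the comparison a deterministic matcher might run on the first two profiles. The dictionary layout, the IDENTITY_KEYS tuple, and the shared_identifiers helper are illustrative assumptions, not the behavior of any particular CDP.

```python
# Hypothetical profile records, using field values from the example above.
profile_1 = {
    "ip": "25.23.108.5",
    "referrer": "https://www.google.com",
    "emails": set(),
}
profile_2 = {
    "ip": "32.12.100.21",
    "referrer": "",
    "emails": {"bob2387@hotmail.com"},
    "name": "Bob Williams",
}

# Assumption: only these fields are treated as identifying keys.
IDENTITY_KEYS = ("emails", "phones", "subscriber_ids")


def shared_identifiers(a, b):
    """Return the identifying values that appear in both profiles."""
    shared = set()
    for key in IDENTITY_KEYS:
        shared |= set(a.get(key, ())) & set(b.get(key, ()))
    return shared


overlap = shared_identifiers(profile_1, profile_2)
print(overlap or "no shared identifiers, profiles stay separate")
```

Because the two records have no identifying value in common, a matcher like this has nothing to merge on.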
Two weeks later, Robert and Ella are jitterbugging at Mobtown Ballroom in Baltimore. Ella has a few copies of her magazine, and Robert takes one home. He signs up for a print subscription using one of the blow-in cards. Ella’s fulfillment service dutifully records this new subscriber data, which is then imported into the CDP, creating Robert’s third profile with still more information:
Profile 3
Name: Robert Williams
Address: 123 Main Street
City: Bowie
State: Maryland
Zip: 20715
Phone: (301) 555-1212
Email: me@robertwilliams.com
This profile has valuable information, including a new email address. But this profile has no data from online activity, so it doesn’t help with online ad targeting or customer journey data.
Robert now has three profiles in Ella’s CDP. There’s no way to merge any of them. We know they’re all Robert. The CDP doesn’t.
Fortunately, Ella’s magazine has the good sense to include some special online content for print subscribers as a way to link offline and online behavior. A QR code printed in the magazine allows Robert to view a video on the website about the Travelling Charleston. Robert scans the QR code with his iPad. That takes him to the website, where the CDP recognizes the cookie it put on that device earlier.
Bingo! Now Ella’s CDP can merge the iPad profile (#2) with the subscription information (#3). (Note! This only works if the QR code carries some information about Robert’s subscription!)
Several good things happen as a result:
- Robert’s three profiles have been consolidated into two
- Robert has become a known user in Ella’s CDP
- Ella’s CDP knows that Robert uses two different email addresses
- Robert’s subscription information (offline behavior) and the profile created when he accessed Ella’s site from his iPad (online behavior) are now linked.
The record created from Robert’s desktop remains anonymous.
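The QR-code merge can be sketched roughly as follows. This is only an illustration under stated assumptions: the "sub" query parameter, the cookie values, the subscriber ID "SUB-1001", the example.com URL, and the in-memory profiles store are all hypothetical stand-ins for whatever a real CDP would use.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory profile store keyed by profile number.
profiles = {
    1: {"cookie": "desktop-cookie", "emails": set(), "subscriber_id": None},
    2: {"cookie": "ipad-cookie", "emails": {"bob2387@hotmail.com"}, "subscriber_id": None},
    3: {"cookie": None, "emails": {"me@robertwilliams.com"}, "subscriber_id": "SUB-1001"},
}


def find_profile(field, value):
    """Return the id of the first profile whose field equals value, else None."""
    if value is None:
        return None
    for pid, profile in profiles.items():
        if profile[field] == value:
            return pid
    return None


def handle_qr_visit(url, cookie):
    """Merge the subscriber profile named in the QR URL into the cookie's profile."""
    sub_id = parse_qs(urlparse(url).query).get("sub", [None])[0]
    online = find_profile("cookie", cookie)
    offline = find_profile("subscriber_id", sub_id)
    if online is None or offline is None or online == offline:
        return online  # nothing to merge
    other = profiles.pop(offline)
    profiles[online]["emails"] |= other["emails"]
    profiles[online]["subscriber_id"] = other["subscriber_id"]
    return online


# Robert scans the code on his iPad; the URL carries his subscriber ID.
handle_qr_visit("https://example.com/charleston?sub=SUB-1001", "ipad-cookie")
print(len(profiles), sorted(profiles[2]["emails"]))
```

If the QR URL carried no subscriber identifier, handle_qr_visit would find no offline profile and the three records would stay separate.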
Note that, in this scenario, Ella’s CDP has been configured to accept multiple emails in a customer’s profile. Some companies designate the email address as a unique field – allowing only one per profile. In that case, the records would not merge, and Robert’s subscription information would remain in its own profile, not connected to any online activity.
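A small sketch shows how that configuration choice changes the outcome. The allow_multiple_emails flag and the try_merge function are hypothetical names used for illustration, not settings from a real product.

```python
def try_merge(online, offline, allow_multiple_emails):
    """Return a merged profile, or None when a unique-email rule blocks the merge."""
    emails = online["emails"] | offline["emails"]
    if not allow_multiple_emails and len(emails) > 1:
        return None  # two different addresses cannot share one profile
    merged = {**offline, **online}
    merged["emails"] = emails
    return merged


ipad = {"emails": {"bob2387@hotmail.com"}}
subscription = {"emails": {"me@robertwilliams.com"}, "zip": "20715"}

print(try_merge(ipad, subscription, allow_multiple_emails=True))
print(try_merge(ipad, subscription, allow_multiple_emails=False))
```

With the flag on, the merged profile holds both addresses; with it off, the function returns None and the subscription record stays on its own.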
Will Ella’s CDP ever be able to attach Robert’s work computer to his online profile? Maybe. For example, if Robert opens one of Ella’s e-newsletters on his work computer, the CDP might (depending on how strict it is about such things) recognize that as Robert and merge the profiles.
Identifying individuals from their online and offline behavior and building single records may seem complicated here, but this example is much simpler than real life. Consider how much more complex it gets once Robert’s smartphone and home desktop enter the picture.
Merging records: deterministic vs. probabilistic methods. Which is right for you?
The “golden record” the CDP salesman likes to highlight assumes all these different sources of information can be merged, but they need to have a field in the record to merge on. What’s that going to be?
Most companies opt for an email address as the best piece of personally identifiable information on which to merge records. But as we’ve seen, people have multiple email addresses. They also change over time.
If you stick with a strictly deterministic matching method, you’ll need to match a unique field (like an email address or a social media account) across multiple profiles to create your “golden record,” and you’ll inevitably leave some information behind.
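As a rough illustration, strictly deterministic matching on email might look like the sketch below. The profile layout and the normalize_email and group_by_email helpers are assumptions made for this example.

```python
from collections import defaultdict


def normalize_email(email):
    """Lowercase and trim an address so trivial differences do not block a match."""
    return email.strip().lower()


def group_by_email(profiles):
    """Map each normalized email to the ids of the profiles that contain it."""
    groups = defaultdict(list)
    for pid, profile in profiles.items():
        for email in profile.get("emails", ()):
            groups[normalize_email(email)].append(pid)
    return dict(groups)


# Robert's three profiles, reduced to their email fields.
profiles = {
    1: {"emails": []},
    2: {"emails": ["bob2387@hotmail.com"]},
    3: {"emails": ["me@robertwilliams.com"]},
}

groups = group_by_email(profiles)
# Only an email shared by two or more profiles justifies a merge.
mergeable = {email: ids for email, ids in groups.items() if len(ids) > 1}
print(mergeable)
```

No address appears in more than one profile, so nothing merges. That is the information a strictly deterministic method leaves behind.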
There are other options. Some CDPs use probabilistic methods to merge profiles. That method enables you to match records that might otherwise remain distinct. But you risk incorrectly merging profiles and creating a customer experience headache.
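Here is a toy sketch of the probabilistic idea, using fuzzy name similarity from Python’s standard difflib module as a stand-in for a real scoring model. The 0.75 threshold and the second candidate, “Rob Williamson”, are made up for illustration; production systems typically weigh many more signals.

```python
from difflib import SequenceMatcher

# Assumption: an arbitrary cut-off chosen for illustration only.
MATCH_THRESHOLD = 0.75


def name_similarity(a, b):
    """Return a similarity ratio between 0 and 1 for two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_probable_match(a, b, threshold=MATCH_THRESHOLD):
    """Treat two names as the same person when their similarity clears the threshold."""
    return name_similarity(a, b) >= threshold


subscriber = "Robert Williams"
# "Bob Williams" is the newsletter profile; "Rob Williamson" is a hypothetical stranger.
for candidate in ["Bob Williams", "Rob Williamson"]:
    print(candidate, round(name_similarity(subscriber, candidate), 2),
          is_probable_match(subscriber, candidate))
```

Both candidates clear the threshold. The first is the match you wanted; the second is the kind of incorrect merge that creates the customer experience headache.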
You can’t create a single record for each customer that covers all the chaos and weird realities of how people behave. What you can do, and what you must do, is decide where that matters.
There are use cases where improperly merged profiles yield very bad customer experience outcomes. Stick with deterministic matching in those cases, even though you’re going to lose some of the data on interactions with that customer. You’ll have multiple profiles for some individuals, many of which will remain “unknown.”
Other use cases are far more forgiving. If you want to create a segment of people who share a particular interest, you don’t need to get down to the individual. In these cases, probabilistic methods are sufficient.
In any event, recognize that “golden records” are a nice idea, but you’ll never actually get there.