Getting Back to Basics: The 5 W’s (and 2 H’s) of Data for Individualized Marketing Science

Data tips for enabling consumer journey modeling and analysis

By: Michael Minami, VP, Marketing Sciences COE, Advanced Analytic Solutions 

Two key objectives of marketing science are: 1) to understand and to predict drivers of the consumer journey from awareness and consideration through conversion and repurchase; and 2) to optimize the return on marketing investment of the marketing communications mix. This requires disciplined, dogged, focused effort to source, research, analyze, and model on key pieces of data associated with marketing audiences, campaigns, responses, conversions, and costs.

The original 5 W’s and an H provide a simple but very useful framework for this information gathering. This framework has been used most notably by journalists and investigators to gather all the facts necessary for a discovery or investigation, as well as for subsequent analysis and insights into circumstances, causes, actions, etc.

<Insert diagram: Who What Where When Why How>

The framework is at least a few thousand years old (perhaps more), and it may be considered archaic, time-worn, and even trivial in certain circles. However, if we can put down our always-on devices, take a few deep breaths, slow down our minds, and think clearly for a moment, it can actually be extremely useful in these modern times for individualized marketing science.

Let’s dust off the framework, add an "H," and see how it can be used, shall we?


Who?

Who are you targeting with your marketing messages? This needs to be as specific and granular as possible; perhaps the right phrase is "hyper-specific and hyper-granular." If you look at your data and reporting and the answer to the "who" is "all people within a particular DMA" or "people between the ages of 45 and 54 with an income greater than $50,000," this is useful but not specific enough for individualized marketing science.

Ideally, your data will be individual level: either an identified person with a unique ID or a unique anonymous ID (e.g., cookie, device ID, etc.) using identity graph options that can help to resolve cross-channel identity. That is, every single record or row in your data would need to have an associated user ID.
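As a minimal sketch of that requirement, the check below measures what share of records carry an individual ID. The field name `user_id` and the sample records are assumptions made for illustration, not part of any particular platform's schema.

```python
from typing import Iterable, Mapping

def id_coverage(records: Iterable[Mapping], id_field: str = "user_id") -> float:
    """Return the share of records that carry a non-empty individual ID."""
    records = list(records)
    if not records:
        return 0.0
    with_id = sum(1 for record in records if record.get(id_field))
    return with_id / len(records)

# Hypothetical event rows; two of the four lack an ID.
events = [
    {"user_id": "u-001", "channel": "email", "event": "open"},
    {"user_id": "cookie-9f3", "channel": "display", "event": "impression"},
    {"user_id": None, "channel": "display", "event": "impression"},
    {"channel": "social", "event": "click"},
]

coverage = id_coverage(events)
print(f"{coverage:.0%} of records have an individual ID")
```

Rows without an ID cannot be placed on any individual's journey, so a low coverage figure is an early warning that the data is not yet ready for individual-level modeling.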


What?

What is the specific marketing message that you are sending, what product/service is being promoted/offered, and what was the response? Again, this needs to be as hyper-specific and hyper-granular as possible. If your data only tells you that you sent a remarketing message, a brand campaign message, or an offer, this is useful, but not specific enough for individualized marketing science.

Ideally, your marketing message has associated metadata such as a campaign name, a sub-campaign name, an ad name, a deployment identifier, message format (e.g., png, jpg, email format, video format), subject lines and headers, the product or service promoted and/or purchased, call-to-action identifiers, image identifiers, and response KPIs (impressions, opens, clicks, video completion percentages, page views, form submits, transactions, etc.).
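One way to picture this metadata is as a single record type per send or response. The sketch below is illustrative only: the class name `MessageEvent`, its field names, and the sample values are all assumptions, and a real taxonomy would be agreed across the teams involved.

```python
from dataclasses import MISSING, dataclass, fields
from typing import List, Optional

@dataclass(frozen=True)
class MessageEvent:
    """One send or response record; all field names are illustrative assumptions."""
    user_id: str
    campaign_name: str
    sub_campaign_name: str
    ad_name: str
    deployment_id: str
    message_format: str      # e.g. "png", "jpg", "email", "video"
    product: str
    call_to_action_id: str
    response_type: str       # e.g. "impression", "open", "click", "transaction"
    subject_line: Optional[str] = None   # only meaningful for email
    image_id: Optional[str] = None

def missing_required(event: MessageEvent) -> List[str]:
    """List required fields (those without a default) that are empty."""
    return [
        f.name
        for f in fields(event)
        if f.default is MISSING and not getattr(event, f.name)
    ]

# Hypothetical record with the sub-campaign name left blank.
event = MessageEvent(
    user_id="u-001",
    campaign_name="spring_launch",
    sub_campaign_name="",
    ad_name="hero_banner_v2",
    deployment_id="dep-0042",
    message_format="png",
    product="running_shoe",
    call_to_action_id="cta-shop-now",
    response_type="click",
)

print(missing_required(event))  # ['sub_campaign_name']
```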


Where?

Where did the customer receive the marketing message, and where did they engage or transact? What publisher, site, app, platform, inbox, device, OS, geo, etc., was the marketing message sent to, and did the targeted individual receive the impression/message, open or click on the message, and eventually convert/transact? If your data only tells you fragments of where the message was sent, viewed, opened, clicked, or converted, this is useful, but not specific enough for individualized marketing science.

The pathway of a marketing message (from campaign plan to design and creation to ad bid, buy, and negotiation to deployment in an outbound sending platform to display on a receiving site, app, or platform to user impression and response to user transaction and back to the marketing team) is a long, circuitous, disjointed path full of data ghosts and goblins. Without full end-to-end orchestration and coordination across multiple teams, significant data wrangling, statistical inference, taxonomy creation and redesign, and retrafficking/remessaging are required to enable effective individualized marketing science models.


When?

When was the marketing communication sent, and when did the targeted individual receive the impression or message, open or click on the message, and eventually convert? If you look at your data and reporting and the answer to the "when" is "within the last year or quarter or 30 days" or "between Sunday and Saturday of this week," this is useful, but it may not be specific enough for individualized marketing science.

Ideally, your data will be time-stamped at least to the second (YYYY-MM-DD HH:MM:SS) for every send and response for every individual. Data at a more aggregated level may be suitable for marketing mix modeling and longer-time-window propensity classification models. However, survival analysis, response decay curve, incremental lift, and attribution analyses would suffer from accuracy deficiencies, or not be feasible at all, without the most granular time stamps.
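To illustrate why granularity matters, the sketch below computes the lag between a send and a conversion twice: once from full timestamps and once from dates alone. The timestamp layout and the sample values are assumptions chosen for the example.

```python
from datetime import datetime

# Assumed layout for this example; real logs may use a different format.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

def hours_to_convert(sent_at: str, converted_at: str) -> float:
    """Hours elapsed between a send and the resulting conversion."""
    sent = datetime.strptime(sent_at, TIMESTAMP_FORMAT)
    converted = datetime.strptime(converted_at, TIMESTAMP_FORMAT)
    return (converted - sent).total_seconds() / 3600

sent_at = "2024-03-01 09:00:00"
converted_at = "2024-03-01 21:30:00"

# Full timestamps recover the actual response lag.
lag_hours = hours_to_convert(sent_at, converted_at)

# Truncating to the date collapses the same lag to zero days.
day_lag = (
    datetime.strptime(converted_at[:10], "%Y-%m-%d")
    - datetime.strptime(sent_at[:10], "%Y-%m-%d")
).days

print(lag_hours)  # 12.5
print(day_lag)    # 0
```

With date-only data, every same-day response looks instantaneous, which is the kind of distortion that undermines decay-curve and survival estimates.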


Why?

Why was the marketing message sent in the first place? And why did the customer engage, convert, and transact (or not)? If the answer to the "why" is "Because we need to acquire more customers and because our message resonated with them," this is useful, but it may not be specific enough for individualized marketing science.

Answering "why" questions can be a bit more challenging because the information is typically not encoded within standard transactional data logs. However, with some additional investigation and "legwork," clues can be found that lead to the most valuable answers of all: the answers to "why."

Campaigns are designed with specific objectives in mind: for example, brand/product awareness, product education and differentiation, engagement and conversion, and sales with KPIs such as reach, frequency, impressions, clicks, views, and transactions. So customer impression and response data must be assessed within the context of the original campaign objectives. Individualized marketing science requires us to follow the trail of the hyper-specific and hyper-granular campaign, ad, and send/response by individual ID data back to the original campaign plan documentation to compare the objectives to the results.

The reasons why a customer engages, converts, and transacts (or not) also are not always readily available in the data. However, vital clues can be found in hyper-specific and hyper-granular data by analyzing impression/click frequency distributions, survival curves, click probabilities, unsubscribe/blocker events, etc. In addition, user-driven event flags in contact center logs, web transaction logs, and sales team notes can provide the highest variable importance data. These data can be leveraged to develop the most valuable and actionable individualized marketing science insights.
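As a hedged illustration of two of those clues, the sketch below builds an impression frequency distribution and an empirical click rate by impression count from individual-level events. The event layout and the sample data are assumptions, and a real analysis would need far more data plus controls for targeting bias.

```python
from collections import Counter
from typing import Dict, List

def impressions_per_user(events: List[dict]) -> Counter:
    """Count impressions for each individual ID."""
    return Counter(e["user_id"] for e in events if e["event"] == "impression")

def frequency_distribution(events: List[dict]) -> Counter:
    """Map impression count -> number of individuals exposed that many times."""
    return Counter(impressions_per_user(events).values())

def click_rate_by_frequency(events: List[dict]) -> Dict[int, float]:
    """Share of individuals who clicked, grouped by impression count."""
    clickers = {e["user_id"] for e in events if e["event"] == "click"}
    totals, clicked = Counter(), Counter()
    for user, count in impressions_per_user(events).items():
        totals[count] += 1
        if user in clickers:
            clicked[count] += 1
    return {count: clicked[count] / totals[count] for count in sorted(totals)}

# Hypothetical events for three individuals.
events = [
    {"user_id": "u1", "event": "impression"},
    {"user_id": "u1", "event": "impression"},
    {"user_id": "u1", "event": "click"},
    {"user_id": "u2", "event": "impression"},
    {"user_id": "u3", "event": "impression"},
    {"user_id": "u3", "event": "impression"},
]

print(dict(frequency_distribution(events)))  # {2: 2, 1: 1}
print(click_rate_by_frequency(events))       # {1: 0.0, 2: 0.5}
```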


How?

How was the campaign deployed, and how can you accurately, safely, and securely track the individual consumer’s journey across ads, websites, apps, platforms, stores, etc.? Campaigns can be deployed across multiple channels (e.g., display, paid search, social, email, direct mail, and contact center), resulting in multiple disparate metrics data sets that don’t all nicely align for individualized journey modeling. Channel-level metrics of this kind are useful, but they may not be specific enough for individualized marketing science.

The marketing science team would need to provide a framework to enable holistic and consistent journey modeling. In addition, there typically are redundant and overlapping tagging/tracking frameworks across the different channel platforms that require a logical framework to "translate and connect the journey points" across the tagging/tracking platforms.
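A minimal sketch of such a translation layer follows. It maps channel-specific field names onto one common schema and then orders each individual's events in time. Every field name here (`recipient_id`, `visitor_id`, and so on) is hypothetical, and the sketch assumes an identity graph has already resolved both channels to the same individual ID.

```python
from typing import Dict, List, Tuple

# Hypothetical per-channel field names mapped to a common schema.
FIELD_MAP = {
    "email": {"recipient_id": "user_id", "sent_ts": "timestamp", "action": "event"},
    "display": {"visitor_id": "user_id", "event_time": "timestamp", "event_type": "event"},
}

def to_common(channel: str, raw: dict) -> dict:
    """Rename a raw channel record's fields to the common schema."""
    row = {common: raw[source] for source, common in FIELD_MAP[channel].items()}
    row["channel"] = channel
    return row

def build_journeys(rows: List[dict]) -> Dict[str, List[Tuple[str, str, str]]]:
    """Group rows by individual and order each journey by timestamp."""
    journeys: Dict[str, List[Tuple[str, str, str]]] = {}
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
    for row in sorted(rows, key=lambda r: (r["user_id"], r["timestamp"])):
        journeys.setdefault(row["user_id"], []).append(
            (row["timestamp"], row["channel"], row["event"])
        )
    return journeys

raw_email = [{"recipient_id": "u-001", "sent_ts": "2024-03-01 09:00:00", "action": "open"}]
raw_display = [{"visitor_id": "u-001", "event_time": "2024-03-01 08:15:00", "event_type": "impression"}]

rows = [to_common("email", r) for r in raw_email]
rows += [to_common("display", r) for r in raw_display]
journeys = build_journeys(rows)
print(journeys["u-001"])
```

Keeping the mapping in one table means a new channel can be added by extending `FIELD_MAP` rather than rewriting the journey logic.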

How Much?

How much did the campaign cost, and what benefits did it generate in terms of conversions, awareness, loyalty, and engagement? I took some liberty by adding a second "H" because this may be the most important question to answer for a CMO: how much value or ROI did the marketing team create? If your data enables calculation of an overall marketing ROI, departmental ROI, or multi-campaign ROI, this is useful, but it may not be specific enough for individualized marketing science.

Data on benefits and costs is typically available, but each side poses its own challenge. Benefits data can be easy to analyze, but properly attributing those benefits back to campaign activities requires a consistent methodology and careful thought. Cost data is much more challenging to properly decouple, reconcile, and allocate for individualized marketing science. Costs can exist in a multitude of locations, such as internal financial systems, campaign planning solutions, ad supply and demand platforms, third-party deployment vendor systems, etc., all with different categorizations and dimensional breakdowns (e.g., CPM, CPC, cost per email, cost per mail piece, design and copywriting labor, overtime periods, etc.). Care must be taken to architect a marketing cost framework that accounts for top-down/bottom-up reconciliation and allocation of costs to marketing campaign activities, including consideration of time lag factors, in order to properly compare benefits and costs.
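The sketch below shows one simple way such a reconciliation might work: media cost is built bottom-up from a CPM rate, a shared cost pool (for example, design labor) is allocated top-down in proportion to impressions, and ROI is computed per campaign as (benefit - cost) / cost. The proportional allocation rule and all figures are assumptions for illustration; time lags and other allocation bases are deliberately left out.

```python
from typing import Dict

def media_cost(impressions: int, cpm: float) -> float:
    """Bottom-up media cost from a CPM (cost per thousand impressions) rate."""
    return impressions / 1000 * cpm

def allocate_shared_cost(shared_cost: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Top-down split of a shared cost pool in proportion to the given weights."""
    total = sum(weights.values())
    return {name: shared_cost * weight / total for name, weight in weights.items()}

def roi(benefit: float, cost: float) -> float:
    """Return on investment as (benefit - cost) / cost."""
    return (benefit - cost) / cost

# Hypothetical campaign figures.
impressions = {"campaign_a": 30000, "campaign_b": 10000}
benefits = {"campaign_a": 1800.0, "campaign_b": 450.0}
shared = allocate_shared_cost(1000.0, impressions)  # assumed shared labor pool

for name in impressions:
    total_cost = media_cost(impressions[name], cpm=5.0) + shared[name]
    print(name, total_cost, roi(benefits[name], total_cost))
# campaign_a 900.0 1.0
# campaign_b 300.0 0.5
```

The same allocation could be pushed down further, to ads or even individual sends, which is where the individualized view diverges from aggregate marketing mix modeling.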

Macro-level time-series forecasting in the form of marketing mix modeling enables return on marketing investment modeling at an aggregate level. But micro-level individualized marketing science requires a different paradigm with different cost considerations.

In conclusion, dive deep into the data — think hyper-specific and hyper-granular when assessing the data available. Chances are that you won’t immediately have the data required for individualized marketing science: data tied to individuals; campaigns; ads; content, images, and messages; deployment; and response.

Clearly conceptualize the individual consumer journey across marketing and conversion touchpoints, and define terms, connection points, and objectives. Start with the end in mind. Have a clear set of use cases for analysis and modeling that cuts across audiences, marketing messages, and value and that utilizes the individualized marketing science data. 

It’s a perfect time to get back to basics by answering the 5 W’s and 2 H’s for every individualized marketing science project. The framework may be old, but the genius of its simplicity is timeless.