How Online Advertising Works

Prabhakar Krishnamurthy
16 min read · Sep 22, 2021

[This article was written in 2017 and the discussion of tracking techniques reflects the state of the art that year]

Photo by Joshua Earle on Unsplash

You briefly consider a Hawaii vacation and look at packages online. Now you can’t open a web page without seeing palm trees. Here’s why.

In 2016, both in the United States and around the world, more advertising dollars were spent online than anywhere else, with online eclipsing TV for the first time. This news shouldn’t come as a surprise. According to Global Web Index, a company that collects data on users’ online activities, in 2015 the average person in the United States spent 6.2 hours a day on some kind of online activity: watching videos, listening to music, consuming social media, or messaging on computers, tablets, or mobile phones. This is far more than the 5.1 hours a day spent watching TV (according to Nielsen). That’s why US-based advertisers spent over 72 billion dollars on online advertising in 2016, according to the consulting firm PwC, and the vast majority of online publishers today rely on advertising for their revenues. Google and Facebook, two of the highest-valued corporations in the world, generate most of their revenue from online advertising: approximately 95 percent of revenue for Google and over 80 percent for Facebook.

Clearly, online advertising has become a powerful economic phenomenon for the business world. Many advertisers — for example mail-order businesses — exist only due to their ability to advertise online. Similarly, publishers of websites and apps are able to offer their content free to users by generating revenue from ads on their websites or apps. Of course, this is possible only if users are willing to view ads and act on them. As someone who has been building online advertising systems for the past decade, I can say that advertisers and publishers are investing heavily in the quality of the ad experience by making ads more engaging, better targeted and more personalized.

So how do advertisers figure out what ads to show, which users to show them to and when? For search advertising (ads that appear when you type a term into a search engine), that’s easier to do: your query suggests what you are looking for at that moment, so online services can easily feed you ads directly related to your query. For display advertising (the ads that appear on web pages when a user visits them), it’s a lot harder to figure out what ads to show to what user. Doing so requires responsibly following individuals around on the internet; learning as much as possible about individual users without excessively invading their privacy; and then getting them to actually pay attention to the ads they are shown. It’s a tough balance.

The simplest and most frequently used piece of this puzzle is user tracking. Say you visit a travel site to look for deals on a Hawaii vacation. You click on a few offers. You like one or two, but leave the site without making a purchase. For the next several days, as you visit completely unrelated websites, you are barraged with ads from the travel site you visited and presented with special offers, some for products similar to that Hawaii vacation package, some completely different. There’s a point to all this repetition, even if you start to find it annoying: According to CMO.com, retargeting users improves the chance of their completing a purchase on the merchant’s website by 70 percent.

The travel site in this example likely used a very basic form of user tracking: a browser cookie, the most common method of tracking and one of the oldest. A regular cookie is essentially a small text file, sometimes only a few kilobytes in size, which contains information the page will load for you upon subsequent visits. For example, when you return to an online store you have visited before, you might have noticed that it remembers your user name. It may also remember items you had placed in your shopping cart on your last visit. These cookies are stored on your computer.

A tracking cookie is a type of cookie that works a bit differently. Most tracking cookies are harmless and are intended to log information for advertising purposes. As an example, if you go to a website that hosts online advertising from a third-party vendor, such as Criteo, that vendor can place a tracking cookie on your computer. If you visit another website that also runs advertisements from the same third-party vendor, then that vendor knows you have visited both websites.

Cookies were great when users just went online on their personal computers or laptops. Nowadays, though, a user typically switches between several devices: a laptop, a smartphone and a tablet, perhaps even multiple devices of each type. Since cookies only work on a single device, they are pretty useless in a multiple-device world, and they don’t even work on applications (apps) that run on mobile devices.
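To make the third-party cookie mechanism concrete, here is a minimal sketch in Python. It is a toy simulation, not any vendor’s actual system: the `AdVendor` class, the `vendor_id` cookie name and the site names are all hypothetical, and a plain dictionary stands in for each browser’s cookie jar.

```python
import uuid
from collections import defaultdict


class AdVendor:
    """Hypothetical third-party ad vendor embedded on many publisher sites."""

    def __init__(self):
        # cookie ID -> list of sites where this browser loaded the vendor's ad
        self.visits = defaultdict(list)

    def serve_ad(self, site, cookie_jar):
        # The browser sends back the vendor's cookie if it already has one;
        # otherwise the vendor sets a fresh random ID.
        cookie_id = cookie_jar.get("vendor_id")
        if cookie_id is None:
            cookie_id = uuid.uuid4().hex
            cookie_jar["vendor_id"] = cookie_id
        self.visits[cookie_id].append(site)
        return cookie_id


vendor = AdVendor()
laptop_browser = {}  # cookie jar of one browser on one device
phone_browser = {}   # a different device has its own, separate jar

vendor.serve_ad("travel-deals.example", laptop_browser)
vendor.serve_ad("news.example", laptop_browser)
vendor.serve_ad("travel-deals.example", phone_browser)

laptop_history = vendor.visits[laptop_browser["vendor_id"]]
phone_history = vendor.visits[phone_browser["vendor_id"]]
print(laptop_history)  # both sites are linked to the same cookie ID
print(phone_history)   # the phone looks like a different user
```

The two visits from the laptop end up under one ID, which is what lets the vendor retarget you on an unrelated site. The phone gets a separate ID, which illustrates why cookies alone cannot follow one person across devices.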

So advertisers have gotten more sophisticated. Consider this scenario: Summer is approaching and you are looking for a grill for your large backyard. You go to your neighborhood appliance store and check out a few models, but decide you want to do a little research before purchasing one. Sitting in your car, using your mobile phone, you find one of the grills you just saw, in an online store at a better price; you place it in your shopping cart, thinking you’ll probably complete the transaction later, but may still shop around a bit. For the next few days, you see ads for grills from the online store, not just on your phone, but on your computer, usually, but not always, when you are visiting sites related to home and garden. You eventually make an online purchase — and then you start seeing ads for products that are of interest to grill owners, like a grill brush and a grill cover, both on your phone and on your computer. The tracking in this example may seem uncanny. It includes two features of today’s sophisticated tracking technology: the ability to track users across devices and the use of algorithms to make highly relevant product recommendations.

Advertisers have two technologies available that can track across multiple devices: “deterministic” and “probabilistic.” Deterministic cross-device tracking is a very precise technology, but it requires users to sign in to their websites and apps on every device they use. Facebook and Twitter, for example, can track users this way, and it’s obviously not hard; they simply record whatever information they need and link it to your unique identifier. That works great for personalized social media platforms or subscription sites, but most people who are just browsing, shopping, or reading news don’t want to sign in to every site.

So ad tech companies like Drawbridge and Tapad use probabilistic cross-device tracking, a less exact approach. These companies use statistical models to figure out which individuals are using which sets of devices, be they phones, tablets, or laptops. Their exact methods are closely held secrets, but typically these companies start by acquiring billions of data points from activity logs: the detailed records of what apps people use and when, websites they visit, operating systems they are using at the time, IP addresses, make and model of the devices they use, GPS coordinates, time of day, and other information about devices and what people are doing with them. They can tap this information whenever an app or website sends a request for an ad, since pertinent information about the user is sent along with the ad call to the ad serving system to help match ads to users more precisely. They then apply a host of techniques, including machine learning, data mining and heuristic rules, to identify the device and the individual associated with it.

Identifying the device is easy if a device ID is available: Apps running on Android or Apple phones have access to a unique ID for each device, unless the user has chosen not to be tracked. When a device ID is unavailable, companies rely on something called a “device fingerprint,” a nearly unique identifier for a device that the tracking companies assemble out of characteristics including device type and model, operating system version, installed font libraries, and clock skew.

Matching the device to an individual is more complicated. It uses something those in the industry call a device graph. The user in a device graph isn’t known by name to the companies; rather, the system represents each user by a system-generated unique identifier. Users are described by their demographic attributes, life-stage (e.g. young, new mom) and life-style attributes (e.g. hiker, traveller), political leaning (liberal, politically-active), and interests (college football fan, avid sci-fi reader) inferred from data about their past behavior.
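As a rough illustration of the device fingerprint idea described above, the sketch below hashes a handful of device characteristics into a single identifier. The attribute names and values are made up, and exact hashing is a simplifying assumption: production systems are not public and presumably tolerate small changes in the attributes, which a plain hash does not.

```python
import hashlib
import json


def device_fingerprint(attributes):
    """Hash a dict of device characteristics into a stable identifier."""
    # Sort keys so the same attributes always serialize the same way.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical characteristics reported along with an ad request.
phone = {
    "device_model": "ExamplePhone X",
    "os_version": "ExampleOS 8.0",
    "fonts": ["Roboto", "Noto Sans"],
    "clock_skew_ppm": 12.5,
}
same_phone_later = dict(phone)
other_phone = dict(phone, os_version="ExampleOS 8.1")

fp1 = device_fingerprint(phone)
fp2 = device_fingerprint(same_phone_later)
fp3 = device_fingerprint(other_phone)
print(fp1 == fp2)  # same characteristics give the same fingerprint
print(fp1 == fp3)  # one differing characteristic gives a different one
```

Note that in this naive version an operating system upgrade would make the same phone look like a new device, which is one reason a fingerprint is only "nearly" unique and stable.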

Creating the device graph starts by clustering potentially related devices: those likely to belong to one user, based on common behavior patterns. The companies that serve up ads can choose from a variety of methods to perform the clustering. A popular clustering method relies on the fact that co-located devices connected to the Internet through the same network, such as a home Wi-Fi router, typically share the same public IP address. And, it turns out, two devices that frequently have a common public IP address are likely to belong to the same user: the user might often use both a laptop and a phone at home, at work, and on public networks.
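The IP-based clustering idea can be sketched in a few lines. This is a simplified heuristic under stated assumptions, not any company’s actual algorithm: the log format, the hourly time buckets, the device names and the `min_shared` threshold are all invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# (device_id, public_ip, time_bucket) observations; all values are made up.
log = [
    ("laptop-A", "203.0.113.7", 1), ("phone-A", "203.0.113.7", 1),
    ("laptop-A", "198.51.100.4", 2), ("phone-A", "198.51.100.4", 2),
    ("laptop-A", "203.0.113.7", 3), ("phone-A", "203.0.113.7", 3),
    ("tablet-B", "198.51.100.4", 2),
    ("tablet-B", "192.0.2.55", 3),
]


def link_devices(observations, min_shared=2):
    # Group devices seen on the same IP during the same time bucket.
    seen_together = defaultdict(set)
    for device, ip, bucket in observations:
        seen_together[(ip, bucket)].add(device)

    # Count how often each pair of devices shared an IP.
    pair_counts = Counter()
    for devices in seen_together.values():
        for pair in combinations(sorted(devices), 2):
            pair_counts[pair] += 1

    # Keep pairs that co-occur often enough to suggest one owner.
    return {pair: n for pair, n in pair_counts.items() if n >= min_shared}


links = link_devices(log)
print(links)
```

Here the laptop and phone share an IP address in three separate time buckets and get linked, while the tablet, which crossed paths with them only once on what might be a public network, does not. Requiring repeated co-occurrence is what keeps a single coffee-shop visit from merging strangers into one user.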

Several companies such as Crosswise (acquired by Oracle), Drawbridge and AdBrain have also applied machine-learning methods, such as “Classification” and “Learning to Rank” to the problem. Classification is the problem of identifying to which of a set of categories a new item (user or device in this context) belongs. Using such patterns, companies build a device graph connecting devices to users probabilistically. Some companies claim a precision of up to 97 percent.

Probabilistic tracking works for companies such as Drawbridge and Tapad that have access to a vast amount of data about users. However, most advertisers do not have this amount of data. To overcome this hurdle, Adobe in 2016 launched a Device Co-op, where advertisers can pool their data to create a cluster of devices belonging to the same user. The members of the co-op give Adobe access to users’ login IDs and HTTP header data, cryptographically hashed; that hides a consumer’s real-world identity and instead gives them a unique Internet identity. Adobe will then process this data to create groups of devices (“device clusters”) used by a particular person or household. It will provide these device groups as a marketing service so that co-op members can measure their advertising performance, categorize individuals based on their interests and attributes, and advertise directly to them across all of their devices.
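The pooling idea can be illustrated with a small sketch. It is not Adobe’s implementation, whose details are not described here; the shared salt, the normalization step, the `hash_login` and `build_clusters` helpers and the sample data are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def hash_login(login_id, salt="shared-coop-salt"):
    """Replace a login ID with a hash so the raw ID is never pooled."""
    normalized = login_id.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def build_clusters(member_reports):
    """Group device IDs that were seen with the same hashed login."""
    clusters = defaultdict(set)
    for hashed_login, device_id in member_reports:
        clusters[hashed_login].add(device_id)
    return clusters


# Reports contributed by different (hypothetical) co-op members.
reports = [
    (hash_login("Pat@example.com"), "phone-1"),    # seen by member A
    (hash_login("pat@example.com "), "laptop-7"),  # seen by member B
    (hash_login("sam@example.com"), "tablet-3"),   # seen by member A
]

clusters = build_clusters(reports)
pat_devices = clusters[hash_login("pat@example.com")]
print(sorted(pat_devices))
```

Because both members hash the same login the same way, the phone and the laptop land in one cluster even though neither member ever shares the underlying email address with the other.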

Companies like Silverpush have developed technology to extend this kind of cross-device tracking to TVs. With their technology, advertisers can set up a so-called audio beacon in their TV commercials. An audio beacon is an ultrasonic sound that humans cannot hear but that can be easily detected by microphones on mobile devices. Your phone, if it’s running an app that uses Silverpush’s technology, reacts to the signal by noting what shows you are watching. Later, your mobile device can show you ads based on the TV programs you watched. If you regularly watch food and cooking shows, say, you might start seeing food product ads on your phone. Alternatively, advertisers can peg you into a specific demographic group based on the programs you watch. While Silverpush claims to have abandoned the use of this technology for ad tracking purposes, in 2017 a team of researchers at Technische Universität Braunschweig in Germany found that over 200 Android apps were using Silverpush’s publicly available software to listen for these audio beacons. Meanwhile, companies such as Lisnr and Shopkick are using audio beacon technology to track users in or near stores. They do this legitimately, by informing users of the tracking and allowing them to opt out.

The tracking technology described so far enables advertisers to monitor your online activities. However, personal devices such as mobile phones and tablets carry sensors that allow advertisers to monitor your offline activity as well. Location tracking is the most obvious one. All smartphones have a GPS receiver. If you give an app on your phone permission to detect your location, the phone’s operating system uses the GPS receiver to continually record your location as latitude and longitude coordinates. The app receives this information and can determine the zip code of your location or whether you are in an area of interest to an advertiser. For instance, Starbucks can create a campaign to show ads to people who are within a set distance of a Starbucks location. They can also pop you a coupon.
Advertisers can also make use of your current location to infer the weather you are experiencing and show you appropriate ads — for example, for a cold can of Coke on a hot summer day. Companies such as Placed and Factual can map your location coordinates to determine your shopping or eating habits, determining which stores or restaurants you visit or pass regularly.
Companies — pretty much all publishers of apps on your phone — can log your location over time and can infer your home and work locations from the fact that you are in the same geographical coordinates during the night or day regularly.
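Checking whether a user is "within a set distance" of a store comes down to a distance calculation on latitude and longitude. The sketch below uses the standard haversine formula; the coordinates, the 0.5 km radius and the `in_geofence` helper are illustrative assumptions, not how any particular ad platform implements it.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_geofence(user, store, radius_km):
    """True if the user's (lat, lon) is within radius_km of the store."""
    return haversine_km(user[0], user[1], store[0], store[1]) <= radius_km


store = (37.7749, -122.4194)        # hypothetical store location
nearby_user = (37.7760, -122.4180)  # a couple of blocks away
distant_user = (37.8044, -122.2712) # across the bay

print(in_geofence(nearby_user, store, radius_km=0.5))
print(in_geofence(distant_user, store, radius_km=0.5))
```

The same distance check, applied to a log of coordinates collected at night versus during the day, is one simple way the home and work inference described above could be approached.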

That’s how advertisers, publishers, and third-party data providers use tracking technology to monitor your activity in the online and physical world, both in the moment and to accumulate data on your activities over time. But they don’t stop there. These companies hire large teams of data scientists to develop profiles of you from the patterns of your activities: your interests, your attitudes, and even demographic attributes like gender, age, and socio-economic status. Here are a few examples of how they do that. If you browse many sports-related websites displaying Google’s AdSense ads or watch sports-related videos on YouTube, Google may associate a sports interest with your cookie or Google Account and show you more sports-related ads. Similarly, if the sites you visit have a majority of female visitors, Google might infer that you are female and target ads with that demographic in mind.

An article in the New York Times a few years ago reported how the retailer Target, in the belief that people develop new buying habits just prior to childbirth, had gone to great lengths to identify which of its customers were about to have a baby, based on the items they put in their online shopping carts. Target did a detailed analysis of its customers’ shopping habits and found out which products they were more likely to buy as they were preparing for a new baby. That allowed the company to get a head start on other retailers and start marketing to the parents even before the baby was born.

Your photos can also be used to infer your interests. Even if you have not tagged your photos, companies can use machine-learning technology to analyze your images; Flickr, for example, is using deep learning technology to classify its images into categories automatically based on their content. Categories include things like cloud, sunset, nature, ocean, car and dog. The tool should even be able to tell what some well-known objects are. For example, if you take a picture of the Golden Gate Bridge, it should be able to recognize that and determine that you’re in San Francisco, something local advertisers would be very interested in knowing.

Researchers have been successful in inferring users’ private attributes, including home addresses, sexual orientations, and interests, using their public data in online social networks. State-of-the-art methods leverage both a user’s public friends and public behaviors (page likes on Facebook, apps that the user reviewed on Google Play) to create a profile of the user’s private attributes.

A large and lucrative business sector has developed around the collection, processing, and sharing of data on users. Large companies like Google and Facebook, as well as smaller companies like Bluekai (now part of Oracle), Lotame, eXelate (now part of Nielsen), and Drawbridge, have invested in technology and hired large teams of researchers to develop comprehensive and precise profiles of users. Such information is highly valuable to advertisers.

Advertisers need to show the right ads to the right users at the right time to realize the best returns on the dollars they spend. The merchant John Wanamaker reportedly said, “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.” But Wanamaker said that in the days before the Internet. Part of the reason for the movement of advertising dollars to the Internet is the increased ability to target users who are most likely to respond to advertising, and the fact that user-tracking technology can also measure users’ responses to advertisements. A well-targeted advertising campaign can yield a high conversion rate: the ratio of the number of users who purchase the advertiser’s product or take a desired action to the number of ad impressions that the advertiser paid for. A conversion can mean hundreds or thousands of dollars in profit for advertisers who sell insurance policies, automobiles, and the like.

Advertisers measure the effectiveness of their campaigns by measuring the lift in conversions due to advertising. They can do this with a controlled experiment in which an audience is randomly split into two groups: one, the test group, is shown the ad, and the other is not. An important step in this is to link users’ ad views to purchases, which is quite straightforward if the purchase happens online. Offline purchases are harder to track. Advertisers often use store visits as a proxy for purchases; store visits can be identified using location tracking through your cell phone, as described earlier.
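The arithmetic behind conversion rate and lift is simple enough to show directly. In the sketch below all numbers are invented, and one simplification is assumed: because the control group sees no ads, the rate for both groups is computed per user in the group rather than per paid impression, so that the two groups can be compared.

```python
def conversion_rate(conversions, group_size):
    """Fraction of the group that purchased or took the desired action."""
    return conversions / group_size


def lift(test_rate, control_rate):
    """Relative increase in conversion rate attributable to the ad."""
    return (test_rate - control_rate) / control_rate


# Hypothetical experiment: 10,000 users randomly assigned to each group.
test_rate = conversion_rate(conversions=150, group_size=10_000)     # saw the ad
control_rate = conversion_rate(conversions=100, group_size=10_000)  # did not

campaign_lift = lift(test_rate, control_rate)
print(f"test: {test_rate:.2%}, control: {control_rate:.2%}, "
      f"lift: {campaign_lift:.0%}")
```

With these made-up figures, the ad raises the conversion rate from 1 percent to 1.5 percent, a lift of about 50 percent. The 100 conversions in the control group are the purchases that would have happened anyway, which is exactly the "wasted half" Wanamaker could not measure.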

That’s the world of Internet advertising today. But change is coming, thanks to two trends. For one, more and more of our devices, not just our computing and communications devices but our automobiles and household appliances, are being fitted with sensors and connected to the Internet. Also, the way we interact with our devices is changing and, therefore, so will ads. Here’s a peek at what that will mean.

Consider automobiles. Some newer automobiles ship with Internet connections, intended for uses such as engine diagnostics and driver assistance, and companies such as Dash Labs are making devices that will retrofit older cars to do the same. These devices allow data such as fuel efficiency, location, type of car, engine health, type of trip, and driver behavior like driving speed, acceleration and braking, to be collected. Dash Labs partners with companies like Edmunds.com to collect additional information, such as VIN, ambient weather, and road conditions. With all this data, Dash Labs can predict, say, that a battery is about to fail and send the user coupons or offers for battery brands and retail stores.

Startups like Evrythng, in partnership with companies like Thinfilm, are allowing consumer products manufacturers to connect their products to the Internet and to smartphones as soon as they are manufactured, by giving each product a digital identity and tracking it through its lifecycle. That means, for example, packages of food on a store shelf could send you a coupon as you walk by them, or medications sitting in your medicine cabinet could alert you when they are about to expire. Recently, Diageo, an alcoholic beverage company, partnered with Evrythng and Thinfilm for a Father’s Day pilot in Brazil in which consumers attached a personalized film tribute to their dad to the bottle of whisky they were giving as a gift; the dad could then view it by using his phone to scan the QR code on the bottle.

Ad experiences will become much more immersive using virtual reality technology, providing users with experiences that draw them into a virtual world of the advertiser’s creation. Marriott Hotels, in its first foray into virtual reality, created traveling phone-booth-like VR systems in public spaces like shopping malls that delivered a 4D sensory experience to ‘teleport’ guests to locations like beaches in Hawaii and streets in downtown London (near Marriott Hotels, of course). With smartphones such as Samsung’s Galaxy S8 supporting high-end VR technology, such immersive advertisements will likely become more widespread.

What will online advertising look like in the future? Consider the following trends:

a) Users are increasingly using ad blockers to selectively block out bad ads: irrelevant ads, ads that slow down the page load, or ads with annoying features like pop-ups that get in the way of users enjoying online content.

b) With more of our daily-use devices, such as automobiles and kitchen appliances, being connected to the internet, advertisers will be able to track more of our daily activities. Our privacy is being continually eroded and users will accept it; surveys show that millennials are less concerned about privacy than the rest of us.

c) More of our senses are involved in online experiences. Virtual Reality (VR) and Augmented Reality (AR) are already coming into vogue and providing us with rich visual and aural experiences. Researchers at City University of London are building devices that can send smells and touch over the internet.

d) Communications speeds are continually increasing. The advent of 5G devices (5G being the next-generation mobile network standard), expected in 2018, will allow almost any connected device to communicate with any other with very low latencies.

e) Interfaces to content and services are increasingly conversational and powered by AI; today’s examples include Siri, Alexa (Amazon), and Google Assistant. This style of interface will allow personalized ad-hoc interactions rather than scripted ones: by inferring the user’s intent, the AI engine will generate highly relevant responses, which can include ads. Rather than presenting generic information, ads can be customized to the user’s needs.

Future advertising will provide users with rich experiences (the buzzword is immersive) and engage all of our senses, including touch and smell, using devices attached to your phone or computer. Ads will also be highly personalized to you. Here are some advertising scenarios that we might see in the near future. Say you are looking for a restaurant to take your date to for dinner. As you search for restaurants, you are presented with a few ads. You click on one of them and are presented with an Augmented Reality (AR) experience of what actually sitting at a table at the restaurant feels like, complete with the sounds and the smells (delivered through an olfactory device connected to your phone) of the food the restaurant guesses you might like based on its knowledge of your food preferences.

You walk into a clothing store. Cameras installed in the store are relaying images of you to a smart system that infers your physical attributes: height, girth, skin tone. As you walk by the store’s large display screens, the store displays shirts that it predicts you will like. You stop and look at the displayed shirts. There are a few you like. You touch one of the shirts and a full-size image of the shirt appears on the screen. The store prompts for your permission to take a 3D image of you so that it can show what the shirt will look like on you. You accept, and then you get a view of what it might look like on you from various angles. You try other shirts in different styles and colors. You decide on one of them and ask to purchase the shirt, and the store searches its inventory to see if the shirt is available in the store. It does not find the exact color and pattern but shows you close matches and also offers to place an order for the exact shirt you liked. You choose to place an order for the shirt you liked and ask for it to be delivered to your home.
