Adding a paid advertising medium #130
Comments
Interesting idea @kingo55 ! What do the other maintainers think? |
Makes sense from my POV. Would go specific with display, cpc, ppc etc. |
@alexanderdean and @lstrojny I've been grouping some of the sites together but I don't think we can reliably specify the type. For example, Taboola does content marketing but they may also do display through their network:
display:
  Taboola:
    domains:
      - trc.taboola.com
      - api.taboola.com
Do we classify them as "display", "content marketing" or something else? In some ways they sound more like a network than a traffic source. Thoughts? |
Awesome... that makes things easy then. Here's a draft I've been working on:
paid:
  Google:
    domains:
      - www.googleadservices.com
      - partner.googleadservices.com
      - googleads.g.doubleclick.net
      - tpc.googlesyndication.com
      - googleadservices.com
  Taboola:
    domains:
      - trc.taboola.com
      - api.taboola.com
      - taboola.com
  Criteo:
    domains:
      - cas.jp.as.criteo.com
      - cas.criteo.com
  Doubleclick:
    domains:
      - ad.doubleclick.net
      - ad-apac.doubleclick.net
      - s0.2mdn.net
      - s1.2mdn.net
      - dp.g.doubleclick.net
      - pubads.g.doubleclick.net
  AppNexus:
    domains:
      - ib.adnxs.com
      - adnxs.com
      - 247realmedia.com
  Sizmek:
    domains:
      - bs.serving-sys.com
  PubMatic:
    domains:
      - sshowads.pubmatic.com
  Acuity Ads:
    domains:
      - acuityplatform.com
  OpenX:
    domains:
      - us-ads.openx.net
      - openx.net
      - servedbyopenx.com
      - openxenterprise.com
  Tribal Fusion:
    domains:
      - cdnx.tribalfusion.com
  Eyeota:
    domains:
      - eyeota.net
  Sociomantic Labs:
    domains:
      - sociomantic.com
  ONE by AOL:
    domains:
      - nexage.com
  Neustar AdAdvisor:
    domains:
      - adadvisor.net
  Casale Media:
    domains:
      - casalemedia.com
  BidSwitch:
    domains:
      - bidswitch.net
  StickyADS.tv:
    domains:
      - stickyadstv.com
      - sfx.stickyadstv.com
  Mixpo:
    domains:
      - mixpo.com
  Yieldmo:
    domains:
      - yieldmo.com
  Jivox:
    domains:
      - jivox.com
  Adform:
    domains:
      - adform.net
  Fluct:
    domains:
      - adingo.jp
  AudienceScience:
    domains:
      - wunderloop.net
  MicroAd:
    domains:
      - microad.jp
  LifeStreet:
    domains:
      - lfstmedia.com
  Rubicon Project:
    domains:
      - optimized-by.rubiconproject.com
  SteelHouse:
    domains:
      - steelhousemedia.com
  Sovrn:
    domains:
      - lijit.com
  Sonobi:
    domains:
      - sonobi.com
  ZEDO:
    domains:
      - zedo.com
      - z1.zedo.com
  AdRoll:
    domains:
      - adroll.com
  Flashtalking:
    domains:
      - flashtalking.com
      - servedby.flashtalking.com
  Outbrain:
    domains:
      - paid.outbrain.com
  Plista:
    domains:
      - farm.plista.com
  White Pages:
    domains:
      - www.whitepages.com.au
      - mobile.whitepages.com.au
  MyShopping.com.au:
    domains:
      - www.myshopping.com.au
  GetPrice.com.au:
    domains:
      - www.getprice.com.au
  Finder.com.au:
    domains:
      - www.finder.com.au
      - fcc.finder.com.au
  Mozo:
    domains:
      - mozo.com.au
      - a.mozo.com.au
  InfoChoice:
    domains:
      - www.infochoice.com.au
      - keyfactssheet.infochoice.com.au
  RateCity.com.au:
    domains:
      - ratecity.com.au
      - direct.ratecity.com.au
      - www.ratecity.com.au |
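A minimal sketch of how a draft like the one above might be used to classify a referrer host. It assumes a hypothetical in-memory dict `PAID` that mirrors a few entries of the YAML, and a hypothetical helper `lookup_paid`. This is not the referer-parser API, and the fallback to parent domains is an assumption made for illustration:

```python
from urllib.parse import urlsplit

# Hypothetical excerpt of the draft "paid" section, as a plain dict.
PAID = {
    "Google": ["googleads.g.doubleclick.net", "googleadservices.com"],
    "Doubleclick": ["ad.doubleclick.net", "pubads.g.doubleclick.net"],
    "Taboola": ["trc.taboola.com", "api.taboola.com", "taboola.com"],
}

# Invert to domain -> source name for direct lookups.
DOMAIN_TO_SOURCE = {d: name for name, domains in PAID.items() for d in domains}


def lookup_paid(referrer_url):
    """Return ("paid", source) for a referrer URL, or None if it is not listed."""
    host = (urlsplit(referrer_url).hostname or "").lower()
    parts = host.split(".")
    # Try the full host first, then progressively shorter parent domains
    # (assumed matching rule; never tries the bare top-level domain).
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in DOMAIN_TO_SOURCE:
            return ("paid", DOMAIN_TO_SOURCE[candidate])
    return None
```

With this sketch, a host that is listed verbatim matches directly, while an unlisted subdomain such as `cdn.taboola.com` falls back to the `taboola.com` entry.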
Whoa - great list @kingo55 ! |
Thanks @alexanderdean - here's a first cut that seems to do the job in the Python lib: kingo55@ea2d99c If you want me to keep the paid changes separate from the other source changes, I can split them out and submit them in separate pull requests. |
Yes please, separate PR would be great! |
@kingo55 @alexanderdean FYI most online marketing campaigns I know of use UTM parameters to identify paid traffic: https://en.wikipedia.org/wiki/UTM_parameters |
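For context, a minimal sketch of reading UTM parameters from a landing-page URL with the Python standard library. The helper name `utm_params` and the example URL are hypothetical; this is not how Snowplow or referer-parser extracts them:

```python
from urllib.parse import urlsplit, parse_qs


def utm_params(page_url):
    """Return the utm_* query parameters of a URL as a flat dict."""
    query = parse_qs(urlsplit(page_url).query)
    # parse_qs maps each key to a list of values; keep the first one.
    return {k: v[0] for k, v in query.items() if k.startswith("utm_")}
```

Calling `utm_params("https://example.com/landing?utm_source=taboola&utm_medium=cpc&id=7")` keeps only the `utm_source` and `utm_medium` entries.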
Hey @DCMNMarc - sure, we make use of UTM parameters in Snowplow heavily.
Appreciate the point but if people had said the same thing about IP:geolocation then we would never have had things like MaxMind... |
This is true, but do you think the work of generating and managing such a big database fits into your workload, even if there is already a solution for it using UTM parameters? Also, what happens if you detect paid traffic that is at the same time a known referrer type (like a search engine)? As far as I know, you currently support only one. |
Good point. Any given referrer URI should only be found in the database once. If the same URI is used for two different mediums, like search and paid, then we should give the traffic the benefit of the doubt and make it search (i.e. don't assume paid).
UTM parameters are great but they are only suggestive and they can be omitted, incorrect or spoofed. The adtech landscape is huge [1] but the top 20 vendors very likely account for more than 80% of all revenues, and @kingo55's list is a great start... So I am broadly in favor of adding this... [1] http://www.lumapartners.com/resource-center/lumascapes-2/ |
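The "each URI only once" rule and the "benefit of the doubt goes to search" fallback can be sketched as follows. The `REFERERS` excerpt, the deliberately duplicated `search.yahoo.com` entry, and the helpers `find_duplicates` and `resolve` are all made up for illustration; they are not part of referer-parser:

```python
from collections import defaultdict

# Hypothetical excerpt: medium -> source -> list of domains.
# search.yahoo.com is duplicated on purpose to show the check.
REFERERS = {
    "search": {"Google": ["www.google.com"], "Yahoo!": ["search.yahoo.com"]},
    "paid": {"Google": ["googleadservices.com"], "Yahoo!": ["search.yahoo.com"]},
}


def find_duplicates(referers):
    """Return {domain: [(medium, source), ...]} for domains listed more than once."""
    seen = defaultdict(list)
    for medium, sources in referers.items():
        for source, domains in sources.items():
            for domain in domains:
                seen[domain].append((medium, source))
    return {d: entries for d, entries in seen.items() if len(entries) > 1}


def resolve(entries, preferred="search"):
    """Pick the preferred medium if present, else the first listed entry."""
    for medium, source in entries:
        if medium == preferred:
            return (medium, source)
    return entries[0]
```

A linter would fail the build whenever `find_duplicates` returns anything. `resolve` only shows the suggested precedence in case a duplicate is ever tolerated.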
Great link @alexanderdean. As long as this list doesn't affect other mediums (I would highly recommend a test for it) then I'm fine with it, as I don't need to use this feature ;) |
Good idea @DCMNMarc - added a linter ticket to enforce this kind of thing #132 Do you use referer-parser directly or as part of Snowplow? |
I'm using the Python version directly in a Spark application |
@DCMNMarc - more useful than grouping it all under the vast expanse of "unknown" IMO. Paid traffic behaves very differently and is often laden with bots and unique data in URLs. We run Snowplow across a range of sites with inconsistently manually tagged URLs. Makes sense to group them from that perspective. |
Having this in referer-parser would be very useful to us. Did this ever get merged, or can we just manually update the config from above? @alexanderdean does making our own updates to these data files cause any problems when upgrading to future versions of Snowplow? @kingo55 do you have any updates to the list version you posted here last July? |
This hasn't been merged yet @ryanrozich; making edits to this file inside Snowplow shouldn't break anything. |
@ryanrozich - This was my latest commit: https://github.com/snowplow/referer-parser/pull/139/commits Not sure why it was failing the tests though. |
Wanted to confirm my understanding if we were to try to make use of these edits.
Also, to confirm, if we use the commercial version of MaxMind, we would have to self-host anyway? We do have a concern about how much effort it requires to self-host these assets. What are the best practices there to keep up to date, and how much additional time does it typically take per release? Sounds like using Transmit to sync the hosted assets, and then rinse and repeat steps 2-5 above? Thanks! |
No, you don't have to self-host the jars that Snowplow runs just because you are self-hosting the commercial MaxMind file(s).
Yes the 5 steps you list are the correct ones @rbolkey
It would likely add 30 mins or so per upgrade, assuming a fast network connection. |
A lot of Google Display network traffic just shows up under "unknown"; likewise, a lot of other display networks show up this way. To use this data in Snowplow, we need to look for mkt_network = 'Google AdWords' and refr_urlhost = 'googleads.g.doubleclick.net'
Do you think it's worth classifying them with referer-parser? Happy to submit a pull request with the changes...
I suspect we'd need to put some thought into the category naming, e.g. would we go specific ("display", "cpc", "ppc") or general ("advertising")?
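The workaround described above can be sketched as a filter over enriched events. It assumes each event is a plain dict keyed by the field names from the post (`mkt_network`, `refr_urlhost`); the helper `is_google_display` and the sample events are hypothetical:

```python
def is_google_display(event):
    """True if the event matches the manual Google Display workaround."""
    return (
        event.get("mkt_network") == "Google AdWords"
        and event.get("refr_urlhost") == "googleads.g.doubleclick.net"
    )


# Hypothetical sample events for illustration only.
events = [
    {"mkt_network": "Google AdWords", "refr_urlhost": "googleads.g.doubleclick.net"},
    {"mkt_network": None, "refr_urlhost": "www.google.com"},
]
display_events = [e for e in events if is_google_display(e)]
```

Classifying these hosts in referer-parser would replace this kind of per-network condition with a single medium check.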