How we do it
Our Methodology
Gator.IO uses many different detection methods to determine whether a user is valid. Some of these methods are algorithmic; others are learned over time by
detecting patterns in the data. Not all methods are listed here, in order to protect our intellectual property and prevent reverse engineering.
Scoring
Some methods are fairly conclusive about whether a user is valid. Other methods produce a likelihood of validity, which we show as a score from 0 to 1,000. The lower the score,
the more likely the user is invalid. For example, a user proxying in through a data center's IP address is highly unlikely to be a valid user. On the other hand, certain
countries originate the bulk of invalid traffic but also have real users.
Scores generally fall between zero and about 500. A score below 100 is considered 'invalid', meaning it is almost certain that the user is not real. The upper end of the
scoring range is reserved for future whitelisting methods.
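As a rough illustration, here is a minimal sketch of how a consumer of these scores might bucket sessions by the ranges described above. The function and bucket names are hypothetical, not part of the Gator.IO API.

```typescript
// Bucket a Gator.IO validity score (0-1000) using the thresholds above.
// The bucket labels are illustrative, not part of the Gator.IO API.
type ScoreBucket = "invalid" | "suspect" | "reserved";

function bucketScore(score: number): ScoreBucket {
  if (score < 0 || score > 1000) {
    throw new RangeError(`score out of range: ${score}`);
  }
  if (score < 100) return "invalid";  // almost certainly not a real user
  if (score < 500) return "suspect";  // likelihood-based signals apply
  return "reserved";                  // upper range reserved for future whitelisting
}

console.log(bucketScore(42));  // "invalid"
console.log(bucketScore(350)); // "suspect"
```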
Bots
A large and growing percentage of web traffic is generated by bots, spiders, extensions, headless browsers, toolbars and other means (collectively called bots). Bots have
become increasingly sophisticated in how they disguise themselves, which requires continuously evolving detection methods.
Here are some of the methods we employ:
Method | Description
------ | -----------
Block List | We check every IP address against our database of known infected machines. This detects machines that have been hijacked as spambots, as well as machines infected with viruses that generate large amounts of automated traffic and clicks. The database is maintained in real time in order to detect emerging sources.
Data Center Origin | We maintain a database of data center IP address ranges, since many bot networks use data centers to create or proxy traffic. A session from within, for example, an Amazon AWS address block is unlikely to be valid. (A sketch of this kind of check follows the table.)
Public Web Proxies | Public web proxies are used to relay traffic in much the same way as data centers. We maintain a real-time database of public web proxies in order to score sessions originating from them.
TOR | TOR has legitimate uses, but it hides the origin of the user, so it can be used to generate random sessions.
Spoofed User Agents | Bots often rotate their user agents to appear to be more than one device and to generate realistic-looking traffic. We have developed technology that matches the user agent against the browser's actual capabilities and detects sessions with altered user agents.
Invalid Searches | To appear to come from a search engine, bots often create fake referrer headers. In many cases, these headers differ from the structure of real search engine referrers.
Collusion | This method detects the coincidence of a set of IP addresses and a set of publisher sites.
Other Proprietary Methods | We have developed several other methods for detecting fraudulent sessions, and this continues to be a primary focus of our research efforts.
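To make the data center check above concrete, the sketch below matches a session's IPv4 address against a list of CIDR ranges. The sample ranges, function names, and IPv4-only scope are assumptions for illustration; Gator.IO's actual range database and matching logic are proprietary.

```typescript
// Minimal sketch of a data-center origin check, assuming IPv4 only.
// The CIDR list is a tiny hypothetical sample, not Gator.IO's database.
const DATA_CENTER_RANGES: string[] = [
  "3.0.0.0/9",    // illustrative only
  "34.64.0.0/10", // illustrative only
];

// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => Number.isNaN(p) || p < 0 || p > 255)) {
    throw new Error(`invalid IPv4 address: ${ip}`);
  }
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// True if `ip` falls inside the CIDR block `cidr` (e.g. "3.0.0.0/9").
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsStr] = cidr.split("/");
  const bits = Number(bitsStr);
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return (ipv4ToInt(ip) & mask) === (ipv4ToInt(base) & mask);
}

function isDataCenterIp(ip: string): boolean {
  return DATA_CENTER_RANGES.some((cidr) => inCidr(ip, cidr));
}

console.log(isDataCenterIp("3.120.4.1"));   // true: inside the first sample range
console.log(isDataCenterIp("203.0.113.9")); // false: outside both sample ranges
```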
Hidden Users
Hidden users come from sessions where no page is ever visible on the screen. This is often, but not always, caused by bots: many hidden sessions are generated by search engines preloading
pages in the background to improve performance, and a page may also sit behind a tab that is never shown, or be positioned offscreen. Because of this, hidden sessions score zero.
The primary reasons for hidden sessions are:
Reason | Description
------ | -----------
Preloading | Search engines preload pages in the background while a user types a search query, predicting which link or links the user will click and loading those pages ahead of time. This improves the perceived performance of web browsing, but many preloaded pages are never made visible and should not be counted.
Browser Window Hidden | This occurs when a browser window is behind another window.
Background Browser Tabs | A browser tab can be launched in the background and load pages. These pages are never visible unless the user opens the tab.
Bots | Even when a session is not detected as a bot, it will often never be visible and will be scored as invalid.
Our technology tracks whether a session is ever viewed and updates its visibility accordingly. For example, if a page is hidden during a preload, it is initially recorded as
hidden with a score of zero. If the user then clicks the link and views the preloaded page, that is detected and the session is rescored.
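In the browser, this kind of visibility tracking can be built on the standard Page Visibility API. The sketch below shows one way a tracking snippet might record the initial state and detect when a hidden page is later shown; reportVisibility and the /track endpoint are hypothetical stand-ins for a real tracker's beacon.

```typescript
// Sketch of client-side visibility tracking using the Page Visibility API.
// reportVisibility() and "/track" are hypothetical, not Gator.IO's real beacon.
function reportVisibility(state: "hidden" | "visible"): void {
  // sendBeacon survives page transitions better than fetch/XHR.
  navigator.sendBeacon("/track", JSON.stringify({ visibility: state }));
}

// Record the initial state: a preloaded or background-tab page starts hidden,
// so the session would initially be scored zero.
reportVisibility(document.visibilityState === "visible" ? "visible" : "hidden");

// If the page is ever brought to the foreground (e.g. the user clicks the
// preloaded link or switches to the tab), report it once so the session
// can be rescored.
document.addEventListener("visibilitychange", function onChange() {
  if (document.visibilityState === "visible") {
    reportVisibility("visible");
    document.removeEventListener("visibilitychange", onChange);
  }
});
```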
Every session is scored, and all reports include options to include or exclude users based on score. For example, you may want to view campaigns where the score is less than 100; this
shows you the campaigns that are referring the worst-quality users, as in the sketch below.
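A minimal sketch of that kind of filter, assuming sessions arrive as records with campaign and score fields (hypothetical shapes, not the actual report API):

```typescript
// Hypothetical session records; field names are illustrative, not the real API.
interface Session {
  campaign: string;
  score: number; // the 0-1000 validity score described above
}

// Surface the campaigns that refer the worst-quality users (score < 100).
function worstCampaigns(sessions: Session[]): string[] {
  const invalid = sessions.filter((s) => s.score < 100);
  return [...new Set(invalid.map((s) => s.campaign))];
}

console.log(worstCampaigns([
  { campaign: "adnet-a", score: 42 },
  { campaign: "search-b", score: 350 },
]));
// ["adnet-a"]
```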