Assessing the state of breached data search services

Written by Trevor Giffen and edited by Steve Ragan with notable peer collaboration for analysis.

Hi, my name is Trevor Giffen, and I believe that the threat intelligence industry needs to provide an affordable and competitive means of searching and monitoring credential exposures for organizations. I would like to tell you why.

Introducing breached-data-as-a-service.

A breached data search service collects data exposures, indexes them as records in a database, and makes those records searchable, typically using an email address or domain name.

Despite being traditionally used as a "hack tool" by criminals, these services are also used by the cybersecurity industry to protect organizations (e.g., by preventing ATO, BEC, password spraying, and credential stuffing).

The traditional use case of breached data search services encouraged criminal activity, but the cybersecurity industry has built a new use case of using the same, or similar, tools to fight back.

To prove the legacy of breached data search services, this research assesses low-cost, high-yield grey zone case studies, alongside the threat intelligence industry: WeLeakInfo, LeakedSource, and DeHashed.

If you would like to understand the history and timeline of these services in more detail, consider watching one of my presentations on the subject (HackFest, 2018; conINT, 2020).

Legacy proof 1: WeLeakInfo.

WeLeakInfo first appeared in 2017 as a breached data search service. WeLeakInfo allowed anyone to register an account and purchase account credentials, among other personally identifiable information, offering low-cost and high-yield results, through a web portal and an API.

While many cybercriminals used the service as a tool to compromise accounts, many security researchers also used it as a tool to protect people, relying on WeLeakInfo's unmatched data coverage and near-unrestricted API access. WeLeakInfo lowered the bar for entry for organizations by providing corporate invoices, which satisfied compliance requirements and supported tax write-offs.

WeLeakInfo's API gained popularity in the cybersecurity industry because it allowed domain name queries with well-structured responses (e.g., JSON format) that made it easy to parse (e.g., with jq) thousands of records into human-readable spreadsheets (e.g., CSV format) that could be used for risk mitigation activities (e.g., password resets) in an organization.
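
To illustrate the JSON-to-spreadsheet workflow described above, here is a minimal Python sketch. The response shape (a `results` list with `email`, `source`, and `has_password` fields) is a hypothetical stand-in, since the original API's schema is not documented here; the sample records are synthetic.

```python
import csv
import io
import json

# Hypothetical response shape and synthetic data, for illustration only.
SAMPLE_RESPONSE = """
{
  "results": [
    {"email": "alice@example.com", "source": "ExampleBreach2019", "has_password": true},
    {"email": "bob@example.com", "source": "ExampleBreach2019", "has_password": false},
    {"email": "alice@example.com", "source": "OtherDump", "has_password": true}
  ]
}
"""

FIELDS = ["email", "source", "has_password"]


def records_to_csv(response_text: str) -> str:
    """Flatten a JSON list of exposure records into CSV text."""
    records = json.loads(response_text)["results"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({name: record.get(name, "") for name in FIELDS})
    return buffer.getvalue()


def accounts_needing_reset(response_text: str) -> list[str]:
    """Return the unique accounts that have an exposed password."""
    records = json.loads(response_text)["results"]
    return sorted({r["email"].lower() for r in records if r.get("has_password")})


if __name__ == "__main__":
    print(records_to_csv(SAMPLE_RESPONSE))
    print(accounts_needing_reset(SAMPLE_RESPONSE))
```

The second helper shows how the same records could feed a risk mitigation step such as a password reset list.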

WeLeakInfo's domains were announced by the FBI as seized on 2020-01-16, and the PSNI announced the arrest of at least two 22-year-old men for operating the greater WeLeakInfo service. WeLeakInfo possessed significantly more data sources than its competitors, in part due to its alleged direct involvement in the criminal economy.

Later, the NCA announced that WeLeakInfo's customer transaction records were used as an investigative pivot to arrest 21 UK-based cybercriminals; additionally, at least 69 individuals in England, Wales and Northern Ireland were visited and warned of their potentially criminal activity, 60 of whom were served with cease and desist notices. Those who are in trouble are likely known cybercriminals; it should be unlikely for law enforcement to bother practicing security professionals.

WeLeakInfo: This domain has been seized by NCA / DoJ / Politie / PSNI / Bundeskriminalamt

Legacy proof 2: LeakedSource.

LeakedSource first appeared in 2015 as a breached data search service, existing as WeLeakInfo's chief predecessor. LeakedSource was mostly used and advertised as a "hack tool" for criminals, but they did provide a self-described API for businesses "to help businesses determine which users can be found in leaked databases," specifically noting that "this information can be important to cybersecurity companies."

In the cases of LeakedSource and WeLeakInfo, everything falls apart after having a notorious run of being the leading provider of credential exposures on hacking forums, effectively marketing to an audience that views them as "hack tools". Remove the names of LeakedSource and WeLeakInfo, and it is difficult to find their difference. There is no observed evidence of LeakedSource being embraced by the cybersecurity industry while it existed, but well-intended security practitioners may have used it to protect organizations.

LeakedSource came to an abrupt halt in 2017 when the service was seized by law enforcement and its Canadian founder was arrested as a part of "Project Adoration". The founder was arrested on suspicion of making ~$247,000 CAD from the business. This is our first known precedent of a breached data search service being shut down by law enforcement.

LeakedSource: "We have created an API for business use to help businesses determine which users can be found in leaked databases."

Legacy proof 3: DeHashed.

DeHashed first appeared in 2017 as a breached data search service, existing as WeLeakInfo's successor due to their comprehensive provisioning of data. Today, DeHashed exists in a grey zone, both morally and legally, due to their unvetted commercial offerings.

DeHashed profoundly denounced any association with similar entities (LeakedSource) on social media, seemingly to show that they believe their moral compass is well-guided (exhibit 1, exhibit 2). Even if there is moral justification for DeHashed's methodology, their legal status remains unclear. DeHashed claims to be legally operating in the United States, providing them some credit; the previous takedown precedents occurred in Canada (LeakedSource) and the United Kingdom (WeLeakInfo); the fact is, legal trouble is plausible.

DeHashed makes a seemingly honest effort to market themselves to the cybersecurity industry, the law enforcement sector, and educational institutions; they also make an effort to not advertise on hacking forums. DeHashed can still be used as a hack tool by criminals, but it is also widely used by well-intended practitioners and organizations to mitigate risk. Ideally, this credential access would not be available to the entire public, but limited to data owners and security practitioners protecting one or multiple organizations.

DeHashed: homepage offering "enterprise security" with a granular search feature

Legacy proof 4: Threat intelligence industry.

Practitioners can reasonably expect vendor platforms to allow near-unrestricted monitoring and retroactive searching for their own organization, their organization's subsidiaries, or in the case of managed service providers, their client organizations. Competitive coverage of credential exposures is required for the success of practical use cases, including everything from security analysis, to offensive security, to cyber due diligence reviews for mergers and acquisitions.

Many practitioners must still rely on grey zone services to protect organizations due to their vendors' lack of coverage, so an expectation for vendors to "do better" can be argued as reasonable. Specifically, there is a noticeable lack of coverage for "public dumps" on RaidForums, XSS Forum, and Exploit Forum. When a "public dump" is shared, it becomes an open, free, accessible commodity that must be within the scope of vendors, and should be collected within one-to-four weeks. Also, no reasonable practitioner should expect a commercial threat intelligence vendor to regularly collect "private dumps," to ethically prevent the trading and selling of datasets.

Data sharing forum: official datasets index

It can be proven that competitive credential exposure coverage is a stakeholder intelligence requirement. A variety of threat intelligence vendor companies provide threat intelligence platforms (TIP) with credential exposures as a commonly marketed data source. Many threat analysts and executives purchase threat intelligence vendor products on the merit that they can rely on them to meet their requirements, typically including competitive credential intelligence.

It can also be proven that threat intelligence vendors consider credential exposure coverage as an intelligence requirement. Threat intelligence vendors providing credential exposure monitoring and/or searching capabilities, based on public marketing material, could include Cyjax (email credential monitoring), Recorded Future (credential leakage monitoring), Intel 471 (credential intelligence), Cybersixgill (detection of leaked credentials), Flashpoint Intelligence (compromised credentials monitoring), Digital Shadows (data leakage detection), Advanced Intelligence (monitoring of compromised credentials), Flare Systems (detect data leaks), and more. Ideally, they would employ someone to fulfill this requirement.

But what about SpyCloud and ID Agent? The existing models of SpyCloud and ID Agent are not enough to accommodate the near-unrestricted ability to search any domain name at low-to-medium cost while providing competitive results. While SpyCloud is known for providing competitive credential intelligence, they are high-cost, particularly for managed service providers; and through no fault of their own, they are one of the first to commercially navigate this path. And while ID Agent is known for providing acceptable credential intelligence and collaborating with managed service providers, they are medium-cost, and their sourcing is less transparent than would be ideal.

And what about Have I Been Pwned (HIBP)? Many of us know HIBP as the leading flagship of ethical breached data search services. HIBP protects people with email queries and email monitoring, protects organizations with validated domain monitoring, and provides a centralized means of validating the contents of popular breach exposures. Despite HIBP's commendable efforts to provide credential intelligence to the world at large (mostly for free!), their data collection excludes too many sources. HIBP provides a fair and honest service to the world at large, regardless.

Threat intelligence vendor customers should not have to access grey zone breached data search services to achieve competitive credential exposure coverage at an affordable cost; if vendor customers must access these grey zone services, then vendors are failing to meet stakeholder intelligence requirements. This research could support an empirical opinion that the threat intelligence industry is failing to provide credential monitoring and searching capabilities that are both affordable and competitive.

The cybersecurity industry used WeLeakInfo and similar services to provide affordable and competitive data as a commodity, to protect organizations.

Often when consumers compare breached data search services, total indexed account records are compared. Looking at this comparison, it appears that the services have comparable coverage.

Breached data benchmark: account records

Some services index recycled credential compilation lists containing billions of records worth of excessive overlap (e.g., AntiPublic, Pemiblanc, ExploitIn, Collection #1, db8151dd, Cit0day, et al.); it is acceptable to index such records, if a transparency condition is met, to provide the security practitioner contextual awareness.

In contrast, looking at indexed data sources, a different story is told. There are significant coverage discrepancies between indexed account records and indexed data sources. For a security practitioner, this means that a service without competitive credential exposure coverage will have many blind spots.

Breached data benchmark: data sources

Commercial threat intelligence vendors are excluded from these counts, because they are either not publicly available or are not trusted to provide an accurate count due to lack of community review.

  • On 2020-01-08, WeLeakInfo indexes 12,415,528,536 account records derived from 10,368 exposures, prior to being seized.
  • On 2020-03-13, DeHashed indexes 13,355,474,100 account records derived from 622 exposures.
    • Caveat: DeHashed's data wells may not be up-to-date; actual numbers may be skewed.
  • On 2021-03-18, Vigilante.pw indexes 8,817,389,451 account records derived from 6,971 exposures.
    • Caveat: Vigilante.pw does not provide a breached data search service, but has examinable relationships with DeHashed and HIBP.
  • On 2021-03-18, Have I Been Pwned (HIBP) indexes 10,624,652,379 account records derived from 518 exposures.
    • Caveat: HIBP has an anomalous dependence on DeHashed's datasets and Vigilante.pw's insights; DeHashed is referenced as a data provider (55 of 518 exposures), Vigilante.pw is referenced for data validation (13 of 518 exposures).
  • On 2021-03-18, RaidForums "officially" indexes 9,721,685,049 account records derived from 490 exposures.
    • Caveat: thousands of unaccounted datasets are indexed "unofficially" on third-party hosts.
  • On 2017-01-12, LeakedSource indexes 3,109,103,084 account records derived from "1000s of databases", prior to being seized.
    • Caveat: LeakedSource is excluded from these counts because "1000s" is too vague. Take note that they had counted thousands of sources in 2017 and alternatives in 2021 only count hundreds.
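
The gap between account records and data sources in the list above can be made concrete with a short Python sketch. The figures are copied from the list; LeakedSource is left out because its source count is only given as "1000s". The records-per-exposure ratio is an illustrative metric chosen for this sketch, not one used in the original benchmark.

```python
# (account records, exposures) as listed above; LeakedSource omitted ("1000s").
BENCHMARK = {
    "WeLeakInfo (2020-01-08)": (12_415_528_536, 10_368),
    "DeHashed (2020-03-13)": (13_355_474_100, 622),
    "Vigilante.pw (2021-03-18)": (8_817_389_451, 6_971),
    "HIBP (2021-03-18)": (10_624_652_379, 518),
    "RaidForums official (2021-03-18)": (9_721_685_049, 490),
}


def records_per_exposure(records: int, exposures: int) -> float:
    """Average number of account records contributed by each data source."""
    return records / exposures


def rank_by_sources(benchmark: dict) -> list[str]:
    """Order services from most to fewest indexed data sources."""
    return sorted(benchmark, key=lambda name: benchmark[name][1], reverse=True)


if __name__ == "__main__":
    for name in rank_by_sources(BENCHMARK):
        records, exposures = BENCHMARK[name]
        ratio = records_per_exposure(records, exposures)
        print(f"{name:34} {exposures:>7,} sources  {ratio:>14,.0f} records/source")
```

Services with similar record counts but a far higher records-per-source ratio are the ones most likely to be dominated by a few very large datasets, which is where blind spots would be expected.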

Looking at the data, it becomes easy to understand why security practitioners turn to grey zone breached data search services to protect organizations. An argument can be made that the threat intelligence industry failed to provide a means of doing so that is both competitive and affordable.

The cybersecurity industry turned to WeLeakInfo when the threat intelligence industry failed them. This claim is supported by leaked transaction records from 141 companies offering security services.

A portion of the WeLeakInfo customer base became victims of a security incident in 2021, providing us a partial view of their all-time transaction records, including names and emails that are attributable to the cybersecurity industry. The actor claiming responsibility for the incident alleges that the impact is limited to the Stripe payment processing tool, and that they could not access other transaction records (e.g., PayPal, Coinbase).

A scope is established based on a leaked file to build supporting evidence for our claims. This file includes 23,748 all-time transaction records and 22,048 email addresses, which reduce to 7,676 unique customer email addresses after converting to lowercase and deduplicating. From these 7,676 addresses, 863 unique domain names are derived; 141 of the 863 unique domains are associated with the cybersecurity industry with high confidence, based on a thorough human analysis.
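
The scoping steps above (lowercasing, deduplicating, and deriving domains) can be sketched in Python as follows. This is a minimal illustration on synthetic addresses, not the script used for the research; trimming whitespace and discarding malformed entries are assumptions added for this sketch.

```python
def normalize_emails(raw_emails):
    """Lowercase, trim, and deduplicate email addresses."""
    cleaned = set()
    for email in raw_emails:
        candidate = email.strip().lower()
        local, _, domain = candidate.partition("@")
        # Assumption: keep only entries with exactly one "@" and both parts present.
        if local and domain and "@" not in domain:
            cleaned.add(candidate)
    return cleaned


def unique_domains(emails):
    """Derive the set of domain names from normalized addresses."""
    return {email.split("@", 1)[1] for email in emails}


# Synthetic sample data; none of these addresses come from the leaked file.
SAMPLE = [
    "Alice@Example.test",
    "alice@example.test ",
    "bob@example.test",
    "carol@other.test",
    "not-an-email",
]

if __name__ == "__main__":
    emails = normalize_emails(SAMPLE)
    print(len(SAMPLE), "raw ->", len(emails), "unique emails ->",
          len(unique_domains(emails)), "unique domains")
```

Classifying which of the resulting domains belong to the cybersecurity industry remains a manual step, as described above.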

Regarding WeLeakInfo's revenue and profits, a total sum profit of at least £200,000.00 GBP (treated as approximate total revenue) is derived from a police statement by PSNI. Data points are also derived from leaked Stripe transaction records for building statistics, including partial total revenue, partial total profit, Amount (USD), Converted Amount (GBP), Customer Email, Card Brand (MasterCard, Visa, American Express), Card Funding (credit, debit, prepaid), and Card Issue Country (US, GB, et al.).

unified_payments.csv: id,Description,Seller Message,Created (UTC),Amount,Amount Refunded,Currency,Converted Amount,Converted Amount Refunded,Fee,Tax,Converted Currency,Mode,Status,Statement Descriptor,Customer ID,Customer Description,Customer Email,Captured,Card ID,Card Last4,Card Brand,Card Funding,Card Exp Month,Card Exp Year,Card Name,Card Address Line1,Card Address Line2,Card Address City,Card Address State,Card Address Country,Card Address Zip,Card Issue Country,Card Fingerprint,Card CVC Status,Card AVS Zip Status,Card AVS Line1 Status,Card Tokenization Method,Disputed Amount,Dispute Status,Dispute Reason,Dispute Date (UTC),Dispute Evidence Due (UTC),Invoice ID,Invoice Number,Payment Source Type,Destination,Transfer,Interchange Costs,Merchant Service Charge,Transfer Group,PaymentIntent ID,order_id (metadata),coupon (metadata)

WeLeakInfo's alleged exposure: a threat actor shared files containing Stripe transaction records

The leaked records give a partial and imperfect view of approximately 45.61% of WeLeakInfo's documented revenue, indicating that at least 4.43% of total revenue was paid by the cybersecurity industry, based on the evidence observed. WeLeakInfo received Stripe revenue of at least £99,527.43 GBP (£91,237.89 GBP after activity fees), of which £8,874.92 GBP was paid by 141 companies offering cybersecurity services.
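
The arithmetic behind these percentages can be sketched in Python. The inputs are the figures quoted in this article; which amount serves as the denominator for each percentage is inferred from the numbers rather than stated in the source, so treat the pairing as an assumption.

```python
# Figures quoted in the article (GBP).
DOCUMENTED_TOTAL = 200_000.00   # PSNI statement, treated as approximate total revenue
STRIPE_GROSS = 99_527.43        # Stripe revenue before activity fees
STRIPE_NET = 91_237.89          # Stripe revenue after activity fees
CYBERSECURITY_PAID = 8_874.92   # paid by 141 cybersecurity-related companies


def share(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return part / whole * 100


if __name__ == "__main__":
    # Assumed pairing: visible share uses net Stripe revenue over the documented total.
    print(f"Visible share of documented revenue: {share(STRIPE_NET, DOCUMENTED_TOTAL):.2f}%")
    print(f"Cybersecurity share of documented revenue: {share(CYBERSECURITY_PAID, DOCUMENTED_TOTAL):.2f}%")
    print(f"Cybersecurity share of Stripe revenue: {share(CYBERSECURITY_PAID, STRIPE_GROSS):.2f}%")
```

Under these assumptions the results come out at roughly 45.6%, 4.4%, and 8.9%, consistent with the quoted figures up to rounding.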

All unique customer domain names are assessed to identify companies involved with the cybersecurity industry, including cybersecurity, information technology, professional services, and threat intelligence. Even with anonymized data, insights can be derived, including consumer demand by industry (revenue sources) and consumer demand by nationality (customer countries), among other details. Based on the findings, statistics are built and visualized.
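
As a rough illustration of how such statistics could be built, the following Python sketch aggregates payments by card brand, card funding, and card issue country for a set of vetted domains. The column names match the `unified_payments.csv` header shown earlier; the rows, the domain list, and the choice to count one customer per domain are synthetic assumptions made for this example.

```python
import csv
import io
from collections import Counter
from decimal import Decimal

# Synthetic rows using a subset of the unified_payments.csv columns.
SAMPLE_CSV = """Customer Email,Converted Amount,Card Brand,Card Funding,Card Issue Country
analyst@security-firm.test,25.00,Visa,credit,US
Analyst@Security-Firm.test,7.50,Visa,credit,US
soc@intel-vendor.test,20.00,MasterCard,debit,GB
shopper@unrelated.test,2.00,Visa,prepaid,CA
"""

# Hypothetical result of the manual domain review.
INDUSTRY_DOMAINS = {"security-firm.test", "intel-vendor.test"}

BREAKDOWN_COLUMNS = ("Card Brand", "Card Funding", "Card Issue Country")


def summarize(csv_text, industry_domains):
    """Aggregate payments made from email domains in industry_domains."""
    summary = {"customers": set(), "payments": 0, "paid": Decimal("0")}
    for column in BREAKDOWN_COLUMNS:
        summary[column] = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        domain = row["Customer Email"].strip().lower().rpartition("@")[2]
        if domain not in industry_domains:
            continue
        summary["customers"].add(domain)  # assumption: one customer per domain
        summary["payments"] += 1
        summary["paid"] += Decimal(row["Converted Amount"])
        for column in BREAKDOWN_COLUMNS:
            summary[column][row[column]] += 1
    return summary


if __name__ == "__main__":
    result = summarize(SAMPLE_CSV, INDUSTRY_DOMAINS)
    print(len(result["customers"]), "customers,", result["payments"], "payments,", result["paid"], "GBP")
    print(result["Card Brand"].most_common())
```

Because only domains and card metadata are aggregated, the output stays anonymized while still showing demand by industry and country.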

WeLeakInfo's Stripe transactions: geography of the cybersecurity industry purchasing services

WeLeakInfo's Stripe transactions: total revenue received from the cybersecurity industry

WeLeakInfo's Stripe transactions: payment method of the cybersecurity industry

All domain names are validated as representing entities who provide security services in some manner.

  • Total cybersecurity industry
    • WeLeakInfo global revenue: ~4.43% (at least)
    • WeLeakInfo Stripe revenue: ~8.92%
    • Customers: 141
    • Payments: 300
    • Paid (GBP): £8,874.92
    • Paid (USD): $11,573.80
    • Card brand: MasterCard (141), Visa (137), American Express (22)
    • Card type: credit (164), debit (120), prepaid (16)
    • Card countries: GB (80), US (79), AU (32), DE (14), CA (11), ES (10), NL (8), BR (8), IN (6), FR (6), CH (6), PL (5), PK (4), NO (4), IT (4), DK (4), LT (3), ZA (2), VN (2), HU (2), AR (2), SG (1), SE (1), MX (1), IL (1), CL (1), BG (1), BE (1), AT (1)
    • Includes "cybersecurity", "information technology and cybersecurity", "professional services and cybersecurity", and "threat intelligence and cybersecurity"
    • Excludes information technology without security services, financial services, law firms, digital advertising agencies, and more
  • Subtotal "general cybersecurity" entities
    • WeLeakInfo global revenue: ~3.34% (at least)
    • WeLeakInfo Stripe revenue: ~6.70%
    • Customers: 101
    • Payments: 215
    • Paid (GBP): £6,672.96
    • Paid (USD): $8,686.00
    • Card brand: Visa (106), MasterCard (99), American Express (10)
    • Card type: credit (111), debit (93), prepaid (11)
    • Card countries: US (52), GB (52), AU (28), CA (10), DE (9), NL (8), ES (8), BR (8), PL (5), IN (5), CH (5), IT (4), DK (4), ZA (2), VN (2), HU (2), FR (2), AR (2), SG (1), PK (1), MX (1), CL (1), BG (1), BE (1), AT (1)
  • Subtotal "information technology and cybersecurity" entities
    • WeLeakInfo global revenue: ~0.34% (at least)
    • WeLeakInfo Stripe revenue: ~0.68%
    • Customers: 25
    • Payments: 42
    • Paid (GBP): £676.71
    • Paid (USD): $873.75
    • Card brand: Visa (20), MasterCard (15), American Express (7)
    • Card type: credit (28), debit (10), prepaid (4)
    • Card countries: US (21), GB (6), PK (3), LT (3), FR (2), AU (2), SE (1), IN (1), DE (1), CH (1), CA (1)
  • Subtotal "professional services and cybersecurity" entities
    • WeLeakInfo global revenue: ~0.38% (at least)
    • WeLeakInfo Stripe revenue: ~0.77%
    • Customers: 8
    • Payments: 12
    • Paid (GBP): £762.58
    • Paid (USD): $1,007.00
    • Card brand: Visa (7), American Express (4), MasterCard (1)
    • Card type: credit (8), debit (4)
    • Card countries: US (6), NO (4), FR (2)
    • Includes conglomerate brands
  • Subtotal "threat intelligence and cybersecurity" entities
    • WeLeakInfo global revenue: ~0.38% (at least)
    • WeLeakInfo Stripe revenue: ~0.77%
    • Customers: 7
    • Payments: 31
    • Paid (GBP): £762.67
    • Paid (USD): $1,007.00
    • Card brand: MasterCard (26), Visa (4), American Express (1)
    • Card type: credit (17), debit (13), prepaid (1)
    • Card countries: GB (22), DE (4), ES (2), AU (2), IL (1)
    • Both "threat intelligence and cybersecurity" and "professional services and cybersecurity" coincidentally paid the same amount in USD
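
The subtotals above can be cross-checked against the industry total with a few lines of Python. This is a simple consistency check on the figures as listed, not part of the original analysis.

```python
from decimal import Decimal

# Totals and subtotals copied from the list above.
TOTAL = {
    "customers": 141,
    "payments": 300,
    "gbp": Decimal("8874.92"),
    "usd": Decimal("11573.80"),
}

SUBTOTALS = {
    "general cybersecurity": {
        "customers": 101, "payments": 215,
        "gbp": Decimal("6672.96"), "usd": Decimal("8686.00"),
    },
    "information technology and cybersecurity": {
        "customers": 25, "payments": 42,
        "gbp": Decimal("676.71"), "usd": Decimal("873.75"),
    },
    "professional services and cybersecurity": {
        "customers": 8, "payments": 12,
        "gbp": Decimal("762.58"), "usd": Decimal("1007.00"),
    },
    "threat intelligence and cybersecurity": {
        "customers": 7, "payments": 31,
        "gbp": Decimal("762.67"), "usd": Decimal("1007.00"),
    },
}


def differences():
    """Return the reported total minus the sum of subtotals, per field."""
    return {
        field: TOTAL[field] - sum(row[field] for row in SUBTOTALS.values())
        for field in TOTAL
    }


if __name__ == "__main__":
    for field, delta in differences().items():
        print(f"{field}: difference {delta}")
```

Customers, payments, and GBP amounts reconcile exactly; the USD subtotals sum to within a few cents of the reported total, which may reflect rounding in the source figures.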

As the data shows, there is international demand for breached data as a commodity from at least the cybersecurity industry, even if it requires using grey zone services. It's also evident that there is a need for the near-unrestricted ability to search for domain names and email addresses. If threat intelligence vendors that have been "vetted" by law enforcement cannot be relied on, then the grey zone services will continue to take their place to realize intelligence requirements.

This raises a question: Buying Breached Data: When Is It Ethical?

The threat intelligence industry and international justice departments should accommodate the provisioning of strong credential intelligence to vetted security practitioners to help protect organizations.

For this to be permitted, legal systems would ideally allow breached data to be legally collected by authorized entities and commercially internationally distributed to well-intended and well-vetted entities (buyers and sellers alike) as a commodity within at least the cybersecurity industry. Access to breached data search services should be accommodated with an assurance that payments are not directly funding any criminal activity.

The cybersecurity industry should have a near-unrestricted ability to search any email address or domain name at a low-to-medium cost while providing competitive results, on a moral condition that each search benefits the state of security.

This problem is bigger than it seems.

In February 2020, weeks after the U.S. Department of Justice announced the seizure of WeLeakInfo, that same justice department became the first government entity to recognize the need for lawful underground collections in a publication called "Legal Considerations when Gathering Online Cyber Threat Intelligence and Purchasing Data from Illicit Sources", an intriguing coincidence. These legal breakthroughs ideally will keep momentum, preferably in countries of known consumers (GB, US, AU, DE, CA, ES, NL, et al.).

There is at least one plausible way for a legally-assured breached data search service to exist within the criteria described herein. A well-established threat intelligence vendor could step in and provide a breached data search tool with near-unrestricted results at a low-to-medium cost as a civil service for vetted security practitioners. Established vendors have the staff, the technology, the lawyers, the money, and the vision to make this possible; small vendors will struggle to do the same.

These services cannot stay in the shadows forever. In an ideal world, vendors can provide this data in an affordable and competitive manner to its owner or to those who are protecting people and companies.

Author's note: I changed some text to assure intended communication. To clarify, I do not support the operation of selling credential intelligence to "everyone and anyone"; the only people who should be authorized to purchase this information are well-vetted, well-intended data owners (i.e. organizations themselves) and security practitioners (i.e. consultants protecting multiple organizations). All opinions expressed are my own and do not reflect my employer(s).