Thoughts on Crawling and Understanding the Darknet
Sourced through Scoop.it from: blog.lewman.is
Darknets have been around for a decade or so. Some of the most well-known services are on the Tor network: Silk Road, WikiLeaks, Silk Road 2, StrongBox, and so on. For good or bad, Silk Road is what helped bring darknets to the masses.
The current trend in information security is to try to build insight and intelligence into and from the underground or the darknet. Many companies are focused on the “darknet.” The idea is to learn about what’s below the surface, or about near-future attacks or threats, before they affect the normal companies and people of the world. For example, an intelligence agency wants to learn about clandestine operations within its borders, or a financial company wants to learn about attacks on its services and customers before anyone else does.
I’m defining the darknet as any service which requires special software to access it, such as:
1. Tor’s hidden services,
3. FreeNet, and
There are many more services out there, but in effect they all require special software to access content or services in their own address space.
Most darknet systems are really overlay networks on top of TCP/IP or UDP/IP. The goal is to create a different addressing system than simply using IP addresses (of either v4 or v6 flavor). XMPP, for example, could also be considered an overlay network, but not a darknet, because it relies heavily on public IPv4/IPv6 addressing to function. It’s also trivial to learn detailed metadata about conversations by watching either an XMPP stream or an XMPP server.
The vastness of address spaces
Let’s expand on address space. In the “clearnet” we have IP addresses of two flavors, IPv4 and IPv6. Most people are familiar with IPv4, the classic xxx.xxx.xxx.xxx address. IPv6 addresses are longer in order to create a vast address space for the world to use, for say, the Internet of Things, or a few trillion devices all online at once. IPv6 is actually fun and fantastic, especially when paired with IPsec, but this is a topic for another post. The IPv4 address space is 32 bits wide, or roughly 4.3 billion addresses. The IPv6 address space is 128 bits wide, or roughly 3.4 × 10^38 addresses. There are some quirks to IPv4 which let us use more than 4.3 billion addresses, but the scale of the spaces is what we care about most: IPv6 is vastly larger. Overlay networks are built to create, or use, different properties of an address space. Rather than going to a global governing body and asking for a slice of the space to call your own, an overlay network can, in general, let you claim addresses without a central authority.
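To make the difference in scale concrete, here is a minimal Python sketch that computes both address-space sizes from their bit widths. The variable names are my own, and the cross-check against the standard library's ipaddress module is just a sanity test, not anything specific to darknets.

```python
import ipaddress

# The size of an address space follows directly from the address width.
IPV4_BITS = 32
IPV6_BITS = 128

ipv4_total = 2 ** IPV4_BITS
ipv6_total = 2 ** IPV6_BITS

# Sanity check: the standard library reports the same totals
# for the networks that cover every IPv4 and IPv6 address.
assert ipaddress.ip_network("0.0.0.0/0").num_addresses == ipv4_total
assert ipaddress.ip_network("::/0").num_addresses == ipv6_total

print(f"IPv4: {ipv4_total:,} addresses")      # IPv4: 4,294,967,296 addresses
print(f"IPv6: {ipv6_total:.3e} addresses")    # IPv6: 3.403e+38 addresses
print(f"IPv6 is 2**{IPV6_BITS - IPV4_BITS} times larger than IPv4")
```

Each extra bit doubles the space, so the 96 additional bits in IPv6 multiply the IPv4 space by 2**96. That is why exhaustively scanning a large address space quickly stops being practical.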
There are other definitions or nomenclature for darknets, such as the deep web:
noun 1. the portion of the Internet that is hidden from conventional search engines, as by encryption; the aggregate of unindexed websites: private databases and other unlinked content on the deep web.
Basically, the content you won’t find on Google, Bing, or Yahoo no matter how advanced your search prowess.
How big is the darknet?
No one knows how large the darknet is. By definition, it’s not easy to find services or content. However, a number of people are working to figure out its scope and size, and to further classify the content found on it. There are a few amateur sites trying to index various darknets, such as Ahmia, and others only reachable with darknet software. Some researchers are working on the topic as well; see Dr. Owen’s video presentation, Tor: Hidden Services and Deanonymisation. A public example is DARPA MEMEX. Their open catalog of tools is a fine starting point. […]