What’s Really New in the New gTLD Space?

Andy Simpson | Dec 22, 2014

As someone who has long studied trends in the domain name industry, I have been intrigued on many levels by the opening of hundreds of new gTLDs. One question I found myself pondering was: will new gTLDs create “new” naming trends, or redundant domains across many TLDs? With more than 3 million domains delegated in the new TLD space, there is now a corpus large enough to study this question.

The short answer is clear from these first two pie charts, which illustrate the percentage of second-level domains (SLDs) that were available in .COM as of 12/15/2014:

To answer the redundancy question, I compared SLDs in new gTLDs with SLDs in .COM. The results show that a significant majority (~84%) of the SLD strings being registered in the new gTLDs are also registered in .COM. However, 521,834 new gTLD domain names (493,563 unique SLD strings, or 16% of all new gTLD SLDs) are registered in new gTLDs but not in .COM.
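As a rough illustration of this comparison, the Python sketch below assumes each new gTLD registration has already been reduced to an (SLD, TLD) pair and the .COM zone to a set of SLD strings. It also assumes percentages are computed over unique SLD strings. The function name `redundancy_stats` and the sample data are hypothetical. It counts both full names and unique strings, because one string (say, andy) can be registered in many new gTLDs. That is why there are more distinct names (521,834) than distinct strings (493,563).

```python
def redundancy_stats(new_gtld_domains, com_slds):
    """Split new gTLD registrations by whether their SLD also exists in .COM.

    new_gtld_domains: iterable of (sld, tld) pairs, e.g. ("andy", "club")
    com_slds: set of SLD strings registered in .COM (no ".com" suffix)
    """
    domains = list(new_gtld_domains)
    # One SLD string may be registered under many new gTLDs
    unique_slds = {sld for sld, _ in domains}
    # Strings with no .COM counterpart (set difference)
    distinct_slds = unique_slds - com_slds
    # Full names whose SLD has no .COM counterpart
    distinct_names = [(sld, tld) for sld, tld in domains if sld in distinct_slds]
    share_in_com = (1 - len(distinct_slds) / len(unique_slds)) if unique_slds else 0.0
    return {
        "distinct_names": len(distinct_names),
        "distinct_slds": len(distinct_slds),
        "share_in_com": share_in_com,
    }


# Hypothetical sample data
sample = [("andy", "club"), ("andy", "guru"), ("pvcbusiness", "club"),
          ("pvcbusiness", "guru"), ("example", "xyz")]
stats = redundancy_stats(sample, {"andy", "example", "google"})
```

Set membership keeps the comparison roughly linear in the number of names. That matters when the inputs are zones with millions of entries.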

Next, I looked at whether the combination of the SLD and the new gTLD string is available as an SLD in .COM (e.g., Andy.NewgTLD as AndyNewgTLD.COM). As the second pie chart shows, when the new gTLD is combined with the SLD, nearly 75% of the names registered in new gTLDs are available in .COM today.
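The concatenation check can be sketched the same way. This hypothetical helper assumes registrations are available as (SLD, TLD) pairs and the .COM zone as a set of SLD strings. It also assumes ASCII labels only. IDN labels appear in zone files in Punycode (xn--) form and would need separate handling, which this sketch does not attempt.

```python
def concatenated_availability(new_gtld_domains, com_slds):
    """Share of new gTLD names whose SLD+TLD string is NOT registered in .COM.

    Example: andy.club is checked as the .COM SLD "andyclub".
    Assumes lowercase ASCII labels; IDN (xn--) labels are not decoded.
    """
    names = list(new_gtld_domains)
    if not names:
        return 0.0
    available = [
        (sld, tld) for sld, tld in names
        if (sld + tld) not in com_slds  # "andy" + "club" -> "andyclub"
    ]
    return len(available) / len(names)
```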

Digging deeper, I explored which new gTLDs are home to the new SLDs that don’t exist in .COM.

For starters, a few gTLDs seem to have a disproportionate number of these strings. The bar chart shows the top new gTLDs by number of “distinct” names, along with the percentage of each zone that is, in effect, “distinct” from .COM:

One interesting takeaway is that the IDN TLDs all skew higher in the portion of their base that is “distinct”. Intuitively, this may be an area where broader internationalization in new gTLDs helps the domains make sense (i.e., IDN.IDN).
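The per-TLD figures behind the bar chart come down to two counters per TLD: the zone size, and the number of names whose SLD is absent from .COM. Below is a minimal sketch that assumes registrations as (SLD, TLD) pairs and .COM as a set of SLD strings. The function name `distinct_by_tld` and the sample TLDs are hypothetical.

```python
from collections import Counter


def distinct_by_tld(new_gtld_domains, com_slds, top=10):
    """Rank TLDs by how many of their names have an SLD absent from .COM.

    Returns (tld, distinct_count, distinct_share_of_zone) tuples,
    largest count first; ties are broken alphabetically by TLD.
    """
    zone_size = Counter()
    distinct = Counter()
    for sld, tld in new_gtld_domains:
        zone_size[tld] += 1
        if sld not in com_slds:
            distinct[tld] += 1
    ranked = sorted(distinct.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(tld, n, n / zone_size[tld]) for tld, n in ranked[:top]]
```

Ranking by count and reporting the share alongside it separates two cases. Large zones can have many distinct names, while small zones (such as some IDN TLDs) can be mostly distinct.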

A couple of other interesting facts:

  1. Using a domain tokenization algorithm I have written that identifies domains made up exclusively of English keywords, I found that 153,316 (31%) of the SLDs registered in new gTLDs that are available in .COM are keyword-exclusive domains. A few example strings that were available in .COM at the time of writing are pvcbusiness, emailinvention and searchcustomerservicejobs.
  2. I also observed signs that end users and/or applications may be confused by the new gTLDs and are trying to reach the .COM domains in error. I found this by looking at DNS requests for the new gTLD strings that are available in .COM: for more than 20,000 of these strings, requests for the .COM name began only after the new gTLD name was registered. While this is an opportunity for applicants that may wish to acquire the corresponding SLDs within .COM, it also illustrates the universal acceptance challenges that continue to exist with new gTLDs.
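The tokenization algorithm used for the keyword-exclusive count isn't described here, so the following is only a generic stand-in, not the author's method. It uses a dynamic-programming word break that accepts an SLD if it (or each hyphen-separated part) can be split entirely into dictionary words, and it prefers the split with the fewest words. The function names, the `max_word_len` cap and the tiny word list are hypothetical. A realistic lexicon would be far larger and, as discussed in the comments, should include abbreviations and proper nouns.

```python
def segment(text, dictionary, max_word_len=20):
    """Split text into dictionary words using the fewest words, or return None."""
    n = len(text)
    # best[i] holds the fewest-word split of text[:i], or None if none exists
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(max(0, i - max_word_len), i):
            word = text[j:i]
            if best[j] is not None and word in dictionary:
                candidate = best[j] + [word]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n]


def is_keyword_exclusive(sld, dictionary):
    """True if every hyphen-separated part of the SLD splits into dictionary words."""
    parts = [p for p in sld.lower().split("-") if p]
    return bool(parts) and all(segment(p, dictionary) is not None for p in parts)


# Hypothetical word list; "bus", "in" and "ess" show why fewest-words matters
words = {"pvc", "business", "bus", "in", "ess", "email", "invention",
         "search", "customer", "service", "jobs"}
```

With this list, `segment("pvcbusiness", words)` returns `["pvc", "business"]`. The alternative `["pvc", "bus", "in", "ess"]` is rejected because it uses more words.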

If you want to learn more about these domains that are innovating in the SLD arena, here is how to go about it:

  1. Obtain access to the new gTLD zone files from ICANN by using the Centralized Zone Data Service (CZDS) at http://czds.icann.org/
  2. Obtain access to the .COM zone file by following the instructions available here: http://www.verisigninc.com/en_US/channel-resources/domain-registry-products/zone-file-information/index.xhtml. Other established gTLDs also make their zone files public, but those that began operation before the new gTLDs are not yet available in CZDS (except .museum, which has migrated to the new system).
  3. Once you have secured access, you can download the zone files from the respective servers, per the terms of the zone file access agreements. The files you receive are essentially the data files an authoritative name server consults to determine how to respond when queried about a domain. They contain various DNS records (typically NS and A) that can be transformed into a list of SLDs that should currently be active within the corresponding TLDs.
  4. Use your favorite programming language to combine, compare and analyze the data. I typically use a hybrid of Unix utilities and scripting languages like awk and Perl.
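The transformation in step 3 can be sketched as a small parser for the master-file text format (RFC 1035) used for zone files. This is a simplified illustration, not a complete zone-file parser. It assumes numeric TTLs and single-line records, and ignores parentheses. It keeps only owners exactly one label below the TLD that carry NS records, which are the delegations. It handles both fully qualified owner names and names relative to $ORIGIN, since zone files differ in which style they use. The function name and sample lines are hypothetical.

```python
def slds_from_zone(lines, tld):
    """Collect delegated second-level labels (owners with NS records) from a zone file.

    Handles ';' comments, $ORIGIN, fully qualified or relative owners, '@',
    and blank owners (a leading space repeats the previous owner).
    Assumes numeric TTLs and one record per line (no parentheses).
    """
    tld = tld.lower().rstrip(".")
    origin = tld + "."
    previous_owner = None
    slds = set()
    for raw in lines:
        line = raw.split(";", 1)[0].rstrip()  # drop comments and trailing space
        if not line.strip():
            continue
        tokens = line.split()
        if tokens[0].upper() == "$ORIGIN" and len(tokens) > 1:
            origin = tokens[1].lower()
            if not origin.endswith("."):
                origin += "."
            continue
        if tokens[0].startswith("$"):
            continue  # $TTL, $INCLUDE and other directives
        if line[0].isspace():
            owner, rest = previous_owner, tokens  # blank owner: reuse previous
        else:
            owner = tokens[0].lower()
            if owner == "@":
                owner = origin
            elif not owner.endswith("."):
                owner = owner + "." + origin  # relative name
            rest = tokens[1:]
        previous_owner = owner
        # Skip the optional TTL and class fields to reach the record type
        i = 0
        while i < len(rest) and (rest[i].isdigit() or rest[i].upper() in ("IN", "CH", "HS")):
            i += 1
        if owner is None or i >= len(rest) or rest[i].upper() != "NS":
            continue
        labels = owner.rstrip(".").split(".")
        if len(labels) == 2 and labels[1] == tld:
            slds.add(labels[0])
    return slds
```

Once each zone is reduced to a set of SLDs, the comparisons in this post become ordinary set operations. For example, `slds_from_zone(club_lines, "club") - slds_from_zone(com_lines, "com")` gives the .club strings with no .COM counterpart.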

Look for more analysis like this from me in the future. I also hope to see what other interesting insights others are able to derive on their own.

Hi Andy, before comparing with .com, did you remove from the gTLD zones all SLDs on the collision list? Cheers, JS
A very interesting study. Several of the results come as no surprise and have been described anecdotally, in particular the availability in .COM of SLD and SLD+TLD strings that have been registered in nTLDs. But it's especially nice to have hard numbers to bash over the heads of some relentlessly propagandistic popup gophers. Have I understood correctly that your tokenization algorithm was applied only to the pure SLD case and not to the SLD+TLD string? I'd be curious what percentage of the latter strings are composed entirely of dictionary terms, treating the TLD itself as though it were included in the dictionary. I expect that the 31% English-keyword-exclusive figure would be significantly higher if proper nouns such as surnames and city names were included in the lexicon. As described, you would have been setting aside such items as VermontBridge.club and BronxFitness.club, wouldn't you? That depends entirely on whether "Vermont" or "Bronx" are found in your word list, which they wouldn't be if that list is taken strictly from a dictionary. One especially important geo keyword to include would be "NYC", owing to its use as a TLD. The phenomena behind those numbers are actually quite complex. Really, these results are just a jumping-off point, so hopefully we'll see more research in the future.
Thanks for your comment. Yes, I did.
I'm trying to figure out what conclusions you're inviting us to come to, Andy. What's the answer to the question you posed in your opening para?
Kevin, my question is somewhat rhetorical at this stage. As many others have said, it is early to draw conclusions about the new gTLD program; however, it is a good time to set the stage for analysis. Currently, many of the registrations resemble what already exists in other TLDs, so there is not a lot that is new. And as others have previously illustrated, the content on domains in the new gTLD space is quite thin. We will continue to analyze this to determine what is new, and hopefully others will too.
Joseph, my exclusive keyword analysis is restricted to the SLD only. Given that the TLDs are keywords in themselves, if the SLD is all keywords, the TLD doesn't add much to the analysis, as the counts would come out the same. Our dictionary is learned from Web content and contains words, abbreviations and proper nouns. For English there are roughly 250k terms we use as keywords, including the locations you have cited.