Status: second working draft
The DNS architecture for OpenNIC into 2007 has been pretty sound, with the exception of the "single point of failure" at ns0 due to a policy of all TLDs, both OpenNIC and ICANN, being aggregated into a single distributed root zone on that host alone.
Important and useful elements of this structure are preserved in the following suggestion for moving forward.
- each TLD must sponsor one tier1 and preferably one tier2 DNS server
- each TLD's tier1 server is authoritative master for its own TLD zone, and slave for the other TLDs and the root.
- tier1 servers should provide appropriate responses to queries from recursing (tier2) nameservers
- i.e. they do not need to provide recursive answers to the general public.
- all tier1 servers must provide public authoritative responses for all OpenNIC TLDs and the root
- for BIND configurations, tier1 hosts will have 'zone' declarations for each OpenNIC TLD and the root
- all tier1 servers must provide bi-directional zone transfer with all other tier1 servers
- DNSSEC required for transfer between tier1 hosts ?? ref. DNSSEC deliberation
- Policy for xfer to others than tier1 ?? ref. Xfer Deliberation
- all tier2 servers provide recursive response to anybody and everybody so that the public can use them for all internet access.
- for BIND configurations, tier2 hosts need only one 'zone' declaration, as slave for the root zone '.', with tier1 hosts as masters.
- Can/Should we provide/support a "hint" zone file ?? ref. HintDeliberation
- Ideally user ISPs would provide this service, but somebody has to, and more is better.
- Zone files must specify the authoritative master in the SOA record, and should provide NS records for all tier1 hosts
- Zone file SOA serials shall be in the form yyyymmddnn, where yyyy=year, mm=month (numeric), dd=day of month, and nn is a two-digit revision number with each n in the range 0..9
- ns0 should not be an authoritative host for anything other than root, but may serve as tier1
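The yyyymmddnn serial rule in the list above can be sketched as a small helper. This is only an illustration, not an agreed OpenNIC tool: the function name `next_serial` is hypothetical, and it assumes the first revision of a day is numbered 00 and that running out of revisions (or finding a serial dated in the future) should be reported rather than silently worked around.

```python
from datetime import date


def next_serial(current: int, today: date) -> int:
    """Return the next yyyymmddnn SOA serial after `current` for `today`.

    Assumptions (not stated in the proposal): revisions start at 00 each
    day, and an exhausted or future-dated serial is treated as an error.
    """
    base = int(today.strftime("%Y%m%d")) * 100  # yyyymmdd00 for today
    if current < base:
        # Last change was on an earlier day: start today's counter at 00.
        return base
    if current // 100 == base // 100 and current % 100 < 99:
        # Same day, revisions still available: bump the two-digit counter.
        return current + 1
    raise ValueError(
        f"cannot derive a serial after {current} for {today.isoformat()}"
    )


if __name__ == "__main__":
    print(next_serial(2007010100, date(2007, 1, 2)))
    print(next_serial(2007010200, date(2007, 1, 2)))
```

Because slaves only transfer a zone when the serial increases, refusing to guess in the error case seems safer than wrapping the counter around.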
A single ns0 (tier0) host could continue to aggregate all the ICANN and other zones for integration into the tier1 distribution; however, several tier1 hosts should have the ability to become tier0/ns0 in the event ns0 goes out of service, thereby removing the historic single point of failure.
The tricky part about a distributed root is twofold: the root zone which is authoritative for '.' must contain ALL served TLDs, aggregating OpenNIC's zones with ICANN's and others; and somebody has to discover which TLDs are actually being used/served.
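To make the aggregation step concrete, here is a minimal sketch of merging OpenNIC delegations with ICANN's into one set of root-zone NS lines. Everything named here is an assumption for illustration: the functions `aggregate_root` and `delegation_lines`, the TLD `opennic-demo`, and its nameserver names are hypothetical, and refusing to merge when both sides claim the same TLD is just one possible policy, not one the proposal has decided.

```python
def aggregate_root(
    opennic: dict[str, list[str]], icann: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Merge two TLD -> nameserver maps into one aggregate root map.

    Assumed policy: a TLD claimed by both sources is an error that a
    human has to resolve, so nothing is overwritten silently.
    """
    collisions = sorted(set(opennic) & set(icann))
    if collisions:
        raise ValueError(f"TLDs claimed by both sources: {collisions}")
    merged = {**icann, **opennic}
    # Sort by TLD so the generated zone is stable between runs.
    return dict(sorted(merged.items()))


def delegation_lines(root: dict[str, list[str]]) -> list[str]:
    """Render one NS record line per (TLD, nameserver) pair."""
    return [
        f"{tld}.\tIN\tNS\t{ns}"
        for tld, servers in root.items()
        for ns in servers
    ]


if __name__ == "__main__":
    # Hypothetical OpenNIC TLD and nameservers, for illustration only.
    opennic = {"opennic-demo": ["ns1.opennic-demo.", "ns2.opennic-demo."]}
    icann = {"com": ["a.gtld-servers.net."]}
    for line in delegation_lines(aggregate_root(opennic, icann)):
        print(line)
```

The discovery half of the problem (finding out which TLDs are being served at all) is not addressed by this sketch; it only covers combining lists that are already known.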
CategoryArchitecture
CategoryHostmastering
The reason for a tier2 host is to provide public resolution for users whose ISPs are not providing it. It could be considered an optional requirement for a TLD, but somebody else would have to do it for them or they won't be visible to the public; and it seems to me to be a reasonable responsibility for the zone ops. I have proposed this requirement as "preferred".
To remove the single point of failure, multiple tier1 hosts should be able to take over as ns0 if necessary. The aggregate root should change only very seldom, as long as the sub-root servers are stable. Robin can hopefully share his experience with this.
Further down the chain, the tier2 servers all slave their copy from the tier1 servers (the masters can be given as a list of multiple IPs), so they have built-in redundancy in case a tier1 server goes down. And any end-users who are running their own personal DNS server will either slave their copy from the tier2 servers, or they can manually get it from the website, where we would have a list of mirrors pointing directly to the file on all the tier1 servers. Again, plenty of redundancy.
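As an illustration of the "list of multiple IPs" idea, the sketch below renders the single root-zone slave declaration a tier2 BIND host would need, with several tier1 masters. The helper name `tier2_root_zone`, the file path, and the addresses (taken from the documentation-only ranges) are all assumptions; real tier1 addresses would have to come from the operators.

```python
import ipaddress


def tier2_root_zone(tier1_ips: list[str], zone_file: str = "slave/root.zone") -> str:
    """Render a BIND 'zone' stanza slaving '.' from several tier1 masters.

    Each address is validated first so that a typo fails here rather
    than when named loads its configuration.
    """
    if not tier1_ips:
        raise ValueError("at least one tier1 master is required")
    for ip in tier1_ips:
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    masters = " ".join(f"{ip};" for ip in tier1_ips)
    return (
        'zone "." {\n'
        "    type slave;\n"
        f'    file "{zone_file}";\n'
        f"    masters {{ {masters} }};\n"
        "};\n"
    )


if __name__ == "__main__":
    # Placeholder addresses from documentation ranges, not real tier1 hosts.
    print(tier2_root_zone(["192.0.2.1", "198.51.100.1"]))
```

With more than one entry in `masters`, BIND can fall back to another master when one is unreachable, which is the redundancy described above.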
The other important role it can serve is to be another slave for each zone, a useful additional host which tier1s can list as master for the zones they slave. Yes, maybe they can use any or all other tier1 for this also.
All T1s should allow AXFRs to and from other T1s, while getting the root
zone only from NS20 (T0). AXFRs should not be allowed to any other servers.
Avo mentioned he has a system to ban servers that _abuse_ his T1s. Avo, can you
elaborate on this? I.e., how do you do it, and how do you recognize a
mis-configured server? Should this method be adopted globally for T1s? If
so, we should discuss a way of informing people that their server is not configured
correctly: instead of having dont.ask.me.XXXXXX, have a pointer to a wiki/glue
site saying what could be wrong and how to fix it.