From ripedenis at yahoo.co.uk Wed Aug 5 14:46:09 2020
From: ripedenis at yahoo.co.uk (ripedenis at yahoo.co.uk)
Date: Wed, 5 Aug 2020 12:46:09 +0000 (UTC)
Subject: [db-wg] To internationalise or not, that is the question?
In-Reply-To:
References: <819264879.8918301.1595498428162.ref@mail.yahoo.com> <819264879.8918301.1595498428162@mail.yahoo.com> <8D9C970E-DF8E-466F-A6CC-F57F21AA9664@ripe.net> <1371298892.13522800.1596044640743@mail.yahoo.com> <1366458769.13473293.1596062825008@mail.yahoo.com>
Message-ID: <408789813.410305.1596631569362@mail.yahoo.com>

Colleagues

We have a problem with UTF-8. Many people keep saying you want it, we should have it, let's do it... But every time we get to these difficult, non-technical questions, everyone goes silent. This is why we have never implemented UTF-8 since it was first mentioned many years ago. No one in the community seems to know how to answer these questions.

So I have a suggestion. The RIPE NCC has the manpower and the expertise to investigate these issues. I propose we task the RIPE NCC with a thorough investigation of UTF-8 in the RIPE Database from all possible angles, reporting back to the community. This can be a starting point for a more meaningful discussion.

We need to know what impact having non-Latin-1 characters in different parts of the data set will have on the RIPE Registry, the RIPE NCC members and the different user groups of the RIPE Database, as well as the social, legal and political impact of such a change. Which parts of the data set can/should/shouldn't be allowed to be in other character sets? Who really needs access to this data, and what parts of it need to be understandable or interpreted? This brings into question the whole purpose of the RIPE Database and the data contained therein.

Thoughts?

cheers
denis

co-chair DB-WG

On Friday, 31 July 2020, 20:20:10 CEST, Leo Vegoda wrote:

Hi Denis,

These are good questions.
As so many of the answers lie with the RIPE NCC or the NRO, I suppose we need input from them to proceed further.

Kind regards,

Leo

On Wed, Jul 29, 2020 at 3:47 PM ripedenis at yahoo.co.uk wrote:
>
> Hi Leo
>
> Some of the questions that need to be answered include:
>
> -who needs to be able to read/understand/interpret which parts of the data in the RIPE Database (maybe both the community and the NCC need input to answer this)?
> -is any of the data contained in the RIPE Database essential for the operation of the registry and not duplicated anywhere else (maybe the NCC and the NCC Services WG need input to answer this)?
> -is any of the data important to LEAs and governments, is that a consideration, do they have the resources to understand the data in any format (community and LEAs input needed for this one)?
> -One of the mission statements of the NRO is "Providing and promoting a coordinated Internet number registry system" so if we are going to internationalise the public face of the registry should it be coordinated (is that a community, RIR or NRO question)?
>
> cheers
> denis
>
> co-chair DB-WG
>
> On Wednesday, 29 July 2020, 21:09:55 CEST, Leo Vegoda wrote:
>
>
> Hi Denis,
>
> I agree that this is a registry issue and not just a database issue,
> which is why I sent the message I did on 8 July.
>
> I'd like to understand how much of this work should be led by the RIPE
> NCC versus the community. Also, because of the breadth of the issues,
> should the discussion be here or on another list?
>
> Kind regards,
>
> Leo Vegoda
>
> On Wed, Jul 29, 2020 at 10:45 AM ripedenis at yahoo.co.uk
> wrote:
> >
> > Hi Leo
> >
> > As I have said many times, internationalising the RIPE Database is not a technical issue, it is a registry issue. I think it does need a separate process from the database requirements. Especially if we consider it as a cross registry issue.
> >
> > Incidentally I did suggest on this mailing list several months ago that the requirements task force considers the issue of UTF-8. No one from the task force has yet replied to me on that or any other comment I have made about the requirements.
> >
> > cheers
> > denis
> >
> > co-chair DB-WG
> >
> > On Wednesday, 29 July 2020, 18:20:14 CEST, Leo Vegoda wrote:
> >
> >
> > Hi,
> >
> > Thanks for providing the impact analysis for this initial change.
> >
> > What should the process be for introducing greater support for
> > internationalization in the RIPE Database? George, Cynthia and others
> > have made good points about the need to improve internationalization
> > of more than just e-mail addresses. Is that support something that
> > should be handled through the process that follows the final report of
> > the Database TF or does it need to be addressed separately?
> >
> > Thanks,
> >
> > Leo
> >
> > On Wed, Jul 29, 2020 at 8:03 AM Edward Shryane via db-wg wrote:
> > >
> > > Dear Colleagues,
> > >
> > > Here is the impact analysis for the NWI-11 implementation.
> > >
> > > The Database team plans to implement NWI-11 as per the Solution Definition:
> > > https://www.ripe.net/ripe/mail/archives/db-wg/2020-June/006525.html
> > >
> > > (1) Impact to Whois Update
> > >
> > > The implementation will automatically apply Punycode encoding (as per RFC 5891) to the domain part of an email address during Whois update.
> > >
> > > The encoding is only applied to an IDN domain name, and changes the current behaviour as follows:
> > > - ASCII encoded values will not be affected (as before).
> > > - Non-ASCII but latin-1 encoded values will be encoded as Punycode.
> > > - Non-latin-1 encoded values (e.g. UTF-8) will also be encoded as Punycode. These values previously were transformed to latin-1, with a '?' substitution.
> > >
> > > The local part of an email address must only contain ASCII characters.
> > > If non-ASCII characters are found in the local part, the address is rejected as invalid.
> > >
> > > This change will only affect attributes with an email address syntax (i.e. abuse-mailbox, e-mail, irt-nfy, mnt-nfy, notify, ref-nfy, upd-to).
> > >
> > > If an email address is converted to Punycode, a warning will be included in the update response.
> > >
> > > Any Punycode conversion failure will result in the attribute value being rejected as invalid. A workaround in this case is to encode the value before submitting the update.
> > >
> > > (2) Impact to Whois Query
> > >
> > > When querying the RIPE Database, any Punycode encoded email address is returned as-is (i.e. it is not decoded).
> > >
> > > (3) Impact to Existing Data
> > >
> > > We will perform a cleanup to convert any existing non-ASCII (but latin-1 encoded) IDN domain names to Punycode in attributes with an email address syntax. This affects very few objects. The maintainer(s) will be notified by email beforehand.
> > >
> > > (4) Impact to Whois Documentation
> > >
> > > We will update the database documentation with details of this behaviour change.
> > >
> > > (5) Release Timeline
> > >
> > > We expect the NWI-11 implementation to take about 1 month (including code changes and testing), and will include the feature in the Whois 1.98 release.
> > >
> > > As usual, we will deploy the release to the Release Candidate environment for 2 weeks before production, to allow for testing.
> > >
> > > Regards
> > > Ed Shryane
> > > RIPE NCC
> > >
> > >
> > >
> > > On 23 Jul 2020, at 12:00, ripedenis at yahoo.co.uk wrote:
> > >
> > > Hi Ed
> > >
> > > The chairs see there is a consensus to move forward with implementing Punycode. Can you present an impact analysis explaining what changes you propose, what effect those changes will have on updates and queries (by all the different methods), and whether anyone needs to modify their software interacting with the database?
> > >
> > > cheers
> > > denis
> > >
> > > co-chair DB-WG

From ch at ntrv.dk Wed Aug 5 15:29:26 2020
From: ch at ntrv.dk (Chriztoffer Hansen)
Date: Wed, 5 Aug 2020 15:29:26 +0200
Subject: [db-wg] To internationalise or not, that is the question?
In-Reply-To: <408789813.410305.1596631569362@mail.yahoo.com>
References: <819264879.8918301.1595498428162.ref@mail.yahoo.com> <819264879.8918301.1595498428162@mail.yahoo.com> <8D9C970E-DF8E-466F-A6CC-F57F21AA9664@ripe.net> <1371298892.13522800.1596044640743@mail.yahoo.com> <1366458769.13473293.1596062825008@mail.yahoo.com> <408789813.410305.1596631569362@mail.yahoo.com>
Message-ID:

On Wed, 5 Aug 2020 at 14:46, ripedenis--- via db-wg wrote:
> We have a problem with UTF-8. Many people keep saying you want it, we should have it, let's do it...But every time we get to these difficult, non-technical questions every one goes silent. This is why we have never implemented UTF-8 since it was first mentioned many years ago. No one in the community seems to know how to answer these questions.
>
> So I have a suggestion. The RIPE NCC has the manpower with the expertise to investigate these issues. I propose we put a task on the RIPE NCC to do a thorough investigation of UTF-8 in the RIPE Database from all possible angles and report back to the community. This can be a starting point to a more meaningful discussion.
>
> We need to know what impact having non Latin1 characters in different parts of the data set will have on the RIPE Registry, the RIPE NCC members, the different user groups of the RIPE Database and the social, legal and political impact of such a change. Which parts of the data set can/should/shouldn't be allowed to be in other character sets. Who really needs access to this data and what parts of it need to be understandable or interpreted.
> Which does bring into question the whole purpose of the RIPE Database and the data contained therein.
>
> Thoughts???

+1 from me on this.

My primary reason for the +1 is being able to correctly spell the name of $legal-entity (e.g. contact/representative person, company name). It is similar to a legal name change: your passport also needs to be re-issued.

I do not know how often this has been an issue in the past. But I consider it high time to do something instead of continuing, as you refer to it, Denis, to "drag our feet".

(The world is larger than [0-9a-zA-Z]. The RIPE community does not walk "in too-small shoes".)

--
Best regards,
CHRIZTOFFER

From eshryane at ripe.net Thu Aug 6 10:04:22 2020
From: eshryane at ripe.net (Edward Shryane)
Date: Thu, 6 Aug 2020 10:04:22 +0200
Subject: [db-wg] To internationalise or not, that is the question?
In-Reply-To: <408789813.410305.1596631569362@mail.yahoo.com>
References: <819264879.8918301.1595498428162.ref@mail.yahoo.com> <819264879.8918301.1595498428162@mail.yahoo.com> <8D9C970E-DF8E-466F-A6CC-F57F21AA9664@ripe.net> <1371298892.13522800.1596044640743@mail.yahoo.com> <1366458769.13473293.1596062825008@mail.yahoo.com> <408789813.410305.1596631569362@mail.yahoo.com>
Message-ID:

Hi Denis, Colleagues,

As you requested, the Database team will prepare a thorough investigation (impact analysis) of UTF-8 in the RIPE Database, as a starting point for further discussion.

Regards
Ed Shryane
RIPE NCC

> On 5 Aug 2020, at 14:46, ripedenis at yahoo.co.uk wrote:
>
> Colleagues
>
> We have a problem with UTF-8. Many people keep saying you want it, we should have it, lets do it...But every time we get to these difficult, non technical questions every one goes silent. This is why we have never implemented UTF-8 since it was first mentioned many years ago. No one in the community seems to know how to answer these questions.
>
> So I have a suggestion. The RIPE NCC has the manpower with the expertise to investigate these issues.
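The update rules in the NWI-11 impact analysis quoted earlier in this thread (ASCII values untouched, non-ASCII domain part converted to Punycode with a warning, non-ASCII local part rejected, conversion failure rejected) can be sketched roughly as follows. This is only an illustration, not the Whois implementation: the function name `normalise_email` and the warning text are made up here, and Python's built-in `idna` codec follows the older IDNA 2003 rules rather than RFC 5891, so it is used purely as a stand-in.

```python
from typing import List, Tuple


def normalise_email(address: str) -> Tuple[str, List[str]]:
    """Return (value to store, warnings); raise ValueError if the value is rejected.

    Hypothetical helper, not part of the RIPE Whois code base.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not an email address: %r" % address)

    # The local part must contain ASCII only; otherwise the address is rejected.
    if not local.isascii():
        raise ValueError("non-ASCII characters in local part: %r" % address)

    # ASCII domains are not affected (as before).
    if domain.isascii():
        return address, []

    # Non-ASCII domain: convert it to its Punycode (xn--) form.
    # Assumption: the stdlib "idna" codec (IDNA 2003) stands in for the
    # RFC 5891 processing named in the impact analysis.
    try:
        encoded = domain.encode("idna").decode("ascii")
    except UnicodeError as exc:
        # Any conversion failure means the value is rejected as invalid.
        raise ValueError("Punycode conversion failed: %r" % address) from exc

    converted = "%s@%s" % (local, encoded)
    warning = "email address %s was converted to %s" % (address, converted)
    return converted, [warning]


if __name__ == "__main__":
    print(normalise_email("abuse@example.net"))
    print(normalise_email("abuse@münchen.de"))
```

Under these assumptions, `abuse@münchen.de` would be stored as `abuse@xn--mnchen-3ya.de` with one warning, and a query would return that Punycode form as-is. A client that wants to avoid the warning can submit the already-encoded form, which is the workaround the analysis mentions for conversion failures.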
From horvath.agoston at gmail.com Sun Aug 16 20:57:29 2020
From: horvath.agoston at gmail.com (Horváth Ágoston János)
Date: Sun, 16 Aug 2020 20:57:29 +0200
Subject: [db-wg] RPSL parser nits
In-Reply-To: <8cd9a9fa-bab3-db52-7df8-ea61e6e4a3b5@foobar.org>
References: <483d83cc1d234b198444ba005b15ea4d@iks-service.de> <8cd9a9fa-bab3-db52-7df8-ea61e6e4a3b5@foobar.org>
Message-ID:

Hi Nick et al,

The RIPE Whois database (https://github.com/RIPE-NCC/whois) has an excellent (=clean, performant and heavily tested) RPSL parser in it for Java. It's minimal work to copy the package, or, since it is under a BSD licence, you could release a copy of it for others to use too.

Cheers,
Agoston

On Thu, Jul 2, 2020 at 5:12 PM Nick Hilliard via db-wg wrote:
> Lutz Donnerhacke via db-wg wrote on 02/07/2020 15:47:
> > I try to be a bit more expressive in the aut-num of ASN199284, but
> > fail to get accepted at least the valid parts by the RPSL parser.
> rpsl(ng) hasn't seen any significant development work since the 1990s,
> and the only real update since then was to support ipv6 (rfc4012 in
> 2005). The only functional parser out there, irrtoolset, is crippled
> with functional shortcomings and is basically abandonware.
>
> You may want to think twice about whether it's worth investing time and
> effort in rpsl in 2020.
>
> Nick

From nick at foobar.org Mon Aug 17 00:58:33 2020
From: nick at foobar.org (Nick Hilliard)
Date: Sun, 16 Aug 2020 23:58:33 +0100
Subject: [db-wg] RPSL parser nits
In-Reply-To:
References: <483d83cc1d234b198444ba005b15ea4d@iks-service.de> <8cd9a9fa-bab3-db52-7df8-ea61e6e4a3b5@foobar.org>
Message-ID:

Horváth Ágoston János wrote on 16/08/2020 19:57:
> The ripe whois database (https://github.com/RIPE-NCC/whois) has an
> excellent (=clean, performant and heavily tested) RPSL parser in it for
> java.
> It's minimal work to copy the package, or, since this is BSD
> licence, you could release a copy of it for others to use too.

Yep, but do I understand correctly that this is a syntactic parser only, i.e. there is no semantic expression evaluator? If so, you'd need a lot of glue to convert RPSL expressions into structures suitable for injection into templating engines. This is the hard bit in irrtoolset that wrecked everyone's heads.

Nick

From horvath.agoston at gmail.com Tue Aug 18 18:54:19 2020
From: horvath.agoston at gmail.com (Horváth Ágoston János)
Date: Tue, 18 Aug 2020 18:54:19 +0200
Subject: [db-wg] RPSL parser nits
In-Reply-To:
References: <483d83cc1d234b198444ba005b15ea4d@iks-service.de> <8cd9a9fa-bab3-db52-7df8-ea61e6e4a3b5@foobar.org>
Message-ID:

Correct! I was just drawing attention to the fact that, while implementing the Whois server, we also checked some RPSL parsers, and most of them failed the sanity checks, even for parsing.

As for interpreting the mp-export/mp-import lines, those have a grammar that is also included in the RPSL parser in the Whois server, although it is only used for checking syntactic correctness. I think it could also form the basis of more complex software.

By the way, at this point a generic solution becomes impossible, as everyone wants to use the resulting expression tree differently, so it's always custom work from the grammar upwards.

Cheers,
Agoston

On Mon, Aug 17, 2020 at 12:58 AM Nick Hilliard wrote:
> Horváth Ágoston János wrote on 16/08/2020 19:57:
> > The ripe whois database (https://github.com/RIPE-NCC/whois) has an
> > excellent (=clean, performant and heavily tested) RPSL parser in it for
> > java. It's minimal work to copy the package, or, since this is BSD
> > licence, you could release a copy of it for others to use too.
>
> yip but do I understand it correctly that this is a syntactic parser
> only? I.e. there is no semantic expression evaluator.
> If this is the
> case, you'd need a lot of glue to be able to convert rpsl expressions
> into structures suitable for injection into templating engines. This is
> the hard bit in irrtoolset that wrecked everyones' heads.
>
> Nick

From L.Donnerhacke at iks-service.de Fri Aug 28 11:11:42 2020
From: L.Donnerhacke at iks-service.de (Lutz Donnerhacke)
Date: Fri, 28 Aug 2020 09:11:42 +0000
Subject: [db-wg] RPSL requirements in aut-num objects
In-Reply-To: <20200827094548.GB88356@bench.sobornost.net>
References: <483d83cc1d234b198444ba005b15ea4d@iks-service.de> <8cd9a9fa-bab3-db52-7df8-ea61e6e4a3b5@foobar.org> <1275626803.2850876.1593704821756@mail.yahoo.com> <7f71d16a-2a66-938a-aaea-a7811996eac0@foobar.org> <91d264e09a8b46d49a69048f85ec80da@iks-service.de> <20200827094548.GB88356@bench.sobornost.net>
Message-ID:

> On Thu, Jul 02, 2020 at 05:58:19PM +0000, Lutz Donnerhacke via db-wg
> wrote:
> > I'd suggest to remove the crippled parser from the records in the
> > first step.
> > Even if someone tries to use the data-set, it's not even possible to
> > bring in correct data.
> > How do you expect an adoption under this condition?
> >
> > So, my primary question: Where is the source code and how to
> > contribute?
> > Secondary question: Can we first declare the fields as free-form text
> > fields, before removing them?
>
> What is the benefit of declaring such fields 'free form'?

RFC 2622 requires a parser to accept attributes that are not defined in the dictionary. Of course, the parser then has no way to validate the arguments or the allowed operators. It should issue a warning or ignore the attribute.

Unfortunately, the current parser in the RIPE DB has a different understanding of RPSL: it does not even allow attributes defined in the RFC (i.e. next-hop, protocols). So it's not possible to fill the aut-num with valid RPSL, or even to add new attributes.

As long as the parser is broken, the field could be left unvalidated (or the parser could issue a warning instead of an error). And the parser should be fixed. That's why I'm asking for the location of the source code.
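
The "warning instead of an error" behaviour asked for above could look roughly like the sketch below. Everything in it is hypothetical: `KNOWN_ATTRIBUTES` is a deliberately incomplete stand-in for a parser's dictionary, `check_action` is a made-up name, and the statement splitting is far simpler than real RPSL grammar. It only shows the policy choice: attributes the parser does not know produce a warning in lenient mode and an error in strict mode.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately incomplete dictionary of attribute names.
# "next-hop" is left out on purpose to mimic a parser that does not know it.
KNOWN_ATTRIBUTES = {"pref", "med", "dpa", "aspath", "community"}

# Leading attribute name of a statement such as "pref=100" or
# "community.append(65000:1)".
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def check_action(action: str, strict: bool = False) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a ';'-separated list of action statements."""
    errors: List[str] = []
    warnings: List[str] = []
    for statement in action.split(";"):
        statement = statement.strip()
        if not statement:
            continue
        match = _NAME.match(statement)
        if match is None:
            errors.append("cannot parse statement: %r" % statement)
            continue
        name = match.group(0).lower()
        if name in KNOWN_ATTRIBUTES:
            # A real parser would validate operators and arguments here.
            continue
        message = "unknown attribute %r in %r" % (name, statement)
        # Lenient mode keeps the object and only warns; strict mode rejects it.
        (errors if strict else warnings).append(message)
    return errors, warnings


if __name__ == "__main__":
    print(check_action("pref=100; next-hop=192.0.2.1; community.append(65000:1);"))
    print(check_action("pref=100; next-hop=192.0.2.1;", strict=True))
```

In lenient mode the `next-hop` statement is kept and reported as a warning, so the update would still be accepted; in strict mode the same statement becomes an error, which is the behaviour the message above objects to.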