Topic: [MERGED] BIP-39 List of words in Portuguese accepted!! - page 3. (Read 1019 times)

legendary
Activity: 2352
Merit: 6089
bitcoindata.science
Thank you so much for your input.

I will take a closer look at the first-4-letters requirement (which our list does not meet 100% yet) and at this Levenshtein distance.

I will try to write or find a Levenshtein distance script in Python to check our list.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
I really like your initiative, I was waiting until I got on PC to type this.

Make sure all the words in your list are words that people have heard of. Words not familiar to most people in your locale should be avoided. In some of the previous PRs for other wordlists, there were such words inside. Here's an example of this in the French wordlist.

Also, I'd recommend limiting the maximum length of each word to 8 characters, as the comment below suggests. It will save you from having to revise your PR later:

Hi. As I am interested in the creation of all word lists (to a reasonable extent), not only the German one, let me express my thoughts here as well. I am glad to see that there are contributors willing to work on word lists. However, what bothers me is that whenever a person (or a group of people) shows up, they take care of just one list. To be exact, what bothers me is the fact that for each new list very similar problems need to be tackled. For example, requirements: for languages with a Latin alphabet the maximum word length should be 8, due to the limitations of the displays of hardware wallets. Or the requirement that the first 4 letters should uniquely define a word? Not to mention requirements like the one related to Levenshtein distance. Can't such requirements be shared across many languages? Especially since tools, once developed (to ease work with Levenshtein distance), could be reused. That is why I launched a separate repository just for the creation of word lists: https://github.com/p2w34/wlips. I have launched it with a vision of tackling the creation of all word lists, not just one. Please do not get me wrong - I am not saying the work in this PR needs to be somehow stopped/abandoned/whatever - I am not the one to judge which approach is better. Let me also mention that I am the author of the PR with the Polish word list and I know how much time is needed to create such a list from scratch. I just wanted to mention here that another approach is also possible. Thank you.

So apparently hardware wallets can only display up to 8 characters of a word. The rest won't be visible, so two longer words that share their first 8 characters could be mistaken for each other on a hardware wallet.
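
A quick way to check this is sketched below. It is only an illustration: the function names and the sample words are made up for the example, and the 8-character limit is taken from the comment quoted above rather than from any hardware wallet documentation.

```python
def words_longer_than(words, max_len=8):
    """Return the words that exceed max_len characters."""
    return [w for w in words if len(w) > max_len]


def truncation_collisions(words, visible=8):
    """Group words that look identical when only `visible` characters are shown."""
    groups = {}
    for w in words:
        groups.setdefault(w[:visible], []).append(w)
    # Keep only the prefixes shared by more than one word.
    return {prefix: group for prefix, group in groups.items() if len(group) > 1}


# Hypothetical sample, not taken from the actual Portuguese list.
sample = ["abacate", "trabalhador", "trabalhar"]
print(words_longer_than(sample))
print(truncation_collisions(sample))
```

If every word is at most 8 characters long, the second check can never report anything, which is one more argument for the length limit.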

The Levenshtein distance between two words is the minimum number of characters you need to alter, add or remove to transform the first word into the second. Make sure the distance between any two words in the list is not too low. There isn't a defined minimum, but I would make it at least 2.
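
Here is a minimal Python sketch of such a check, using the standard dynamic-programming formulation of Levenshtein distance. The function names, the `min_distance=2` default (my own suggestion above, not a BIP-39 rule) and the sample words are assumptions for illustration only.

```python
from itertools import combinations


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter word in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


def close_pairs(words, min_distance=2):
    """Return (word1, word2, distance) for every pair closer than min_distance."""
    result = []
    for a, b in combinations(words, 2):
        d = levenshtein(a, b)
        if d < min_distance:
            result.append((a, b, d))
    return result


# Hypothetical sample words: "bala" and "bola" differ by one substitution.
print(close_pairs(["bala", "bola", "carro"]))
```

For a full list of 2048 words this compares roughly two million pairs, so plain Python may take a while, but it only needs to run once per revision of the list.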

If everything goes well then judging by the opening and closing times of previous PRs, it should take about a month between opening the PR and getting it merged to the tree. Good luck!
legendary
Activity: 2352
Merit: 6089
bitcoindata.science
Hello everyone

I am part of a group of 4 users (sabotag3x, alegotardo, Tryninja and me) on the Portuguese board who are creating a list of 2048 words in Portuguese to be submitted to https://github.com/bitcoin/bips/tree/master/bip-0039

Our bitcointalk topic for discussion is:
[2020] Lista de Palavras em Português para o BIP-0039

We followed many rules when adding the words, which can be seen here:
https://github.com/sabotag3x/bips/blob/master/bip-0039/bip-0039-wordlists.md
Quote
Words can be uniquely determined by typing the first 4 characters.
No accents or special characters.
No complex verb forms.
No plural words, unless there's no singular form.
No words with double spelling.
No words with the exact sound of another word with different spelling.
No offensive words.
No words already used in other language mnemonic sets.
Words that are spelled differently in Brazil and in Portugal are excluded.
No words that evoke negative/sad/bad things.
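
Some of the rules above can be checked mechanically. The sketch below covers three of them: the 4-character prefix rule, the no-accents rule and the overlap with other wordlists. The function names and sample words are hypothetical, and treating "no accents or special characters" as "lowercase a-z only" is an assumption on my part.

```python
import string
from collections import defaultdict

# Assumption: a valid word uses only the 26 unaccented lowercase letters.
ALLOWED = set(string.ascii_lowercase)


def words_with_special_chars(words):
    """Return the words containing anything other than a-z."""
    return [w for w in words if not set(w) <= ALLOWED]


def prefix_collisions(words, prefix_len=4):
    """Group words that share the same first prefix_len characters."""
    groups = defaultdict(list)
    for w in words:
        groups[w[:prefix_len]].append(w)
    return {p: g for p, g in groups.items() if len(g) > 1}


def overlap_with(words, other_words):
    """Return the words that also appear in another language's list."""
    return sorted(set(words) & set(other_words))


# Hypothetical sample, not the real list.
sample = ["abacate", "abacaxi", "açúcar", "piano"]
print(words_with_special_chars(sample))
print(prefix_collisions(sample))
print(overlap_with(sample, ["piano", "zebra"]))
```

The remaining rules (offensive words, sound-alikes, negative connotations) still need a human reviewer.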


Our work is nearly done: we currently have slightly more than 2048 words, all of which follow the criteria above, and the extras will be carefully trimmed. The list is almost ready for the pull request to the main branch.

I would like to know if there are any suggestions or any special procedure we might have missed before we make the pull request.

Our list can be seen here:
https://github.com/sabotag3x/bips/blob/master/bip-0039/portuguese.txt

I hope our small group will be able to get into bitcoin history.

Thanks everyone.