The project involved several different areas of programming. My favourite was the autocorrect and autosuggest algorithm.
A quick summary of the algorithm:
1. Take a very large text in the language you want to build the autocorrect for.
For example, you could take a “.txt” file that contains a whole popular novel. Peter used a concatenation of publicly available English books from Project Gutenberg. Since I was building an autocorrect for German foods, I had to look elsewhere. Unfortunately, I couldn’t find a publicly available list of German ingredients and foods that also shows how frequently they are used, so I had to crawl websites. Using a web crawler from Dexi.io, I crawled about 300 pages from delicious.de overnight, and I had to do this three or four times.
2. Separate the words and build an array of each word and its frequency (how often the word occurs in the text).
3. Get the word from the user.
4. Is it a correct word? If not, check how many changes you have to make to get a correct word: do you have to add letters, remove letters or swap the positions of two letters? E.g.
a. teh is an incorrect word; to get the from teh, you swap ‘h’ and ‘e’.
b. ric -> rich/rice is an addition of the letter ‘h’ or ‘e’ respectively.
c. Computerd -> computer is a removal of the letter ‘d’.
5. Collect all possible correct words and show the word with the highest frequency as the first result. For example, in any English text (assuming a sufficiently large one, say approximately 3000 words), the word “the” should appear more often than the word “they”, so if a user types “tehy”, your algorithm should favour suggesting “the” before “they”.
Like I said, Peter explained it better. Check out his explanation here.
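To make the steps above concrete, here is a minimal Python sketch of the idea. It is an illustration only, not the code that was submitted: the tiny corpus and the names `WORD_COUNTS`, `edits1` and `suggest` are made up for this example, and it only looks at words that are a single edit away from the input.

```python
import re
from collections import Counter

# Tiny stand-in corpus (assumption); a real one would be a large text file.
CORPUS = "the rice is rich in starch and the rice is cheap so the cook buys rice"

# Step 2: split the text into words and count how often each one occurs.
WORD_COUNTS = Counter(re.findall(r"[a-zäöüß]+", CORPUS.lower()))

ALPHABET = "abcdefghijklmnopqrstuvwxyzäöüß"


def edits1(word):
    """Return every string that is one deletion, swap, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    swaps = [left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + swaps + replaces + inserts)


def suggest(word, limit=3):
    """Steps 3 to 5: return known words one edit away, most frequent first."""
    word = word.lower()
    if word in WORD_COUNTS:  # already a correct word
        return [word]
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    # Sort by descending frequency, then alphabetically so ties are stable.
    return sorted(candidates, key=lambda w: (-WORD_COUNTS[w], w))[:limit]


print(suggest("teh"))  # the swap of 'h' and 'e' finds "the"
print(suggest("ric"))  # both "rice" and "rich" are one insertion away
```

With this corpus, “rice” occurs more often than “rich”, so it is listed first for the input “ric”.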
Well, after getting random collections of words, ingredients, qualifiers and quantifiers from the spider, I had to separate food names from ingredient names, and both of these from qualifiers like gram and kilogram. Essentially, the crawler delivered phrases like “100g rice, 1 kg of rice, 3 kilogrammes of beans, 3 cups of vegetable oil, 2 small tablespoons of salt, 2l Milch, 2 coconuts”. Did you notice the new problem? Exactly, there was no reasonable way for a program to easily tell qualifiers apart from ingredients. Sometimes there are spaces inside the qualifier (2 small tablespoons), sometimes there are spaces inside the ingredient (vegetable oil), sometimes the qualifiers are abbreviated (compare 1 kg and 3 kilogrammes), sometimes the ingredient and the qualifier are not separated with “of”, and sometimes the qualifier is grammes, sometimes kilogrammes. So basically I had to write a ton of regular expressions to check a very large file. Then I found out that Sublime Text could not run a regex search through .txt files larger than 5 MB, so I had to create a PHP file that does the regex search. Thank you, Rasmus Lerdorf, for making PHP!!
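To give a feel for the kind of pattern involved, here is a small Python sketch that splits such phrases into amount, qualifier and ingredient. The real expressions were written in PHP and are not reproduced here; the `UNITS` table, the list of size words and the `parse` function are assumptions made for this example, and it only covers the sample phrases above.

```python
import re

# Hypothetical table mapping spellings and abbreviations to one canonical unit.
UNITS = {
    "g": "g", "gramme": "g", "grammes": "g",
    "kg": "kg", "kilogramme": "kg", "kilogrammes": "kg",
    "l": "l", "litre": "l", "litres": "l",
    "cup": "cup", "cups": "cup",
    "tablespoon": "tablespoon", "tablespoons": "tablespoon",
}

# Longest spellings first, so "cups" is tried before "cup".
UNIT_PATTERN = "|".join(sorted(map(re.escape, UNITS), key=len, reverse=True))

PHRASE = re.compile(
    rf"^\s*(?P<amount>\d+(?:[.,]\d+)?)\s*"                    # 100, 2, 1,5
    rf"(?:(?:small|large)\s+)?"                               # optional size word
    rf"(?:(?P<unit>{UNIT_PATTERN})\b\s*(?:of\s+)?)?"          # optional unit and "of"
    rf"(?P<ingredient>.+?)\s*$",                              # whatever is left
    re.IGNORECASE,
)


def parse(phrase):
    """Return (amount, canonical unit or None, ingredient), or None if nothing matches."""
    match = PHRASE.match(phrase)
    if match is None:
        return None
    amount = float(match.group("amount").replace(",", "."))
    unit = match.group("unit")
    canonical = UNITS[unit.lower()] if unit else None
    return amount, canonical, match.group("ingredient")


for phrase in ["100g rice", "1 kg of rice", "3 kilogrammes of beans",
               "3 cups of vegetable oil", "2 small tablespoons of salt",
               "2l Milch", "2 coconuts"]:
    print(phrase, "->", parse(phrase))
```

The word boundary `\b` after the unit is what keeps “2 lemons” from being read as 2 litres of “emons”. Real crawled data is messier than these seven phrases, which is why one pattern was never enough.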
At the end of the day, after staying awake until 3 AM every night, I had managed to create separate arrays of the ingredients, the qualifiers, the qualifiers assigned to each ingredient, and their abbreviations, all in German. I also learnt some cool words such as “Bund”. Apparently Bund is a German word that fits almost all ingredients; it could mean one piece or an appropriate quantity of an ingredient. Doesn’t matter, weiter.
The next step was choosing the right qualifier for each ingredient. For example, you can say 1 litre of water but you can’t say 1 litre of rice, even though you can say 1 kg of water (ice). You can also say 1 piece of mango but not 1 piece of beans, which is very inappropriate. Each ingredient may also have more than one qualifier: 1 can of Coca-Cola, 1 litre of Coca-Cola, 1 bottle of Coca-Cola. And since each qualifier also fits many ingredients, this is a many-to-many relationship. Doesn’t matter, weiter.
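One simple way to hold such a many-to-many relationship is a pair of lookup tables, one per direction. The sketch below is only an assumption about how this could be modelled; the pairs and the names `units_for`, `ingredients_for` and `is_valid` are invented for the example.

```python
from collections import defaultdict

# Hypothetical (ingredient, qualifier) pairs taken from the examples in the text.
PAIRS = [
    ("water", "litre"), ("water", "kg"),
    ("rice", "kg"),
    ("mango", "piece"),
    ("coca-cola", "can"), ("coca-cola", "litre"), ("coca-cola", "bottle"),
]

units_for = defaultdict(set)        # ingredient -> qualifiers that fit it
ingredients_for = defaultdict(set)  # qualifier -> ingredients it fits

for ingredient, unit in PAIRS:
    units_for[ingredient].add(unit)
    ingredients_for[unit].add(ingredient)


def is_valid(ingredient, unit):
    """Check whether a qualifier makes sense for an ingredient."""
    return unit in units_for.get(ingredient, set())


print(is_valid("water", "litre"))        # 1 litre of water is fine
print(is_valid("rice", "litre"))         # 1 litre of rice is not
print(sorted(ingredients_for["litre"]))  # everything that can be measured in litres
```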
Although there was still a lot to do after that, I will end the story of the auto-suggestion here.
The less difficult problems were storing the data and the user input locally. At first I used the JS localStorage, but there was a big problem: I had to implement a search function, and you cannot efficiently search localStorage. So I had to rebuild the whole storage logic using IndexedDB with Dexie.js. I had already used Dexie in one of my other top secret projects (RainbowCoil). I won’t talk about that here, I could get arrested, lmao, seriously.
I also had to create a custom search algorithm that takes the German umlauts into account, because German has extra letters (ä, ö, ü and ß) that are not found in the English alphabet! This means the search should understand that Wässer should also show Wasser as a search result, and the same for Äpfel and Apfel. Mark Twain wrote a joke about the German language.
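A common way to get this behaviour is to fold both the query and the stored names to a plain form before comparing them. The Python sketch below shows that idea; it is an assumption about the approach, not the original search code, and `fold` and `search` are names invented for the example.

```python
import unicodedata

# Map each special letter to a plain replacement (ß becomes "ss").
FOLD_TABLE = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def fold(text):
    """Lower-case the text and replace umlauts so that Äpfel and Apfel compare equal."""
    # NFC first, so an "a" followed by a combining diaeresis becomes a single "ä".
    normalized = unicodedata.normalize("NFC", text).lower()
    return normalized.translate(FOLD_TABLE)


def search(query, items):
    """Return every item whose folded form contains the folded query."""
    needle = fold(query)
    return [item for item in items if needle in fold(item)]


PRODUCTS = ["Wasser", "Apfel", "Äpfel", "Milch"]
print(search("Wässer", PRODUCTS))
print(search("apfel", PRODUCTS))
```

Folding ä to a (rather than to ae) is a deliberate choice here, because it is what makes Wässer and Wasser match each other.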
In conclusion, I wrote a whole ton of code at night until I finally submitted it. There are still a lot of things I did apart from the storage and the word suggestion, such as an image for each product using a placeholder service, sorting of lists, and merging of similar items (for example, 1 kg and 500 grammes of the same product should become 1.5 kg). But the storage and the word suggestion were the most exhausting, since they exposed me to an area of programming I had never explored before.
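The merging of similar items boils down to converting every amount to a common base unit before adding. Here is a rough Python sketch of that arithmetic; the conversion table and the `add_item` and `display` helpers are assumptions made for the example, not the submitted code.

```python
# Hypothetical conversion table: unit -> (base unit, factor to the base unit).
TO_BASE = {"g": ("g", 1), "kg": ("g", 1000), "ml": ("ml", 1), "l": ("ml", 1000)}

# Larger unit to show once a base amount reaches 1000.
BIGGER = {"g": "kg", "ml": "l"}


def add_item(shopping_list, name, amount, unit):
    """Add an amount to the list, merging it with an existing entry of the same product."""
    base_unit, factor = TO_BASE.get(unit, (unit, 1))
    key = (name.lower(), base_unit)
    shopping_list[key] = shopping_list.get(key, 0) + amount * factor


def display(amount, base_unit):
    """Format a base amount, switching to kg or l from 1000 upwards."""
    if base_unit in BIGGER and amount >= 1000:
        return f"{amount / 1000:g} {BIGGER[base_unit]}"
    return f"{amount:g} {base_unit}"


shopping = {}
add_item(shopping, "Reis", 1, "kg")
add_item(shopping, "Reis", 500, "g")
for (name, unit), amount in shopping.items():
    print(name, display(amount, unit))  # reis 1.5 kg
```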
You can test the code here: https://recipe.piccmaq.com.ng/. By the way, Eddi Nez stands for EDeka INtelligentes Zettel. I had no better name. It is, however, only compatible with German words, and the interface is in German.
Here is a Google-translated copy of the email I received after my submission was reviewed by the team.
our partner company EDEKA DIGITAL and we really appreciated your submission for the Code Competition – “INEZ The intelligent shopping list.”
You’ve done you a lot of thoughts in your solution. The selection of winners and evaluation of the solutions is not easy EDEKA DIGITAL.
Unfortunately it was not enough this time for a place on the podium 🙁
Basic functions: 7
Code Quality: 6
The evaluation and your feedback has been created by the Department of EDEKA DIGITAL:
– Database using your own crawler and manually completed data sets
– Instructions on how to launch the application was available
– data is stored locally in a IndexedDB
– Product and units are suggested after the user inputs a product
– Quantities are detected
– Add to shopping list
– Same types of products are recognized and added(case-sensitive)
– Quantity/Amounts of added articles can be changed later
– The code looks fine
– By using a framework or the MVC pattern, the application could be better structured and would be more scalable
Tip: Look at how we evaluated your solution: https://www.it-talents.de/blog/partnerunternehmen/so-wertet-edeka-digital-deinen-code-aus
We will shortly send a certificate for your participation.
Hopefully you had fun during the Code Competition, and it helped to improve your skills.
I very much hope to have you at the next Competitions back again, we shall, in any case, for exciting tasks. For example, the current Competition:
The results of the competition and the winners (and their solutions), will soon be introduced on our blog.
If you have any questions or feedback, you can contact me.
Greetings from Bielefeld
Your IT-Talents.de Team
Below is the original Readme.md file I submitted along with the code. I will probably release the code on my GitHub soon.