I just finished writing a scraper example. For now it is a very simple version, but it seems to work fine and it has good test coverage.
My plan is to add:
- Multithreading for downloading web pages
- A better data parser, to filter out noisy data
- A PageRank-style algorithm to give more weight to some pages
- Docker support or Heroku deployment
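Here is a rough sketch of how the multithreading and page-ranking items could look. It is written in Python, which is an assumption since the post doesn't say what language the scraper uses. `fetch_all` and `pagerank` are hypothetical names, not part of the existing project. The fetch function is passed in so a real downloader (for example a wrapper around `urllib.request.urlopen`) or a fake one for tests can be plugged in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_all(urls: List[str], fetch: Callable[[str], str],
              max_workers: int = 8) -> Dict[str, str]:
    """Download every URL concurrently using a thread pool.

    `fetch` maps a URL to its HTML (hypothetical hook; inject a real
    downloader or a fake one in tests).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so zip pairs each URL
        # with the page fetched for it.
        return dict(zip(urls, pool.map(fetch, urls)))


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Basic PageRank computed by power iteration.

    `links` maps each page to the pages it links to. Pages with no
    outgoing links spread their rank evenly over all pages.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank
```

Threads fit the download step because it is I/O-bound, so the GIL is not the bottleneck there. The link graph for `pagerank` would come from the parser (the links found on each downloaded page), and the ranks always sum to 1, so they can be used directly as relative weights.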
Here is the link: link