I just finished writing a scraper example. For now it is a very simple version, but it seems to work fine and it includes good test coverage.

My plan is to add:

  • Multithreading for downloading web pages
  • A better data parser, to filter out noisy data
  • A page rank algorithm, to give more weight to some websites
  • Docker support, or support for Heroku
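
For the multithreading item, here is a minimal sketch of what concurrent downloading could look like, assuming Python and only its standard library. The names fetch and fetch_all are hypothetical and not taken from the project, which may use a different language or approach:

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its body as bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, max_workers=8):
    """Download many pages concurrently.

    Returns a dict mapping each URL to its body, or to the exception
    raised while downloading it, so one bad page does not stop the rest.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every download first so they run in parallel.
        futures = {url: pool.submit(fetcher, url) for url in urls}
        # Then collect the results in the original URL order.
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                results[url] = exc
    return results
```

Passing the downloader in as the fetcher argument keeps the function easy to cover with tests, since a fake downloader can stand in for the network.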

Here is the link: link