Many of our projects involve scraping listings portals, usually real estate or car portals such as mobile.de. This means discovering and downloading all the adverts (listings) from a website and exporting the data in the required format (CSV, a REST API for data download, or something else).
The project has an established stack that lets you implement new scrapers with fairly basic knowledge of Ruby, because most of the work is abstracted into a library that exposes a simple DSL.
What will you do?
- Devise and implement a uniform approach to checking whether a scraped site has changed: e.g. scrape a limited number of listings and check that the expected data is present (100% of adverts have a title, more than 50% have information about transmission).
- Change the architecture to distribute the background processes of a single microservice across multiple servers.
- Review the performance of the database layer for projects with over 10M records (the type of database to use, database server configuration, and actual usage, i.e. SQL queries). Propose improvements if necessary.
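To give a feel for the first task, here is a minimal sketch of such a coverage check in plain Ruby. It is not part of the existing library: the method name `coverage_failures`, the `REQUIRED_COVERAGE` constant, and the representation of a listing as a hash with symbol keys are all assumptions made for illustration. For simplicity, thresholds are treated as inclusive minimums.

```ruby
# Hypothetical sketch: field names and thresholds are illustrative assumptions,
# not the project's actual configuration.
REQUIRED_COVERAGE = {
  title: 1.0,        # every sampled advert must have a title
  transmission: 0.5  # at least half should mention the transmission
}.freeze

# Returns a hash of field => observed coverage for every field whose share of
# non-blank values in the sample falls below its threshold.
# An empty hash means the sample looks healthy.
def coverage_failures(listings, thresholds = REQUIRED_COVERAGE)
  # An empty sample is itself a sign that the site changed: report every field.
  return thresholds.transform_values { 0.0 } if listings.empty?

  thresholds.each_with_object({}) do |(field, minimum), failures|
    present = listings.count { |listing| !listing[field].to_s.strip.empty? }
    coverage = present.fdiv(listings.size)
    failures[field] = coverage if coverage < minimum
  end
end
```

A scheduled job could scrape a small sample from each portal, call `coverage_failures(sample)`, and raise an alert whenever the result is not empty, since a sudden drop in coverage for a field usually means the page markup has changed.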
Who are we looking for?
- Ruby people. The current stack is based on Ruby, and at least part of it will likely stay that way, so either existing knowledge of Ruby or an interest in picking up a new language and technology ecosystem is required.
- People who can start part-time but stay for the long run, gradually increasing their engagement if they are interested.
- Independent. People who can work on their own without constant supervision, setting their own deadlines and sticking to them, deciding when to invest their own time into learning and so on.
- Curious and eager to improve the tech.
What we offer:
- Flexible environment