For us, Selenium has proven to be the best tool for web scraping in Python, better than the mainstream Requests library. The main reason is its use of web drivers, which makes most servers believe we are ordinary web browser users. With Requests, on the other hand, we ran into limitations: despite using headers that hid our “robot condition”, servers detected us easily and banned us.
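As a rough illustration, a minimal Selenium scraping sketch looks like the following. The target URL and the CSS selector are placeholders chosen for the example, not taken from the original post:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Launch a real Chrome instance through ChromeDriver; to most servers
# the resulting traffic looks like an ordinary browser session.
driver = webdriver.Chrome()

try:
    driver.get("https://example.com")  # placeholder URL
    # Collect the text of every <h1> once the page has rendered.
    titles = [el.text for el in driver.find_elements(By.CSS_SELECTOR, "h1")]
    print(titles)
finally:
    driver.quit()
```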

Unfortunately, no tool is perfect, and we have had problems with Selenium. The main one has been Google Chrome updates and version changes. Lately, Google has been updating its web browser at a faster pace than the Selenium web driver is updated. As a result, the drivers become useless and we cannot scrape.
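When the versions drift apart, ChromeDriver simply refuses to start a session. A hedged sketch of how this typically surfaces in Python (the behaviour shown is the usual one, not quoted from the original post):

```python
from selenium import webdriver
from selenium.common.exceptions import SessionNotCreatedException

try:
    driver = webdriver.Chrome()
except SessionNotCreatedException as exc:
    # ChromeDriver typically refuses to create a session when the installed
    # Chrome is newer than the versions this driver build supports.
    print("Chrome/ChromeDriver version mismatch:", exc)
else:
    driver.quit()
```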

The solution

The solution that we propose is the opposite of what we were doing. We should not make Selenium’s version fit Chrome’s, but Chrome’s version fit Selenium’s.

The first step is to uninstall Google Chrome from our PC. Then we have to go to these two URLs (in another browser):

The first one is Slimjet’s download page. Slimjet hosts copies of Chrome’s old versions, so by downloading one of them we can obtain a replica of an older release. We then have to pick a version that matches one of the ChromeDrivers from the second URL.

Note that each ChromeDriver release has its own version number but supports a range of Chrome versions. The important thing is that the installed Google Chrome version is not newer than the one your ChromeDriver supports.
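Once you have a matching pair, you can point Selenium explicitly at the ChromeDriver binary you downloaded, so there is no ambiguity about which driver version is in use. A minimal sketch, assuming a hypothetical path for the unpacked driver and Selenium 4's Service API (older Selenium releases take an executable_path argument instead):

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Hypothetical path to the ChromeDriver build chosen to match the
# downgraded Chrome installed from Slimjet.
service = Service("/path/to/chromedriver")
driver = webdriver.Chrome(service=service)

driver.get("https://example.com")  # placeholder URL
print(driver.title)
driver.quit()
```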

After installing the older Google Chrome, you will see at chrome://settings/help that the browser version is now older, and scraping with Selenium works again.
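To double-check from Python that the downgraded browser and the driver now agree, you can read the session capabilities. This is a sketch under the assumption that Chrome reports the usual capability keys; the exact keys can vary between Selenium and Chrome versions:

```python
from selenium import webdriver

driver = webdriver.Chrome()

# The session capabilities report both version strings, so you can
# confirm that the browser and the driver are compatible.
caps = driver.capabilities
print("Chrome version:", caps.get("browserVersion"))
print("ChromeDriver:", caps.get("chrome", {}).get("chromedriverVersion"))

driver.quit()
```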

CC BY-NC-ND 4.0 Fixing Selenium Bugs by Miguel Cózar is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
