6 Pro Tips for Optimizing Web Automation Using Puppeteer

Using Puppeteer to automate web browser-related tasks is a great way to speed up your work as a developer. In this article, we list six tips that will help you optimize those tasks.
by Josephine Loo

    Web automation lets you navigate the web without doing it manually. This includes tasks such as filling out forms, clicking buttons, navigating pages, scraping data from websites, and testing web applications. By automating repetitive browser tasks, you can devote more time and effort to building important features.

    One of the most popular libraries for web automation in the JavaScript ecosystem is Puppeteer. It provides a high-level API to control Chrome/Chromium over the DevTools Protocol.

    Puppeteer is a powerful tool but it can take some time to master, especially if you're new to it. That's why in this article, we'll go over six tips to make your web automation experience with Puppeteer better.

    Tip 1: Try Different ‘wait-’ Options to Wait for Complete Page Load

    Puppeteer needs to know when to perform the next action when it navigates to a URL. For example, Puppeteer must wait until the page is loaded completely before taking a screenshot of a page.

    When navigating to a URL, you can specify a Puppeteer life cycle event through the WaitForOptions.waitUntil property. This applies to the Page.goto() and Page.waitForNavigation() methods.

    The script continues only after the chosen event has fired:

    • load (default) - consider navigation to be finished when the load event is fired.
    • networkidle0 - consider navigation to be finished when there are no network connections for at least 500ms.
    • networkidle2 - consider navigation to be finished when there are no more than 2 network connections for at least 500ms.
    • domcontentloaded - consider navigation to be finished when the DOMContentLoaded event is fired.

      await page.goto('https://www.google.com/', { waitUntil: 'networkidle0' });
      // or, after an action that triggers a navigation:
      await page.waitForNavigation({ waitUntil: 'networkidle0' });
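
    Page.waitForNavigation() is usually paired with the action that triggers the navigation, and both are awaited together so that a fast navigation is not missed. The following is a minimal sketch of that pattern. clickAndWait is a hypothetical helper name, and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper: click something that triggers a navigation and wait for it.
// `page` is assumed to be a Puppeteer Page.
async function clickAndWait(page, selector, waitUntil = 'networkidle0') {
  // Start waiting BEFORE the click takes effect; otherwise a fast navigation
  // can finish before waitForNavigation() begins listening.
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil }),
    page.click(selector),
  ]);
  return response;
}
```

    For example, await clickAndWait(page, 'a.next-page') would follow a link, where a.next-page is a placeholder selector.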

    However, these events do not always indicate that the page has been loaded completely. For example, there could be a few JavaScript scripts that are still running in the background when one of these events is triggered, and could modify the content of the page.

    If you know that a particular HTML element will only exist after the script has finished running, use waitForSelector with a CSS selector to wait for it.

    await page.waitForSelector('.js-result')
    

    To also wait until the element is actually visible on the page, and not just present in the DOM, set visible to true.

    await page.waitForSelector('.js-result', {visible: true})
    

    All websites are built differently. If Puppeteer executes the next line of code earlier than expected, try a different life cycle event. Alternatively, use waitForSelector to wait for a specific HTML element.

    🐻 Bear Tips: There are also other things you can wait for, such as a frame, a function, or a request. You can find the details in the Puppeteer API documentation.
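
    As an illustration of waiting for a function, the sketch below uses Page.waitForFunction() to wait until a minimum number of elements match a selector. waitForCount is a hypothetical helper name, the 10-second default timeout is an arbitrary choice, and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper: wait until at least `minCount` elements match `selector`.
// `page` is assumed to be a Puppeteer Page.
async function waitForCount(page, selector, minCount, timeout = 10000) {
  return page.waitForFunction(
    // This callback runs inside the browser, so it can use `document`.
    (sel, n) => document.querySelectorAll(sel).length >= n,
    { timeout }, // stop waiting after `timeout` milliseconds
    selector,    // passed to the callback as `sel`
    minCount     // passed to the callback as `n`
  );
}
```

    For example, await waitForCount(page, '.js-result li', 5) would continue once five list items exist, where the selector is a placeholder.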

    Tip 2: Use Session Cookies to Skip the Login Page

    If you need to scrape or crawl data that requires authentication, skipping the login page can save you time.

    Instead of logging in using Puppeteer, log in manually in your Chrome browser. Then, export the session cookies to a JSON file using a cookie export extension and load that file in your code.

    With the Page.setCookie() method, Puppeteer can reuse the same logged-in session. This lets you navigate to the password-protected URL directly for as long as the session is valid.

    const fs = require('fs');

    // Read the cookies exported from the logged-in browser session
    const cookiesString = fs.readFileSync('cookies.json', 'utf8');
    const parsedCookies = JSON.parse(cookiesString);

    if (parsedCookies.length !== 0) {
        for (let cookie of parsedCookies) {
            await page.setCookie(cookie);
        }
    }

    await page.goto("password-protected-url", { waitUntil: 'networkidle0' });
    

    You can also do this when you need to run the script multiple times or run different scripts on the website.

    If you prefer to use Puppeteer to log in, you can retrieve the browser cookies using Page.cookies(). Save them as a JSON file in your specified directory and use them for subsequent script runs.

    const cookiesObject = await page.cookies();
    
    fs.writeFile('cookies.json', JSON.stringify(cookiesObject), function (err) {
      if (err) {
        console.log('The session could not be saved.', err);
        return; // do not report success after a failure
      }
      console.log('The session has been saved successfully.');
    });
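
    The two snippets above can be combined into a small save/restore pair. The sketch below is one possible arrangement, not the only one. saveSession, restoreSession, and dropExpired are hypothetical helper names, page is assumed to be a Puppeteer Page, and the file is assumed to have been written by Page.cookies(), where expires is in seconds since the Unix epoch and -1 marks a session cookie. Cookies exported by a browser extension may use different field names.

```javascript
const fs = require('fs/promises');

// Hypothetical helper: keep session cookies (expires === -1 or unset) and
// cookies whose expiry time (seconds since the Unix epoch) is still ahead.
function dropExpired(cookies, nowSeconds = Date.now() / 1000) {
  return cookies.filter(
    (c) => c.expires === undefined || c.expires === -1 || c.expires > nowSeconds
  );
}

// Save the current cookies of `page` to `file`. Returns how many were saved.
async function saveSession(page, file) {
  const cookies = await page.cookies();
  await fs.writeFile(file, JSON.stringify(cookies, null, 2));
  return cookies.length;
}

// Load cookies from `file` into `page`. Returns how many cookies were set;
// 0 means there is no usable session and a normal login is needed.
async function restoreSession(page, file) {
  let cookies;
  try {
    cookies = JSON.parse(await fs.readFile(file, 'utf8'));
  } catch (err) {
    return 0; // missing or unreadable file
  }
  const valid = dropExpired(cookies);
  if (valid.length > 0) {
    await page.setCookie(...valid); // setCookie accepts several cookies at once
  }
  return valid.length;
}
```

    A script could call restoreSession(page, 'cookies.json') first, fall back to a regular login when it returns 0, and call saveSession(page, 'cookies.json') afterwards.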
    

    Tip 3: Use ‘userDataDir’ to Reuse the Same Browser Profile

    This tip is also useful for the previous scenario where we want to skip the login page for a password-protected URL. Chromium’s user data directory contains profile data such as history, bookmarks, cookies, as well as other per-installation local states.

    Launching Puppeteer with the userDataDir property saves this data to the given directory and reuses the same browser profile on every launch.

    puppeteer.launch({
      userDataDir: "./user_data"
    });
    

    As the session cookies are saved in the user data directory and Puppeteer reuses the same profile, you can use this to skip the login page too. Log in once, and the session will be available for subsequent script runs.

    However, the profile data will take up some space.

    [Screenshot: Puppeteer user data directory folder]

    If other profile data are not utilized, storing only the browser cookies will save more space. That said, it only applies to persisting a login session. Other browser profile data could still be useful for other cases.
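
    To judge how much space a profile actually takes before deciding between the two approaches, you can total the size of the files in the directory. The sketch below uses only Node.js built-ins. dirSizeBytes is a hypothetical helper name, and symbolic links are skipped.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: total size in bytes of all regular files under `dir`.
function dirSizeBytes(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      total += dirSizeBytes(full); // recurse into subfolders
    } else if (entry.isFile()) {
      total += fs.statSync(full).size;
    }
  }
  return total;
}
```

    For example, (dirSizeBytes('./user_data') / 1024 / 1024).toFixed(1) gives the size of the directory from the launch example in megabytes.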

    Tip 4: Print the Browser’s Console Messages to Node.js to Debug Easier

    This tip is for automated testing. The console messages of a website's client-side code only show up in the browser’s inspector and are not logged to Node.js.

    [Screenshot: error message in the browser console]

    You can open the browser inspector automatically when running Puppeteer by setting devtools to true.

    const browser = await puppeteer.launch({devtools: true});
    

    However, this will open it in the “Elements” tab and there is no way to open it in the “Console” tab automatically.

    To view the console messages in real time while Puppeteer is running, listen to the page’s console event. The handler receives a message object that exposes the message type and the logged text.

    page.on('console', (message) => console.log(`${message.type().slice(0, 3).toUpperCase()} ${message.text()}`));
    

    This will show the console messages in your Node.js log immediately and make debugging easier.
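
    The same idea extends to other page events that are useful when debugging. The sketch below also listens to pageerror (uncaught exceptions in the page) and requestfailed (requests that failed at the network level). attachPageLogging and the three-letter prefixes are hypothetical choices, and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper: forward browser-side diagnostics to a Node.js logger.
// `page` is assumed to be a Puppeteer Page (an event emitter with .on()).
function attachPageLogging(page, log = console.log) {
  // console.log/warn/error calls made by the page's own scripts
  page.on('console', (message) => {
    log(`${message.type().slice(0, 3).toUpperCase()} ${message.text()}`);
  });
  // uncaught exceptions thrown inside the page
  page.on('pageerror', (error) => {
    log(`EXC ${error.message}`);
  });
  // requests that failed at the network level
  page.on('requestfailed', (request) => {
    const failure = request.failure(); // null when no failure details exist
    log(`NET ${request.url()} ${failure ? failure.errorText : 'failed'}`);
  });
}
```

    Call attachPageLogging(page) once, before navigating, so that messages emitted during page load are captured as well.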

    [Screenshot: browser console messages in the Node.js log]

    🐻 Bear Tips: You can also add a “debugger” statement to code that runs in the browser. When the inspector is open, execution pauses at that statement.

    Tip 5: Delete the Puppeteer Folder Before Switching the Browser for Cross-Browser Testing

    Cross-browser testing is the practice of making sure that your code works across various browsers. You can use Firefox with Puppeteer by specifying it in the product property.

    const browser = await puppeteer.launch({
      product: 'firefox'
    });
    

    However, when you install Puppeteer, only Chromium is downloaded. Even if you specify firefox for the product property, Puppeteer will still launch Chromium.

    Therefore, you need to delete the /node_modules/puppeteer folder and reinstall Puppeteer with the product set to Firefox.

    PUPPETEER_PRODUCT=firefox npm i
    

    This will install Firefox in the /node_modules/puppeteer folder.
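
    To confirm that the expected browser is the one that actually launched, you can check the string returned by Browser.version(). The sketch below is a simple sanity check. assertBrowserIs is a hypothetical helper name, browser is assumed to be a Puppeteer Browser, and the assumption is that the version string contains the browser name (for example "Firefox/..." or "HeadlessChrome/..."), which may vary between Puppeteer versions.

```javascript
// Hypothetical helper: confirm which browser Puppeteer actually launched.
// `browser` is assumed to be a Puppeteer Browser.
async function assertBrowserIs(browser, expectedName) {
  const version = await browser.version();
  // Compare case-insensitively because the exact format is an assumption.
  if (!version.toLowerCase().includes(expectedName.toLowerCase())) {
    throw new Error(`Expected ${expectedName} but launched ${version}`);
  }
  return version;
}
```

    For example, await assertBrowserIs(browser, 'firefox') right after puppeteer.launch() would fail fast if Chromium was started instead.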

    🐻 Read the full tutorial: How to Run Puppeteer with Firefox Instead of Chrome

    Tip 6: Disable CSS, Images, and Other Unneeded Resources to Speed up Puppeteer

    A difference of 0.5-1 second per page is insignificant when you only have a few pages to scrape, but it has a real impact on performance when you are scraping tens of thousands of pages.

    If CSS and images are not necessary for your project, disabling them makes the pages load faster. You can intercept the HTTP requests sent by the browser and block any request for an unneeded resource.
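
    A quick back-of-the-envelope calculation shows the scale. The sketch below assumes pages are scraped one after another, and the figure of 20,000 pages is only an illustrative number. hoursSaved is a hypothetical helper name.

```javascript
// Rough estimate of time saved, assuming pages are scraped sequentially.
function hoursSaved(pageCount, secondsSavedPerPage) {
  return (pageCount * secondsSavedPerPage) / 3600;
}

// Illustrative numbers only: 20,000 pages, 0.5 s and 1 s saved per page.
console.log(hoursSaved(20000, 0.5).toFixed(1)); // prints 2.8
console.log(hoursSaved(20000, 1).toFixed(1));   // prints 5.6
```

    In other words, shaving half a second to a second off each page would save roughly three to five and a half hours on a run of that size.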

    await page.setRequestInterception(true);
    
    page.on('request', (request) => {
        if (['image', 'stylesheet', 'font'].indexOf(request.resourceType()) !== -1) {
            request.abort();
        } else {
            request.continue();
        }
    });
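
    The handler above can be wrapped in a reusable function that takes the list of resource types to block. The following is a sketch under a few assumptions: blockResources and its stats counter are hypothetical, page is assumed to be a Puppeteer Page, and HTTPRequest.isInterceptResolutionHandled() is assumed to be available, which is the case in recent Puppeteer versions.

```javascript
// Hypothetical helper: block the given resource types on `page`.
// Returns a counter object that is updated as requests come in.
async function blockResources(page, types = ['image', 'stylesheet', 'font']) {
  const blocked = new Set(types);
  const stats = { aborted: 0, continued: 0 };
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    // Another handler may already have dealt with this request.
    if (request.isInterceptResolutionHandled()) return;
    if (blocked.has(request.resourceType())) {
      stats.aborted += 1;
      request.abort();
    } else {
      stats.continued += 1;
      request.continue();
    }
  });
  return stats;
}
```

    For example, const stats = await blockResources(page, ['image', 'media']) blocks images and media only, and stats.aborted can be logged after the run to see how many requests were skipped.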
    

    [Screenshot: Bannerbear blog without CSS, scripts and images]

    If you see ERR Failed to load resource: net::ERR_FAILED in the browser’s console, you can avoid it by fulfilling the blocked requests with a 200 status instead of aborting them.

    if (['image', 'stylesheet', 'font'].indexOf(request.resourceType()) !== -1) {
        // Answer with a dummy 200 response instead of aborting
        request.respond({status: 200, body: 'aborted'});
    } else {
        request.continue();
    }
    

    Conclusion

    I hope these six tips make your web automation tasks a breeze. Beyond these tips, reading the Puppeteer API documentation and familiarizing yourself with it will help you too.

    Happy automating!

    About the author: Josephine Loo
    Josephine is an automation enthusiast. She loves automating stuff and helping people to increase productivity with automation.
