TACTICS | How to automagically export historical class data from InstantMandarin, LingoBus, and other language learning websites into spreadsheets

You can "automagically" turn your language learner’s class histories listed on webpages into structured tabular data of the sort you see in spreadsheets like Microsoft Excel and Google Sheets and Apple Numbers. This class history data is useful, and will become more useful as your learner grows in ability. You should export these class histories from places like InstantMandarin.com and LingoBus.com before these sites shut down or hide this valuable data from view. You don’t have to be a geek or a gearhead to accomplish this. Here’s how.

I started tigerba.com to track the ongoing revolution in online language learning, to share strategies for imparting fluency and literacy in young language learners, and to document the highs and lows of my kids’ studies in Mandarin Chinese. This blog entry is an oddly geeky choice for a second public post to a nascent website. But with the impending death of InstantMandarin.com at the end of December, it is extremely timely. Sometimes you just gotta get your geek on.

If you put your kids on an aggressive schedule of online 1-on-1 language learning lessons, you will eventually find some measure of success. With enough time and effort, your children will become what’s generally considered “fluent” and “literate.” And at some point both you and other people will want to figure out how exactly you got your kid to this enviable place of fluency and literacy.

The proliferation of online language lesson providers in the last five years has had two wonderful effects:

1) Many kids are now learning languages to very high levels of proficiency even in the absence of fluent family members to help them in their language learning journey, and

2) An extremely detailed record is being automatically created that documents how exactly these proficient language learners became proficient, and what it took in terms of class time, study time, and other learning inputs. At some point this historical record will probably become important to you and your learners. You should keep it.

If you look online at websites like LingoBus and InstantMandarin, you can pull up historical class data for your learner(s), albeit in a fairly messy and unusable form. For example, here’s my son Sam方平山’s 740-lesson history with InstantMandarin, paginated across 74 pages. As of December 31, this data will probably disappear as InstantMandarin.com goes offline.


Here’s Sam方平山’s 240-lesson class history with LingoBus’s lower-level “Speaking and Listening” track, which he started on August 31, 2020 and completed on June 11, 2022. Sam’s complete class history for “Speaking and Listening” is paginated across 57 pages.

方平山 having fun in negative 30F temps, Harbin, China, December 2019

This historical class data is actually quite interesting. You can do stuff with it. You can make decisions based on it. You can calculate the total number of hours your learner has spent in online language classes over their entire lifetime, which is a very interesting number. You can assess the velocity of your learner’s learning and how exactly classes figured into that. You can try to gauge the optimum number of online classes per week for your particular learner, and have some actual information to inform your thinking. You can assess how going on vacation from online classes did or did not affect learning outcomes. You can make nifty plots and graphs to impress teachers and send your defenseless Thanksgiving guests packing, posthaste. Five years on, when you’re tooling around the fabulous annual snow and ice festival in Harbin, China, you can remember the name of that completely awesome teacher from Harbin that your kid once had, easily look them up, and take that teacher out for a memorable lunch. These connections are more valuable than you know, now.
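For readers who do want to get their geek on, here is a minimal Python sketch of two of those calculations: total class hours and classes per week. It assumes you have saved your spreadsheet as CSV with columns named date, teacher, and minutes. Those column names and the sample rows are made up for illustration; your export will almost certainly look different.

```python
import csv
import io
from collections import Counter
from datetime import date

# Hypothetical sample rows; a real export will have different columns and values.
SAMPLE_CSV = """date,teacher,minutes
2020-08-31,Teacher A,25
2020-09-02,Teacher B,25
2020-09-09,Teacher A,25
"""

def load_lessons(csv_text):
    """Parse CSV text into a list of dicts with a real date and integer minutes."""
    lessons = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lessons.append({
            "date": date.fromisoformat(row["date"]),  # assumes YYYY-MM-DD dates
            "teacher": row["teacher"],
            "minutes": int(row["minutes"]),
        })
    return lessons

def total_hours(lessons):
    """Total class time in hours."""
    return sum(lesson["minutes"] for lesson in lessons) / 60

def lessons_per_week(lessons):
    """Count lessons per ISO (year, week number)."""
    counts = Counter()
    for lesson in lessons:
        iso = lesson["date"].isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(counts)

lessons = load_lessons(SAMPLE_CSV)
print(total_hours(lessons))
print(lessons_per_week(lessons))
```

To run this on your own data, read your CSV file into a string and pass it to load_lessons in place of SAMPLE_CSV. If your dates are not in YYYY-MM-DD form, the date parsing line will need to change.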

Aeronautical engineers have a geeky saying about ramjet engines: “the faster it goes, the faster it goes.” (… and if you even know what a ramjet is and why this saying is meaningful, rest assured your geek cred is already off the scale.) I offer this yet-geekier, yet-dorkier variant of that implicitly geeky maxim: the more data you have, the more information you have. Put another way, the more data you have, the more interesting that data becomes. As your child finds success in language learning, the information about your kids’ class history will become ever more interesting and useful. Now or later, you will want to have your kids’ historical class information in a “useful form.”

A bunch of records listed on a webpage and paginated over 74 pages like what I showed earlier is not a “useful form.” And if your kid has been taking lessons with multiple providers (InstantMandarin.com, LingoBus.com, LingoAce.com, Wukong.com, Italki.com, etc.) the problem is compounded. You cannot easily view the totality of your data, sort it, sift it, graph it, or manipulate it in anything resembling a convenient and consistent way.

What you really want here is to have your kid’s class data in a structured and consistent tabular form. Data newbies tend to call this format by its trade name: “spreadsheets.” Visual interfaces to data like Microsoft Excel, Google Sheets, Apple Numbers, and other spreadsheet programs allow mere mortals a sortable, sift-able, mentally manageable view of data.

In short, we wanna take all of that stuff that’s printed on many webpages from here to kingdom come across the Interwebs, and somehow distill all of that data into a single unified spreadsheet that represents the totality of our children’s online language learning.
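Once each provider’s history is in its own spreadsheet, the “single unified spreadsheet” part can be sketched in a few lines of Python. This is only an illustration: the per-provider column names in COLUMN_MAPS are invented, since I am not reproducing the exact headers either site uses, and it assumes dates are already in YYYY-MM-DD form so that sorting them as text puts them in chronological order.

```python
import csv
import io

# Hypothetical column names for each provider; change these to match your exports.
COLUMN_MAPS = {
    "InstantMandarin": {"Class Time": "date", "Tutor": "teacher"},
    "LingoBus": {"Date": "date", "Teacher": "teacher"},
}

def normalize(provider, csv_text):
    """Rename one provider's columns to a shared set and tag each row with its provider."""
    mapping = COLUMN_MAPS[provider]
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        unified = {"provider": provider}
        for source_name, unified_name in mapping.items():
            unified[unified_name] = row[source_name].strip()
        rows.append(unified)
    return rows

def merge(exports):
    """Combine {provider: csv_text} into one list of rows sorted by date."""
    rows = []
    for provider, csv_text in exports.items():
        rows.extend(normalize(provider, csv_text))
    # Sorting text dates works only because YYYY-MM-DD sorts chronologically.
    return sorted(rows, key=lambda r: r["date"])

exports = {
    "InstantMandarin": "Class Time,Tutor\n2021-01-05,Teacher A\n",
    "LingoBus": "Date,Teacher\n2020-08-31,Teacher B\n",
}
for row in merge(exports):
    print(row)
```

The design choice here is to map every provider onto the same few column names first and only then combine, so that adding another provider later means adding one more entry to COLUMN_MAPS.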

What to do? In the bad old days of the early 2000s we would have written custom programming to try to pull data of interest out of webpages.

Circa 2024, we can wave magic wands and somehow use absolutely free online tools to complete this work far faster than even an expert programmer might.

I managed to capture all of my LingoBus and InstantMandarin classroom data to a unified spreadsheet in about an hour. I didn’t use any programming even though a long-lost part of me groks software developer tools, idioms, and methodologies. The remainder of this post is a practical how-to showing how you, too, can grab your language learner’s classroom data and export it to your spreadsheet of choice, using online tools that are 100% free and easy for non-programmers to use.

Here are the steps you will want to take to automagically capture your learner’s data off of sites like instantmandarin.com and lingobus.com. The really short version is that you will be efficiently capturing the website data corresponding to your learner’s class history as a collection of image files using an open-source screenshot program, and then using a very smartly programmed commercial web-based application to transform those images to structured spreadsheet data. Easy and free.

Behold: fresh, tasty structured data!

1) Download Greenshot, an open source screenshot program: https://getgreenshot.org/

2) Open the InstantMandarin website and scroll to “booking history.”

3) Use the Greenshot hotkey to capture the screen region showing the first page of the booking history. Choose to save the file using the default naming format.

4) Use the arrows on the instantmandarin.com or lingobus.com website interfaces to move to the next page of data.

5) Use the Greenshot hotkey corresponding to “capture last screen region.” Choose to save the file using the default naming format.

6) Use the arrows on the instantmandarin.com or lingobus.com website interfaces to move to the next page of data, and repeat step 5.

7) Continue until you have captured screenshots of all of your class data, and put those screenshots in a single folder. This took me less than 10 minutes for 74 pages of data.

8) Open the Nanonets website (https://nanonets.com/) and establish a free account. This account will let you bulk-convert image files into structured data at no cost – Nanonets is trying to attract big-money customers by demonstrating how well their software works.

9) Once your free account is established, bulk-upload all of the screenshots you took of all of your data to the Nanonets website.

10) Use the Nanonets “convert to tables” option. I tried a variety of image-to-structured-data tools. Nanonets was by far the best, turning the images into data with no errors and making logical choices about where columns actually were. It worked well for both InstantMandarin and LingoBus. Good job, Nanonets, I will surely remember you in my next career at the GSA.

11) Clean up and save out the automagically created Excel file, which in my case concatenated all 74 pages of InstantMandarin data into one super-clean, super-sortable Excel file.
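None of the steps above require programming, but if you want a quick sanity check on the cleaned-up file, a short Python sketch like the one below can help. It assumes you have saved the sheet as CSV. It drops blank rows, header rows that got repeated from one screenshot to the next, and exact duplicate rows (for example, from capturing the same page twice), then lets you compare the row count to the number of lessons you expect. The sample data and column names are hypothetical. One caution: if your table has no time column, two genuine lessons with the same teacher on the same day would look like duplicates, so check before trusting the result.

```python
import csv
import io

def dedupe_rows(csv_text):
    """Return (header, rows) with blanks, repeated headers, and exact duplicates removed."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [cell.strip() for cell in next(reader)]
    seen = set()
    unique = []
    for row in reader:
        key = tuple(cell.strip() for cell in row)
        if not any(key):          # blank row
            continue
        if list(key) == header:   # header repeated from a later screenshot
            continue
        if key in seen:           # exact duplicate of an earlier row
            continue
        seen.add(key)
        unique.append(list(key))
    return header, unique

# Hypothetical sample: one duplicated lesson, one repeated header, one blank row.
sample = "date,teacher\n2020-08-31,A\n2020-08-31,A\ndate,teacher\n2020-09-02,B\n,\n"
header, rows = dedupe_rows(sample)
expected_lessons = 2  # replace with your own lesson count
print(header, len(rows), len(rows) == expected_lessons)
```

If the final count does not match the lesson total shown on the website, a missed or doubled screenshot is the most likely culprit.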

Follow these steps and improvise as necessary for your learner. The strategy I’ve described here should work for most online websites that present class histories in a roughly table-like form.

InstantMandarin.com will disappear sometime around December 31 of this year. Your data has great value. Gather ye rosebuds and grab your learner’s data while you can.
