5 sleuthing skills that’ll reveal the age of any web page
You may not nail it exactly, but you can get close.
Usually, the date an article or web page was published is on the screen in front of you. But sometimes a page will try to masquerade as an ageless wonder, which is problematic when you need to know if it’s still relevant. Don’t fret: There are ways to lift the veil of mystery.
To be clear, unearthing an exact date is not guaranteed—you may only be able to come up with an estimate for how old the information is. Often, that’s good enough.
Easiest: Look at the URL
A page’s address is technically “on the screen in front of you,” but it’s easy to miss, so check there first. Unfortunately, URLs aren’t always consistent or exact about dates. Some of Popular Science’s older articles have URLs that include the year and month (but not the day) they were published. Our newer stories do not.
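If you have a pile of links to check, a short script can do the squinting for you. This is a minimal sketch, assuming the site puts a /YYYY/MM/ (and sometimes /DD/) segment in its addresses. The function name date_from_url and the example.com address are made up for illustration, and plenty of sites use other patterns or none at all.

```python
import re
from typing import Optional, Tuple

# Matches a /YYYY/MM segment, optionally followed by /DD, inside a URL path.
# Assumption: the site uses four-digit years and two-digit months and days.
DATE_IN_URL = re.compile(
    r"/((?:19|20)\d{2})/(0[1-9]|1[0-2])(?:/(0[1-9]|[12]\d|3[01]))?(?=/|$)"
)


def date_from_url(url: str) -> Optional[Tuple[int, int, Optional[int]]]:
    """Return (year, month, day) found in the URL, with day=None if absent.

    Returns None when the URL carries no recognizable date segment.
    """
    match = DATE_IN_URL.search(url)
    if match is None:
        return None
    year, month, day = match.groups()
    return int(year), int(month), int(day) if day else None


if __name__ == "__main__":
    # Hypothetical address, used only to show the shape of the result.
    print(date_from_url("https://example.com/2017/03/some-story/"))
```

A result with no day, or no result at all, just means the URL isn’t telling you much, and it’s time to try the next method.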
Easy: Investigate the XML sitemap
An XML sitemap is simply a list of URLs for a given website, with basic information about each one. It’s there to guide search engine crawlers on their never-ending quest to gather data. To see it, head to the address bar and add /sitemap.xml directly after the site’s domain name, replacing the rest of the page URL.
If you’re lucky, it’ll be well-organized, like the one for the White House website. For more frequently updated sites, like Lifehacker, you may get a massive list of last-modified dates. In the worst case, it won’t work at all and you’ll get a 404 error, as with PopSci.
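When the sitemap turns out to be one of those massive lists, you can let a script pull out the last-modified dates. What follows is a rough sketch, assuming the file follows the standard sitemaps.org format and sits at the root of the site. The helper names sitemap_url and lastmod_by_url and the example.com entries are invented for this example, and the sample XML stands in for a file you would download yourself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

# Namespace used by the standard sitemap format.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Made-up sample standing in for a downloaded sitemap.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2021-02-05</lastmod></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""


def sitemap_url(page_url: str) -> str:
    """Guess the sitemap address by keeping only the scheme and domain."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/sitemap.xml", "", ""))


def lastmod_by_url(sitemap_xml: str) -> dict:
    """Map each listed URL to its <lastmod> text, or None if it has none."""
    root = ET.fromstring(sitemap_xml)
    dates = {}
    for entry in root.findall("sm:url", SITEMAP_NS):
        loc = entry.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = entry.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        if loc:
            dates[loc] = lastmod.strip() if lastmod else None
    return dates


if __name__ == "__main__":
    print(sitemap_url("https://example.com/some/story?ref=home"))
    print(lastmod_by_url(SAMPLE_SITEMAP))
```

Keep in mind that a last-modified date tells you when the page last changed, not necessarily when it was first published, and some sitemaps leave it out entirely.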
Medium: Use the Wayback Machine
The Internet Archive’s Wayback Machine is a repository of snapshots cataloging billions of pages across the web. Simply paste the URL you want to investigate into the site’s search bar and hit Enter. This will return a timeline showing when the tool captured an image of the page in question. Click the year you want, then one of the highlighted calendar dates to see what it looked like at that time.
For this PopSci story on how to print and scan items with your phone, the earliest date on the Wayback Machine is March 14, 2017—the day the article hit the web. Although this is accurate, that may not always be the case. The page you’re looking at may have been logged some time after it was published, or it may not have been recorded at all.
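If you’d rather not click through the calendar, the Internet Archive also exposes its capture index to scripts. Here’s a hedged sketch, assuming the archive’s CDX search endpoint accepts the url, output, fl, and limit parameters and lists captures oldest-first by default. The function names and the sample rows are made up for illustration, and the code only builds the query and reads a response, leaving the actual download to you.

```python
from datetime import datetime
from typing import List, Optional
from urllib.parse import urlencode

# Assumption: this is the Wayback Machine's capture-index (CDX) endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def earliest_capture_query(page_url: str) -> str:
    """Build a query asking for only the first capture of page_url."""
    params = {
        "url": page_url,
        "output": "json",             # rows come back as a JSON array
        "fl": "timestamp,original",   # only the fields we need
        "limit": "1",                 # first row only (oldest, by assumption)
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_capture_timestamp(timestamp: str) -> datetime:
    """Convert a 14-digit capture stamp (YYYYMMDDhhmmss) to a datetime."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


def earliest_capture(rows: List[List[str]]) -> Optional[datetime]:
    """Read decoded JSON rows; the first row is assumed to be a header."""
    if len(rows) < 2:
        return None  # the page was never captured
    header, first = rows[0], rows[1]
    return parse_capture_timestamp(first[header.index("timestamp")])


if __name__ == "__main__":
    print(earliest_capture_query("example.com/some-story"))
    # Made-up rows showing the expected shape of a response.
    sample = [["timestamp", "original"], ["20200102030405", "http://example.com/"]]
    print(earliest_capture(sample))
```

As with the calendar view, the earliest capture is only an upper bound: the page may have been online for a while before the archive first noticed it.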
Harder: Harness Google’s advanced search functions
Sometimes Google results come with dates. If they don’t, you can force the search engine’s hand. Copy the address of the page you want to know about, head to the search bar, and type inurl:. Then paste the URL after the colon (no space). This will tell Google to show you only results whose address contains that text, which should narrow things down to that exact page.
Next, go to the address bar (not the search bar) and add &as_qdr=y25 to the end of the URL that’s in there. This command tells Google to show you results from the last 25 years. To break it down a bit more, “as” stands for “advanced search,” “qdr” is shorthand for “query date range,” and “y25” means “the last 25 years.” You can alter that last bit to use “d” for days, “w” for weeks, or “m” for months, followed by any number you want.
When you hit Enter on that modified URL, Google will display a date with your search result. But like the other options listed here, there’s no guarantee of how accurate this is. It could be the published date, the day the page was last modified, or when Google indexed it. PopSci’s story on the best ways to reheat pizza, for example, displays Feb. 7, 2020. That’s the day we first published it, but it was updated on Feb. 5, 2021.
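Typing all of that by hand invites typos, so here’s a small sketch that assembles the same search address for you. It assumes Google still honors the as_qdr parameter as described above. The function name google_date_search_url and the example.com link are placeholders, and the script only builds the URL for you to paste into your browser.

```python
from urllib.parse import urlencode


def google_date_search_url(page_url: str, unit: str = "y", count: int = 25) -> str:
    """Build a Google search URL using inurl: plus an as_qdr date window.

    unit is "d", "w", "m", or "y" (days, weeks, months, years), and count
    is how many of those units to look back.
    """
    if unit not in {"d", "w", "m", "y"}:
        raise ValueError("unit must be one of 'd', 'w', 'm', 'y'")
    if count < 1:
        raise ValueError("count must be at least 1")
    params = {
        "q": f"inurl:{page_url}",      # no space after the colon
        "as_qdr": f"{unit}{count}",    # e.g. y25 = the last 25 years
    }
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical page address, used only to show the output format.
    print(google_date_search_url("https://example.com/story"))
```

The colons and slashes in the pasted address come out percent-encoded, which is normal and is what your browser does behind the scenes anyway.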
Another, more time-consuming way to nail down a page’s first Google appearance is to run the same inurl: search, find Tools under the search bar, and click on the Any time dropdown menu. Select Custom range… and plug in some dates. By searching year-by-year and constantly narrowing your date range, you should be able to find when a page first went live, but this is not an efficient process.
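The narrowing goes faster if you halve the range each time instead of stepping through it year by year. The sketch below shows that idea under two assumptions: the page shows up for every custom range ending on or after the day it first appeared, and it shows up for none before that. The appears_by function is hypothetical and stands in for you running a custom-range search that ends on the given date and noting whether the page is listed.

```python
from datetime import date, timedelta
from typing import Callable


def first_indexed_date(
    appears_by: Callable[[date], bool], earliest: date, latest: date
) -> date:
    """Return the first date in [earliest, latest] where appears_by is True.

    Assumes appears_by(latest) is True and that once the page appears it
    keeps appearing for every later end date.
    """
    low, high = earliest, latest
    while low < high:
        middle = low + timedelta(days=(high - low).days // 2)
        if appears_by(middle):
            high = middle                       # it showed up: look earlier
        else:
            low = middle + timedelta(days=1)    # not yet: look later
    return low


if __name__ == "__main__":
    # Pretend the page first appeared on this made-up date.
    went_live = date(2019, 6, 15)
    found = first_indexed_date(lambda d: d >= went_live, date(1998, 1, 1), date(2024, 1, 1))
    print(found)
```

Halving a 25-year window this way takes about 14 searches to reach a single day, compared with the dozens you might run by trial and error.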
Hardest: Dig into the source code
Right-click on any web page and you should see an option to view the source code. On Google Chrome, it appears as View Page Source. Choose it, and you’ll get a look behind the curtain. Buried in all that information, you may be able to find when the page was created or modified. Use Ctrl+F on Windows or Cmd+F on macOS to open the search function and do your best to track it down. Try finding keywords such as “date”, “published”, “publishdate”, “modified”, “datemodified”, or something similar.
PopSci is clear about when its stories have been published and updated, but you can also find that date in the source code by searching “last_updated_date”. Be careful, though: There may be dates for other items on the page, like photos. These may not be the same age as the rest of the content.
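If you’ve saved a page’s source to a file, a script can do the Ctrl+F work for every keyword at once. This is a rough sketch, assuming dates in the source are written in the common YYYY-MM-DD style, optionally followed by a time. The function name find_date_hints, the 80-character window, and the sample HTML are all made up for illustration, and real pages may format their dates differently.

```python
import re
from typing import List, Tuple

# The same keywords you would search for by hand.
KEYWORDS = ("date", "published", "publishdate", "modified", "datemodified")

# Assumption: dates look like 2020-02-07 or 2020-02-07T09:00:00.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}(?::\d{2})?)?")


def find_date_hints(html: str, window: int = 80) -> List[Tuple[str, str]]:
    """Return (date_text, preceding_snippet) pairs for dates that follow
    one of the keywords within `window` characters."""
    hints = []
    for match in ISO_DATE.finditer(html):
        snippet = html[max(0, match.start() - window):match.start()]
        if any(keyword in snippet.lower() for keyword in KEYWORDS):
            hints.append((match.group(0), snippet))
    return hints


if __name__ == "__main__":
    # Made-up source snippet; the phone number shows that stray digits are skipped.
    sample = (
        '<meta property="article:published_time" content="2020-02-07T09:00:00">'
        "<p>Call 555-0100</p>"
    )
    for found_date, context in find_date_hints(sample):
        print(found_date, "<-", context.strip())
```

Read the snippet next to each hit before trusting it, since a match could just as easily belong to a photo or an ad as to the article itself.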
The sheer inconsistency and potential for complications are what earned this strategy its place as the hardest one on our list. If it works well, you can find your answer quickly. If it doesn’t, well, you’ve got a lot of code to sift through.