Internet Search Tips (gwern.net)
344 points by hargup 9 months ago | 77 comments



To augment the suggestion around learning / using hot-keys, ctrl-shift-t (chrome) undoes the most recent tab or browser window close. It's insanely handy, but from casual observation, not well known / used.

If you have a mouse with some additional side buttons (that you don't already have mapped) I can strongly recommend mapping them to ctrl-pgup and ctrl-pgdn, so you can go left/right with your tabs in the current browser without having to relocate either your hands or your mouse pointer. Because I tend to open many links to new tabs (middle-click) - a consequence of living in a country (Australia) with poor internet speeds - I'm now lost whenever I have to use a mouse without those buttons.

Watching people use the mouse to go to the hamburger menu to recover a just-closed-tab, or to the tab bar to switch between tabs, or moving the mouse to the scrollbar to move the content up/down, is the contemporary equivalent of watching someone play Solitaire intentionally poorly.


This very closely mirrors my experience. I bought an MMO mouse to use in Photoshop but found that it was probably even more useful after mapping forward, back, switch/kill/resurrect tab and so on. I also configured normal click to “always open in the same tab/window” and open in new tabs exclusively with middle click. The efficiency increases are impressive.

Gwern’s tips for “quick searching” (which I implement using DDG !bangs and Firefox Saved Searches) are very useful too when you know you’ll use a website regularly; I have about 30 custom saved searches now.


Back when it still worked for most of the web (and before it became the progenitor of most contemporary browsers) KDE's Konqueror was my favourite and default browser.

It came with a bunch of built-in web search keywords that sped up your search intents (all customisable, of course). These were usually two- or three-letter prefixes terminated by a colon, so you could type, e.g., 'ggl: khtml history' (takes you directly to the first Google 'lucky' result) or 'wp: charles eaton' (searches and shows Wikipedia's top hit for Charles Eaton), etc.


Another option for people who don't have additional mouse buttons is to install addons for mouse gestures, such as Gesturefy [1]. You can then bind undo close tab, next/prev tab, and others to a mouse gesture, allowing you to do those actions without button hunting on the screen.

[1] https://addons.mozilla.org/en-US/firefox/addon/gesturefy/


> If you have a mouse with some additional side buttons (that you don't already have mapped) I can strongly recommend mapping them to ctrl-pgup and ctrl-pgdn, so you can go left/right with your tabs in the current browser without having to relocate either your hands or your mouse pointer.

These hotkeys are functionally the same as Ctrl+Tab and Ctrl+Shift+Tab, which - at least in my idle position - don't require moving your hands any meaningful amount.


Totally agree - if your hands are on the keyboard, then use the keyboard shortcuts, no movement required.

If your hand is on your mouse, though, this means you don't have to move that hand back. (I'm assuming left-handed mouse users won't find anything involving Tab to be hand-agnostic. : )

My usage pattern tends to be keyboard - search, tune search - then mouse - open lots of new tabs, then scroll up / down with mouse, and flick between those new tabs with the mouse, perhaps select some text with the mouse.

I think everyone agrees that frequent relocation between mouse / keyboard is sub-optimal.


Unfortunately, using the advanced search operators "too much" can get you banned from Google for a few hours, where you get an infinite series of CAPTCHAs. What counts as too much seems to vary widely, but I've triggered it with as few as one query for some obscure phrase using site: .

Google is definitely far worse for obscure things than it was a few years or a decade ago. 2010 is roughly when I started noticing it.


Yet another reason to employ DDG as your primary search engine. I've never been rate-limited by it.

(Google Web Search rate limits seemed to kick in around 2015 or so.)

You can still search Google (or metasearch) with bang queries, so, !S (for the Startpage Google proxy search) or !G (for google directly).

Google rate-limiting / CAPTCHA is vastly worse if you're on Tor or VPN. For the most part I simply avoid Google entirely.


I've seen two links to this domain in as many days and I absolutely can't stand the developer's treatment of links. I can't actually click on them, because just hovering them opens them in some ephemeral pseudo-popup, and I can't read the main article, because scrolling will invariably hover over some link which will then open that popup and cover up the article I was reading.

It's absurd to me that this apparently independent blogger's website is the best argument I've ever seen for "JavaScript should be disabled by default."


Like on Wikipedia, which we borrowed the design from, popups can be controlled & disabled with the 'gear' icon.

I'm not sure what you mean by "can't actually click on them". Popups are positioned away from the link and do not cover up the original link, which can be clicked on like normal. (You can also click on the obvious thing in the popup, the title, which is a hyperlink to the original.)

The scroll thing is definitely bad, but fixing it is a little difficult. (GreaterWrong solves it with a hybrid of 2 listeners which tries to guess whether you are mousing or scrolling instead of just on-hover.) I don't know if/when that will be fixed. In the meantime, I'm boosting the timeout since more than one person has complained about popups being too quick.


> Popups are positioned away from the link and do not cover up the original link, which can be clicked on like normal. (You can also click on the obvious thing in the popup, the title, which is a hyperlink to the original.)

This was absolutely not the case, at least at my viewport size. FWIW, I also didn't realize you could click on "the obvious thing in the popup", possibly because this is never a thing that you click on in websites.

> The scroll thing is definitely bad, but fixing it is a little difficult.

What's the motivation for this being a hover effect? Seems like clicking on the link to open it is a pretty well-tested paradigm.

[append]

One big difference between what I saw on your site and Wikipedia is that Wikipedia's are only previews and (AFAICT) feature no internal interactions.


If your viewport is so small that there's nowhere to position the popup without overlapping, there's not much that can be done... But I think in those cases, it's supposed to be switching to popins, so if you're still getting popups, that may be a bug. You'll need to be more specific with dimensions and screenshots.

(Alternately, you may be doing something weird which is not our fault. At this point, I can't rule that out since we've fixed most of the common bugs. A recent example: one person was complaining about how many clicks it took on his laptop... we were baffled how popups could require 3 clicks to open a link when it is carefully designed to let you click once as normal; and it turned out he failed to mention he was actually in mobile popin mode, because his browser reported no pointing devices, because when he unplugged and plugged back in his mouse during a desktop session, his Chrome somehow fails to register the replug and continues to claim to the CSS media-queries that he had no pointer device even as he's busy clicking on links! A bizarre scenario I would never have thought of, but true.)

> FWIW, I also didn't realize about clicking on "the obvious thing in the popup", possibly because this is never a thing that you click on in websites.

It totally is. Titles/headlines and section headings are self-hyperlinked all the time on the Internet (including WP), it's standard, and the links are further bolded, then underlined and get link icons in case that wasn't clear (just like all the other links on gwern.net).

> What's the motivation for this being a hover effect? Seems like clicking on the link to open it is a pretty well-tested paradigm.

Reduces friction, of course. How do you know if you want to see the whole thing? Every additional step or bit of friction greatly cuts down on use. It's the 1% law of UI. I'd make it eyetracking gaze-based if only I could...

> One big difference between what I saw on your site and Wikipedia is that Wikipedia's are only previews and (AFAICT) feature no internal interactions.

Yes, the logged-out WP previews are terrible. I'm glad we've been able to improve on them with internal links & recursive popups, although it took a ridiculous amount of work to get them right. (I still haven't figured out why they went for such limited hobbled previews, when the feature was modeled off Lupin's popups going all the way back to like 2004, which offered so much more and are invaluable to any WP editor. Like... why deliberately erase all of the links in the text? They aren't even available as an API option for that feature!)


Honestly, I followed shp0ngle's advice and disabled the popup behavior, which makes your site much more bearable to me.

> You'll need to be more specific with dimensions and screenshots.

My browser is Safari on Mac OS, using no external hardware. My viewport is 1059 x 751, as reported by window.innerWidth/innerHeight. Here is a 1-minute video which demonstrates my experience with the popups. https://seafile.cloud.cgamesplay.com/f/bd439c7761ec408c8f8a/

Anything else I could say would be purely subjective, and it's your website so you can do with it whatever you like. You've heard my feedback; thanks for being receptive to it.


Thanks for the details and video. That confirms what I thought playing around earlier: turns out, we don't have a max/min-height set, and so it's possible to have popups on screens where it's impossible for the popup to not overlap. (For some reason, the positioning is also almost as bad as possible when the popup must overlap, but that is a secondary issue.)

I think this can be easily fixed by adding a minimum height to the media-queries and falling back to popins... Obormot might want to do something different, so we'll see. But shouldn't be hard either way.


I actually like it, I just wish it wasn't so fast. The delay is less than a second when really it ought to be more like a second and a half.


You can disable those pop-ups.

Make one of them appear and click on the gear icon at the top right.


You conveyed exactly what I wanted to say. I don't remember reaching for 'Disable JavaScript' this fast for a website.


I'm unsure about other browsers, but Firefox has "find as you type" functionality built in. It lets you search the current webpage by simply starting to type. If the typed text matches a link, you can press Enter to follow it. This feature makes navigating and searching the current page a breeze and can greatly speed up your web browsing in general.

Here are settings related to the feature:

To enable it from about:config, you want to set accessibility.typeaheadfind to true. The timeout after which the search bar disappears again is set as a number of milliseconds in accessibility.typeaheadfind.timeout. The default of 5000 milliseconds might be excessive if you do not want the bar to be in the way during browsing. I'm very happy with 1500, which gives 1.5 seconds after the last keystroke to, e.g., start editing the search string before the search bar disappears again.

Edit: it looks like you can enable typeaheadfind in the preferences nowadays. Tweaking the timeout still requires going to about:config, though.
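
If you prefer keeping tweaks in a file rather than flipping them in about:config, the same prefs can also be set persistently via a user.js file in the Firefox profile directory. A minimal shell sketch (the profile path and the `enable_typeahead` helper are my own illustration; 1500 ms is the commenter's preferred timeout, not the default):

```shell
#!/bin/sh
# enable_typeahead PROFILE_DIR: append the two prefs discussed above
# to PROFILE_DIR/user.js so they persist across Firefox restarts.
enable_typeahead() {
  cat >> "$1/user.js" <<'EOF'
user_pref("accessibility.typeaheadfind", true);
user_pref("accessibility.typeaheadfind.timeout", 1500);
EOF
}

# Usage (profile directory name varies per install):
# enable_typeahead "$HOME/.mozilla/firefox/xxxxxxxx.default-release"
```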


I believe that you can trigger this with the forward-slash key, and close the bar with Enter, if you prefer not to have every keypress trigger a search.


The only time that it is inconvenient to have it enabled all the time is when a page wants to react to some keys in which case I need to manually disable it. I do not like to have to press a key to start searching during normal web browsing as "/" requires either two keys or reaching for the number pad in my keyboard layout, making it no better than Ctrl+F in my opinion (something, something, "Falsehoods programmers believe about keyboard layouts/shortcuts"? ;) See also [0]). It is simply a personal preference, I guess.

[0] https://news.ycombinator.com/item?id=26743028


Gwern mentions the benefit of being affiliated with a university so you can use ILL (in particular, ILLiad). I think it's worth mentioning some ways you can get ILL without having to be a student or professor. (Note that these are not actually going to be practical for most people, but for people who happen to be in the right situation to take advantage of them, they can be quite useful.)

1. Live in New York City. Yes, the New York Public Library seems to provide access to ILLiad! I haven't actually tried making use of this, but it's on the website. Obviously you're not going to move to New York just to take advantage of this, but if you happen to already live there, you have this option!

I expect there are other cities and non-university organizations that provide access to ILLiad; I mention New York just because it's one I know of. If other people know of others, I'd be glad to hear of them!

2. Get a position as a "visiting scholar" at a nearby university. :) OK, this one will likely require knowing someone there, and maybe having a PhD since there may be some minimum requirements, but generally if you can get a professor to name you as one, you can become a "visiting scholar" -- this isn't a job, they don't pay you anything and you're not required to do anything for them, and as such there's no hiring process; you just get named one if you meet the minimum requirements (they may want to see some sort of CV also). They won't pay you any money, but you will get library access, including ILL! So, y'know, that's useful. :)

Obviously that route isn't open to everyone either. But depending on your situation it can certainly be easier than enrolling as a student or getting a job as a professor!


I'm not familiar with ILLiad, but King County Library System[0] in Washington state has excellent interlibrary loans. I have received books from libraries all over the US, Canada, prisons, museums, and even the Library of Congress.

Another option is that some university libraries have a membership program; you can borrow books and place interlibrary loans for an annual fee.

[0] https://kcls.bibliocommons.com/


A neat trick not mentioned in the article is to use https://www.connectedpapers.com/

The website presents a graph of related works clustered by similarities.


https://scite.ai does this as well (citations are also classified and analyzed according to whether they provide supporting or contrasting arguments)

The scite extension also works with connected papers so you can see that info there as well.

Disclaimer: I work on scite


Thanks for the article, this reminded me of GHDB; https://www.exploit-db.com/google-hacking-database


This is interesting, thanks for sharing.


I want to add a suggestion to the hotkey shortcuts section: I use the Chrome/Firefox addon SurfingKeys [0] with my own configuration [1] in which I’ve added search engine auto-suggestions for just over 50 sites. So, for example, to start searching Google Scholar I type `ags`, or to search GitHub I type `agh`. Check out the screenshots [2] to see what I mean.

I’m currently working on cleaning up the code and making installation as simple as pasting a GitHub release URL into the SurfingKeys settings. I hope to have this done within a week or two.

[0]: https://github.com/brookhong/Surfingkeys

[1]: https://github.com/b0o/surfingkeys-conf

[2]: https://github.com/b0o/surfingkeys-conf#screenshots


This sounds similar to the !bangs in DuckDuckGo, like !yt for YouTube or !gi for google image search. It’s one of my favorite features.


Yeah, it’s essentially the same concept, but with suggestions shown immediately for the context you’re searching in. For example when searching Wikipedia you’ll get snippets of articles and thumbnail images. You can even access DDG bangs using `aD!<bang>`. I’m also looking into adding first class support for DDG instant answers.


Great summary of tips

Long ago I realized the only reason I have a job is my ability to google stuff lol


I feel the same. Just taking the time to search and leave no stone unturned makes so many things much easier.


The site is so beautifully designed that it distracts me into looking around at how it has been implemented rather than at the main topic - search tips. That said, the font could have been better for readability :)


Push the reader button in the address bar?


In the Brave browser, reader mode is not available if Shields is enabled for the site, so I had to disable Shields to see reader mode. Not intuitive, but there is a workaround.


What is their reasoning for blocking reader mode by default?


I think reader mode needs scripts to run. With Shields enabled, it looks like the reader-mode script gets blocked.


I miss the days when Google was AND search always and by default.

Now they're terrified of not returning any results. Even when there aren't any results, they return a page full of ads that looks like results - at some point in the last dozen years, the empty "no results found" google page bit the dust.


To be fair, 20 years ago Google was used by an educated minority, and now it's the default interface to all human knowledge by every person in the world. It's a completely different product now, with a very different, and vastly larger, customer base.

As noted by many other people, Google's complete dominance of web search for a decade makes the absence of any serious attempts at competition notable. If VCs are profit-maximizing, we should be seeing a new Cuil like, every month. Search advertising is a huge market, and is super profitable! Why isn't anyone trying to capture some of it? If Google is so bad now compared to some imagined heyday, then why is Bing also bad, despite the money Microsoft has spent on it?


Great set of tips.

On a side note, I wish the site had a simple, easy-to-read font option similar to the switchable light/dark mode.


Maybe your browser's reader mode fits the bill? Firefox lets you choose serif/sans-serif fonts, font size, line spacing and background color, for example.


I hesitate to admit this, because it makes me look stupid, but you are right.

I should remember to use the reader mode more often.


It is easy to forget a tool which one rarely needs; there is nothing stupid about that in my opinion :)


True :)


Well, that's obvious, but really it's a shame to force reader mode on such a beautiful site. And the fonts there seem to have been selected with some purpose; it's just that font hinting on Windows makes them look off. They probably look good on a Mac.


Fravia’s searchlores for the 21st century!


Way too few Latin mottos to be even close to SL /s


Searching the Internet Archive for books is really great. I prefer books to online reading, and oftentimes I can get a book I'm interested in via interlibrary loan if I am willing to wait. If I can read the first chapter online on the Internet Archive, I know whether it's worth the wait to read the actual book. I see it mentioned quite often on HN, but when I mention it to people in general I don't get the same response as when I mention, say, Google or Wikipedia. I would not say IA is on par with those in terms of how often I use it, but it definitely offers something significant that is not offered elsewhere as far as I can tell.


Off-base question, I suppose inspired by porch sitting + reading Gwern's case studies: https://gwern.net/Search#case-studies

I think I would have found most of the examples Gwern listed, maybe not as quickly. I go wild on Google iteratively before jumping to another search engine.

But is there a tournament or contest along the lines of 'producing some result via searches' quicker than others? I'm thinking a form of this might exist at DEF CON/THOTCON/similar.

Ironically, instead of searching, I'm asking here haha


I've idly daydreamed about this becoming a sport like esports. Huge crowds cheering as I find that obscure English ePub on a Chinese site; they roar with delight as I DDG !bang their faces off.

Monster Energy sponsoring the fastest ‘search parent directory’ surfers.

Ahhh if only.

Also while I’m at it I’d like ‘kicking stones against other stones at juuuuuust the right angle to ping against another stone while walking leashed dogs’ to become an Olympic sport.


> but, is there a tournament or contest along the lines of 'producing some result via searches' quicker than others? im thinking a form of this might exist at defcon/thotcon/similar

If there is, it will definitely have deliberately horrible SEO.


I see Gwern doesn't encourage using operators in DDG.

Maybe that's because DDG ignores them.



We're both correct. They support and ignore operators.

"cats and dogs": Results for exact term "cats and dogs". If no results are found, we'll try to show related results.

The entire point of using quotes (or + back in the day) is to limit the results to the search term. Fluffing up the results with stuff we didn't ask for forces us to consider and disregard each one of those unasked-for results - until we get frustrated and go to Google.


I really dislike that kind of search because I'm forced to scan the results to see if they really contain what I'm looking for. Try searching "arm" with anything programming related term and count the articles about armchairs, it's infuriating.


That's not ignoring. If any results exist with the full quote, then it will only show you results with the full quote.

Also google does the same thing all the time.

Also, I just tried to get that behavior on DuckDuckGo and it didn't do it. For my test quote, with three words that appear together on many pages but never adjacent, it just said no results found.

But if it works like google, it will tell you at the top of the page when there were no real results and it's serving fallback results. Look for that message instead of looking for a lack of results.


Google does this shit too, to a slightly lesser extent.


This is painful to read. Instead of DDG noticing how much Google sucks at respecting operators and beating them at it, scoring praise from power users, they somehow managed to be even worse than Google.


Google once became as bad as DDG is. About 3 years ago they unwound that, a fair amount. Presently, I get some G searches w/ no results - which is helpful.


An excellent set of (re)search methods, many of which I'm well familiar with. A few notes and additions:

- There are numerous public-domain full-text archives, including Project Gutenberg, Internet Archive, and many small specialised library collections (usually focused on a given topic, e.g., Online Library of Liberty). Less useful for post-1925 materials, but often high-quality renderings (either scans or proofread re-typeset / typed-in documents) available. Google Books also allows full PDF downloads for public-domain works, generally.

- NYPL's Secretly Public Domain project has been reviewing copyright renewal records to find works published from 1923 up to 1964 whose copyright was never renewed. Other projects (Internet Archive, notably) have been flagging these works as being in the public domain, and hence freed of any download restrictions.

https://www.nypl.org/blog/2019/05/31/us-copyright-history-19...

https://www.nypl.org/blog/2018/03/30/unlocking-record-americ...

- OpenLibrary / Internet Archive increasingly have current under-copyright books available for at least 1-hour and up to 14-day loans. The reader is less elegant than it was in the past, but is viable.

- HathiTrust is all but useless with its download restrictions. It's helpful to determine if records exist.

- Worldcat gets only a brief mention by Gwern. It's a union catalog (a combined library catalog of a vast number of libraries worldwide), of books, articles, and other document types, and is an excellent way of determining if a book exists, what an author's output is, and/or the documents within a given search space. !worldcat DDG bang search, "ti:" is title, "au:" is author, "kw:" is keyword. Space any colons (":") occurring within search terms, or omit them entirely. You'll still have to either find the digital record elsewhere, or track down a library, but quite useful.

- You can save online materials to the Internet Archive using the 'save' URL:

   https://web.archive.org/save/<original_url>
So to save this particular HN discussion we'd specify:

   https://web.archive.org/save/https://news.ycombinator.com/item?id=26847596
You can submit that through any HTTP client (curl, wget, lynx, w3m, your GUI browser, etc.). Requests can be trivially scripted and batched.

This is ... documented somewhere (I stumbled across it myself), though I'm not finding the specifics. Related "save page now" functionality is mentioned here: https://blog.archive.org/2019/10/23/the-wayback-machines-sav...
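
The "trivially scripted and batched" part might look something like the following minimal shell sketch. The `save_url`/`batch_save` helpers and `urls.txt` input file are my own illustration, and the 5-second sleep is a guessed-at polite pace, not a documented rate limit:

```shell
#!/bin/sh
# save_url URL: build the Wayback Machine "save" endpoint for a page.
save_url() {
  printf 'https://web.archive.org/save/%s\n' "$1"
}

# batch_save: submit every non-empty line of urls.txt for archiving.
batch_save() {
  while IFS= read -r u; do
    [ -n "$u" ] && curl -s -o /dev/null "$(save_url "$u")" && sleep 5
  done < urls.txt
}
```

For example, `save_url 'https://news.ycombinator.com/item?id=26847596'` prints the same save URL shown above.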

- Motorised paper cutters are available at some photocopy shops. Inquire as to whether you can have books debound with them. (Generally anything resembling paper is fine, though the blades can be damaged by metal or other materials.)


Quite relevant and useful. For articles that are blocked by a paywall I usually search the article URL on www.archive.org and there's usually an unblocked one there.


Some neat stuff here. Only VERY briefly mentioned (so briefly I missed it at first) however: Substituting Yandex for Google is great for many use cases. Being Russian, Yandex is no doubt heavily censored, but only for things important to Russian politics. Ironically, this means that for non-Russian users, it's considerably LESS censored than Google, which has SEVERELY crippled its search in recent years in the name of politics, "politics," DMCA madness, "right to be forgotten" etc.

The image search is especially impressive. Remember when Google Images used to give you actual results when trying to find the source of an obscure image? Yandex still does, and it does a bunch of other neat things too, like automatically trying to transcribe text from an image if it's text-heavy. My instinct is that a lot of this capacity exists in Google Images, but is either mostly hidden from the user or deliberately hobbled to stop the oh so evil content pirates.

Zero privacy of course. Assume the Russian government is watching in realtime as you hammer in another inane search. But for some use cases that's fine.


Bing is also significantly less censored, especially for adult content, but it also has a smaller index than Google and fewer operators.


Mentioned in the article, but Yandex also does some degree of facial recognition (or at least similarity) when doing reverse image searches. It works surprisingly well.


I haven't used Yandex much, but I started noting down failed search results by search engine in the hopes of one day creating a 'Search Engine Wall of Shame' to provide genuine feedback to the search engines on the areas they could improve[1]. I'll try Yandex for such queries too.

[1] For those interested in a 'Search Engine Wall of Shame', the URL for the discussion is in my profile (#207).


Is there a search portal that does backend-side searches of all these politically-disjoint large search providers, and then merges and deduplicates the result?


Hah, that was a big thing in the mid-'90s. And, remarkably, this site still seems to be going strong: https://www.dogpile.com/

I thought it was a really dumb name at the time, but I remembered it some 20 years later.


I have actually been working on a prototype for something similar to the "meta-search" engines of the 90s, but the intent is to deal with the present situation of search engines placing limits on number of results, and to rid SERPs of needless cruft (Javascript, CSS, advertising, etc.) making them look more standardised according to personal taste. I never really liked meta-search engines because they were too slow to return results. Instead this script lets me query search engines directly, one or more at a time, in succession, or all at the same time, and merge the results into simple, aesthetically-pleasing HTML files.

I use a text-only browser to read HTML so what I am describing here is not designed with "modern" browsers in mind.

The approach I take to avoid search result limits is that I search from the command line and store the result URLs in standardised "search results files" in a "search directory", one file per unique query. When I reach the results limit for the particular search engine, I can repeat the search on another search engine. The search results file is created for two reasons: 1. it allows me to strip out all the cruft from SERPs and mix results from different search engines into one file that looks great in a text-only browser, and 2. it tells me where I left off for each search engine, so I can continue any search at a later time, i.e., get more results.

The script reads from the search directory and presents me with a menu of numbered searches. I continue a search at any time by selecting a number. For example:

   1 this is an example  
   2 foo
   3 foo bar
   4 foo bar baz
I typically browse results by pointing the text-only browser at the search directory.

The search result files are each named according to the URL-encoded search string. The file format is very simple. There are three types of lines: 1. a title tag for the search query, 2. a link for each result URL, and 3. an HTML comment for each HTTP request, indicating to the script where I left off, i.e., the last result number; this is more or less equivalent to a "Next page" or "More results" link. Each result URL and comment is prefixed with a search engine identifier, i.e., a prefix. Thus the script can read the search results file comments and I can easily see which results came from XYZ search engine versus ABC search engine.

  <title>this is an example</title>
  <!-- X q=this%20is%20an%20example -->
  X <a href="https://example.com">https://example.com</a><br>
  <!-- X q=this%20is%20an%20example&p=2 -->
  <!-- A q=this%20is%20an%20example --> 
  A <a href="https://example.net/index.html">https://example.net/index.html</a><br>
  <!-- A q=this%20is%20an%20example&start=2 -->
This example search results file above shows 1 result retrieved from XYZ search engine (prefix "X") and another result retrieved from ABC search engine (prefix "A"). It shows comments indicating where to continue the search for each search engine. Normally there would be around 50-100 result URLs per HTTP request.

This approach assumes the search result limits are temporal, i.e., they are limits on how many results can be retrieved in a given period. That may not be true for every search engine.
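
As an illustration of reading those continuation comments (my own sketch, not the parent's actual script, and assuming the comments in the real file start at column 0 with possible trailing spaces), extracting the "where I left off" entry for a given engine prefix could look like:

```shell
#!/bin/sh
# next_query PREFIX FILE: print the query string of the last
# HTTP-request comment for engine PREFIX (e.g. X), i.e. the point
# from which that engine's search can be resumed.
next_query() {
  grep "<!-- $1 " "$2" | tail -n 1 | sed -e "s/^ *<!-- $1 //" -e 's/ *--> *$//'
}
```

Running `next_query X results-file` on the example file above would print `q=this%20is%20an%20example&p=2`, the next request for search engine X.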


"Zero privacy of course."

Surely this is not suggesting that Google searches afford any privacy. ;)

While it might not be the Russian government who is watching, Google searches are certainly not ephemeral nor free from real-time analysis. Aside from "things important to Russian politics", I would guess that Google, on behalf of its customers (who could be anyone, including governments), is far more interested in what someone is searching for than the Russian government is.

The point I am getting at here is that there is privacy from a government and there is privacy from a company. Each is a form of privacy, but only the company is in the business of commercialising the information it derives from violating privacy. (Not to mention that, at least in the US, the government is subject to a body of privacy law that does not apply to companies.)

Both the government and the company may violate privacy in the interests of staying in power. They could, e.g., suppress certain information when it is in their interests. However only the ad services company has the additional motivation to collect information to generate profits. The government may believe it has no choice but to monitor web search. The company OTOH freely chooses to monitor web search, as a business.

Anyway, here is a question I have about Yandex.

Google, Bing and other search engines such as DuckDuckGo now limit the number of results that can be retrieved in one session. Based on personal observation, with Google the ceiling is currently 300; with Bing and DDG it's more like 250.^1 I wonder if Yandex is doing the same.

These limits by Google, Bing, DDG, etc. are eliminating "discoverability" via searching the web. "General" searches that would yield more than 300 results will not return more than 300 results; collecting large numbers of results from a general search is effectively prohibited. Google's idea of discovery is "I'm feeling lucky". Of all the silly changes Google has made to search, that most useless feature is the one that persists.

Perhaps "broad" searches do not benefit an online ad services business as much as more specific searches do. Limiting total results also puts more pressure on websites to try to be listed within the first 300. (Solution: buy ads from Google.) A website at position 301 is undiscoverable thanks to Google's inexplicable truncation. Interestingly, on some, perhaps all, of their different UIs, Google no longer numbers results.

1. To illustrate the truncation, try a search for a common string that would appear in the <title> tag of more than 300 pages on the web: https://www.google.com/search?q=intitle:[common string]&num=100&filter=0
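To make the footnote concrete, here is a small Python sketch that only builds the sequence of paginated request URLs, using Google's standard q/num/start/filter query parameters (whether the server actually returns pages past the ~300-result ceiling is exactly the question at issue):

```python
from urllib.parse import urlencode

def result_page_urls(query, limit=300, per_page=100):
    """Build the search-result URLs needed to page through up to
    `limit` results, `per_page` at a time, in the style of the URL
    in footnote 1 (start= gives the result offset of each page)."""
    return [
        "https://www.google.com/search?"
        + urlencode({"q": query, "num": per_page,
                     "start": start, "filter": 0})
        for start in range(0, limit, per_page)
    ]

pages = result_page_urls('intitle:"home page"')
# three URLs, at offsets start=0, 100, and 200 — enough requests to
# reach the observed ~300-result ceiling
```

Fetching the page at start=300 is what reveals the truncation: the request is well-formed, but the result list simply comes back empty.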


Google uses search data to improve search and show you ads that you're more likely to click on. You can also use incognito/private mode to opt out of this.

Russia uses search data for things like applying pressure to political dissidents, blackmailing family members, harassing and attacking journalists etc. You can't easily opt out of this if you or your family is in Russia.

I consider the Google level of privacy completely acceptable, the Russian one not acceptable.


> Surely this is not suggesting that Google searches afford any privacy. ;)

Depends on who you want privacy from.

Google's entire business model relies on them being the only company on the planet who knows as much about me (or you, or any other reader) as they do. I know that anything I do while using their services is going to be logged and analysed by machines like crazy for the rest of time, but I'm also reasonably confident that they're not going to let the interns page through this data whenever they get bored, and I'm very confident that they're going to do everything in their power to prevent my complete browser history from ending up for sale on some data breach forum.


In some cases, I can get over 400 results with Google, but never more than 500.


Is there something I could self-host that would perform as well as Yandex?


Should just link to https://gwern.net/Search - the URL works fine, and the IA version has various glitches like the link icons.



The IA version is also over a year out of date (2020-01-21 vs 2021-03-29)


Comparing the archive date with the modification date, this was recently updated. Considering the various features the site has, is there a way to compare with older versions (as is possible in a wiki)?


No, sorry. Ever since I blew through the GitHub size limits, I haven't found a good static Hakyll-compatible way to present history, especially histories with lots of tiny edits. (As I keep experimenting with new approaches, I have to do lots of mass edits to old pages to update them.)



