scy, to Bash
@scy@chaos.social

Friendly reminder that

#!/bin/sh
set -e

is better than

#!/bin/sh -e

because "sh script.sh" ignores the shebang line, so the -e in the latter is silently dropped unless you run the script as "./script.sh".
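
A minimal sketch of the difference, assuming a POSIX sh and mktemp; the file names shebang.sh and body.sh are made up for illustration. Running "./script.sh" makes the kernel start "/bin/sh -e script.sh", which the sketch imitates with "sh -e" so that it does not depend on the temp directory allowing execution.

```sh
#!/bin/sh
# Compare "-e in the shebang" with "set -e in the body".
tmpdir=$(mktemp -d)

# Variant A: -e appears only in the shebang line.
cat > "$tmpdir/shebang.sh" <<'EOF'
#!/bin/sh -e
false
echo "still running"
EOF

# Variant B: set -e is part of the script body.
cat > "$tmpdir/body.sh" <<'EOF'
#!/bin/sh
set -e
false
echo "still running"
EOF

# "sh script.sh": the shebang is just a comment, so variant A keeps going.
shebang_via_sh=$(sh "$tmpdir/shebang.sh")

# Roughly what "./script.sh" turns into for variant A: -e is applied.
shebang_direct=$(sh -e "$tmpdir/shebang.sh" || true)

# Variant B stops at "false" even when run as "sh script.sh".
body_via_sh=$(sh "$tmpdir/body.sh" || true)

rm -rf "$tmpdir"

echo "A via sh:     '$shebang_via_sh'"
echo "A via ./:     '$shebang_direct'"
echo "B via sh:     '$body_via_sh'"
```

Only the first line shows "still running"; the other two are empty because the script exits at "false".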

scy, to Bash
@scy@chaos.social

TIL that you can

. <(some_command)

in bash to read bash-formatted variable assignments into the current environment. In other words, the dot ("source") command supports reading from process substitution.

some_command | . /dev/stdin

on the other hand does not work, I guess because it's running in a subshell…?

Replace some_command with something like echo foo=bar if you don't quite understand what I mean.
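
A small sketch of both forms, using echo foo=bar as suggested above. It assumes bash is installed and a bash recent enough to source from process substitution; each form runs in its own "bash -c" so the result does not depend on the calling shell. By default bash runs every part of a pipeline in a subshell (the lastpipe option is off), which is consistent with the guess above: the assignment happens, but in a process that exits right away.

```sh
#!/bin/sh
# Process substitution: "." runs in the current shell, so foo survives.
from_procsub=$(bash -c '. <(echo foo=bar); echo "foo=${foo-unset}"')

# Pipeline: ". /dev/stdin" runs in a subshell, so foo is gone afterwards.
from_pipe=$(bash -c 'echo foo=bar | . /dev/stdin; echo "foo=${foo-unset}"')

echo "process substitution: $from_procsub"
echo "pipeline:             $from_pipe"
```

The first line reports foo=bar, the second foo=unset.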

jason, to linux
@jason@toots.dgplug.org

Just got done reading Command Line Interface Guidelines, by Parish, Prasad, Fishman and Tashian.

Very pragmatic, very well written.

As someone who is interested in writing CLI programs, this was extremely helpful. Well worth the read.

Read it here: https://clig.dev/

h/t to @a13cui for the recommendation.

#CLI #Linux #ShellScripting

remixtures, to ai
@remixtures@tldr.nettime.org

"AI tools empower technical writers with scripting capabilities, whether it be shell scripts, Python scripts, CLIs available at your work, or more. In particular, shell scripting can help you automate parts of your build process that are tedious, making it easier to push docs through advanced build and publish processes. In a world of doc ops, where continuous builds and publishing are becoming the norm, tech writers need as much automation as possible with these processes."

https://idratherbewriting.com/learnapidoc/ai-tools-build-publish-api-docs.html

dredmorbius, to random

Hacker News front-page analytics

A question about what states were most-frequently represented on the HN homepage had me do some quick querying via Hacker News's Algolia search ... which is NOT limited to the front page. Those results were ... surprising (Maine and Iowa outstrip the more probable results of California and, say, New York). Results are further confounded by other factors.

Thread: https://news.ycombinator.com/item?id=36076870

HN provides an interface to historical front-page stories (https://news.ycombinator.com/front), and that can be crawled by providing a list of corresponding date specifications, e.g.:

https://news.ycombinator.com/front?day=2023-05-25

Easy enough.
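
A sketch of how such a list of date specifications could be generated. This is not the crawler actually used here; front_urls is a hypothetical helper, and the date arithmetic assumes GNU date (the -d option).

```sh
#!/bin/sh
# front_urls START COUNT
# Print COUNT front-page URLs, one per day, starting at START (YYYY-MM-DD).
# Assumption: GNU date, for "-d" relative-date parsing.
front_urls() {
    start=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        day=$(date -u -d "$start +$i days" +%Y-%m-%d)
        printf 'https://news.ycombinator.com/front?day=%s\n' "$day"
        i=$((i + 1))
    done
}

front_urls 2023-05-25 3
```

The output can be fed to a fetch loop that pauses between requests, which matters given the rate limiting.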

So I'm crawling that and compiling a local archive. Rate-limiting and other factors mean that's only about halfway complete, and a full pull will take another day or so.

But I'll be able to look at story titles, sites, submitters, time-based patterns (day of week, day of month, month of year, yearly variations), and other patterns. There's also looking at mean points and comments by various dimensions.

One surprise: as of January 2015, The Guardian is among the most consistently highly voted sites. I'd thought HN leaned consistently less liberal.

The full archive will probably be < 1 GB (raw HTML), currently 123 MB on disk.

Contents are the 30 top-voted stories for each day since 20 February 2007.

If anyone has suggestions for other questions to ask of this, fire away.

And, as of early 2015, top state mentions are:

 1. new york:         150
 2. california:       101
 3. texas:             39
 4. washington:        38
 5. colorado:          15
 6. florida:           10
 7. georgia:           10
 8. kansas:            10
 9. north carolina:     9
10. oregon:             9

NY is highly overrepresented (NY Times, NY Post, NY City), likewise Washington (Post, Times, DC). Adding in "Silicon Valley" and a few other toponyms boosts California's score markedly. I've also got some city-based analytics.

#hn #hackernews #data #DataAnalysis #WebCrawling

dredmorbius,

I want to test some reporting / queries / logic on a sample of the data.

Since my file-naming convention follows ISO-8601 (YYYY-MM-DD), I can just lexically sort those.

And to grab a random year's worth (365 days) of reports from across the set:

ls rendered-crawl/* | sort -R | head -365 | sort

(I've rendered the pages, using w3m's -dump feature, to speed processing).

The full dataset is large enough and my awk code sloppy enough (several large sequential lists used in pattern-matching) that a full parse takes about 10 minutes, so the sampling shown here speeds development better than 10x while still providing representative data across time.
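
The sampling step can also be sketched without parsing ls output. This variant assumes GNU shuf is available; sample_reports is a hypothetical helper name, and it relies on the same ISO-8601 file naming so that a plain sort restores chronological order.

```sh
#!/bin/sh
# sample_reports DIR N
# Print N randomly chosen paths from DIR, sorted lexically
# (which is chronological for YYYY-MM-DD file names).
# Assumption: GNU shuf; file names contain no newlines.
sample_reports() {
    dir=$1
    n=$2
    # The shell expands the glob; printf emits one path per line.
    printf '%s\n' "$dir"/* | shuf -n "$n" | sort
}
```

For example, sample_reports rendered-crawl 365 picks a random year's worth of reports from across the set.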

dredmorbius,

HN Front Page / Global Cities Mentions

One question I've had about HN is how well or poorly it represents non-US (or even non-Silicon Valley) viewpoints and issues.

Pulling from the Globalization and World Cities Research Network list, the top 50 global cities names appearing in HN front-page titles:

  1   191  San Francisco
  2   164  London
  3   117  Boston
  4    86  Seattle
  5    60  Tokyo
  6    58  Paris
  7    56  Chicago
  8    56  Hong Kong
  9    55  New York City
 10    50  Berlin
 11    50  Phoenix
 12    45  Rome
 13    40  Detroit
 14    36  Singapore
 15    31  Vancouver
 16    30  Los Angeles
 17    27  Austin
 18    23  Beijing
 19    20  Dubai
 20    19  Shenzhen
 21    19  Toronto
 22    17  Amsterdam
 23    16  Copenhagen
 24    16  Houston
 25    16  Moscow
 26    15  Atlanta
 27    14  Barcelona
 28    14  Denver
 29    13  Baltimore
 30    13  San Jose
 31    13  Stockholm
 32    12  San Diego
 33    12  Sydney
 34    11  Cairo
 35    10  Munich
 36    10  Wuhan
 37     9  Helsinki
 38     9  Miami
 39     9  Mumbai
 40     9  Philadelphia
 41     9  Shanghai
 42     9  Vienna
 43     8  Montreal
 44     7  Beirut
 45     7  Dublin
 46     7  Istanbul
 47     6  Bangalore
 48     6  Dallas
 49     6  Kansas City
 50     6  Minneapolis

(Best viewed in original on toot.cat.)

Note that some idiosyncrasies affect this, e.g., "New York City" appears rarely, whilst "New York" may refer to the city, state, any of several newspapers, universities, etc. "New York" appears 315 times in titles (mostly as "New York Times").
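
To make the substring problem concrete, here is a sketch of one way such counts could be produced. It is not the awk code used for the numbers above; count_mentions is a hypothetical helper, and the two input files (one name per line, one title per line) are assumed formats.

```sh
#!/bin/sh
# count_mentions NAMES TITLES
# For each name in NAMES, count the lines of TITLES that contain it
# (case-insensitive substring match). Prints "count<TAB>name",
# highest count first.
count_mentions() {
    awk '
        NR == FNR { names[++n] = $0; next }   # first file: the name list
        {
            title = tolower($0)
            for (i = 1; i <= n; i++)
                if (index(title, tolower(names[i])) > 0)
                    count[i]++
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%d\t%s\n", count[i], names[i]
        }
    ' "$1" "$2" | sort -k1,1nr
}
```

With plain substring matching, a title such as "The New York Times on startups" counts toward "New York" but not toward "New York City", which is the kind of skew described above.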

I've independently verified that, for example, "Ho Chi Minh City" doesn't appear, though "Ho Chi Minh" alone does:

https://news.ycombinator.com/item?id=15374051, on the 2017-09-30 front page: https://news.ycombinator.com/front?day=2017-09-30

So apply salt liberally.

Edits: tyops & speling.
