I’ve had regular expressions on the mind lately. No idea why. But any time I see a string of text that clearly adheres to a standard (a serial, asset tag, etc.) I end up thinking about a regex that would validate it.
These two characters contain limitless power. They are every book ever written. Every thought that has ever been conjured. Every joy and every regret. Every victory and every defeat. Every whisper between lovers. They are you, me, and all of human existence. Let us propel them into deep space so one day some distant future civilization might witness them and know all that is Us. #regex
TIL about Python regex named capture groups and the groupdict() method of the Match class. So I guess I’ll be committing more regex crimes. #python #regex
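For anyone who hasn't seen the feature, here is a minimal sketch of named capture groups and `groupdict()`. The asset-tag format (two letters, a hyphen, six digits) and the names `ASSET_TAG`, `site`, and `serial` are made up for illustration and are not from the post.

```python
import re

# Hypothetical asset-tag format: two uppercase letters, a hyphen, six digits.
ASSET_TAG = re.compile(r"(?P<site>[A-Z]{2})-(?P<serial>\d{6})")

match = ASSET_TAG.fullmatch("NY-004217")

# groupdict() returns a dict mapping each group name to the text it captured.
fields = match.groupdict() if match else {}
print(fields)  # {'site': 'NY', 'serial': '004217'}
```

The named groups make the calling code read `fields["serial"]` instead of `match.group(2)`, which tends to survive later edits to the pattern better.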
Solved a *problem with #regex with more regex today.
*Not actually a problem with the regex itself, but one of unclear business requirements. Still, to anyone who said I'd regret the DNS regex I wrote a month later: I ate that soup today and it honestly wasn't bad.
The weird thing about Regex escaping is you only need to escape the first part of a valid Regex pair if you’re looking to use that character. For example, look at the following regex. #dotnet #regex
@profoundlynerdy #Perl and, even more so, #Rakulang have unmatched #regex capabilities and #Unicode support for text processing. You can match very special text patterns, especially regarding multiple languages or scripts.
Example: find all numbers in a text in Arabic, Roman, Chinese or some other form but not in mixed scripts and with at least 3 graphemes (not characters, not codepoints, not bytes) @Perl
Reply to a similar #Vim video, see the video description for a link.
It's about reformatting strings of text via #regex interactively with a live preview of the result. visual-regexp.el to the rescue, this is quite nice in Emacs.
understanding regular expressions enough to get real work done with them is a double-edged sword: it’s a power verging on magic, but it also lures you down the time-suck/rabbit-hole. #regex #grep @bbedit
My manager: "I need to understand something. I sent you that big file of messy, regurgitated data and you gave me back a nice clean CSV, sorted and filtered. Could you send me the script you used to do that?"
Me: "Oh, but I don't have a script."
Him: "Then how did you do it?"
Me, very proud: "It's the power of REGEX!"
I love regexes. Regexes solve everything! Hey, I know, I'll write an HTML parser in regex!
The thing about coding with #regex is that it feels like I'm getting paid to do Sudoku puzzles for a living.
Tip for those who are asked to review code with regex: Rather than focusing on the regex itself, ask to see the automated tests that it is run against and look for gaps in the tests. Don't get lost in the weeds scrutinizing the regex itself unless there's an obvious significant performance problem.
@jgillich Saying you won't ever learn/use #regex is like saying "I refuse to learn how to drive a stick shift", which might get you through a driver's test, but it's going to be awkward some day when you need to borrow a car to get somewhere and it's a stick and now you have to try to figure out then and there how not to burn out someone else's clutch.
@vwbusguy My advice is essentially the opposite. Focus on the #regex, at least to get started. Regexes are code. Just like any other programming language, you have to learn the syntax and practice a bit, but the same principles apply as with program code in general.
When reviewing code, start by reading it. If there's something unclear, ask about it. Don't accept a regex consisting of 100 characters in one line without a single space. Compared to most other languages, regex syntax is terse: Few (if any) keywords, lots of symbols. Divide complex regexes into simple parts that are assembled into bigger constructs. You probably wouldn't accept a patch that adds hundreds of lines of unfactored code that has complex logic and nested loops, but no indentation or whitespace and no functions, so why write your regexes this way?
If your language builds regexes from strings, use string concatenation, formatting/indentation, comments, and named variables to make the structure of the pattern clear. If your language has the /x modifier, use it to allow sensible formatting and comments right in the regex (remember to escape with `\ ` or `[ ]` any spaces that should match literally). If your language supports (?(DEFINE)...) and the (?&foo) syntax for named "regex subroutines", consider using it (but also consider restructuring your code: it might be trying to do too much in a single regex).
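As a rough illustration of the /x advice: Python spells that modifier `re.VERBOSE`. The log-line format and the names `LOG_LINE`, `date`, `level`, and `message` below are assumptions made up for this sketch. It shows both ways of writing a space that must match literally.

```python
import re

# Hypothetical log-line format, e.g. "2024-05-01 ERROR disk full".
LOG_LINE = re.compile(
    r"""
    (?P<date>    \d{4}-\d{2}-\d{2} )    # ISO-style date
    [ ]                                 # literal space, written as a character class
    (?P<level>   INFO | WARN | ERROR )  # severity keyword
    \                                   # literal space, escaped with a backslash
    (?P<message> .+ )                   # rest of the line
    """,
    re.VERBOSE,
)

parsed = LOG_LINE.fullmatch("2024-05-01 ERROR disk full")
print(parsed.groupdict())
# {'date': '2024-05-01', 'level': 'ERROR', 'message': 'disk full'}
```

Without the `[ ]` and `\ ` lines the pattern would silently stop requiring the separators, because bare whitespace is ignored in verbose mode.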
Once you understand the structure of the regex and how it is meant to work, it becomes much easier to review the tests: Are there any? Do they cover every input variant, exercising all parts of the regex, both matching and failing? (Failing matches are also relevant for finding performance issues: If a regex finds a match, it usually does so quickly. But a regex with exponential backtracking can take forever to fail because it'll try a huge number of variations before giving up on a string that doesn't match.)
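One way to make that coverage visible in review is a small table of inputs with the expected outcome next to each, including the ones that must fail. This is only a sketch: the pattern `DOTTED_QUAD`, the cases, and the helper `run_cases` are hypothetical, not taken from any real code base.

```python
import re

# Hypothetical pattern under review: four dot-separated groups of 1-3 digits.
DOTTED_QUAD = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

# Each case pairs an input with whether the pattern is expected to match it.
CASES = [
    ("192.168.0.1", True),
    ("10.0.0", False),      # too few groups
    ("10.0.0.0.1", False),  # too many groups
    ("1234.1.1.1", False),  # group too long
    ("a.b.c.d", False),     # not digits
    ("", False),            # empty input
]


def run_cases(pattern, cases):
    """Return the inputs whose match result differs from the expectation."""
    return [
        text
        for text, expected in cases
        if (pattern.fullmatch(text) is not None) != expected
    ]


failures = run_cases(DOTTED_QUAD, CASES)
print(failures)  # []
```

Adding a row such as `("999.1.1.1", False)` makes the run report a failure, since this pattern does not check value ranges. That is the kind of gap a reviewer can spot in the table without decoding the regex first.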
There is an infamous regex for RFC 822 email addresses out there on the internet[1]. It is thousands of characters long and utterly incomprehensible. However, it was not written manually: It is essentially "object code", assembled by commented code using string concatenation from named variables that follow the structure of the BNF grammar in the RFC. Strive for the latter, not the former.
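To show the "source code, not object code" idea in miniature, here is a sketch that assembles a pattern from named pieces mirroring a tiny grammar. The grammar is a deliberately simplified, made-up one and is NOT the RFC 822 grammar. The names `ATEXT`, `ATOM`, `DOT_ATOM`, and `ADDR` are chosen for illustration only.

```python
import re

# Simplified, hypothetical grammar (not RFC 822):
#   atom     = 1*atext
#   dot-atom = atom *("." atom)
#   addr     = dot-atom "@" dot-atom
ATEXT = r"[A-Za-z0-9_%+-]"             # one allowed character
ATOM = rf"{ATEXT}+"                    # one or more allowed characters
DOT_ATOM = rf"{ATOM}(?:\.{ATOM})*"     # atoms separated by single dots
ADDR = rf"{DOT_ATOM}@{DOT_ATOM}"       # local part, "@", domain part

ADDR_RE = re.compile(ADDR)

print(ADDR)  # the assembled "object code" nobody should have to read
print(ADDR_RE.fullmatch("first.last@example.org") is not None)  # True
print(ADDR_RE.fullmatch("a..b@example.org") is not None)        # False
```

Each variable can be read and tested on its own, and the final string is just the result of assembling them.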
These are the two lines of code (minus some constants and file operations) that saved me a whole lot of tediousness today... #ruby #regex #programming #Jenkins
It ended up being a bit more involved than my post earlier, but I was able to script it to automate the patching and PRs. It looks terrible, but I'm pretty happy with the end result.
It’s nothing to do with #Perl and everything to do with shitty #regex possible in almost any #programming language.
Though it’s true that Perl’s reputation took a lot of damage from shitty developers filling the world with shitty #regexes in their shitty Perl code. So you’re in a big group, albeit via #Ruby.
"I hate #regex, but I think this worked fine. I used #regexxer, a helper to find and replace stuff on multiple files, for those [of us] less well versed with the traditional CLI regex workflow."
Any other tips for user friendly find-and-replace tools?
In addition to being an open source client-side RSS reader with an algo-driven timeline that keeps your votes and searches on your device, https://myrssalgo.org just added a bunch of RegEx¹ functionality.
Yes, there’s a RegEx-enabled search, but wait for it… now you can use RegEx to promote and mute content in your timeline—the dream is real!