Ruby: Extracting All URLs from an HTML Document

require 'uri'

text = %{"test
<a href="http://www.a.com/">http://www.a.com/</a>, and be sure
to check http://www.a.com/blog/. Email me at <a href="mailto:b@a.com">b@a.com</a>.}


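# Punctuation that often trails a URL in running text but is not part of it.
END_CHARS = %{.,'?!:;}

# Extract only http URLs (the mailto: link is skipped), then drop a single
# trailing punctuation character from each match.
p URI.extract(text, ['http']).map { |u| END_CHARS.include?(u[-1]) ? u.chop : u }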
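
URI.extract is marked obsolete in recent Ruby versions, so the following is a minimal sketch of the same idea using only String#scan. The helper name extract_http_urls, the constant TRAILING_PUNCTUATION, and the regular expression are assumptions made for illustration, not part of the original snippet. Unlike the original, it also accepts https, strips any run of trailing punctuation rather than a single character, and removes duplicates.

```ruby
# Hypothetical helper for illustration; not part of the original snippet.
# Matches one or more trailing punctuation characters at the end of a string.
TRAILING_PUNCTUATION = /[.,'?!:;]+\z/

def extract_http_urls(text)
  # Match http/https URLs up to the next whitespace, quote, or angle bracket.
  text.scan(%r{https?://[^\s"'<>]+})
      .map { |url| url.sub(TRAILING_PUNCTUATION, '') } # drop trailing punctuation
      .uniq                                            # keep each URL once
end

sample = %{<a href="http://www.a.com/">http://www.a.com/</a>, and be sure
to check http://www.a.com/blog/. Email me at <a href="mailto:b@a.com">b@a.com</a>.}

p extract_http_urls(sample)
# prints ["http://www.a.com/", "http://www.a.com/blog/"]
```

This simple pattern is only a rough approximation of URL syntax. For real HTML it is usually more reliable to parse the document and read the href attributes directly.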