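Using bluemonday, a Go library for HTML sanitization and content security filtering

bluemonday is an HTML sanitizer implemented in Go. It is fast and highly configurable. It takes untrusted user-generated content as input and returns HTML that has been sanitized against an allowlist of approved HTML elements and attributes, so that you can safely include the content in a web page.

Basic usage

Install bluemonday:

go get github.com/microcosm-cc/bluemonday

A basic example: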

package main

import (
    "fmt"

    "github.com/microcosm-cc/bluemonday"
)

func main() {
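    // Do this once for each unique policy, and use the policy for the life of the program.
    // Creating or editing a policy is not safe across multiple goroutines.
    p := bluemonday.UGCPolicy()

    // The policy can then be used to sanitize lots of input, and it is safe to use from multiple goroutines.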
    html := p.Sanitize(
        `<a onblur="alert(secret)" href="http://www.google.com">Google</a>`,
    )

    // Output:
    // <a href="http://www.google.com" rel="nofollow">Google</a>
    fmt.Println(html)
}

Three ways to sanitize

bluemonday provides three ways to call Sanitize:

p.Sanitize(string) string
p.SanitizeBytes([]byte) []byte
p.SanitizeReader(io.Reader) *bytes.Buffer

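Default policies

bluemonday ships with two default policies:

bluemonday.StrictPolicy() - effectively strips all HTML elements and their attributes, because its allowlist is empty
bluemonday.UGCPolicy() - allows a broad selection of HTML elements and attributes that are safe for user-generated content

Building a custom policy

You can build your own policy: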

package main

import (
    "fmt"

    "github.com/microcosm-cc/bluemonday"
)

func main() {
    p := bluemonday.NewPolicy()

    // Require URLs to be parseable by net/url.Parse and to use
    // mailto:, http:// or https://
    p.AllowStandardURLs()

    // We only allow <p> and <a href="">
    p.AllowAttrs("href").OnElements("a")
    p.AllowElements("p")

    html := p.Sanitize(
        `<a onblur="alert(secret)" href="http://www.google.com">Google</a>`,
    )

    // Output (AllowStandardURLs also turns on rel="nofollow" for links):
    // <a href="http://www.google.com" rel="nofollow">Google</a>
    fmt.Println(html)
}

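Adding elements to a policy

Allow elements by name:

p.AllowElements("b", "strong")

Or match element names with a regular expression:

p.AllowElementsMatching(regexp.MustCompile(`^my-element-`))

Adding attributes

Allow an attribute on all elements:

p.AllowAttrs("dir").Matching(regexp.MustCompile("(?i)rtl|ltr")).Globally()

Allow an attribute on specific elements:

p.AllowAttrs("value").OnElements("li")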

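Handling links

Links are among the hardest things to sanitize safely, and they are one of the biggest attack vectors for malicious content.

Basic link handling:

p.AllowAttrs("href").Matching(regexp.MustCompile(`(?i)mailto|https?`)).OnElements("a")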
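One caveat with the pattern above is worth spelling out. A Go regular expression matches anywhere in the string unless it is anchored, so, assuming the pattern is applied as an unanchored match (which is what regexp.MatchString does), a value that merely contains "http" somewhere would also pass. The sketch below uses only the standard library to compare the loose pattern with an anchored variant; the variable names and the anchored pattern are illustrative assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// looseScheme is the pattern from the example above: it matches if
// "mailto", "http" or "https" appears anywhere in the value.
var looseScheme = regexp.MustCompile(`(?i)mailto|https?`)

// strictScheme is an assumed tightening: the value must start with
// one of the allowed schemes.
var strictScheme = regexp.MustCompile(`(?i)^(?:mailto:|https?://)`)

func main() {
	for _, href := range []string{
		"https://example.com",
		"mailto:someone@example.com",
		"javascript:alert('https')",
	} {
		fmt.Printf("%-30q loose=%t strict=%t\n",
			href, looseScheme.MatchString(href), strictScheme.MatchString(href))
	}
}
```

This is one reason to prefer the URL-aware policy settings over a single regular expression for href values.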

Safer link handling:

p.RequireParseableURLs(true)
p.AllowRelativeURLs(true)
p.AllowURLSchemes("mailto", "http", "https")
p.RequireNoFollowOnLinks(true)

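Data URIs

Allow images embedded as data URIs:

p.AllowDataURIImages()

Policy-building helpers

bluemonday provides a few helper methods that simplify building a policy:

// Allow the "dir", "id", "lang" and "title" attributes globally
p.AllowStandardAttributes()

// Allow the "img" element and its standard attributes
p.AllowImages()

// Allow ordered and unordered lists, as well as definition lists
p.AllowLists()

// Allow HTML tables with all applicable elements and non-styling attributes
p.AllowTables()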

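Production readiness

bluemonday is used in production by its authors, who migrated to it from the widely used and heavily field-tested OWASP Java HTML Sanitizer. It passes an extensive test suite, including the AntiSamy tests as well as tests for any issues that have been raised.

Limitations

The project does not yet include any tools to help allow and sanitize CSS. This means that unless you want to do the heavy lifting in a single regular expression (not advisable), you should not allow the "style" attribute anywhere.

In the same vein, <script> and <style> are considered harmful. These elements (and their content) are not rendered by default; rendering them requires an explicit call to p.AllowUnsafe(true).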

More hands-on tutorials about using bluemonday, the Go library for HTML sanitization and content security filtering, are available at https://www.itying.com/category-94-b0.html

htzhanglong, reply #1

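bluemonday - an HTML content security filtering library for Go

bluemonday is an HTML content security filtering library for Go. It helps developers sanitize user-supplied HTML and prevent XSS (cross-site scripting) attacks. Below I describe how to use it, with example code.

Key features

Lets developers define which HTML elements and attributes are allowed
Removes every HTML tag and attribute that is not on the allowlist
Support for filtering inline CSS (in recent releases)
Good performance
Easy to extend and customize

Installation

go get github.com/microcosm-cc/bluemonday

Basic examples

Simple sanitization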

package main

import (
	"fmt"
	
	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// Create a strict policy that keeps text content only
	p := bluemonday.StrictPolicy()
	
	// Sample HTML input
	html := `<b>bold</b> text<script>alert('xss')</script>`
	
	// Sanitize the HTML
	sanitized := p.Sanitize(html)
	
	fmt.Println(sanitized)
	// Output: bold text
}

Permissive policy example

package main

import (
	"fmt"
	
	"github.com/microcosm-cc/bluemonday"
)

func main() {
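	// Create a permissive policy
	p := bluemonday.UGCPolicy()
	
	// Sample HTML input
	html := `
		<h1>Heading</h1>
		<p>Paragraph <a href="https://example.com" onclick="alert('xss')">link</a></p>
		<script>alert('xss')</script>
	`
	
	// Sanitize the HTML
	sanitized := p.Sanitize(html)
	
	fmt.Println(sanitized)
	// Output (surrounding whitespace omitted; UGCPolicy adds rel="nofollow" to links):
	//       <h1>Heading</h1>
	//       <p>Paragraph <a href="https://example.com" rel="nofollow">link</a></p>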
}

Custom policy

package main

import (
	"fmt"
	
	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// Create a custom policy
	p := bluemonday.NewPolicy()
	
	// Allow the standard attributes and a few basic elements
	p.AllowStandardAttributes()
	p.AllowElements("p", "b", "i", "u", "a")
	
	// Allow href on <a>, restricted to the http and https schemes
	p.AllowAttrs("href").OnElements("a")
	p.RequireParseableURLs(true)
	p.AllowURLSchemes("http", "https")
	
	// Sample HTML input
	html := `
		<p>Paragraph with <b>bold</b> and <i>italic</i></p>
		<a href="https://example.com">safe link</a>
		<a href="javascript:alert('xss')">dangerous link</a>
	`
	
	// Sanitize the HTML
	sanitized := p.Sanitize(html)
	
	fmt.Println(sanitized)
	// Output (surrounding whitespace omitted):
	//       <p>Paragraph with <b>bold</b> and <i>italic</i></p>
	//       <a href="https://example.com">safe link</a>
	//       dangerous link
}

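Advanced usage

Allowing specific CSS classes

// Restrict the class attribute per element with a regular expression
// (requires the regexp package).
p := bluemonday.NewPolicy()
p.AllowElements("div", "span")
p.AllowAttrs("class").Matching(regexp.MustCompile(`^(container|wrapper)$`)).OnElements("div")
p.AllowAttrs("class").Matching(regexp.MustCompile(`^highlight$`)).OnElements("span")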

Handling HTML fragments

p := bluemonday.UGCPolicy()
html := `<p>Paragraph</p><iframe src="http://example.com"></iframe>`
sanitized := p.Sanitize(html)
// The iframe is removed

Handling relative URLs

p := bluemonday.UGCPolicy()
p.AllowRelativeURLs(true)
p.RequireParseableURLs(true)

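Performance considerations

bluemonday was designed with performance in mind, but if you need to process large volumes of HTML, consider the following optimizations:

Reuse Policy objects instead of creating a new one for every input
Use a single shared Policy for inputs that follow the same sanitization rules
Where possible, check the length of the input before sanitizing it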

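Best practices

Start from the strictest policy and relax it only as needed
Always sanitize user-generated content
Combine it with other security measures, such as CSRF protection
Check for and apply library updates regularly

Summary

bluemonday is a powerful tool for handling HTML content safely in Go. It offers flexible configuration options and sensible default policies. Used properly, it prevents XSS attacks effectively while preserving the HTML formatting you need.

For more advanced usage and detailed documentation, see the official GitHub repository: github.com/microcosm-cc/bluemonday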