Keyword-based text filtering in Golang

Hello everyone,

I'm trying to filter text by keyword in Golang. Basically, I run the following code:

package main

import (
	"fmt"
	"os/exec"
)

func branches() {
	out, err := exec.Command("git", "branch", "-a", "--sort=-committerdate", "--column", "--format='%(committerdate)%09%(refname:short)'").Output()

	if err != nil {
		// log.Fatal(err)
		fmt.Println("Couldn't find any branches in the repository")
	}

	str1 := string(out)

	fmt.Println(str1)
}

func main() {
	fmt.Printf("1. CHECK ALL BRANCHES: ")
	branches()
}

and get the following output:

go run main.go
1. CHECK ALL BRANCHES: 'Mon Oct 3 12:20:53 2022 +0000	master'
'Mon Oct 3 12:20:53 2022 +0000	origin/HEAD'
'Mon Oct 3 12:20:53 2022 +0000	origin/master'
'Mon Oct 3 12:12:01 2022 +0000	origin/release/v1'
'Wed Apr 27 06:26:22 2022 +0000	origin/release/v2'
'Tue Feb 15 14:46:55 2022 +0000	origin/release/v3'
'Mon May 24 16:05:45 2021 +0300	origin/release-v1'
'Tue Oct 6 14:43:56 2020 +0300	origin/release-v1.0.0'

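The goal is to get every line whose branch date is earlier than 2022, i.e. 2021, 2020, 2019 and so on (year = keyword, if that helps with the main goal), and to show those lines on the command line/terminal.

Maybe someone can suggest how to achieve this?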



8 replies

Doesn't v contain that?



Hi @mje

Thanks for replying with an example! It helped solve one problem with the if/else statement.

Do you have any idea how to get the expected "branch output"?

go run main.go
CHECK ALL BRANCHES:
Found some outdated branches:
'Mon May 24 16:05:45 2021 +0300	origin/release-v1'
'Tue Oct 6 14:43:56 2020 +0300	origin/release-v1.0.0'

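Split out into lines instead. Put your target years into a map[string]bool. Use a regular expression to extract the year from each line (regexp package - regexp - Go Packages). Check whether the map contains the extracted year as a key; if it does, print a message about the outdated year you found and set a boolean variable to true. If that variable is still false after the loop ends, print your success message.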


For the case where nothing is found, this:

		for _, tz := range tst {
			if _, exists := mapper[tz]; exists {

				fmt.Printf("Found outdated branch, year: %s \n", tz)
			} else {
				fmt.Printf("Passed")
			}
		}

should be:

		allGood := true
		for _, tz := range tst {
			if _, exists := mapper[tz]; exists {
				fmt.Printf("Found outdated branch, year: %s \n", tz)
				allGood = false
			}
		}
		if allGood {
			fmt.Printf("Passed")
		}

I found a way to solve this with a "nested for loop".

First, I convert the out variable to a string and split it into fields:

str1 := string(out)

s := strings.Fields(str1)

The second step is to create a string array holding the years I'm interested in:

	var strarray [6]string
	strarray[0] = "2016"
	strarray[1] = "2017"
	strarray[2] = "2018"
	strarray[3] = "2019"
	strarray[4] = "2020"
	strarray[5] = "2021"

Finally, the nested for loop:

	for _, v := range s {
		for _, word := range strarray {
			if word == v {
				fmt.Println("Found some outdated branches: ", v)
			}
		}
	}

It works, but the output isn't what I expected:

go run main.go
CHECK ALL BRANCHES:
Found some outdated branches:  2021
Found some outdated branches:  2020

I'd like to know whether it's possible to print the whole line of the git branch output in which the "keyword" was found, for example:

go run main.go
CHECK ALL BRANCHES:
Found some outdated branches: 
'Mon May 24 16:05:45 2021 +0300	origin/release-v1'
'Tue Oct 6 14:43:56 2020 +0300	origin/release-v1.0.0'

If nothing is found:

go run main.go
CHECK ALL BRANCHES: OK!

Any suggestions?

Thanks in advance!

Yes, you're right. However, if I pass v here in fmt.Printf("Found outdated branch, year: %s \n", v), it shows the following output:

3. CHECK ALL BRANCHES:
Found outdated branch: 'Mon Oct 3 12:20:53 2022 +0000	master'
'Mon Oct 3 12:20:53 2022 +0000	origin/HEAD'
'Mon Oct 3 12:20:53 2022 +0000	origin/master'
'Mon Oct 3 12:12:01 2022 +0000	origin/release/v1.4'
'Fri Sep 30 12:00:51 2022 +0000	origin/development'
'Wed Apr 27 06:26:22 2022 +0000	origin/release/v1.3'
'Tue Feb 15 14:46:55 2022 +0000	origin/release/v1.2'
'Mon May 24 16:05:45 2021 +0300	origin/release-v1.1'
'Tue Oct 6 14:43:56 2020 +0300	origin/release-v1.0.0'

Found outdated branch: 'Mon Oct 3 12:20:53 2022 +0000	master'
'Mon Oct 3 12:20:53 2022 +0000	origin/HEAD'
'Mon Oct 3 12:20:53 2022 +0000	origin/master'
'Mon Oct 3 12:12:01 2022 +0000	origin/release/v1.4'
'Fri Sep 30 12:00:51 2022 +0000	origin/development'
'Wed Apr 27 06:26:22 2022 +0000	origin/release/v1.3'
'Tue Feb 15 14:46:55 2022 +0000	origin/release/v1.2'
'Mon May 24 16:05:45 2021 +0300	origin/release-v1.1'
'Tue Oct 6 14:43:56 2020 +0300	origin/release-v1.0.0'

Hi @mje

Thanks for your reply! I tried applying your suggestion and managed to get the outdated branches:

    str1 := string(out)

	temp := strings.Split(str1, `\n`)

	// Create a map with years
	var mapper = map[string]bool{
		"2016": true,
		"2017": true,
		"2018": true,
		"2019": true,
		"2020": true,
		"2021": true,
	}

	// Check whether the map contains the extracted year as a key
	for _, v := range temp {

		var notvalidID = regexp.MustCompile(`202([0-1])`)
		var notvalidID2 = regexp.MustCompile(`201([0-9])`)

		tsts := notvalidID2.FindAllString(v, -1)
		tst := notvalidID.FindAllString(v, -1)

		for _, tz := range tst {
			if _, exists := mapper[tz]; exists {

				fmt.Printf("Found outdated branch, year: %s \n", tz)
			} else {
				fmt.Printf("Passed")
			}
		}

		for _, tz := range tsts {
			if _, exists := mapper[tz]; exists {

				fmt.Printf("Found outdated branch, year: %s \n", tz)
			} else {
				fmt.Printf("Passed")
			}
		}
	}

Output:

CHECK ALL BRANCHES:
Found outdated branch, year: 2019
Found outdated branch, year: 2018
Found outdated branch, year: 2018

In addition to the output above, is it possible to print the exact line in which the key (year) was found? Like this:

go run main.go
CHECK ALL BRANCHES:
Found some outdated branches:
'Mon May 24 16:05:45 2021 +0300	origin/release-v1'
'Tue Oct 6 14:43:56 2020 +0300	origin/release-v1.0.0'

My second concern is that the "else" branch never produces its output. That is, if nothing is found it should show "Passed", but it doesn't:

go run main.go
CHECK ALL BRANCHES:  // "Passed" should be shown here if nothing is found

Thanks in advance!
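One detail in the snippet above may be worth checking: the separator passed to strings.Split is written with backticks. In Go a backtick-quoted string is a raw string, so `\n` is the two characters backslash and n rather than a newline. If that is what happens here, the output is never split and temp holds a single element containing the entire text, which would also explain why printing v shows every branch at once. A small sketch of the difference (countParts is a helper made up for this illustration):

```go
package main

import (
	"fmt"
	"strings"
)

// countParts reports how many pieces strings.Split produces for sep.
func countParts(s, sep string) int {
	return len(strings.Split(s, sep))
}

func main() {
	out := "line one\nline two\nline three"

	// Raw string: the separator is the two characters '\' and 'n'.
	fmt.Println(countParts(out, `\n`)) // 1
	// Interpreted string: the separator is a single newline.
	fmt.Println(countParts(out, "\n")) // 3
}
```

Writing the separator as "\n" in double quotes gives one element per line.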

To filter text by keyword (year), you can parse the git output, extract the date information, and compare it. Here is the modified code:

package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

func filterBranchesByYear(output string, targetYear int) []string {
	var filtered []string
	lines := strings.Split(output, "\n")
	
	for _, line := range lines {
		if line == "" {
			continue
		}
		
		// Strip the single quotes and split the date from the branch name
		line = strings.Trim(line, "'")
		parts := strings.Split(line, "\t")
		if len(parts) < 2 {
			continue
		}
		
		// Parse the date string
		dateStr := parts[0]
		date, err := time.Parse("Mon Jan 2 15:04:05 2006 -0700", dateStr)
		if err != nil {
			// Try another layout (in case there is no time zone information)
			date, err = time.Parse("Mon Jan 2 15:04:05 2006", dateStr)
			if err != nil {
				continue
			}
		}
		
		// Check whether the year is earlier than the target year
		if date.Year() < targetYear {
			filtered = append(filtered, line)
		}
	}
	
	return filtered
}

func branches() {
	out, err := exec.Command("git", "branch", "-a", "--sort=-committerdate", "--column", "--format='%(committerdate)%09%(refname:short)'").Output()

	if err != nil {
		fmt.Println("Couldn't find any branches in the repository")
		return
	}

	str1 := string(out)
	
	// Keep only the branches older than 2022
	filtered := filterBranchesByYear(str1, 2022)
	
	fmt.Println("Branches older than 2022:")
	for _, branch := range filtered {
		fmt.Println(branch)
	}
}

func main() {
	fmt.Printf("1. CHECK ALL BRANCHES: ")
	branches()
}

Alternatively, if you want a more general keyword filter, here is a version based on a list of keywords:

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

func filterByKeywords(output string, keywords []string) []string {
	var filtered []string
	lines := strings.Split(output, "\n")
	
	for _, line := range lines {
		if line == "" {
			continue
		}
		
		// Check whether the line contains any of the keywords
		for _, keyword := range keywords {
			if strings.Contains(line, keyword) {
				filtered = append(filtered, line)
				break
			}
		}
	}
	
	return filtered
}

func branches() {
	out, err := exec.Command("git", "branch", "-a", "--sort=-committerdate", "--column", "--format='%(committerdate)%09%(refname:short)'").Output()

	if err != nil {
		fmt.Println("Couldn't find any branches in the repository")
		return
	}

	str1 := string(out)
	
	// Use years as the filter keywords
	keywords := []string{"2021", "2020", "2019", "2018", "2017"}
	filtered := filterByKeywords(str1, keywords)
	
	fmt.Println("Branches with specified years:")
	for _, branch := range filtered {
		fmt.Println(branch)
	}
}

func main() {
	fmt.Printf("1. CHECK ALL BRANCHES: ")
	branches()
}

The first version parses the timestamp for an exact year comparison; the second uses a simple substring check. Choose whichever fits your needs.
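The two versions can disagree when a year also appears in a branch name. The sketch below illustrates this with a made-up branch, origin/release-2021, dated 2022. It assumes the tab-separated, single-quoted format shown in the thread; olderThan and gitDateLayout are names introduced only for this example.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// gitDateLayout mirrors the date format that appears in the thread's output.
const gitDateLayout = "Mon Jan 2 15:04:05 2006 -0700"

// olderThan reports whether the commit date in line is before year.
// It expects "<date>\t<branch>", optionally wrapped in single quotes.
func olderThan(line string, year int) bool {
	datePart, _, ok := strings.Cut(strings.Trim(line, "'"), "\t")
	if !ok {
		return false
	}
	t, err := time.Parse(gitDateLayout, datePart)
	return err == nil && t.Year() < year
}

func main() {
	// Hypothetical branch whose name contains a year.
	line := "'Mon Oct 3 12:20:53 2022 +0000\torigin/release-2021'"

	fmt.Println(strings.Contains(line, "2021")) // true: the substring check matches the name
	fmt.Println(olderThan(line, 2022))          // false: the commit date is in 2022
}
```

If branch names in the repository can contain digits that look like years, comparing the parsed date is likely the safer of the two approaches.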
