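Using ml_preprocessing, a machine learning data preprocessing library, in Flutter

Introduction

ml_preprocessing is a Dart library for data preprocessing. It is aimed at data scientists who need to prepare data before feeding it to a machine learning algorithm. Preprocessing covers tasks such as converting string values to numbers and handling missing values.

What is data preprocessing?

Data preprocessing is a set of techniques for preparing data so that a machine learning algorithm can use it. Suppose, for example, you have a dataset with gender, country, height, weight and diabetes status: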

----------------------------------------------------------------------------------------
| Gender | Country | Height (cm) | Weight (kg) | Diabetes (1 - Positive, 0 - Negative) |
----------------------------------------------------------------------------------------
| Female | France  |     165     |     55      |                    1                  |
----------------------------------------------------------------------------------------
| Female | Spain   |     155     |     50      |                    0                  |
----------------------------------------------------------------------------------------
| Male   | Spain   |     175     |     75      |                    0                  |
----------------------------------------------------------------------------------------
| Male   | Russia  |     173     |     77      |                   N/A                 |
----------------------------------------------------------------------------------------

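In this dataset, some columns hold string values (Gender and Country), and the Diabetes column has one missing value. To use this data in mathematical equations, every value has to be converted to a valid numeric representation. That is the job of data preprocessing.

Why is data preprocessing needed?

Data preprocessing helps with the following problems:

Converting string values (categorical data) to numbers.
Handling missing values.
Normalizing or standardizing numeric data.

Prerequisites

ml_preprocessing relies on the DataFrame class from the ml_dataframe library, so ml_dataframe has to be added to your project as a dependency. Add the following to your pubspec.yaml, adjusting the version constraints to the releases currently listed on pub.dev: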

dependencies:
  ...
  ml_dataframe: ^1.0.0
  ml_preprocessing: ^1.0.0
  ...

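Usage examples

Getting started

Suppose we have downloaded the "Black Friday" dataset from Kaggle. It is an interesting dataset with a large number of observations (about 538000 rows) and several categorical features.

First, import the required libraries: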

import 'package:ml_dataframe/ml_dataframe.dart';
import 'package:ml_preprocessing/ml_preprocessing.dart';

Then read the CSV file into a data frame, keeping only the columns at the listed indices:

final dataFrame = await fromCsv('example/black_friday/black_friday.csv', 
  columns: [2, 3, 5, 6, 7, 11]);

Categorical data

Inspect the dataset and decide which features need encoding. In our case these are:

final featureNames = ['Gender', 'Age', 'City_Category', 'Stay_In_Current_City_Years', 'Marital_Status'];

One-hot encoding

Create and fit a one-hot encoder, then apply it to the data frame:

final encoder = Encoder.oneHot(
  dataFrame,
  columnNames: featureNames,
);

final encoded = encoder.process(dataFrame);

final data = encoded.toMatrix();

print(data);
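
To make the result of one-hot encoding concrete, here is a minimal sketch in plain Dart that does not use the library. The helper name oneHotEncode is hypothetical, and the column order shown here (order of first appearance) is an assumption; the library may order its output columns differently.

```dart
/// Encodes each value as a vector with a single 1 at the index of its category.
List<List<int>> oneHotEncode(List<String> values) {
  // Distinct categories, in order of first appearance.
  final categories = values.toSet().toList();
  return [
    for (final value in values)
      [for (final category in categories) value == category ? 1 : 0],
  ];
}

void main() {
  // The Country column from the small table at the start of the post.
  final encoded = oneHotEncode(['France', 'Spain', 'Spain', 'Russia']);
  print(encoded); // [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
}
```

Each category becomes its own 0/1 column, so no artificial ordering between categories is introduced.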

Label encoding

Create and fit a label encoder, then apply it to the data frame:

final encoder = Encoder.label(
  dataFrame,
  columnNames: featureNames,
);

final encoded = encoder.process(dataFrame);
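
For comparison, label encoding replaces each category with a single integer. The sketch below is plain Dart with a hypothetical helper labelEncode; assigning labels in order of first appearance is an assumption and not necessarily how the library numbers them.

```dart
/// Maps each distinct value to an integer label, in order of first appearance.
List<int> labelEncode(List<String> values) {
  final labels = <String, int>{};
  return [
    // A new value gets the next unused integer; a known value reuses its label.
    for (final value in values) labels.putIfAbsent(value, () => labels.length),
  ];
}

void main() {
  // The Gender column from the small table at the start of the post.
  print(labelEncode(['Female', 'Female', 'Male', 'Male'])); // [0, 0, 1, 1]
}
```

Label encoding keeps a single column per feature, at the cost of implying an order between categories that may not exist.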

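Normalizing numeric data

Use the Normalizer class to normalize numeric data:

final normalizer = Normalizer(); // uses the Euclidean norm by default
final transformed = normalizer.process(dataFrame);

Note that normalization fails if the data still contains raw categorical values, because it needs numeric input. In that case, encode the data first (for example with one-hot encoding).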
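
As a rough illustration of normalization by the Euclidean norm, the plain-Dart sketch below divides a vector by its length. It assumes each row is treated as one vector; normalizeRow is a hypothetical helper, not part of the library.

```dart
import 'dart:math' as math;

/// Divides every element of [row] by the row's Euclidean (L2) norm.
List<double> normalizeRow(List<double> row) {
  var sumOfSquares = 0.0;
  for (final x in row) {
    sumOfSquares += x * x;
  }
  final norm = math.sqrt(sumOfSquares);
  // Leave an all-zero row unchanged to avoid division by zero.
  if (norm == 0) return List.of(row);
  return [for (final x in row) x / norm];
}

void main() {
  print(normalizeRow([3, 4])); // [0.6, 0.8]
}
```

After this step the vector has length 1, which is why non-numeric values cannot be processed.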

Standardizing data

Use the Standardizer class to standardize data:

final dataFrame = DataFrame([
  [  1,   2,   3],
  [ 10,  20,  30],
  [100, 200, 300],
], headerExists: false);

final standardizer = Standardizer(dataFrame);

final transformed = standardizer.process(dataFrame);
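
Standardization usually means rescaling each column to zero mean and unit variance. The plain-Dart sketch below shows that computation on the same numbers, assuming the population standard deviation is used; the library's exact formula may differ, and standardizeColumns is a hypothetical helper.

```dart
import 'dart:math' as math;

/// Rescales each column of [rows] to zero mean and unit variance.
List<List<double>> standardizeColumns(List<List<double>> rows) {
  final n = rows.length;
  final result = [for (final row in rows) List<double>.of(row)];
  for (var j = 0; j < rows.first.length; j++) {
    var sum = 0.0;
    for (final row in rows) {
      sum += row[j];
    }
    final mean = sum / n;
    var squares = 0.0;
    for (final row in rows) {
      squares += (row[j] - mean) * (row[j] - mean);
    }
    // Population standard deviation (divides by n, not n - 1).
    final deviation = math.sqrt(squares / n);
    for (var i = 0; i < n; i++) {
      // A constant column has zero deviation; map it to 0.
      result[i][j] = deviation == 0 ? 0.0 : (rows[i][j] - mean) / deviation;
    }
  }
  return result;
}

void main() {
  final standardized = standardizeColumns([
    [1, 2, 3],
    [10, 20, 30],
    [100, 200, 300],
  ]);
  print(standardized);
}
```

Every column of the result has a mean of approximately zero, whatever the original scale of that column.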

Pipeline

Use the Pipeline class to chain a sequence of preprocessing steps:

final pipeline = Pipeline(dataFrame, [
  toOneHotLabels(columnNames: ['Gender', 'Age', 'City_Category']),
  toIntegerLabels(columnNames: ['Stay_In_Current_City_Years', 'Marital_Status']),
  normalize(),
  standardize(),
]);

final processed = pipeline.process(dataFrame);
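
Conceptually, a pipeline passes the output of each step to the next one. The plain-Dart sketch below illustrates only that idea; the Step type and the helpers runPipeline, scaleBy and shiftBy are hypothetical and do not reflect how the library implements Pipeline.

```dart
typedef Step = List<List<double>> Function(List<List<double>> data);

/// Feeds [data] through [steps] in order; each step receives the previous output.
List<List<double>> runPipeline(List<List<double>> data, List<Step> steps) {
  var current = data;
  for (final step in steps) {
    current = step(current);
  }
  return current;
}

/// Hypothetical step: multiplies every value by [factor].
Step scaleBy(double factor) =>
    (data) => [for (final row in data) [for (final x in row) x * factor]];

/// Hypothetical step: adds [offset] to every value.
Step shiftBy(double offset) =>
    (data) => [for (final row in data) [for (final x in row) x + offset]];

void main() {
  // 1 and 2 are first doubled, then shifted by 1.
  final processed = runPipeline([[1, 2]], [scaleBy(2), shiftBy(1)]);
  print(processed);
}
```

Because the steps run in the listed order, encoding steps belong before normalization and standardization, which need numeric input.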

Complete example

The following complete example shows how to preprocess data with ml_preprocessing:

import 'package:ml_dataframe/ml_dataframe.dart';
import 'package:ml_preprocessing/ml_preprocessing.dart';

Future<void> main() async {
  final dataFrame = await fromCsv('example/dataset.csv', columns: [0, 1, 2, 3]);

  final pipeline = Pipeline(dataFrame, [
    toOneHotLabels(
      columnNames: ['position'],
      headerPostfix: '_position',
    ),
    toIntegerLabels(
      columnNames: ['country'],
    ),
  ]);

  print(pipeline.process(dataFrame).toMatrix());
}

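With these steps you can preprocess your data so that it is suitable as input for a machine learning algorithm. I hope these examples help!

More hands-on tutorials on using ml_preprocessing for machine learning data preprocessing in Flutter are available at https://www.itying.com/category-92-b0.html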

itying888 · reply #1 · 1 day ago


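Sure, here is a code example showing how to preprocess data with ml_preprocessing in a Flutter app. The library offers standardization, normalization and one-hot encoding, among other operations, all of which are useful when preparing data for a machine learning model.

First, make sure your Flutter project depends on ml_preprocessing and on ml_dataframe, which provides the DataFrame class. Add the following to pubspec.yaml:

dependencies:
  flutter:
    sdk: flutter
  ml_dataframe: ^<latest version>  # replace with the current version number
  ml_preprocessing: ^<latest version>  # replace with the current version number

Then run flutter pub get to install the dependencies.

Here is a simple example of data preprocessing with ml_preprocessing: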

import 'package:flutter/material.dart';
import 'package:ml_dataframe/ml_dataframe.dart';
import 'package:ml_preprocessing/ml_preprocessing.dart';

void main() {
  runApp(MyApp());
}

class MyApp extends StatelessWidget {
  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      home: Scaffold(
        appBar: AppBar(
          title: Text('ml_preprocessing Example'),
        ),
        body: Center(
          child: ElevatedButton(
            onPressed: () {
              // Sample numeric data; the first row is the header
              final numeric = DataFrame([
                ['x', 'y'],
                [1.0, 2.0],
                [3.0, 4.0],
                [5.0, 6.0],
              ]);

              // Standardization: the Standardizer is fitted on the data
              // frame passed to its constructor
              final standardizer = Standardizer(numeric);
              final standardized = standardizer.process(numeric);
              print('Standardized data: ${standardized.toMatrix()}');

              // Normalization: uses the Euclidean norm by default
              // (this is not min-max scaling)
              final normalizer = Normalizer();
              final normalized = normalizer.process(numeric);
              print('Normalized data: ${normalized.toMatrix()}');

              // One-hot encoding of a categorical column
              final categorical = DataFrame([
                ['color'],
                ['red'],
                ['green'],
                ['blue'],
                ['green'],
              ]);
              final encoder = Encoder.oneHot(
                categorical,
                columnNames: ['color'],
              );
              final encoded = encoder.process(categorical);
              print('One-hot encoded data: ${encoded.toMatrix()}');
            },
            child: Text('Preprocess Data'),
          ),
        ),
      ),
    );
  }
}

This example uses the same classes as the post above: Standardizer for standardization, Normalizer for normalization, and Encoder.oneHot for one-hot encoding. Standardizer and Encoder are fitted on the data frame passed to them at construction, and process then applies the fitted transformation to a data frame. The package does not provide sklearn-style StandardScaler, MinMaxScaler or OneHotEncoder classes with a fitTransform method.

The preprocessing calls themselves are synchronous. Only reading a CSV file with fromCsv returns a Future, so async and await are needed only when loading data that way.

This example is a basic skeleton that you can extend and adapt to your own preprocessing needs.