字符串解析与有限状态自动机

题目描述

请实现一个函数用来判断字符串是否表示数值(包括整数和小数)。例如,字符串”+100”、”5e2”、”-123”、”3.1416”、”-1E-16”、”0123”都表示数值,但”12e”、”1a3.14”、”1.2.3”、”+-5”及”12e+5.4”都不是。

来源:力扣(LeetCode)
链接:https://leetcode-cn.com/problems/biao-shi-shu-zhi-de-zi-fu-chuan-lcof

有限状态自动机(DFA)

一般这种字符串parser的问题,在编译原理范围内可以用有限状态自动机来作匹配。

DFA的问题,算法通常不是大问题。关键在于针对不同的问题构建合适的状态自动机。其中各个状态之间的关系非常重要。一般来说这部分的设计有点繁杂,但有迹可循。
详细的构建过程在编译原理相关的课程中都会涉猎到,这里不在赘述。感兴趣的同学可以参考这个视频How to construct a DFA in Automata

该题的DFA状态图如下:

DFS state graph

图来自leetcode

题解代码

DFA状态图也就是有向图,我们可以通过一个map来保存一个vertex的边信息和shift的char。同样地,我们使用map来保存所有vertex和边的联系。其实也可以通过array来实现,大同小异。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
class Solution {
public boolean isNumber(String s) {
//这题使用有限自动状态机来做
Map<State,Map<CharType, State>> transferMap = new HashMap<>();

//Init
Map<CharType, State> s0 = new HashMap<>(){{
put(CharType.CHAR_SPACE, State.STATE_START);
put(CharType.CHAR_SIGN, State.STATE_INT_SIGN);
put(CharType.CHAR_NUMBER, State.STATE_INTEGER);
put(CharType.CHAR_POINT, State.STATE_POINT_WITHOUT_INT);
}};
transferMap.put(State.STATE_START, s0);

Map<CharType, State> s1 = new HashMap<>(){{
put(CharType.CHAR_NUMBER, State.STATE_INTEGER);
put(CharType.CHAR_POINT, State.STATE_POINT_WITHOUT_INT);
}};
transferMap.put(State.STATE_INT_SIGN, s1);

Map<CharType, State> s2 = new HashMap<>(){{
put(CharType.CHAR_SPACE, State.STATE_END);
put(CharType.CHAR_EXP, State.STATE_EXP);
put(CharType.CHAR_NUMBER, State.STATE_INTEGER);
put(CharType.CHAR_POINT, State.STATE_POINT);
}};
transferMap.put(State.STATE_INTEGER, s2);

Map<CharType, State> s3 = new HashMap<>(){{
put(CharType.CHAR_NUMBER, State.STATE_FRACTION);
}};
transferMap.put(State.STATE_POINT_WITHOUT_INT, s3);

Map<CharType, State> s4 = new HashMap<>(){{
put(CharType.CHAR_SPACE, State.STATE_END);
put(CharType.CHAR_EXP, State.STATE_EXP);
put(CharType.CHAR_NUMBER, State.STATE_FRACTION);
}};
transferMap.put(State.STATE_POINT, s4);

Map<CharType, State> s5 = new HashMap<>(){{
put(CharType.CHAR_SPACE, State.STATE_END);
put(CharType.CHAR_EXP, State.STATE_EXP);
put(CharType.CHAR_NUMBER, State.STATE_FRACTION);
}};
transferMap.put(State.STATE_FRACTION, s5);

Map<CharType, State> s6 = new HashMap<>(){{
put(CharType.CHAR_SIGN, State.STATE_EXP_SIGN);
put(CharType.CHAR_NUMBER, State.STATE_EXP_NUM);
}};
transferMap.put(State.STATE_EXP, s6);

Map<CharType, State> s7 = new HashMap<>(){{
put(CharType.CHAR_NUMBER, State.STATE_EXP_NUM);
}};
transferMap.put(State.STATE_EXP_SIGN, s7);

Map<CharType, State> s8 = new HashMap<>(){{
put(CharType.CHAR_SPACE, State.STATE_END);
put(CharType.CHAR_NUMBER, State.STATE_EXP_NUM);
}};
transferMap.put(State.STATE_EXP_NUM, s8);

Map<CharType, State> s9 = new HashMap<>(){{
put(CharType.CHAR_SPACE, State.STATE_END);
}};
transferMap.put(State.STATE_END, s9);

var state = State.STATE_START;
var cc = s.toCharArray();

//For automata

for (int i = 0; i < cc.length; i++) {
var t = helper(cc[i]);
if(!transferMap.get(state).containsKey(t)){
return false;
}
state = transferMap.get(state).get(t);
}
return state==State.STATE_END||state==State.STATE_INTEGER||state==State.STATE_POINT||state==State.STATE_EXP_NUM||state==State.STATE_FRACTION;

}
private CharType helper(char ch){
if(ch>='0'&&ch<='9'){
return CharType.CHAR_NUMBER;
}else if(ch=='e'||ch=='E'){
return CharType.CHAR_EXP;
}else if(ch=='.'){
return CharType.CHAR_POINT;
}else if(ch=='+'||ch=='-'){
return CharType.CHAR_SIGN;
}else if(ch==' '){
return CharType.CHAR_SPACE;
}else{
return CharType.CHAR_ILLEGAL;
}
}

enum State{
STATE_START,
STATE_INT_SIGN,
STATE_INTEGER,
STATE_POINT,
STATE_POINT_WITHOUT_INT,
STATE_FRACTION,
STATE_EXP,
STATE_EXP_SIGN,
STATE_EXP_NUM,
STATE_END
}

enum CharType{
CHAR_NUMBER,
CHAR_EXP,
CHAR_POINT,
CHAR_SIGN,
CHAR_SPACE,
CHAR_ILLEGAL
}
}

代码看起来很多,但大部分都是DFS状态图的构建代码。实际自动机的部分代码实现很少。


本博客所有文章除特别声明外,均采用 CC BY-SA 4.0 协议 ,转载请注明出处!