Meeting Date: Tue Mar 10 2026
Source: Google Doc Link
Summary
Discussions between the CommOps Manager and Drongbu Lobsang confirmed that text boundary detection, which currently relies on Gemini, will move forward and be tested on both synthetic and real data.
Confirm Text Boundary Detection
The need to move forward with text boundary detection as a precision step for the outliner tool was confirmed. The current segmentation feature is triggered by an AI button and already uses Gemini.
Prioritize Human-Segmented Data
Testing must include human-segmented data taken from real (non-synthetic) texts, because real OCR output can differ considerably from synthetic data. Drongbu Lobsang was advised to begin obtaining this human-segmented data.
Synthetic vs. Real Data Testing
Rule-based scripts or models that perform well on clean synthetic data may still fail on dirty OCR output. A dual testing approach using both synthetic and real data is therefore required to ensure model reliability.
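A minimal sketch of what such a dual test could look like, assuming boundaries are represented as character offsets and scored with exact-match F1. None of this comes from the meeting: the names `Sample`, `boundary_f1`, `evaluate`, and `blank_line_detector` are hypothetical, the two sample texts are toy strings made up for illustration, and the detector is a stand-in for whatever rule-based script or Gemini call is actually used.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    text: str
    gold_boundaries: set  # character offsets where a human marked a new segment


def boundary_f1(predicted: set, gold: set) -> float:
    """Exact-match F1 between predicted and gold boundary offsets."""
    if not predicted and not gold:
        return 1.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(detector, samples) -> float:
    """Mean F1 of a detector (text -> set of offsets) over a list of samples."""
    scores = [boundary_f1(detector(s.text), s.gold_boundaries) for s in samples]
    return sum(scores) / len(scores)


def blank_line_detector(text: str) -> set:
    """Hypothetical rule-based baseline: a segment starts after each blank line."""
    boundaries = set()
    pos = text.find("\n\n")
    while pos != -1:
        boundaries.add(pos + 2)
        pos = text.find("\n\n", pos + 2)
    return boundaries


# Toy synthetic sample: clean text with blank lines between segments.
SYNTHETIC = [Sample("Title A\n\nBody of A.\n\nTitle B\n\nBody of B.", {9, 21, 30})]

# Toy "real" sample: the same content as dirty OCR might deliver it,
# with the blank lines lost. The gold boundaries would come from a human.
REAL = [Sample("Title A\nBody of A.\nTitle B\nBody of B.", {8, 19, 27})]

if __name__ == "__main__":
    for name, samples in [("synthetic", SYNTHETIC), ("real", REAL)]:
        print(f"{name}: mean F1 = {evaluate(blank_line_detector, samples):.2f}")
```

Reporting the two scores side by side makes the gap visible: a detector can look perfect on the synthetic set and still miss every boundary once the formatting cues it depends on are absent from the OCR output. An offset tolerance could be added to `boundary_f1` if near misses should count.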
Suggested Next Steps
- Drongbu Lobsang will remind the CommOps Manager to move the discussion about boundaries forward.
Meeting Transcript
Mar 10, 2026
Test automation - Transcript
00:00:00
drongbu lobsang: Yeah. Okay. So, if you Yeah, you can like uh do some of it yourself. Remind him that we Yeah, we need to move this forward. Okay. Um what's the next thing? Publication publish text and public public report and findings integrate as precision step in the primary tool. Uh so what is the primary tool text boundary detection? So this is this is basically the uh text boundary detection for the outliner when you load a text 3.4. Yeah. So right now you have a button you have an AI button for this for the segmentation. But what does it do? Do you have actually do you run a model behind or what they do Gemini? I see. So so you are already you know using Gemini. So basically uh I think the testing should be done on either like humans uh segmented data or synthetic data. Yeah. So I I think actually with Emma you should start to get some human segmented data on the actual data. No. So, so, uh, in in the two previous cards, I think it would be good to use, you know, to test not only synthetic data, but also like, you know, human, I mean, uh, real data because it's OCR output is going to be quite different. You see what I mean? Yeah, I'm I'm not talking about button. I'm talking about the data for the model testing and for you know training the you know rulebased script. Yeah. So if you just create some synthetic data before that so if we just create synthetic data and uh you know it works very well on synthetic data but then when we get like you know the dirty OCR output yeah then it you know it still might fail. Yeah.
Transcription ended after 00:02:46
This editable transcript was computer generated and might contain errors. People can also change the text after it was created.