Blog of the Subculture Science Research Society (サブカル科学研究会)

BookingButler: A Web App That Automates Business Trip Management Through Google Calendar Integration

Posted on Fri Jul 11 2025 | 4 minutes | 1632 words |

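A development log for BookingButler, which automates travel-time calculation and schedule adjustment for business trips. The post introduces the technical implementation that integrates with Google Calendar and computes the optimal travel route between appointments. [Read More]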

Solving Scattered and Duplicated Club Photos with SHA256 Hashes - A drive-gallery Development Case Study

Posted on Fri Jul 11 2025 | 5 minutes | 2402 words |

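Introduction

When you manage event photos and videos for a club or team, do you run into problems such as "Where was that photo again?" or "The same photo has been uploaded in several places"? This article explains in detail the technical approach to the scattering and duplication problems taken in "drive-gallery", which was developed to solve the media management challenges of the music session group "Luke Avenue".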
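As a rough illustration of hash-based duplicate detection, the sketch below groups files by the SHA256 digest of their contents. It is written in Python for illustration only and is not drive-gallery's code; the function names `sha256_of_file` and `find_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths: list[Path]) -> dict[str, list[Path]]:
    """Group files by content hash and keep only groups with more than one file."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in paths:
        groups[sha256_of_file(path)].append(path)
    return {h: ps for h, ps in groups.items() if len(ps) > 1}
```

Because the digest depends only on the bytes, the same photo uploaded under two different names or in two different folders ends up in the same group.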

[Read More]

React Go Firebase photo-management deduplication SHA256 WebSocket TypeScript

Solving a LINE Bot's User Acquisition Struggles with MCP Integration - A turtle-buttler Development Case Study

Posted on Fri Jul 11 2025 | 4 minutes | 1922 words |

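Introduction

You built a LINE bot, but users just do not stick around. Many developers probably share that frustration. This article explains in detail the user acquisition struggles of "turtle-buttler", a LINE bot developed by the author, and the approach taken to solve them through MCP (Modular Component Protocol) integration.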

[Read More]

LINE Bot MCP TypeScript Firebase Rakuten-API Google Gemini user-acquisition

How to Solve Audio File Volume Inconsistency and Quality Unification Issues with ffmpeg Normalization

Posted on Sat Jul 5 2025 | 5 minutes | 909 words |

Audio Volume Issues in Audio File Processing

When producing and distributing audio content, do you face these problems?

1. Volume Inconsistency Issues

Volume levels are not unified across multiple audio files
Volume differences occur due to different recording environments and equipment
Listeners need to frequently adjust volume levels

2. Quality Inconsistency Issues

Noise and unwanted frequencies are mixed in
Silent sections are too long and difficult to listen to
Unable to achieve professional-quality audio

3. Manual Processing Limitations

Processing large numbers of audio files individually is inefficient
Automation is difficult with GUI audio editing software
Applying consistent processing standards is challenging

Real-world Audio Quality Challenge Cases

Failure Case: Limitations of Manual Adjustment

# Traditional approach
# 1. Open each file in audio editing software
# 2. Adjust levels by eye and by ear
# 3. Manually apply noise reduction
# 4. Manually cut silent sections

# Problems:
# - Time-consuming for processing large numbers of files
# - Processing standards are subjective and inconsistent
# - Quality variations due to human errors

The solution to this problem is automated volume normalization with ffmpeg.
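A minimal sketch of what batch normalization could look like, assuming ffmpeg's `loudnorm` (EBU R128) filter is the normalization step. The excerpt does not say which filter or targets the article uses, so the helper names and the target values (-16 LUFS, -1.5 dBTP, LRA 11) are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def build_loudnorm_command(src: Path, dst: Path,
                           integrated_lufs: float = -16.0,
                           true_peak_db: float = -1.5,
                           loudness_range: float = 11.0) -> list[str]:
    """Build an ffmpeg command that applies the loudnorm filter to one file."""
    audio_filter = (
        f"loudnorm=I={integrated_lufs}:TP={true_peak_db}:LRA={loudness_range}"
    )
    return ["ffmpeg", "-y", "-i", str(src), "-af", audio_filter, str(dst)]


def normalize_directory(src_dir: Path, dst_dir: Path, pattern: str = "*.wav") -> None:
    """Apply the same settings to every matching file (requires ffmpeg on PATH)."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(src_dir.glob(pattern)):
        subprocess.run(build_loudnorm_command(src, dst_dir / src.name), check=True)
```

Keeping the command construction in one function means every file is processed with identical parameters, which addresses the consistency problem that manual adjustment cannot.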

[Read More]

ffmpeg volume-normalization noise-reduction problem-solving audio-processing

How to Efficiently Solve Low Accuracy and High Cost Issues in Japanese Text Generation with T5

Posted on Sat Jul 5 2025 | 3 minutes | 1242 words |

Challenges in Japanese Text Generation

When working on Japanese text summarization, title generation, and document classification tasks, do you face these problems?

1. Accuracy Issues

Traditional rule-based methods cannot generate natural Japanese text
English-oriented models cannot handle Japanese grammar and expressions
Need to build separate models for multiple tasks

2. Development Cost Issues

Time and resources required for task-specific model development
Different approaches needed for document classification, summarization, and title generation
Enormous effort required for preparing training data and building models

3. Operational Complexity

Need to manage and operate multiple models
Different APIs and interfaces for each task
Complex model updates and maintenance

Real-world Text Generation Challenge Cases

Failure Case: Limitations of Task-specific Individual Development

# Traditional approach: one model per task (the load_* helpers are illustrative placeholders)
classification_model = load_bert_classifier()      # For document classification
summarization_model = load_summarization_model()   # For summarization
title_generation_model = load_title_model()        # For title generation

# Problems:
# - Managing 3 separate models
# - 3x memory usage
# - High development and maintenance costs

The solution to this problem is Japanese T5 (Text-To-Text Transfer Transformer).
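The text-to-text idea can be sketched as follows: every task becomes "string in, string out", so one model can serve all three tasks and only the input framing changes. This is a sketch under assumptions: the prefix strings other than `summarize: ` and the `generate` callable (standing in for a fine-tuned model) are hypothetical, and a given Japanese T5 checkpoint may use different prefixes or none at all.

```python
from typing import Callable

# Hypothetical task prefixes; a real checkpoint defines its own during fine-tuning.
TASK_PREFIXES = {
    "classify": "classify: ",
    "summarize": "summarize: ",
    "title": "generate title: ",
}


def build_t5_input(task: str, text: str) -> str:
    """Frame a task as plain text by prepending its prefix to the document."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task}")
    return TASK_PREFIXES[task] + text.strip()


def run_task(generate: Callable[[str], str], task: str, text: str) -> str:
    """Send any task through the same model; only the input string differs."""
    return generate(build_t5_input(task, text))
```

Compared with the three separate loaders in the failure case, there is a single model object to load, update, and serve.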

[Read More]

tech nlp T5 technology text-generation problem-solving summarization

20250705document Summary Automation Solution

Posted on Sat Jul 5 2025 | 0 minutes | 0 words |

How to Solve Slow, Low-Accuracy Semantic Similarity Computation for Japanese Documents with Sentence BERT

Posted on Sat Jul 5 2025 | 5 minutes | 2152 words |

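Challenges in Document Similarity Computation

When building a Japanese document search or recommendation system, do you face these problems?

1. Accuracy Issues

Word-level matching alone cannot accurately measure the semantic similarity of documents
Documents that share a meaning but use different wording are not found as related documents
Traditional TF-IDF and BM25 cannot capture semantic similarity

2. Computation Speed Issues

Encoding documents with a BERT model on every request takes too long
Similarity against a large number of documents cannot be computed in real time
Document search response times run from several seconds to tens of seconds

3. Japanese Language Support Issues

English-oriented models cannot accurately capture semantic similarity in Japanese
They do not handle expressions and grammatical structures specific to Japanese
Building a custom model is difficult

Real-world Document Similarity Challenge Cases

Failure Case: Limitations of Traditional Methods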

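# Example: similarity computed with TF-IDF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Similarity is not computed correctly for documents like these
doc1 = "機械学習の精度を向上させる方法"  # "How to improve the accuracy of machine learning"
doc2 = "AIモデルの性能を改善する手法"    # "Techniques for improving the performance of AI models"

# Character n-grams are an assumption here: Japanese has no spaces, so the
# default word tokenizer would treat each sentence as a single token
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))
tfidf = vectorizer.fit_transform([doc1, doc2])
print(cosine_similarity(tfidf[0], tfidf[1])[0, 0])
# Result: low similarity (because the wording differs)

# Using BERT directly is also computationally expensive:
# documents must be re-encoded every time, which does not scale to large datasets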

Sentence BERT is what solves this problem.
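The speed problem above comes from re-encoding documents on every query. A minimal sketch of the usual remedy, encoding the corpus once and comparing only vectors at query time, is shown below. The `EmbeddingIndex` class is hypothetical, and `encode` stands in for a sentence-embedding model (for example, the `encode` method of a `SentenceTransformer` from the sentence-transformers library); the excerpt does not show the article's own implementation.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class EmbeddingIndex:
    """Encode documents once up front, then answer queries from cached vectors."""

    def __init__(self, encode: Callable[[str], Vector], docs: list[str]):
        self.encode = encode
        self.docs = docs
        self.vectors = [encode(d) for d in docs]  # the expensive step, done once

    def search(self, query: str, top_k: int = 3) -> list[tuple[str, float]]:
        """Encode only the query and rank cached document vectors against it."""
        q = self.encode(query)
        scored = [(doc, cosine(q, vec)) for doc, vec in zip(self.docs, self.vectors)]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

At query time only one sentence is encoded, so the per-query cost is dominated by cheap vector comparisons rather than model inference over the whole corpus.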

[Read More]

tech natural-language-processing BERT distributed-representations technology problem-solving document-similarity

A Strategy for Solving the Triple Burden of Accuracy, Speed, and Adoption Cost in Document Classification with fastText

Posted on Sat Jul 5 2025 | 5 minutes | 2015 words |

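The Triple Burden in Document Classification

When working on a document classification project, do you face these problems?

1. Accuracy Issues

Existing methods do not reach sufficient accuracy
Even complex deep learning models fail to deliver the expected results
Performance varies widely depending on the dataset

2. Speed Issues

Training takes so long that you cannot iterate
GPU resources are required, which raises development cost
Full model training takes hours to days

3. Adoption Cost Issues

Complex models are difficult to build and operate
Environment setup is complicated and cannot easily be shared with other members
Building a prototype takes too long

Real-world Document Classification Challenge Cases

Failure Case: Giving Up on a Complex Model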

# A case using NeuralClassifier
# Requires complex configuration files
# Requires preparing a GPU environment
# Result: accuracy below expectations, long training time

In situations like this, the fastText library published by Facebook Research is drawing attention as a solution.
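Part of fastText's low adoption cost is its plain-text training format: one example per line, with the label marked by the `__label__` prefix. The sketch below prepares such a file. The helper names are hypothetical, and the input is assumed to be already tokenized, since Japanese text needs word segmentation before fastText can split it on spaces.

```python
from typing import Iterable

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(label: str, tokens: Iterable[str]) -> str:
    """Format one example as '__label__<label> tok1 tok2 ...'."""
    return f"{LABEL_PREFIX}{label} " + " ".join(tokens)


def write_training_file(samples: Iterable[tuple[str, list[str]]], path: str) -> None:
    """Write (label, tokens) pairs to a UTF-8 file, one example per line."""
    with open(path, "w", encoding="utf-8") as f:
        for label, tokens in samples:
            f.write(to_fasttext_line(label, tokens) + "\n")
```

A file written this way can then be passed to the library's supervised trainer, for example `fasttext.train_supervised(input="train.txt")`, which runs on CPU.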

[Read More]

tech natural-language-processing fasttext technology distributed-representations document-classification problem-solving

How to Systematically Solve Unmanageable, Irreproducible Machine Learning Experiments with MLflow

Posted on Sat Jul 5 2025 | 3 minutes | 1382 words |

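Reproducibility Challenges in Machine Learning Experiments

When you run machine learning experiments repeatedly, do you face these problems?

You forget the parameters of a model that produced good results
You cannot compare past experiment results, so improvement stalls
You cannot share experiment results with team members
Re-running the same experiment does not reproduce the results

These problems are also why the practice of machine learning ends up as "a kind of black magic".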

[Read More]

tech python devops mlflow technology problem-solving machine-learning

How to Discover Related Documents That Keyword Search Cannot Find

Overcoming the Limits of Document Search with Semantic Search

Posted on Fri Jun 20 2025 | 4 minutes | 1747 words |

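Problem: Why Can't Keyword Search Find the Documents You Want?

Suppose you are looking for "documents about improving machine learning performance" in a vast internal document database. Have you ever entered the keywords "machine learning" and "performance improvement" and still failed to find the documents you really need?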

[Read More]

natural-language-processing semantic-search document-similarity problem-solving